imron Posted February 24, 2020 at 05:04 AM
@roddy can you allow the DuckDuckGo spider in robots.txt? Google search is becoming more and more unbearable, and while I switched away from it as my primary search engine a while back, it's still the only major off-site search engine that appears to have access to the site. Normally I'd use DuckDuckGo, but when I do a site-specific search, the results tell me that its spider has been disallowed, so it can't find much content.
roddy Posted February 24, 2020 at 06:48 AM
Don’t think it’s specifically banned in robots.txt, but it might have picked up an htaccess user agent / IP ban over the years. Will take a look.
889 Posted February 24, 2020 at 07:26 AM
robots.txt:

User-agent: *
Disallow: /admin
Disallow: /profile
Disallow: /applications/core/interface/file/
Disallow: /notifications/options/
Disallow: /followed/
Disallow: /discover/followed-content/

User-agent: Baiduspider
User-agent: Baiduspider-video
User-agent: Baiduspider-image
User-agent: Sogou web spider
User-agent: MJ12bot
User-agent: dotbot
User-agent: Exabot
User-agent: Wordpress/MU
User-agent: msrbot
User-agent: VB Project
User-agent: NaverBot
User-agent: Yeti
User-agent: moget
User-agent: ichiro
User-agent: Yandex
User-Agent: Charlotte
User-Agent: YoudaoBot
User-agent: sogou spider
User-Agent: bingbot
Disallow: /

https://help.duckduckgo.com/duckduckgo-help-pages/results/duckduckbot/

The specific message I get is, "We would like to show you a description here but the site won't allow us."
imron Posted February 24, 2020 at 08:38 AM
1 hour ago, 889 said: The specific message I get is, "We would like to show you a description here but the site won't allow us."
Yep, that's the one I get too. I figured it was from robots.txt, but that robots.txt doesn't look like it blocks it.
roddy Posted February 24, 2020 at 08:57 AM
Will take a look later. Having done some inefficient research while on mobile, it looks like DDG gets its search results via APIs to other indexes, one of which is Yandex, a historically badly-behaved Russian search engine. But I see something similar on Yahoo, while Google looks fine.
roddy Posted February 24, 2020 at 11:02 AM
Ok, so... DuckDuckGo seems to crawl for ranking purposes, but not for indexing. For indexing, it pulls data from other sources - Bing/Yahoo (same thing now?) and Yandex. I had Bing and Yandex both blocked from waaaaaaaaaaay back. I've removed those blocks. Yandex seems to be better-behaved now. I think Bing still had access to the sitemap, so it could see URLs and titles and include them in the index, but not the content, and that's what was turning up in DuckDuckGo. I've also allowed Baidu back in. If I remember, once I see that's all working better I'll submit the site for a DDG !bang search. However, there's no guarantee of how quickly or how completely we get indexed.
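For anyone wanting to reproduce the diagnosis, here is a minimal sketch that feeds the robots.txt quoted earlier in the thread to Python's standard urllib.robotparser and asks which crawlers it admits. The sample path "/forums/" and the agent names passed in are assumptions chosen for illustration, and Python's matching (a case-insensitive substring test on the agent name) is only one interpretation of the rules; real crawlers may read them slightly differently.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt as quoted earlier in the thread (before the blocks were removed).
OLD_ROBOTS_TXT = """\
User-agent: *
Disallow: /admin
Disallow: /profile
Disallow: /applications/core/interface/file/
Disallow: /notifications/options/
Disallow: /followed/
Disallow: /discover/followed-content/

User-agent: Baiduspider
User-agent: Baiduspider-video
User-agent: Baiduspider-image
User-agent: Sogou web spider
User-agent: MJ12bot
User-agent: dotbot
User-agent: Exabot
User-agent: Wordpress/MU
User-agent: msrbot
User-agent: VB Project
User-agent: NaverBot
User-agent: Yeti
User-agent: moget
User-agent: ichiro
User-agent: Yandex
User-Agent: Charlotte
User-Agent: YoudaoBot
User-agent: sogou spider
User-Agent: bingbot
Disallow: /
"""


def check_agents(robots_txt, agents, path):
    """Return {agent: True/False} for whether each agent may fetch `path`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, path) for agent in agents}


# Hypothetical path and agent names, used only for this check.
SAMPLE_PATH = "/forums/"
AGENTS = ["DuckDuckBot", "bingbot", "YandexBot", "Googlebot"]

results = check_agents(OLD_ROBOTS_TXT, AGENTS, SAMPLE_PATH)
for agent, allowed in results.items():
    print(f"{agent}: {'allowed' if allowed else 'blocked'}")
```

Under these rules DuckDuckBot and Googlebot fall through to the general "*" group and are allowed, while bingbot and YandexBot hit the "Disallow: /" group. That fits the explanation above: the file never names DuckDuckGo's own crawler, but it shuts out the indexes DuckDuckGo draws on.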
roddy Posted March 3, 2020 at 08:28 AM
Descriptions are turning up on *some* Bing / DDG searches, so that seems to be working. As I say though, no idea how complete and quick the process will be.
889 Posted March 3, 2020 at 11:57 AM
"As I say though, no idea how complete and quick the process will be."
Don't the access logs tell you precisely what's been spidered and when?
roddy Posted March 3, 2020 at 12:42 PM
Theoretically yes, but I haven’t looked at a raw access log for maybe a decade. And there’s likely a delay between spidering and inclusion in the index, and I don’t know if DDG has real-time access to that index, and a search engine looking at a page doesn’t mean it makes it into the index, so...
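As a rough illustration of what the access logs could show, the sketch below counts crawler requests per day by matching user-agent substrings. It assumes the server writes the common "combined" log format; the sample lines are invented for the example (documentation-range IPs, made-up paths), not taken from the site's real logs. As noted above, a crawl hit only shows that a page was fetched, not that it made it into any index.

```python
import re
from collections import Counter

# Matches the date, request, status, size, referer and user agent of a
# "combined" format log line (an assumption about the server's log format).
LOG_LINE = re.compile(
    r'\[(?P<day>\d{2}/\w{3}/\d{4}):[^\]]*\]'  # [03/Mar/2020:08:28:00 +0000]
    r' "[^"]*"'                               # request line
    r' \d{3} \S+'                             # status code and response size
    r' "[^"]*"'                               # referer
    r' "(?P<agent>[^"]*)"'                    # user agent
)


def count_crawler_hits(lines, tokens):
    """Count requests per (day, token) where token appears in the user agent."""
    hits = Counter()
    for line in lines:
        match = LOG_LINE.search(line)
        if not match:
            continue  # skip lines that are not in the expected format
        agent = match.group("agent").lower()
        for token in tokens:
            if token.lower() in agent:
                hits[(match.group("day"), token)] += 1
    return hits


# Invented sample lines for illustration only.
SAMPLE_LOG = [
    '192.0.2.1 - - [03/Mar/2020:08:28:00 +0000] "GET /forums/ HTTP/1.1" 200 5120 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"',
    '192.0.2.1 - - [03/Mar/2020:09:10:12 +0000] "GET /sitemap.php HTTP/1.1" 200 804 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"',
    '192.0.2.7 - - [03/Mar/2020:11:57:40 +0000] "GET /forums/ HTTP/1.1" 200 5120 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '192.0.2.1 - - [04/Mar/2020:02:03:04 +0000] "GET /forums/ HTTP/1.1" 200 5120 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"',
    '192.0.2.9 - - [04/Mar/2020:07:00:00 +0000] "GET /forums/ HTTP/1.1" 200 5120 "-" "Mozilla/5.0 (X11; Linux x86_64) Firefox/73.0"',
]

hits = count_crawler_hits(SAMPLE_LOG, ["bingbot", "Googlebot"])
for (day, token), count in sorted(hits.items()):
    print(day, token, count)
```

To run it against a real log, pass an open file object instead of SAMPLE_LOG; the function only needs an iterable of lines.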
Jan Finster Posted March 3, 2020 at 07:29 PM
On 2/24/2020 at 12:02 PM, roddy said: For indexing, it pulls data from other sources - Bing/Yahoo (same thing now?) and Yandex
Are you saying DuckDuckGo is just metacrawling other search engines to get its search results? If so, then I stay with google....
roddy Posted March 3, 2020 at 08:21 PM
Not metacrawling as such; as I understand it, they use a Bing API (and lots of other sources). But I've only looked at it briefly.
imron Posted March 3, 2020 at 08:52 PM
1 hour ago, Jan Finster said: If so, then I stay with google....
It's not an "either/or" situation, it's "support both". If you still use Google, enabling Bing/Yahoo/DDG searches won't affect you in any way. It will, however, make a big difference to people who don't use Google search. Personally, I can't stand the new look of the Google search results page, and that was the driver for switching almost all my searches to DDG. Previously I was about 60/40, with DDG being 60. Now it's like 95/5.
roddy Posted May 14, 2020 at 05:32 PM
That's about 30 visits a day from Bing-bot search engines now, up from 3 at the start of the year. Still in the region of 1%-2% of search engine traffic, but all to the good. Thanks for raising it. I've submitted for a !bang search, but not sure if it'll get approved or not.
roddy Posted May 26, 2020 at 09:18 AM
Although... anyone using non-Google should bear in mind that Bing et al don't have so much indexed. Bing's webmaster tools are showing me 6k-8k pages indexed depending on what date in the last six months you pick (and no real upward trend). Google reports 50k pages indexed. What is going up is the number of people clicking through from Bing. Hopefully as that continues it'll lead to more indexing.
In other spider news, Huawei seems to be desperately gorging itself on our pages as it gears up for a world without Google. Generally, it isn't making itself any friends. But our server is humming along quite nicely, it seems, so let it gorge.
imron Posted May 26, 2020 at 11:22 AM
2 hours ago, roddy said: anyone using non-Google should bear in mind that Bing et al don't have so much indexed.
This matches my experience. There are some well-known posts/threads of mine and others on here that I can find in Google with a few choice keywords that DDG fails to pick up on (both searches limited to site:chinese-forums.com). It's getting better, but Google often still wins out for a site search of specific content.