

Nov. 15, 2007 at 1:19pm Eastern by Barry Schwartz

Robots.txt Study Shows Webmasters Favor Google; BotSeer Robots.txt Search Engine Released

A study from The Pennsylvania State University found that webmasters favor Google over other search engines when it comes to allowing crawlers access to their web sites. The researchers also released BotSeer, an associated search engine for searching across a collection of robots.txt files.

The study looked at which robots, or crawlers, were listed in each web site's robots.txt file, and Google's crawler was listed more often than that of any other search engine. The paper, titled "Determining Bias to Search Engines from Robots.txt" (PDF; the download may be slow, so a local copy is also available), has some interesting details.

The most commonly used user agent is the "universal robot" (User-agent: *): 93.8 percent of sites with robots.txt files have a rule allowing any crawler to access the site. In addition, 72.4 percent of the robots.txt files mention specific robots by name.
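To make the counting idea concrete, here is a minimal Python sketch of how one might tally which user agents are named across a collection of robots.txt files. It is not the study's actual method or code: the function names `agents_in` and `tally_agents` are hypothetical, and the three sample files are invented for illustration rather than taken from the paper's data.

```python
from collections import Counter

def agents_in(robots_txt):
    """Return the set of user-agent names (lowercased) in one robots.txt file."""
    agents = set()
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()      # drop comments and whitespace
        field, sep, value = line.partition(":")
        if sep and field.strip().lower() == "user-agent" and value.strip():
            agents.add(value.strip().lower())
    return agents

def tally_agents(files):
    """Count, for each user agent, how many files name it at least once."""
    counts = Counter()
    for text in files:
        counts.update(agents_in(text))
    return counts

# Invented sample files, for illustration only (not data from the study).
SAMPLE_FILES = [
    "User-agent: *\nDisallow: /private/\n",
    "User-agent: Googlebot\nDisallow:\n\nUser-agent: *\nDisallow: /\n",
    "User-agent: Googlebot\nDisallow: /tmp/\n\nUser-agent: msnbot\nDisallow: /\n",
]

if __name__ == "__main__":
    for agent, n in tally_agents(SAMPLE_FILES).most_common():
        print(agent, n)
```

Counting each agent once per file, rather than once per line, matches the way the figures above are phrased (a percentage of files that mention a robot), but that choice is an assumption of this sketch.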

The chart below shows that Google's robot, Googlebot, is named more often than any other search engine's crawler:

[Chart: Robots.txt Study]

The chart below compares search engine market share to robot bias:

[Chart: Robots.txt Study]

The study also presents historical data on the growing use of robots.txt files by webmasters. It is definitely worth downloading and reading.

One more note: This morning I mentioned a quote from Eytan of Live Search:

One thing that we noticed for example while mining our logs is that there are still a fair number of sites that specifically only allow Googlebot and do not allow MSNBot.

This study confirms Eytan's statement.
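For readers who have not seen such a file, the pattern Eytan describes can be sketched with Python's standard urllib.robotparser module. The robots.txt content and the example.com URL below are invented for illustration, and the helper name `build_parser` is hypothetical; the file simply allows Googlebot everywhere and shuts out every other crawler.

```python
from urllib.robotparser import RobotFileParser

# Invented robots.txt: an empty Disallow lets Googlebot crawl everything,
# while the universal robot rule blocks all other crawlers.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow:

User-agent: *
Disallow: /
"""

def build_parser(text):
    """Parse robots.txt text into a RobotFileParser without any network access."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser

if __name__ == "__main__":
    parser = build_parser(ROBOTS_TXT)
    url = "http://example.com/page.html"  # placeholder URL
    for agent in ("Googlebot", "msnbot"):
        print(agent, parser.can_fetch(agent, url))
```

With this file, Googlebot is permitted to fetch the page and msnbot is not, because msnbot falls through to the universal robot rule.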

Postscript From Danny: I skimmed the report and hope to look at it more later. However, concluding that Google is most favored just because Googlebot is named in allow statements isn't conclusive. For example, "Googlebot" might include things like the Google AdSense crawler, so allowing that while banning other spiders might still amount to banning Google itself. That said, I have no doubt site owners think more about Google than other search engines when crafting their files.



