Robots.txt to the Rescue
Posted on April 21, 2007
Filed Under Googlebot, SEO, Traffic
Look, up in the sky! It’s a bird... no, it’s a plane... no, it’s Superman! Hehehe, I just borrowed those words from the old Superman movies. What I want to share today is about robots.txt. It functions as a “Superman” for dealing with search engine spiders. A search engine spider must read it first before it crawls your website.
Does your website have a robots.txt file? Every site is supposed to have a robots.txt file in its root directory.
Control your own Website
With robots.txt you can make search engines follow your orders. You are the boss of your website, not the spider. You can “ask” the spider to keep away from certain areas of your website. If you have important files, such as graphics, you may not want them to appear in search engines where people can find and download them.
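Here is a small sketch of what “asking” the spider looks like. It uses Python’s standard urllib.robotparser module to read a set of rules the same way a well-behaved crawler would. The rules, the folder names and the example.com URLs are only my assumptions for illustration, so swap in your own.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: ask every crawler to stay out of /images/ and /private/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /images/
Disallow: /private/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given rules let user_agent crawl url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# The graphic is off limits, the home page is not.
print(is_allowed(ROBOTS_TXT, "Googlebot", "http://www.example.com/images/logo.png"))
print(is_allowed(ROBOTS_TXT, "Googlebot", "http://www.example.com/index.html"))
```

Remember that this only tells you what a polite spider will do. The file is a request, not a lock on the door.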
Facts about Robots.txt
At SES New York a robots.txt summit was held where major search engines (Ask, Google, Microsoft, Yahoo!) participated, sharing interesting information on this file. Here are some numbers.
According to Keith Hogan from Ask:
i) Less than 35% of websites have a robots.txt file
ii) The majority of robots.txt files are copied from others found online
iii) On many occasions robots.txt files are provided by your web hosting service
It looks like most webmasters are still not familiar with robots.txt.
During the summit, all search engines announced they will identify (or autodiscover) sitemaps via the robots.txt file. In essence search engines are now able to discover your sitemap via a link in the following format:
Sitemap: <sitemap_location>, where <sitemap_location> is the complete URL of your Sitemap Index File (or your sitemap file, if you don’t have an index file).
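To see the Sitemap line in action, here is a minimal sketch, again with Python’s urllib.robotparser (the site_maps() method needs Python 3.8 or newer). The sitemap URL and the helper name find_sitemaps are made up for this example.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: nothing is blocked, and one sitemap is announced.
ROBOTS_WITH_SITEMAP = """\
User-agent: *
Disallow:

Sitemap: http://www.example.com/sitemap_index.xml
"""


def find_sitemaps(robots_txt: str) -> list:
    """Return every Sitemap URL listed in the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # site_maps() returns None when no Sitemap line is present.
    return parser.site_maps() or []


print(find_sitemaps(ROBOTS_WITH_SITEMAP))
```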
Rescue from a Google Ban
Robots.txt can help prevent you from getting banned or penalized by Google. In a move to eliminate search results pages from its index, because “web search results don’t add value to users”, Google has recently added the following sentence to its webmaster guidelines:
- Use robots.txt to prevent crawling of search results pages or other auto-generated pages that don’t add much value for users coming from search engines.
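As a rough sketch of how you could follow that advice, the snippet below builds the robots.txt text from a list of paths and then checks it with urllib.robotparser. The paths /search and /tag/ and the helper build_robots_txt are my own assumptions. Your auto-generated pages may live somewhere else.

```python
from urllib.robotparser import RobotFileParser


def build_robots_txt(disallowed_paths, user_agent="*"):
    """Build robots.txt text that blocks the given path prefixes."""
    lines = [f"User-agent: {user_agent}"]
    lines += [f"Disallow: {path}" for path in disallowed_paths]
    return "\n".join(lines) + "\n"


# Hypothetical locations of search results and other auto-generated pages.
rules = build_robots_txt(["/search", "/tag/"])
print(rules)

# Check the result the way a crawler would read it.
parser = RobotFileParser()
parser.parse(rules.splitlines())
print(parser.can_fetch("Googlebot", "http://www.example.com/search?q=robots"))
print(parser.can_fetch("Googlebot", "http://www.example.com/2007/04/robots-txt/"))
```

The first check should come back blocked and the second allowed, since only the listed prefixes are excluded.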
How to Implement a Robots.txt File
If your website doesn’t support a sitemap and you do not have any areas to exclude, include an empty robots.txt file in your root directory. By doing so you are acknowledging full spidering of your entire site.
Carefully review the robots exclusion protocol available at robotstxt.org. If you must exclude numerous areas of your website, build your file in a step by step manner and monitor spider behaviour with a log analyser tool.
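If you don’t have a log analyser tool at hand, a few lines of Python can give you a first look at spider behaviour. This is only a sketch: it assumes your server writes the common “combined” log format, the list of crawler names is illustrative and far from complete, and the sample log lines are invented.

```python
import re
from collections import Counter

# Assumed "combined" access-log layout; adjust the pattern to your server.
LOG_LINE = re.compile(
    r'"(?:GET|HEAD|POST) (?P<path>\S+)[^"]*" \d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)

# Substrings found in the user-agent of a few well-known crawlers (not complete).
CRAWLERS = ("Googlebot", "Slurp", "msnbot", "Teoma")


def crawler_hits(log_lines):
    """Count requests per (crawler, path) pair in the given log lines."""
    hits = Counter()
    for line in log_lines:
        match = LOG_LINE.search(line)
        if not match:
            continue  # skip lines that don't fit the assumed format
        agent = match["agent"].lower()
        for name in CRAWLERS:
            if name.lower() in agent:
                hits[(name, match["path"])] += 1
                break
    return hits


# Invented sample lines, only to show the shape of the input.
sample = [
    '66.249.66.1 - - [21/Apr/2007:10:00:00 +0000] "GET /robots.txt HTTP/1.1" 200 120 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '66.249.66.1 - - [21/Apr/2007:10:00:05 +0000] "GET /private/report.html HTTP/1.1" 200 5120 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '10.0.0.7 - - [21/Apr/2007:10:01:00 +0000] "GET /index.html HTTP/1.1" 200 2048 "-" "Mozilla/4.0 (compatible; MSIE 6.0)"',
]

for (crawler, path), count in crawler_hits(sample).items():
    print(crawler, path, count)
```

If a spider keeps showing up on a path you excluded, check your rules before you add the next area.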
If you don’t have a robots.txt file on your website, set one up now. Use it, together with the sitemap it points to, to inform the crawlers how your site is organized and how often it changes.
P.S. Your robots.txt file may also attract the attention of hackers looking for sensitive targets you might inadvertently list. You could end up keeping out the robots while inviting the hackers, so keep this in mind.
Comments
2 Responses to “Robots.txt to the Rescue”
[…] page. That owesome!. Actually i’ve already wrote about robots.txt before. It’s about robots.txt to the rescue. I wish i can make money like him..Ganbate to […]
Robots.txt is really great and helpful to make sites searchable. Your tips will help others a lot.