Monday, December 12, 2016

Do I need a robots.txt File?

The robots.txt file

Website owners use the robots.txt file to give instructions about their site to web robots; this is called the Robots Exclusion Protocol.

Here is how it gets used: a robot wants to visit a website URL, say http://www.myorg.com/index/. Before it does so, it first checks for http://www.myorg.com/robots.txt and finds that it contains the following:

User-agent: Googlebot
Allow: /
Allow: /images/
Allow: /css/
Allow: /js/
Disallow: /admin/
Disallow: /cgi-bin/
Disallow: /includes/
Disallow: /processors/
Disallow: /skins/

User-agent: bingbot
Allow: /
Allow: /images/
Allow: /css/
Allow: /js/
Disallow: /admin/
Disallow: /cgi-bin/
Disallow: /includes/
Disallow: /processors/
Disallow: /skins/

User-agent: *
Allow: /
Allow: /images/
Allow: /css/
Allow: /js/
Disallow: /admin/
Disallow: /cgi-bin/
Disallow: /includes/
Disallow: /processors/
Disallow: /skins/

sitemap: http://www.myorg.com/sitemap.xml

At MyOrg we break the file up into multiple sections:


  1. For Googlebot - This allows us to give specific instructions to Googlebot only.
  2. For Bingbot - This allows us to give specific instructions to Bingbot only.
  3. For all other bots - This gives every other search bot its instructions.
  4. Finally, we include our sitemap. Using the sitemap directive, you can tell search engines (specifically Bing, Yandex and Google) the location of your XML sitemap.
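
One thing that may look odd in the file above is that each section has "Allow: /" next to several Disallow lines. Google and the published Robots Exclusion Protocol standard (RFC 9309) settle this by letting the most specific (longest) matching path win, with Allow winning a tie. The sketch below is only an illustration of that rule using the paths from our "User-agent: *" section; the function name is_allowed is made up for this example, and other crawlers may resolve conflicts differently.

```python
# Rules copied from the "User-agent: *" section of the robots.txt above.
RULES = [
    ("allow", "/"),
    ("allow", "/images/"),
    ("allow", "/css/"),
    ("allow", "/js/"),
    ("disallow", "/admin/"),
    ("disallow", "/cgi-bin/"),
    ("disallow", "/includes/"),
    ("disallow", "/processors/"),
    ("disallow", "/skins/"),
]


def is_allowed(path, rules=RULES):
    """Hypothetical helper: apply longest-match precedence to a URL path."""
    best_kind, best_len = "allow", -1  # no matching rule means allowed
    for kind, prefix in rules:
        if not path.startswith(prefix):
            continue  # this rule does not apply to the path
        longer = len(prefix) > best_len
        tie_prefers_allow = len(prefix) == best_len and kind == "allow"
        if longer or tie_prefers_allow:
            best_kind, best_len = kind, len(prefix)
    return best_kind == "allow"


if __name__ == "__main__":
    for path in ("/index/", "/images/logo.png", "/admin/login"):
        print(path, "->", "allowed" if is_allowed(path) else "blocked")
```

Here "/admin/login" matches both "Allow: /" and "Disallow: /admin/", and the longer Disallow path is the one that counts.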


Why we set up our robots.txt file this way at MyOrg

The reason we do not just use the generic section for every bot is that we may want Google or Bing to index something that we don't want other search engines to index. This gives you great control over what gets spidered. And of course it is always best to include the sitemap, so the search engines can find every link you have listed in it.
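
To see what per-bot control looks like in practice, here is a minimal sketch using Python's standard urllib.robotparser module. The rules are a made-up variation, not our real file: the "/press/" folder and the helper names build_parser and can_crawl are assumptions for illustration only. Googlebot is allowed into "/press/" while every other bot is kept out.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: "/press/" is an invented folder used only for illustration.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /admin/

User-agent: *
Disallow: /admin/
Disallow: /press/
"""


def build_parser(text=ROBOTS_TXT):
    """Parse robots.txt text that is already in memory (no network request)."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


def can_crawl(agent, url, parser=None):
    """Return True if the named bot may fetch the URL under the parsed rules."""
    parser = parser or build_parser()
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    url = "http://www.myorg.com/press/launch.html"
    for agent in ("Googlebot", "SomeOtherBot"):
        print(agent, "may crawl" if can_crawl(agent, url) else "is blocked from", url)
```

Keep in mind that robots.txt is a request, not a lock: well-behaved bots follow it, but nothing forces a bot to obey.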

To better understand why you use certain commands in the robots.txt file, "vary.com" has already published one of the fullest explanations of robots.txt files I've ever read; follow this link to vary.com's article titled "The robots.txt File".

A brief thought on why to use proper site structure

When you are setting up your website, creating a structured site and then making proper use of that structure will make your life easier. Not only is it easier to find things, but it is also easier for the search engines to index your files. So instead of just dumping all your files in the root directory or one subfolder, employ a proper site structure.

We at MyOrg find it easier to build folders for your Cascading Style Sheets (CSS), JavaScript (JS) and images, then add any additional folders you may need. Examples of additional directories might be includes, processors, or templates / skins.

I hope you find this article useful and that it gives you a little insight into why you need a robots.txt file.