According to the Pew Research Center, 91% of search engine users say they always or most of the time find the information they are seeking when they use search engines.


But search engines can’t (and won’t) help you expose your content if your site isn’t fully accessible and understandable to them. And when it comes to accessibility, the first factor to check is always the robots.txt file. So, let’s take a look.

What we are going to cover in this tutorial:

  • what is the purpose of the robots.txt file?
  • the 10 most common robots.txt mistakes that can ruin your SEO efforts

What is the purpose of the robots.txt file?

When the search engines index your website, they first crawl it with robot programs called bots, crawlers or spiders (Googlebot, Bingbot, Yahoo Slurp, etc.) in order to find and categorize all the content on your site. The bots will automatically crawl and index whatever they can find and “read”. If you have sections or pieces of content (for example, expired offers, duplicate content or non-public pages) that you don’t want crawled, you’ll have to tell the crawlers about these “banned” areas. To do that, you need a so-called robots.txt file.
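A quick way to see how such “banned” areas work is to feed a small robots.txt file to Python’s standard urllib.robotparser module and ask it which URLs a crawler may fetch. This is a minimal sketch: the domain www.example.com and the three folder names are hypothetical stand-ins for expired offers, duplicate (printer-friendly) pages and non-public pages.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: the three folders stand in for expired offers,
# duplicate (printer-friendly) pages and non-public pages.
ROBOTS_TXT = """\
User-agent: *
Disallow: /expired-offers/
Disallow: /print/
Disallow: /members/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

for url in [
    "https://www.example.com/blog/robots-txt-guide",
    "https://www.example.com/expired-offers/summer-sale",
    "https://www.example.com/print/blog/robots-txt-guide",
    "https://www.example.com/members/account",
]:
    # can_fetch() answers: may this user-agent crawl this URL?
    verdict = "crawl" if parser.can_fetch("Googlebot", url) else "skip"
    print(f"{verdict:5} {url}")
```

Only the blog post is reported as crawlable; the other three URLs fall inside a disallowed folder.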

So, what is the robots.txt file? Put simply, it’s a plain text file placed in the root directory of your website that tells search engine crawlers which parts of the site they may and may not crawl. Additionally, if you want to save some bandwidth, you can use the robots.txt file to exclude JavaScript files, stylesheets or certain images from crawling, although Google recommends keeping the JavaScript and CSS files your pages need for rendering crawlable.
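Because robots.txt is nothing more than plain text, you can write it by hand or generate it. The sketch below uses a hypothetical helper, build_robots_txt, that turns a mapping of user-agents to disallowed paths into the file’s text and saves it as robots.txt in a temporary directory standing in for your site’s root. The /downloads/hi-res-images/ folder is an invented example of large images you might keep crawlers away from to save bandwidth.

```python
from __future__ import annotations

import tempfile
from pathlib import Path


def build_robots_txt(rules: dict[str, list[str]]) -> str:
    """Hypothetical helper: render {user_agent: [disallowed paths]} as robots.txt text."""
    groups = []
    for agent, paths in rules.items():
        # An empty "Disallow:" line means "nothing is disallowed" for this agent.
        disallowed = [f"Disallow: {path}" for path in paths] or ["Disallow:"]
        groups.append("\n".join([f"User-agent: {agent}"] + disallowed))
    # A blank line separates one user-agent group from the next.
    return "\n\n".join(groups) + "\n"


text = build_robots_txt({"*": ["/downloads/hi-res-images/"]})

# Temporary directory used as a stand-in for the web server's document root.
doc_root = Path(tempfile.mkdtemp())
robots_path = doc_root / "robots.txt"
robots_path.write_text(text, encoding="utf-8")
print(robots_path.read_text(encoding="utf-8"))
```

On a real server the file would be saved in the web root so that it is served at www.yourdomain.com/robots.txt.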

When spiders visit your site, the very first thing they do is check whether a robots.txt file exists and read its contents. If you have created a robots.txt file with your own rules, well-behaved crawlers will respect your requests and won’t crawl the pages you have disallowed. In theory, you could also use the robots meta tag to tell spiders not to index individual pages, but a meta tag only works inside HTML pages (not folders or other file types) and not all search engines read it, so the robots.txt file is the more general tool for keeping crawlers out of whole sections of your site.
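The order of events described above (read robots.txt first, then request only the allowed pages) can be sketched as a tiny “polite” crawler. Everything except urllib.robotparser is hypothetical: FAKE_SITE is an in-memory dictionary standing in for real HTTP requests, fetch returns None to simulate a missing page, and ExampleBot is an invented user-agent name.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical in-memory site standing in for real HTTP requests.
FAKE_SITE = {
    "/robots.txt": "User-agent: *\nDisallow: /drafts/\n",
    "/": "<h1>Home</h1>",
    "/blog/": "<h1>Blog</h1>",
    "/drafts/unpublished-post": "<h1>Draft</h1>",
}


def fetch(path):
    """Return the page body, or None to simulate a 404 Not Found."""
    return FAKE_SITE.get(path)


def polite_crawl(paths, user_agent="ExampleBot"):
    # Step 1: before requesting any page, look for /robots.txt.
    robots_txt = fetch("/robots.txt")
    parser = RobotFileParser()
    # No robots.txt: parsing an empty rule set leaves every URL allowed.
    parser.parse(robots_txt.splitlines() if robots_txt is not None else [])

    visited = []
    for path in paths:
        # Step 2: only request the pages the rules allow.
        if parser.can_fetch(user_agent, path):
            fetch(path)
            visited.append(path)
    return visited


print(polite_crawl(["/", "/blog/", "/drafts/unpublished-post"]))
```

Keep in mind that this politeness is voluntary: robots.txt is a request that reputable crawlers honor, not an access control.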

As I already said, the robots.txt file must be placed in the root directory of your website. Spiders won’t search your site for a document with that name: if they can’t find it in the root directory (www.yourdomain.com/robots.txt), they will simply assume that your site doesn’t have a robots.txt file and, as a result, crawl everything they come across.
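Since crawlers only ever look in one place, you can compute that place from any page address: keep the protocol and host, and replace the path with /robots.txt. A minimal sketch using Python’s urllib.parse; robots_url is a hypothetical helper name, not part of any library.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url: str) -> str:
    """Hypothetical helper: the single location crawlers check for robots.txt."""
    parts = urlsplit(page_url)
    # Keep the scheme and host (including any port); drop path, query and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


print(robots_url("https://www.yourdomain.com/blog/2014/some-post/?ref=home"))
```

This prints https://www.yourdomain.com/robots.txt. A file uploaded to www.yourdomain.com/blog/robots.txt is therefore never consulted, and a subdomain such as blog.yourdomain.com is looked up at its own root.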

The structure and syntax of a robots.txt file are extremely simple. Basically, it’s a list of groups, each pairing one or more user-agents (crawlers) with the files or directories they are disallowed or allowed to crawl. In addition to the “User-agent:”, “Disallow:” and “Allow:” directives, you can include any comments you want by starting a line with the “#” sign. Technically speaking, the user-agent can be any party that requests web pages, including command-line utilities, web browsers and, of course, search engine spiders. If the “User-agent:” directive is followed by the wildcard “*”, the given rules apply to all crawlers.
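The sketch below puts these directives together: comment lines, a group that applies only to Googlebot, and a wildcard group for every other crawler, checked with Python’s urllib.robotparser. The paths are hypothetical. Note that a crawler follows only the group that matches it, so Googlebot here ignores the “*” rules. Also, urllib.robotparser applies the first matching rule in a group while Google applies the most specific (longest) match, so the Allow line is placed before the broader Disallow line to get the same answer under both.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
# Rules for Google's crawler only
User-agent: Googlebot
Disallow: /test-area/

# Rules for every other crawler
User-agent: *
Allow: /private/press-kit/
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())  # comment lines are ignored by the parser

checks = [
    ("Googlebot", "/test-area/page.html"),         # blocked by the Googlebot group
    ("Googlebot", "/private/report.pdf"),          # allowed: Googlebot ignores the * group
    ("Bingbot", "/private/report.pdf"),            # blocked by Disallow: /private/
    ("Bingbot", "/private/press-kit/logo.png"),    # allowed by the Allow line
    ("Bingbot", "/test-area/page.html"),           # allowed: no * rule matches
]
for agent, path in checks:
    print(agent, path, parser.can_fetch(agent, path))
```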



