About robots.txt

If you want search engines to index everything on your site, you don't actually need this file. You only need a robots.txt file if there is content that you don't want search engines to crawl and index.

A basic understanding of what this file does is all you need at this stage, so we'll cover just a few of the more important scenarios that you may encounter or wish to put in place.

The Job of robots.txt

This important file tells search engine robots which pages/posts on your website they may access.

As the automated search engine robots crawl the web, and before they access your site, they first look to see whether a robots.txt file exists that prevents them from accessing certain pages.
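To make that check concrete, here is a minimal sketch of what a well-behaved crawler might do, written with Python's standard `urllib.robotparser` module. The domain `example.com`, the bot name `ExampleBot`, and the `/private/` rule are all made up for illustration; a real crawler would download the file from the site instead of using a hard-coded string.

```python
from urllib.parse import urlsplit, urlunsplit
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, used here instead of a real download.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def robots_url(page_url: str) -> str:
    """Return the robots.txt address for the site that hosts page_url."""
    parts = urlsplit(page_url)
    # robots.txt always sits at the root of the host.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


def is_allowed(robots_txt: str, user_agent: str, page_url: str) -> bool:
    """Check the rules before fetching a page, as a polite bot would."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, page_url)


if __name__ == "__main__":
    for url in ("https://example.com/blog/post-1",
                "https://example.com/private/notes"):
        print(robots_url(url), url, is_allowed(ROBOTS_TXT, "ExampleBot", url))
```

With these assumed rules, the blog post is reported as crawlable and the page under `/private/` is not.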

If you leave this file alone, the bots will continue to your website and crawl everything without restriction. On most sites this is exactly what you want, but there are times when you may need to restrict access to some areas of your site.
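The difference between an unrestricted site and one with a restricted area can be sketched as follows, again using Python's standard `urllib.robotparser`. The paths, the `/members/` folder, and the bot name `ExampleBot` are assumptions chosen purely for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical paths on an imaginary site.
PATHS = ["/", "/blog/", "/members/area"]

# An empty file has no rules, so nothing is restricted.
UNRESTRICTED = ""

# One rule that asks all bots to stay out of a single folder.
RESTRICTED = "User-agent: *\nDisallow: /members/\n"


def crawlable(robots_txt: str, paths, user_agent: str = "ExampleBot") -> dict:
    """Map each path to True if the rules let user_agent crawl it."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {path: parser.can_fetch(user_agent, path) for path in paths}


if __name__ == "__main__":
    print("No rules:   ", crawlable(UNRESTRICTED, PATHS))
    print("One Disallow:", crawlable(RESTRICTED, PATHS))
```

With no rules every path is allowed; with the single `Disallow` line only the path under `/members/` is refused, and the rest of the site stays open to the bots.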

If you notice problems with bad links, or blocked URLs that Google hasn't been able to crawl, go to Google Webmaster Tools and check Blocked URLs in the Crawl section.



Join the Discussion
Tabsmark Premium
Hi Rob,

Loved the training and understand this better. Please could you do a training on what we should and shouldn't be blocking? I have googled and it seems examples are given and then that's it. I still don't know what I should have in my robots.txt file. How will I know, or where can I learn? I like the look of the generator, thanks for including that :) So far I have my sitemap. Any thoughts would be awesome!

Tracy
rob3 Premium
Hi Tracy,

You only need to block items that you don't want Google to crawl. I exclude pages such as Privacy and T&C's and any others that are private.

There are no set rules about what you should or should not block; it's up to each individual webmaster.

The way to look at it from an overall point of view is to go through your folder and file structure, and ascertain which files and/or folders do not ultimately need to be indexed by Google. Those are the ones you can set to Disallow. Keep it simple!

Regards
Rob
Tabsmark Premium
Okay thanks Rob,

That does make sense. For your privacy policy, when you click no follow in the post itself, is that sufficient? Or is it important to also add that to the robots.txt?

Thanks for getting back to me :)

Tracy
acoolmil Premium
Impressive Rob.
Trialynn Premium
You did a great job Rob, thanks!
cybridge Premium
Nice JOB!
Bill67 Premium
Thanks for the heads-up, Rob. Nicely done.
KD6PAO Premium
Nice job Rob! After what I went through the other day this makes perfect sense to me!
rob3 Premium
Thanks, thought I'd elaborate to cover wider scenarios!