What You Should Know About Robots.txt


Everything You Need To Know About Robots.txt And Then Some

Hi Folks

A member of Wealthy Affiliate was asking about messages they were getting from Google Webmasters about their robots.txt file. I replied as well as I could, given the nature of the question.

However, it got me thinking that it would be a good idea to hunt around for more information on this subject. Hence this blog, compiled from various sources.

The robots.txt protocol, also called the "robots exclusion standard", is designed to keep web spiders (crawlers) out of parts of a website. It is often used as a privacy measure, the equivalent of hanging a "Keep Out" sign on your door: it asks visitors to stay out, but it does not lock anything.

This protocol is used by website administrators when there are sections or files that they would rather not have crawled by the rest of the world. These could include employee lists or files that are circulated internally. For example, the White House website has used robots.txt to keep crawlers away from the Vice President's speeches, a photo essay of the First Lady, and profiles of the 9/11 victims.


How does the protocol work? The site owner lists the files and directories that robots shouldn't crawl in a plain-text file named robots.txt and places it in the top-level directory of the website (for example, https://example.com/robots.txt). The robots.txt protocol was created by consensus in June 1994 by members of the robots mailing list (robots-request@nexor.co.uk). When this was written there was no official standards body or RFC for the protocol (it was eventually published as RFC 9309 in 2022), so it's difficult to legislate or mandate that the protocol be followed. In fact, the file is strictly advisory and offers no guarantee that the pages it lists won't be read.
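As a minimal sketch, here is what a small robots.txt file might look like and how a cooperative crawler could check it before fetching pages. It uses Python's standard-library urllib.robotparser. The domain example.com, the /staff/ and /internal/ paths, and the crawler name ExampleBot are all hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt, as it would be served from
# https://example.com/robots.txt (the top-level directory of the site).
ROBOTS_TXT = """\
User-agent: *
Disallow: /staff/
Disallow: /internal/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# A cooperative crawler asks before fetching each URL; nothing forces it to.
for url in ["https://example.com/index.html",
            "https://example.com/staff/list.html",
            "https://example.com/internal/memo.pdf"]:
    decision = "crawl" if parser.can_fetch("ExampleBot", url) else "skip"
    print(url, "->", decision)
```

Only index.html comes back as "crawl". The other two match a Disallow line and are skipped.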


In effect, robots.txt requires cooperation from the web spider, and even from the human reader, since anything uploaded to the internet becomes publicly available. You aren't locking them out of those pages; you are only asking them to stay out, and it takes very little for them to ignore that request. The robots.txt file itself is also public, so anyone, including hackers, can read it, and a list of "disallowed" paths can point them straight at the files you wanted hidden. So the rule of thumb is: if it's that sensitive, it shouldn't be on your website to begin with.
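To illustrate why robots.txt is a sign and not a lock, this sketch plays the part of a curious visitor. It reads a hypothetical robots.txt file and pulls out every path the owner asked robots to avoid. The file contents and paths are made up for the example. Nothing in the protocol stops a client from then requesting those paths directly.

```python
# Hypothetical robots.txt contents. Anyone can download the real file
# from a site's top-level directory with an ordinary browser.
ROBOTS_TXT = """\
User-agent: *
Disallow: /staff-salaries/
Disallow: /drafts/
"""

def disallowed_paths(robots_txt: str) -> list[str]:
    """Return every non-empty Disallow path listed in a robots.txt file."""
    paths = []
    for line in robots_txt.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        field, sep, value = line.partition(":")
        if sep and field.strip().lower() == "disallow" and value.strip():
            paths.append(value.strip())
    return paths

print(disallowed_paths(ROBOTS_TXT))
```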

Take care, however, that your robots.txt file doesn't also block search engine robots from the areas of your website you want found. Doing so can dramatically affect your search engine ranking, because the crawlers need to read those pages to index their keywords, meta tags, titles and headings, and to register their hyperlinks.
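A classic way to block far more than intended is a single slash. This sketch uses Python's urllib.robotparser, not Google's own parser, to compare "Disallow: /", which blocks the whole site, with an empty "Disallow:", which blocks nothing. The page URL and the helper name allowed are hypothetical.

```python
from urllib.robotparser import RobotFileParser

def allowed(robots_txt: str, url: str, agent: str = "Googlebot") -> bool:
    """Parse robots_txt and report whether agent may fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)

BLOCK_EVERYTHING = "User-agent: *\nDisallow: /\n"  # one slash: the whole site is off limits
ALLOW_EVERYTHING = "User-agent: *\nDisallow:\n"    # empty value: nothing is blocked

page = "https://example.com/my-best-article.html"
print(allowed(BLOCK_EVERYTHING, page))  # False
print(allowed(ALLOW_EVERYTHING, page))  # True
```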

One misplaced character, such as a missing slash, can have catastrophic effects. For example, robots.txt patterns are matched by simple prefix comparison: a rule applies to every path that starts with it. So make sure that patterns meant to match a directory end with the '/' character. Otherwise, every file whose name starts with that string will match, not just the files in the intended directory.
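Here is a quick sketch of the trailing-slash problem, again using Python's urllib.robotparser, which applies rules by prefix matching. The paths and the helper name blocked are hypothetical.

```python
from urllib.robotparser import RobotFileParser

def blocked(rule: str, path: str) -> bool:
    """Return True if a single 'Disallow: <rule>' line blocks path."""
    parser = RobotFileParser()
    parser.parse(["User-agent: *", f"Disallow: {rule}"])
    return not parser.can_fetch("ExampleBot", path)

for rule in ["/private", "/private/"]:
    for path in ["/private/notes.html", "/private-offers.html", "/privateer.html"]:
        print(f"Disallow: {rule:<10} {path:<22} blocked={blocked(rule, path)}")
```

Without the trailing slash, all three paths are blocked. With it, only /private/notes.html is blocked, which is presumably what the site owner meant.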

To avoid these problems, consider running your site through a search engine spider simulator, also called a search engine robot simulator. These simulators, which can be bought or downloaded from the internet, imitate the way different search engines crawl and give you a "dry run" of how they will read your site. They will tell you which pages are skipped, which links are ignored, and which errors are encountered. Because the simulators also re-enact how the bots follow your hyperlinks, you'll see whether your robots.txt file is interfering with a search engine's ability to reach all the pages you want indexed.
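A real simulator fetches your pages and follows their links. This much smaller sketch only covers the robots.txt part of the dry run: given a list of your page URLs, it reports which ones a compliant crawler would skip. The robots.txt rules, page list, and function name dry_run are hypothetical. Note that the "/blog" rule has no trailing slash, so it quietly hides every blog post.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt and page list for example.com.
ROBOTS_TXT = """\
User-agent: *
Disallow: /wp-admin/
Disallow: /blog
"""

SITE_PAGES = [
    "https://example.com/",
    "https://example.com/about/",
    "https://example.com/blog/robots-txt-guide/",
    "https://example.com/wp-admin/options.php",
]

def dry_run(robots_txt: str, pages: list[str], agent: str = "Googlebot") -> dict[str, list[str]]:
    """Sort pages into those a compliant crawler would fetch and those it would skip."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    report = {"crawled": [], "skipped": []}
    for page in pages:
        key = "crawled" if parser.can_fetch(agent, page) else "skipped"
        report[key].append(page)
    return report

report = dry_run(ROBOTS_TXT, SITE_PAGES)
for page in report["skipped"]:
    print("SKIPPED:", page)
```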

It's also important to review your robots.txt file yourself, so that you can spot any problems and correct them before you submit your site to real search engines. Obviously the person who was getting those messages from Google Webmasters didn't.
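If you want a head start on that review, here is a rough sketch of a checker that flags the two mistakes discussed above: "Disallow: /" for all robots, and directory-style rules missing their trailing slash. It is a heuristic, not a validator. The function name review is hypothetical, and it will miss many problems and may flag rules that are intentional.

```python
def review(robots_txt: str) -> list[str]:
    """Flag a few common robots.txt mistakes. A heuristic sketch, not a validator."""
    warnings = []
    agent = None  # most recent User-agent value seen
    for number, raw in enumerate(robots_txt.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()  # ignore comments
        field, sep, value = line.partition(":")
        if not sep:
            continue
        field, value = field.strip().lower(), value.strip()
        if field == "user-agent":
            agent = value
        elif field == "disallow" and value:
            if value == "/" and agent == "*":
                warnings.append(f"line {number}: 'Disallow: /' blocks all robots from the whole site")
            last_segment = value.rsplit("/", 1)[-1]
            # No trailing slash, no file extension, no wildcard: probably meant as a directory.
            if last_segment and "." not in last_segment and not any(c in value for c in "*$"):
                warnings.append(f"line {number}: '{value}' has no trailing '/', so it also matches e.g. '{value}-old.html'")
    return warnings

SAMPLE = "User-agent: *\nDisallow: /\nDisallow: /members\n"
for warning in review(SAMPLE):
    print(warning)
```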

There you have it, folks, and here is the link to a search engine spider simulator so you can run your own test.

http://theseotools.net/spider-simulator

Anyone with more info on this subject, please share it in a comment below.

Have a very nice, festive New Year's Eve and an even better 2017.

Robert Allan


Recent Comments


Thanks for the lesson! Now I know a little more about it.

Hello Steen and thank you for stopping by.
Hope you understand a wee bit more about Robots.txt now.
Enjoy your Friday and the weekend to come.
Robert

Thanks, Robert! Yes, I've got a little more insight here, and Happy New Year to you.

This is a totally new area to me. Thanks so much for explaining it. Raising awareness is such an important contribution that you and others are making to us newbies.

Hello Rhian and thank you for reading the blog and leaving a comment.
Bookmark it to come back to, as it's always best to have reference info to fall back on.
Enjoy your Friday over there in France and hope you got your Amazon problems sorted out.
Maybe catch you later.
Robert

Interesting robots.txt info, Robert!

Hi Mike and thank you for stopping by.
I'm sure a man of your experience will understand just what the Robots.txt is all about.
If not then read again or bookmark to come back to.
Enjoy your Friday.
Robert

A wealth of info here Robert, thanks very much for sharing.

No probs Jeff and hope you were able to take it in.
If not just read the blog again and again until it sinks in.
Thank you for your comment and have a great Friday.
Robert

Hi Robert,
Thank you for the information. Your blog has a lot of info that I never even knew about.
I am going to read it for the third time.
I am trying to resolve a problem with my site and trying to read anything that can give me an idea to correct the issue.
Thanks again
Lisa

Hello Dear lady and how are you this fine Friday morning?
Well I hope and having a ball at this time of the year.
I personally am having a quiet time.
(I must be getting old lol)
Apart from that thank you for your comment and taking the time to read the blog 3 times.
For some people it will take a bit of digesting but persevere and come back to the blog often for a refresher.
Robert

Thanks! And I have a question: having followed the All In One SEO plugin setup from the training here, would this aspect have been attended to?

The All-In-One is set up with default settings but not necessarily to your advantage if you want to show some things but not others.
It's like Chris says below: if there is an internal link from an "allowed" page to a "banned" page, then that link to the "banned" page will be followed.
The same goes for an image (see pic).
Hope this answers your question.
Enjoy your day.
Robert

I see ! Thank you.

One minor aspect.

As you say, the "ban" can be ignored by the spider. But even if the "ban" is obeyed, it only applies to direct access.

In other words, if there is an internal link from an "allowed" page to a "banned" page, then that link to the "banned" page will be followed.

Yes indeed Chris and thank you for pointing that out.
I'm sure the membership will appreciate all extra tips and advice from those of us who have a grasp of this sometimes tricky aspect of creating a website.
Enjoy your Thursday and the festivities to come.
Robert

This is really great stuff, thanks Robert and have a Happy New year.

Mike

No probs Mike and hope you understood some of it.
It's really easy to get things wrong, but read the blog again and again until you believe you have a handle on it.
Enjoy your day and the New year to come.
Robert

Thank you, Robert.... valuable info.

Thank you for your comment bJohn.
Hope all's clear to you.
It's really quite simple: you tick or untick the Robots.txt box in your WordPress (depending on your preferences, of course).
Enjoy your day and the festivities to come.
Robert

More insight... Thanks again Robert : )

Thanks, Robert....

Hello Netta and thank you for your comment.
Hope you understood it all.
If not, then go over the blog a wee bit at a time until you get a good grasp of the protocols involved.
Enjoy your evening.
Robert
