3. Unintentionally blocking unrelated pages

Check out the following example:
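Say you want to block a single page called /custom (the path here is just an illustration). Because robots.txt rules are plain prefix matches, “Disallow: /custom” would also block /customized-email-templates and anything else whose path starts with /custom. The safe version looks like this:

User-agent: *
Disallow: /custom$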

The “$” sign is a so-called end-of-string operator which basically tells the spider that “this URL ends here”. As a result, the directive will match “/custom”, but not “/customized-email-templates”.

The worst thing about this type of mistake is that it usually goes unnoticed for a very long time: the page with your customized email templates quietly disappears from the search results, and nobody knows why …

4. Using incorrect type case in URL paths

URL paths are case sensitive! “Disallow: /Temp” will not block “/temp” or “/TEMP”. If you’ve made the mistake of using similar filenames that differ only in case, or a confusing directory structure, you’ll have to block those pages or folders separately, with a separate “Disallow:” line for each.
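A quick sketch of what that looks like, assuming all three variants really exist on your site:

User-agent: *
# each case variant needs its own rule
Disallow: /Temp
Disallow: /temp
Disallow: /TEMP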

5. Forgetting the user-agent directive

It may seem ridiculous, but it happens very often. If there is no user-agent directive before the usual “Disallow:”, “Allow:”, etc. directives, nothing will actually happen!
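Here is the difference in a minimal sketch (the /temp path is just an example):

# Wrong – no user-agent line, so crawlers ignore the rule:
Disallow: /temp

# Correct – the rule is addressed to all crawlers:
User-agent: *
Disallow: /temp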

6. Forgetting the slash character

Any URL path must start with a slash character! The “Disallow: any-page” directive won’t block anything. The correct syntax is this: “Disallow: /any-page”.
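Side by side, using the same example page:

User-agent: *
# Wrong – blocks nothing:
Disallow: any-page
# Correct:
Disallow: /any-page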

7. Using the robots.txt file to protect sensitive data

This is one of the biggest and most dangerous mistakes you can make, and for obvious reasons it will do much more harm than good. If you have any files or directories that must be kept hidden from the public, never just list them in your robots.txt file with a few “Disallow:” directives! The only reliable way to protect sensitive content is some sort of password-based security solution.

Why? Because you are handing hostile crawlers a precise road map to the folders and files you don’t want them to find! More than that, your robots.txt is publicly accessible: anybody can – and will – see the things you’ve said you don’t want indexed, simply by typing yourdomain.com/robots.txt into their browser!
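To make the danger concrete, here is a hypothetical example of what not to do – the folder names are made up:

User-agent: *
# Anyone who opens yourdomain.com/robots.txt can read this list:
Disallow: /admin-login/
Disallow: /customer-invoices/

Instead of hiding those folders, these lines announce exactly where they are.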


