The Linux Page

Google understands the "Allow:" keyword in robots.txt

At times you have to prevent users from seeing files under a certain folder such as the /admin/ or /wp-admin/ folder.

The easiest way to at least avoid having Google index those pages is to add a Disallow line in your robots.txt file. For example:

Disallow: /admin
Disallow: /wp-admin

This is great, and 99% of the time it does exactly what you want. Once in a while, though, a programmer gets it wrong and places a file that should be crawlable under one of those folders. For example, maybe someone placed a style.css file there which is accessed by pages other than just the pages under /admin. In that case, the two lines above will prevent search engines from downloading that CSS file, and as a consequence you may be penalized (because the search engine cannot be sure that the CSS is not going to hide the page entirely...)
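
The effect is easy to reproduce. The sketch below feeds the two rules to Python's standard urllib.robotparser module, which only approximates what a real crawler does. The "User-agent: *" line (the parser ignores rules that are not inside a User-agent group), the example.com host, and the file names are assumptions added for illustration.

```python
from urllib.robotparser import RobotFileParser

# The two Disallow rules from above, placed in a catch-all group.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin
Disallow: /wp-admin
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Hypothetical URLs: a shared stylesheet under /admin and a public page.
for url in (
    "https://example.com/admin/style.css",
    "https://example.com/blog/post.html",
):
    verdict = "allowed" if parser.can_fetch("Googlebot", url) else "blocked"
    print(url, "->", verdict)
```

The stylesheet is reported as blocked even though public pages depend on it, while the blog page stays allowed.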

To correct the problem, you have two solutions: either add many Disallow lines, one for each kind of file that must not be indexed, or use the Allow command.

IMPORTANT NOTE: the robots.txt website does not describe an Allow command. This is something that Google supports (and most certainly others who often follow Google's lead). Allow has since been standardized in RFC 9309, the Robots Exclusion Protocol.

So for example, to disallow indexing of your PHP files but allow the CSS file, you would have something like this:

Disallow: /admin/*.php

This is easy if you know you just have files with the .php extension and the CSS file. If you have all sorts of files (.inc, .txt, etc.) then you would have to add a Disallow for each one of them, as in:

Disallow: /admin/*.php
Disallow: /admin/*.inc
Disallow: /admin/*.txt

This becomes tedious. Worse, you take the risk that someone adds a new file with an extension which is not listed there. If that new file should not be indexed, it stays open to crawlers unless you remember to add a matching Disallow to your robots.txt file (and long term that's pretty much impossible to do.)
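
To make that risk concrete, here is a minimal sketch that uses Python's fnmatch module as a stand-in for robots.txt wildcard matching. It is not how any crawler is implemented; a trailing "*" is appended because robots.txt patterns match from the start of the path and need not reach the end. The file names, including backup.sql, are hypothetical.

```python
from fnmatch import fnmatchcase

# The per-extension rules from above.
DISALLOW = ["/admin/*.php", "/admin/*.inc", "/admin/*.txt"]


def is_blocked(path):
    """Return True if any Disallow pattern matches the start of the path."""
    # Appending "*" turns the full-string glob match into a prefix match.
    return any(fnmatchcase(path, pattern + "*") for pattern in DISALLOW)


# Hypothetical files: two that are covered, the CSS file, and a newcomer.
for path in (
    "/admin/login.php",
    "/admin/notes.txt",
    "/admin/style.css",
    "/admin/backup.sql",
):
    print(path, "->", "blocked" if is_blocked(path) else "crawlable")
```

The CSS file is crawlable as intended, but so is backup.sql, simply because nobody listed the .sql extension.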

Instead, we want to use the Allow command to permit only what should be crawlable, such as the CSS file. It could also be images (JPEG, GIF, PNG), movies, MP3s, etc.

Disallow: /admin
Allow: /admin/*.css

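Googlebot interprets these two lines as: disallow anything under /admin, unless the path also matches /admin/*.css. When both rules match, Google applies the more specific (longer) one, so the Allow wins. Note that a pattern matches from the start of the path and does not have to reach its end, so /admin/*.css also matches a path such as /admin/style.css.php; write /admin/*.css$ if the path must end with .css.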

This is really much better. Again, it will disallow everything except the few exceptions you introduce. That means when you add a new file with a new extension, you have absolutely nothing to do.
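
The Disallow/Allow pair above can be checked with a small sketch of the precedence rule Google documents: among all matching rules the longest pattern wins, and Allow wins a tie. This is an illustration under those assumptions, not Googlebot's actual code; the helper names and the test paths are made up for the example.

```python
import re


def pattern_to_regex(pattern):
    """Translate a robots.txt path pattern into a compiled regex.

    '*' matches any run of characters; a trailing '$' anchors the end.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + (r"\Z" if anchored else ""))


def is_allowed(rules, path):
    """Apply (directive, pattern) rules: longest match wins, Allow wins ties."""
    best = None  # (pattern length, allowed?) of the best match so far
    for directive, pattern in rules:
        # re.match anchors at the start of the path, like robots.txt rules.
        if pattern_to_regex(pattern).match(path):
            candidate = (len(pattern), directive.lower() == "allow")
            if best is None or candidate > best:
                best = candidate
    # A path that matches no rule at all is allowed.
    return True if best is None else best[1]


RULES = [("Disallow", "/admin"), ("Allow", "/admin/*.css")]

for path in ("/admin/style.css", "/admin/login.php", "/admin/backup.sql"):
    print(path, "->", "allowed" if is_allowed(RULES, path) else "blocked")
```

With these rules only the stylesheet is allowed; login.php and the hypothetical backup.sql are blocked without any extra line in robots.txt.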