I have not written about Web development for quite a while. After reading an article about the use of the X-Robots-Tag today, I thought it was time again.
You retrieve content from the Web by typing a Web address into your browser. The addressed Web server then sends you the requested resource, such as a regular HTML Web page, a PDF document, a JPEG image, a video or Flash movie, an XML file, etc.
The browser and Web server communicate using HTTP (Hypertext Transfer Protocol). Before the requested data is actually sent, they exchange HTTP request and response headers with information about the document, the browser, and the server. The X-Robots-Tag Directive is an optional element (a directive) of such an HTTP response header. It was introduced by Google this year. Two weeks ago, Yahoo announced that they support it, too.
You might recall that there is an (X)HTML Meta Tag that lets you restrict what search engines do with a page. But it only works for (X)HTML documents. The X-Robots-Tag Directive now allows the same for any non-(X)HTML resource, such as video files, audio files, images, etc.
The X-Robots-Tag Directive explained
The following commands are supported:
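X-Robots-Tag: noindex
The document will not show up in the search results.
X-Robots-Tag: noarchive
The document will be indexed, but not cached, i.e. the search engine keeps no local copy.
X-Robots-Tag: nosnippet
No summary of the document is shown on the search results page.
X-Robots-Tag: nofollow
Links in the document will not be followed.
X-Robots-Tag: unavailable_after: 31 Dec 2007 16:30:00 GMT
The document will no longer show up in the search results after the given date and time.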
The attributes are case-insensitive. “NOFOLLOW” is the same as “nofollow”. You can combine values in one line, e.g.
X-Robots-Tag: noarchive, nosnippet
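To make the two rules above concrete (case-insensitive names, several values in one line), here is a minimal Python sketch of how a header value could be split into its directives. The function name parse_x_robots_tag is my own invention, not part of any library, and the sketch assumes that values such as the unavailable_after date contain no commas.

```python
def parse_x_robots_tag(value):
    """Split an X-Robots-Tag header value into a dict of directives.

    Hypothetical helper for illustration only. Assumes that directive
    values (such as the unavailable_after date) contain no commas.
    """
    directives = {}
    for part in value.split(","):
        part = part.strip()
        if not part:
            continue
        # A directive like unavailable_after carries its value after the first colon.
        name, _, argument = part.partition(":")
        # Names are case-insensitive, so store them in lowercase.
        directives[name.strip().lower()] = argument.strip() or None
    return directives


print(parse_x_robots_tag("NOARCHIVE, nosnippet"))
print(parse_x_robots_tag("unavailable_after: 31 Dec 2007 16:30:00 GMT"))
```

The first call yields the keys noarchive and nosnippet, no matter how the header was capitalized. The second yields unavailable_after with the date string as its value.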
The configuration for the HTTP header depends on your server. With Apache you can configure the X-Robots-Tag Directive in the main configuration files or within .htaccess files in each directory. Many Web authors will not have access to these configuration options. I won't go into the details here, but a misconfiguration can mess up your Web site completely. That leaves the X-Robots-Tag Directive as a tool for the advanced folks.
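This is not an Apache configuration, but the underlying idea (the server adds the header to responses for certain files) can be sketched with the small Web server from Python's standard library. The extension list, the chosen directive values, the port, and the names robots_header_for, RobotsTagHandler and serve are all assumptions made up for this illustration.

```python
from http.server import HTTPServer, SimpleHTTPRequestHandler

# Hypothetical policy: file types that should stay out of the index.
NOINDEX_EXTENSIONS = (".pdf", ".mp3", ".avi")


def robots_header_for(path):
    """Return the X-Robots-Tag value for a request path, or None."""
    # Ignore a query string and compare the extension case-insensitively.
    if path.lower().split("?", 1)[0].endswith(NOINDEX_EXTENSIONS):
        return "noindex, noarchive"
    return None


class RobotsTagHandler(SimpleHTTPRequestHandler):
    """Serves files from the current directory and adds the header."""

    def end_headers(self):
        # Add the header just before the header block is closed.
        value = robots_header_for(self.path)
        if value is not None:
            self.send_header("X-Robots-Tag", value)
        super().end_headers()


def serve(port=8000):
    """Start the sketch server on localhost (blocks until interrupted)."""
    HTTPServer(("127.0.0.1", port), RobotsTagHandler).serve_forever()
```

Calling serve() and requesting a PDF file from http://127.0.0.1:8000/ should show the extra X-Robots-Tag line in the response headers, while HTML pages are served without it.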
Another mechanism for controlling search engine access is the (X)HTML Robots Meta Tag. It shares the same attributes as the X-Robots-Tag Directive and looks like this:
<meta name="robots" content="noarchive">
<meta name="robots" content="nosnippet">
<meta name="robots" content="nofollow">
<meta name="robots" content="noindex">
Now you understand why the X-Robots-Tag has been introduced by Google. The (X)HTML Meta Tags work only for (X)HTML documents.
However, there is another widely used standard: the robots.txt file. Its simple syntax allows you to exclude directories and files from being indexed. As an alternative to using the X-Robots-Tag Directive, you can set up particular Web directories and put all the PDF, audio, and video files that you don't want to have indexed there.
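As a rough illustration of the robots.txt approach, the following Python sketch uses the standard library's urllib.robotparser to check whether a crawler may fetch a URL. The robots.txt content, the /private/ directory, and the example.com addresses are invented for this example. The helper may_fetch is hypothetical as well.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: keep all crawlers out of the /private/ directory.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def may_fetch(url, user_agent="*"):
    """Check a URL against the example robots.txt rules above."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


print(may_fetch("http://www.example.com/private/report.pdf"))  # False
print(may_fetch("http://www.example.com/index.html"))          # True
```

Everything placed below /private/ is off limits for well-behaved crawlers, while the rest of the site stays accessible.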
Depending on your Web host, you might have access to the robots.txt file or to the HTTP header configuration via .htaccess files. Either way, you now have the choice, even with Yahoo.
Neither the X-Robots-Tag Directive, the (X)HTML Meta Tags, nor the robots.txt file will improve your rankings, but they make sure that you have a choice about what is being indexed.