Sunday, January 10, 2010

What Is robots.txt? Why Do We Need One?


Web robots are also known as bots, crawlers, and spiders. Search engines such as Google, Yahoo, and MSN use them to index web content, spammers use them to scan for email addresses, and they have many other uses.

Robots index the content of your website's pages, and the robots.txt file gives them instructions. For spiders that obey it, the file provides a map of what they can and cannot index. The file must be available in the root directory of your website, so its URL path must be /robots.txt.
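As a small illustration of the "root directory" rule, here is a Python sketch that works out where a crawler would look for the file, given any page on a site. The function name robots_txt_url and the www.example.com address are made up for this example.

```python
from urllib.parse import urljoin


def robots_txt_url(page_url: str) -> str:
    """Return the robots.txt address for the site that serves page_url."""
    # The leading slash makes urljoin drop whatever path page_url had,
    # so the result always points at the root of the host.
    return urljoin(page_url, "/robots.txt")


print(robots_txt_url("http://www.example.com/blog/post.html"))
# prints: http://www.example.com/robots.txt
```

Whatever page you start from, the answer is the same single file at the top of the site.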


How to create a robots.txt file


User-agent: *
The asterisk (*) is a wildcard, a special value that means "any robot". It is the only User-agent value you need until you fully understand how to set up rules for different User-agents.


Disallow:
A Disallow: line with no value (no forward slash after it) tells robots (crawlers) that they can index all the web pages of a website.
An empty value indicates that all URLs can be fetched. Every record needs at least one Disallow field, even if its value is left empty as shown above.


Disallow: /private/
It tells the robot that it cannot index the contents of the /private/ directory.
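If you want to see how a well-behaved robot might read these lines, Python's standard library includes a parser, urllib.robotparser. The sketch below is only one possible way to try the rules out. The helper is_allowed, the bot name "AnyBot" and the sample paths are assumptions made for the example.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Parse robots_txt and report whether user_agent may fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# An empty Disallow value: everything may be fetched.
ALLOW_ALL = "User-agent: *\nDisallow:\n"
# A Disallow value naming a directory: that directory is off limits.
BLOCK_PRIVATE = "User-agent: *\nDisallow: /private/\n"

print(is_allowed(ALLOW_ALL, "AnyBot", "/private/report.html"))      # True
print(is_allowed(BLOCK_PRIVATE, "AnyBot", "/private/report.html"))  # False
print(is_allowed(BLOCK_PRIVATE, "AnyBot", "/index.html"))           # True
```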


Syntax for creating a robots.txt file

 

To allow all robots complete access:
User-agent: *
Disallow:

To exclude all robots from the server:
User-agent: *
Disallow: /

To exclude all robots from parts of a server:
User-agent: *
Disallow: /private/
Disallow: /images-saved/
Disallow: /images-working/

To exclude a single robot from the server:
User-agent: Named Bot
Disallow: /

To exclude a single robot from parts of a server:
User-agent: Named Bot
Disallow: /private/
Disallow: /images-saved/
Disallow: /images-working/
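To check that a record aimed at one robot leaves the others alone, you could run something like the following Python sketch. "ExampleBot" and "SomeOtherBot" are invented crawler names standing in for the named bot, and the file paths are made up too.

```python
from urllib.robotparser import RobotFileParser

# "ExampleBot" is a hypothetical crawler name used only for illustration.
RULES = """\
User-agent: ExampleBot
Disallow: /private/
Disallow: /images-saved/
Disallow: /images-working/
"""

parser = RobotFileParser()
parser.parse(RULES.splitlines())


def check(agent: str, path: str) -> bool:
    """Report whether agent may fetch path under RULES."""
    return parser.can_fetch(agent, path)


print(check("ExampleBot", "/images-saved/cat.png"))    # False: listed directory
print(check("ExampleBot", "/about.html"))              # True: not listed
print(check("SomeOtherBot", "/images-saved/cat.png"))  # True: no record applies
```

Because the file has no "User-agent: *" record, any robot other than the named one is not restricted at all.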


To disallow a particular file within a directory:
Disallow: /private/nothing-to-display.htm
Keep in mind that the above example excludes only the specified page (nothing-to-display.htm). It does not exclude the entire /private/ directory.
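A quick way to convince yourself of this is the Python sketch below, which uses the standard urllib.robotparser module. The second file name, other-page.htm, is made up for the test. One thing worth knowing: Disallow values are matched as prefixes of the URL path, so in this parser a longer name that starts the same way is blocked as well.

```python
from urllib.robotparser import RobotFileParser

RULES = "User-agent: *\nDisallow: /private/nothing-to-display.htm\n"

parser = RobotFileParser()
parser.parse(RULES.splitlines())

# The named page is excluded.
print(parser.can_fetch("*", "/private/nothing-to-display.htm"))   # False
# Another page in the same directory (hypothetical name) is still allowed.
print(parser.can_fetch("*", "/private/other-page.htm"))           # True
# Prefix matching: a path that merely starts with the value is excluded too.
print(parser.can_fetch("*", "/private/nothing-to-display.html"))  # False
```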


An empty "/robots.txt" file has no effect. It is treated as if it were not present, i.e. all robots will consider themselves welcome.
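Python's built-in parser behaves the same way, as this short sketch suggests. The bot name and path are placeholders.

```python
from urllib.robotparser import RobotFileParser

empty_file = ""  # the contents of an empty robots.txt

parser = RobotFileParser()
parser.parse(empty_file.splitlines())

# With no records at all, nothing is disallowed.
print(parser.can_fetch("AnyBot", "/private/report.html"))  # True
```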


You should validate your robots.txt file. To do so, enter the full URI of the robots.txt file on your server into a validator. The robots.txt file always resides at the root level of your website.
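If you would rather do a rough check yourself first, a few lines of Python can catch the most common slips. This is only a sketch and not a replacement for a real validator: the function name lint_robots_txt and its messages are invented here, and it knows only the two fields covered in this post (User-agent and Disallow), so it would flag any other field as unknown.

```python
def lint_robots_txt(text: str) -> list:
    """Return a list of human-readable problems found in text."""
    problems = []
    seen_agent = False  # has the current record named a User-agent yet?
    for number, raw in enumerate(text.splitlines(), start=1):
        stripped = raw.strip()
        if not stripped:
            seen_agent = False  # a blank line ends the current record
            continue
        line = stripped.split("#", 1)[0].strip()  # drop trailing comments
        if not line:
            continue  # a comment-only line does not end the record
        field, sep, value = line.partition(":")
        field = field.strip().lower()
        value = value.strip()
        if not sep:
            problems.append(f"line {number}: missing ':'")
        elif field == "user-agent":
            seen_agent = True
            if not value:
                problems.append(f"line {number}: User-agent has no value")
        elif field == "disallow":
            if not seen_agent:
                problems.append(f"line {number}: Disallow before any User-agent")
            elif value and not value.startswith("/"):
                problems.append(f"line {number}: path should start with '/'")
        else:
            problems.append(f"line {number}: unknown field '{field}'")
    return problems


print(lint_robots_txt("User-agent: *\nDisallow: /private/\n"))  # []
print(lint_robots_txt("User-agent: *\nDisallow: private\n"))
# ["line 2: path should start with '/'"]
```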
