Thursday 20 July 2017

What is a robots.txt file on a website?

Search engines generally crawl a website using computer programs known as bots; for example, Google crawls web sites using Googlebot. The robots.txt file restricts a bot's access to folders that contain confidential or unnecessary data.
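
To see how a crawler consults this file, here is a minimal sketch using Python's standard urllib.robotparser module; the rules, bot name, and paths are made up for illustration:

from urllib.robotparser import RobotFileParser

# A hypothetical robots.txt that blocks one folder for every bot.
rules = [
    "User-agent: *",
    "Disallow: /private/",
]

rp = RobotFileParser()
rp.parse(rules)  # feed the rules to the parser line by line

print(rp.can_fetch("Googlebot", "/private/report.html"))  # False: blocked
print(rp.can_fetch("Googlebot", "/index.html"))           # True: allowed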

Below, the file format is explained with examples.

This example tells all robots to stay out of a website:

User-agent: *
Disallow: /

This example tells all robots that they can visit all files, because the wildcard * stands for all robots and the Disallow directive has no value, meaning no pages are disallowed. The same result can be accomplished with an empty or missing robots.txt file.

User-agent: *
Disallow:
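
Both behaviours can be verified with the same urllib.robotparser sketch; the parser treats an empty Disallow as "allow everything" (the bot name here is hypothetical):

from urllib.robotparser import RobotFileParser

block_all = RobotFileParser()
block_all.parse(["User-agent: *", "Disallow: /"])
print(block_all.can_fetch("AnyBot", "/page.html"))  # False: everything is blocked

allow_all = RobotFileParser()
allow_all.parse(["User-agent: *", "Disallow:"])
print(allow_all.can_fetch("AnyBot", "/page.html"))  # True: nothing is disallowed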
 
This example tells all robots to stay away from one specific file:

User-agent: *
Disallow: /directory/file.html

Note that all other files in the specified directory will be processed.

This example tells all robots not to enter three directories:
 
User-agent: *
Disallow: /cgi-bin/
Disallow: /tmp/
Disallow: /junk/
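
A quick check of these three rules with the same parser (the paths tested are illustrative):

from urllib.robotparser import RobotFileParser

rp = RobotFileParser()
rp.parse([
    "User-agent: *",
    "Disallow: /cgi-bin/",
    "Disallow: /tmp/",
    "Disallow: /junk/",
])

print(rp.can_fetch("AnyBot", "/tmp/scratch.html"))  # False: /tmp/ is blocked
print(rp.can_fetch("AnyBot", "/public/page.html"))  # True: not listed, so allowed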

This example tells a specific robot to stay out of a website:

User-agent: BadBot # replace 'BadBot' with the actual user-agent of the bot
Disallow: /
 
This example tells two specific robots not to enter one specific directory:

User-agent: BadBot # replace 'BadBot' with the actual user-agent of the bot
User-agent: Googlebot
Disallow: /private/
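
Under these rules only the two named bots are kept out of /private/; other crawlers are unaffected. A sketch of that behaviour (bot names other than Googlebot are hypothetical):

from urllib.robotparser import RobotFileParser

rp = RobotFileParser()
rp.parse([
    "User-agent: BadBot",
    "User-agent: Googlebot",
    "Disallow: /private/",
])

print(rp.can_fetch("BadBot", "/private/data.html"))        # False: named, blocked
print(rp.can_fetch("Googlebot", "/public/page.html"))      # True: outside /private/
print(rp.can_fetch("SomeOtherBot", "/private/data.html"))  # True: no rule matches it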
 
Example demonstrating how comments can be used:
 
# Comments appear after the "#" symbol at the start of a line, or after a directive
User-agent: * # match all bots
Disallow: / # keep them out
 
It is also possible to list multiple robots, each with their own rules. The actual robot string is defined by the crawler. A few crawler operators, such as Google, support several user-agent strings that allow webmasters to deny access to a subset of their services by using specific user-agent strings.
Example demonstrating multiple user-agents:

User-agent: googlebot        # all Google services
Disallow: /private/          # disallow this directory

User-agent: googlebot-news   # only the news service
Disallow: /                  # disallow everything

User-agent: *                # any robot
Disallow: /something/        # disallow this directory
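
To apply a live file like this one before crawling, the same standard-library parser can fetch it directly; the domain below is a placeholder, not a real target. One caveat: urllib.robotparser applies the first group whose User-agent token is a substring of the crawler name, whereas Google documents that its own crawlers follow the most specific matching group, so group order can matter with rules like the ones above.

from urllib.robotparser import RobotFileParser

# "www.example.com" is a placeholder domain.
rp = RobotFileParser()
rp.set_url("https://www.example.com/robots.txt")
rp.read()  # download and parse the live robots.txt

# With the example file above, urllib would match "googlebot-news"
# against the "googlebot" group first, because it checks groups in
# file order and by substring; Google's own crawlers instead pick
# the most specific matching group.
print(rp.can_fetch("googlebot", "https://www.example.com/private/"))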
