Prevent search engines from indexing your websites


  • Applies to: Grid and All DV
    • Difficulty: Easy
    • Time Needed: 10 minutes
    • Tools Required: FTP client, plain text editor

Overview

Web Robots, also known as Web Wanderers, Crawlers, or Spiders, are programs that traverse the web automatically. Search engines such as Google or Yahoo use them to index your site's content. However, they can also be used inappropriately; spammers, for example, use them to scan pages for email addresses. A robots.txt file tells robots that visit your site how you wish them to behave.
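To make this concrete, here is a minimal sketch of the check a well-behaved robot performs before crawling, written with Python's standard urllib.robotparser module. The domain example.com and the /private/ path are placeholders for illustration only:

from urllib.robotparser import RobotFileParser

# Download and parse the site's live robots.txt (example.com is a placeholder).
robots = RobotFileParser()
robots.set_url("http://example.com/robots.txt")
robots.read()

# A polite crawler asks permission for each URL before requesting it.
if robots.can_fetch("Googlebot", "http://example.com/private/"):
    print("Googlebot may crawl /private/")
else:
    print("Googlebot has been asked to stay out of /private/")

Keep in mind that compliance is voluntary: well-behaved crawlers perform exactly this kind of check, while abusive robots simply skip it.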

READ ME FIRST

The publishing of this information does not imply support of this article. This article is provided solely as a courtesy to our customers. Please take a moment to review the Statement of Support.


Instructions

First, create a robots.txt document with your favorite plain text editor. Then upload it to the root directory of your site; crawlers only look for the file at the top level of your domain (http://example.com/robots.txt), so it must live in your webroot. For details on all the rules you can create, please visit: http://www.robotstxt.org/

The following is an example robots.txt file which you are free to use. Upload it to your webroot: on the Grid that is a path like /home/00000/domains/example.com/html/, and on a DV server it is /var/www/vhosts/example.com/httpdocs/. Remove the # sign from any command you wish the robots to follow, but be sure not to uncomment the command's description. After editing, you can verify that the rules behave as expected with the short script shown after the example.


# Example robots.txt from (mt) Media Temple
# Learn more at http://mediatemple.net
# (mt) Forums - http://forum.mediatemple.net/
# (mt) System Status - http://status.mediatemple.net
# (mt) Statement of Support - http://mediatemple.net/support/statement/

# How do I check that my robots.txt file is working as expected?
# http://www.google.com/support/webmasters/bin/answer.py?answer=35237

# For a list of Robots please visit: http://www.robotstxt.org/db.html

# Instructions
# Remove the "#" to uncomment any line that you wish to use, but be sure not to uncomment the Description.

# Grant Robots Access
#######################################################################################

# This example allows all robots to visit all files because the wildcard "*" specifies all robots:
#User-agent: *
#Disallow:

# To allow a single robot (Google's crawler, whose user-agent token is "Googlebot") while keeping all others out, use both groups together:
#User-agent: Googlebot
#Disallow:

#User-agent: *
#Disallow: /

# Deny Robots Access
#######################################################################################

# This example keeps all robots out:
#User-agent: *
#Disallow: /

# The next example tells all crawlers not to enter four directories of a website:
#User-agent: *
#Disallow: /cgi-bin/
#Disallow: /images/
#Disallow: /tmp/
#Disallow: /private/

# Example that tells a specific crawler not to enter one specific directory:
#User-agent: BadBot
#Disallow: /private/

# Example that tells all crawlers not to enter one specific file called foo.html:
#User-agent: *
#Disallow: /foo.html
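Once you have uncommented the rules you want and saved the file, you can sanity-check it before (or after) uploading. This is a minimal sketch using Python's standard urllib.robotparser; it assumes your edited robots.txt sits in the current directory and that you uncommented the "keep all robots out" pair of lines:

from urllib.robotparser import RobotFileParser

# Feed the file's lines straight to the parser; no server needed.
with open("robots.txt") as f:
    rules = f.read().splitlines()

robots = RobotFileParser()
robots.parse(rules)

# With "User-agent: *" / "Disallow: /" uncommented, both checks print False.
print(robots.can_fetch("*", "/"))
print(robots.can_fetch("Googlebot", "/private/"))

The Google Webmasters link in the comments above describes how to run the same kind of check against your live file with Google's own tools.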