Don't Forget Your Website's Robots.txt File
When you create a website,
eventually each search
engine sends out a robot
(or "spider") to look over
and categorize what your
site is all about. It reads
through your text and follows
your links to learn and
understand what your site
has to offer. You can control
what each spider sees and
doesn't see on your website
by creating what's called
a "robots.txt" file.
The robots.txt file contains
instructions on what the
spiders are allowed (and
not allowed) to look through
on your site. These files are
very simple to create and
implement, but many people
are unsure how they
work.
First of all, you create
it using a basic text editor,
like Notepad in the Windows
environment. There are
only three main parts of the
file: the "allow" command,
the "disallow" command,
and the " * " wildcard,
which goes on the "User-agent"
line that names the spiders
the rules apply to.
Use "disallow" to indicate
sections of your website
you do not want indexed,
as in this example:
Disallow: /workfiles/
This tells the spiders "don't
index any files in the
workfiles folder of my
site." The "allow" command
is usually unnecessary,
since the spiders will
go through everything they
can get to that has not
been marked with a "disallow".
Its main use is to reopen
a subfolder inside a folder
you have disallowed. So in
most cases you can safely
just use "disallow". The
asterisk (" * ") is a wildcard
that, on the "User-agent"
line, means "all spiders".
So in this example:
User-agent: *
Disallow: /My_Files/
We see that "all" spiders
are directed to not index
the My_Files folder.
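
One way to check that a set of rules does what you expect is to run
it through a parser before uploading it. The sketch below uses
Python's standard urllib.robotparser module for that. The spider
name "ExampleBot", the example.com address, and the helper name
is_allowed are made up for illustration. It also shows one case
where "allow" does real work: opening a single subfolder inside a
blocked folder. Real search engines may resolve overlapping rules
a little differently from this parser, so treat it as a rough check.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(rules: str, url: str, agent: str = "ExampleBot") -> bool:
    """Return True if `agent` may fetch `url` under the given robots.txt text.

    `is_allowed` and "ExampleBot" are hypothetical names for this sketch.
    """
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return parser.can_fetch(agent, url)


# The rules from the example above, held in a string instead of a file.
block_folder = """\
User-agent: *
Disallow: /My_Files/
"""

# One subfolder stays open inside an otherwise blocked folder.
# The Allow line comes first because this parser applies the
# first rule that matches the path.
allow_exception = """\
User-agent: *
Allow: /workfiles/public/
Disallow: /workfiles/
"""

site = "https://www.example.com"  # placeholder domain

print(is_allowed(block_folder, site + "/My_Files/notes.html"))
print(is_allowed(block_folder, site + "/index.html"))
print(is_allowed(allow_exception, site + "/workfiles/public/a.html"))
print(is_allowed(allow_exception, site + "/workfiles/draft.html"))
```

Run as written, this should print False, True, True, False: the
blocked folders are refused and everything else is open. Note that
paths are matched exactly as typed, so /my_files/ is not the same
as /My_Files/.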
You place the robots.txt
file in the root folder
of your website. A robots.txt
file is a good idea since
it keeps well-behaved spiders
out of sections of your web
space you do not want showing
up in search results. For
example, you may have a
website that is in development
but not yet ready to go
live. The file is a request,
not a lock, and anyone can
read it, so use password
protection for anything
truly private. So if you
don't have a robots.txt file,
review your web hosting
directories and get typing
in your text editor. It
will only take a few minutes
but the benefits can be
substantial.
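
If you would rather generate the file than type it, a few lines of
Python can do it. This is only a sketch: the function name
write_robots_txt and the folder name "public_html" are assumptions,
and a temporary folder stands in for your real site root, which
depends on your web host.

```python
import tempfile
from pathlib import Path


def write_robots_txt(site_root: Path, disallowed: list[str]) -> Path:
    """Write a robots.txt that blocks the given folders for all spiders.

    `write_robots_txt` is a hypothetical helper for this sketch.
    """
    lines = ["User-agent: *"]
    lines += [f"Disallow: {folder}" for folder in disallowed]
    target = site_root / "robots.txt"
    target.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return target


# A temporary folder stands in for the site's root folder here.
# "public_html" is a placeholder; your host may use another name.
root = Path(tempfile.mkdtemp()) / "public_html"
root.mkdir()

path = write_robots_txt(root, ["/workfiles/", "/My_Files/"])
print(path.read_text(encoding="utf-8"))
```

The file has to end up directly in the root folder so that it is
served at /robots.txt. Spiders do not look for it anywhere else.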