Search Engine Optimization for Photographers – How to control the Robots

by Geoffrey on September 20, 2011

In my last article, The Search Engine Process, I talked about how search engines see your website and how they rank your pages based on keywords, content, URL and meta descriptions. In this article I am going to talk about how you can control what content gets indexed and the methods that are in common use today.

 

What are the Robots

Search engine robots are also referred to as spiders or crawlers. They comb over your site and index content based on several factors that are built into the search engine’s algorithm. A search engine’s algorithm is a group of programs that interpret the data the robots acquire from your website and give your page a ranking based on keywords, site URL, meta description and content.

What do Robots Look For

When a search engine robot hits your website, the first thing it looks for is a robots.txt file in the home (root) directory of your website. What you place in this robots.txt file tells the robot whether it is allowed to crawl your site and which areas of your site you would like to keep it from visiting.
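To make the "root directory" point concrete, here is a minimal Python sketch of how the robots.txt address can be worked out from any page on a site. The function name robots_txt_url and the example.com address are made up for illustration; real robots may handle edge cases differently.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_txt_url(page_url):
    """Return the root-level robots.txt address for the site hosting page_url.

    Hypothetical helper: keeps the scheme and host, replaces everything else.
    """
    parts = urlsplit(page_url)
    # robots.txt always sits directly under the host, never in a subfolder.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


if __name__ == "__main__":
    # example.com is a placeholder domain, not a real photography site.
    print(robots_txt_url("https://www.example.com/blog/2011/09/some-post.html"))
```

However deep the page is in your site, the file is looked up at the top level, which is why a robots.txt saved inside a subfolder such as /blog/ is not treated as the site's robots.txt.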

The second thing a robot looks for is <meta> tags contained within the <head> HTML tags of each of your individual pages. With <meta> tags you can tell the robot whether or not to index that page and follow the links on it. In most cases you want robots to have full access to your site so they can index more content. However, <meta> tags are great to use if you have duplicate content that you do not want to get indexed, since duplicate content can affect your site’s ranking.

The Robots.txt File

The robots.txt file is simple to create and should always be placed in the home (root) directory of your website. If you are like me and don’t care what content the robots look at, you can make a robots.txt file that looks similar to this.

User-agent: *
Disallow:

The User-agent: * command lets you specify which robot the rules that follow apply to; the * means all robots. For example, if you wanted to block Google’s robot, which identifies itself as Googlebot, from looking at your content, you would do something like this.

User-agent: Googlebot
Disallow: /

Now, if you really do block Google from indexing your site, you shouldn’t expect to get a lot of visitors. Looking at my robots.txt, you can see I allow all bots to access all portions of my website. Let’s say you have a members area or an admin section that you would like to keep to yourself. You could create an entry in your robots.txt like this.

User-agent: *
Disallow: /members-area/
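If you want to double-check how rules like these behave before uploading them, one option is Python’s standard urllib.robotparser module. The sketch below is only an illustration: the helper name is_allowed, the robot name Bingbot and the example.com URLs are placeholders made up for this example, and Python’s parser may not match every search engine’s behavior exactly.

```python
from urllib.robotparser import RobotFileParser

# The three rule sets discussed above, as plain text.
ALLOW_ALL = """\
User-agent: *
Disallow:
"""

BLOCK_GOOGLEBOT = """\
User-agent: Googlebot
Disallow: /
"""

BLOCK_MEMBERS = """\
User-agent: *
Disallow: /members-area/
"""


def is_allowed(robots_txt, user_agent, url):
    """Return True if user_agent may fetch url under the given robots.txt text.

    Hypothetical helper: it parses the text directly instead of downloading it.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    site = "https://www.example.com"  # placeholder domain
    print(is_allowed(ALLOW_ALL, "Googlebot", site + "/portfolio/"))
    print(is_allowed(BLOCK_GOOGLEBOT, "Googlebot", site + "/portfolio/"))
    print(is_allowed(BLOCK_GOOGLEBOT, "Bingbot", site + "/portfolio/"))
    print(is_allowed(BLOCK_MEMBERS, "Googlebot", site + "/members-area/login"))
```

A check like this only tells you what a well-behaved robot should do with your rules. It does not stop a robot that chooses to ignore robots.txt.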

There is also a way to restrict access to your site but allow access to a single area. This can be done by using the “Allow” command followed by the path of the content you want to be indexed. When no robots.txt rule says otherwise, search engine robots assume they have full access, which is what you want for most websites.
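As a rough sketch of the “Allow” idea, here is one possible rule set that keeps robots out of everything except a blog folder, checked with Python’s standard urllib.robotparser module. The /blog/ path, the can_crawl helper and the example.com URLs are assumptions made for this example. The Allow line is placed first because Python’s parser applies the first rule that matches, while some robots pick the most specific rule instead; putting Allow first should give the same answer either way.

```python
from urllib.robotparser import RobotFileParser

# Assumed rule set: block the whole site, but leave /blog/ open.
BLOG_ONLY = """\
User-agent: *
Allow: /blog/
Disallow: /
"""


def can_crawl(url, user_agent="*"):
    """Return True if the BLOG_ONLY rules let user_agent fetch url.

    Hypothetical helper written for this example only.
    """
    parser = RobotFileParser()
    parser.parse(BLOG_ONLY.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    site = "https://www.example.com"  # placeholder domain
    for path in ("/blog/my-latest-shoot/", "/members-area/", "/"):
        print(path, can_crawl(site + path))
```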

The <meta> Tag

Meta tags are always placed between the <head> </head> HTML tags on your website. If they are placed anywhere else, search engine robots may not understand what they mean and your page will most likely fail HTML validation. Meta tags within your <head> tags tell the robots what to do with your page when they find it. Here is an example meta tag that I currently use on my site.

<meta name="robots" content="noodp, noydir" />

What this meta tag tells robots is that I do not want them to use the site descriptions found in the Open Directory Project as my title and description in search engine results. For example, if someone added an entry for my blog in the Open Directory Project as:

Geoffrey Rickaby – A photographer and web developer.

And you did a search for Geoffrey Rickaby, you would see a result that looks like this in Google:

Geoffrey Rickaby
A photographer and web developer.
www.geoffreyrickaby.com/blog/ – 40k – Sep 20, 2011

By using noodp in your meta tag, the search engine will instead pull either your meta site description or a blurb of content from your blog. The noydir value does the same as noodp, except that it tells Yahoo not to use the titles or descriptions found within the Yahoo! Directory.

Let’s say you have a page that you don’t want a search engine to index. What would the meta tag look like? Well, it’s fairly simple. Just add this inside your <head> tags:

<meta name="robots" content="noindex, nofollow" />

Since robots already default to indexing your pages and following all the links on your site, you do not need to use the index, follow commands within the meta tag.
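To check what a page is actually telling robots, you could read its robots meta tag with a short script. The sketch below uses Python’s standard html.parser module; the class and function names (RobotsMetaParser, robots_directives, describe) and the sample HTML are made up for this example, and a real search engine may interpret edge cases differently. A page with no robots meta tag comes back as index and follow, matching the default described above.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots"> tags found inside <head>."""

    def __init__(self):
        super().__init__()
        self.in_head = False
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag == "head":
            self.in_head = True
        elif tag == "meta" and self.in_head:
            attributes = dict(attrs)
            if (attributes.get("name") or "").lower() == "robots":
                content = attributes.get("content") or ""
                # Directives are comma separated, e.g. "noindex, nofollow".
                for directive in content.split(","):
                    if directive.strip():
                        self.directives.add(directive.strip().lower())

    def handle_endtag(self, tag):
        if tag == "head":
            self.in_head = False


def robots_directives(html):
    """Return the set of robots meta directives found in the page's <head>."""
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    return parser.directives


def describe(html):
    """Summarize whether robots may index the page and follow its links."""
    directives = robots_directives(html)
    return {
        "index": "noindex" not in directives,
        "follow": "nofollow" not in directives,
    }


if __name__ == "__main__":
    # Sample page invented for this example.
    page = '<html><head><meta name="robots" content="noindex, nofollow" /></head><body></body></html>'
    print(describe(page))
```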

To Sum It Up

Robots are what search engines use to find and index your content; without them your website would not exist in search engine results. So be sure to check your website’s source code to confirm that you are using the correct robots meta tags, and that you are not blocking robots from your site in your robots.txt file.

Coming up next, I will get into how to optimize your content so that search engines can effectively index your site using the keywords that you want.
