Google SiteMaps and You

shellspeare

shellspeare submitted a new Article:

Google SiteMaps and You

By Trevor Bauknight, web designer and writer with over 15 years of experience on the Internet. He specializes in the creation and maintenance of business and personal identity online and can be reached at trevor@tryid.com.
http://www.cafeid.com


A look at implementing Google Sitemaps, a technology developed by Google in order to help you define your site more effectively to the search-engine behemoth.

This is not a ticket to a higher Google ranking (at least not that we know about); but it is a useful tool that lets you apply RSS-like control to your website's interactions with the Googlebot.

RSS (Really Simple Syndication) is the current heavyweight of so-called "disruptive technologies" (loosely defined as those that have the effect, if not developed with the intention, of changing the way we use technology in general) and its use is skyrocketing among content providers looking for a way to get their content in front of more eyes and ears. But RSS originally stood for Rich Site Summary, a standard way of cataloging your site's content for third-party aggregators.

Google Sitemaps have a similar function, in that they are an XML-based way to describe website content in a standard, predictable way; but they differ in that Sitemaps are intended for the Googlebot's eyes only, rather than for any third party. Think of them as an automated way to make sure Google knows about your site's content (please note, however, that Google does not guarantee inclusion of your content based solely on the presence of a Sitemap file).
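To make the "XML-based" part concrete, here is a minimal sketch of what producing such a file can look like in Python. It is only an illustration, not Google's generator: it assumes the namespace of the later standardized sitemaps.org protocol (0.9) rather than whatever schema version the article had in mind, and the function name `build_sitemap` and the example.com URLs are made up.

```python
import xml.etree.ElementTree as ET

# Assumption: namespace of the sitemaps.org 0.9 protocol, not taken from the article.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(entries):
    """Return sitemap XML as a string.

    entries is an iterable of (loc, lastmod) pairs; lastmod may be None.
    """
    # Serialize the sitemap namespace as the default (unprefixed) namespace.
    ET.register_namespace("", SITEMAP_NS)
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for loc, lastmod in entries:
        url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        # ElementTree escapes characters such as "&" in the URL for us.
        ET.SubElement(url, f"{{{SITEMAP_NS}}}loc").text = loc
        if lastmod:
            ET.SubElement(url, f"{{{SITEMAP_NS}}}lastmod").text = lastmod
    return ET.tostring(urlset, encoding="unicode")


if __name__ == "__main__":
    print(build_sitemap([
        ("http://www.example.com/", "2005-08-08"),
        ("http://www.example.com/showthread.php?t=1&page=2", None),
    ]))
```

Each page gets one `url` element with a required `loc` and an optional `lastmod`; the resulting file is what you would upload and then submit.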

This sounds like a very specific undertaking, but the importance of Google to getting your site's content noticed simply cannot be overstated. And with Google's expanding reach into more and more areas of Web content presentation, chances are that the information your Sitemap provides will eventually find some use you haven't yet thought about. That's what disruptive technology is all about, and Google has become one of the more innovative champions of such technological advances.

Where To Start:

The first thing you should do as a website developer is create a Google Account for yourself or your company. This will allow you to do other things besides access the Sitemaps infrastructure; but we'll leave that for another...

Read more about this article here...
 

ILTK

I use the Python version of the generator; it works great.

If you don't have shell access, you can install Python for Windows on your local machine, download your log files, run the generator locally, and upload your XML sitemap.

One thing to be careful about for forums and CMS-driven sites, when you use a generator based on your server log, is not to submit sitemaps without screening and filtering the URLs.
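To see why screening matters, here is a rough sketch of the kind of harvesting a log-based generator does. It is not the real generator's code: the regex assumes Apache-style common/combined log lines, and `harvest_paths` is a made-up name. Every GET that returned 200 ends up in the list, whether or not it belongs in a sitemap.

```python
import re

# Assumption: Apache common/combined log format, e.g.
# 1.2.3.4 - - [10/Oct/2005:13:55:36 -0700] "GET /page HTTP/1.1" 200 2326
REQUEST = re.compile(r'"GET (\S+) HTTP/[\d.]+" (\d{3}) ')


def harvest_paths(log_lines):
    """Return the distinct paths of successful GET requests, in first-seen order."""
    seen = {}
    for line in log_lines:
        match = REQUEST.search(line)
        if match and match.group(2) == "200":
            # dict keys keep insertion order and drop exact repeats
            seen.setdefault(match.group(1), None)
    return list(seen)


if __name__ == "__main__":
    sample = [
        '1.2.3.4 - - [10/Oct/2005:13:55:36 -0700] "GET /showthread.php?t=345 HTTP/1.1" 200 2326',
        '1.2.3.4 - - [10/Oct/2005:13:55:40 -0700] "GET /missing.php HTTP/1.1" 404 512',
    ]
    for path in harvest_paths(sample):
        print(path)
```

Only exact repeats and non-200 responses are dropped here, so anything a bot or a sort link managed to request successfully is still in the output.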

The python version uses a config file where you can setup filters, drop or pass filters.

The way I do it is to make the filters work as a whitelist: set the last entry, in wildcard mode, to 'drop' '*' (everything), then before that entry add 'pass' filters so only what I specifically allow goes into the sitemap.

For example, on a vBulletin forum you'd want to let forumdisplay and showthread pass, but you'd want to filter out any URL with 'goto', 'lastpost', 'sortorder', etc. so you don't submit duplicate-content URLs. There's also no reason to submit links to admincp, register, etc.
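The whitelist idea can be sketched in a few lines of Python. This is not the generator's config file or its filter code, just the same first-match-wins logic written with the standard `fnmatch` module; the rule list, the `allowed` and `filter_urls` names, and the exact patterns are assumptions for a vBulletin-style forum.

```python
from fnmatch import fnmatchcase

# Hypothetical rules, checked top to bottom; the first match decides.
# "[?]" means a literal question mark (a bare "?" would match any one character).
RULES = [
    ("drop", "*goto=*"),
    ("drop", "*lastpost*"),
    ("drop", "*sortorder=*"),
    ("pass", "*/forumdisplay.php[?]*"),
    ("pass", "*/showthread.php[?]*"),
    ("drop", "*"),  # catch-all: anything not explicitly passed above is dropped
]


def allowed(url, rules=RULES):
    """Return True if the first matching rule is a 'pass' rule."""
    for action, pattern in rules:
        if fnmatchcase(url, pattern):
            return action == "pass"
    return False  # no rule matched at all: stay on the safe side


def filter_urls(urls, rules=RULES):
    """Keep only the URLs the rules let through, preserving order."""
    return [u for u in urls if allowed(u, rules)]


if __name__ == "__main__":
    for u in filter_urls([
        "http://www.example.com/showthread.php?t=345",
        "http://www.example.com/showthread.php?t=345&goto=lastpost",
        "http://www.example.com/admincp/index.php",
    ]):
        print(u)
```

Because the final rule drops everything, admincp, register and anything else you never thought about stay out without needing their own entries.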

There could also be badly behaved bots/scrapers hitting your site, trying to retrieve pages and adding junk parameters like showthread.php?t=345&bla?bla?bla?bla. Since a webserver will just throw away unused parameters, these requests won't generate 404s, so they will go into your sitemap and you'll be submitting lots of duplicate URLs.
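One way to catch these junk-parameter duplicates, sketched here as an assumption rather than anything the generator does for you, is to rebuild each URL from only the parameters the script is known to use and then de-duplicate. The `KNOWN_PARAMS` table and the function names are made up for illustration.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical table: script name -> the only query parameters worth keeping.
KNOWN_PARAMS = {
    "/showthread.php": ("t",),
    "/forumdisplay.php": ("f",),
}


def canonical(url):
    """Return the URL reduced to its known parameters, or None if it doesn't qualify."""
    parts = urlsplit(url)
    script = parts.path[parts.path.rfind("/"):]  # last path segment, with leading "/"
    keep = KNOWN_PARAMS.get(script)
    if keep is None:
        return None
    query = dict(parse_qsl(parts.query))  # junk without "=value" is discarded here
    if any(name not in query for name in keep):
        return None
    clean = urlencode([(name, query[name]) for name in keep])
    return urlunsplit((parts.scheme, parts.netloc, parts.path, clean, ""))


def dedupe(urls):
    """Canonicalize every URL and keep each distinct result once, in order."""
    seen = {}
    for url in urls:
        canon = canonical(url)
        if canon is not None:
            seen.setdefault(canon, None)
    return list(seen)


if __name__ == "__main__":
    for u in dedupe([
        "http://www.example.com/showthread.php?t=345",
        "http://www.example.com/showthread.php?t=345&bla?bla?bla?bla",
    ]):
        print(u)
```

Both sample URLs collapse to the same thread address, so only one entry would reach the sitemap.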

The whitelist trick stops all that dead, as long as you check your sitemap XML once in a while.
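Checking the sitemap once in a while can be semi-automated. The sketch below is only an illustration: it assumes the sitemaps.org 0.9 namespace (an older Google-specific schema would need a different URI), and the `SUSPECT` substrings and the `audit_sitemap` name are made up based on the vBulletin examples in this thread.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Assumption: sitemaps.org 0.9 namespace; adjust if your file declares another one.
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}
# Hypothetical substrings that should never show up in a submitted URL.
SUSPECT = ("goto", "lastpost", "sortorder", "admincp", "register")


def audit_sitemap(xml_text):
    """Return the URL count plus any duplicate or suspicious URLs in a sitemap."""
    root = ET.fromstring(xml_text)
    locs = [(el.text or "").strip() for el in root.findall("sm:url/sm:loc", NS)]
    counts = Counter(locs)
    return {
        "total": len(locs),
        "duplicates": sorted(u for u, n in counts.items() if n > 1),
        "suspicious": sorted({u for u in locs if any(s in u for s in SUSPECT)}),
    }


if __name__ == "__main__":
    with open("sitemap.xml", encoding="utf-8") as handle:  # hypothetical file name
        report = audit_sitemap(handle.read())
    print(report["total"], "URLs")
    for url in report["duplicates"] + report["suspicious"]:
        print("check:", url)
```

An empty "duplicates" and "suspicious" list doesn't prove the sitemap is clean, but a non-empty one is a quick sign that a filter needs tightening.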

Oh, btw, there are lots of neat stats on the Google Sitemaps page, like crawl stats, pages found for what keywords, etc. Real nice.
 