Mapping the Territory
While Alfred Korzybski’s assertion that “the map is not the territory” may be entirely relevant for man’s relativistic approach to the world around him, we all know that, when it comes to search engines, your XML sitemap is at least tantamount to your site’s “territory”. We’ve already discussed the importance of sitemaps, so now let’s take a look at just what goes into an effective XML sitemap and how to submit it.
Your XML sitemap is, obviously, constructed entirely in Extensible Markup Language. There are 3 main tags to keep in mind for building your sitemap. The <urlset> tag references the protocol standard and must encapsulate the entire file. The <url> tag is used as a parent tag for each URL entry, and the <loc> tag is used to indicate the specific URL. These 3 tags are mandatory.
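As a rough sketch, a sitemap containing only the three mandatory tags could look like the following. The domain www.mywebsite.com is a placeholder, and the namespace is the one published by the sitemaps.org protocol:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- <urlset> wraps the whole file and names the protocol standard -->
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <!-- one <url> parent per page, with the address in <loc> -->
  <url>
    <loc>http://www.mywebsite.com/</loc>
  </url>
</urlset>
```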
You may also wish to use other secondary, optional tags to allow the search engine to spider your site more effectively. Though you don’t have to use them to make your sitemap functional, it is a good idea to use the <priority> and <changefreq> tags, at the very least. The <priority> tag allows you to designate the importance of a given page in relation to the other pages, expressed as a value between 0 and 1. Highly optimized pages should be ascribed a higher priority. <changefreq> lets you alert the search engine to how often the information on the page changes and how often it should be re-crawled. You can also use <lastmod> to indicate the last time the page was modified.
Some search engines, like Google, restrict the number of URLs in a sitemap and the maximum file size. For Google, the maximum file size is 10MB and there is a limit of 50,000 URLs. This means you may have to break your sitemap feed into multiple files. Each of the files should then be uploaded to your site root, so that its address looks something like http://www.mywebsite.com/sitemap.xml. This address is what you submit to the search engine via your robots.txt file in the “sitemap:” designation.
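When a sitemap has to be split, the sitemaps.org protocol also defines a sitemap index file that lists the individual files. A minimal sketch, assuming two hypothetical files named sitemap1.xml and sitemap2.xml sitting in the site root:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <!-- each <sitemap> entry points at one of the split files (file names are assumptions) -->
  <sitemap>
    <loc>http://www.mywebsite.com/sitemap1.xml</loc>
  </sitemap>
  <sitemap>
    <loc>http://www.mywebsite.com/sitemap2.xml</loc>
  </sitemap>
</sitemapindex>
```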
Now that we understand the basic setup for writing XML sitemaps, let’s take a look at how a listing for a URL should actually look:
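A listing along the following lines matches the description given below. The homepage address reuses the placeholder domain from earlier, while the archive URL and its 0.3 priority are assumptions made purely for illustration:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <!-- homepage: changes on every access, high priority -->
  <url>
    <loc>http://www.mywebsite.com/</loc>
    <lastmod>2010-01-01</lastmod>
    <changefreq>always</changefreq>
    <priority>0.9</priority>
  </url>
  <!-- archive page (hypothetical URL): never changes, low priority -->
  <url>
    <loc>http://www.mywebsite.com/archive/</loc>
    <changefreq>never</changefreq>
    <priority>0.3</priority>
  </url>
</urlset>
```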
Notice that there are two distinct pages in this file. The first page is the site homepage, last altered on Jan. 1, 2010. Note that the <lastmod> tag must always be formatted in W3C datetime format. The “always” designation for the <changefreq> tag tells the search engine that the information displayed on the page changes every time that the page is accessed. The 0.9 priority value indicates that the page is more relevant than the entry below.
The second entry is a mock-up of an archive page. Note that its priority setting is much lower than that of the main page. The “never” designation on the <changefreq> tag shows that the information on this page will never change from what it currently is.
That pretty well sums up XML sitemap construction. You can get more information on formatting XML sitemaps at http://www.sitemaps.org/protocol.php. Next time we’ll look at some engine-specifics for alerting Google, Yahoo!, Ask and MSN to your XML sitemap feed.