An XML sitemap is a file that lists all the pages of your site, along with a little extra information about each one, to give search-engine spiders something to use to decide how to crawl your site.
Sitemaps are optional, and there is no assurance that Google or any other search engine will actually do anything with the information you provide, but it can't hurt, and sometimes it definitely helps.
The sitemap protocol is defined at www.sitemaps.org and is supported by Google, Microsoft, and Yahoo.
The structure of a sitemap is simple. Here is a sitemap that lists just two pages and includes all of the optional parameters:
    <?xml version="1.0" encoding="UTF-8"?>
    <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
      <url>
        <loc>http://www.webvanta.com/</loc>
        <lastmod>2012-07-27</lastmod>
        <changefreq>weekly</changefreq>
        <priority>0.8</priority>
      </url>
      <url>
        <loc>http://www.webvanta.com/about</loc>
        <lastmod>2012-07-01</lastmod>
        <changefreq>monthly</changefreq>
        <priority>0.5</priority>
      </url>
    </urlset>
Only the loc element is required for each url entry. Its value must be a full (absolute) URL, not a relative one.
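For comparison, here is a minimal sketch of a sitemap whose single entry contains nothing but the required loc element. The domain and page name are placeholders, not values from a real site:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <!-- Hypothetical page; note the full URL, including the scheme and host -->
    <loc>http://www.mysite.com/contact</loc>
  </url>
</urlset>
```

A relative value such as /contact in the loc element would not be valid.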
Although the sitemap structure is simple, if you have lots of pages, it can be time-consuming to create a sitemap and keep it up to date manually. You can certainly hand-write the entire sitemap file, but we don't recommend this for any but the simplest sites.
You can use third-party tools, such as http://www.xml-sitemaps.com/, which will crawl your site and generate a sitemap. Such tools generally cannot fill in anything useful for the three optional parameters (lastmod, changefreq, and priority), but you can edit these values manually after the sitemap is generated.
Once you have your sitemap XML file, you can store it on your Webvanta site in either of two ways:
The second approach makes it easier to edit the sitemap file online, and it also opens up the possibility for dynamically creating the sitemap file using WebvantaScript, which we'll cover shortly.
In theory, search engines will find your sitemap file on their own, as long as the file is called sitemap.xml and is located at the root of your site. You should also submit your sitemap to Google's and Bing's Webmaster Tools, however, which provide you with additional information about whether any errors occurred and how many of the listed pages the search engine has crawled.
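If your sitemap is not named sitemap.xml, add a line to your robots.txt file that points to it. The line takes the form Sitemap: followed by the full URL of the sitemap file (for example, Sitemap: http://www.mysite.com/my_sitemap.xml, where the file name is a placeholder). This enables search engines to find your sitemap even if you have not submitted it directly.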
Note that you can have multiple sitemap files, and for a large site, it is good practice to break the sitemap up by section. You can use a sitemap index file to reference all the sitemaps.
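As a rough sketch, a sitemap index file that references two section sitemaps might look like the following. The element names come from the sitemaps.org protocol; the domain, file names, and date are hypothetical:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap>
    <!-- Hypothetical sitemap covering the static pages -->
    <loc>http://www.mysite.com/sitemap_pages.xml</loc>
    <!-- lastmod is optional in an index, just as it is in a sitemap -->
    <lastmod>2012-07-27</lastmod>
  </sitemap>
  <sitemap>
    <!-- Hypothetical sitemap covering the database-driven article pages -->
    <loc>http://www.mysite.com/sitemap_articles.xml</loc>
  </sitemap>
</sitemapindex>
```

The protocol limits a single sitemap file to 50,000 URLs, so very large sites need an index like this regardless of how the sections are organized.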
There are two classes of pages you need to consider: static pages and dynamic (database-driven) pages.
For static pages, you can generate the sitemap manually if you only have a few, or you can use the code we present later in this article to create the static page sitemap automatically.
For database-driven pages, you need to create a sitemap entry for every database item that may be displayed on that page. For example, suppose you have a page with the slug "article" and it is database-driven. The URL for a specific article might be something like:
You want this full URL, and the equivalent URL for every other article, to be in your sitemap. This requires iterating through the database items to create each entry, much as you would for creating a list page.
For example, here is the code that generates a sitemap file that lists all the articles:
    <?xml version="1.0" encoding="UTF-8"?>
    <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
      <w:kb:item:each type='articles' limit='1000'>
        <url>
          <loc>http://www.mysite.com/article/<w:perma_link_name /></loc>
          <lastmod><w:updated_at format='%Y-%m-%d' /></lastmod>
          <changefreq>monthly</changefreq>
          <priority>0.8</priority>
        </url>
      </w:kb:item:each>
    </urlset>
Note that this code is putting in fixed values for the changefreq and priority. If you want to be able to vary these values on a per-article basis, just add fields to the database item type (you'll need to use a custom item type) for these values.
Repeat this approach for each of your dynamically generated pages.
Creating sitemaps for static pages is actually a little trickier, for several reasons:
Thus, if you have only a handful of static pages, and you aren't frequently adding more static pages, it's easiest to simply create the sitemap for the static pages manually.
If you want to create the static page sitemap automatically, you need to use a recursive approach to handle the arbitrarily deep nesting of pages.
For the XML file, which you'll save in Structure > Feeds and XML Files with the name sitemap.xml, use code like this:
    <?xml version="1.0" encoding="UTF-8"?>
    <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
      <w:find url="/">
        <url>
          <loc>http://www.mysite.com</loc>
          <lastmod><w:date for="modified_at" format="%Y-%m-%d" /></lastmod>
          <changefreq>daily</changefreq>
          <priority>0.9</priority>
        </url>
        <w:snippet name="xml_sitemapper_pages" />
      </w:find>
    </urlset>
The first line of WebvantaScript, w:find url="/", sets up the context for the home page. This allows the code to access the modification date.
The recursive magic comes from the snippet being invoked, xml_sitemapper_pages, which looks like this:
    <w:children:each by="title" order="asc" status="published">
      <w:unless_url matches="system|no_sitemap|ajax">
        <url>
          <loc>http://www.mysite.com<w:url /></loc>
          <lastmod><w:date for="modified_at" format="%Y-%m-%d" /></lastmod>
          <changefreq>weekly</changefreq>
        </url>
        <w:snippet name="xml_sitemapper_pages" />
      </w:unless_url>
    </w:children:each>
This code iterates through each of the children of the currently selected page, which when we first enter this code is the home page, whose context was set via the find statement in the previous code example.
The w:unless_url statement then excludes pages that have a slug, or have a parent page with the slug, of system, no_sitemap, or ajax. You will want to change these values to match your site design. In this example, system excludes pages like login and change password; no_sitemap is a parent page we use to group pages that we don't want included in the sitemap; and ajax is the parent page for pages that are really fragments loaded into other pages, so we don't want those indexed directly.
Note that this snippet calls itself, which is what makes this recursive. As long as there are more child pages, the code will work its way deeper and deeper into the page hierarchy, iterating through the pages at all levels.
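To make the recursion concrete, suppose a hypothetical site at www.mysite.com whose home page has two children, About and Products, and whose About page has one child, Team. Assuming w:url renders a site-relative path that begins with a slash, the generated file would look roughly like this (page names and dates are invented for illustration):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <!-- Home page entry, written by the top-level XML file -->
  <url>
    <loc>http://www.mysite.com</loc>
    <lastmod>2012-07-27</lastmod>
    <changefreq>daily</changefreq>
    <priority>0.9</priority>
  </url>
  <!-- First pass of the snippet: first child of the home page, by title -->
  <url>
    <loc>http://www.mysite.com/about</loc>
    <lastmod>2012-07-01</lastmod>
    <changefreq>weekly</changefreq>
  </url>
  <!-- Recursive call: children of About are listed before moving on -->
  <url>
    <loc>http://www.mysite.com/about/team</loc>
    <lastmod>2012-06-15</lastmod>
    <changefreq>weekly</changefreq>
  </url>
  <!-- Back in the first pass: next child of the home page -->
  <url>
    <loc>http://www.mysite.com/products</loc>
    <lastmod>2012-07-20</lastmod>
    <changefreq>weekly</changefreq>
  </url>
</urlset>
```

Although the pages are nested, the output is a flat list of url entries; the nesting only determines the order in which they appear.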
In this example, every page gets the same changefreq, and we haven't used priority at all. If you want to use these values and have them vary from page to page, you can either use conditionals in this code to change the values based on the URL, or you can add regions to the page template to hold these values and access those values from this code.
If you have a mobile version of your site, you may also want to provide a mobile-specific sitemap.
Mobile sitemaps were defined by Google, and are not described at sitemaps.org. Google has published the details of how to create mobile sitemaps.
In a mobile sitemap, you need to modify the urlset element to reference the mobile schema, as well as the base sitemap schema, as follows:
    <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
            xmlns:mobile="http://www.google.com/schemas/sitemap-mobile/1.0">
In addition, each url entry in a mobile sitemap needs to include an empty mobile:mobile element:
    <url>
      <loc>http://mobile.example.com/article100.html</loc>
      <mobile:mobile/>
    </url>
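Putting the two pieces together, a complete mobile sitemap might look like the following sketch. The second URL is a hypothetical addition to show that every entry carries its own mobile:mobile element:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:mobile="http://www.google.com/schemas/sitemap-mobile/1.0">
  <url>
    <loc>http://mobile.example.com/article100.html</loc>
    <!-- Empty element that marks this URL as mobile content -->
    <mobile:mobile/>
  </url>
  <url>
    <!-- Hypothetical second page, for illustration only -->
    <loc>http://mobile.example.com/article101.html</loc>
    <mobile:mobile/>
  </url>
</urlset>
```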