This is a topic that makes many people run for the hills. Well, I’m here to tell you there is nothing scary about this.
What is an XML sitemap?
An XML Sitemap is a text-based file that functions as a cheat sheet, or a very detailed map, for search engine spiders to discover content on a Web site. It is simply a text file with a list of URLs that you make available to tell a spider what files are on your site.
Sitemaps also make use of metadata to help spiders discover when a file has changed or when new content has been added, and they are very useful for describing rich media content like videos and images. The XML Sitemaps Protocol, which defines the format of all the different types of metadata, is available at: www.sitemaps.org.
HTML vs XML sitemaps
In the past, many Web sites used an HTML format (Web page) sitemap to help a search engine find all the files on the site, but the HTML format, which is intended for a browser, was only partially effective for this purpose. The XML Sitemap protocol, on the other hand, is designed to be processed by machines. Its rigid format makes it easy for a spider to understand and far more scalable for big Web sites with thousands, or even millions, of URLs. HTML Sitemaps can still be useful in some situations, but they’re far more useful to a person visiting a site than a spider crawling it. So, in order for you to get the FULL benefits of a sitemap, you’ll need to choose the XML Sitemap format.
To be clear, there are enormous benefits to incorporating XML Sitemaps into your marketing plan. First off, they are one of the fastest ways to get an engine to crawl new content, and you can even suggest how often you’d like your site re-crawled. Then, for video, images and other files that are difficult for an engine to crawl, they can give you the best chance of getting that content indexed into Image or Video search. In some cases they can even generate a video thumbnail in organic search for your videos! So, if you’re on the fence about using this technology – it’s past the time to jump in. Remember, we’re here to help, so it’s nothing to stress about.
Note: Both Bing and Google do support RSS, Atom and Text formats. However, this article specifically covers the XML format, which is the preferred method and the one we recommend over the others.
File types Supported in XML Sitemaps
Sitemaps were originally designed to link to HTML Web pages; however, they have been extended to support other types of files. These are the file types Google currently supports for inclusion in XML sitemaps.
- HTML Web Pages
- Video Files
Bing is a little bit of a mess with regard to supporting all the sitemap file types. At this time, Bing has documented support for HTML file types, but does NOT mention support for any extensions. On the other hand, Bing’s Webmaster Tools sitemap submission tool does acknowledge Video sitemap formats, but so far they are unused by this engine. Even with all these mixed messages, it’s important for you to know that extensions to sitemaps aren’t going anywhere. Bing will get them added to their system and at some point they will support the same formats Google does.
Each XML Sitemap can contain up to 50,000 URLs and can be no larger than 10MB. You can, however, use multiple sitemap files to get around these limitations, and you can also use gzip compression to reduce bandwidth on the larger files (unzipped, the file must still be under 10MB).
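As a rough illustration of working within the URL limit, the Python sketch below splits a long list of URLs into groups of at most 50,000, one group per sitemap file. The function name, the example.com URLs and the page count are made up for the example, and the file size limit is not checked here.

```python
from typing import Iterator, List

MAX_URLS_PER_SITEMAP = 50_000  # per-file URL limit described above


def split_into_sitemaps(urls: List[str],
                        limit: int = MAX_URLS_PER_SITEMAP) -> Iterator[List[str]]:
    """Yield successive chunks of at most `limit` URLs, one chunk per sitemap file."""
    for start in range(0, len(urls), limit):
        yield urls[start:start + limit]


# Hypothetical site with 120,000 pages on example.com (an assumption for this sketch).
urls = [f"http://www.example.com/page-{n}.html" for n in range(120_000)]
chunks = list(split_into_sitemaps(urls))
print([len(chunk) for chunk in chunks])  # [50000, 50000, 20000]
```

Each chunk would then be written out as its own sitemap file, which is where the multiple-sitemap approach discussed later comes in.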
Before You Implement – Tips For Putting Your Best Foot Forward
If you’re working with a large Web site, perhaps one with a thousand pages or more, you’ll need to do some planning in advance of implementing Sitemaps.
We highly recommend using dedicated sitemap files for each different type of file format.
For example – one for images, one for video and another sitemap for HTML pages. This makes things SO much easier to manage, especially if you are working with the files manually and/or you are using different plugins or software programs to create the sitemap files.
You may also find it useful to create multiple Web page sitemaps. Perhaps one for your blog, one for products and one for content that doesn’t change often like your Terms of Service page. Of course you can change these things as you go because you are not locked into anything. We’ve just learned over the many years that when working with larger sites the Divide and Conquer strategy really pays off in most cases.
For example, if you have a static Web site and a WordPress Blog you might be using a WordPress Plugin to generate the sitemap for the Blog, and for your static pages you might use a Spider based software program to generate the sitemap file. Separate sitemap files help multiple programs avoid overwriting each other. They’re also great for organizing the files.
Next, assuming you take our advice and use multiple sitemaps, you might be wondering how to connect them. It’s easy.
You can use two methods to link to multiple sitemaps. One is to list them in your robots.txt file, which is quite easy and is the natural place to do this because most search engine spiders check this file every time they visit your site. We frequently list our sitemap files in robots.txt because the spiders will then discover the sitemaps on their own and we don’t have to submit them. Here’s an example of the code you’d add to your robots.txt file to link to a list of sitemap files like the ones described above.
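A minimal sketch of those robots.txt entries, written in Python so the lines can be generated from a list: each sitemap gets its own `Sitemap:` line containing the full URL of the file. The file names and the example.com domain are placeholders, not the names used on any real site.

```python
from typing import Iterable, List


def robots_sitemap_lines(sitemap_urls: Iterable[str]) -> List[str]:
    """Return one 'Sitemap:' line per sitemap URL, ready to paste into robots.txt."""
    return [f"Sitemap: {url}" for url in sitemap_urls]


# Hypothetical file names; substitute your own domain and paths.
lines = robots_sitemap_lines([
    "http://www.example.com/sitemap-pages.xml",
    "http://www.example.com/sitemap-images.xml",
    "http://www.example.com/sitemap-video.xml",
])
print("\n".join(lines))
# Sitemap: http://www.example.com/sitemap-pages.xml
# Sitemap: http://www.example.com/sitemap-images.xml
# Sitemap: http://www.example.com/sitemap-video.xml
```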
Alternatively, you can just link to a sitemap index file from your robots.txt file with this example…
Within the sitemap index file, you would link to the child sitemaps, for example:
Of course, you can also submit your sitemaps directly via Bing and Google’s Webmaster interfaces individually if you prefer. The entire idea behind using the robots.txt method is to ensure that the spider can find the files whether or not they have been submitted. We like the idea of the robot checking the files on each visit to speed things along, so using robots.txt is a method we advise.
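The following Python sketch shows roughly what such an index can look like, using the `<sitemapindex>`, `<sitemap>` and `<loc>` elements from the sitemaps.org protocol. The child file names, the index file name and the example.com domain are assumptions for illustration only.

```python
from xml.sax.saxutils import escape

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap_index(sitemap_urls):
    """Return a sitemap index document that lists each child sitemap."""
    entries = "\n".join(
        f"  <sitemap>\n    <loc>{escape(url)}</loc>\n  </sitemap>"
        for url in sitemap_urls
    )
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        f'<sitemapindex xmlns="{SITEMAP_NS}">\n'
        f"{entries}\n"
        "</sitemapindex>\n"
    )


# Hypothetical child sitemaps and index location.
index_xml = build_sitemap_index([
    "http://www.example.com/sitemap-pages.xml",
    "http://www.example.com/sitemap-images.xml",
    "http://www.example.com/sitemap-video.xml",
])
robots_line = "Sitemap: http://www.example.com/sitemap-index.xml"

print(index_xml)
print(robots_line)
```

With this approach robots.txt needs only the single line held in `robots_line`, and the index file carries the list of child sitemaps.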
How About An Easy Way – 4 Great Options to Create XML Sitemaps Using Software
Time to get down to the nitty-gritty: actually creating the sitemap file. There are easy ways to do this using software or plugins, so let’s discuss what resources are available first, then we’ll get into the XML code itself. This is a quick list of plugins and software we recommend; there are actually quite a few to choose from on almost any platform. These programs are designed to create sitemaps of Web pages, and they typically do not have options for images, news, or video unless noted.
1. WordPress Plugins
- Yoast SEO – Recommended A+, works very well for posts and pages, plus many other SEO features. Best of all it’s free!
- Yoast Video Plugin – New / Not tested – creates XML Video Sitemaps for $89
2. Windows Software
- Xenu’s Link Sleuth – Recommended A+, Free Web Page Spider that can scan your site and generate XML sitemaps and discover other problems with your site as well.
- Screaming Frog SEO Spider – Recommended – Free Version can scan up to 500 URLs, Paid Version £99/yr allows unlimited URLs.
- G Site Crawler is specially designed to generate sitemaps and offers more options and features.
3. Web-based Generators
- XML-sitemaps.com/ – Generates a sitemap for up to 500 URLs for free.
- FreeSitemapGenerator.com – Free Web Based up to 5,000 URLs.
Manually Creating an XML Sitemap
If you don’t have a huge Web site, manually creating or editing a sitemap isn’t a big deal. The XML format is pretty straightforward and you can create the file in a text editor.
Important Note: All URLs must be XML encoded. This means that, among other things, all ampersands (&) in URLs must be replaced with their equivalent entity (i.e., &amp;). See this w3.org document for more in-depth technical details about XML character encoding.
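If you generate the file with a script, the encoding step can be handled by a library rather than by hand. A small Python sketch, assuming a made-up URL on example.com: the standard library’s `xml.sax.saxutils.escape` replaces `&`, `<` and `>`, and the extra mapping passed in here also covers quotes.

```python
from xml.sax.saxutils import escape

# Extra entity mappings for quotes; escape() handles &, < and > by default.
QUOTE_ENTITIES = {"'": "&apos;", '"': "&quot;"}


def xml_encode_url(url: str) -> str:
    """Return the URL with XML special characters replaced by entities."""
    return escape(url, QUOTE_ENTITIES)


# Hypothetical URL with a query string containing an ampersand.
raw = "http://www.example.com/catalog?item=12&desc=vacation_hawaii"
encoded = xml_encode_url(raw)
print(encoded)
# http://www.example.com/catalog?item=12&amp;desc=vacation_hawaii
```

Encode each URL only once; running an already encoded URL through the function again would turn `&amp;` into `&amp;amp;`.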
Here’s what a simple regular Web page sitemap looks like
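As a minimal sketch, the Python below builds a small sitemap of this kind and prints it. The pages, dates, change frequencies and priorities are invented for illustration (a home page plus two subpages on example.com); only the element names and the namespace come from the sitemaps.org protocol.

```python
from xml.sax.saxutils import escape

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages):
    """Return a <urlset> document from (loc, lastmod, changefreq, priority) tuples."""
    parts = [
        '<?xml version="1.0" encoding="UTF-8"?>',
        f'<urlset xmlns="{SITEMAP_NS}">',
    ]
    for loc, lastmod, changefreq, priority in pages:
        parts += [
            "  <url>",
            f"    <loc>{escape(loc)}</loc>",  # URLs must be XML encoded
            f"    <lastmod>{lastmod}</lastmod>",
            f"    <changefreq>{changefreq}</changefreq>",
            f"    <priority>{priority:.1f}</priority>",
            "  </url>",
        ]
    parts.append("</urlset>")
    return "\n".join(parts) + "\n"


# Hypothetical pages: the home page gets a higher priority than the subpages.
pages = [
    ("http://www.example.com/", "2013-01-15", "daily", 1.0),
    ("http://www.example.com/about.html", "2012-11-02", "monthly", 0.8),
    ("http://www.example.com/terms.html", "2012-06-30", "yearly", 0.2),
]
sitemap_xml = build_sitemap(pages)
print(sitemap_xml)
```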
As you may have noticed, there is additional information in the file besides just the URL. That information specifies the last time the file was modified, how often the file typically changes, and its priority on the site.
In the example above, the home page is higher priority than the subpages. This information is merely a recommendation to the engines; it won’t force them to re-spider a file daily if you set the changefreq setting to daily. In fact, they probably ignore most of it other than the dates, as long as you are not setting them all to the same date. For example, if you add or change a file and it’s the only one with a recent date on it, it’s likely they will swing by and reindex the file. On the other hand, if all your files always claim to have been modified today, expect the spider to mostly ignore that additional information.
The XML tags that make up a Sitemap file are very specific and must be used precisely. Some tags are optional and some are required.
Here’s the precise breakdown regarding the purpose, meaning and requirement (or not) of each of the XML tags used in the example above:
<?xml version="1.0" encoding="UTF-8"?> and <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"> (required)
A no-brainer – there’s nothing you need to know about this block other than to include it exactly as it is displayed above. This is a required part of the document which simply describes the encoding used and the protocol.
<urlset> and </urlset> (required)
Indicates the beginning and end of a set of URLs to be crawled.
<url> and </url> (required)
Specifies the start and finish of an individual URL (Web page) entry.
<loc> (required)
The full URL of the Web page you wish to submit, including the domain name and path just as it would be entered in a Web browser’s address bar. You’re limited to 2048 characters (which would be an unbelievably long URL, anyway).
<lastmod> (optional)
The date and time the document was last modified. When hand coding these files we typically just use the date, as there is seldom a need to specify a time; however, if you were creating a News Sitemap, including the time would be useful. The date must be specified using the ISO 8601 standard, for example YYYY-MM-DD. If you wish to include the time as well, keep to the ISO 8601 format, for example YYYY-MM-DDThh:mm:ss+00:00.
<changefreq> (optional)
Here’s where you can suggest how often Google should revisit this URL. Bear in mind it’s not a command, but rather a hint. The value can be set to any one of the following: always, hourly, daily, weekly, monthly, yearly, and never. Again, if the URL isn’t being updated daily, we don’t suggest marking it as such. Being truthful here will likely make Google and Bing pay more attention to what you’re specifying for these values.
<priority> (optional)
The relative priority of this URL compared to other URLs on your own site. Here’s where you can assign a crawl preference to your more important pages. The scale is from 0.0 to 1.0, in increments of 0.1. For example, 0.3, 0.5, 1.0 would be priorities listed from lowest to highest.
This has no direct effect on your actual search engine ranking. We typically set the home page at 1.0, important category pages at 0.8, and low priority pages such as Terms of Service at 0.2. Frankly, we don’t think this really matters; just make an attempt to be realistic and avoid setting all URLs to high priority.
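When sitemaps are edited by hand, a quick check of each entry against the rules above can catch typos before a spider does. The Python sketch below is one possible way to do that; the function name and messages are made up, it only handles date-only last-modified values, and it uses the 2048-character limit as stated above.

```python
from datetime import date

ALLOWED_CHANGEFREQ = {"always", "hourly", "daily", "weekly",
                      "monthly", "yearly", "never"}
MAX_LOC_LENGTH = 2048  # URL length limit as described above


def check_entry(loc, lastmod, changefreq, priority):
    """Return a list of problems found in one URL entry (empty if none)."""
    problems = []
    if len(loc) > MAX_LOC_LENGTH:
        problems.append("URL is longer than 2048 characters")
    try:
        date.fromisoformat(lastmod)  # parses date-only values such as 2013-01-15
    except ValueError:
        problems.append("last-modified value is not a valid YYYY-MM-DD date")
    if changefreq not in ALLOWED_CHANGEFREQ:
        problems.append(f"change frequency {changefreq!r} is not an allowed value")
    if not 0.0 <= priority <= 1.0:
        problems.append("priority must be between 0.0 and 1.0")
    return problems


# Hypothetical entries: one clean, one with three mistakes.
good = check_entry("http://www.example.com/", "2013-01-15", "daily", 1.0)
bad = check_entry("http://www.example.com/", "15/01/2013", "sometimes", 1.5)
print(good)      # []
print(len(bad))  # 3
```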
Submitting Your Sitemaps
Submitting your Sitemap in Google Webmaster Tools
You can submit your sitemap in Google’s Webmaster Tools easily. After you log in, just click the Optimization menu link on the left, then choose Sitemaps