XML Sitemaps: For speed indexing large sites
After reading an article claiming XML sitemaps were the "Most Overrated SEO Tactic Ever", I felt that XML sitemaps were getting a hard time. The article started off by saying that XML sitemaps never really solve any problem:
XML Sitemaps Don’t Solve Problems
I’ve done SEO on sites as small as 4-5 pages. I’ve done SEO on sites with 15,000,000+ pages. I’ve never once recommended the site owner create an XML sitemap and submit it to the search engines. Sitemaps don’t really solve any problems where indexing and crawlability are concerned. Let’s use a typical 100-page site as an example:
The article carried on about using XML sitemaps on a 100-page site:
No Problems?
If you have a 100-page site, and the spiders are able to crawl all 100 pages, and all 100 pages are indexed, and life is good … maybe you’re thinking a sitemap is a good complement, or something to do “just to be safe.” Why? If life is that good, you don’t need an XML sitemap. Let the spiders keep doing what they’re doing; let them crawl through your pages, let them figure out which pages deserve more frequent crawling, and which don’t. Don’t get in their way, and don’t steer them off track with a sitemap.
I have to agree with the article that the crawlability of pages is important, but I have to disagree that XML sitemaps have no place in SEO. I felt that the article missed some vital information that would have described the situation better:
- How often did Google crawl the XML sitemap?
- How often was the XML sitemap updated?
- Was every page in the XML sitemap?
To Google, sitemaps can be definitive, meaning the only URLs it lists are the ones in your sitemaps. This also means that a post will effectively not exist to Google unless it is in your sitemap. Google will still crawl your site as well, but it will rely on the sitemap.
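For anyone who has never generated one, here is a minimal sketch in Python of what a sitemap file contains; example.com and the page list are placeholders, not a real site. The `<lastmod>` field is what lets Google see how recently each page changed, which speaks to the "how often was the sitemap updated" question above.

```python
# Minimal sketch of generating an XML sitemap; example.com and the page
# list below are placeholders. The sitemaps.org protocol allows up to
# 50,000 URLs per sitemap file, so large sites split their URLs across
# many files and point to them from a sitemap index.
from datetime import date
from xml.etree import ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

def build_sitemap(pages):
    """Build a <urlset> sitemap from (url, last_modified_date) pairs."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, last_modified in pages:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc
        # <lastmod> tells the crawler how recently this page changed
        ET.SubElement(url, "lastmod").text = last_modified.isoformat()
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + ET.tostring(urlset, encoding="unicode")

print(build_sitemap([
    ("https://example.com/", date(2010, 5, 1)),
    ("https://example.com/new-post", date(2010, 5, 3)),
]))
```

The point is simply that every page you care about, and its last-modified date, ends up in one predictable place for Google to read.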
What really got me about the article, though, was that the author missed the point.
XML Sitemaps are for speed indexing
Google Webmaster Central reports that it takes an average of 1,096 milliseconds, roughly 1 second, to download a page on my site.
For a 100-page site that works out to 100 seconds, or 1 minute 40 seconds, of downloading. With the help of an XML sitemap, Google can learn about every new page within about 1 second (a single sitemap file to fetch), although it still needs to crawl and index the new pages.
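As a quick sanity check, here is that arithmetic as a rough Python sketch; the 1-second-per-page figure is just my site's Webmaster Central average rounded down, so treat it as an assumption rather than a universal constant.

```python
# Rough crawl-time estimate, assuming Google downloads one page at a time
# at ~1 second per page (the ~1,096 ms average rounded down).
SECONDS_PER_PAGE = 1

pages = 100
total = pages * SECONDS_PER_PAGE
minutes, seconds = divmod(total, 60)
print(f"{pages} pages -> {total} s ({minutes} min {seconds} s)")  # 100 s (1 min 40 s)
```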
This does not make XML sitemaps look very useful, but the difference shows up on large sites like the one the author mentioned.
For a 15,000,000-page site the same crawl will take 15,000,000 seconds, or:
- 250,000 minutes
- 4,167 hours
- 174 days
At this rate, and assuming Google can only download one page at a time, it will take just under half a year to download all the pages. With the help of XML sitemaps, Google can learn about every new page within about 5 minutes (301 sitemap files to fetch), although it still needs to crawl and index the new pages.
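The 301 figure assumes the sitemaps.org limit of 50,000 URLs per sitemap file, so 15,000,000 URLs need 300 sitemap files plus one sitemap index listing them. A short sketch of how the numbers above fall out, using the same 1-second-per-page assumption:

```python
import math

SECONDS_PER_PAGE = 1        # same ~1 s per download assumption as above
URLS_PER_SITEMAP = 50_000   # sitemaps.org limit per sitemap file

pages = 15_000_000
crawl_seconds = pages * SECONDS_PER_PAGE
print(f"crawl every page: {crawl_seconds / 60:,.0f} min, "
      f"{crawl_seconds / 3600:,.0f} h, {crawl_seconds / 86400:.0f} days")

# 300 sitemap files of 50,000 URLs each, plus 1 sitemap index listing them
sitemap_files = math.ceil(pages / URLS_PER_SITEMAP) + 1
fetch_seconds = sitemap_files * SECONDS_PER_PAGE
print(f"fetch {sitemap_files} sitemap files: ~{fetch_seconds / 60:.0f} min")
```

That prints 250,000 minutes, 4,167 hours and 174 days to crawl every page, against roughly 5 minutes to fetch all 301 sitemap files.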
Any one of the 15,000,000 pages could link to a new page. When you have users creating pages constantly, you cannot rely on Google to crawl every page on your site. It is just not possible for Google to crawl a site that size and have every new page listed within an hour. Expecting Google to quickly find new pages linked from any one of 15,000,000+ pages is unrealistic. Use XML sitemaps to ensure speed indexing.