What is a sitemap?
For the purpose of this guide, we are focusing on the main type Google recommends for proper indexing: the XML sitemap. There are other sitemap types, such as Atom, RSS and text, but these tend to serve niche use cases.
A sitemap is a guide that tells Google which URLs you would like it to look at. It’s a way of showing Google your pages, including ones you may have added recently. There is no guarantee that using a sitemap will result in good indexing, because it doesn’t force Google to crawl these pages.
However, using a sitemap does increase the chances of the pages you want being indexed, especially where you have a larger site that may exceed your crawl budget.
Google gives the following guidance on when you might need a sitemap, and when you might not:
You might need a sitemap if:
- Your site is really large. As a result, it's more likely Google web crawlers might overlook crawling some of your new or recently updated pages.
- Your site has a large archive of content pages that are isolated or not well linked to each other. If your site pages don't naturally reference each other, you can list them in a sitemap to ensure that Google doesn't overlook some of your pages.
- Your site is new and has few external links to it. Googlebot and other web crawlers crawl the web by following links from one page to another. As a result, Google might not discover your pages if no other sites link to them.
- Your site has a lot of rich media content (video, images) or is shown in Google News. If provided, Google can take additional information from sitemaps into account for search, where appropriate.
You might not need a sitemap if:
- Your site is "small". By small, we mean about 500 pages or fewer on your site. (Only pages that you think need to be in search results count toward this total.)
- Your site is comprehensively linked internally. This means that Google can find all the important pages on your site by following links starting from the homepage.
- You don't have many media files (video, image) or news pages that you want to show in search results. Sitemaps can help Google find and understand video and image files, or news articles, on your site. If you don't need these results to appear in image, video, or news results, you might not need a sitemap.
https://developers.google.com/search/docs/advanced/sitemaps/overview
There's more than one kind of sitemap
OK, so you want to use a sitemap. What kind of sitemap do you need?
Web Sitemap
This is your standard, basic, no-frills sitemap. It's also worth noting you can tag a web sitemap with media-specific tags, or you can generate separate sitemaps for images and video (more on that later).
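As a rough illustration of what a basic web sitemap contains, here is a minimal Python sketch that builds one with the standard library. The function name `build_sitemap` and the example.com URLs are placeholders made up for this example; only the `<urlset>`, `<url>` and `<loc>` tags and the namespace come from the sitemap protocol.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemap protocol (sitemaps.org).
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Return a basic XML sitemap (as bytes) listing the given page URLs."""
    # Serialize the sitemap namespace as the default xmlns, without a prefix.
    ET.register_namespace("", SITEMAP_NS)
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for url in urls:
        entry = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(entry, f"{{{SITEMAP_NS}}}loc").text = url
    return ET.tostring(urlset, encoding="utf-8", xml_declaration=True)


if __name__ == "__main__":
    # Hypothetical URLs, for illustration only.
    xml_bytes = build_sitemap(["https://example.com/", "https://example.com/about"])
    print(xml_bytes.decode("utf-8"))
```

Most sites will generate this file from a CMS or plugin rather than by hand, so treat this purely as a way to see the structure.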
You can provide multiple Sitemap files, but each Sitemap file that you provide must have no more than 50,000 URLs and must be no larger than 50MB (52,428,800 bytes). If you would like, you may compress your Sitemap files using gzip to reduce your bandwidth requirement; however the sitemap file once uncompressed must be no larger than 50MB. If you want to list more than 50,000 URLs, you must create multiple Sitemap files.
So to recap:
- Sitemap must be no more than 50,000 URLs
- Cannot exceed 50MB
- Can be compressed via gzip but the uncompressed file must still be below 50MB
- To include more URLs, use a Sitemap Index
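To make the limits above concrete, the following Python sketch splits a long URL list into sitemap-sized groups and checks the size limit before compressing. The helper names (`chunk_urls`, `within_limits`, `compress_sitemap`) are invented for this example; the 50,000 URL and 52,428,800 byte figures are the ones quoted above.

```python
import gzip

MAX_URLS = 50_000        # maximum URLs per sitemap file
MAX_BYTES = 52_428_800   # 50MB, measured on the uncompressed file


def chunk_urls(urls, size=MAX_URLS):
    """Split a list of URLs into groups small enough for one sitemap each."""
    return [urls[i:i + size] for i in range(0, len(urls), size)]


def within_limits(sitemap_bytes, url_count):
    """Check an uncompressed sitemap against both limits."""
    return url_count <= MAX_URLS and len(sitemap_bytes) <= MAX_BYTES


def compress_sitemap(sitemap_bytes):
    """Gzip a sitemap, refusing files that are too large before compression."""
    if len(sitemap_bytes) > MAX_BYTES:
        raise ValueError("uncompressed sitemap exceeds 50MB; split it up")
    return gzip.compress(sitemap_bytes)


if __name__ == "__main__":
    # Hypothetical URLs, for illustration only.
    urls = [f"https://example.com/page-{n}" for n in range(120_001)]
    print([len(group) for group in chunk_urls(urls)])
```

Each group would then be written to its own sitemap file and listed in a sitemap index.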
Sitemap Index
If you have a sitemap that's larger than 50MB, you'll need to split up your large sitemap into multiple sitemaps. You can use a sitemap index file as a way to submit many sitemaps at once. The XML format of a sitemap index file is very similar to the XML format of a sitemap file.
https://developers.google.com/search/docs/advanced/sitemaps/large-sitemaps
If you do provide multiple Sitemaps, you should then list each Sitemap file in a Sitemap index file. Sitemap index files may not list more than 50,000 Sitemaps, must be no larger than 50MB (52,428,800 bytes), and can be compressed. You can have more than one Sitemap index file.
The Sitemap index file must:
- Begin with an opening <sitemapindex> tag and end with a closing </sitemapindex> tag.
- Include a <sitemap> entry for each Sitemap as a parent XML tag.
- Include a <loc> child entry for each <sitemap> parent tag.
So to recap:
- Sitemap Index must list no more than 50,000 sitemaps
- Cannot exceed 50MB
- Can be compressed via gzip but the uncompressed file must still be below 50MB
- You may have more than one sitemap index file if you have more than 50,000 sitemaps
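The structure described above (a `<sitemapindex>` root, one `<sitemap>` entry per file, each with a `<loc>` child) can be sketched in Python as follows. The function name `build_sitemap_index` and the example.com file names are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemap protocol (sitemaps.org).
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_SITEMAPS = 50_000  # a sitemap index may list at most this many sitemaps


def build_sitemap_index(sitemap_urls):
    """Return a sitemap index (as bytes) pointing at the given sitemap files."""
    if len(sitemap_urls) > MAX_SITEMAPS:
        raise ValueError("too many sitemaps; use more than one index file")
    ET.register_namespace("", SITEMAP_NS)
    index = ET.Element(f"{{{SITEMAP_NS}}}sitemapindex")
    for sitemap_url in sitemap_urls:
        entry = ET.SubElement(index, f"{{{SITEMAP_NS}}}sitemap")
        ET.SubElement(entry, f"{{{SITEMAP_NS}}}loc").text = sitemap_url
    return ET.tostring(index, encoding="utf-8", xml_declaration=True)


if __name__ == "__main__":
    # Hypothetical sitemap file locations, for illustration only.
    files = [
        "https://example.com/sitemap-1.xml.gz",
        "https://example.com/sitemap-2.xml.gz",
    ]
    print(build_sitemap_index(files).decode("utf-8"))
```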
Image Sitemap
Google sometimes has trouble finding and indexing images. Image sitemaps are an easy way of showing Google where they all are.
Add images to an existing sitemap, or create a separate sitemap just for your images. Adding images to a sitemap helps Google discover images that we might not otherwise find (such as images your site reaches with JavaScript code).
Image sitemaps follow the same rules as web sitemaps:
- Image sitemap must be no more than 50,000 URLs
- Cannot exceed 50MB
- Can be compressed via gzip but the uncompressed file must still be below 50MB
- To include more URLs, use a Sitemap Index
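For a sense of how images are attached to pages in a sitemap, here is a small Python sketch that uses Google's image sitemap extension (`<image:image>` with an `<image:loc>` child). The function name `build_image_sitemap`, the dictionary input shape and the example.com URLs are assumptions for this example.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
# Namespace for Google's image sitemap extension.
IMAGE_NS = "http://www.google.com/schemas/sitemap-image/1.1"


def build_image_sitemap(pages):
    """Build a sitemap from a dict mapping each page URL to its image URLs."""
    ET.register_namespace("", SITEMAP_NS)
    ET.register_namespace("image", IMAGE_NS)
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for page_url, image_urls in pages.items():
        entry = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(entry, f"{{{SITEMAP_NS}}}loc").text = page_url
        # Each image on the page gets its own <image:image> block.
        for image_url in image_urls:
            image = ET.SubElement(entry, f"{{{IMAGE_NS}}}image")
            ET.SubElement(image, f"{{{IMAGE_NS}}}loc").text = image_url
    return ET.tostring(urlset, encoding="utf-8", xml_declaration=True)


if __name__ == "__main__":
    # Hypothetical page and image URLs, for illustration only.
    pages = {
        "https://example.com/gallery": [
            "https://example.com/img/one.jpg",
            "https://example.com/img/two.jpg",
        ]
    }
    print(build_image_sitemap(pages).decode("utf-8"))
```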
Video Sitemap
Just as images can be hard for Google to find and index, video can be even harder. A video sitemap is a useful way of making it more likely that Google sees your videos.
A video sitemap is a sitemap with additional information about video hosted on your pages. Creating a video sitemap is an excellent way to help Google find and understand the video content on your site, especially content that was recently added or that we might not otherwise discover with our usual crawling mechanisms. A video sitemap is an extension to the Sitemap protocol.
https://developers.google.com/search/docs/advanced/sitemaps/video-sitemaps
Video sitemaps follow the same rules as web sitemaps:
- Video sitemap must be no more than 50,000 URLs
- Cannot exceed 50MB
- Can be compressed via gzip but the uncompressed file must still be below 50MB
- To include more URLs, use a Sitemap Index
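As a sketch of the "additional information about video" mentioned above, this Python example adds a `<video:video>` block to a page entry using Google's video sitemap extension. The function name `build_video_sitemap`, the dictionary keys and the example.com URLs are invented for illustration, and only a handful of the available video tags are shown; check Google's video sitemap documentation for the full list of required and optional tags.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
# Namespace for Google's video sitemap extension.
VIDEO_NS = "http://www.google.com/schemas/sitemap-video/1.1"


def build_video_sitemap(videos):
    """Build a sitemap from a list of dicts describing one video per page."""
    ET.register_namespace("", SITEMAP_NS)
    ET.register_namespace("video", VIDEO_NS)
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for item in videos:
        entry = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(entry, f"{{{SITEMAP_NS}}}loc").text = item["page_url"]
        video = ET.SubElement(entry, f"{{{VIDEO_NS}}}video")
        # Tag names below come from the video extension; dict keys are made up.
        ET.SubElement(video, f"{{{VIDEO_NS}}}thumbnail_loc").text = item["thumbnail_url"]
        ET.SubElement(video, f"{{{VIDEO_NS}}}title").text = item["title"]
        ET.SubElement(video, f"{{{VIDEO_NS}}}description").text = item["description"]
        ET.SubElement(video, f"{{{VIDEO_NS}}}content_loc").text = item["video_url"]
    return ET.tostring(urlset, encoding="utf-8", xml_declaration=True)


if __name__ == "__main__":
    # Hypothetical video details, for illustration only.
    videos = [{
        "page_url": "https://example.com/videos/intro",
        "thumbnail_url": "https://example.com/thumbs/intro.jpg",
        "title": "Intro to sitemaps",
        "description": "A short overview of what sitemaps are for.",
        "video_url": "https://example.com/media/intro.mp4",
    }]
    print(build_video_sitemap(videos).decode("utf-8"))
```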
News Sitemap
News sitemaps follow different rules to standard sitemaps and are much more geared towards informing Google quickly about changes made within your site.
When you create a Google News sitemap, follow these requirements:
- Only include URLs for articles that were published in the last two days. Once the articles are older than two days, either remove those URLs from the News sitemap or remove the <news:news> metadata from the older URLs. The articles will remain in the index for the regular 30-day period.
- Update your News sitemap with new articles as they're published. Google News crawls News sitemaps as often as it crawls the rest of your site.
- You can add up to 1,000 URLs in a News sitemap. If there are more than 1,000 URLs in a News sitemap, break your sitemap into several smaller sitemaps, and use a sitemap index file, as defined in the sitemap protocol, to manage them. Since News sitemaps are crawled more often than web sitemaps, this limit ensures that your site isn't unnecessarily overloaded.
- Update your current sitemap with your new article URLs. Don't create a new sitemap with each update.
https://developers.google.com/search/docs/advanced/sitemaps/news-sitemap
So to recap:
- News sitemap must be no more than 1,000 URLs
- Articles must have been published in the last 2 days
- It's a moving window, so old URLs should drop off it
- Only for News content
- Use a sitemap index file if you have more than 1,000 URLs published in the past 2 days
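The moving two-day window and the 1,000 URL cap can be sketched in Python like this, using Google's news sitemap extension tags. The function name `build_news_sitemap`, the article dictionary keys, the publication name and the example.com URLs are all assumptions for illustration; the `now` parameter exists only so the cutoff can be pinned in tests.

```python
from datetime import datetime, timedelta, timezone
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
# Namespace for Google's news sitemap extension.
NEWS_NS = "http://www.google.com/schemas/sitemap-news/0.9"
MAX_NEWS_URLS = 1_000


def build_news_sitemap(articles, publication, language="en", now=None):
    """Build a News sitemap containing only articles from the last two days."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=2)
    # Older articles fall out of the window and are left out entirely.
    recent = [a for a in articles if a["published"] >= cutoff]
    if len(recent) > MAX_NEWS_URLS:
        raise ValueError("over 1,000 URLs; split into several sitemaps")
    ET.register_namespace("", SITEMAP_NS)
    ET.register_namespace("news", NEWS_NS)
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for article in recent:
        entry = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(entry, f"{{{SITEMAP_NS}}}loc").text = article["url"]
        news = ET.SubElement(entry, f"{{{NEWS_NS}}}news")
        pub = ET.SubElement(news, f"{{{NEWS_NS}}}publication")
        ET.SubElement(pub, f"{{{NEWS_NS}}}name").text = publication
        ET.SubElement(pub, f"{{{NEWS_NS}}}language").text = language
        date = article["published"].isoformat()
        ET.SubElement(news, f"{{{NEWS_NS}}}publication_date").text = date
        ET.SubElement(news, f"{{{NEWS_NS}}}title").text = article["title"]
    return ET.tostring(urlset, encoding="utf-8", xml_declaration=True)


if __name__ == "__main__":
    # Hypothetical articles: one fresh, one too old to be included.
    now = datetime.now(timezone.utc)
    articles = [
        {"url": "https://example.com/news/fresh", "title": "Fresh story",
         "published": now - timedelta(hours=3)},
        {"url": "https://example.com/news/old", "title": "Old story",
         "published": now - timedelta(days=5)},
    ]
    print(build_news_sitemap(articles, "Example Times").decode("utf-8"))
```

Regenerating the same file on each publish, rather than creating a new one, matches the guidance above to update your current sitemap.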
Summary
Remember, Google doesn’t favor URLs in the results just because they appear in your sitemaps. What sitemaps do is help ensure that pages can be found and indexed properly and that your crawl budget is being used effectively.
Give this video a watch for a bit more of an overview and some implementation tips: