Natalia Wardzyńska, SEO Specialist

Thanks to XML sitemaps, Google can efficiently find every page that matters to your business and show it to users in search results. Find out what the sitemap.xml file is and how to create it.

What is the sitemap.xml file?

“Sitemap” is a list of URLs available to web crawlers, while “XML” (Extensible Markup Language) is a markup language that defines a set of rules for encoding documents in a human- and machine-readable format.

The site map can include:

URLs – no more than 50,000 per sitemap file.

The date of the last update of each address – it tells robots when the page content last changed.

Priority – it indicates how important a subpage is relative to the rest of the site.
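Here is a minimal Python sketch (standard library only) that makes the list above concrete. It serializes a few entries into the sitemaps.org format. The function name build_sitemap and the example.com addresses are illustrative assumptions and are not taken from any plugin. The last-modification date and priority are optional, as they are in the protocol.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS_PER_SITEMAP = 50_000  # protocol limit for a single sitemap file


def build_sitemap(entries):
    """Serialize (url, lastmod, priority) tuples as a sitemap <urlset>.

    lastmod is a 'YYYY-MM-DD' string or None; priority is a float
    between 0.0 and 1.0 or None. Both are optional in the protocol.
    """
    if len(entries) > MAX_URLS_PER_SITEMAP:
        raise ValueError("a single sitemap may list at most 50,000 URLs")
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url, lastmod, priority in entries:
        url_el = ET.SubElement(urlset, "url")
        ET.SubElement(url_el, "loc").text = url  # ElementTree escapes & and <
        if lastmod is not None:
            ET.SubElement(url_el, "lastmod").text = lastmod
        if priority is not None:
            ET.SubElement(url_el, "priority").text = f"{priority:.1f}"
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body
```

Calling build_sitemap([("https://www.example.com/", "2024-07-12", 1.0)]) returns a one-entry <urlset>. The output is not pretty-printed, which makes no difference to crawlers.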

Sample sitemap generated using a CMS plugin.


Of course, it is not a problem if your site has more than 50,000 addresses. In such a situation, you can create several separate sitemaps, e.g. one for products, one for blog articles and one for images. They are then tied together by a summary map (a sitemap index file) that links to the individual sitemaps.
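Below is a rough sketch of the split described above. It assumes you already have the URLs grouped by section. Each section is cut into files of at most 50,000 URLs, and a <sitemapindex> file links to all of them. The helper names and file names (sitemap-products-1.xml, sitemap_index.xml) are hypothetical; CMS plug-ins choose their own naming.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS_PER_SITEMAP = 50_000  # protocol limit per sitemap file


def chunk(urls, size=MAX_URLS_PER_SITEMAP):
    """Split a list of URLs into consecutive groups of at most `size`."""
    return [urls[i:i + size] for i in range(0, len(urls), size)]


def urlset_xml(urls):
    """Serialize one group of URLs as a <urlset> sitemap."""
    root = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        ET.SubElement(ET.SubElement(root, "url"), "loc").text = url
    return ET.tostring(root, encoding="unicode")


def sitemap_index_xml(sitemap_urls):
    """Serialize a <sitemapindex> that links to the individual sitemaps."""
    root = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    for url in sitemap_urls:
        ET.SubElement(ET.SubElement(root, "sitemap"), "loc").text = url
    return ET.tostring(root, encoding="unicode")


def build_split_sitemaps(sections, base_url, size=MAX_URLS_PER_SITEMAP):
    """Return {file name: XML} for every section sitemap plus the index.

    `sections` maps a section name (e.g. "products", "blog") to its URLs.
    File names such as sitemap-products-1.xml are hypothetical.
    """
    files = {}
    for section, urls in sections.items():
        for n, group in enumerate(chunk(urls, size), start=1):
            files[f"sitemap-{section}-{n}.xml"] = urlset_xml(group)
    # The index lists the absolute address of each child sitemap
    index_locs = [f"{base_url}/{name}" for name in files]
    files["sitemap_index.xml"] = sitemap_index_xml(index_locs)
    return files
```

The returned dictionary maps file names to XML strings. Each entry can be written to disk, and the index file is submitted in place of the individual maps. The protocol also caps each file at 50 MB uncompressed, which this sketch does not check.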

The summary map on Artegence.com links to separate maps for the blog section, offer pages, case studies, etc.


When might an XML sitemap be needed?

It is worth creating a site map if:

The site has a large number of addresses.

On large sites, it is difficult to maintain internal linking across all pages. This increases the likelihood that web robots will miss some new pages.

The site is new and has few external links to it.

Web robots scan the Internet, following links from page to page. There is a risk that a web robot will not detect your site if there are no external links leading to it.

The site contains many images and videos or is displayed in Google News.

Google can use additional information from the sitemap, e.g. about images and videos, when showing them in search.

A site map is not necessary if:

The site contains few subpages.

Sub-pages are well connected to each other by internal links.

The site does not contain many images and videos and is not displayed in Google News.

In most cases, however, it is worth using a sitemap file just to be sure, especially since setting one up is one of the easier optimization tasks.

Site map – how to make one?

Creating a sitemap is fairly simple, but building it by hand can be quite time-consuming. Fortunately, most CMSs can generate a sitemap automatically, either natively or through a dedicated plug-in. For WordPress, sitemap generation is offered by the Yoast SEO and All in One SEO plugins, among others. For a site that does not use a CMS, the easiest way to create a sitemap is with specialized tools: XML sitemap generators. Solutions based on CMS plug-ins or generators have one advantage. The sitemap can update itself automatically as new subpages (new products, blog posts) are created or deleted. As a result, we do not have to edit the address list by hand every time we change something on the site. However, it is a good idea to review the map before publishing it and make sure it contains all the addresses it should.
If some are missing, add them manually or adjust the tool’s settings.
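The review step mentioned above can be partly automated. The sketch below assumes you have a list of addresses that should be indexed. That list might be exported from the CMS or produced by a crawl; its source is an assumption here. The sketch reports which of those addresses the generated sitemap does not contain. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def urls_in_sitemap(xml_data):
    """Return the set of <loc> values listed in a <urlset> sitemap."""
    root = ET.fromstring(xml_data)
    return {(loc.text or "").strip() for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)}


def missing_from_sitemap(xml_data, expected_urls):
    """Return the expected URLs that the sitemap does not list, sorted."""
    return sorted(set(expected_urls) - urls_in_sitemap(xml_data))
```

In practice, read the file as bytes (open("sitemap.xml", "rb").read()) and pass it in. ElementTree accepts both bytes and strings.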

What URLs should not be included in sitemap.xml?

Two principles apply. First, the sitemap should include all addresses that we want to appear in Google results. Second, it should not list addresses whose visibility we do not need. Otherwise, we send Google’s robots to pages that will not bring us any additional traffic, and we distract them from the pages that matter for our visibility. Following this rule, the sitemap.xml file should not include addresses that are:

Returning a 404, 301 or 302 response code.

Non-canonical, i.e. with a canonical tag pointing to a different address.

Blocked via the robots.txt file.

Protected by a password.

Not intended to rank, e.g. terms and conditions or GDPR (RODO) information pages.

It is particularly harmful to leave in the sitemap addresses that you want Google to forget about, for example pages that have already been redirected or excluded from indexing. If Google’s robots find them in the sitemap, they receive two contradictory signals from us. On the one hand, the canonical or noindex tag, or the 301 code, tells them not to index these pages. On the other hand, the sitemap.xml file presents them as an integral part of our site that should appear in Google.
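The exclusion rules above can be expressed as a simple filter. This is a sketch under several assumptions. The PageInfo fields (status code, canonical URL, noindex flag, etc.) are hypothetical and would have to come from your own crawl. Python’s urllib.robotparser also uses plain prefix matching, so it does not reproduce every detail of how Googlebot interprets robots.txt (for example the * and $ wildcards).

```python
from dataclasses import dataclass
from typing import Optional
from urllib.robotparser import RobotFileParser


@dataclass
class PageInfo:
    """Facts about one URL, e.g. collected by a crawler (hypothetical input)."""
    url: str
    status: int                      # HTTP status code, fetched without following redirects
    canonical: Optional[str] = None  # href of <link rel="canonical">, if present
    noindex: bool = False            # page carries a noindex robots meta tag
    password_protected: bool = False
    rank_intended: bool = True       # False for e.g. terms and conditions pages


def exclusion_reason(page: PageInfo, robots: RobotFileParser) -> Optional[str]:
    """Return why `page` should stay out of the sitemap, or None if it belongs."""
    if page.status != 200:
        return f"returns HTTP {page.status}"
    if page.canonical and page.canonical != page.url:
        return "canonical points to another address"
    if not robots.can_fetch("Googlebot", page.url):
        return "blocked by robots.txt"
    if page.noindex:
        return "noindex"
    if page.password_protected:
        return "password-protected"
    if not page.rank_intended:
        return "not intended to rank"
    return None


def sitemap_candidates(pages, robots):
    """Keep only the URLs that have no exclusion reason."""
    return [p.url for p in pages if exclusion_reason(p, robots) is None]
```

The function returns a reason rather than a bare boolean. This makes it easier to see why a URL was dropped when you review the list before regenerating the sitemap.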

Publishing the sitemap.xml file on the site

The sitemap must be published on the server. Sitemaps are typically placed at your-witryna.co.uk/sitemap.xml, although this location is not mandatory. However, simply publishing a sitemap.xml file is not enough for it to start doing its job. Once it is created, it should be submitted to Google. There are two ways to do this.

1. Submitting the sitemap.xml file via Google Search Console

This can be done using Google Search Console (GSC). Just go to the “Sitemaps” tab, enter the sitemap’s path without the domain (e.g. sitemap.xml) and click “Submit”.

Site Map (Sitemaps) tab in Google Search Console.


In this way, we pass information about our sitemap.xml file to Google. After some time, GSC will also report on the status of its processing. When we select one of the items in the list of submitted sitemaps, the following panel appears:

Feedback on Google robots' indexing of addresses indicated in sitemap file.


From this panel, we can learn how the prepared sitemap was received. The “see indexed pages” option shows how Google is indexing the submitted URLs. It also lists every case in which Google did not index a submitted address, together with the reason. This is very valuable feedback on the indexing status of our site, and for this reason submitting a sitemap via Google Search Console is the better option. Nevertheless, there is an alternative.

2. Submitting the sitemap.xml file via robots.txt

Another option is to place a link to sitemap.xml in the site’s robots.txt file. All you need to do is add a Sitemap directive with the full address of the map, for example at the end of the file:

Sitemap: https://twoja-witryna.pl.com/sitemap.xml
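You can double-check what crawlers will discover by reading the Sitemap lines back from robots.txt. Python’s standard urllib.robotparser can do this; RobotFileParser.site_maps() is available from Python 3.8. The helper names below are illustrative. The second function flags entries written without a scheme and host, because the Sitemap directive expects a full URL.

```python
from urllib.parse import urlsplit
from urllib.robotparser import RobotFileParser


def sitemaps_declared_in(robots_txt: str):
    """Return the Sitemap: URLs declared in the given robots.txt content."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.site_maps() or []  # site_maps() returns None when there are none


def relative_sitemap_entries(robots_txt: str):
    """Return declared sitemap URLs missing a scheme or host (not absolute)."""
    result = []
    for url in sitemaps_declared_in(robots_txt):
        parts = urlsplit(url)
        if not (parts.scheme and parts.netloc):
            result.append(url)
    return result
```

For example, a robots.txt containing "Sitemap: twoja-witryna.pl.com/sitemap.xml" would be flagged by relative_sitemap_entries, while the https:// form would not.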

Google’s robots start visiting a site by reading its robots.txt file, so they will easily find their way to our sitemap. However, what will they ultimately do with it? With this method, we get no feedback from Google and are left to guess.

Attributes of the sitemap.xml file

As we mentioned at the beginning, the sitemap does not have to contain only URLs, although they are its most important element. Each listed address can be given “attributes”, that is, additional information about that subpage.

The first possible attribute is <lastmod>, which shows the date the page was last modified. In dynamic sitemaps, created by CMS plug-ins or good XML sitemap generators, this date is updated whenever we change something on the subpage. This is important information for Google’s robots. It tells them, for example, that something changed on the page on July 12 this year. That gives indexing robots a reason to return to the address and check what was modified.
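A dynamic sitemap can keep <lastmod> honest by moving the date only when the page content actually changes. The sketch below assumes a hypothetical state dictionary and hashes the whole HTML. Real CMS plug-ins usually rely on the post’s own modification time instead. The date is written in the date-only W3C Datetime form (YYYY-MM-DD) that the sitemap protocol accepts.

```python
import hashlib
from datetime import datetime, timezone


def content_fingerprint(html: str) -> str:
    """Hash of the page content, used to detect real changes."""
    return hashlib.sha256(html.encode("utf-8")).hexdigest()


def update_lastmod(state, url, html, now=None):
    """Update and return the <lastmod> value for `url`.

    `state` (hypothetical) maps url -> (fingerprint, lastmod). The date only
    moves when the content hash changes, so re-saving an unchanged page
    keeps the old date.
    """
    now = now or datetime.now(timezone.utc)
    fingerprint = content_fingerprint(html)
    previous = state.get(url)
    if previous is None or previous[0] != fingerprint:
        # Date-only W3C Datetime form, e.g. 2024-07-12
        state[url] = (fingerprint, now.date().isoformat())
    return state[url][1]
```

Hashing the full HTML is a simplification. If the page includes changing elements such as timestamps or rotating banners, hash only the main content. Otherwise every rebuild would look like a modification.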

A sitemap can also use the <priority> tag. It was originally meant to tell robots which subpages to check first because they are particularly important to us. The problem was that site owners marked all their addresses as top priority, hoping this would somehow speed up the inclusion of every URL in Google’s results. In the end, Google decided to ignore this attribute.

The same happened to the <changefreq> tag, which was meant to indicate how often the content of a given subpage changes. Google ignores this attribute as well, so there is little point in including it in sitemaps.
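Since Google ignores both tags, you can remove them from an existing sitemap. Here is a minimal ElementTree sketch, assuming a standard <urlset> file in the sitemaps.org namespace. The function name is hypothetical.

```python
import xml.etree.ElementTree as ET

NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
IGNORED_TAGS = {f"{{{NS}}}priority", f"{{{NS}}}changefreq"}


def strip_ignored_tags(xml_data) -> str:
    """Remove <priority> and <changefreq> from every <url> entry."""
    # Keep the sitemap namespace as the default (unprefixed) one on output
    ET.register_namespace("", NS)
    root = ET.fromstring(xml_data)
    for url in root.findall(f"{{{NS}}}url"):
        for child in list(url):  # iterate over a copy while removing
            if child.tag in IGNORED_TAGS:
                url.remove(child)
    return ET.tostring(root, encoding="unicode")
```

ET.register_namespace changes module-wide state; it is used here only so the output keeps a plain xmlns declaration. Elements from other namespaces, such as image extensions, would be written with generated prefixes such as ns1 unless you register them as well.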

Summary

In summary, a properly created sitemap helps web robots get to know your site better and can positively affect its positions in Google’s organic search results. It is also an important tool for managing crawl budget, which matters especially when optimizing larger sites. For all these reasons, it should not be ignored. Keep in mind, however, that optimizing the sitemap file alone is not enough to fully realize the SEO potential of a site. Take advantage of Artegence’s SEO department to get holistic support for your site’s visibility.
