If-Modified-Since/Last-Modified usage and SEO

Every day, countless resources are accessed on the web, either opened in browsers by website visitors or crawled by spiders. Each time this happens, the server that stores the resource receives a request to send back data. One way to speed up this process is to use cache controls such as If-Modified-Since and Last-Modified. With these in place, the server can determine whether the data has changed since the client last fetched it, and therefore whether it needs to spend time sending the data again. By improving the efficiency of client-server communication, you also improve how your website performs when it is crawled and indexed, because the cache controls that apply to server-browser communication also apply when web spiders crawl your pages. The easier and faster it is for spiders to go through your site, the better it will be indexed, and that can improve your SEO ranking.

What are If-Modified-Since and Last-Modified Headers?

If-Modified-Since and Last-Modified are a pair of HTTP headers that together make up a “conditional GET”: the response to the GET differs depending on whether the page has changed or not. If-Modified-Since is a request header and Last-Modified is a response header. Their main use is to improve application performance and, at the same time, save bandwidth.

The Last-Modified response header contains the date and time at which the origin server believes the resource was last modified. When a cache stores an entry that includes the Last-Modified header, it can later use that value to ask the server whether the content has changed. In other words, it is used to validate whether a resource that was received and stored is still current. This is done with the If-Modified-Since request header, which the browser (or a crawler such as Googlebot) sends to the server when it requests a page it already holds in its cache. The header carries the Last-Modified date of the cached copy, and the server's answer tells the client whether it can use that copy or needs to download a new version of the resource.

How Do They Work?

Let’s explore a simplified example of how and where these two headers are applied:

  1. Each time a browser tries to access a web page, a server receives a normal HTTP request for a specific resource, for example ABC.
  2. The server prepares the response, and its logic dictates that the browser should cache resource ABC locally. Browsers do this by default, so no other special headers are needed in the response. However, some servers explicitly send a “Cache-Control” header to limit the maximum age of the cached copy.
  3. In the response, the server includes the Last-Modified header. It indicates the date and time when ABC was last modified on the server side:
     Last-Modified: Tue, 25 Apr 2017 08:36:15 GMT
  4. Often, the server also includes the optional Cache-Control header in the response. When this header is not included and the resource is accessed through a link or by manually entering the URL in the address bar, the browser may use the cached copy directly, without sending a new request to the server to check whether that copy is still valid. When the header below is included, it forces the browser to check the validity of the cached copy with the origin server before displaying it. Validity here means that resource ABC has not changed since the previous access.
     Cache-Control: no-cache
  5. The origin server sends the response with both the Last-Modified and Cache-Control headers, the ABC resource in the body, and a 200 status code, which indicates that the request succeeded. When the browser loads the resource, it also stores a cached copy along with the header information it received.
  6. Later, when the same browser requests the same ABC resource again, it includes the If-Modified-Since request header, set to the Last-Modified value it stored earlier:
     If-Modified-Since: Tue, 25 Apr 2017 08:36:15 GMT
  7. When the server receives the request for the ABC resource along with the “If-Modified-Since” header, the server-side logic checks whether it needs to send a new copy of the resource by comparing the date of ABC's last modification with the date received in the request header.
  • If the date in the “If-Modified-Since” header is the same as (or later than) the last modification date of ABC, the server responds with a 304 status code and an empty body. The 304 status code means the resource has not been modified since it was last accessed, so the browser uses its cached copy.
  • If the date in the “If-Modified-Since” header is older than the last modification date of ABC, the server sends back the new version of the ABC resource along with a 200 status code. The response also includes a “Last-Modified” header with the new value. The browser then uses the new version of ABC and updates its cache with the new data.
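
The server-side decision in step 7 can be sketched in a few lines of Python. This is a minimal illustration rather than how any particular web server implements it: the function name conditional_get and its arguments are hypothetical, and only the standard library date helpers are real.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime


def conditional_get(if_modified_since, last_modified, body):
    """Return (status, headers, body) for a GET on one resource.

    if_modified_since: raw request header value, or None if absent.
    last_modified: timezone-aware datetime of the resource's last change.
    (Hypothetical helper, written only to illustrate the flow above.)
    """
    # HTTP dates are in GMT with one-second resolution.
    last_modified = last_modified.astimezone(timezone.utc).replace(microsecond=0)
    headers = {
        "Last-Modified": format_datetime(last_modified, usegmt=True),
        "Cache-Control": "no-cache",
    }
    since = None
    if if_modified_since is not None:
        try:
            since = parsedate_to_datetime(if_modified_since)
        except (TypeError, ValueError):
            since = None  # unparseable date: treat the header as absent
    if since is not None and since.tzinfo is None:
        since = since.replace(tzinfo=timezone.utc)
    if since is not None and last_modified <= since:
        # Not modified since the client's copy: empty body, 304 status.
        return 304, headers, b""
    # First visit, or the resource changed: send the full body.
    return 200, headers, body


changed_at = datetime(2017, 4, 25, 8, 36, 15, tzinfo=timezone.utc)
status, headers, _ = conditional_get(None, changed_at, b"ABC")
print(status, headers["Last-Modified"])
status, _, _ = conditional_get(headers["Last-Modified"], changed_at, b"ABC")
print(status)
```

The first call models the initial visit (steps 1 to 5), and the second models the revalidation in steps 6 and 7, where the client echoes back the Last-Modified value it stored.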

What’s important here is that the exact meaning of the Last-Modified header field depends on the implementation of the origin server and the nature of the original resource. For entities with dynamically included parts, it may be the most recent of the last-modified times of the component parts. For database gateways, it may be the last-update timestamp of the record. For virtual objects, it may be the last time the internal state changed.

Additionally, the origin server must not send a Last-Modified date that is later than the time at which the server’s message originates. If the resource’s last modification would indicate some time in the future, the server must replace that date with the message origination date. When the origin server sends the Last-Modified header, it should obtain the value as close as possible to the time when it generates the Date value of the response. That helps the recipient make an accurate assessment of the entity’s modification time, which is especially important if the entity changed close to the time when the response was generated.

Why are If-Modified-Since and Last-Modified headers important for SEO?

The main purpose of the If-Modified-Since and Last-Modified headers is to allow cached information to be updated efficiently with a minimum of transaction overhead. These headers help improve page speed and load times, and with them overall website performance and user experience. In terms of technical SEO, cache controls are very important because they improve the crawling and indexing of your website. When you are optimizing your website for crawling, try to think of the search engine crawler as a web proxy cache that is trying to pre-fetch your website and create a temporarily stored version of the pages in the index.
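
To make the "crawler as proxy cache" picture concrete, here is a small sketch of the client side of a conditional GET. Everything in it is an assumption made for illustration: the RevalidatingCache class, the fetch callable, and the fake origin server are hypothetical stand-ins, not how Googlebot or any real crawler is built, and the fake origin uses an exact string match where a real server would compare dates.

```python
class RevalidatingCache:
    """Keeps one cached copy per URL and revalidates it on every visit."""

    def __init__(self, fetch):
        # fetch(url, request_headers) -> (status, response_headers, body)
        # Hypothetical transport function supplied by the caller.
        self.fetch = fetch
        self.entries = {}  # url -> (Last-Modified header value, body)

    def get(self, url):
        request_headers = {}
        cached = self.entries.get(url)
        if cached is not None:
            # Echo back the date the server gave us last time.
            request_headers["If-Modified-Since"] = cached[0]
        status, response_headers, body = self.fetch(url, request_headers)
        if status == 304 and cached is not None:
            return cached[1]  # unchanged: reuse the stored copy
        if status == 200:
            last_modified = response_headers.get("Last-Modified")
            if last_modified is not None:
                self.entries[url] = (last_modified, body)
            return body
        raise RuntimeError(f"unexpected status {status} for {url}")


def make_fake_origin(last_modified, body):
    """Stand-in for a server; records which status codes it sent."""
    log = []

    def fetch(url, request_headers):
        # Simplification: exact string match instead of a date comparison.
        if request_headers.get("If-Modified-Since") == last_modified:
            log.append(304)
            return 304, {}, b""
        log.append(200)
        return 200, {"Last-Modified": last_modified}, body

    return fetch, log


fetch, log = make_fake_origin("Tue, 25 Apr 2017 08:36:15 GMT", b"<html>About Us</html>")
crawler = RevalidatingCache(fetch)
crawler.get("/about-us")  # first visit: full download
crawler.get("/about-us")  # second visit: revalidation only
print(log)
```

In this sketch the second visit transfers no page body at all, which is the saving the rest of this section relies on.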
Also, remember that there is a limited crawl budget dedicated to your site, and your goal when optimizing for SEO is to make the most of it. You can save that budget, and help search engines crawl and index your pages more efficiently, by telling the spiders whether or not they need to download certain pages and index them again. This is especially true if you have a large website and you change or add content frequently. There’s certainly no need for Google to re-crawl a page that hasn’t changed. But if your pages aren’t tagged correctly, or if they don’t use cache headers, your site might not be crawled efficiently. For example, your “About Us” page may be crawled more often because it’s linked in the footer of every page, which signals to Googlebot that it’s an important page when in fact it isn’t, at least as far as crawling and SEO ranking are concerned. So instead of adding a “noindex” tag, which you don’t want because the page should stay indexed, you can fix this problem by using cache headers to tell web spiders that the content of this page hasn’t changed and there’s no need to download it again. Now that the spiders know they can use the already indexed copy of the page, they can continue crawling your new and more relevant pages.

Another problem that often arises for people with large and complex websites, and that can be solved with the If-Modified-Since and Last-Modified headers, is the increased consumption of bandwidth caused by crawling, which means extra cost. By using cache control headers and limiting unnecessary crawling, you can save bandwidth and ease the crawling process. If you manage to reduce the number of full server responses and serve cached content instead, you can noticeably increase website speed. And we all know that website speed and page load time are among the most important user-experience factors.
And happy visitors are good for your SEO ranking.

If-Modified-Since/Last-Modified plugins for WordPress

If you have a WordPress website, it is very easy to improve the functionality and performance of your site by using some of the more than 49,000 plugins that WordPress offers to its users. Thousands more are available from third-party websites like GitHub. Plugins are small software apps that integrate with and run on top of the WordPress software, letting you create any kind of website and extend the site’s functionality. If you are satisfied with your site’s design and want to improve its SEO and boost performance, there are certainly plugins for that too. One of the easiest ways to utilize the power of caching is to install a plugin that handles the server-side caching performed by your WordPress server. The top three cache plugins you could install include:

  • WP Super Cache – the most downloaded cache plugin, with over a million installations. It’s free, easy to use and requires little to no configuration, yet it also offers a plethora of customizations to suit everyone’s needs.
  • W3 Total Cache – the second most downloaded caching plugin. It’s also free, and it has 16 pages of configuration options, so you’ll be able to tailor a caching solution to your needs.
  • WP Rocket – It’s a premium plugin, and one of the fastest options on the market, but unlike the previous two, it’s not free. It costs between $39-$199, depending on how many sites you want to install it on. It’s easy to install it, but you also get technical support once you buy a license.

Or, you can simply go with one of the If-Modified-Since and Last-Modified plugins that WordPress has to offer.

Conclusion

Having your webpages perform better is very important for improving both the end user’s experience and your SEO ranking. It makes no sense to have the browser download a webpage again, or to have Googlebot repeatedly go over pages whose content hasn’t changed. Caching your webpages saves a lot of time both when your website is opened in a browser and when Googlebot crawls it. Implementing the If-Modified-Since and Last-Modified headers translates into better SEO results. It will stop Google from re-downloading the same, unchanged pages of your site, save your crawl budget, and lower the amount of bandwidth spent on crawling. Additionally, it will improve the experience for first-time as well as returning visitors by shortening page load times. With a little effort, you can significantly influence your website’s performance and SEO ranking. And whether you implement this cache control manually or decide to use a plugin, don’t hesitate. It can only do good for your website and SEO efforts.