Once, one of our clients ran into a problem that many website developers face from time to time. Does this sound familiar? You upload fresh, new content to a web page, content you want visitors, old and new, to see. Yet many visitors still see an older, outdated version of the page. Their browsers ignore the new page and display a previously cached copy instead, much to the chagrin of both the visitor and the website manager.
It’s a frustrating problem, but caching web pages is a feature that is meant to be useful. Temporarily storing web documents reduces network traffic, bandwidth usage, and server load. Pages load faster for the user, and the website is under less strain from heavy visitor traffic.
How Web Caching Works
Web caching is a performance measure that decreases server load and speeds up web page delivery. It works by storing copies of web pages, resources, or data in a cache, which may live in the browser, on a server, or in a content delivery network (CDN). When a URL is requested, the cache is checked first. If the cached content is still current, it is delivered without sending a request to the server. Otherwise, a new request is sent to fetch the latest content.
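To make the lookup order concrete, here is a minimal JavaScript sketch of a cache that is consulted before the server. The `SimpleCache` class, the fixed time-to-live returned by the fake server, and passing the current time in explicitly are all illustrative assumptions, not how any particular browser or CDN is implemented.

```javascript
// A minimal in-memory cache illustrating the "check the cache first" flow.
// Entries are keyed by URL and remember until when (in ms) they are current.
class SimpleCache {
  constructor() {
    this.entries = new Map();
  }

  // Return the cached body if it is still current; otherwise ask the server,
  // store the new copy, and return it.
  get(url, now, fetchFromServer) {
    const entry = this.entries.get(url);
    if (entry && now < entry.freshUntil) {
      return { body: entry.body, fromCache: true };
    }
    const { body, ttlMs } = fetchFromServer(url);
    this.entries.set(url, { body, freshUntil: now + ttlMs });
    return { body, fromCache: false };
  }
}

// Usage: a fake "server" (hypothetical) that counts how often it is contacted.
let serverHits = 0;
const fakeServer = (url) => {
  serverHits += 1;
  return { body: `content of ${url} (v${serverHits})`, ttlMs: 60000 };
};

const cache = new SimpleCache();
const first = cache.get("/page", 0, fakeServer);      // miss: goes to the server
const second = cache.get("/page", 30000, fakeServer); // hit: copy is still current
const third = cache.get("/page", 90000, fakeServer);  // expired: fetched again
console.log(first.fromCache, second.fromCache, third.fromCache); // false true false
```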
Whether cached content can be used is governed by a set of rules. Pages that carry sensitive data are usually excluded from caching for security reasons, even when they are served over HTTPS. Beyond that, cache behavior is guided by HTTP response headers, which provide three main controls: freshness, validation, and invalidation.
# Freshness
Freshness determines how long a cached copy is considered valid. It is typically set by the "Expires" header or by "Cache-Control: max-age". While a cached document is fresh, it can be served to the client without contacting the server; once it has expired, it must be revalidated or replaced.
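As a rough illustration, the sketch below computes a freshness lifetime from response headers, giving "max-age" precedence over "Expires" as the HTTP caching rules do. The function names, and the assumption that header names arrive already lower-cased, are hypothetical simplifications.

```javascript
// Freshness lifetime in seconds. Cache-Control: max-age takes precedence over
// Expires; header names are assumed to be lower-cased already.
function freshnessLifetimeSeconds(headers) {
  const cacheControl = headers["cache-control"] || "";
  const maxAge = /(?:^|,)\s*max-age\s*=\s*(\d+)/i.exec(cacheControl);
  if (maxAge) {
    return Number(maxAge[1]);
  }
  if (headers["expires"] && headers["date"]) {
    const expires = Date.parse(headers["expires"]);
    const date = Date.parse(headers["date"]);
    // An Expires value that cannot be parsed is treated as already expired.
    if (Number.isNaN(expires) || Number.isNaN(date)) return 0;
    return Math.max(0, (expires - date) / 1000);
  }
  // No explicit lifetime: this sketch treats the copy as stale
  // (real caches may fall back to heuristics instead).
  return 0;
}

// A cached copy is fresh while its current age is below its lifetime.
function isFresh(headers, ageSeconds) {
  return ageSeconds < freshnessLifetimeSeconds(headers);
}
```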
# Validation
Validation means asking the server to confirm that the cached content still contains the latest information. This is done with conditional requests that carry headers such as "If-Modified-Since" or "If-None-Match". The server then reports whether the resource has changed since the cached copy was stored. An ETag, a unique identifier sent in the response header, can also be used for strong or weak validation.
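On the client side, validation can be sketched as two small steps: attach the stored validators to the request, then reuse the cached body if the server answers 304 Not Modified. The cache entry shape (`etag`, `lastModified`, `body`) and the function names are assumptions made for illustration.

```javascript
// Build the conditional headers used to revalidate a cached entry.
// The entry shape { etag, lastModified, body } is hypothetical.
function conditionalHeaders(entry) {
  const headers = {};
  if (entry.etag) headers["If-None-Match"] = entry.etag;
  if (entry.lastModified) headers["If-Modified-Since"] = entry.lastModified;
  return headers;
}

// 304 Not Modified: the cached body is still valid and can be reused.
// Any other successful answer (e.g. 200) carries the new content.
function resolveRevalidation(entry, response) {
  if (response.status === 304) {
    return entry.body;
  }
  return response.body;
}
```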
# Invalidation
Invalidation, on the other hand, happens when cached content is no longer relevant because the underlying resource has changed, for example after a related POST, PUT, or DELETE request. Such requests indicate that the content may have changed, so the cached copy should be discarded and fresh content requested from the server.
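A minimal sketch of invalidation, assuming a cache stored in a JavaScript `Map` keyed by URL: after a successful POST, PUT, or DELETE (PATCH is included as another unsafe method), the cached copy of that URL is dropped so the next visit fetches fresh content. The function name is hypothetical; following the HTTP caching specification, only non-error responses (2xx and 3xx) trigger invalidation here.

```javascript
// Methods that may change a resource on the server.
const UNSAFE_METHODS = new Set(["POST", "PUT", "DELETE", "PATCH"]);

// Drop the cached copy of `url` when an unsafe request to it succeeded,
// so the next GET has to go back to the server. Returns true if removed.
function invalidateIfNeeded(cache, method, url, status) {
  const succeeded = status >= 200 && status < 400;
  if (UNSAFE_METHODS.has(method.toUpperCase()) && succeeded) {
    return cache.delete(url);
  }
  return false;
}
```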
Web caching is a performance enhancement strategy that also saves bandwidth; however, the accuracy and security of the content have to be taken into account when managing such systems. These controls ensure that users receive up-to-date information while keeping the benefits of faster load times and reduced server load.
Failed Attempts Using Tags
When you try to solve web page caching problems, you have to look at the cache directives, at how various browsers interpret them, and at how those browsers handle their internal caching.
Attempt #1: Cache-Control Directive: "NO-CACHE"
This directive is meant to stop the browser from reusing a stored copy without first checking with the server, so that visitors always get the latest version of the page. Strictly speaking, "no-cache" does not forbid caching: it requires the browser to revalidate the cached copy with the server before using it (the separate "no-store" directive forbids storing the response at all). Unfortunately, Internet Explorer, especially its earlier versions, sometimes did not handle "NO-CACHE" properly. Instead of fetching a fresh copy, it could still serve outdated resources from its cache.
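For reference, the header combination below is one widely used way to tell browsers, including older Internet Explorer versions, not to reuse a stale page. It is a sketch rather than the exact configuration from our project, and `noCacheHeaders` is a hypothetical helper name.

```javascript
// Headers commonly combined to prevent reuse of stale copies:
//  - no-cache: a stored copy must be revalidated with the server before reuse
//  - no-store: the response must not be stored at all
//  - must-revalidate: a stale copy must not be served without revalidation
//  - Pragma / Expires: HTTP/1.0-era fallbacks for older clients
//    ("Expires: 0" is an invalid date, which caches treat as already expired)
function noCacheHeaders() {
  return {
    "Cache-Control": "no-cache, no-store, must-revalidate",
    "Pragma": "no-cache",
    "Expires": "0",
  };
}
```

In a Node.js `http` server, the result could be passed to `res.writeHead(200, noCacheHeaders())`. Modern browsers rely on `Cache-Control`; the other two entries only matter for older clients.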
Attempt #2: Expires
The "Expires" header specifies the exact date and time at which the cached content expires. After that point, the browser should request fresh content from the web server. However, the validity period set with "Expires" is not flexible. If updates happen irregularly, picking an exact expiration date is difficult, and differences between browsers, and between individual clients, can lead to inconsistent behavior.
Attempt #3: Max-Age
"Max-Age" is a "Cache-Control" directive that defines the maximum time (in seconds) a resource is considered fresh in the cache. Unlike "Expires", which sets an absolute expiration time, it is more flexible because it specifies a relative period. Even so, Internet Explorer sometimes ignored or misinterpreted the specified "Max-Age" value, which led to caching problems.
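The difference between the two headers is easiest to see side by side. This sketch, with hypothetical helper names, expresses the same one-hour lifetime once as an absolute "Expires" date and once as a relative "max-age" value; the current time is passed in so the output is predictable.

```javascript
// Absolute form: an HTTP date computed from the server's current time.
function expiresHeader(nowMs, lifetimeSeconds) {
  return new Date(nowMs + lifetimeSeconds * 1000).toUTCString();
}

// Relative form: a lifetime in seconds, with no calendar date involved.
function maxAgeHeader(lifetimeSeconds) {
  return `max-age=${lifetimeSeconds}`;
}

console.log(expiresHeader(0, 3600)); // Thu, 01 Jan 1970 01:00:00 GMT
console.log(maxAgeHeader(3600));     // max-age=3600
```

Because "Expires" is a fixed calendar date, its meaning depends on the client's clock agreeing with the server's, whereas "max-age" is a duration and avoids that comparison.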
Challenges with Internet Explorer
The effectiveness of these methods therefore depends directly on the browser's ability to handle cache-related headers correctly. Early versions of Internet Explorer had many bugs that got in the way of a smooth caching process. For example, the headers were supposed to get the new content displayed, but because of the outdated browser architecture or non-standard behavior, this did not always happen.
Alternative Approaches
While exploring other ways to fix this problem, we found several more approaches, including these:
Last-Modified is an approach in which the server sends a timestamp noting when the file was last changed. When the browser requests a resource it already has in its cache, it sends that timestamp back so the server can check whether the content has been modified since then. If the server's Last-Modified date matches the date of the browser's cached copy, the stored file is still fresh and a needless reload is avoided. If it does not match, the browser fetches the new version from the server.
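A minimal server-side sketch of the Last-Modified check, assuming the request headers arrive lower-cased and the file's modification time is known in milliseconds. `handleIfModifiedSince` is a hypothetical helper; it returns a 304 status when the content has not changed since the date the browser sent, and 200 otherwise.

```javascript
// Conditional GET based on Last-Modified. HTTP dates have one-second
// precision, so both times are compared in whole seconds.
function handleIfModifiedSince(requestHeaders, lastModifiedMs) {
  const headers = { "Last-Modified": new Date(lastModifiedMs).toUTCString() };
  const since = Date.parse(requestHeaders["if-modified-since"] || "");
  const modifiedSec = Math.floor(lastModifiedMs / 1000);
  if (!Number.isNaN(since) && modifiedSec <= Math.floor(since / 1000)) {
    return { status: 304, headers }; // not modified: the browser reuses its copy
  }
  return { status: 200, headers }; // modified, or no usable date: send the page
}
```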
An ETag is a unique identifier assigned to a file or resource. It is generated from the file's content, size, or version, and a new ETag is assigned whenever the content changes. The browser sends the ETag of its cached copy, which is compared against the current one to decide whether a new version needs to be downloaded. In this way, the ETag acts like a fingerprint that changes whenever the file is modified.
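The sketch below, using Node.js's built-in `crypto` module, derives an ETag by hashing the content and compares it with the browser's "If-None-Match" header. Hashing the body with SHA-1 is only one assumed way to build the fingerprint; servers may instead combine file size, modification time, or a version number. The helper names are hypothetical.

```javascript
const crypto = require("crypto");

// Identical content always yields the same tag; any change yields a new one.
function etagFor(body) {
  const hash = crypto.createHash("sha1").update(body).digest("hex");
  return `"${hash}"`;
}

// If-None-Match may be "*" or a comma-separated list of tags. It uses weak
// comparison, so a "W/" prefix is ignored on both sides.
function etagMatches(ifNoneMatch, currentEtag) {
  if (!ifNoneMatch) return false;
  if (ifNoneMatch.trim() === "*") return true;
  const current = currentEtag.replace(/^W\//, "");
  return ifNoneMatch
    .split(",")
    .map((tag) => tag.trim().replace(/^W\//, ""))
    .includes(current);
}
```

A server would answer 304 Not Modified when `etagMatches` returns true and send the full page with the new ETag otherwise.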
Obtaining a Unique URL
We found that giving the page a unique URL on each access was necessary to keep browsers displaying the new version every time. This strategy eliminated the caching problems and removed the need for other workarounds to get the updated content shown.
# Random Number Method
Uniqueness was achieved by adding a random number as a query parameter. For example, if our base URL was http://example.com/page, the unique URL would be http://example.com/page?unique=12345, where 12345 was a randomly generated number. Because the randomness makes each URL different, browsers treat every page request as a new resource, which avoids the caching problem.
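A small sketch of the random number method using the standard `URL` class. The `unique` parameter name mirrors the example URL, and `withRandomParam` is a hypothetical helper rather than the code we deployed.

```javascript
// Append a random query parameter so each request URL is unique.
// Existing query parameters on the base URL are preserved.
function withRandomParam(baseUrl, paramName = "unique") {
  const url = new URL(baseUrl);
  // Math.random() returns a float in [0, 1); scale it to an integer.
  const randomNumber = Math.floor(Math.random() * 1e9);
  url.searchParams.set(paramName, String(randomNumber));
  return url.toString();
}

// Prints http://example.com/page?unique=<random integer>
console.log(withRandomParam("http://example.com/page"));
```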
# Timestamp Method
As an alternative, the current timestamp can be used to create a unique URL. Here the base URL gets a query parameter containing the current time. For example, in http://example.com/page?timestamp=1642523476, the value 1642523476 is the current time in seconds since the Unix epoch. Because the timestamp changes with every request, it produces a unique identifier that keeps the content up to date.
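Similarly, a sketch of the timestamp method, assuming seconds since the Unix epoch as in the example URL. The helper name is hypothetical, and the current time is a parameter (defaulting to `Date.now()`) so the result can be checked.

```javascript
// Append the current Unix time in seconds as a "timestamp" query parameter.
function withTimestampParam(baseUrl, nowMs = Date.now()) {
  const url = new URL(baseUrl);
  url.searchParams.set("timestamp", String(Math.floor(nowMs / 1000)));
  return url.toString();
}

console.log(withTimestampParam("http://example.com/page", 1642523476000));
// http://example.com/page?timestamp=1642523476
```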
Using Math.random
This code is placed at the top of the page, within the tag. The variable ‘random number’ is then appended to the page's URL, so a new random number is generated each time the browser loads the page.
Appending a TimeStamp
This method works the same way as Math.random but appends the current time instead of a random number. So what if the page is requested twice in the same minute? The value changes every millisecond, so even two requests within the same minute still get different URLs.
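To illustrate the point about requests in the same minute, the sketch below compares a hypothetical per-minute value with a millisecond value of the kind `Date.now()` returns, for two made-up request times 20 seconds apart.

```javascript
// Two made-up request times, 20 seconds apart within the same minute.
const firstRequestMs = 1642523460000;
const secondRequestMs = firstRequestMs + 20000;

// Hypothetical per-minute value: both requests would share one URL.
const perMinute = (ms) => Math.floor(ms / 60000);

// Millisecond value, like Date.now(): the two requests get different URLs.
const perMillisecond = (ms) => ms;

console.log(perMinute(firstRequestMs) === perMinute(secondRequestMs));           // true
console.log(perMillisecond(firstRequestMs) === perMillisecond(secondRequestMs)); // false
```

With millisecond resolution, two loads would only share a URL if they happened within the same millisecond.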
We chose to resolve the issue with the ‘TimeStamp’ approach, and it worked. If you are facing a similar problem, feel free to try any of these methods.
Call us at 484-892-5713 or contact us today to learn more about how we overcame web page caching with a simple and innovative solution for great client satisfaction.