Over the past few months, I've been reading about BigPipe, Cache Tags, Dynamic Page Cache, and all the other amazing-sounding new features for performance in Drupal 8. I'm working on a blog post that more comprehensively compares and contrasts Drupal 8's performance with Drupal 7, but that's a topic for another day. In this post, I'll focus on cache tags in Drupal 8, and particularly their use with Varnish to make cached content expiration much easier than it ever was in Drupal 7.
Purging and Banning
Varnish and Drupal have long had a fortuitous relationship; Drupal is a flexible CMS that takes a good deal of time (relatively speaking) to generate a web page. Varnish is an HTTP reverse proxy that excels at sending a cached web page extremely quickly—and scaling up to thousands or more requests per second even on a relatively slow server. For many Drupal sites, using Varnish to make the site hundreds or thousands of times faster is a no-brainer.
But there's an adage in programming that's always held true:
There are two hard things in computer science: cache invalidation, naming things, and off-by-one errors.
Cache invalidation is rightly positioned as the first of those two (three!) hard things. Anyone who's set up a complex Drupal 7 site with dozens of views, panels pages, panelizer layouts, content types, and configured Cache expiration, Purge, Acquia Purge, Varnish, cron and Drush knows what I'm talking about. There are seemingly always cases where someone edits a piece of content then complains that it's not updating in various places on the site.
The traditional answer has been to reduce the TTL for the caching; some sites I've seen only cache content for 30 seconds, or at most 15 minutes, because it's easier than accounting for every page where a certain type of content or menu will change the rendered output.
In Varnish, PURGE requests have been the de-facto way to deal with this problem for years, but it can be a complex task to purge all the right URLs... and there could be hundreds or thousands of URLs to purge, meaning Drupal (in combination with Purge/Acquia Purge) would need to churn through a massive queue of purge requests to send to Varnish.
Drupal 8 adds a ton of cacheability metadata to all rendered pages, which is aggregated from all the elements used to build that page. Is there a search block on the page? There will be a config:block.block.bartik_search cache tag added to the page. Is the main menu on the page? There will be a config:system.menu.main cache tag, and so on.
Adding this data to every page allows us to do intelligent cache invalidation. Instead of us having to tell Varnish which particular URLs need to be invalidated, when we update anything in the main menu, we can tell Varnish "invalidate all pages that have the config:system.menu.main cache tag," using a BAN instead of a PURGE. If you're running Varnish 4.x, all you need to do is add some changes to your VCL to support this functionality, then configure the Purge and Generic HTTP Purger modules in Drupal.
Whereas Varnish would process PURGE requests immediately, dropping cached pages matching the PURGE URL, Varnish can more intelligently match BAN requests using regular expressions and other techniques against any cached content. You have to tell Varnish exactly what to do, however, so there are some changes required in your VCL.
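To make the difference concrete, here is a toy Python model. It is not how Varnish works internally (Varnish records bans in a list and tests cached objects against them later), and the cache contents and function names are made up for illustration. A purge removes the single object stored under one URL, while a ban removes every object whose stored tags header matches a regular expression.

```python
import re

# Hypothetical cached objects: URL -> value of the stored Purge-Cache-Tags header.
cache = {
    "/about": "node:1 config:system.menu.main",
    "/blog": "node_list config:system.menu.main",
    "/contact": "node:7 config:block.block.bartik_search",
}


def purge(cache, url):
    """Drop exactly one object, identified by its URL."""
    cache.pop(url, None)


def ban(cache, tag_regex):
    """Drop every object whose tags header matches the regex."""
    pattern = re.compile(tag_regex)
    for url in [u for u, tags in cache.items() if pattern.search(tags)]:
        del cache[url]


# Purging one URL leaves other pages that show the same menu untouched.
by_url = dict(cache)
purge(by_url, "/about")

# Banning by tag removes every page that contains the main menu.
by_tag = dict(cache)
ban(by_tag, "config:system.menu.main")

print(sorted(by_url))  # ['/blog', '/contact']
print(sorted(by_tag))  # ['/contact']
```

With purges, Drupal would have to know and send both /about and /blog; with a ban, one request naming the tag covers both.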
Varnish VCL Changes
Borrowing from the well-documented FOSHttpCache VCL example, you need to make the following changes in your Varnish VCL (see the full set of changes that were made to Drupal VM's VCL template):
Inside of vcl_recv, you need to add some logic to handle incoming BAN requests:
sub vcl_recv {
  ...
  # Only allow BAN requests from IP addresses in the 'purge' ACL.
  if (req.method == "BAN") {
    # Same ACL check as above:
    if (!client.ip ~ purge) {
      return (synth(403, "Not allowed."));
    }
    # Logic for the ban, using the Purge-Cache-Tags header. For more info
    # see https://github.com/geerlingguy/drupal-vm/issues/397.
    if (req.http.Purge-Cache-Tags) {
      ban("obj.http.Purge-Cache-Tags ~ " + req.http.Purge-Cache-Tags);
    }
    else {
      return (synth(403, "Purge-Cache-Tags header missing."));
    }
    # Throw a synthetic page so the request won't go to the backend.
    return (synth(200, "Ban added."));
  }
}
The above code basically inspects BAN requests (e.g. curl -X BAN http://127.0.0.1:81/ -H "Purge-Cache-Tags: node:1"), then adds a new ban() if the request comes from the acl purge list, and if the Purge-Cache-Tags header is present. In this case, the ban is set using a regex search inside the stored cached object's obj.http.Purge-Cache-Tags property. Using this property (on obj instead of req) allows Varnish's ban lurker to clean up bans more efficiently, so you don't end up with thousands (or millions) of stale ban entries. Read more about Varnish's ban lurker.
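One consequence of the regex search is worth seeing in isolation. Because the ban uses an unanchored regular-expression match, a ban for node:1 also matches objects tagged node:10. The Python sketch below uses the re module as a stand-in for Varnish's regex engine; the header value is hypothetical, and the stricter pattern only illustrates anchoring on the space-separated list. It is not something the VCL above does.

```python
import re

# Hypothetical stored Purge-Cache-Tags value for the page of node 10.
stored_tags = "node:10 node_view config:system.menu.main"

# What the ban in vcl_recv effectively tests: an unanchored search.
loose = re.search("node:1", stored_tags)

# A stricter, hypothetical expression that only matches a whole tag.
strict = re.search(r"(^|\s)node:1(\s|$)", stored_tags)

print(bool(loose))   # True: the page for node 10 would be invalidated too
print(bool(strict))  # False
```

Over-matching only causes extra cache misses, never stale content, so the simple form is safe; it just invalidates more than strictly needed.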
Inside of vcl_backend_response, you can add a couple of extra headers to help the ban lurker (and, potentially, allow you to make more flexible ban logic should you choose to do so):
sub vcl_backend_response {
  # Set ban-lurker friendly custom headers.
  set beresp.http.X-Url = bereq.url;
  set beresp.http.X-Host = bereq.http.host;
  ...
}
Then, especially for production sites, you should make sure Varnish doesn't pass along all the extra headers needed to make Cache Tags work (unless you want to see them for debugging purposes) inside vcl_deliver:
sub vcl_deliver {
  # Remove ban-lurker friendly custom headers when delivering to client.
  unset resp.http.X-Url;
  unset resp.http.X-Host;
  unset resp.http.Purge-Cache-Tags;
  ...
}
At this point, if you add these changes to your site's VCL and restart Varnish, Varnish will be ready to handle cache tags and expire content more efficiently with Drupal 8.
Drupal Purge configuration
First of all, so that external caches like Varnish know they are safe to cache content, you need to set a value for the 'Page cache maximum age' on the Performance page (admin/config/development/performance). You can configure Varnish or other reverse proxies under your control to cache for as long or short a period of time as you want, but a good rule-of-thumb default for this setting is 15 minutes. Even with cache tags, clients such as browsers cache pages based on this value until the user manually refreshes the page.
Now we need to make sure Drupal does two things:
- Send the Purge-Cache-Tags header with every response, containing a space-separated list of all the page's cache tags.
- Send a BAN request with the appropriate cache tags whenever content or configuration is updated that should expire pages with the associated cache tags.
Both of these can be achieved quickly and easily by enabling and configuring the Purge and Generic HTTP Purger modules. I used drush en -y purge purge_purger_http to install the modules on my Drupal 8 site running inside Drupal VM.
Purge automatically sets the http.response.debug_cacheability_headers property to true via its purge.services.yml, so Step 1 above is taken care of. (Note that if your site uses its own services.yml file, the http.response.debug_cacheability_headers setting defined in that file will override Purge's settings, so make sure it's set to true if you define settings via services.yml on your site!)
Note that you currently (as of March 2016) need to use the -dev release of Purge until 8.x-3.0-beta4 or later, as it sets the Purge-Cache-Tags header properly.
For step 2, you need to add a 'purger' that will send the appropriate BAN requests using purge_purger_http: visit the Purge configuration page, admin/config/development/performance/purge, then follow the steps below:
- Add a new purger by clicking the 'Add Purger' button.
- Choose 'HTTP Purger' and click 'Add'.
- Configure the Purger's name ("Varnish Purger"), Type ("Tag"), and Request settings (defaults for Drupal VM are hostname 127.0.0.1, port 81, path /, method BAN, and scheme http).
- Configure the Purger's headers (add one header, Purge-Cache-Tags, with the value [invalidation:expression]).
Note: Don't use the header in the screenshot; use Purge-Cache-Tags!
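For reference, the purger settings above amount to an HTTP request like the one built in this Python sketch. It uses only the standard library, the function name is made up, and the host, port, and path are the Drupal VM defaults listed above. The request is constructed but not sent.

```python
from urllib.request import Request


def build_ban_request(tag, host="127.0.0.1", port=81, path="/", scheme="http"):
    """Build (but do not send) the BAN request for a single cache tag."""
    url = f"{scheme}://{host}:{port}{path}"
    # The purger substitutes the tag for [invalidation:expression].
    return Request(url, method="BAN", headers={"Purge-Cache-Tags": tag})


req = build_ban_request("node:1")
print(req.get_method(), req.full_url)
# urllib normalizes the header name's capitalization; HTTP header names
# are case-insensitive, so Varnish still sees req.http.Purge-Cache-Tags.
print(req.header_items())
```

Passing the result to urllib.request.urlopen would send it, which should be roughly what the curl -X BAN command shown earlier does by hand.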
Testing cache tags
Now that you have an updated VCL and a working Purger, you should be able to do the following:
- Send a request for a page and refresh a few times to make sure Varnish is caching it:
  $ curl -s --head http://drupalvm.dev:81/about | grep X-Varnish
  X-Varnish: 98316 65632
  X-Varnish-Cache: HIT
- Edit that page, and save the edit.
- Run drush p-queue-work to process the purger queue:
  $ drush @drupalvm.drupalvm.dev p-queue-work
  Processed 5 objects...
- Send another request to the same page and verify that Varnish has a cache MISS:
  $ curl -s --head http://drupalvm.dev:81/about | grep X-Varnish
  X-Varnish: 47
  X-Varnish-Cache: MISS
- After the next request, you should start getting a HIT again:
  $ curl -s --head http://drupalvm.dev:81/about | grep X-Varnish
  X-Varnish: 50 48
  X-Varnish-Cache: HIT
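If you repeat this check often, the header inspection can be scripted. The Python sketch below parses the output of curl --head and reports the value of the X-Varnish-Cache header used in the examples above. That header is a custom one, so this assumes your VCL sets it, and the function name is hypothetical.

```python
def varnish_cache_status(head_output):
    """Return the X-Varnish-Cache value (e.g. 'HIT' or 'MISS'), or None."""
    for line in head_output.splitlines():
        name, sep, value = line.partition(":")
        if sep and name.strip().lower() == "x-varnish-cache":
            return value.strip().upper()
    return None


# Sample header output copied from the curl runs above.
first = "X-Varnish: 98316 65632\nX-Varnish-Cache: HIT\n"
after_edit = "X-Varnish: 47\nX-Varnish-Cache: MISS\n"

print(varnish_cache_status(first))       # HIT
print(varnish_cache_status(after_edit))  # MISS
```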
You can also use Varnish's built-in tools like varnishadm and varnishlog to verify what's happening. Run these commands from the Varnish server itself:
# Watch the detailed log of all Varnish requests.
$ varnishlog
[wall of text]
# Check the current list of Varnish bans.
$ varnishadm
varnish> ban.list
200
Present bans:
1458593353.734311 6 obj.http.Purge-Cache-Tags ~ block_view
# Check the current parameters.
varnish> param.show
...
ban_dups on [bool] (default)
ban_lurker_age 60.000 [seconds] (default)
ban_lurker_batch 1000 (default)
ban_lurker_sleep 0.010 [seconds] (default)
...
If you're interested in going a little deeper into general Varnish debugging, read my earlier post, Debugging Varnish VCL configuration files.
Other notes and further reading
I spent a few days exploring cache tags, and how they work with Varnish, Fastly, CloudFlare, and other services with Drupal 8, as part of adding cache tag support to Drupal VM. Here are some other notes and links to further reading so you can go as deep as you want into cache tags in Drupal 8:
- If you're building custom Drupal modules or renderable arrays, make sure you add cacheability metadata so all the cache tag magic just works on your site! See the official documentation for Cacheability of render arrays.
- The Varnish module is actively being ported to Drupal 8, and could offer an alternative option for using cache tags with Drupal 8 and Varnish.
- Read the official Varnish documentation on Cache Invalidation, especially regarding the effectiveness and performance of using Bans vs Purges vs Hashtwo vs. Cache misses.
- There's an ongoing meta issue to profile and rationalize cache tags in Drupal 8, and the conversation there has a lot of good information about cache tag usage in the wild, caveats with header payload size and hashing, etc.
- As mentioned earlier, if you have a services.yml file for your site, make sure you set http.response.debug_cacheability_headers: true inside (see note here).
- Read more about Varnish bans
- Read more about Drupal 8 cache tags
- Read a case study of cache tags (with Fastly) dramatically speeding up a large Drupal 8 site.
- Be careful with your ban logic in the VCL; you need to avoid using regexes on req to allow the ban lurker to efficiently process bans (see Why do my bans pile up?).
- If you find Drupal 8's cachetags database table is growing very large, please check out the issue Garbage collection for cache tag invalidations. For now, you can safely truncate that table from time to time if needed.
Comments
Note that there is still active discussion over some aspects of best practices around cache tags in Drupal 8, mostly in the thread Profile/rationalize cache tags. See, for example, Wim's comment with some further information about best practices for what headers to be used, and where those headers are generated.
Also make sure you follow the status of Generate own X-Cache-Tags or X-Drupal-Cache-Tags header.
I've updated the post with the new Purge-Cache-Tags header, which is the default header name set by the Purge module as of the latest -dev release.
Thanks so much for this post! Might be good to update the post so that it explicitly notes the new header value is different from what is in the screenshot, because as it stands it's a little confusing.
Done!
I think it'd be good to explicitly mention this only affects the caching of responses in the end user's browsers. Varnish can cache responses indefinitely, precisely thanks to cache tags: whenever a response changes, a tag will be invalidated, and Varnish will be informed. Therefore, it's fine for Varnish to cache all (cacheable) responses forever.
Good point! I've updated the post to make this a little more clear.
Great article, although there is one point you touch on briefly: PURGE vs BAN. And there is one thing that needs to be said: Bans work, but if you have to do hundreds of bans, that does not scale. Also, bans do not free memory at once, but rely on the ban lurker cleaning the ban list. Purges free memory right away, but normally cannot be wildcarded. Enter the XKEY VMOD, previously known as Hash-Two. It provides Surrogate Keys support in Varnish Cache:
The XKEY VMOD was open sourced as part of the release of Varnish Cache 4.1 and it is now included in our Varnish modules bundle (together with other great VMODs):
I believe that once the Drupal community realizes the power of Surrogate Keys and Varnish, you will forget about bans :) More on this topic:
Feel free to reach out to me if you want to know more. I'll be happy to help.
This is excellent! I'll definitely have a look at the xkey vmod, though in my (limited) testing, bans and the lurker are performing quite well for even a pretty hefty load of BAN requests (of course, that's making sure I'm using lurker-friendly bans!). I can imagine things would get a bit shaky if you're using req matching everywhere.
The main advantage is that you do not depend on the ban list being cleaned by the ban lurker. Rather you get the objects removed from memory right away with purges and with xkey/surrogate keys style purges.
One question I've been wondering about, especially in the light of Ruben's answer, is what the cost of using the xkey.softpurge() method instead of xkey.purge() is likely to be, considering the fact that with the Varnish Purger module, we tend to make max-age very high (like over one month), and soft purges are supposed not to release memory until the TTL expires.
Thank you for the excellent tutorial, one additional note:
Purge module will generate HUGE headers, you might quickly end up with WSODs on more complex pages, as the header size starts to be above the limit.
http://stackoverflow.com/questions/11526674/nginx-big-header-response
The accepted answer corrects this issue, of course play with the numbers if needed.
There is now a Varnish purge module: https://www.drupal.org/project/varnish_purge, I also updated the suggested VCL here: https://gitlab.wklive.net/snippets/26
Is the idea that this will be the 'official' Varnish purge module going forward, or an alternative to setting up a normal HTTP purger with Purge?
My idea is that this is going to be the official one, and adding more varnish related features to it.
This article really helped us get going on a proper Varnish BAN setup, however the example VCL could use some work – One issue in it really tripped us up!
The issue is in this section:
# Logic for the ban, using the Purge-Cache-Tags header. For more info
# see https://github.com/geerlingguy/drupal-vm/issues/397.
if (req.http.Purge-Cache-Tags) {
ban("obj.http.Purge-Cache-Tags ~ " + req.http.Purge-Cache-Tags);
}
This really should be testing against `obj.http.Cache-Tags`, not `obj.http.Purge-Cache-Tags`, since the latter appears only in the BAN request, but the former is the header that Drupal spits out that lists the Cache Tags for that page.
The correct block is:
if (req.http.Purge-Cache-Tags) {
ban("obj.http.Cache-Tags ~ " + req.http.Purge-Cache-Tags);
}
Fantastic read Jeff! Thanks for this, it will be a goto source for many!
While following this, in the "Testing cache tags" section, step 3 trying to run "drush p-queue-work" I get the following:
"Not authorized, processor missing! [error]"
Ok, I enabled the "Cron processor" & "Purge Drush" modules and now get:
There were no items to be claimed from the queue.
According to https://www.drupal.org/project/purge you will want to enable the following modules, not just the ones listed on this page:
purge purge_ui purge_drush purge_queuer_coretags purge_processor_cron
Unfortunately for some reason when I did this via drush it did not enable "purge_queuer_coretags" so I had to do it manually via the web UI.
Hey Jeff! Thanks for some great resources as always.
One question I have regarding integrating Drupal and Varnish is that if it's possible to send a "clear everything" command to Varnish?
I can't find any information about it.
There seems to be some kind of "Everything" purger in the modules, but I have no idea how to trigger it.
> but a good rule-of-thumb default is 15 minutes
But isn't that also the TTL of the page at Varnish? What makes the browser to cache the page for 15 minutes and Varnish to cache it indefinitely (say)?
The guide is pretty good. One thing I couldn't find here is that you actually need to set the reverse proxy addresses in Drupal if you haven't configured one yet (my case).
I did that in the settings file
$settings['reverse_proxy_addresses'] = ['myproxy.nix.universe.com'];
Problem Statement - Need to setup reverse proxy server like varnish for Drupal 10 hosted on Microsoft Azure, how to proceed?