Introduction to Varnish

I’ve been meaning to write some posts about Varnish for a long time now. We use it extensively at work as part of our server stack and it’s a piece of software that consistently makes my jaw drop with its amazing performance and speed. What I’m hoping to do is to write a series of four blog posts introducing it. This one will explain what Varnish is, when you might want to use it and what some of the pitfalls are around using it. The second will look at installing and basic setup of Varnish, while the third will look at more advanced configuration, including an introduction to the Varnish Configuration Language (VCL) as well as saint mode, grace mode and edge side includes. The final post will take a look at some of the tools that Varnish makes available to help monitor and fine-tune its performance. So, without further ado, let’s get on with the show.

What is Varnish?

Varnish is a caching reverse proxy, also known as an HTTP accelerator. It sits in front of any HTTP server, listening on port 80 for requests. When a request arrives, Varnish checks to see if it has content in cache to serve and only contacts one of the web servers if it doesn’t. Content returned by the server is then potentially cached, depending on the values set in the Cache-Control header or the Varnish configuration. Varnish can cache any content that is delivered via HTTP including (but not limited to) HTML, XML, JSON, images and media files.
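
To make that request flow concrete, here is a minimal sketch in Python. It is a toy model of the idea, not how Varnish is implemented: the TinyCache class, the fetch_from_backend callable and the 120 second fallback TTL are all names and values I've made up for illustration, and it only looks at max-age, whereas the real thing applies more rules than this.

```python
import re
import time


class TinyCache:
    """Toy model of a caching reverse proxy's lookup logic (not Varnish internals)."""

    def __init__(self, fetch_from_backend, default_ttl=120.0, clock=time.monotonic):
        # fetch_from_backend is a hypothetical callable: url -> (headers dict, body)
        self.fetch_from_backend = fetch_from_backend
        self.default_ttl = default_ttl  # assumed fallback TTL in seconds
        self.clock = clock
        self.store = {}  # url -> (expires_at, body)

    def ttl_for(self, headers):
        """Use max-age from Cache-Control if present, else the fallback TTL."""
        match = re.search(r"max-age=(\d+)", headers.get("Cache-Control", ""))
        return float(match.group(1)) if match else self.default_ttl

    def get(self, url):
        """Return (body, "HIT") from cache, or fetch from the backend and return (body, "MISS")."""
        now = self.clock()
        entry = self.store.get(url)
        if entry is not None and entry[0] > now:
            return entry[1], "HIT"
        headers, body = self.fetch_from_backend(url)
        ttl = self.ttl_for(headers)
        if ttl > 0:  # max-age=0 means the response is not stored
            self.store[url] = (now + ttl, body)
        return body, "MISS"
```

The backend is only called on a miss, which is the whole point: every hit is a request your web servers never see.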

Varnish is open source software and completely free to use, although you can purchase extra support and services from Varnish Software if you wish. It’s a stable and mature piece of software, with version 1.0 being released in 2006 and the current 3.x branch being first released in 2011. It’s also widely used, with some estimates stating that around 500 (or 5%) of the biggest 10,000 sites in the world use it.

Why should you consider using Varnish?

I don’t have any experience with using another caching reverse proxy such as Squid or Nginx but below are some of the main reasons that explain why I think you should consider using Varnish.

Speed

The developers of Varnish claim that you will generally see a 300-1000x increase in the number of requests that can be served, and I’ve seen nothing to disprove that in our use of it.

Performance

As well as being incredibly fast, Varnish also performs amazingly well, handling massive amounts of traffic without killing your server. The developers claim that you would typically max out the internet connection of your server before you reach Varnish’s limit on the number of requests it can serve from cache.

Scalability

Due to its performance Varnish can easily handle massive bursts or sustained high levels of traffic. The BBC make heavy use of Varnish and during the London Olympics they served 10.4 million visitors (not page views, visitors) daily to the BBC sports website from Varnish. There’s a very interesting presentation about how the BBC use Varnish with slides and video here. The Texas Tribune also uses Varnish and has written about how Varnish helped them to keep their site up under an enormous burst of traffic during a recent Texas house filibuster.

Flexible configuration

Varnish has its own configuration language (VCL) that enables you to precisely specify how it should behave in various situations. When a VCL file is loaded it is translated into C, compiled and then dynamically loaded into Varnish. You can think of Varnish as being like a Lego model that comes pre-assembled. Out of the box it’s a great tool but you also have the option to rebuild it to precisely fit your needs.

Support for edge side includes (ESIs)

Varnish has built in support for Akamai’s ESI standard. This allows you to cache different content on the same page for different lengths of time. For example, you may have a news item on your site that you cache for 2 days but a side bar with a social media feed that is only cached for 5 minutes. With ESIs Varnish will only request the pieces of the page that it needs from your backend servers as they expire in the cache, stitching the pieces together into a complete page to send to users.
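
The stitching idea can be sketched in a few lines of Python. This is only an illustration of the concept under my own assumptions: the render function, the shape of the cache dict and the backend callable (which returns a TTL in seconds along with the body) are hypothetical, and the only piece of real ESI syntax used is the esi:include tag.

```python
import re

# Matches tags such as <esi:include src="/feed"/>
ESI_INCLUDE = re.compile(r'<esi:include\s+src="([^"]+)"\s*/>')


def render(url, cache, backend, now):
    """Return the page for url, fetching only fragments that are missing or expired.

    cache:   dict mapping url -> (expires_at, body)
    backend: hypothetical callable, url -> (ttl_seconds, body)
    now:     current time in seconds
    """
    entry = cache.get(url)
    if entry is None or entry[0] <= now:
        ttl, body = backend(url)
        entry = (now + ttl, body)
        cache[url] = entry
    # Replace each include tag with its fragment, rendered the same way.
    return ESI_INCLUDE.sub(
        lambda match: render(match.group(1), cache, backend, now), entry[1]
    )
```

With a news page cached for 2 days (172800 seconds) that includes a feed cached for 5 minutes (300 seconds), a request after 6 minutes would only hit the backend for the feed; the news page itself still comes from cache.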

Support for extensions

Since version 3, as well as providing its own configuration language, Varnish lets you write or use extensions called VMods, written in C, to add extra behaviour. An example of this is again provided by the BBC, who use a GeoIP VMod to decide what content to serve depending on where you are in the world. You can write your own VMods, but there are also a large number of them available to download on the Varnish website.

What questions should you be thinking of before using Varnish?

While Varnish is an amazing tool it is not perfect for every situation. If you’re considering using Varnish here are some of the questions that I think you should be asking yourself.

What kind of content are you serving?

A caching reverse proxy is perfect for situations where the content you are serving is largely the same for every user on your site. You can accommodate content tailored for different users through the use of ESIs but if you need to serve a page where every element is uniquely tailored to the user then it cannot really be cached by Varnish. If the majority of your site contains this sort of content Varnish is probably not the right tool for you.

Limited supported environments

The developers of Varnish have made a decision to only support a limited number of operating systems. They guarantee that Varnish will work on modern, 64-bit versions of Linux, FreeBSD and Solaris. They try to make sure that it will work on OS X, OpenBSD and NetBSD, although this is not guaranteed. You cannot install Varnish on Windows, although there’s nothing to stop you installing it on a supported OS that then sits in front of one or more Windows servers. You can also install Varnish on 32-bit versions of supported OSes, but it will then be limited in the amount of memory that it can address.

Cookies

As I’ve already said, Varnish is a tool that usually works best when the content that is being served is largely the same for everyone. Having a cookie present in a request or a response normally implies that some sort of state is being persisted between user requests, which also implies that the content is somehow tailored to the individual user. In cases where a cookie is present in a request, Varnish’s default behaviour is to pass the request straight through to a backend server without looking in the cache. Similarly, where a cookie is present in a response from a backend server, Varnish will by default pass that on to the client and not cache the content. You can control this in a number of ways, perhaps by limiting where your application sets cookies or by having Varnish strip them out of requests and responses in your VCL configuration. While cookies can be handled with Varnish, it is an area that needs some thought and planning.
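
The default behaviour described above boils down to two small decisions, sketched here in Python. This is my own simplification rather than Varnish’s actual default configuration: the function names and the "bypass" and "lookup" labels are made up, and headers are modelled as a plain dict.

```python
def has_header(headers, name):
    """Case-insensitive check for a header in a plain dict."""
    return name.lower() in (key.lower() for key in headers)


def request_action(request_headers):
    """A request carrying a cookie skips the cache and goes to the backend."""
    return "bypass" if has_header(request_headers, "Cookie") else "lookup"


def response_is_cacheable(response_headers):
    """A response that sets a cookie is delivered to the client but not stored."""
    return not has_header(response_headers, "Set-Cookie")


def strip_cookies(headers):
    """Drop cookie headers, e.g. for static assets where cookies are irrelevant."""
    return {
        key: value
        for key, value in headers.items()
        if key.lower() not in ("cookie", "set-cookie")
    }
```

Stripping cookies before making the decision is what turns a request that would otherwise always go to the backend into one that can be served from cache, so only do it for URLs where the cookie really doesn’t affect the content.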

SSL

Varnish does not handle requests encrypted with SSL. It does not include functionality to install or use SSL certificates. I have read of cases where Nginx is set up in front of Varnish to receive and decrypt requests before forwarding them on to Varnish for processing, but this would add extra complexity to your stack. In many cases applications that use SSL will be sending content across secured connections that is meant for an authenticated user and should not be cached. For cases where this is not true, Varnish may not be the best solution for you.

Varnish works like a layer of… varnish

Varnish is amazing at caching content and delivering cached copies of that content as quickly as possible. There are many occasions when this may be a problem though. Let’s imagine that you have some sort of CMS and a user of your system publishes an article. You’ve configured Varnish and your application to cache that content for 2 hours. Your user then publishes an update that absolutely must go live that moment. Your user won’t tolerate waiting for up to 2 hours for the content to refresh, but how do you tell Varnish to drop the cached item and to fetch a new copy without clearing the entire cache? Varnish does provide a solution: you can configure it to accept a custom HTTP PURGE method, and you can also purge content through the varnishadm console utility. A PURGE request looks exactly like a GET request, but instead of returning the content Varnish will drop objects that match that URL from its cache. You need to carefully configure who is allowed to send PURGE requests to your server though, since an open PURGE endpoint could be used for a denial of service attack. Varnish does provide all of the tools you need to configure this but it is an area you will need to carefully consider before deploying Varnish in your stack.
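
As a rough illustration of why the access control matters, here is a Python sketch of a purge handler guarded by an allowlist. It is an assumption-laden model, not Varnish code: the handle_purge function, the ALLOWED_PURGERS networks and the status codes returned are all choices I’ve made for the example.

```python
import ipaddress

# Hypothetical allowlist: only these networks may purge.
ALLOWED_PURGERS = [
    ipaddress.ip_network("127.0.0.1/32"),
    ipaddress.ip_network("10.0.0.0/8"),
]


def handle_purge(url, client_ip, cache):
    """Drop url from cache if the client is allowed to purge.

    cache is a plain dict mapping url -> cached body.
    Returns a (status_code, message) tuple.
    """
    address = ipaddress.ip_address(client_ip)
    if not any(address in network for network in ALLOWED_PURGERS):
        # Refuse everyone else, otherwise anyone could empty the cache
        # and push all of the traffic back onto the backend servers.
        return 405, "Not allowed"
    if cache.pop(url, None) is None:
        return 404, "Not in cache"
    return 200, "Purged"
```

In the CMS example, the publish action would send a purge for the article’s URL from one of the allowed addresses, and the next visitor would trigger a fresh fetch from the backend.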

Do you really need Varnish?

This point is less clear cut than the others. Varnish is a tool that really shines when you’re dealing with a crushing load of traffic. Adding it to your server stack does add complexity to your environment but it can enable you to handle enormous volumes of requests with relatively few servers. If you’re not at the point where you’re dealing with traffic at that scale you should probably ask yourself if you really need Varnish. The users of your application may (depending on how Varnish is configured) experience lightning fast page loads but that may be outweighed by the maintenance overhead of adding Varnish to your stack.

Next steps

Hopefully this post has whetted your appetite for exploring what Varnish can do. In the next post in this series I’ll cover installing it and basic configuration for serving a site through it using the default options.
