Pregenerating Static Web Pages for Better Performance

Written by: Ben Cane

In my recent Tuning NGINX article, I talked about how it's important to tune based on the specific needs of an application and its environment. In today's article, we're going to put that into practice. Last time, we tuned our environment by adjusting parameters within NGINX. Now, we're going to explore a sometimes-overlooked aspect of tuning: making adjustments to how our application works.

The application we will be tuning is a generic WordPress site that is built on a LEMP (Linux + NGINX + MySQL + PHP) stack. I selected WordPress because by default it dynamically generates every page, whether it's the front page of a blog that may not change often or specific articles that may receive comments daily.

Each request made to a WordPress site passes through NGINX, the PHP application, and MySQL. This means that every request is serviced by three different service layers.

Pregenerating Semi-dynamic Pages

In order to increase our web application's performance, we'll be pregenerating the results of our "dynamic" pages and saving those results into a file cache. We will then configure NGINX to use the file cache for HTTP requests rather than our dynamic web application.

The end result will be that every HTTP request for a cached page will be serviced by only one service layer (NGINX), which will have quite an impact on performance.

Establishing a Baseline

Before making any changes, the first thing we should do is establish a baseline metric for application performance. In the previous article, I used ApacheBench to measure the number of requests per second that NGINX could service. We can use this same metric for our testing today as well.

Let's see what happens if we run ab against an existing WordPress installation I have already set up.

$ ab -c 40 -n 1000 http://example.com/
Concurrency Level:      40
Time taken for tests:   141.269 seconds
Complete requests:      1000
Failed requests:        0
Total transferred:      64731000 bytes
HTML transferred:       64483000 bytes
Requests per second:    7.08 [#/sec] (mean)
Time per request:       5650.762 [ms] (mean)
Time per request:       141.269 [ms] (mean, across all concurrent requests)

Based on the output above, it appears that our baseline is 7.08 requests per second. At the moment, this web application is a vanilla WordPress installation without any performance plugins or tuning of the LEMP stack.

Let's see how much of an improvement we'll get by pregenerating our front page.

Creating a Cache

Even within WordPress, there are several ways to generate a file cache of dynamic content. For this article, I will be taking a simple, noninvasive approach. We'll simply request the page we wish to cache and save the results into a file.

Create a store for our cache

Since we'll be creating a file-based cache, let's go ahead and create a directory within our application's root directory to retain cached files.

# mkdir /var/www/example.com/htdocs/cached

Within the htdocs directory of our WordPress installation, we created a cached directory. Within this directory, we are going to create our cache files. To generate those files, we will be using the curl command.

# curl -H 'host: example.com' http://192.168.33.10 > /var/www/example.com/htdocs/cached/index.html

The curl command is a very powerful utility for working with web applications from the command line. Chances are, if you've worked from the command line of a Linux/Unix system, you have used curl at some point.

In the example above, the curl command performs an HTTP GET request to http://192.168.33.10, the IP address of our example WordPress installation. We are also using the -H flag to set the Host header to example.com. NGINX uses this header to determine how our request is routed and which content to display. By setting it to example.com, we ensure that we are routed to our example site.

At the end of the command above, you can see a > used to redirect the output of the curl command to /var/www/example.com/htdocs/cached/index.html. This redirect will write the output of our HTTP request to the index.html file.

The output of our HTTP request is, of course, the HTML generated when curl requested our WordPress site's front page.

Whether it's requested with curl or a web browser such as Chrome, the contents of the index page for this WordPress site are always the same, at least until a new article is posted. By saving the generated HTML to a file, we can tell NGINX to serve requests from our cached HTML file rather than sending the request to the PHP application on the backend.

# chown -R www-data:www-data /var/www/example.com/htdocs/cached

Before we move on to configuring NGINX, we first need to reset the ownership of our newly created cache file. So far, the commands we have run have been executed as the root user, which means that the file and directory we created are owned by root. By resetting the owner to www-data, we are ensuring that NGINX can read our cached file.
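
The plain redirect used above truncates index.html before curl has produced any output, so a failed or slow request could briefly leave an empty or partial page in the cache. One possible refinement, sketched here under assumptions, is to write to a temporary file and rename it into place only on success. The refresh_cache function name and its variables are hypothetical and not part of curl or NGINX.

```shell
# refresh_cache DEST COMMAND [ARGS...]
# Hypothetical helper: runs COMMAND, and replaces DEST with its output
# only if the command succeeded and produced a non-empty result.
refresh_cache() {
    rc_dest="$1"
    shift
    # Create the temp file beside the destination so the final mv is a
    # rename on the same filesystem rather than a copy.
    rc_tmp="$(mktemp "${rc_dest}.XXXXXX")" || return 1
    if "$@" > "$rc_tmp" && [ -s "$rc_tmp" ]; then
        chmod 644 "$rc_tmp"   # mktemp creates the file with mode 0600
        mv -f "$rc_tmp" "$rc_dest"
    else
        # Leave any existing cache file untouched on failure.
        rm -f "$rc_tmp"
        return 1
    fi
}

# Example call, using the same request as above (-f makes curl return an
# error on HTTP failures, -sS hides progress but keeps error messages):
# refresh_cache /var/www/example.com/htdocs/cached/index.html \
#     curl -fsS -H 'Host: example.com' http://192.168.33.10/
```

The file is made world-readable here so that NGINX can read it even when the script runs as root; adjust the mode or ownership to match your own setup.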

Configure NGINX

Now that we have our cached file created, we need to tell NGINX to serve that cached file instead of the WordPress application. To do this, we'll be editing the /etc/nginx/sites-enabled/example.com file. This file is the example.com specific configuration file that defines how the example.com site is served by NGINX.

Let's take a look at the current contents.

server {
    server_name example.com   www.example.com;
    access_log /var/log/nginx/example.com.access.log rt_cache;
    error_log /var/log/nginx/example.com.error.log;
    root /var/www/example.com/htdocs;
    index index.php index.html index.htm;
    include common/php.conf;
    include common/wpcommon.conf;
    include common/locations.conf;
    include /var/www/example.com/conf/nginx/*.conf;
}

The above is fairly standard for a site running WordPress. To leverage our file cache, we will need to add a few items to the above configuration.

location ~ ^/$ {
  try_files /cached/index.html /index.php;
}

While at first our additions may look a bit complex, they're actually pretty simple once you understand what they're doing.

The first part, location, is an NGINX directive that matches against the URI of the HTTP request. If the URI matches the ^/$ regular expression, NGINX will apply the directives within the curly brackets. Because ^ anchors the start of the URI and $ anchors the end, ^/$ matches only a URI that is exactly /. This is the address of the WordPress front page (http://example.com/).
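
A quick way to see which request paths the ^/$ pattern accepts is to try it against a few sample URIs. The sketch below uses grep -E as a stand-in; NGINX uses PCRE rather than POSIX extended regular expressions, but for a pattern this simple the two behave the same. The matches_front_page function name and the sample paths are illustrative only.

```shell
# Return success when the given URI matches the ^/$ pattern.
matches_front_page() {
    printf '%s\n' "$1" | grep -Eq '^/$'
}

# Try the pattern against a few sample request paths.
for uri in / /about/ /index.php; do
    if matches_front_page "$uri"; then
        echo "$uri: match"
    else
        echo "$uri: no match"
    fi
done
```

Only the bare / is reported as a match; /about/ also begins and ends with a slash but is rejected because of the characters in between.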

The second part, try_files, is an NGINX directive that tells NGINX to look for the specified file (/cached/index.html) under the site root and, if it exists, return its contents instead of passing the request to WordPress. If it doesn't find that file, NGINX internally redirects the request to the last parameter, the fallback URI (/index.php).

This fallback to /index.php is very useful: if for some reason NGINX cannot find our cached file, the WordPress application will still serve content to visitors. Now that our changes have been added, let's take a look at the entire configuration again.
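
The decision try_files makes for the front page can be mimicked in a few lines of shell. This is only an emulation of the behavior described above, not how NGINX implements it, and the front_page_target function name is hypothetical.

```shell
# front_page_target ROOT
# Print the URI that the front-page rule would end up serving, given the
# site's document root: the cached file if it exists, otherwise index.php.
front_page_target() {
    if [ -f "$1/cached/index.html" ]; then
        echo "/cached/index.html"
    else
        echo "/index.php"
    fi
}

# Check the document root used in this article.
front_page_target /var/www/example.com/htdocs
```

Running this before and after generating the cache file is a simple way to confirm which branch NGINX is expected to take.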

server {
    server_name example.com   www.example.com;
    access_log /var/log/nginx/example.com.access.log rt_cache;
    error_log /var/log/nginx/example.com.error.log;
    root /var/www/example.com/htdocs;
    location ~ ^/$ {
      try_files /cached/index.html /index.php;
    }
    index index.php index.html index.htm;
    include common/php.conf;
    include common/wpcommon.conf;
    include common/locations.conf;
    include /var/www/example.com/conf/nginx/*.conf;
}

The placement of these lines is somewhat arbitrary; however, they should be added after the root definition and before the includes. NGINX checks regular-expression locations in the order they appear, so the included configurations may contain rules that would otherwise override our directive.

With our changes made, we can apply them by issuing a reload of NGINX. We can do this with the service command.

# service nginx reload

With the reload complete, our configuration changes have now taken effect.
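
If you want an extra safety net, the configuration can be checked with nginx -t before the reload is issued. The wrapper below is a minimal sketch; the reload_nginx_safely function and the NGINX_TEST and NGINX_RELOAD variables are hypothetical names, introduced only so the commands can be swapped out on systems that manage NGINX differently.

```shell
# Reload NGINX only when the configuration test passes.
# NGINX_TEST and NGINX_RELOAD are optional overrides (assumed names).
reload_nginx_safely() {
    # Word splitting is intentional here: each variable holds a command
    # followed by its arguments.
    ${NGINX_TEST:-nginx -t} && ${NGINX_RELOAD:-service nginx reload}
}

# Usage (as root):
# reload_nginx_safely
```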

Measuring the Results

With our modifications made, let's see how our performance has changed by rerunning the same test using the ab command.

$ ab -c 40 -n 1000 http://example.com/
Concurrency Level:      40
Time taken for tests:   1.524 seconds
Complete requests:      1000
Failed requests:        0
Total transferred:      64783000 bytes
HTML transferred:       64499000 bytes
Requests per second:    656.10 [#/sec] (mean)
Time per request:       60.966 [ms] (mean)
Time per request:       1.524 [ms] (mean, across all concurrent requests)

In the baseline test, our site was able to service 7.08 requests per second (mean). With the changes above, our site is now able to service 656.10 requests per second (mean). That is a performance increase of over 9000%, and the only thing we did was pregenerate the content of our front page.
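
The percentage follows directly from the two ab results. As a quick check, the arithmetic can be reproduced with awk using only the figures reported above; the report_gain function name is just for illustration.

```shell
# report_gain BASELINE_RPS NEW_RPS
# Print the speedup factor and the percentage increase between two
# requests-per-second measurements.
report_gain() {
    awk -v b="$1" -v c="$2" 'BEGIN {
        printf "speedup: %.1fx\n", c / b
        printf "increase: %.1f%%\n", (c - b) / b * 100
    }'
}

# Baseline and cached results from the two ab runs.
report_gain 7.08 656.10
```

This prints a speedup of 92.7x and an increase of 9166.9%, which is where the "over 9000%" figure comes from.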

Understanding the results

The resulting improvement may seem quite high; to better understand why the performance increased so much, let's break down how our HTTP requests were serviced before and how they are serviced now.

With the vanilla WordPress installation, every HTTP request to / would be forwarded from NGINX to php-fpm, the application service used to serve PHP applications (WordPress, in this case). The WordPress application itself would then make several queries to the MySQL database. This transaction flow would occur for each and every HTTP request.

With the above configuration changes added, now NGINX will first search for the /cached/index.html file when an HTTP request is made to /. If it finds that file, it opens that file and returns the contents to the HTTP client.

By having NGINX serve the content from cache, we save time and system resources with each request by eliminating the need to call php-fpm and MySQL. We are also playing to one of NGINX's strengths: as we saw in the previous NGINX tuning article, the NGINX service is very efficient at serving static content.

In Conclusion

In this article, we set up our web server (NGINX) to look for and use a cached HTML file to answer HTTP requests. This resulted in a huge increase in the number of requests per second our web application can serve. This improvement in performance does, however, come with some negatives.

One of the negatives is that we have now removed the dynamic nature of our front page. For this site, it's okay because our front page is rarely updated. However, when updates do happen, they will not automatically be seen by our visitors, which leads us to another negative. In order to ensure that visitors are seeing new content as soon as it's published, we must update the cache every time new content is published.
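
When there is no convenient hook for "new content was published", one fallback is to refresh the cache on a timer, for example from cron, whenever the file is older than some threshold. The sketch below is an assumption-laden example rather than part of the setup above: cache_is_stale is a hypothetical helper, the 10-minute threshold is arbitrary, and find's -mmin option is a GNU/BSD extension rather than POSIX.

```shell
# cache_is_stale FILE MINUTES
# Succeed when FILE is missing or was last modified more than MINUTES
# minutes ago.
cache_is_stale() {
    # A missing cache file counts as stale.
    [ -f "$1" ] || return 0
    # find prints the path only when the file is older than the threshold.
    [ -n "$(find "$1" -mmin "+$2" 2>/dev/null)" ]
}

# Example: regenerate the front page if the cache is over 10 minutes old.
# cache_is_stale /var/www/example.com/htdocs/cached/index.html 10 &&
#     curl -H 'host: example.com' http://192.168.33.10 \
#         > /var/www/example.com/htdocs/cached/index.html
```

The trade-off is that visitors may see content that is up to one interval out of date, so the threshold should reflect how quickly new posts need to appear.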

With WordPress, this process is made easy thanks to caching plugins such as W3 Total Cache and WP Super Cache. For non-WordPress web applications, this may require some customization, whether that customization is external to the web application like our curl example or an internal function.

The key to using this methodology is to understand just how dynamic each page needs to be and cache pages that can be cached, while selectively not caching pages that require dynamic content. I personally have found that after implementing a file-based cache within my web applications, I will often change the way I design my web applications to use caching more efficiently.

One way I adapt my designs for cache efficiency is to use client-side dynamic content rather than server-side. An example of this is using the Disqus comments plugin rather than WordPress' native comment system. This allows me to serve dynamic content while also leveraging the speed of file caches.

Have a tip that makes it easier to create and manage caches? Share it by adding a comment to this article.
