Zero-downtime server updates

Zero-downtime server updates with HTTPS, Tomcat, Nginx and Amazon Load Balancer (ELB)

Hi all,
it is strange for me to write a tech blog post that is not directly related to coding but, you know, you never stop learning here at Balsamiq.

I will describe our recent experience setting up a load balancer with SSL on Amazon Web Services. Our goal was to achieve zero downtime during upgrades of the server running our internal back-office webapp. We also wanted to redirect all http traffic to https to enforce encryption. It was not trivial, but Peldi and Luis did great work!!

One of our webapp's tasks is to listen for incoming transactions posted by our online seller and store them in our database. It's based on Grails and runs on an Amazon EC2 instance. Every day we develop and release small new features, and upgrading the server is quite a problem because incoming transactions are lost while it is down for maintenance...

[Figure: Initial setup]

We needed a solution that makes upgrading our webapp simple and reduces downtime to zero! We also wanted to keep the ability to redirect http traffic to https, as in our single server setup.
So, let's go for it using Amazon services!

ELB setup

The first step is to set up an Amazon Elastic Load Balancer, which is really easy using the Amazon Management Console. We want our load balancer to receive all incoming traffic on ports 80 and 443 and simply forward it to the same ports on our instances: HTTP on port 80 goes to port 80, and HTTPS on port 443 is decrypted and goes to port 443 as plain HTTP.

Since the load balancer will handle SSL, we had to provide a certificate. If you have already uploaded your certificate to Amazon, it will appear in a drop-down list for you to choose. Otherwise you will have to paste the certificate and the key issued by your certificate authority into the corresponding text boxes. Do not forget (like I did!!) to also add the chain certificate. Note that if you want to change the certificate after your ELB has been created, you can do so with Amazon's command line tools (we used a certificate issued by GoDaddy with no problems).

The next step is to configure the health check for your instances, so that the load balancer knows which ones are down and where it can safely route traffic. We used an HTTP check on port 443 (remember, SSL is managed by the load balancer!) of each EC2 instance.
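To make the idea of "in service" concrete, here is a minimal Python sketch of how a health check can flip an instance between states after a run of consecutive results. The class name and the threshold values are assumptions made up for illustration, not the settings we used. It also shows why the check should not point at port 80 in this setup: that port only answers with a redirect, and anything other than a 200 counts as a failure here.

```python
from dataclasses import dataclass


@dataclass
class HealthTracker:
    """Hypothetical model of a load balancer health check for one instance."""
    healthy_threshold: int = 3     # assumed: successes needed to go in service
    unhealthy_threshold: int = 2   # assumed: failures needed to go out of service
    in_service: bool = False
    _streak: int = 0               # consecutive results that disagree with the state

    def record(self, status_code: int) -> bool:
        """Feed one health check result and return the resulting state."""
        ok = status_code == 200    # only a plain 200 counts as healthy
        if ok == self.in_service:
            self._streak = 0       # result agrees with current state
        else:
            self._streak += 1
            needed = self.healthy_threshold if ok else self.unhealthy_threshold
            if self._streak >= needed:
                self.in_service = ok
                self._streak = 0
        return self.in_service


tracker = HealthTracker()
states = [tracker.record(code) for code in (200, 200, 200, 301, 301)]
print(states)
```

With these assumed thresholds the instance goes in service after the third 200 and drops out again after two redirects in a row.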

Out of the box, Amazon provides a fancy public DNS name for your load balancer, but you will certainly want to use your own sober domain name. To do this, create a CNAME record pointing to the load balancer's DNS name, as described in the Amazon docs. For more information about CNAME records, see the CNAME Record Wikipedia article.

Ok, now we have a simple load balancer but we have no instances yet!

EC2 instances setup

Our idea was that each instance should run a servlet container with our Grails application and listen for incoming connections on ports 80 and 443. Connections on port 80 should be redirected to 443 to enforce encryption through the load balancer. We started from a standard Ubuntu 10 image and customized it for our purposes.

We used a divide et impera technique here, thanks to a brilliant idea of Luis!
We had 2 different tasks:

Running a webapp
Doing some proxying/url rewriting stuff

So we used two different pieces of software on the same machine, each doing what it does best: Tomcat would handle our application and Nginx would act as our proxy. Installing both products was straightforward.

Tomcat configuration was really nice and easy. Since Tomcat's only purpose was to run our webapp, we just put a simple http connector on port 8080 in the configuration file server.xml, with no encryption and no redirects:

<Connector port="8080" protocol="HTTP/1.1"
           connectionTimeout="20000"
           URIEncoding="UTF-8" />

Then we just set our webapp as the ROOT one. The Tomcat documentation explains how to do that.

Nginx configuration was a little trickier. We created a virtual host config file in the Nginx folder /etc/nginx/sites-available and created a link to it inside the folder /etc/nginx/sites-enabled. The file contains Nginx directives for url rewriting and redirecting. Detailed info on virtual hosts can be found on the Nginx wiki.

The configuration file uses two server statements, the first looks like this:
server {
    server_name localhost;
    access_log /var/log/nginx/website.redirector.access.log;
    location / {
        # send every http request to the https home page
        rewrite ^ https://public.website.com permanent;
    }
}
This basically tells Nginx to listen on port 80 (the default when no listen directive is given and Nginx runs as root) for any location request (/) and to redirect clients to https://public.website.com, the main website, using the https protocol. Now all http connections will be redirected to the home page using https!

Alternatively, you can use this location block to redirect to the same page that was requested, always forcing https:
location / {
    # keep the requested path in the https redirect
    rewrite ^ https://public.website.com$uri permanent;
}
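The difference between the two redirect variants can be sketched in a few lines of Python. This is only a model of the resulting Location header, not of how Nginx works internally: the function names are made up for illustration and the sketch deliberately ignores query strings.

```python
PUBLIC_SITE = "https://public.website.com"


def redirect_to_home(request_path: str) -> str:
    """First variant: every http request lands on the https home page."""
    return PUBLIC_SITE


def redirect_same_page(request_path: str) -> str:
    """Second variant: the requested path is kept (the $uri part)."""
    return PUBLIC_SITE + request_path


for path in ("/", "/orders/42"):
    print(path, "->", redirect_to_home(path), "|", redirect_same_page(path))
```

The second variant is friendlier to bookmarks and deep links, since a visitor who typed an http address still ends up on the page they asked for.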
Ok, now we just need to create the last server statement:
server {
    listen 443; ## listen for ipv4
    listen [::]:443 default ipv6only=on; ## listen for ipv6
    server_name localhost;
    access_log /var/log/nginx/website.access.log;
    location / {
        # hand the already decrypted request over to Tomcat
        proxy_pass http://127.0.0.1:8080;
    }
}
This is needed for handling connections on port 443. Connections incoming on this port are unencrypted because the load balancer handles the SSL certificate and decryption. We want these connections to be passed on to our tomcat server listening on the same machine on port 8080, so we set the proxy pass value to http://127.0.0.1:8080.

Done! Connections on port 443 are decrypted by our load balancer and passed to port 443 of our instance, where Nginx forwards them to Tomcat running on port 8080!

Just one little issue remains to take care of...
Our webapp is totally ignorant of our complex-but-beautiful two-stage proxy configuration, so when it sends a redirect to the client browser it uses a location containing an address like http://127.0.0.1:8080 !!!
Nginx to the rescue! We can add a rule to our configuration to take care of all those messy redirects. This is the final config, which translates redirects from http://127.0.0.1:8080 to our pretty https://public.website.com:
server {
    listen 443; ## listen for ipv4
    listen [::]:443 default ipv6only=on; ## listen for ipv6
    server_name localhost;
    access_log /var/log/nginx/website.access.log;
    location / {
        proxy_pass http://127.0.0.1:8080;
        # rewrite redirects sent by Tomcat so they use the public address
        proxy_redirect http://127.0.0.1:8080 https://public.website.com;
    }
}
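If the effect of proxy_redirect is not obvious, think of it as a prefix replacement applied to the Location header of responses coming back from Tomcat. Here is a rough Python sketch of that idea. The function name is made up, and this is only a model of the behavior described above, not Nginx's actual code.

```python
INTERNAL = "http://127.0.0.1:8080"
PUBLIC = "https://public.website.com"


def rewrite_location(location: str,
                     internal: str = INTERNAL,
                     public: str = PUBLIC) -> str:
    """Swap the internal Tomcat prefix of a redirect for the public one."""
    if location.startswith(internal):
        return public + location[len(internal):]
    return location  # redirects to other sites are left untouched


print(rewrite_location("http://127.0.0.1:8080/login"))
print(rewrite_location("https://example.org/docs"))
```

Only redirects that start with the internal address are touched, so a redirect to an external site still reaches the browser unchanged.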
Now our setup is finally ok!

[Figure: Final setup]

Automatic updates: the icing on the cake

The last thing we did was to add some automation to our instance. We created a startup job that executes the following tasks:

Download the latest build of our webapp from our build server and put it into the Tomcat war folder
Start Tomcat so that it deploys the new war as the ROOT webapp and starts listening on port 8080
Start Nginx, which will serve requests on ports 80 and 443 as per our configuration
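A startup job along these lines could be sketched in Python as follows. Everything specific here is a placeholder and not what we actually run: the build URL, the war path, and the service names are assumptions for illustration only. The commands run in order and the job stops at the first failure, so Nginx is never started in front of a Tomcat that did not come up.

```python
import subprocess

# Placeholders: assumed values, not the ones from our setup.
BUILD_URL = "https://build.example.com/latest/webapp.war"
WAR_PATH = "/var/lib/tomcat6/webapps/ROOT.war"

STEPS = [
    ["wget", "-O", WAR_PATH, BUILD_URL],   # 1. fetch the latest build
    ["service", "tomcat6", "start"],       # 2. Tomcat deploys it as ROOT
    ["service", "nginx", "start"],         # 3. Nginx starts proxying
]


def run_startup(steps=STEPS, runner=subprocess.check_call):
    """Run each step in order; check_call raises if a command fails."""
    for command in steps:
        runner(command)


if __name__ == "__main__":
    # Dry run: print the commands instead of executing them.
    run_startup(runner=lambda command: print(" ".join(command)))
```

Passing a different runner, as in the dry run above, makes it easy to check the sequence without touching the machine.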

After this last piece of configuration we created an AMI from our instance.

Each time we want to update our server we simply launch a new instance and add it to the load balancer. As soon as the load balancer recognizes the instance as "in service" we just stop the old one... voilà, our webapp is updated with zero downtime! :)
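The order of these steps is what gives us zero downtime, and a tiny simulation makes that visible. The sketch below uses a made-up in-memory stand-in for the load balancer (it does not call any Amazon API, and all names are hypothetical) and records which instances can serve traffic after each step.

```python
class FakeLoadBalancer:
    """Hypothetical in-memory stand-in for the load balancer."""

    def __init__(self):
        self.state = {}  # instance id -> "InService" or "OutOfService"

    def register(self, instance_id):
        self.state[instance_id] = "OutOfService"  # not trusted until checked

    def pass_health_check(self, instance_id):
        self.state[instance_id] = "InService"

    def deregister(self, instance_id):
        del self.state[instance_id]

    def in_service(self):
        return sorted(i for i, s in self.state.items() if s == "InService")


def rolling_update(lb, old_id, new_id):
    """Add the new instance, wait for it, then retire the old one."""
    history = []
    lb.register(new_id)
    history.append(lb.in_service())   # old instance still serving
    lb.pass_health_check(new_id)
    history.append(lb.in_service())   # both instances serving
    lb.deregister(old_id)
    history.append(lb.in_service())   # only the new instance left
    return history


lb = FakeLoadBalancer()
lb.register("old")
lb.pass_health_check("old")
history = rolling_update(lb, "old", "new")
print(history)
```

At every step at least one instance is in service, which is exactly why the old instance must only be stopped after the new one has passed its health check.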

[Figure: Nearly automatic updates]

http://blogs.balsamiq.com/tech/2011/04/08/460/