We’ve recently migrated cricket.com’s Fantasy Cricket application from a single server running at ServerBeach to a multi-server cluster running on Mosso’s Cloud Servers system. In an attempt to document the problems we’ve had and the solutions we’ve found, I’ve decided to write down my thoughts and memories, and the processes we went through, in the hopes that it will help others.

Atomic Migrations

The actual act of copying the data from one server to another is simple. It’s not difficult to take a point-in-time snapshot of a Rails app and move it to another server. The difficult part, when dealing with a dynamic website, is keeping the two servers in sync.

Because DNS changes take time to propagate, some clients will keep hitting the old server while others have already started hitting the new one. This causes consistency problems: a user still on the old server may modify their account (or in our case, their team), and those modifications will not show up on the new site once their DNS updates. This can be a pretty big problem, and we came up with a method of dealing with it that eliminates the inconsistency with minimal impact on the users.

Testing the Server

The first step was to copy the app over and get it ‘working’. To do this, we used a dump of the MySQL database, made with the mysqldump utility included with MySQL. We got the app up and running, tested it, and made sure that it was going to work properly (e.g. that all Ruby gems were installed). Next we took a fresh dump of the database using the --master-data option to mysqldump, which records the binary log coordinates needed to set up a replication slave.
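A minimal sketch of that second dump, written as a small Python helper that assembles the command line. The database name, user, and output file are hypothetical placeholders, and --single-transaction is an assumption that only makes sense if the tables are InnoDB; this is not necessarily the exact command we ran.

```python
import shlex


def build_mysqldump_command(database, user, outfile):
    """Return (argv, shell_line) for a dump that a replication slave can start from.

    All arguments are illustrative placeholders, not values from the migration.
    """
    argv = [
        "mysqldump",
        "--user=" + user,
        "--password",            # no value given, so mysqldump prompts for it
        "--single-transaction",  # assumption: InnoDB tables, consistent snapshot
        "--master-data=1",       # write a CHANGE MASTER TO statement into the dump
        database,
    ]
    # Quote each argument so the line is safe to paste into a shell.
    shell_line = " ".join(shlex.quote(a) for a in argv) + " > " + shlex.quote(outfile)
    return argv, shell_line


if __name__ == "__main__":
    _, line = build_mysqldump_command("fantasy_production", "root", "fantasy.sql")
    print(line)
```

The binary log file name and position that --master-data writes into the dump mark the exact point from which the new server should start replicating.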

MySQL Replication

MySQL provides near-real-time replication between two servers, and it’s pretty simple to set up. Setting up replication between our two servers required briefly opening MySQL to the outside world (having it listen on port 3306), but we used iptables firewall rules to ensure that no one else could connect. Once replication was set up, we knew that the new server would be a replica of the old server. The application was working, the database was identical, and so on. Any changes made on the old server would show up on the new server almost immediately.
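As a rough illustration of those two pieces, the Python sketch below renders the replication statements for the new server and the firewall rules for the old one. The IP addresses, replication user, password, and binary log coordinates in the usage section are made-up placeholders, and the quoting helper is deliberately minimal.

```python
def change_master_sql(host, user, password, log_file, log_pos):
    """Render the statements to run on the new (slave) server."""

    def q(value):
        # Minimal string quoting, for illustration only.
        return "'" + value.replace("\\", "\\\\").replace("'", "\\'") + "'"

    return (
        "CHANGE MASTER TO\n"
        f"  MASTER_HOST={q(host)},\n"
        f"  MASTER_USER={q(user)},\n"
        f"  MASTER_PASSWORD={q(password)},\n"
        f"  MASTER_LOG_FILE={q(log_file)},\n"
        f"  MASTER_LOG_POS={int(log_pos)};\n"
        "START SLAVE;"
    )


def mysql_firewall_rules(slave_ip):
    """iptables commands for the old (master) server: only the slave may reach 3306."""
    return [
        ["iptables", "-A", "INPUT", "-p", "tcp", "--dport", "3306",
         "-s", slave_ip, "-j", "ACCEPT"],
        ["iptables", "-A", "INPUT", "-p", "tcp", "--dport", "3306",
         "-j", "DROP"],
    ]


if __name__ == "__main__":
    # Placeholder values; the log file and position come from the dump.
    print(change_master_sql("192.0.2.10", "repl", "secret", "mysql-bin.000001", 4))
    for rule in mysql_firewall_rules("192.0.2.20"):
        print(" ".join(rule))
```

The order of the two rules matters: the ACCEPT for the slave has to come before the DROP, because iptables stops at the first matching rule.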

MySQL also supports multi-master replication, in which both servers accept writes and push their changes to each other. While I’ve set this up and used it in the past, it’s more complicated to set up and can be a pain to maintain. If replication fails in one direction, the two servers can end up with inconsistent data. For this reason, we wanted to ensure all of our traffic would switch over at once: an atomic migration. Either all traffic hits the new site, or all traffic hits the old site. This is where Apache comes in.

Apache’s mod_proxy

We use Apache to run our websites, including our Rails applications (using Phusion Passenger, also known as mod_rails). The solution we came up with was to make use of Apache’s built-in modules to ease our transition. We set up two config files: the original (which we already had), and a new configuration that, instead of serving pages, proxied all traffic to the new server. In essence, all traffic to the old site was transparently proxied to the new server, and the responses were proxied back.
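A proxying configuration of this kind can be quite small. The following is a hedged sketch rather than the production file: a Python helper that renders a virtual host using standard mod_proxy directives. The server name and backend address are hypothetical, and it assumes mod_proxy and mod_proxy_http are loaded.

```python
def render_proxy_vhost(server_name, backend_url):
    """Render an Apache virtual host that forwards every request to backend_url."""
    backend = backend_url.rstrip("/") + "/"
    return "\n".join([
        "<VirtualHost *:80>",
        f"    ServerName {server_name}",
        "    # Reverse proxy only; never act as an open forward proxy.",
        "    ProxyRequests Off",
        "    # Pass the original Host header through to the new server.",
        "    ProxyPreserveHost On",
        f"    ProxyPass / {backend}",
        "    # Rewrite redirect headers coming back from the backend.",
        f"    ProxyPassReverse / {backend}",
        "</VirtualHost>",
        "",
    ])


if __name__ == "__main__":
    # Placeholder host name and address.
    print(render_proxy_vhost("fantasy.example.com", "http://192.0.2.20"))
```

ProxyPreserveHost is included on the assumption that the new server selects the site by host name; if it answers on a bare IP address, that line may not be needed.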

This had two benefits. First of all, it was instantaneous and atomic: all users would be served by the same server, either the old one or the new one. We wouldn’t have any frustrating situations where one user would hit the old server and another user would hit the new one. Secondly, it was quick and easy to reverse. If there were problems on the new server that we hadn’t foreseen, it would be easy for us to switch everything back to the old server (by swapping out the config files again).

The Changeover

In order to ensure we didn’t run into any collisions with replication, our solution was to stop Apache entirely for a second, then bring it back online with the new config. This ensured that if there was any latency between the master (old) and slave (new) servers, it would catch up in the second that we were down. After we made the change, we checked, and the site was up and running. Users were all logged out, but beyond that the experience was largely smooth. Once we knew the server was working, we changed the DNS to point to the new server, and the changeover had finished. We were live on our new (virtual) server.
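Rather than trusting a fixed one-second pause, the catch-up could also be checked explicitly. The sketch below is an assumption about how that might be done, not something we ran: it inspects one row of SHOW SLAVE STATUS output (passed in as a dict) and polls until the slave reports no lag. The fetch_status callable is hypothetical and would wrap whatever MySQL client library is in use.

```python
import time


def replica_caught_up(status):
    """True if both replication threads are running and the reported lag is zero.

    `status` is one SHOW SLAVE STATUS row as a dict keyed by column name.
    """
    lag = status.get("Seconds_Behind_Master")  # NULL (None) if replication is stopped
    return (
        status.get("Slave_IO_Running") == "Yes"
        and status.get("Slave_SQL_Running") == "Yes"
        and lag is not None
        and int(lag) == 0
    )


def wait_for_replica(fetch_status, timeout=30.0, interval=0.5,
                     sleep=time.sleep, clock=time.monotonic):
    """Poll fetch_status() until the slave has caught up or the timeout expires."""
    deadline = clock() + timeout
    while True:
        if replica_caught_up(fetch_status()):
            return True
        if clock() >= deadline:
            return False
        sleep(interval)


if __name__ == "__main__":
    # Canned rows standing in for real SHOW SLAVE STATUS results.
    rows = iter([
        {"Slave_IO_Running": "Yes", "Slave_SQL_Running": "Yes", "Seconds_Behind_Master": 2},
        {"Slave_IO_Running": "Yes", "Slave_SQL_Running": "Yes", "Seconds_Behind_Master": 0},
    ])
    print(wait_for_replica(lambda: next(rows), interval=0))
```

The check only means something once Apache on the old server has been stopped, since otherwise new writes can arrive right after the slave reports zero lag.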

In Part 2, I plan to discuss the initial scaling issues we found, as we tried to learn the bounds of Mosso’s virtual servers, and of our own application and server design.