Our company made a good decision in recent weeks: the target is the sky, or at least the cloud. Amazon's AWS offerings are hard to beat, so we started there, played around with different configurations a bit, and finally decided that the first step would be to migrate the company Subversion repository to the cloud, with ZFS mirrors and encryption.
I'm a long-time fan of the ZFS filesystem and Sun's OpenSolaris offering around it, basically because it is the best easily accessible filesystem that provides drive mirroring with checksums, enabling automatic recovery from failures of the underlying storage. So it became a natural plan to run OpenSolaris on EC2, with ZFS on mirrored EBS volumes. Although EBS is meant to be very robust, every system has failures, and we found a few blog entries where EBS actually did fail, so better be prepared...
We know that we cannot achieve absolute secrecy unless we unplug the server, dump it in a big hole in a deserted location and forget about it, but it seemed reasonable to have some encryption. The plan was that when the instance starts, we log in and attach the encrypted ZFS pool by typing the password. Okay, the running instance may be monitored and the content might be extracted if the infrastructure allows such a move, but we hope this is a much harder and more restricted job than sniffing around in a volume snapshot.
I've mailed the Sun OpenSolaris EC2 team, and they were very kind in giving the initial pointers on where to look. I can recommend the following sites on this topic:
Basically the last one describes most of the important parts, but there are a few differences on EC2. First, the Web Console doesn't allow you to attach the EBS volumes directly, because it only offers /dev/sdf-like device names, and this is not what you are looking for, as the OpenSolaris AMI requires a device number instead. So go to the command line or use ElasticFox to attach these drives properly. In our test drive, I attached two 1 GB volumes as the 2nd and 3rd drives of the EC2 instance, and they became c7d2 and c7d3 respectively.
To cut a long story short, I've used the sun-opensolaris-2009-06/opensolaris_2009.06_32_6.0.img.manifest.xml AMI, and here are the commands that were required to complete the process:
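As a minimal sketch of what such a sequence can look like (not necessarily the exact commands used here): a mirrored pool on the two EBS volumes, a block volume inside it, a lofi mapping that encrypts that volume, and a second pool on top. The pool names and the c7d2/c7d3 devices come from this post; the zvol name `rawstorage/block`, its 900m size, the `aes-256-cbc` cipher, the `/dev/lofi/1` device and the `run` dry-run helper are assumptions for illustration.

```shell
#!/bin/sh
# Sketch only. Assumed for illustration: zvol name, zvol size, cipher,
# and that lofiadm hands out /dev/lofi/1.

# Print commands by default; set DRY_RUN=0 to really execute them (as root).
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

create_encrypted_pool() {
    # 1. Mirror the two EBS volumes so ZFS can repair checksum errors.
    run zpool create rawstorage mirror c7d2 c7d3
    # 2. Carve a block volume (zvol) out of the mirror to hold the ciphertext.
    run zfs create -V 900m rawstorage/block
    # 3. Map the zvol through lofi with encryption. lofiadm asks for the
    #    passphrase and prints the device it created.
    run lofiadm -c aes-256-cbc -a /dev/zvol/rdsk/rawstorage/block
    # 4. Create the data pool on the encrypted device (adjust the path if
    #    lofiadm printed something other than /dev/lofi/1).
    run zpool create subversion /dev/lofi/1
}
```

Calling `create_encrypted_pool` as-is only prints the four commands, which makes it easy to review them before running with `DRY_RUN=0`.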
So what does this give me?
This works from this point on, but what happens if I shut down the instance and start a new one? Well, let's attach the EBS volumes again, and follow these commands:
Cool, it works again! You just need to import the rawstorage pool first, attach the lofi driver (get the proper password here), import the second pool, and use it as you like.
But what happens if the password is wrong? First of all, the lofi driver cannot tell whether the password is right. That seems bad at first, but actually it doesn't matter, as we are not going to write any data if we are not able to import the subversion pool. So the worst scenario is that you type a bad password, zpool import refuses to import the subversion pool, and that is it. In that case, detach the lofi device and retype the password until the pool imports.
Simple? Seems to be, but before you put all your crucial data on top of it, you might want to play around a bit with OpenSolaris and EC2 first. Many thanks to the Sun and Amazon teams for enabling such a marvelous combination of technologies.
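The retry procedure above can be sketched as a small shell function. This is only an illustration under assumptions: the backing zvol path, the `aes-256-cbc` cipher and the use of `zpool import -d /dev/lofi` to make the import look at lofi devices are not taken from the original commands, and `unlock_subversion` is a hypothetical name.

```shell
#!/bin/sh
# Sketch only. Assumed: backing zvol path, cipher, and searching /dev/lofi
# on import. Pool names come from the post.

BACKING=/dev/zvol/rdsk/rawstorage/block

unlock_subversion() {
    max_tries=${1:-3}
    # The mirrored outer pool must be imported before lofi can map the zvol.
    # (On a fresh instance this may need -f if the pool was never exported.)
    zpool import rawstorage || return 1
    try=1
    while [ "$try" -le "$max_tries" ]; do
        # lofiadm prompts for the passphrase and accepts whatever is typed:
        # it cannot tell a right key from a wrong one.
        lofiadm -c aes-256-cbc -a "$BACKING" || return 1
        # Only the import shows whether the key was right.
        if zpool import -d /dev/lofi subversion; then
            return 0
        fi
        # Wrong passphrase: nothing was written, so drop the mapping and retry.
        lofiadm -d "$BACKING"
        try=$((try + 1))
    done
    return 1
}
```

Run as root, `unlock_subversion 5` would allow up to five passphrase attempts and return non-zero if none of them lets the pool import.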
Last week we made a little proof of concept of an encrypted Subversion repository on Amazon EC2. This week, we decided to move forward and migrate most of our development-related stuff to the EC2 cloud, so here goes our little success story.
The ZFS encryption works mostly as described in the previous post, although there is a little difference after rebundling the OpenSolaris image. (Make sure you follow this guide!) The difference is that on the rebundled image you need to do something like this (assuming that 'storage' is the normal pool and 'safe' is the encrypted pool):
Apart from that, everything works as expected. We have made the following setup on EC2:
If we ever need larger storage, we just attach a new drive, let ZFS handle the hard stuff, and detach the old one. We have all the development stuff on a remote server that is reliable (okay, we need to do some regular backups even on Amazon), and we are paying much less than we did with our previous server hosting provider. And our public company page can be hosted on a cheap host, as it is 100% static content.
So far so good.
We started evaluating and using Amazon EC2 almost a month ago. Here are our 'lessons learned'.
We have evaluated and used encryption with OpenSolaris and ZFS on EBS, and we have successfully rebundled the instance to migrate our Subversion repository to this server. Although we have always typed the encryption password correctly since the migration, we finally decided to check a few scenarios, e.g. what happens when we type it wrong: can we lose data somehow? Just in case something did go wrong, we created EBS snapshots of the volumes first. After some testing, we consider the data-loss scenario unlikely, because if we type the password wrong, we receive something like the following:
So we need to remove the lofi device with lofiadm and attach it again, which solves the problem.
It is always a good idea to document things, and this is especially true with a sometimes transient service like Amazon EC2. It turned out that there was a startup bug in the official OpenSolaris bundle, and to get the fix you need to rebundle your server on top of the new version. We did, as we had run into this bug a few times, and the documentation came in very handy: we only had to copy-paste the commands into the console and wait for the output, as most of our documentation read like a shell script.
The next level of automation will be to create expect scripts that set up and bundle full images automatically. I'd suggest that anyone starting with EC2 write the setup scripts in this latter fashion from the beginning. For hard-core Java people like myself, ExpectJ or Enchanter are viable options too, but the ultimate solution is to use something like JSch and Groovy to control every aspect of the communication.
When we start an instance, we attach the drives and the elastic IP, then execute a few commands to mount the encrypted storage and start the services. This is a very boring process, and fortunately you can automate it too:
Even if you are using encryption, late service starting or other exotic requirements, you can reduce the number of required steps to a very small number (1-5, including typing the password).
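A minimal sketch of such a start-up helper, run from the administrator's workstation, might look like the following. It assumes the Amazon EC2 API command-line tools (`ec2-attach-volume`, `ec2-associate-address`) are installed and configured; the instance and volume IDs, the IP address, the zvol path, the cipher and the Apache service name are hypothetical placeholders, and `bring_up` and `run` are illustrative names. The pool names 'storage' and 'safe' follow the setup described earlier.

```shell
#!/bin/sh
# Sketch only. All IDs, the address, the zvol path, the cipher and the
# service FMRI are placeholders or assumptions.

# Print commands by default; set DRY_RUN=0 to really execute them.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Commands executed on the instance once the volumes are attached.
REMOTE_STEPS='zpool import storage && lofiadm -c aes-256-cbc -a /dev/zvol/rdsk/storage/block && zpool import -d /dev/lofi safe && svcadm enable svc:/network/http:apache22'

# usage: bring_up INSTANCE_ID ELASTIC_IP VOLUME_ID...
bring_up() {
    instance=$1
    ip=$2
    shift 2
    # The OpenSolaris AMI wants a device number, not a /dev/sdf-style name.
    n=2
    for vol in "$@"; do
        run ec2-attach-volume "$vol" -i "$instance" -d "$n"
        n=$((n + 1))
    done
    run ec2-associate-address -i "$instance" "$ip"
    # -t allocates a terminal so lofiadm can prompt for the passphrase.
    run ssh -t "root@$ip" "$REMOTE_STEPS"
}
```

With this in place, `bring_up i-00000000 203.0.113.10 vol-aaaaaaaa vol-bbbbbbbb` prints the plan, and the same call with `DRY_RUN=0` would leave typing the passphrase as the only manual step. A real script would probably also need to wait until the volumes and the address are actually available before the ssh step.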
Automate, automate, automate...
Sometimes it is not known at server setup time how often you will want backups or report processes to run. Rebundling the server just to add a new crontab entry is a thankless task for anyone involved. It is better to prepare the bundle image with a few cron jobs that might never be used, so that if we do need them, we don't have to rebundle the image. For example, the following commands help to define an hourly report script:
As you can see, this script is placed in the '/safe' directory, which is on the encrypted volume. If for some reason the encryption or the mount fails, or if there is no such file at that place, there will be no error: the [ -x ... ] test ensures the script is executed if and only if it is present and executable. Placing it on the encrypted volume also lets us store a few more confidential items there, e.g. our script can encrypt the report mail, or use some sftp mechanism to deliver the report to a remote site.
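The guard described above can be sketched as follows. The script name `/safe/bin/hourly-report.sh` and the function name `run_if_executable` are hypothetical; only the '/safe' location and the [ -x ... ] test come from the text.

```shell
#!/bin/sh
# A crontab line of this shape can be baked into the image
# (the script name is a hypothetical placeholder):
#
#   0 * * * * [ -x /safe/bin/hourly-report.sh ] && /safe/bin/hourly-report.sh
#
# The same guard as a function, so it can be tried outside cron.
run_if_executable() {
    # Run "$1" only if it exists and is executable; otherwise do nothing
    # and print nothing.
    if [ -x "$1" ]; then
        "$1"
    fi
}
```

Calling `run_if_executable /safe/bin/hourly-report.sh` while the encrypted pool is not mounted simply does nothing, which is the behaviour we want from a cron job that may never be used.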
Of course the type and variety of such scripts you define in your crontab is up to you entirely.
With the ElasticFox plugin we have encountered some strange problems, e.g. sometimes it takes a very long time to fetch the list of KeyPairs. One impatient team member clicked on the 'create' button and typed the same name we had used previously, which silently removed our old key and put a new one in its place. The new KeyPair was distributed internally again, but this is a silly mistake that is better avoided.
published: 2009-07-09, a:István, y:2009, l:aws, l:cloud, l:ebs, l:ec2, l:encryption, l:opensolaris, l:subversion, l:zfs