The company made a good decision in recent weeks: the target is the sky, or at least the cloud. Amazon's AWS offerings are hard to beat, so we started with them, played around with different configurations a bit, and finally decided that first we would migrate the company Subversion repository to the cloud, with ZFS mirrors and encryption.
I'm a long-time fan of the ZFS filesystem and Sun's OpenSolaris offering around it, basically because it is the best easily accessible filesystem that provides drive mirroring with checksums, enabling automatic recovery from failures of the underlying storage. So it became a natural plan to run OpenSolaris on EC2, with ZFS mirrored across EBS volumes.
Although EBS is meant to be very robust, there are always failures in every system, and we have seen a few blog entries where EBS actually did fail, so better be prepared...
We know that we cannot achieve absolute secrecy unless we unplug the server, dump it in a big hole in a deserted location, and forget about it, but it seemed reasonable to have some encryption. The plan was that when the instance starts, we log in and attach the encrypted ZFS pool by typing the password. Okay, the running instance may be monitored and its contents might be extracted if the infrastructure allows such a move, but we hope this is a much harder job requiring much deeper access than sniffing around a volume snapshot.
I've mailed the Sun OpenSolaris EC2 team, and they were very kind to give me initial pointers on where to look. I can recommend the following sites on this topic:
Basically the last one pretty much describes most of the important parts, but there are a few differences on EC2. First, the Web Console doesn't let you attach the EBS volumes properly, because it provides /dev/sdf-style device names for you, but that is not what you are looking for: the OpenSolaris AMI requires plain device numbers instead. So go to the command line or use ElasticFox to attach these drives properly. In our test drive, I attached two 1 GB volumes as the 2nd and 3rd drives of the EC2 instance; they became c7d2 and c7d3, respectively.
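For reference, this is roughly how the attachment looks with the EC2 API tools; the volume and instance IDs below are placeholders, and note the plain device numbers in place of the /dev/sdf-style names:

$ ec2-attach-volume vol-11111111 -i i-aaaaaaaa -d 2
$ ec2-attach-volume vol-22222222 -i i-aaaaaaaa -d 3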
To cut a long story short, I've used the sun-opensolaris-2009-06/opensolaris_2009.06_32_6.0.img.manifest.xml AMI, and here are the commands that were required to complete the process:

# zpool create rawstorage mirror c7d2 c7d3
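On top of the mirrored pool comes the encrypted one; here is a minimal sketch of that layering, where the backing file name, its size, and the cipher are illustrative choices:

# mkfile 900m /rawstorage/subversion.img
# lofiadm -c aes-256-cbc -a /rawstorage/subversion.img
Enter passphrase: (typed twice; this is the password asked for at every import)
/dev/lofi/1
# zpool create subversion /dev/lofi/1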
So what does this give me? A mirrored pool, with an encrypted pool on top of it, ready for the repository data.
This works from this point on, but what
happens if I shut down the instance and start a new one? Well, let's
attach the EBS volumes again, and follow these commands:
# zpool import

Cool, it works again! You just need to import the rawstorage pool first, attach the lofi device (this is where the passphrase comes in), import the second pool, and use it as you like.
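Spelled out, with the names from the earlier sketch, the sequence is something like:

# zpool import rawstorage
# lofiadm -c aes-256-cbc -a /rawstorage/subversion.img
Enter passphrase:
# zpool import -d /dev/lofi subversion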
But what happens if the password is wrong? First of all, the lofi driver is unable to tell: it attaches the device regardless. That seems bad at first, but actually it doesn't matter, as we are not going to write any data if we cannot import the subversion pool. So the worst case is that you type a bad password, zpool import won't find the subversion pool, and that is it. In such a case, you detach the lofi device and retype the password until the pool shows up.
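With the device from the sketches above, that means:

# lofiadm -d /dev/lofi/1
(then re-attach with lofiadm -c and run the zpool import again)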
Simple? It seems to be, but before you put all your crucial data on top of it, you might want to play around a bit with OpenSolaris and EC2 first. Many thanks to the Sun and Amazon teams for enabling such a marvelous technology combination.
Update on 2009-07-16

Last week we made a little proof of concept of encrypted Subversion on Amazon EC2. This week we decided to move forward and migrate most of our development-related stuff to the EC2 cloud, and here goes our little success story.
The ZFS encryption works mostly as described in the previous entry, with one little difference after we rebundled the OpenSolaris image. (Make sure you follow this guide!) The difference is that on the rebundled image you have to mount the plain pool by hand before attaching the encrypted one (assuming 'storage' is the normal pool and 'safe' is the encrypted pool):

zfs mount storage

Apart from that, everything works as expected: attach the lofi device and import 'safe' just as before. We have made the following setup on EC2:
If we ever need larger storage, we just attach new, bigger drives, let ZFS handle the hard part, and detach the old ones (sketched below). We have all the development stuff on a reliable remote server (okay, we still need regular backups, even on Amazon), and we are paying much less than at our previous hosting provider. And our public company page can be hosted on a cheap host, as it is 100% static content.
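A sketch of such a swap on the 'storage' pool, assuming the two larger EBS volumes show up as c7d4 and c7d5; on this OpenSolaris release the pool only picks up the extra space after an export/import cycle:

# zpool replace storage c7d2 c7d4
# zpool status storage
(wait until the resilver completes, then replace the other half)
# zpool replace storage c7d3 c7d5
# zpool export storage
# zpool import storage

After that, the old volumes can be detached and deleted on the EC2 side.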
So far so good.
Update on 2009-08-07

We started evaluating and using Amazon EC2 almost a month ago. Here are our 'lessons learned' items.
Be prepared...
We have evaluated and used encryption with OpenSolaris and ZFS on EBS, and we have successfully rebundled the instance to migrate our Subversion repository onto this server. Although we had always typed the encryption password correctly after this migration, we finally decided to check some scenarios, e.g. what happens when we type it wrong: can we lose data some way? Just in case something did go wrong, we created EBS snapshots of the volumes first. After some testing, we consider the data-loss scenario unlikely, because if we type the password wrong, we receive something like the following:
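Along these lines (assuming the lofi setup sketched in the original entry; with a bad passphrase the decrypted view of the device is garbage, so no pool is found):

# zpool import -d /dev/lofi
no pools available to import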
So we just need to remove the lofi device with lofiadm -d, re-attach it with the right passphrase, and remount it, which solves the whole problem.
Automate...
It is always a good idea to document things, and this is especially true with a sometimes-transient service like Amazon EC2. It turned out that there was a startup bug in the official OpenSolaris bundle, and you need to rebundle your server with the new version if you would like a better one. We did, as we had encountered this bug a few times, and the documentation became very handy: we just had to copy-paste the commands into the console and wait for the output, as most of our documentation already read like a shell script.
The next level of automation will be to create expect scripts to automatically set up and bundle full images. I'd suggest that anyone starting with EC2 write their setup scripts in this fashion from the beginning. For hard-core Java people like myself, ExpectJ or Enchanter are viable options too, but the ultimate solution is to use something like JSch and Groovy to control every aspect of the communication.
Automate, automate...
When we start an instance, we attach the drives and the elastic IP, then execute a few commands to mount the encrypted storage and start the services. This is a very boring process, and fortunately you can automate it too:
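A sketch of such a wrapper script; the IDs, the address, the backing file name and the 'svn' SMF service are all placeholders, and only the passphrase prompt remains interactive:

#!/bin/sh
INSTANCE=i-aaaaaaaa
IP=203.0.113.10
ec2-attach-volume vol-11111111 -i $INSTANCE -d 2
ec2-attach-volume vol-22222222 -i $INSTANCE -d 3
ec2-associate-address -i $INSTANCE $IP
sleep 30   # give EC2 a little time to finish attaching
# -t allocates a tty so that lofiadm can prompt for the passphrase
ssh -t root@$IP 'zfs mount storage &&
  lofiadm -c aes-256-cbc -a /storage/safe.img &&
  zpool import -d /dev/lofi safe &&
  svcadm enable svn'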
Even if you are using encryption, late service starting or other exotic requirements, you can reduce the number of required steps to a very small number (1-5, including typing the password).
Automate, automate, automate...
Sometimes it is not known before the server setup how often you would like to run backup or reporting processes. Rebundling the server just to add a new crontab entry is a thankless task for anyone involved. It is better to prepare the bundle image with a few cron jobs that might never be used; if we do end up needing them, we are not required to re-bundle the image. For example, the following commands help to define an hourly report script:
export EDITOR=nano

As you can see in the sketch below, the script is placed in
the '/safe' directory, which is on the encrypted volume. If for some
reason the encryption / mount fails, or if there is no such file at that
place, there will be no error: the [ -x ... ] directive ensures it will
be executed if and only if it is present and executable. Placing it on the encrypted volume also gives us the opportunity to keep a few more confidential items there, e.g. the script can encrypt the report mail, or use some sftp mechanism to deliver the report to a remote site.
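The entry itself is a single line; /safe/report.sh is just an example name:

crontab -e
(then add a line like this)
0 * * * * [ -x /safe/report.sh ] && /safe/report.sh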
Of course, the type and variety of the scripts you define in your crontab are entirely up to you.
Be patient...
With the ElasticFox plugin, we have encountered some strange problems; e.g. sometimes it takes a very long time to fetch the list of KeyPairs. One impatient member clicked the 'create' button, typed the same name we had used previously, and thereby silently removed our old key and replaced it with a new one. The KeyPair was distributed internally again, but this is the kind of silly move that is better avoided.
published: 2009-07-09, a:István, y:2009, l:aws, l:cloud, l:ebs, l:ec2, l:encryption, l:opensolaris, l:subversion, l:zfs