Thursday, May 15, 2014

Using an m3.medium AWS server for less than the cost of a t1.micro

I've used Amazon Web Services for quite some time. My online football management simulator, MyFootballNow, runs on AWS. The sims spin up on spot instances to help keep costs down, because they need to run on at least an m1.small server to finish in an acceptable amount of time. I also have a test environment where I try to sim a full season every day, to see the long-term effects of various code changes in the game engine. I've been using a physical server for that, because at this stage I don't want to pay for a server at Amazon full-time just to run a test environment. That server has been showing signs of impending failure lately, though, and rather than purchase a new one I began to explore the least expensive way to have a decent server at AWS.

My first thought, of course, was to purchase a reserved instance. With the ability to sell your reservation if you end up not needing it any longer, the risk is low. But it's still a good amount of money up front for a system that isn't generating any revenue yet.

The absolute lowest cost for a server at AWS comes through spot instances. When I moved the sims from on-demand servers to spot instances, my costs dropped dramatically. Could I leverage a spot server to run my test environment?

If you're not familiar with spot instances, Amazon makes their unused compute capacity available at a discounted price. You set a maximum price you're willing to pay, and if the spot price is below your bid, your instance launches and you pay whatever the current spot price is. If the spot price rises above your bid, your instance is terminated. You can't stop a spot instance; you can only terminate it. But you can set up a persistent spot request that will re-launch your spot instance from the AMI once the price goes back below your bid.
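
If you prefer the command line to the console, a persistent spot request can also be made with the aws-cli. This is just a sketch, not my exact setup - the AMI ID, key pair, and security group are placeholders for your own values:

# Sketch of a persistent spot request (AMI ID, key pair, and security group are placeholders)
aws ec2 request-spot-instances \
    --spot-price "0.02" \
    --instance-count 1 \
    --type "persistent" \
    --launch-specification '{
        "ImageId": "ami-xxxxxxxx",
        "InstanceType": "m3.medium",
        "KeyName": "my-key-pair",
        "SecurityGroups": ["my-security-group"]
    }'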

The biggest problem with the persistent request for me was that the instance essentially resets itself every time it launches.  This wouldn't be horrible, but it would mean making sure that I created a new AMI every time I made a significant change to the environment, and if I was shut down then I'd have to roll back to my most recent AMI.  There had to be a better way.

And there was.  When you create a spot instance (or any instance, for that matter) you are presented with the following screen as you create your volumes:


Note the checkbox on the far right "Delete on Termination" - if you uncheck this, the volume will stick around even after the instance has been terminated.  With this orphaned volume, we can recreate our instance at the point it was shut down.
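
If you are making the request from the command line instead of the console, the same setting lives in the block device mapping of the launch specification. A sketch of the relevant fragment, using the /dev/sda device name from my own naming convention:

"BlockDeviceMappings": [
    {
        "DeviceName": "/dev/sda",
        "Ebs": { "DeleteOnTermination": false }
    }
]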

The price of an m3.medium spot instance right now is a few hundredths of a cent less per hour than an on-demand t1.micro, and quite a bit less than a heavy utilization m3.medium reserved instance. The catch is that your instance could be shut down if there is a spike in the spot price, so be cautious about using this technique on a production server that you don't want to disappear unexpectedly. For a development server it works fabulously, and if you set your spot price bid high enough you might even be able to sustain your instance as long as you like.

So, what I did was create a spot instance of type m3.medium with a $0.02 maximum bid (the current rate is $0.008 and hasn't gone up even to $0.009 since Amazon's last price reduction). That means I should pay no more than about $15 per month for this server, and if the spot price never goes above my bid, I could potentially have my m3.medium for about $5/month. You can of course extrapolate this to any server type.
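
For reference, the arithmetic behind those figures, using roughly 730 hours in a month:

730 hours x $0.02/hour  = $14.60 per month (worst case, paying my full bid)
730 hours x $0.008/hour = $5.84 per month (if the spot price stays where it is)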

Back to the catch: if your server shuts down, it will leave its volume behind, but if you're like me and have tried to launch a new instance from a volume in this state, you probably failed miserably. I finally figured out that the reason for this failure was that I was using the default kernel, which didn't match my previous server. So, find the kernel ID for your spot instance:


and make a note of it. I keep it in the name of my volume, e.g. "Dev-01-/dev/sda-aki-919dcaf8", which gives me everything I need to know to launch a new instance from this volume.
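
If you'd rather not dig through the console, the kernel ID is also available from the command line. A sketch assuming the aws-cli is installed, with a placeholder instance ID; for my instance this prints the aki-919dcaf8 you see in the volume name:

$ aws ec2 describe-instances --instance-ids i-xxxxxxxx \
    --query 'Reservations[].Instances[].KernelId' --output text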

To launch an instance with your volume after the system has been terminated, right-click on the volume and create a snapshot.



Now go to the snapshot list, find the snapshot you just created, and create an image from it:


When you create the image, you are given the opportunity to choose the kernel ID - it is important to use the same kernel ID you discovered in the above step.  This is why I always name my spot instance volumes with the kernel ID.
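
The same step can be done from the command line. This is only a sketch - the snapshot ID is a placeholder and the architecture is an assumption you should match to your original instance - but it shows where the kernel ID goes:

aws ec2 register-image \
    --name "Dev-01-restored" \
    --architecture x86_64 \
    --kernel-id aki-919dcaf8 \
    --root-device-name /dev/sda \
    --block-device-mappings '[{"DeviceName":"/dev/sda","Ebs":{"SnapshotId":"snap-xxxxxxxx"}}]'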


My spot instance actually has two volumes plus the instance store - the instance store is gone forever when the instance is terminated, so I only use it for scratch space. To attach the second volume, you'll need to create a snapshot from it as well and then choose that snapshot when you set up the image's volumes. Make sure it's mounted to the same location if you have an /etc/fstab entry for it; again, that's why my volume naming convention contains the /dev/sda part.
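
For what it's worth, the /etc/fstab entry for a second volume looks something like this; the device name, mount point, and filesystem are just examples, the point is that they match how the volume was attached before:

# Example fstab entry for the second EBS volume (device and mount point are illustrative)
/dev/sdf    /data    ext4    defaults,nofail    0 2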

After you've created the image, make a new spot request with that image.  Once you have logged in to the new instance and verified that everything is as it should be, you can delete the old volumes, delete the AMI, and delete the snapshot.

You might also combine this with my script to set a DNS entry on boot, which will keep your server's DNS record up to date. The only gotcha here is that the SSH host key will change with the new image; depending on how you connect via SSH, you may need to remove the old entry from your known_hosts file.

Hope this helps and enjoy your no commitment discount m3.medium servers!

An automatic snapshot management script for AWS

I love using Amazon's servers. I grew up in this industry dealing with physical servers, and they are a pain, especially if you ever need to migrate to a new piece of hardware or have a hard drive fail. With Amazon's ability to snapshot your drives, you can quickly spin up a server that is identical to an existing one, or restore from a snapshot taken before you made that huge mistake that blew up your filesystem.

There's the rub: you need to have those snapshots, and if you are paranoid like me, you need to have them taken regularly. AWS doesn't really have a great facility to manage your snapshots. What I wanted was a way to take a weekly snapshot of all of my drives, but only keep the snapshots for a month so as not to clutter my snapshot list. This post shares the script I wrote to manage these snapshots.

This is actually a really simple script. I'm going to drop it in here, and then tell you how it works:


#!/bin/bash

# Environment needed by the EC2 API tools
export EC2_HOME=/opt/aws/apitools/ec2
export JAVA_HOME=/usr/lib/jvm/jre
# Folder used to keep track of the snapshots this script creates
export SNAPSHOT_LIST=/var/spool/snapshots

# Grab every tagged (named) volume as "volume-id|name"; spaces are replaced
# with hyphens so the names survive the word splitting in the loop below
VOLUMES=$(/opt/aws/bin/ec2-describe-volumes | sed 's/ /-/g' | grep TAG | cut -f 3,5 --output-delimiter='|')

for line in $VOLUMES
do
    VOLUME=`echo $line | cut -f 1 --delimiter='|'`
    NAME=`echo $line | cut -f 2 --delimiter='|'`-`date "+%m-%d-%y"`
    # Create the snapshot and record its ID in a file named after it
    SNAP=`/opt/aws/bin/ec2-create-snapshot --description $NAME $VOLUME | cut -f 2`
    echo $SNAP > $SNAPSHOT_LIST/$SNAP
    echo $NAME snapshot to $SNAP
done

echo
echo

# Purge snapshots whose tracking file is more than 30 days old
find $SNAPSHOT_LIST -ctime +30 -type f -execdir /usr/local/bin/delete_snapshot {} \;

First we set some environment variables needed by the Amazon tools. The third one, SNAPSHOT_LIST, is a folder I'm going to use to keep track of my snapshots. You'll need to create this folder and give write access to the user that will run this script.

Next, I call ec2-describe-volumes to retrieve all of the volumes in my account. I replace spaces with hyphens, keep only the TAG lines, and use cut to pull out the volume ID and volume name. One feature of this script is that it will only snapshot volumes that you have named - so if you have something temporary, you can leave its volume unnamed. You certainly could modify this script to do every volume, but you'd need some other way to make it obvious which volume each snapshot came from. Here, each snapshot is named after its volume with the date stamped on the end.

Next, we execute ec2-create-snapshot to create the snapshot for the volume and store the snapshot ID in a file in our snapshot list folder. Those files tell us how old our snapshots are, which is what the last line does: it finds any files in the snapshot list older than 30 days and runs /usr/local/bin/delete_snapshot on each one, which is the second script in our system:


#!/bin/bash

# Receives a file that contains the snapshot-id that we want to delete

SNAP=`cat $1`
echo Deleting snapshot $SNAP
/opt/aws/bin/ec2-delete-snapshot $SNAP
rm $1

This one is pretty easy: we grab the snapshot ID from the file (which also happens to be the file name) and execute ec2-delete-snapshot to delete it. Then we remove the tracking file.

I have this set to run every Monday morning. It does a very nice job of keeping one month's supply of snapshots should something catastrophic happen.
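
For reference, the cron entry looks something like this; the script path and the time are examples, so adjust them to wherever you installed the script and whenever you want it to run:

# /etc/cron.d/snapshot_volumes - take the weekly snapshots every Monday at 4 AM
0 4 * * 1 root /usr/local/bin/snapshot_volumes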

Have your AWS server set its own DNS on boot

One of the huge advantages of using Amazon's Web Services is the fact that you can turn virtual machines on and off, and only pay for the time that the instance is running. This can be really cool if you need a staging server of some sort and only need it running during the work day, or even just on the days you are working on a particular project. One huge disadvantage of this, though, is that you get a new IP address every time you boot an instance. Sure, you could use an Elastic IP address, or you could manually set your DNS record each time you boot, but why pay for the IP address when you're not using it (Amazon charges you by the hour that it's not connected to an instance)? And why go through the trouble of setting your own DNS? You have a server at your fingertips; let it grow up and take care of itself!

Unfortunately, there is not a Route 53 implementation in the command line tools that are automatically installed on the Amazon AMI.  There is, however, a great application by Barnaby Gray called cli53 which is on github here: https://github.com/barnybug/cli53.

Installation is painless via the python pip package management system.  Here is an install on a fresh Amazon AMI instance:

First, I'm installing pip (because it doesn't come automatically):

$ sudo yum install python-pip

Now we can install cli53:

$ sudo pip install cli53

That's it, it is now installed. In order for cli53 to run, it needs either the AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY environment variables set, or the credentials placed in a file located at ~/.boto. I'm using the .boto method, so using your favorite text editor create a file that contains the following:

[Credentials]
aws_access_key_id = XXXXXXXXXXXXXX
aws_secret_access_key = XXXXXXXXXXXXXXXXXXXXXXXXXXXXX

You can also launch your server with an IAM role that has access to Route 53, in which case you won't have to set these credentials manually.
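
If you go the role route, the policy attached to the role needs to allow the Route 53 calls that cli53 makes. I haven't pinned down the exact minimum set of actions, but a sketch along these lines - letting it list hosted zones and change record sets - is the general idea:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "route53:ListHostedZones",
        "route53:GetHostedZone",
        "route53:ListResourceRecordSets",
        "route53:ChangeResourceRecordSets"
      ],
      "Resource": "*"
    }
  ]
}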

The next step is to install the script that will set the DNS entry for the server. Amazon provides several metadata URLs that your server can request to get information about itself. We'll be using http://instance-data.ec2.internal/latest/meta-data/public-hostname, which returns our public hostname. We'll use this to add a CNAME entry to our Route 53 DNS.


#!/bin/bash

# This script will set the DNS record for this server.  Run from rc.local or cron to set on boot.
# Requires cli53 https://github.com/barnybug/cli53

# Print an error and bail out
die() { echo "$@" >&2; exit 1; }

# This is the domain we are updating
DOMAIN=example.com
# This is the subdomain
SUBDOMAIN=staging

# Obtain our public host name from the instance metadata
PUBLIC_HOST_NAME="`wget -q -O - http://instance-data.ec2.internal/latest/meta-data/public-hostname || die \"wget public-hostname has failed: $?\"`"
test -n "$PUBLIC_HOST_NAME" || die 'cannot obtain public-hostname'

echo Setting $SUBDOMAIN.$DOMAIN to $PUBLIC_HOST_NAME
echo

# Create or replace the CNAME record with a five-minute TTL
cli53 rrcreate $DOMAIN $SUBDOMAIN CNAME $PUBLIC_HOST_NAME --ttl 300 --replace

Let me go over this line by line. First we have two variables that we're setting: $DOMAIN and $SUBDOMAIN. $DOMAIN is a domain that you have in Route 53; it's the primary domain name, not the subdomain. $SUBDOMAIN is the actual subdomain element that you want to set the CNAME for. In my case, I'm using example.com and setting the subdomain staging, so my server will receive the DNS entry staging.example.com.

The next part queries http://instance-data.ec2.internal/latest/meta-data/public-hostname to get our public hostname. This is the long public DNS entry that looks like ec2-X-X-X-X.compute-1.amazonaws.com, and we set our CNAME to this value. We could just as easily get the IP address, but using the public hostname has the advantage that it resolves to the internal IP address when queried from another AWS instance, keeping traffic between your instances on the internal network.

Lastly, the cli53 command replaces the CNAME entry for this hostname. This will only work if there is no existing record for this subdomain, or if the existing record is already a CNAME; i.e. it will fail if you have an A record.

Once you have the script ready, run it from the command line. You should see output something like this:


Setting staging.example.com to ec2-X-X-X-X.compute-1.amazonaws.com

Success
ChangeInfo:
  Status: PENDING
  SubmittedAt: 2014-05-15T15:51:11.975Z
  Id: /change/C3LPBE20HC9ITN

If you now go into your Route53 console, you should see the new entry with a TTL of 5 minutes.
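
You can also check from the command line; once the change has gone through, something like this should return the CNAME target (the hostname will of course be your own):

$ dig +short CNAME staging.example.com
ec2-X-X-X-X.compute-1.amazonaws.com.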

The next step is to get this script to execute every time the machine boots. Enter cron, which has a handy option to execute on boot.

Using your text editor, create a file at /etc/cron.d/set_route_53 with the following contents:


EC2_HOME=/opt/aws/apitools/ec2
JAVA_HOME=/usr/lib/jvm/jre

@reboot [user] /usr/local/bin/set_route_53

where [user] is the user whose home directory contains the .boto file. The environment variables at the top are there so they are accessible to the script; they are used by the AWS command line utilities.

Now, reboot your machine.  If you have mail configured for the user running the cron job, you'll get an e-mail with the output of the command.  If you stop and then start your instance, you should see the Route53 entry update.  With the TTL set to 5 minutes, you should always be able to access your instance within 5 minutes of its boot being complete, and you don't have to lift another finger!