HowTo: Use AWS Autoscaling API to Deploy High Availability Clusters

One of the big draws of services such as Scalr and RightScale is the ability to manage clusters of high-availability servers. However, these services cost money on top of the infrastructure costs of using AWS itself. For this HowTo you will need:

Amazon Auto Scaling Command Line Tools (How To Install)
Amazon Elastic Load Balancer API Tools (How To Install)

Here are the steps for resilient multi-AZ deployments:

Setup

Create a Load Balancer

This can be created using Amazon’s AWS Management Console at https://console.aws.amazon.com/ec2/home?region=us-west-2#s=LoadBalancers

Add Zones To Load Balancer

Add all availability zones in the region to your load balancer. The following example assumes there’s only one load balancer in the region. Add | grep "_loadbalancername_" after elb-describe-lbs if there is more than one.

Console - user@hostname ~ $

elb-enable-zones-for-lb \
`elb-describe-lbs | awk '{ print $2 }'` \
--headers \
--availability-zones us-west-2a,us-west-2b
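When there is more than one load balancer, a plain grep can match more names than intended (test-lb-1 also matches test-lb-10). The following is a minimal sketch of an exact-match filter. The function name lb_name_exact is hypothetical, and it assumes the load balancer name sits in the second column of the elb-describe-lbs output, as the awk call above does.

```shell
# Print the name column for the load balancer whose name equals $1 exactly.
# Reads an elb-describe-lbs listing on stdin; the name is assumed to be in
# column 2, as in the awk call above.
lb_name_exact() {
    awk -v want="$1" '$2 == want { print $2 }'
}

# Hypothetical usage (test-lb-1 is the load balancer name used later on):
#   elb-enable-zones-for-lb \
#       "$(elb-describe-lbs | lb_name_exact test-lb-1)" \
#       --availability-zones us-west-2a,us-west-2b
```

If the name is not found, the function prints nothing, so it is worth checking for an empty result before calling elb-enable-zones-for-lb.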

Create the Launch Configuration

Next, create the launch config. Every option is scriptable using grep and awk:

Console - user@hostname ~ $

as-create-launch-config \
--image-id ami-fa971aca \
--instance-type t1.micro \
--monitoring-enabled \
--kernel aki-c2e26ff2 \
--group sg-3666ef06 \
--user-data "Created by Launch Config test-lc-1" \
--launch-config test-lc-1

Create the Auto Scaling Group

Next, create the auto scaling group using the launch config and the load balancer you set up already:

Console - user@hostname ~ $

as-create-auto-scaling-group \
--availability-zones us-west-2a,us-west-2b \
--launch-configuration test-lc-1 \
--min-size 2 \
--max-size 4 \
--grace-period 240 \
--default-cooldown 320 \
--health-check-type ELB \
--load-balancers test-lb-1 \
--tag "k=name,v=test-asgroup-1,p=true" \
--auto-scaling-group test-asgroup-1

Create Scaling Policies

Next, create your scaling policies, that is, what happens when the health checks fail or traffic has slowed down after spinning up extra instances:

Scale Down

Notice the --adjustment=-1 part. The equals sign is required for negative values!

Console - user@hostname ~ $

as-put-scaling-policy \
--auto-scaling-group test-asgroup-1 \
--cooldown 300 \
--adjustment=-1 \
--type ChangeInCapacity \
--name test-1-policy-down

Scale Up

For positive values the equals sign is not required, but using it keeps both commands consistent.

Console - user@hostname ~ $

as-put-scaling-policy \
--auto-scaling-group test-asgroup-1 \
--cooldown 300 \
--adjustment=1 \
--type ChangeInCapacity \
--name test-1-policy-up
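If you script both policies, one way to avoid forgetting the equals sign is to always build the argument in the --adjustment=N form. This is a small sketch only. The helper name adjustment_arg is hypothetical, and the reason given in the comment (a bare -1 looking like an option flag) is an assumption about why the tool needs the equals sign.

```shell
# Print an --adjustment argument in the equals-sign form for a whole number.
# Assumption: a bare "-1" would be read as an option flag, which is why the
# equals sign is needed for negative values.
adjustment_arg() {
    case "$1" in
        ''|*[!0-9-]*|*-|?*-*)
            echo "not an integer: $1" >&2
            return 1
            ;;
    esac
    printf -- '--adjustment=%s\n' "$1"
}

# Hypothetical usage:
#   as-put-scaling-policy \
#       --auto-scaling-group test-asgroup-1 \
#       --cooldown 300 \
#       "$(adjustment_arg -1)" \
#       --type ChangeInCapacity \
#       --name test-1-policy-down
```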

Check Instances

The auto scaling group launches its minimum number of instances as soon as it exists, so by now you should see instances starting in the zones you specified, and eventually the load balancer will show them. Check with:

Console - user@hostname ~ $

ec2-describe-instances | grep pending

or, if you’re using a fast instance type/OS:

Console - user@hostname ~ $

ec2-describe-instances | grep running
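The grep checks above also match any other line that happens to contain the word (a tag value, for example). As a rough sketch, the function below counts only lines that start with INSTANCE and have the state as a whole field. The function name is hypothetical, and the assumption that instance lines begin with the word INSTANCE should be checked against your own ec2-describe-instances output.

```shell
# Count instances in a given state.
# Reads ec2-describe-instances output on stdin; $1 is the state to count.
# Assumption: each instance is reported on a line whose first field is INSTANCE.
count_instances_in_state() {
    awk -v state="$1" '
        $1 == "INSTANCE" {
            for (i = 2; i <= NF; i++) {
                if ($i == state) { n++; break }
            }
        }
        END { print n + 0 }
    '
}

# Hypothetical usage:
#   ec2-describe-instances | count_instances_in_state running
```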

Post Setup

Once those are up, you can assign your elastic load balancer a domain name using Route 53.

Redeployment

On to deployments!

When you’ve got to redeploy your cluster, it’s pretty simple.

New Launch Configuration

First, make a new launch config:

Console - user@hostname ~ $

as-create-launch-config \
--image-id ami-dc921fec \
--instance-type t1.micro \
--monitoring-enabled \
--kernel aki-c2e26ff2 \
--group sg-3666ef06 \
--user-data "Created by Launch Config test-lc-2" \
--launch-config test-lc-2

Note that the launch config name needs to be unique. I am open to suggestions on how to do this efficiently. I’ve been deleting the old launch config whenever I would otherwise need a third one, so I rotate between lc-1 and lc-2. A better solution is to use the date plus a random string and never delete them, so you can see the history, but that’s an advanced topic.
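The "date plus a random string" idea could look something like the sketch below. The function name unique_lc_name and the exact name layout (prefix, timestamp, eight hex characters) are my own choices for illustration, not anything the Auto Scaling tools require.

```shell
# Build a launch config name of the form PREFIX-YYYYMMDD-HHMMSS-xxxxxxxx,
# where xxxxxxxx is eight random hex characters read from /dev/urandom.
unique_lc_name() {
    ulc_stamp=$(date +%Y%m%d-%H%M%S)
    ulc_suffix=$(od -An -N4 -tx1 /dev/urandom | tr -d ' \n')
    printf '%s-%s-%s\n' "$1" "$ulc_stamp" "$ulc_suffix"
}

# Hypothetical usage:
#   LCNAME=$(unique_lc_name test-lc)
#   as-create-launch-config --launch-config "$LCNAME" ...
```

Because the timestamp comes first, the names sort chronologically, which makes the history easy to read if you keep old launch configs around.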

Update Scaling Group

Next, update your scaling group with the new launch config:

Console - user@hostname ~ $

as-update-auto-scaling-group \
--name test-asgroup-1 \
--launch-configuration test-lc-2

Note that the group name stays the same; only the launch config changes here.

Remove Old Instances

Next, remove the old instances. You can script this too, but for example:

Console - user@hostname ~ $

as-set-instance-health \
i-5e8fde6e \
--status Unhealthy \
--no-respect-grace-period
sleep 120
as-set-instance-health \
i-747e463a \
--status Unhealthy \
--no-respect-grace-period

Note, this will shut off the old instances, and spin up replacements!

Do this at intervals of at least one minute so that end users don’t lose their connection. Basically, wait until the replacement instance (launched from your new AMI) is in the load balancer before running the command on the next old instance you want to un-deploy.
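Instead of a fixed sleep, the wait can be expressed as "poll until a check passes, but give up after a while". The sketch below is one way to do that. Both function names are hypothetical, and scaling_idle simply treats "no InProgress line in as-describe-scaling-activities" as the signal that the replacement has finished, which is an approximation of "the new instance is in the load balancer".

```shell
# Run a command repeatedly until it succeeds.
# $1 = maximum number of attempts, $2 = seconds to sleep between attempts,
# remaining arguments = the command to run.
# Returns 0 once the command succeeds, 1 if every attempt failed.
wait_until() {
    wu_tries=$1
    wu_delay=$2
    shift 2
    wu_n=0
    until "$@"; do
        wu_n=$((wu_n + 1))
        if [ "$wu_n" -ge "$wu_tries" ]; then
            return 1
        fi
        sleep "$wu_delay"
    done
    return 0
}

# Succeeds when no scaling activity is reported as InProgress.
scaling_idle() {
    [ "$(as-describe-scaling-activities | grep -c InProgress)" = "0" ]
}

# Hypothetical usage (up to 60 checks, 10 seconds apart):
#   as-set-instance-health i-5e8fde6e --status Unhealthy --no-respect-grace-period
#   wait_until 60 10 scaling_idle || echo "gave up waiting" >&2
```

The attempt limit means a stuck scaling activity ends the wait with a non-zero status rather than blocking the next step forever.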

Removal

To undeploy the entire scaling group:

Take Instances Offline

This takes all instances in this group offline:

Console - user@hostname ~ $

as-update-auto-scaling-group \
test-asgroup-1 \
--min-size 0 \
--max-size 0

Remove Group and Launch Configurations

Then remove the auto scaling group and the launch configs for this cluster:

Console - user@hostname ~ $

as-delete-auto-scaling-group test-asgroup-1
as-delete-launch-config test-lc-1
as-delete-launch-config test-lc-2

And that’s it.

Script to accomplish this in an automated fashion

I literally use this in a t1.micro production web server cluster, named “CLUSTER” on the AWS console:

deploy.sh

#!/bin/bash

#this script is provided as-is with no warranty implied or expressed.

#Use at your own risk!

 
CURDATE=`date +%d%m%Y`
 
## Get current instance ID:
CLUSTERID=`ec2-describe-instances |grep CLUSTER |awk '{ print $3 }'`
echo "Current CLUSTER instance ID: ${CLUSTERID}"
 
##snapshot the instance without rebooting
NEWAMI=`ec2-create-image $CLUSTERID \
     --no-reboot \
     --name "CLUSTER-$CURDATE-deploy" \
     |awk '{ print $2 }'`
 
##Create launch config for this new AMI
as-create-launch-config CLUSTER-lc-$CURDATE \
     --image-id $NEWAMI \
     --instance-type t1.micro \
     --group CLUSTER-SG \
     --user-data "created by CLUSTER-asgroup"
 
##wait for the image to not be pending (this command will not show anything until the image is complete!)
TESTVAR=`ec2-describe-images |grep pending |wc -l |awk '{ print $1 }'`
sleep 10
 
##
until [[ ${TESTVAR} = "0" ]]; do
 
        TESTVAR=`ec2-describe-images |grep pending |wc -l |awk '{ print $1 }'`
        echo "Checking if the AMI snapshot is completed..."
        sleep 20
 
done
 
##update autoscaling group:
as-update-auto-scaling-group CLUSTER-asgroup \
     --launch-configuration CLUSTER-lc-$CURDATE
 
 
FNR=`ec2-describe-instances |grep "CLUSTER-asgroup"|awk '{ print $3 }'`
echo "Please wait up to 10 minutes for this section to complete! Seriously. Ten of them."
 
##
for i in $FNR; do
 
        as-set-instance-health $i --status Unhealthy --no-respect-grace-period
        echo "Sleeping for two minutes, please don't cancel this script..."
        sleep 120
        until [[ `as-describe-scaling-activities |grep InProgress \
             |wc -l |awk '{ print $1 }'` == "0" ]]; do
 
                echo "Waiting for new instance to finish deploying."
                sleep 10
 
        done
done
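The script above waits until no image in the account is pending, so an unrelated snapshot can hold it up. A narrower check could look at the state of the new AMI only. The sketch below is an illustration under assumptions: the function name ami_state is hypothetical, and it assumes ec2-describe-images reports each image on a line starting with IMAGE, with the AMI ID in the second column and the state (pending, available or failed) as a separate field somewhere after it.

```shell
# Print the state (pending, available or failed) of one AMI.
# Reads ec2-describe-images output on stdin; $1 is the AMI ID.
# Assumption: image lines start with IMAGE and carry the AMI ID in column 2.
# Prints nothing if the AMI is not listed.
ami_state() {
    awk -v ami="$1" '
        $1 == "IMAGE" && $2 == ami {
            for (i = 3; i <= NF; i++) {
                if ($i == "pending" || $i == "available" || $i == "failed") {
                    print $i
                    exit
                }
            }
        }
    '
}

# Hypothetical usage inside the wait loop:
#   until [ "$(ec2-describe-images | ami_state "$NEWAMI")" = "available" ]; do
#       echo "Checking if the AMI snapshot is completed..."
#       sleep 20
#   done
```

A real loop would also want to stop when the state comes back as failed, otherwise it would wait forever.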