This series shows how to extract request information from the Elastic MapReduce Ruby client and use it to build the equivalent command with the AWS CLI tool. This article looks specifically at running a Cascading cluster.

Elastic MapReduce Ruby client

Credentials

~/.aws/credentials.json

{
"access_id": "C99F5C7EE00F1EXAMPLE",
"private_key": "a63xWEj9ZFbigxqA7wI3Nuwj3mte3RDBdEXAMPLE",
"keypair": "my-key",
"key-pair-file": "~/.ssh/my-key.pem",
"log_uri": "s3n://my-bucket/hadoop/",
"region": "us-east-1"
}

Create the job flow

Console - user@hostname ~ $

elastic-mapreduce -v \
--create \
--name "Test Cascading" \
--instance-group MASTER \
--bid-price 0.06 \
--instance-count 1 \
--instance-type m1.small \
--instance-group CORE \
--bid-price 0.06 \
--instance-count 2 \
--instance-type m1.small \
--jar "s3n://elasticmapreduce/samples/cloudfront/logprocessor.jar" \
--args \
"-input","s3n://elasticmapreduce/samples/cloudfront/input",\
"-start","any",\
"-end","2010-12-27-02 300",\
"-output","s3n://my-bucket/cloudfront/output/2010-12-27-02",\
"-overallVolumeReport",\
"-objectPopularityReport",\
"-clientIPReport",\
"-edgeLocationReport" \
-c ~/.aws/credentials.json

Output

Requesting URL:
https://us-east-1.elasticmapreduce.amazonaws.com/
Query string:
Steps.member.1.HadoopJarStep.Args.member.7=-output&Instances.KeepJobFlowAliveWhenNoSteps=false&LogUri=s3n%3A%2F%2Fmy-bucket%2Fhadoop%2F&Steps.member.1.HadoopJarStep.Args.member.5=-end&Steps.member.1.HadoopJarStep.Args.member.4=any&Instances.Ec2KeyName=my-key&Instances.InstanceGroups.member.1.InstanceRole=MASTER&Instances.InstanceGroups.member.2.InstanceType=m1.small&Name=Test%20Cascading&Steps.member.1.HadoopJarStep.Args.member.3=-start&Steps.member.1.HadoopJarStep.Jar=s3n%3A%2F%2Felasticmapreduce%2Fsamples%2Fcloudfront%2Flogprocessor.jar&Steps.member.1.HadoopJarStep.Args.member.9=-overallVolumeReport&Instances.InstanceGroups.member.1.Market=SPOT&Timestamp=2013-05-16T00%3A28%3A39%2B00%3A00&Instances.InstanceGroups.member.1.BidPrice=0.06&Instances.InstanceGroups.member.2.Market=SPOT&VisibleToAllUsers=false&Steps.member.1.HadoopJarStep.Args.member.10=-objectPopularityReport&SignatureVersion=2&AWSAccessKeyId=C99F5C7EE00F1EXAMPLE&Steps.member.1.HadoopJarStep.Args.member.8=s3n%3A%2F%2Fmy-bucket%2Fcloudfront%2Foutput%2F2010-12-27-02&Instances.InstanceGroups.member.2.InstanceRole=CORE&Instances.TerminationProtected=false&Instances.InstanceGroups.member.1.InstanceCount=1&Steps.member.1.HadoopJarStep.Args.member.11=-clientIPReport&Steps.member.1.ActionOnFailure=CANCEL_AND_WAIT&Steps.member.1.Name=Example%20Jar%20Step&Instances.InstanceGroups.member.1.InstanceType=m1.small&ContentType=JSON&Steps.member.1.HadoopJarStep.Args.member.2=s3n%3A%2F%2Felasticmapreduce%2Fsamples%2Fcloudfront%2Finput&Signature=BpHRNVUCIPnfi%2B8rLQEpdr3chl7Bjiw5AOh4GZzChbs%3D&Instances.InstanceGroups.member.2.InstanceCount=2&Action=RunJobFlow&Instances.InstanceGroups.member.2.BidPrice=0.06&Steps.member.1.HadoopJarStep.Args.member.1=-input&Steps.member.1.HadoopJarStep.Args.member.12=-edgeLocationReport&Instances.InstanceGroups.member.1.Name=Master%20Instance%20Group&Steps.member.1.HadoopJarStep.Args.member.6=2010-12-27-02%20300&AmiVersion=latest&SignatureMethod=HmacSHA256&Instances.InstanceGroups.member.2.Name=Core%20Instance%20Group
Headers:
x-amzn-RequestIdc96170a8-24d5-41fd-bd68-4a32cc7cf85dHostus-east-1.elasticmapreduce.amazonaws.com:443User-Agentruby-client
Created job flow j-0G438CH39THZW

Formatted Output

Output - Requesting URL

https://us-east-1.elasticmapreduce.amazonaws.com/

Output - Parameters

AWSAccessKeyId=C99F5C7EE00F1EXAMPLE
Action=RunJobFlow
AmiVersion=latest
ContentType=JSON
Instances.Ec2KeyName=my-key
Instances.InstanceGroups.member.1.BidPrice=0.06
Instances.InstanceGroups.member.1.InstanceCount=1
Instances.InstanceGroups.member.1.InstanceRole=MASTER
Instances.InstanceGroups.member.1.InstanceType=m1.small
Instances.InstanceGroups.member.1.Market=SPOT
Instances.InstanceGroups.member.1.Name=Master Instance Group
Instances.InstanceGroups.member.2.BidPrice=0.06
Instances.InstanceGroups.member.2.InstanceCount=2
Instances.InstanceGroups.member.2.InstanceRole=CORE
Instances.InstanceGroups.member.2.InstanceType=m1.small
Instances.InstanceGroups.member.2.Market=SPOT
Instances.InstanceGroups.member.2.Name=Core Instance Group
Instances.KeepJobFlowAliveWhenNoSteps=false
Instances.TerminationProtected=false
LogUri=s3n://my-bucket/hadoop/
Name=Test Cascading
Signature=BpHRNVUCIPnfi+8rLQEpdr3chl7Bjiw5AOh4GZzChbs=
SignatureMethod=HmacSHA256
SignatureVersion=2
Steps.member.1.ActionOnFailure=CANCEL_AND_WAIT
Steps.member.1.HadoopJarStep.Args.member.1=-input
Steps.member.1.HadoopJarStep.Args.member.10=-objectPopularityReport
Steps.member.1.HadoopJarStep.Args.member.11=-clientIPReport
Steps.member.1.HadoopJarStep.Args.member.12=-edgeLocationReport
Steps.member.1.HadoopJarStep.Args.member.2=s3n://elasticmapreduce/samples/cloudfront/input
Steps.member.1.HadoopJarStep.Args.member.3=-start
Steps.member.1.HadoopJarStep.Args.member.4=any
Steps.member.1.HadoopJarStep.Args.member.5=-end
Steps.member.1.HadoopJarStep.Args.member.6=2010-12-27-02 300
Steps.member.1.HadoopJarStep.Args.member.7=-output
Steps.member.1.HadoopJarStep.Args.member.8=s3n://my-bucket/cloudfront/output/2010-12-27-02
Steps.member.1.HadoopJarStep.Args.member.9=-overallVolumeReport
Steps.member.1.HadoopJarStep.Jar=s3n://elasticmapreduce/samples/cloudfront/logprocessor.jar
Steps.member.1.Name=Example Jar Step
Timestamp=2013-05-16T00:28:39+00:00
VisibleToAllUsers=false
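
The parameter list above can be reproduced from the raw query string with a few standard tools. The following is a minimal sketch, assuming bash (its `printf '%b'` expands `\x` escapes) and a query string that uses only percent-encoding. `decode_query` is a hypothetical helper name, not part of the Ruby client.

```shell
# Hypothetical helper (not part of the Ruby client): read a query string
# on stdin, print one percent-decoded parameter per line, sorted.
decode_query() {
  tr '&' '\n' |
    while IFS= read -r pair || [ -n "$pair" ]; do
      # Rewrite each %XX escape as \xXX so that printf '%b' expands it.
      printf '%b\n' "${pair//%/\\x}"
    done |
    LC_ALL=C sort
}

# Example with three parameters taken from the verbose output.
printf '%s\n' 'Name=Test%20Cascading&LogUri=s3n%3A%2F%2Fmy-bucket%2Fhadoop%2F&Action=RunJobFlow' | decode_query
```

The ordering may differ slightly from the listing above, because `sort` here compares whole lines byte by byte rather than comparing parameter names only.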

Output - Headers

Host: us-east-1.elasticmapreduce.amazonaws.com:443
User-Agent: ruby-client
x-amzn-RequestId: c96170a8-24d5-41fd-bd68-4a32cc7cf85d

Output - Non-verbose output

Created job flow j-0G438CH39THZW

API Request

Example API Request

https://us-east-1.elasticmapreduce.amazonaws.com/
?Action=RunJobFlow
&Name=Test Cascading
&Instances.Ec2KeyName=my-key
&Instances.InstanceGroups.member.1.Name=Master Instance Group
&Instances.InstanceGroups.member.1.InstanceRole=MASTER
&Instances.InstanceGroups.member.1.InstanceType=m1.small
&Instances.InstanceGroups.member.1.InstanceCount=1
&Instances.InstanceGroups.member.1.Market=SPOT
&Instances.InstanceGroups.member.1.BidPrice=0.06
&Instances.InstanceGroups.member.2.Name=Core Instance Group
&Instances.InstanceGroups.member.2.InstanceRole=CORE
&Instances.InstanceGroups.member.2.InstanceType=m1.small
&Instances.InstanceGroups.member.2.InstanceCount=2
&Instances.InstanceGroups.member.2.Market=SPOT
&Instances.InstanceGroups.member.2.BidPrice=0.06
&Instances.KeepJobFlowAliveWhenNoSteps=false
&Instances.TerminationProtected=false
&Steps.member.1.Name=Example Jar Step
&Steps.member.1.ActionOnFailure=CANCEL_AND_WAIT
&Steps.member.1.HadoopJarStep.Jar=s3n://elasticmapreduce/samples/cloudfront/logprocessor.jar
&Steps.member.1.HadoopJarStep.Args.member.1=-input
&Steps.member.1.HadoopJarStep.Args.member.2=s3n://elasticmapreduce/samples/cloudfront/input
&Steps.member.1.HadoopJarStep.Args.member.3=-start
&Steps.member.1.HadoopJarStep.Args.member.4=any
&Steps.member.1.HadoopJarStep.Args.member.5=-end
&Steps.member.1.HadoopJarStep.Args.member.6=2010-12-27-02 300
&Steps.member.1.HadoopJarStep.Args.member.7=-output
&Steps.member.1.HadoopJarStep.Args.member.8=s3n://my-bucket/cloudfront/output/2010-12-27-02
&Steps.member.1.HadoopJarStep.Args.member.9=-overallVolumeReport
&Steps.member.1.HadoopJarStep.Args.member.10=-objectPopularityReport
&Steps.member.1.HadoopJarStep.Args.member.11=-clientIPReport
&Steps.member.1.HadoopJarStep.Args.member.12=-edgeLocationReport
&LogUri=s3n://my-bucket/hadoop/
&AmiVersion=latest
&VisibleToAllUsers=false
&*AUTHPARAMS*
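
The JSON keys used by the AWS CLI command in the next section mirror the API parameter names above, converted from CamelCase to lowercase with underscores. The sketch below shows one rule that reproduces the key names used in this article. `to_cli_key` is a hypothetical helper, the rule is inferred from these examples only, and other versions of the CLI may expect different key names.

```shell
# Hypothetical helper: convert an API-style CamelCase name into the
# underscore-separated form used by the CLI command in this article.
# The rule is inferred from the names in this article, not from the CLI source.
to_cli_key() {
  printf '%s\n' "$1" |
    sed -E -e 's/([a-z])([A-Z0-9])/\1_\2/g' -e 's/([0-9])([A-Z])/\1_\2/g' |
    tr '[:upper:]' '[:lower:]'
}

to_cli_key Ec2KeyName                   # ec_2_key_name
to_cli_key KeepJobFlowAliveWhenNoSteps  # keep_job_flow_alive_when_no_steps
to_cli_key HadoopJarStep                # hadoop_jar_step
```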

AWS CLI

Console - user@hostname ~ $

aws --region us-east-1 emr \
run-job-flow \
--name "Test Cascading" \
--instances "{
    \"ec_2_key_name\": \"my-key\",
    \"instance_groups\": [
        {
            \"name\": \"Master Instance Group\",
            \"instance_role\": \"MASTER\",
            \"instance_type\": \"m1.small\",
            \"instance_count\": 1,
            \"market\": \"SPOT\",
            \"bid_price\": \"0.06\"
        },
        {
            \"name\": \"Core Instance Group\",
            \"instance_role\": \"CORE\",
            \"instance_type\": \"m1.small\",
            \"instance_count\": 2,
            \"market\": \"SPOT\",
            \"bid_price\": \"0.06\"
        }
    ],
    \"keep_job_flow_alive_when_no_steps\": false,
    \"termination_protected\": false
}" \
--steps "[
    {
        \"name\": \"Example Jar Step\",
        \"action_on_failure\": \"CANCEL_AND_WAIT\",
        \"hadoop_jar_step\": {
            \"jar\": \"s3n://elasticmapreduce/samples/cloudfront/logprocessor.jar\",
            \"args\": [
                \"-input\",
                \"s3n://elasticmapreduce/samples/cloudfront/input\",
                \"-start\",
                \"any\",
                \"-end\",
                \"2010-12-27-02 300\",
                \"-output\",
                \"s3n://my-bucket/cloudfront/output/2010-12-27-02\",
                \"-overallVolumeReport\",
                \"-objectPopularityReport\",
                \"-clientIPReport\",
                \"-edgeLocationReport\"
            ]
        }
    }
]" \
--log-uri "s3n://my-bucket/hadoop/" \
--ami-version "latest"

Output

{
    "ResponseMetadata": {
        "RequestId": "b1c37304-8d77-42d3-a678-97518e3dc3b1"
    }, 
    "JobFlowId": "j-Y0KOFGCVBPO87"
}
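
When the command is run from a script, the job flow ID usually needs to be captured for later calls. The following is a minimal sketch that assumes the response is laid out as shown above, with the `JobFlowId` key and its value on one line. `job_flow_id` is a hypothetical helper name, and a real JSON parser would be more robust.

```shell
# Hypothetical helper: read the CLI's JSON response on stdin and print
# the value of the "JobFlowId" field.
job_flow_id() {
  sed -n 's/.*"JobFlowId": *"\([^"]*\)".*/\1/p'
}

# Sample response copied from the output above.
response='{
    "ResponseMetadata": {
        "RequestId": "b1c37304-8d77-42d3-a678-97518e3dc3b1"
    },
    "JobFlowId": "j-Y0KOFGCVBPO87"
}'

printf '%s\n' "$response" | job_flow_id
```

In practice the output of the `aws` command would be piped into the helper in place of the sample response.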
