Dowd and Associates

HowTo: AWS CLI Elastic MapReduce - Custom JAR Job Flow

In this series, we look at how to extract the request details from the Elastic MapReduce ruby client and use them to build the equivalent command with the aws cli tool. This article looks specifically at creating a job flow that runs a custom JAR file.

Elastic MapReduce ruby client

Credentials

~/.credentials.json
{
"access_id": "C99F5C7EE00F1EXAMPLE",
"private_key": "a63xWEj9ZFbigxqA7wI3Nuwj3mte3RDBdEXAMPLE",
"keypair": "my-key",
"key-pair-file": "~/.ssh/my-key.pem",
"log_uri": "s3n://my-bucket/hadoop/",
"region": "us-east-1"
}
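
Several fields in this file reappear later as request parameters. The following is a minimal sketch of that relationship, inferred only from the request shown in this article and not taken from the ruby client's source. The function name `request_defaults` and the `endpoint` key are hypothetical, chosen for illustration.

```python
import json


def request_defaults(credentials_text):
    """Map ruby-client credential fields to the request values they produce.

    The mapping is inferred from the verbose output in this article;
    it is an assumption, not the client's actual implementation.
    """
    creds = json.loads(credentials_text)
    return {
        # The region selects the service endpoint.
        "endpoint": "https://%s.elasticmapreduce.amazonaws.com/" % creds["region"],
        "AWSAccessKeyId": creds["access_id"],
        "Instances.Ec2KeyName": creds["keypair"],
        "LogUri": creds["log_uri"],
    }
```

The private key never appears in the request itself; it is only used to compute the `Signature` parameter.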

Create Job Flow

Console - user@hostname ~ $
elastic-mapreduce -v \
--create \
--name "Test custom JAR" \
--instance-group MASTER \
--bid-price 0.06 \
--instance-count 1 \
--instance-type m1.small \
--instance-group CORE \
--bid-price 0.06 \
--instance-count 2 \
--instance-type m1.small \
--jar s3n://elasticmapreduce/samples/cloudburst/cloudburst.jar \
--arg s3n://elasticmapreduce/samples/cloudburst/input/s_suis.br \
--arg s3n://elasticmapreduce/samples/cloudburst/input/100k.br \
--arg s3n://my-bucket/cloud \
--arg 36 \
--arg 3 \
--arg 0 \
--arg 1 \
--arg 240 \
--arg 48 \
--arg 24 \
--arg 24 \
--arg 128 \
--arg 16 \
-c ~/.credentials.json

edowd@development ~/bitbucket/edowd/howto-elastic-mapreduce/src $ ./ElasticMapReduce CustomJarJob

Output
Requesting URL:
https://us-east-1.elasticmapreduce.amazonaws.com/
Query string:
Steps.member.1.HadoopJarStep.Args.member.7=1&Instances.KeepJobFlowAliveWhenNoSteps=false&LogUri=s3n%3A%2F%2Fmy-bucket%2Fhadoop%2F&Steps.member.1.HadoopJarStep.Args.member.5=3&Steps.member.1.HadoopJarStep.Args.member.4=36&Instances.Ec2KeyName=my-key&Instances.InstanceGroups.member.1.InstanceRole=MASTER&Instances.InstanceGroups.member.2.InstanceType=m1.small&Name=Test%20custom%20JAR&Steps.member.1.HadoopJarStep.Args.member.3=s3n%3A%2F%2Fmy-bucket%2Fcloud&Steps.member.1.HadoopJarStep.Jar=s3n%3A%2F%2Felasticmapreduce%2Fsamples%2Fcloudburst%2Fcloudburst.jar&Steps.member.1.HadoopJarStep.Args.member.9=48&Instances.InstanceGroups.member.1.Market=SPOT&Timestamp=2013-05-16T00%3A10%3A56%2B00%3A00&Instances.InstanceGroups.member.1.BidPrice=0.06&Instances.InstanceGroups.member.2.Market=SPOT&VisibleToAllUsers=false&Steps.member.1.HadoopJarStep.Args.member.10=24&SignatureVersion=2&AWSAccessKeyId=C99F5C7EE00F1EXAMPLE&Steps.member.1.HadoopJarStep.Args.member.8=240&Instances.InstanceGroups.member.2.InstanceRole=CORE&Instances.TerminationProtected=false&Instances.InstanceGroups.member.1.InstanceCount=1&Steps.member.1.HadoopJarStep.Args.member.11=24&Steps.member.1.ActionOnFailure=CANCEL_AND_WAIT&Steps.member.1.Name=Example%20Jar%20Step&Steps.member.1.HadoopJarStep.Args.member.13=16&Instances.InstanceGroups.member.1.InstanceType=m1.small&ContentType=JSON&Steps.member.1.HadoopJarStep.Args.member.2=s3n%3A%2F%2Felasticmapreduce%2Fsamples%2Fcloudburst%2Finput%2F100k.br&Signature=wRJJpbGJBTsm4dkAoCBzthsLriWoVwY9igX%2BrSp47dI%3D&Instances.InstanceGroups.member.2.InstanceCount=2&Action=RunJobFlow&Instances.InstanceGroups.member.2.BidPrice=0.06&Steps.member.1.HadoopJarStep.Args.member.1=s3n%3A%2F%2Felasticmapreduce%2Fsamples%2Fcloudburst%2Finput%2Fs_suis.br&Steps.member.1.HadoopJarStep.Args.member.12=128&Instances.InstanceGroups.member.1.Name=Master%20Instance%20Group&Steps.member.1.HadoopJarStep.Args.member.6=0&AmiVersion=latest&SignatureMethod=HmacSHA256&Instances.InstanceGroups.member.2.Name=Core%20Instance%20Group
Headers:
x-amzn-RequestId4e95ac98-20db-445f-b8ad-883f26b10007Hostus-east-1.elasticmapreduce.amazonaws.com:443User-Agentruby-client
Created job flow j-NRW49YLR1JJEN

Formatted Output

Requesting URL
https://us-east-1.elasticmapreduce.amazonaws.com/
Parameters
AWSAccessKeyId=C99F5C7EE00F1EXAMPLE
Action=RunJobFlow
AmiVersion=latest
ContentType=JSON
Instances.Ec2KeyName=my-key
Instances.InstanceGroups.member.1.BidPrice=0.06
Instances.InstanceGroups.member.1.InstanceCount=1
Instances.InstanceGroups.member.1.InstanceRole=MASTER
Instances.InstanceGroups.member.1.InstanceType=m1.small
Instances.InstanceGroups.member.1.Market=SPOT
Instances.InstanceGroups.member.1.Name=Master Instance Group
Instances.InstanceGroups.member.2.BidPrice=0.06
Instances.InstanceGroups.member.2.InstanceCount=2
Instances.InstanceGroups.member.2.InstanceRole=CORE
Instances.InstanceGroups.member.2.InstanceType=m1.small
Instances.InstanceGroups.member.2.Market=SPOT
Instances.InstanceGroups.member.2.Name=Core Instance Group
Instances.KeepJobFlowAliveWhenNoSteps=false
Instances.TerminationProtected=false
LogUri=s3n://my-bucket/hadoop/
Name=Test custom JAR
Signature=wRJJpbGJBTsm4dkAoCBzthsLriWoVwY9igX+rSp47dI=
SignatureMethod=HmacSHA256
SignatureVersion=2
Steps.member.1.ActionOnFailure=CANCEL_AND_WAIT
Steps.member.1.HadoopJarStep.Args.member.1=s3n://elasticmapreduce/samples/cloudburst/input/s_suis.br
Steps.member.1.HadoopJarStep.Args.member.10=24
Steps.member.1.HadoopJarStep.Args.member.11=24
Steps.member.1.HadoopJarStep.Args.member.12=128
Steps.member.1.HadoopJarStep.Args.member.13=16
Steps.member.1.HadoopJarStep.Args.member.2=s3n://elasticmapreduce/samples/cloudburst/input/100k.br
Steps.member.1.HadoopJarStep.Args.member.3=s3n://my-bucket/cloud
Steps.member.1.HadoopJarStep.Args.member.4=36
Steps.member.1.HadoopJarStep.Args.member.5=3
Steps.member.1.HadoopJarStep.Args.member.6=0
Steps.member.1.HadoopJarStep.Args.member.7=1
Steps.member.1.HadoopJarStep.Args.member.8=240
Steps.member.1.HadoopJarStep.Args.member.9=48
Steps.member.1.HadoopJarStep.Jar=s3n://elasticmapreduce/samples/cloudburst/cloudburst.jar
Steps.member.1.Name=Example Jar Step
Timestamp=2013-05-16T00:10:56+00:00
VisibleToAllUsers=false
Headers
Host: us-east-1.elasticmapreduce.amazonaws.com:443
User-Agent: ruby-client
x-amzn-RequestId: 4e95ac98-20db-445f-b8ad-883f26b10007
Non-verbose output
Created job flow j-NRW49YLR1JJEN
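
The formatted parameter list can be produced from the raw query string by splitting on `&`, URL-decoding each pair, and sorting. The following is a minimal sketch using Python's standard library; the function name `format_query` is hypothetical, and the plain string sort is an assumption that happens to match the ordering shown above (for example, `member.10` sorts before `member.2`).

```python
from urllib.parse import parse_qsl


def format_query(query):
    """Return the decoded 'key=value' pairs of a query string, sorted by key."""
    # parse_qsl splits on '&' and percent-decodes both keys and values.
    pairs = parse_qsl(query, keep_blank_values=True)
    return ["%s=%s" % (key, value) for key, value in sorted(pairs)]


if __name__ == "__main__":
    raw = "Name=Test%20custom%20JAR&Action=RunJobFlow&AmiVersion=latest"
    for line in format_query(raw):
        print(line)
```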

API Request

Example API Request
https://us-east-1.elasticmapreduce.amazonaws.com/
?Action=RunJobFlow
&Name=Test custom JAR
&Instances.Ec2KeyName=my-key
&Instances.InstanceGroups.member.1.Name=Master Instance Group
&Instances.InstanceGroups.member.1.InstanceRole=MASTER
&Instances.InstanceGroups.member.1.InstanceType=m1.small
&Instances.InstanceGroups.member.1.InstanceCount=1
&Instances.InstanceGroups.member.1.Market=SPOT
&Instances.InstanceGroups.member.1.BidPrice=0.06
&Instances.InstanceGroups.member.2.Name=Core Instance Group
&Instances.InstanceGroups.member.2.InstanceRole=CORE
&Instances.InstanceGroups.member.2.InstanceType=m1.small
&Instances.InstanceGroups.member.2.InstanceCount=2
&Instances.InstanceGroups.member.2.Market=SPOT
&Instances.InstanceGroups.member.2.BidPrice=0.06
&Instances.KeepJobFlowAliveWhenNoSteps=false
&Instances.TerminationProtected=false
&Steps.member.1.Name=Example Jar Step
&Steps.member.1.ActionOnFailure=CANCEL_AND_WAIT
&Steps.member.1.HadoopJarStep.Jar=s3n://elasticmapreduce/samples/cloudburst/cloudburst.jar
&Steps.member.1.HadoopJarStep.Args.member.1=s3n://elasticmapreduce/samples/cloudburst/input/s_suis.br
&Steps.member.1.HadoopJarStep.Args.member.2=s3n://elasticmapreduce/samples/cloudburst/input/100k.br
&Steps.member.1.HadoopJarStep.Args.member.3=s3n://my-bucket/cloud
&Steps.member.1.HadoopJarStep.Args.member.4=36
&Steps.member.1.HadoopJarStep.Args.member.5=3
&Steps.member.1.HadoopJarStep.Args.member.6=0
&Steps.member.1.HadoopJarStep.Args.member.7=1
&Steps.member.1.HadoopJarStep.Args.member.8=240
&Steps.member.1.HadoopJarStep.Args.member.9=48
&Steps.member.1.HadoopJarStep.Args.member.10=24
&Steps.member.1.HadoopJarStep.Args.member.11=24
&Steps.member.1.HadoopJarStep.Args.member.12=128
&Steps.member.1.HadoopJarStep.Args.member.13=16
&LogUri=s3n://my-bucket/hadoop/
&AmiVersion=latest
&VisibleToAllUsers=false
&AUTHPARAMS
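
The AWS CLI section rebuilds these flat parameters as nested JSON. One way to picture the mapping is the sketch below, which turns dotted keys into nested dictionaries, turns each `member.N` group into a list ordered by `N`, and lower-cases the key names with underscores. The helper names `snake`, `nest`, and `collapse` are hypothetical, and the naming rule is only inferred from this article's example (`Ec2KeyName` becomes `ec_2_key_name`); it is not taken from the CLI's own code. All values stay strings here, so counts and booleans would still need converting by hand.

```python
import re


def snake(name):
    """Convert 'Ec2KeyName' to 'ec_2_key_name' (rule inferred from the article)."""
    return re.sub(r"(?<!^)(?=[A-Z])|(?<=[a-z])(?=[0-9])", "_", name).lower()


def collapse(node):
    """Turn {'member': {'1': ..., '2': ...}} into a list and snake-case dict keys."""
    if not isinstance(node, dict):
        return node
    if set(node) == {"member"}:
        members = node["member"]
        # Sort numerically so that member.10 comes after member.9.
        return [collapse(members[index]) for index in sorted(members, key=int)]
    return {snake(key): collapse(value) for key, value in node.items()}


def nest(flat):
    """Build a nested structure from flat 'A.B.member.1.C' style parameters."""
    root = {}
    for key, value in flat.items():
        parts = key.split(".")
        node = root
        for part in parts[:-1]:
            node = node.setdefault(part, {})
        node[parts[-1]] = value
    return collapse(root)
```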

AWS CLI

Console - user@hostname ~ $
aws --region us-east-1 emr run-job-flow \
--name "Test custom JAR" \
--instances "{
    \"ec_2_key_name\": \"my-key\",
    \"instance_groups\": [
        {
            \"name\": \"Master Instance Group\",
            \"instance_role\": \"MASTER\",
            \"instance_type\": \"m1.small\",
            \"instance_count\": 1,
            \"market\": \"SPOT\",
            \"bid_price\": \"0.06\"
        },
        {
            \"name\": \"Core Instance Group\",
            \"instance_role\": \"CORE\",
            \"instance_type\": \"m1.small\",
            \"instance_count\": 2,
            \"market\": \"SPOT\",
            \"bid_price\": \"0.06\"
        }
    ],
    \"keep_job_flow_alive_when_no_steps\": false,
    \"termination_protected\": false
}" \
--steps "[
    {
        \"name\": \"Example Jar Step\",
        \"action_on_failure\": \"CANCEL_AND_WAIT\",
        \"hadoop_jar_step\": {
            \"jar\": \"s3n://elasticmapreduce/samples/cloudburst/cloudburst.jar\",
            \"args\": [
                \"s3n://elasticmapreduce/samples/cloudburst/input/s_suis.br\",
                \"s3n://elasticmapreduce/samples/cloudburst/input/100k.br\",
                \"s3n://my-bucket/cloud\",
                \"36\",
                \"3\",
                \"0\",
                \"1\",
                \"240\",
                \"48\",
                \"24\",
                \"24\",
                \"128\",
                \"16\"
            ]
        }
    }
]" \
--log-uri "s3n://my-bucket/hadoop/" \
--ami-version "latest"
Output
{
    "ResponseMetadata": {
        "RequestId": "a2a9fe89-82e5-4bd6-be65-f243e37826cb"
    },
    "JobFlowId": "j-8VY74ZRDWZ7O8"
}
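
Escaping every quote in the JSON arguments by hand is error-prone. As an alternative, the argument list can be assembled in a script and the JSON serialized with `json.dumps`. The following is a minimal sketch that only builds the command; it uses the same subcommand and options as the example above, while the function names `build_argv` and `as_shell` are hypothetical.

```python
import json
import shlex


def build_argv(region, name, instances, steps, log_uri, ami_version):
    """Assemble the run-job-flow argument list used in this article."""
    return [
        "aws", "--region", region, "emr", "run-job-flow",
        "--name", name,
        # json.dumps produces the quoting that was escaped by hand above.
        "--instances", json.dumps(instances),
        "--steps", json.dumps(steps),
        "--log-uri", log_uri,
        "--ami-version", ami_version,
    ]


def as_shell(argv):
    """Render the argument list as a single shell-safe command line."""
    return " ".join(shlex.quote(arg) for arg in argv)
```

The list returned by `build_argv` could be passed directly to `subprocess.run`, which avoids shell quoting altogether; `as_shell` is only useful for printing the command or pasting it into a terminal.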
