This series shows how to extract the request details from the Elastic MapReduce Ruby client and use them to build the equivalent command with the AWS CLI. This article looks specifically at running a Pig script.

Elastic MapReduce Ruby client

Credentials

~/.aws/credentials.json

{
"access_id": "C99F5C7EE00F1EXAMPLE",
"private_key": "a63xWEj9ZFbigxqA7wI3Nuwj3mte3RDBdEXAMPLE",
"keypair": "my-key",
"key-pair-file": "~/.ssh/my-key.pem",
"log_uri": "s3n://my-bucket/hadoop/",
"region": "us-east-1"
}

Create the job flow

Console - user@hostname ~ $

elastic-mapreduce -v \
--create \
--name "Test Pig" \
--instance-group MASTER \
--bid-price 0.06 \
--instance-count 1 \
--instance-type m1.small \
--instance-group CORE \
--bid-price 0.06 \
--instance-count 2 \
--instance-type m1.small \
--pig-script "s3n://elasticmapreduce/samples/pig-apache/do-reports2.pig" \
--args "-p,INPUT=s3n://elasticmapreduce/samples/pig-apache/input,-p,OUTPUT=s3n://my-bucket/pig-apache/output" \
-c ~/.aws/credentials.json

Output

Requesting URL:
https://us-east-1.elasticmapreduce.amazonaws.com/
Query string:
Steps.member.2.HadoopJarStep.Args.member.2=--base-path&SignatureVersion=2&AWSAccessKeyId=C99F5C7EE00F1EXAMPLE&Instances.InstanceGroups.member.2.InstanceType=m1.small&Steps.member.2.HadoopJarStep.Args.member.9=s3n%3A%2F%2Felasticmapreduce%2Fsamples%2Fpig-apache%2Fdo-reports2.pig&Steps.member.2.HadoopJarStep.Args.member.8=-f&Steps.member.1.HadoopJarStep.Jar=s3%3A%2F%2Fus-east-1.elasticmapreduce%2Flibs%2Fscript-runner%2Fscript-runner.jar&Steps.member.2.HadoopJarStep.Args.member.1=s3%3A%2F%2Fus-east-1.elasticmapreduce%2Flibs%2Fpig%2Fpig-script&Steps.member.2.HadoopJarStep.Args.member.10=-p&Instances.InstanceGroups.member.2.InstanceCount=2&Instances.InstanceGroups.member.1.InstanceCount=1&Steps.member.2.HadoopJarStep.Args.member.11=INPUT%3Ds3n%3A%2F%2Felasticmapreduce%2Fsamples%2Fpig-apache%2Finput&Steps.member.1.HadoopJarStep.Args.member.2=--base-path&Steps.member.2.HadoopJarStep.Jar=s3%3A%2F%2Fus-east-1.elasticmapreduce%2Flibs%2Fscript-runner%2Fscript-runner.jar&Instances.InstanceGroups.member.1.InstanceType=m1.small&AmiVersion=latest&Instances.InstanceGroups.member.2.Name=Core%20Instance%20Group&VisibleToAllUsers=false&Steps.member.2.ActionOnFailure=CANCEL_AND_WAIT&Instances.InstanceGroups.member.2.InstanceRole=CORE&SignatureMethod=HmacSHA256&Steps.member.2.HadoopJarStep.Args.member.5=latest&Instances.InstanceGroups.member.1.Market=SPOT&ContentType=JSON&Instances.InstanceGroups.member.1.BidPrice=0.06&LogUri=s3n%3A%2F%2Fmy-bucket%2Fhadoop%2F&Steps.member.1.HadoopJarStep.Args.member.6=latest&Instances.InstanceGroups.member.1.Name=Master%20Instance%20Group&Signature=%2BpIP%2FWvrGNUqt%2Bo0wWWtLvEHC7GzPUgRyrORDA9cYAI%3D&Steps.member.2.HadoopJarStep.Args.member.13=OUTPUT%3Ds3n%3A%2F%2Fmy-bucket%2Fpig-apache%2Foutput&Instances.InstanceGroups.member.1.InstanceRole=MASTER&Instances.InstanceGroups.member.2.BidPrice=0.06&Instances.KeepJobFlowAliveWhenNoSteps=false&Steps.member.2.Name=Run%20Pig%20Script&Name=Test%20Pig&Steps.member.1.HadoopJarStep.Args.member.1=s3%3A%2F%2Fus-east-1.elasticmapreduce%2Flibs%2Fpig%2Fpig-script&Steps.member.2.HadoopJarStep.Args.member.7=--args&Steps.member.1.ActionOnFailure=TERMINATE_JOB_FLOW&Instances.InstanceGroups.member.2.Market=SPOT&Steps.member.2.HadoopJarStep.Args.member.3=s3%3A%2F%2Fus-east-1.elasticmapreduce%2Flibs%2Fpig%2F&Steps.member.2.HadoopJarStep.Args.member.4=--pig-versions&Steps.member.1.Name=Setup%20Pig&Steps.member.1.HadoopJarStep.Args.member.5=--pig-versions&Timestamp=2013-05-09T06%3A58%3A43%2B00%3A00&Steps.member.1.HadoopJarStep.Args.member.4=--install-pig&Instances.Ec2KeyName=my-key&Instances.TerminationProtected=false&Steps.member.2.HadoopJarStep.Args.member.12=-p&Steps.member.2.HadoopJarStep.Args.member.6=--run-pig-script&Steps.member.1.HadoopJarStep.Args.member.3=s3%3A%2F%2Fus-east-1.elasticmapreduce%2Flibs%2Fpig%2F&Action=RunJobFlow
Headers:
x-amzn-RequestIde8e143bf-91b4-400c-93de-9e7560a5500bHostus-east-1.elasticmapreduce.amazonaws.com:443User-Agentruby-client
Created job flow j-9Q84YCFIZE4LV

Formatted Output

Output - Requesting URL

https://us-east-1.elasticmapreduce.amazonaws.com/

Output - Parameters

AWSAccessKeyId=C99F5C7EE00F1EXAMPLE
Action=RunJobFlow
AmiVersion=latest
ContentType=JSON
Instances.Ec2KeyName=my-key
Instances.InstanceGroups.member.1.BidPrice=0.06
Instances.InstanceGroups.member.1.InstanceCount=1
Instances.InstanceGroups.member.1.InstanceRole=MASTER
Instances.InstanceGroups.member.1.InstanceType=m1.small
Instances.InstanceGroups.member.1.Market=SPOT
Instances.InstanceGroups.member.1.Name=Master Instance Group
Instances.InstanceGroups.member.2.BidPrice=0.06
Instances.InstanceGroups.member.2.InstanceCount=2
Instances.InstanceGroups.member.2.InstanceRole=CORE
Instances.InstanceGroups.member.2.InstanceType=m1.small
Instances.InstanceGroups.member.2.Market=SPOT
Instances.InstanceGroups.member.2.Name=Core Instance Group
Instances.KeepJobFlowAliveWhenNoSteps=false
Instances.TerminationProtected=false
LogUri=s3n://my-bucket/hadoop/
Name=Test Pig
Signature=+pIP/WvrGNUqt+o0wWWtLvEHC7GzPUgRyrORDA9cYAI=
SignatureMethod=HmacSHA256
SignatureVersion=2
Steps.member.1.ActionOnFailure=TERMINATE_JOB_FLOW
Steps.member.1.HadoopJarStep.Args.member.1=s3://us-east-1.elasticmapreduce/libs/pig/pig-script
Steps.member.1.HadoopJarStep.Args.member.2=--base-path
Steps.member.1.HadoopJarStep.Args.member.3=s3://us-east-1.elasticmapreduce/libs/pig/
Steps.member.1.HadoopJarStep.Args.member.4=--install-pig
Steps.member.1.HadoopJarStep.Args.member.5=--pig-versions
Steps.member.1.HadoopJarStep.Args.member.6=latest
Steps.member.1.HadoopJarStep.Jar=s3://us-east-1.elasticmapreduce/libs/script-runner/script-runner.jar
Steps.member.1.Name=Setup Pig
Steps.member.2.ActionOnFailure=CANCEL_AND_WAIT
Steps.member.2.HadoopJarStep.Args.member.1=s3://us-east-1.elasticmapreduce/libs/pig/pig-script
Steps.member.2.HadoopJarStep.Args.member.10=-p
Steps.member.2.HadoopJarStep.Args.member.11=INPUT=s3n://elasticmapreduce/samples/pig-apache/input
Steps.member.2.HadoopJarStep.Args.member.12=-p
Steps.member.2.HadoopJarStep.Args.member.13=OUTPUT=s3n://my-bucket/pig-apache/output
Steps.member.2.HadoopJarStep.Args.member.2=--base-path
Steps.member.2.HadoopJarStep.Args.member.3=s3://us-east-1.elasticmapreduce/libs/pig/
Steps.member.2.HadoopJarStep.Args.member.4=--pig-versions
Steps.member.2.HadoopJarStep.Args.member.5=latest
Steps.member.2.HadoopJarStep.Args.member.6=--run-pig-script
Steps.member.2.HadoopJarStep.Args.member.7=--args
Steps.member.2.HadoopJarStep.Args.member.8=-f
Steps.member.2.HadoopJarStep.Args.member.9=s3n://elasticmapreduce/samples/pig-apache/do-reports2.pig
Steps.member.2.HadoopJarStep.Jar=s3://us-east-1.elasticmapreduce/libs/script-runner/script-runner.jar
Steps.member.2.Name=Run Pig Script
Timestamp=2013-05-09T06:58:43+00:00
VisibleToAllUsers=false

Output - Headers

Host: us-east-1.elasticmapreduce.amazonaws.com:443
User-Agent: ruby-client
x-amzn-RequestId: e8e143bf-91b4-400c-93de-9e7560a5500b

Output - Non-verbose output

Created job flow j-9Q84YCFIZE4LV
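
The formatted parameter list above is the raw query string split on `&`, URL-decoded, and sorted by key. A minimal Python sketch of that step follows; the function name `format_query_string` and the shortened `sample` string are illustrative and not part of the Ruby client.

```python
from urllib.parse import parse_qsl


def format_query_string(raw):
    """Decode a raw query string into sorted 'key=value' lines."""
    # Verbose output pasted from a terminal may be wrapped across lines;
    # encoded query strings contain no literal whitespace, so drop it all.
    joined = "".join(raw.split())
    pairs = parse_qsl(joined, keep_blank_values=True)
    # A plain lexical sort reproduces the ordering of the formatted list
    # (for example, Args.member.10 sorts before Args.member.2).
    return [f"{key}={value}" for key, value in sorted(pairs)]


# Shortened sample taken from the query string shown earlier.
sample = (
    "Steps.member.2.Name=Run%20Pig%20Script&Name=Test%20Pig"
    "&LogUri=s3n%3A%2F%2Fmy-bucket%2Fhadoop%2F&Action=RunJobFlow"
)

for line in format_query_string(sample):
    print(line)
```

Pasting the full query string from the verbose output in place of `sample` should give the complete parameter list.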

API Request

Example API Request

https://us-east-1.elasticmapreduce.amazonaws.com/
?Action=RunJobFlow
&Name=Test Pig
&Instances.Ec2KeyName=my-key
&Instances.InstanceGroups.member.1.Name=Master Instance Group
&Instances.InstanceGroups.member.1.InstanceRole=MASTER
&Instances.InstanceGroups.member.1.InstanceType=m1.small
&Instances.InstanceGroups.member.1.InstanceCount=1
&Instances.InstanceGroups.member.1.Market=SPOT
&Instances.InstanceGroups.member.1.BidPrice=0.06
&Instances.InstanceGroups.member.2.Name=Core Instance Group
&Instances.InstanceGroups.member.2.InstanceRole=CORE
&Instances.InstanceGroups.member.2.InstanceType=m1.small
&Instances.InstanceGroups.member.2.InstanceCount=2
&Instances.InstanceGroups.member.2.Market=SPOT
&Instances.InstanceGroups.member.2.BidPrice=0.06
&Instances.KeepJobFlowAliveWhenNoSteps=false
&Instances.TerminationProtected=false
&Steps.member.1.Name=Setup Pig
&Steps.member.1.ActionOnFailure=TERMINATE_JOB_FLOW
&Steps.member.1.HadoopJarStep.Jar=s3://us-east-1.elasticmapreduce/libs/script-runner/script-runner.jar
&Steps.member.1.HadoopJarStep.Args.member.1=s3://us-east-1.elasticmapreduce/libs/pig/pig-script
&Steps.member.1.HadoopJarStep.Args.member.2=--base-path
&Steps.member.1.HadoopJarStep.Args.member.3=s3://us-east-1.elasticmapreduce/libs/pig/
&Steps.member.1.HadoopJarStep.Args.member.4=--install-pig
&Steps.member.1.HadoopJarStep.Args.member.5=--pig-versions
&Steps.member.1.HadoopJarStep.Args.member.6=latest
&Steps.member.2.Name=Run Pig Script
&Steps.member.2.ActionOnFailure=CANCEL_AND_WAIT
&Steps.member.2.HadoopJarStep.Jar=s3://us-east-1.elasticmapreduce/libs/script-runner/script-runner.jar
&Steps.member.2.HadoopJarStep.Args.member.1=s3://us-east-1.elasticmapreduce/libs/pig/pig-script
&Steps.member.2.HadoopJarStep.Args.member.2=--base-path
&Steps.member.2.HadoopJarStep.Args.member.3=s3://us-east-1.elasticmapreduce/libs/pig/
&Steps.member.2.HadoopJarStep.Args.member.4=--pig-versions
&Steps.member.2.HadoopJarStep.Args.member.5=latest
&Steps.member.2.HadoopJarStep.Args.member.6=--run-pig-script
&Steps.member.2.HadoopJarStep.Args.member.7=--args
&Steps.member.2.HadoopJarStep.Args.member.8=-f
&Steps.member.2.HadoopJarStep.Args.member.9=s3n://elasticmapreduce/samples/pig-apache/do-reports2.pig
&Steps.member.2.HadoopJarStep.Args.member.10=-p
&Steps.member.2.HadoopJarStep.Args.member.11=INPUT=s3n://elasticmapreduce/samples/pig-apache/input
&Steps.member.2.HadoopJarStep.Args.member.12=-p
&Steps.member.2.HadoopJarStep.Args.member.13=OUTPUT=s3n://my-bucket/pig-apache/output
&LogUri=s3n://my-bucket/hadoop/
&AmiVersion=latest
&VisibleToAllUsers=false
&*AUTHPARAMS*
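
The flattened `X.member.N.Y` parameters map onto nested JSON: each `member.N` becomes a list position and every other path component becomes an object key. The sketch below shows one way to do that conversion in Python. The helper names are hypothetical, and the snake_case rule (an underscore before every capital letter or digit) is an assumption inferred only from the key names used in this article, such as `ec_2_key_name`. All values stay strings, so counts and booleans still need converting by hand.

```python
import json
import re


def snake_case(name):
    """'Ec2KeyName' -> 'ec_2_key_name' (rule inferred from this article)."""
    return re.sub(r"(?<!^)(?=[A-Z0-9])", "_", name).lower()


def to_key(part):
    # Numeric path components are list indices; the rest are object keys.
    return int(part) if part.isdigit() else snake_case(part)


def listify(node):
    """Turn dicts keyed only by integers into lists ordered by index."""
    if not isinstance(node, dict):
        return node
    if node and all(isinstance(key, int) for key in node):
        return [listify(node[key]) for key in sorted(node)]
    return {key: listify(value) for key, value in node.items()}


def nest_parameters(params):
    """Convert flattened request parameters into nested dicts and lists."""
    root = {}
    for key, value in params.items():
        # Drop the literal "member" markers and keep the index after them.
        parts = [part for part in key.split(".") if part != "member"]
        node = root
        for part in parts[:-1]:
            node = node.setdefault(to_key(part), {})
        node[to_key(parts[-1])] = value
    return listify(root)


# A few parameters copied from the example request.
flat = {
    "Name": "Test Pig",
    "Instances.Ec2KeyName": "my-key",
    "Instances.InstanceGroups.member.1.InstanceRole": "MASTER",
    "Instances.InstanceGroups.member.2.InstanceRole": "CORE",
    "Steps.member.1.Name": "Setup Pig",
    "Steps.member.1.HadoopJarStep.Args.member.1":
        "s3://us-east-1.elasticmapreduce/libs/pig/pig-script",
    "Steps.member.1.HadoopJarStep.Args.member.2": "--base-path",
}

nested = nest_parameters(flat)
print(json.dumps(nested, indent=4))
```

Sorting list positions numerically matters here: the argument order of each step is significant, and a lexical sort would place `member.10` before `member.2`.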

AWS CLI

Console - user@hostname ~ $

aws --region us-east-1 emr \
run-job-flow \
--name "Test Pig" \
--instances "{
    \"ec_2_key_name\": \"my-key\",
    \"instance_groups\": [
        {
            \"name\": \"Master Instance Group\",
            \"instance_role\": \"MASTER\",
            \"instance_type\": \"m1.small\",
            \"instance_count\": 1,
            \"market\": \"SPOT\",
            \"bid_price\": \"0.06\"
        },
        {
            \"name\": \"Core Instance Group\",
            \"instance_role\": \"CORE\",
            \"instance_type\": \"m1.small\",
            \"instance_count\": 2,
            \"market\": \"SPOT\",
            \"bid_price\": \"0.06\"
        }
    ],
    \"keep_job_flow_alive_when_no_steps\": false,
    \"termination_protected\": false
}" \
--steps "[
    {
        \"name\": \"Setup Pig\",
        \"action_on_failure\": \"TERMINATE_JOB_FLOW\",
        \"hadoop_jar_step\": {
            \"jar\": \"s3://us-east-1.elasticmapreduce/libs/script-runner/script-runner.jar\",
            \"args\": [
                \"s3://us-east-1.elasticmapreduce/libs/pig/pig-script\",
                \"--base-path\",
                \"s3://us-east-1.elasticmapreduce/libs/pig/\",
                \"--install-pig\",
                \"--pig-versions\",
                \"latest\"
            ]
        }
    },
    {
        \"name\": \"Run Pig Script\",
        \"action_on_failure\": \"CANCEL_AND_WAIT\",
        \"hadoop_jar_step\": {
            \"jar\": \"s3://us-east-1.elasticmapreduce/libs/script-runner/script-runner.jar\",
            \"args\": [
                \"s3://us-east-1.elasticmapreduce/libs/pig/pig-script\",
                \"--base-path\",
                \"s3://us-east-1.elasticmapreduce/libs/pig/\",
                \"--pig-versions\",
                \"latest\",
                \"--run-pig-script\",
                \"--args\",
                \"-f\",
                \"s3n://elasticmapreduce/samples/pig-apache/do-reports2.pig\",
                \"-p\",
                \"INPUT=s3n://elasticmapreduce/samples/pig-apache/input\",
                \"-p\",
                \"OUTPUT=s3n://my-bucket/pig-apache/output\"
            ]
        }
    }
]" \
--log-uri "s3n://my-bucket/hadoop/" \
--ami-version "latest"
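
Escaping every quote inside the JSON arguments by hand is error-prone. As an alternative, the sketch below builds the same kind of command in Python, letting `json.dumps` produce the JSON and `shlex.join` handle the shell quoting. It only prints the command and does not run it. The option names are copied from the command above; the `build_command` helper is hypothetical, and the data is trimmed to one instance group and one step for brevity.

```python
import json
import shlex

# Trimmed to the master group only; add the core group the same way.
instances = {
    "ec_2_key_name": "my-key",
    "instance_groups": [
        {
            "name": "Master Instance Group",
            "instance_role": "MASTER",
            "instance_type": "m1.small",
            "instance_count": 1,
            "market": "SPOT",
            "bid_price": "0.06",
        }
    ],
    "keep_job_flow_alive_when_no_steps": False,
    "termination_protected": False,
}

# Trimmed to the setup step only; the run step follows the same shape.
steps = [
    {
        "name": "Setup Pig",
        "action_on_failure": "TERMINATE_JOB_FLOW",
        "hadoop_jar_step": {
            "jar": "s3://us-east-1.elasticmapreduce/libs/script-runner/script-runner.jar",
            "args": [
                "s3://us-east-1.elasticmapreduce/libs/pig/pig-script",
                "--base-path",
                "s3://us-east-1.elasticmapreduce/libs/pig/",
                "--install-pig",
                "--pig-versions",
                "latest",
            ],
        },
    }
]


def build_command(name, instances, steps, log_uri, ami_version):
    """Return the argument list for the command shown above."""
    return [
        "aws", "--region", "us-east-1", "emr", "run-job-flow",
        "--name", name,
        "--instances", json.dumps(instances),
        "--steps", json.dumps(steps),
        "--log-uri", log_uri,
        "--ami-version", ami_version,
    ]


argv = build_command("Test Pig", instances, steps, "s3n://my-bucket/hadoop/", "latest")
# shlex.join quotes each argument so the line can be pasted into a shell.
print(shlex.join(argv))
```

Keeping the request as Python data also makes it easy to change the instance count or bid price without touching the quoting.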

Output

{
    "ResponseMetadata": {
        "RequestId": "869280b9-7e8e-43e9-a0ad-6d285ba28831"
    }, 
    "JobFlowId": "j-1I6JEYV4HCTD8"
}
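
Both tools report the identifier of the new job flow, but in different formats: the Ruby client prints a plain sentence, while the AWS CLI prints JSON. A small Python sketch for pulling the identifier out of either output is shown below. The function names are illustrative, and the pattern for the identifier is an assumption based only on the two IDs shown in this article.

```python
import json
import re


def job_flow_id_from_ruby_client(text):
    """Extract the ID from a 'Created job flow j-...' line, or None."""
    match = re.search(r"Created job flow (j-[A-Z0-9]+)", text)
    return match.group(1) if match else None


def job_flow_id_from_aws_cli(text):
    """Extract the ID from the JSON document printed by the AWS CLI."""
    return json.loads(text).get("JobFlowId")


ruby_output = "Created job flow j-9Q84YCFIZE4LV"
cli_output = """
{
    "ResponseMetadata": {
        "RequestId": "869280b9-7e8e-43e9-a0ad-6d285ba28831"
    },
    "JobFlowId": "j-1I6JEYV4HCTD8"
}
"""

print(job_flow_id_from_ruby_client(ruby_output))
print(job_flow_id_from_aws_cli(cli_output))
```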
