This series shows how to extract the request details from the Elastic MapReduce Ruby client and use them to build the equivalent command with the AWS CLI tool. This article looks specifically at running an interactive Pig session.

Credentials

~/.aws/credentials.json

{
"access_id": "C99F5C7EE00F1EXAMPLE",
"private_key": "a63xWEj9ZFbigxqA7wI3Nuwj3mte3RDBdEXAMPLE",
"keypair": "my-key",
"key-pair-file": "~/.ssh/my-key.pem",
"log_uri": "s3n://my-bucket/hadoop/",
"region": "us-east-1"
}
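The AWS CLI commands later in this article need the same access key, secret key and region. The following is a minimal sketch, assuming you want to reuse the values from this file through the environment variables the AWS CLI reads (`AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`, `AWS_DEFAULT_REGION`). The helper names `load_credentials` and `export_lines` are hypothetical and not part of either tool.

```python
import json
import os
import shlex

# Ruby client credential fields mapped to AWS CLI environment variables.
FIELD_TO_ENV = {
    "access_id": "AWS_ACCESS_KEY_ID",
    "private_key": "AWS_SECRET_ACCESS_KEY",
    "region": "AWS_DEFAULT_REGION",
}


def load_credentials(path="~/.aws/credentials.json"):
    """Read the Ruby client's credentials file into a dict."""
    with open(os.path.expanduser(path)) as handle:
        return json.load(handle)


def export_lines(credentials):
    """Return shell 'export' lines for the fields present in credentials."""
    return [
        f"export {env}={shlex.quote(str(credentials[field]))}"
        for field, env in FIELD_TO_ENV.items()
        if field in credentials
    ]
```

Printing each line of `export_lines(load_credentials())` gives statements that can be pasted into the shell before running the `aws` commands. Fields such as `keypair` and `log_uri` have no environment variable equivalent here and are passed on the command line instead.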

Start cluster

Elastic MapReduce Ruby Client

Console - user@hostname ~ $

elastic-mapreduce -v \
--create \
--name "Interactive Pig" \
--alive \
--instance-group MASTER \
--bid-price 0.06 \
--instance-count 1 \
--instance-type m1.small \
--instance-group CORE \
--bid-price 0.06 \
--instance-count 2 \
--instance-type m1.small \
--pig-interactive \
--visible-to-all-users \
-c ~/.aws/credentials.json

Output

Requesting URL:
https://us-east-1.elasticmapreduce.amazonaws.com/
Query string:
Instances.KeepJobFlowAliveWhenNoSteps=true&LogUri=s3n%3A%2F%2Fmy-bucket%2Fhadoop%2F&Steps.member.1.HadoopJarStep.Args.member.5=--pig-versions&Steps.member.1.HadoopJarStep.Args.member.4=--install-pig&Instances.Ec2KeyName=my-key&Instances.InstanceGroups.member.1.InstanceRole=MASTER&Instances.InstanceGroups.member.2.InstanceType=m1.small&Name=Interactive%20Pig&Steps.member.1.HadoopJarStep.Args.member.3=s3%3A%2F%2Fus-east-1.elasticmapreduce%2Flibs%2Fpig%2F&Steps.member.1.HadoopJarStep.Jar=s3%3A%2F%2Fus-east-1.elasticmapreduce%2Flibs%2Fscript-runner%2Fscript-runner.jar&Instances.InstanceGroups.member.1.Market=SPOT&Timestamp=2013-05-15T23%3A00%3A39%2B00%3A00&Instances.InstanceGroups.member.1.BidPrice=0.06&Instances.InstanceGroups.member.2.Market=SPOT&VisibleToAllUsers=true&SignatureVersion=2&AWSAccessKeyId=C99F5C7EE00F1EXAMPLE&Instances.InstanceGroups.member.2.InstanceRole=CORE&Instances.TerminationProtected=false&Instances.InstanceGroups.member.1.InstanceCount=1&Steps.member.1.ActionOnFailure=TERMINATE_JOB_FLOW&Steps.member.1.Name=Setup%20Pig&Instances.InstanceGroups.member.1.InstanceType=m1.small&ContentType=JSON&Steps.member.1.HadoopJarStep.Args.member.2=--base-path&Signature=6Dy2%2BAiRbk6Hq8RGyoKb1imcy94Xm9ESEzN1jEH7FVc%3D&Instances.InstanceGroups.member.2.InstanceCount=2&Action=RunJobFlow&Instances.InstanceGroups.member.2.BidPrice=0.06&Steps.member.1.HadoopJarStep.Args.member.1=s3%3A%2F%2Fus-east-1.elasticmapreduce%2Flibs%2Fpig%2Fpig-script&Instances.InstanceGroups.member.1.Name=Master%20Instance%20Group&Steps.member.1.HadoopJarStep.Args.member.6=latest&AmiVersion=latest&SignatureMethod=HmacSHA256&Instances.InstanceGroups.member.2.Name=Core%20Instance%20Group
Headers: 
x-amzn-RequestId1a1b9c8c-3b5c-4ef2-bf20-249c4b7c4fdaHostus-east-1.elasticmapreduce.amazonaws.com:443User-Agentruby-client
Created job flow j-WWF2N0603H0D9

Formatted Output

Output - Requesting URL

https://us-east-1.elasticmapreduce.amazonaws.com/

Output - Parameters

AWSAccessKeyId=C99F5C7EE00F1EXAMPLE
Action=RunJobFlow
AmiVersion=latest
ContentType=JSON
Instances.Ec2KeyName=my-key
Instances.InstanceGroups.member.1.BidPrice=0.06
Instances.InstanceGroups.member.1.InstanceCount=1
Instances.InstanceGroups.member.1.InstanceRole=MASTER
Instances.InstanceGroups.member.1.InstanceType=m1.small
Instances.InstanceGroups.member.1.Market=SPOT
Instances.InstanceGroups.member.1.Name=Master Instance Group
Instances.InstanceGroups.member.2.BidPrice=0.06
Instances.InstanceGroups.member.2.InstanceCount=2
Instances.InstanceGroups.member.2.InstanceRole=CORE
Instances.InstanceGroups.member.2.InstanceType=m1.small
Instances.InstanceGroups.member.2.Market=SPOT
Instances.InstanceGroups.member.2.Name=Core Instance Group
Instances.KeepJobFlowAliveWhenNoSteps=true
Instances.TerminationProtected=false
LogUri=s3n://my-bucket/hadoop/
Name=Interactive Pig
Signature=6Dy2+AiRbk6Hq8RGyoKb1imcy94Xm9ESEzN1jEH7FVc=
SignatureMethod=HmacSHA256
SignatureVersion=2
Steps.member.1.ActionOnFailure=TERMINATE_JOB_FLOW
Steps.member.1.HadoopJarStep.Args.member.1=s3://us-east-1.elasticmapreduce/libs/pig/pig-script
Steps.member.1.HadoopJarStep.Args.member.2=--base-path
Steps.member.1.HadoopJarStep.Args.member.3=s3://us-east-1.elasticmapreduce/libs/pig/
Steps.member.1.HadoopJarStep.Args.member.4=--install-pig
Steps.member.1.HadoopJarStep.Args.member.5=--pig-versions
Steps.member.1.HadoopJarStep.Args.member.6=latest
Steps.member.1.HadoopJarStep.Jar=s3://us-east-1.elasticmapreduce/libs/script-runner/script-runner.jar
Steps.member.1.Name=Setup Pig
Timestamp=2013-05-15T23:00:39+00:00
VisibleToAllUsers=true

Output - Headers

Host: us-east-1.elasticmapreduce.amazonaws.com:443
User-Agent: ruby-client
x-amzn-RequestId: 1a1b9c8c-3b5c-4ef2-bf20-249c4b7c4fda

Output - Non-verbose output

Created job flow j-WWF2N0603H0D9
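The formatted parameter list above can be produced from the raw query string rather than by hand. A minimal sketch using Python's standard `urllib.parse` module follows; the function name `format_query` is hypothetical. It splits the string on `&`, URL-decodes each pair and sorts the result by key.

```python
from urllib.parse import parse_qsl


def format_query(query):
    """Return decoded 'key=value' lines, sorted by key."""
    pairs = parse_qsl(query, keep_blank_values=True)
    return [f"{key}={value}" for key, value in sorted(pairs)]


# A shortened piece of the verbose output, for illustration.
sample = (
    "Name=Interactive%20Pig"
    "&LogUri=s3n%3A%2F%2Fmy-bucket%2Fhadoop%2F"
    "&Action=RunJobFlow"
)
for line in format_query(sample):
    print(line)
```

Passing the full query string from the verbose output instead of `sample` gives one line per parameter, which is easier to compare against the AWS CLI arguments.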

API Request

Example API Request

https://us-east-1.elasticmapreduce.amazonaws.com/
?Action=RunJobFlow
&Name=Interactive Pig
&Instances.Ec2KeyName=my-key
&Instances.InstanceGroups.member.1.Name=Master Instance Group
&Instances.InstanceGroups.member.1.InstanceRole=MASTER
&Instances.InstanceGroups.member.1.InstanceType=m1.small
&Instances.InstanceGroups.member.1.InstanceCount=1
&Instances.InstanceGroups.member.1.Market=SPOT
&Instances.InstanceGroups.member.1.BidPrice=0.06
&Instances.InstanceGroups.member.2.Name=Core Instance Group
&Instances.InstanceGroups.member.2.InstanceRole=CORE
&Instances.InstanceGroups.member.2.InstanceType=m1.small
&Instances.InstanceGroups.member.2.InstanceCount=2
&Instances.InstanceGroups.member.2.Market=SPOT
&Instances.InstanceGroups.member.2.BidPrice=0.06
&Instances.KeepJobFlowAliveWhenNoSteps=true
&Instances.TerminationProtected=false
&Steps.member.1.Name=Setup Pig
&Steps.member.1.ActionOnFailure=TERMINATE_JOB_FLOW
&Steps.member.1.HadoopJarStep.Jar=s3://us-east-1.elasticmapreduce/libs/script-runner/script-runner.jar
&Steps.member.1.HadoopJarStep.Args.member.1=s3://us-east-1.elasticmapreduce/libs/pig/pig-script
&Steps.member.1.HadoopJarStep.Args.member.2=--base-path
&Steps.member.1.HadoopJarStep.Args.member.3=s3://us-east-1.elasticmapreduce/libs/pig/
&Steps.member.1.HadoopJarStep.Args.member.4=--install-pig
&Steps.member.1.HadoopJarStep.Args.member.5=--pig-versions
&Steps.member.1.HadoopJarStep.Args.member.6=latest
&LogUri=s3n://my-bucket/hadoop/
&AmiVersion=latest
&VisibleToAllUsers=true
&*AUTHPARAMS*
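The API request uses flat, dotted parameter names with `member.N` list indexes, while the AWS CLI command in this article takes nested JSON with snake_case keys. A rough sketch of that translation follows, under two assumptions: the key spelling (for example `ec_2_key_name`) mirrors the CLI version used in this article, and all values are left as strings, so counts and booleans still need converting by hand. The function names are hypothetical.

```python
import re


def to_cli_key(name):
    """Convert an API name such as 'Ec2KeyName' to 'ec_2_key_name'."""
    return re.sub(r"(?<!^)(?=[A-Z0-9])", "_", name).lower()


def _collapse(node):
    """Turn {'member': {'1': a, '2': b}} into [a, b] and rename dict keys."""
    if not isinstance(node, dict):
        return node
    if set(node) == {"member"}:
        members = node["member"]
        return [_collapse(members[index]) for index in sorted(members, key=int)]
    return {to_cli_key(name): _collapse(child) for name, child in node.items()}


def nest_parameters(flat):
    """Build a nested structure from flat 'A.B.member.1.C' style parameters."""
    tree = {}
    for key, value in flat.items():
        *path, leaf = key.split(".")
        node = tree
        for part in path:
            node = node.setdefault(part, {})
        node[leaf] = value
    return _collapse(tree)
```

In the result, the values under `instances` and `steps` correspond to the JSON passed to the `--instances` and `--steps` options, and the remaining top-level keys map to individual options such as `--name` and `--log-uri`.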

AWS CLI

Console - user@hostname ~ $

aws --region us-east-1 emr \
run-job-flow \
--name "Interactive Pig" \
--instances "{
    \"ec_2_key_name\": \"my-key\",
    \"instance_groups\": [
        {
            \"name\": \"Master Instance Group\",
            \"instance_role\": \"MASTER\",
            \"instance_type\": \"m1.small\",
            \"instance_count\": 1,
            \"market\": \"SPOT\",
            \"bid_price\": \"0.06\"
        },
        {
            \"name\": \"Core Instance Group\",
            \"instance_role\": \"CORE\",
            \"instance_type\": \"m1.small\",
            \"instance_count\": 2,
            \"market\": \"SPOT\",
            \"bid_price\": \"0.06\"
        }
    ],
    \"keep_job_flow_alive_when_no_steps\": true,
    \"termination_protected\": false
}" \
--steps "[
    {
        \"name\": \"Setup Pig\",
        \"action_on_failure\": \"TERMINATE_JOB_FLOW\",
        \"hadoop_jar_step\": {
            \"jar\": \"s3://us-east-1.elasticmapreduce/libs/script-runner/script-runner.jar\",
            \"args\": [
                \"s3://us-east-1.elasticmapreduce/libs/pig/pig-script\",
                \"--base-path\",
                \"s3://us-east-1.elasticmapreduce/libs/pig/\",
                \"--install-pig\",
                \"--pig-versions\",
                \"latest\"
            ]
        }
    }
]" \
--log-uri "s3n://my-bucket/hadoop/" \
--ami-version "latest"

Output

{
    "ResponseMetadata": {
        "RequestId": "9c8dbce7-bdb9-11e2-965a-07fb1be53dc4"
    },
    "JobFlowId": "j-3TYHC7VKXA235"
}

Describe Cluster

Elastic MapReduce Ruby Client

Console - user@hostname ~ $

elastic-mapreduce --describe j-3TYHC7VKXA235

API Request

Example API Request

https://us-east-1.elasticmapreduce.amazonaws.com/
?Action=DescribeJobFlows
&JobFlowIds.member.1=j-3TYHC7VKXA235
&*AUTHPARAMS*

AWS CLI

Console - user@hostname ~ $

aws --region us-east-1 emr \
describe-job-flows \
--job-flow-ids "[\"j-3TYHC7VKXA235\"]"

Output

{
    "JobFlows": [
        {
            "Name": "Interactive Pig", 
            "BootstrapActions": [], 
            "Instances": {
                "InstanceCount": 3, 
                "Placement": {
                    "AvailabilityZone": "us-east-1e"
                }, 
                "MasterPublicDnsName": "ec2-23-23-54-39.compute-1.amazonaws.com", 
                "NormalizedInstanceHours": 0, 
                "MasterInstanceId": "i-062c1b6f", 
                "InstanceGroups": [
                    {
                        "ReadyDateTime": "2013-05-15T23:56:30Z", 
                        "InstanceType": "m1.small", 
                        "InstanceRole": "MASTER", 
                        "InstanceRunningCount": 1, 
                        "State": "RUNNING", 
                        "BidPrice": "0.06", 
                        "Market": "SPOT", 
                        "StartDateTime": "2013-05-15T23:54:39Z", 
                        "InstanceGroupId": "ig-3A46UIUD1WED7", 
                        "CreationDateTime": "2013-05-15T23:46:06Z", 
                        "InstanceRequestCount": 1, 
                        "LastStateChangeReason": "", 
                        "Name": "Master Instance Group"
                    }, 
                    {
                        "ReadyDateTime": "2013-05-15T23:56:43Z", 
                        "InstanceType": "m1.small", 
                        "InstanceRole": "CORE", 
                        "InstanceRunningCount": 2, 
                        "State": "RUNNING", 
                        "BidPrice": "0.06", 
                        "Market": "SPOT", 
                        "StartDateTime": "2013-05-15T23:56:43Z", 
                        "InstanceGroupId": "ig-OWLA04KCES02", 
                        "CreationDateTime": "2013-05-15T23:46:06Z", 
                        "InstanceRequestCount": 2, 
                        "LastStateChangeReason": "", 
                        "Name": "Core Instance Group"
                    }
                ], 
                "MasterInstanceType": "m1.small", 
                "TerminationProtected": false, 
                "HadoopVersion": "1.0.3", 
                "KeepJobFlowAliveWhenNoSteps": true, 
                "SlaveInstanceType": "m1.small", 
                "Ec2KeyName": "my-key"
            }, 
            "Steps": [
                {
                    "ExecutionStatusDetail": {
                        "State": "COMPLETED", 
                        "EndDateTime": "2013-05-15T23:57:47Z", 
                        "CreationDateTime": "2013-05-15T23:46:06Z", 
                        "StartDateTime": "2013-05-15T23:56:42Z"
                    }, 
                    "StepConfig": {
                        "HadoopJarStep": {
                            "Args": [
                                "s3://us-east-1.elasticmapreduce/libs/pig/pig-script", 
                                "--base-path", 
                                "s3://us-east-1.elasticmapreduce/libs/pig/", 
                                "--install-pig", 
                                "--pig-versions", 
                                "latest"
                            ], 
                            "Jar": "s3://us-east-1.elasticmapreduce/libs/script-runner/script-runner.jar", 
                            "Properties": []
                        }, 
                        "Name": "Setup Pig", 
                        "ActionOnFailure": "TERMINATE_JOB_FLOW"
                    }
                }
            ], 
            "ExecutionStatusDetail": {
                "State": "WAITING", 
                "ReadyDateTime": "2013-05-15T23:56:43Z", 
                "CreationDateTime": "2013-05-15T23:46:06Z", 
                "StartDateTime": "2013-05-15T23:56:43Z", 
                "LastStateChangeReason": "Waiting after step completed"
            }, 
            "VisibleToAllUsers": false, 
            "JobFlowId": "j-3TYHC7VKXA235", 
            "LogUri": "s3n://my-bucket/hadoop/", 
            "AmiVersion": "2.3.5", 
            "SupportedProducts": []
        }
    ], 
    "ResponseMetadata": {
        "RequestId": "936faeca-bdbb-11e2-8815-b3eb27409c27"
    }
}

Connect to Master

Wait until the execution state is WAITING.

Console - user@hostname ~ $

aws --region us-east-1 emr \
describe-job-flows \
--job-flow-ids "[\"j-3TYHC7VKXA235\"]" \
| jq -r '.JobFlows[0].ExecutionStatusDetail.State'

Output

WAITING
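Rather than re-running the command by hand, the check can be repeated in a loop. The following is a minimal sketch, assuming the `aws` command and `--job-flow-ids` syntax shown in this article. The names `fetch_state` and `wait_for_state` are hypothetical, and the list of ended states is an assumption about which job flow states can no longer lead to WAITING.

```python
import json
import subprocess
import time

# States after which the job flow will not reach WAITING (assumed list).
ENDED_STATES = ("COMPLETED", "FAILED", "SHUTTING_DOWN", "TERMINATED")


def fetch_state(job_flow_id, region="us-east-1"):
    """Run describe-job-flows as in this article and return the execution state."""
    output = subprocess.check_output([
        "aws", "--region", region, "emr", "describe-job-flows",
        "--job-flow-ids", json.dumps([job_flow_id]),
    ])
    return json.loads(output)["JobFlows"][0]["ExecutionStatusDetail"]["State"]


def wait_for_state(get_state, target="WAITING", interval=30, max_checks=60,
                   sleep=time.sleep):
    """Poll get_state() until it returns target; raise if the job flow ends
    first or max_checks is exhausted."""
    for _ in range(max_checks):
        state = get_state()
        if state == target:
            return state
        if state in ENDED_STATES:
            raise RuntimeError(f"job flow ended in state {state}")
        sleep(interval)
    raise TimeoutError(f"state {target} not reached after {max_checks} checks")
```

A call such as `wait_for_state(lambda: fetch_state("j-3TYHC7VKXA235"))` blocks until the cluster is ready. Passing the state lookup in as a function keeps the loop independent of the CLI, so it can be exercised without a running cluster.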

Get the master public DNS name.

Console - user@hostname ~ $

aws --region us-east-1 emr \
describe-job-flows \
--job-flow-ids "[\"j-3TYHC7VKXA235\"]" \
| jq -r '.JobFlows[0].Instances.MasterPublicDnsName'

Output

ec2-204-236-247-160.compute-1.amazonaws.com

SSH to the master as the user hadoop, using the key pair specified when starting the cluster.

Console - user@hostname ~ $

ssh -i ~/.ssh/my-key.pem hadoop@ec2-204-236-247-160.compute-1.amazonaws.com
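The DNS lookup and the SSH command can be combined so the host name does not have to be copied by hand. The following sketch assumes the `describe-job-flows` JSON output shown earlier in this article; the function name `ssh_command` and its defaults are hypothetical. It also assumes that `MasterPublicDnsName` may be missing while the cluster is still starting.

```python
import json
import os


def ssh_command(describe_output, key_file="~/.ssh/my-key.pem", user="hadoop"):
    """Build the ssh argument list from describe-job-flows JSON output."""
    job_flow = json.loads(describe_output)["JobFlows"][0]
    # Assumption: the field is absent until the master has been assigned an address.
    host = job_flow["Instances"].get("MasterPublicDnsName")
    if not host:
        raise ValueError("master public DNS name is not available yet")
    # Expand '~' here because no shell does it when an argument list is used.
    return ["ssh", "-i", os.path.expanduser(key_file), f"{user}@{host}"]
```

The returned list can be passed to `subprocess.call` to open the session, or joined with spaces to print the command.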

Run pig on the master for our interactive session.

Console - hadoop@master ~ $

pig

Terminate Cluster

Elastic MapReduce Ruby Client

Console - user@hostname ~ $

elastic-mapreduce --terminate j-3TYHC7VKXA235

API Request

Example API Request

https://us-east-1.elasticmapreduce.amazonaws.com/
?Action=TerminateJobFlows
&JobFlowIds.member.1=j-3TYHC7VKXA235
&*AUTHPARAMS*

AWS CLI

Console - user@hostname ~ $

aws --region us-east-1 emr \
terminate-job-flows \
--job-flow-ids "[\"j-3TYHC7VKXA235\"]"

Output

{
    "ResponseMetadata": {
        "RequestId": "c4b9efa2-bdbb-11e2-b959-a99f5a815d16"
    }
}
