EMR Deployment

Requirements

Components

The following components (for EMR 5.28.1) must be selected when spinning up the cluster:

  • Hadoop 2.8.5
  • Hive 2.3.6
  • Spark 2.4.4
  • HBase 1.4.10
  • Sqoop 1.4.7
  • HCatalog 2.3.6
  • ZooKeeper 3.4.14
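This sketch assumes the cluster is launched with the AWS SDK for Python (boto3) instead of the console. In that case, the component selection above maps to the `ReleaseLabel` and `Applications` fields of an EMR `RunJobFlow` request. Each EMR release label pins the version of every application, so only the application names are listed. The cluster name, subnet, instance types and counts, and the default role names `EMR_DefaultRole` and `EMR_EC2_DefaultRole` are hypothetical placeholders, not Infoworks requirements.

```python
"""Sketch: build a partial EMR RunJobFlow request for release 5.28.1."""

# Release label required by Infoworks (only 5.28.1 is supported).
RELEASE_LABEL = "emr-5.28.1"

# EMR application names for the required components. Versions are not passed:
# the release label fixes them (for 5.28.1: Hadoop 2.8.5, Hive 2.3.6, Spark 2.4.4, ...).
REQUIRED_APPLICATIONS = [
    "Hadoop",
    "Hive",
    "Spark",
    "HBase",
    "Sqoop",
    "HCatalog",
    "ZooKeeper",
]


def build_run_job_flow_request(cluster_name: str, subnet_id: str) -> dict:
    """Return keyword arguments for boto3's EMR client run_job_flow().

    Only the release label and application list come from the requirements;
    instance sizing and role names are hypothetical placeholders.
    """
    return {
        "Name": cluster_name,
        "ReleaseLabel": RELEASE_LABEL,
        "Applications": [{"Name": name} for name in REQUIRED_APPLICATIONS],
        "Instances": {
            "Ec2SubnetId": subnet_id,  # the edge node must use this same subnet
            "InstanceGroups": [
                {"InstanceRole": "MASTER", "InstanceType": "m5.xlarge", "InstanceCount": 1},
                {"InstanceRole": "CORE", "InstanceType": "m5.xlarge", "InstanceCount": 2},
            ],
            "KeepJobFlowAliveWhenNoSteps": True,
        },
        "ServiceRole": "EMR_DefaultRole",
        "JobFlowRole": "EMR_EC2_DefaultRole",
    }


if __name__ == "__main__":
    request = build_run_job_flow_request("infoworks-emr", "subnet-0123456789abcdef0")
    print(request["ReleaseLabel"], [app["Name"] for app in request["Applications"]])
```

The dictionary could then be passed as `boto3.client("emr").run_job_flow(**request)`. Building it separately lets you review the request before any cluster is created.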

Node Services

Ensure that the following node services are running on the respective nodes:

  • Master Node: Name Node, Resource Manager, Hive Servers, Application Timeline Server, Spark History Server and ZooKeeper Server.
  • Core Nodes: Data Nodes, Node Managers and Region Servers.
  • Edge Node: All clients and Infoworks (IWX).
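To verify the layout above, you can compare the services that each node reports as running against an expected list for its role. The service names below are assumptions: they are the Bigtop-style service names typically seen on EMR 5.x nodes (for example `hadoop-hdfs-namenode`), and mapping "Hive Servers" to `hive-server2` plus `hive-hcatalog-server` is an assumption as well. Check both against your own cluster. Collecting the running services on each node is left out. The edge node is also omitted because its clients are not long-running daemons.

```python
"""Sketch: report expected node services that are missing on a node role."""
from __future__ import annotations

# Expected services per EMR node role. The names are ASSUMED Bigtop-style
# service names as found on EMR 5.x nodes; verify them on your cluster.
EXPECTED_SERVICES = {
    "MASTER": {
        "hadoop-hdfs-namenode",         # Name Node
        "hadoop-yarn-resourcemanager",  # Resource Manager
        "hive-server2",                 # Hive Server (assumed mapping)
        "hive-hcatalog-server",         # Hive metastore / HCatalog (assumed mapping)
        "hadoop-yarn-timelineserver",   # Application Timeline Server
        "spark-history-server",         # Spark History Server
        "zookeeper-server",             # ZooKeeper Server
    },
    "CORE": {
        "hadoop-hdfs-datanode",         # Data Node
        "hadoop-yarn-nodemanager",      # Node Manager
        "hbase-regionserver",           # Region Server
    },
}


def missing_services(role: str, running: set[str]) -> set[str]:
    """Return the services expected for `role` that are not in `running`."""
    return EXPECTED_SERVICES[role] - running


# Example: a core node where the HBase Region Server is not running.
running_on_core = {"hadoop-hdfs-datanode", "hadoop-yarn-nodemanager"}
print(sorted(missing_services("CORE", running_on_core)))  # ['hbase-regionserver']
```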

Edge Node

Infoworks requires a compute instance to be set up as an edge node.

  • This instance must be in the same subnet as the EMR cluster.
  • This instance must have an IAM role that grants access to EMR (AmazonElasticMapReduceRole) and S3 (AmazonS3FullAccess).
  • Note the security group ID of the edge node. It must be added as an inbound rule to the master and slave security groups.
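The subnet check and the inbound-rule step can be sketched against an EMR `DescribeCluster` response. That response reports the cluster subnet and the EMR-managed master and slave security groups under `Ec2InstanceAttributes`. The sketch assumes boto3 and builds request payloads for the EC2 `AuthorizeSecurityGroupIngress` API without calling AWS. It also assumes that all traffic from the edge node's security group should be allowed (`IpProtocol` `-1`). That is a permissive choice, so narrow it to specific ports if your security policy requires it. The sample response and all IDs are hypothetical.

```python
"""Sketch: edge-node network checks from an EMR DescribeCluster response."""
from __future__ import annotations


def same_subnet(cluster_description: dict, edge_subnet_id: str) -> bool:
    """True if the edge node's subnet matches the EMR cluster's subnet."""
    attrs = cluster_description["Cluster"]["Ec2InstanceAttributes"]
    return attrs["Ec2SubnetId"] == edge_subnet_id


def edge_ingress_requests(cluster_description: dict, edge_sg_id: str) -> list[dict]:
    """Build AuthorizeSecurityGroupIngress kwargs that let the edge node's
    security group reach the EMR master and slave security groups.

    IpProtocol "-1" allows all protocols and ports (an assumption; narrow if needed).
    """
    attrs = cluster_description["Cluster"]["Ec2InstanceAttributes"]
    target_groups = [
        attrs["EmrManagedMasterSecurityGroup"],
        attrs["EmrManagedSlaveSecurityGroup"],
    ]
    return [
        {
            "GroupId": group_id,
            "IpPermissions": [
                {"IpProtocol": "-1", "UserIdGroupPairs": [{"GroupId": edge_sg_id}]}
            ],
        }
        for group_id in target_groups
    ]


# Hypothetical, trimmed DescribeCluster response for illustration only.
example = {
    "Cluster": {
        "Ec2InstanceAttributes": {
            "Ec2SubnetId": "subnet-aaaa1111",
            "EmrManagedMasterSecurityGroup": "sg-master0001",
            "EmrManagedSlaveSecurityGroup": "sg-slave0001",
        }
    }
}
print(same_subnet(example, "subnet-aaaa1111"))  # True
for req in edge_ingress_requests(example, "sg-edge0001"):
    print(req["GroupId"])  # sg-master0001, then sg-slave0001
```

With real data, the description would come from `boto3.client("emr").describe_cluster(ClusterId=...)`. Each request could then be applied with `boto3.client("ec2").authorize_security_group_ingress(**req)`.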

Ensure the following are available:

  • A security group with inbound rules that allow internal traffic between the EMR cluster and the Infoworks edge node.
  • The private IP address of the EMR cluster master node.
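Assuming boto3 is available, the master node's private IP address can be read from the EMR `ListInstances` API, filtered to the `MASTER` instance group type. So that it runs without AWS access, the sketch below only parses a response. The sample response is hypothetical. A cluster configured with multiple master nodes returns one address per master.

```python
"""Sketch: extract master-node private IPs from an EMR ListInstances response."""
from __future__ import annotations


def master_private_ips(list_instances_response: dict) -> list[str]:
    """Return the private IP of every instance in the response.

    Assumes the response was requested with InstanceGroupTypes=["MASTER"],
    so every listed instance is a master node. Instances that have no
    address yet (for example, still provisioning) are skipped.
    """
    return [
        inst["PrivateIpAddress"]
        for inst in list_instances_response.get("Instances", [])
        if inst.get("PrivateIpAddress")
    ]


# Hypothetical, trimmed response for illustration only. With AWS access it
# could come from:
#   boto3.client("emr").list_instances(
#       ClusterId="j-...", InstanceGroupTypes=["MASTER"], InstanceStates=["RUNNING"])
sample_response = {
    "Instances": [
        {"Ec2InstanceId": "i-0abc", "PrivateIpAddress": "10.0.1.15"},
    ]
}
print(master_private_ips(sample_response))  # ['10.0.1.15']
```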

NOTES:

  • An AWS Marketplace solution is under development and is currently not available.
  • Currently, only EMR version 5.28.1 is supported.

Prerequisites

Perform the following prerequisite procedures to configure EMR with multiple master nodes:

EMR Cluster Deployment
