EMR Deployment
Requirements
Components
The following components (for EMR 5.28.1) must be selected when spinning up the cluster:
- Hadoop 2.8.5
- Hive 2.3.6
- Spark 2.4.4
- HBase 1.4.10
- Sqoop 1.4.7
- HCatalog 2.3.6
- ZooKeeper 3.4.14
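The component selection above can be sketched as request parameters. The following is a minimal sketch, assuming the cluster is created programmatically and that the dictionary is shaped for the run_job_flow call of the boto3 EMR client. The function name build_cluster_request and the cluster name are hypothetical, the application name spellings are assumed to match those EMR accepts, and instance, role, and network settings are left out.

```python
# Component versions listed above for EMR 5.28.1 (kept for reference;
# EMR derives the versions from the release label, not from this mapping).
REQUIRED_COMPONENTS = {
    "Hadoop": "2.8.5",
    "Hive": "2.3.6",
    "Spark": "2.4.4",
    "HBase": "1.4.10",
    "Sqoop": "1.4.7",
    "HCatalog": "2.3.6",
    "ZooKeeper": "3.4.14",
}


def build_cluster_request(name, release_label="emr-5.28.1"):
    """Return keyword arguments shaped for a boto3 EMR run_job_flow call.

    Only the release label and application list are filled in; instance
    settings, IAM roles, and networking are left to the caller.
    """
    return {
        "Name": name,
        "ReleaseLabel": release_label,
        "Applications": [{"Name": app} for app in REQUIRED_COMPONENTS],
    }


if __name__ == "__main__":
    # "infoworks-emr" is a hypothetical cluster name used for illustration.
    request = build_cluster_request("infoworks-emr")
    print(request["ReleaseLabel"])
    print([app["Name"] for app in request["Applications"]])
```

The sketch only builds the parameters and does not contact AWS, so it can be reviewed before any cluster is created.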
Node Services
Ensure that the following node services are running on the respective nodes:
- Master Node: NameNode, ResourceManager, Hive Servers, Application Timeline Server, Spark History Server, and ZooKeeper Server.
- Core Nodes: DataNodes, NodeManagers, and RegionServers.
- Edge Node: All Clients and IWX.
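The service placement above can be expressed as a small checklist. This is a minimal sketch, assuming the list of running services for a node has already been collected by some other means. The labels are illustrative and are not guaranteed to match actual process names, and the function name missing_services is hypothetical. The edge node is omitted because its clients are not enumerated here.

```python
# Expected services per node role, taken from the list above.
# The labels are illustrative, not exact process names.
EXPECTED_SERVICES = {
    "master": {
        "NameNode",
        "ResourceManager",
        "Hive Server",
        "Application Timeline Server",
        "Spark History Server",
        "ZooKeeper Server",
    },
    "core": {"DataNode", "NodeManager", "RegionServer"},
}


def missing_services(role, running):
    """Return the expected services for a role that are not in `running`."""
    return sorted(EXPECTED_SERVICES[role] - set(running))


if __name__ == "__main__":
    # Hypothetical observation from a core node.
    print(missing_services("core", ["DataNode", "NodeManager"]))
```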
Edge Node
Infoworks requires a compute instance that is set up as an edge node.
- This instance must be in the same subnet as the EMR cluster.
- This instance must have an IAM role that grants access to EMR (AmazonElasticMapReduceRole) and S3 (AmazonS3FullAccess).
- Note the security group ID of the edge node. It must be added as an inbound rule to the master and slave security groups.
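The same-subnet requirement above can be sanity-checked from the private IP addresses. This is a minimal sketch, assuming the subnet CIDR block and both private IP addresses are already known. The function name in_same_subnet and all addresses are hypothetical. Comparing the subnet IDs of the instances in AWS remains the authoritative check; this only confirms that the addresses fall inside one CIDR block.

```python
import ipaddress


def in_same_subnet(subnet_cidr, *addresses):
    """Return True if every address lies inside the given subnet CIDR."""
    network = ipaddress.ip_network(subnet_cidr)
    return all(ipaddress.ip_address(addr) in network for addr in addresses)


if __name__ == "__main__":
    # Hypothetical values: subnet CIDR, master node IP, edge node IP.
    print(in_same_subnet("10.0.1.0/24", "10.0.1.15", "10.0.1.200"))
```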
Ensure the following are available:
- A security group with inbound rules that allow internal traffic between the EMR cluster and the Infoworks edge node.
- The private IP address of the EMR cluster master node.
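The inbound rule that admits the edge node security group can be sketched as request parameters. This is a minimal sketch, assuming the rule is added programmatically and that the dictionary is shaped for the authorize_security_group_ingress call of the boto3 EC2 client. The function name build_ingress_rule and the group IDs are hypothetical. Allowing all protocols ("-1") is an assumption made for illustration; restrict the rule to the ports your deployment actually needs.

```python
def build_ingress_rule(target_group_id, edge_group_id):
    """Return keyword arguments for an inbound rule on `target_group_id`
    that allows traffic originating from `edge_group_id`.
    """
    return {
        "GroupId": target_group_id,
        "IpPermissions": [
            {
                # "-1" means all protocols; an assumption, not a requirement.
                "IpProtocol": "-1",
                "UserIdGroupPairs": [
                    {
                        "GroupId": edge_group_id,
                        "Description": "Infoworks edge node",
                    }
                ],
            }
        ],
    }


if __name__ == "__main__":
    # Hypothetical security group IDs for the master, slave, and edge groups.
    for target in ("sg-master-example", "sg-slave-example"):
        print(build_ingress_rule(target, "sg-edge-example"))
```

The same rule is built once per target group because the edge node group must be admitted by both the master and the slave security groups.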
NOTES:
- An AWS Marketplace solution is in development and is not yet available.
- Currently, only EMR version 5.28.1 is supported.
Prerequisites
Perform the following prerequisite procedures to configure EMR with multiple master nodes:
- Configuring External Kerberos Server for HA EMR
- Configuring External Hive Metastore Database for HA EMR