Secured EMR Cluster Deployment

Cluster Configuration

  • Select Security Configurations from Amazon EMR.
  • Provide a name in the Name section.
  • Select the Enable at-rest encryption for EMRFS data in Amazon S3 option, and then select SSE-S3.
  • Select Enable at-rest encryption for local disks and provide the respective key provider type. For example, set Key provider type to AWS KMS and AWS KMS customer master key to EMR/KMS.
  • Select Enable in-transit encryption, and provide the certificates.
  • Enable authentication by selecting the Enable Kerberos Authentication checkbox.
  • Select External KDC as the authentication provider, set the ticket lifetime, and optionally configure a cross-realm trust between the KDCs of two different clusters.
  • Click Create.
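
The console selections above correspond to a JSON security configuration document. The following is a minimal sketch in Python that assembles such a document, assuming the JSON layout used by EMR security configurations. The KMS key ARN, certificate bundle location, KDC host names, and the 24-hour ticket lifetime are hypothetical placeholders, not values from this guide.

```python
import json

# Hypothetical placeholder values; replace them with values from your account.
KMS_KEY_ARN = "arn:aws:kms:us-east-1:111122223333:key/EXAMPLE-KEY-ID"
CERT_BUNDLE_S3 = "s3://example-bucket/certs/my-certs.zip"
KDC_SERVER = "kdc.example.com:88"
KDC_ADMIN_SERVER = "kdc.example.com:749"


def build_security_configuration(ticket_lifetime_hours=24):
    """Return a dict that mirrors the console selections described above."""
    return {
        "EncryptionConfiguration": {
            "EnableAtRestEncryption": True,
            "EnableInTransitEncryption": True,
            "AtRestEncryptionConfiguration": {
                # At-rest encryption for EMRFS data in Amazon S3.
                "S3EncryptionConfiguration": {"EncryptionMode": "SSE-S3"},
                # At-rest encryption for local disks with an AWS KMS key.
                "LocalDiskEncryptionConfiguration": {
                    "EncryptionKeyProviderType": "AwsKms",
                    "AwsKmsKey": KMS_KEY_ARN,
                },
            },
            # In-transit encryption with certificates stored as a PEM bundle.
            "InTransitEncryptionConfiguration": {
                "TLSCertificateConfiguration": {
                    "CertificateProviderType": "PEM",
                    "S3Object": CERT_BUNDLE_S3,
                }
            },
        },
        "AuthenticationConfiguration": {
            "KerberosConfiguration": {
                "Provider": "ExternalKdc",
                "ExternalKdcConfiguration": {
                    "TicketLifetimeInHours": ticket_lifetime_hours,
                    "KdcServerType": "Single",
                    "KdcServer": KDC_SERVER,
                    "AdminServer": KDC_ADMIN_SERVER,
                },
            }
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_security_configuration(), indent=2))
```

If you script the setup instead of using the console, the printed JSON is the kind of string that the boto3 EMR client's create_security_configuration call accepts in its SecurityConfiguration parameter. Verify the field names against the current AWS documentation before relying on them.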

Software and Steps

  • Select 5.28.1 as the EMR release version from the drop-down list, and select the applications required for Infoworks.
  • Select the Use multiple master nodes to improve cluster availability checkbox.
  • To enable EMRFS, select Enter Configuration under Edit software settings and add the EMRFS configuration in JSON format.
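
Software settings on EMR are expressed as a JSON list of configuration classifications. As a hedged illustration only, the sketch below assumes that enabling EMRFS here means turning on EMRFS consistent view through the emrfs-site classification and its fs.s3.consistent property. That reading is an assumption, so confirm the exact settings required by Infoworks before using it.

```python
import json

# Assumption: "enabling EMRFS" refers to EMRFS consistent view.
emrfs_configuration = [
    {
        "Classification": "emrfs-site",
        "Properties": {"fs.s3.consistent": "true"},
    }
]

# Print the JSON text of the configuration list.
print(json.dumps(emrfs_configuration, indent=2))
```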

Hardware

The hardware blade includes the networking, AZ/Subnet and instance type sections.

  • Select Uniform instance groups to use the same instance type and purchasing option for each node type. The other option, Instance fleets, allows a mix of instance types and is mostly used with Spot Instances.
  • Select VPC from the drop-down list in the Network section and select the appropriate subnet in the Subnet section.
  • Root device EBS volume size - Set the root device volume size within the range of 10-100 GB.
  • Configure the machine type of the Master and Core nodes. Infoworks recommends a minimum of m4.4xlarge for the Master node and m4.2xlarge for Core and Task nodes.

The following are the recommended minimum machine types for Master, Core, and Task nodes:

m4.4xlarge or similar, with 16 vCPUs and 64 GB of RAM, for the Master node.

m4.2xlarge or similar, with 8 vCPUs and 32 GB of RAM, for Core and Task nodes.

  • Enable Cluster Scaling.

NOTE For setting the scaling metrics, see the AWS Documentation.

  • Click Next to proceed to the General Cluster Configurations blade.
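
The hardware choices above can be sketched as the instance group definitions that a scripted cluster launch would use. This is a minimal sketch, assuming the InstanceGroups layout of the EMR RunJobFlow API. The instance types and the 10-100 GB root volume range come from this guide, while the function names, node counts, and the On-Demand market setting are illustrative assumptions.

```python
def validate_root_volume_size(size_gb):
    """Check the root device volume size against the 10-100 GB range."""
    if not 10 <= size_gb <= 100:
        raise ValueError("Root device volume size must be between 10 and 100 GB")
    return size_gb


def build_instance_groups(core_count=2, task_count=0, multi_master=True):
    """Return uniform instance group definitions for Master, Core, and Task nodes."""
    groups = [
        {
            "Name": "Master",
            "InstanceRole": "MASTER",
            "InstanceType": "m4.4xlarge",
            # A cluster with multiple master nodes runs three master instances.
            "InstanceCount": 3 if multi_master else 1,
            "Market": "ON_DEMAND",  # assumption: On-Demand purchasing
        },
        {
            "Name": "Core",
            "InstanceRole": "CORE",
            "InstanceType": "m4.2xlarge",
            "InstanceCount": core_count,
            "Market": "ON_DEMAND",
        },
    ]
    # Task nodes are optional, so add the group only when requested.
    if task_count > 0:
        groups.append(
            {
                "Name": "Task",
                "InstanceRole": "TASK",
                "InstanceType": "m4.2xlarge",
                "InstanceCount": task_count,
                "Market": "ON_DEMAND",
            }
        )
    return groups


if __name__ == "__main__":
    for group in build_instance_groups(core_count=3, task_count=2):
        print(group["InstanceRole"], group["InstanceType"], group["InstanceCount"])
```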

General Cluster Configurations

  • Provide a name for the cluster, following your naming convention, in the Cluster name section.
  • Termination protection - Enable this option to prevent accidental deletion of the instances.
  • Create tags for the resources in the Tags section.
  • In Script location, select the S3 location where the downloaded script is stored.
  • Click Next to proceed to the final blade, Security.
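
For readers who script the launch, the general options above map roughly onto a few RunJobFlow-style parameters. The sketch below is illustrative only: it assumes that the script location refers to a bootstrap action, and the function name, cluster name, tag values, action name, and S3 path are hypothetical.

```python
def build_general_options(cluster_name, script_s3_path, tags):
    """Return the name, termination protection, tags, and script settings."""
    if not script_s3_path.startswith("s3://"):
        raise ValueError("Script location must be an S3 URI")
    return {
        "Name": cluster_name,
        # Termination protection guards against accidental deletion.
        "Instances": {"TerminationProtected": True},
        "Tags": [{"Key": key, "Value": value} for key, value in sorted(tags.items())],
        # Assumption: the downloaded script runs as a bootstrap action.
        "BootstrapActions": [
            {
                "Name": "example-bootstrap",  # hypothetical action name
                "ScriptBootstrapAction": {"Path": script_s3_path, "Args": []},
            }
        ],
    }


if __name__ == "__main__":
    options = build_general_options(
        "example-secured-emr",
        "s3://example-bucket/scripts/bootstrap.sh",
        {"owner": "data-platform", "env": "dev"},
    )
    print(options["Name"], len(options["Tags"]))
```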

Security

  • In the Security Options section, select the respective EC2 key pair to access the cluster through SSH.
  • Select Default for Permissions and for EC2 Security Group so that EMR creates or updates them automatically.
  • Expand Security Configuration to configure security for the EMR cluster.
  • Select the name of the security configuration created in the Cluster Configuration section.
  • Enter the realm and password for the KDC.
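
The security options above can be sketched as the corresponding RunJobFlow-style parameters. This is a minimal sketch under stated assumptions: the function name, key pair name, security configuration name, and realm are hypothetical, and the default role names are the ones EMR normally uses for its default permissions. In practice, read the KDC password from a secret store rather than writing it into a script.

```python
def build_security_options(key_pair, security_configuration, realm, kdc_admin_password):
    """Return the key pair, default roles, and Kerberos settings for the cluster."""
    if not realm or not kdc_admin_password:
        raise ValueError("Both the realm and the KDC password are required")
    return {
        # EC2 key pair used to access the cluster through SSH.
        "Instances": {"Ec2KeyName": key_pair},
        # Default permissions: the standard EMR service and instance roles.
        "ServiceRole": "EMR_DefaultRole",
        "JobFlowRole": "EMR_EC2_DefaultRole",
        # Name of the security configuration created earlier.
        "SecurityConfiguration": security_configuration,
        "KerberosAttributes": {
            "Realm": realm,
            "KdcAdminPassword": kdc_admin_password,
        },
    }


if __name__ == "__main__":
    options = build_security_options(
        "example-key-pair", "example-security-config", "EXAMPLE.COM", "change-me"
    )
    # Print only non-sensitive fields.
    print(options["SecurityConfiguration"], options["KerberosAttributes"]["Realm"])
```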