Prerequisites

On-premise

  • A high-configuration machine must be available as the edge node so that jobs run faster.
  • Hadoop, Hive, Spark2 and HBase must be installed and running on the cluster.
  • Minimum disk space of 50GB must be available.
  • Java 8 must be installed on the cluster.
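The disk space and Java requirements above can be checked with a short script before installation. The following is a minimal Python sketch, not an Infoworks tool: it assumes the relevant filesystem is the one holding `/`, that `java` is on the PATH of the node where it runs, and that checking one node is representative. The function names are hypothetical.

```python
import re
import shutil
import subprocess

MIN_FREE_GB = 50  # minimum disk space stated in the prerequisites


def free_space_gb(path="/"):
    """Return free space on the filesystem holding `path`, in GB (10^9 bytes)."""
    return shutil.disk_usage(path).free / 1e9


def java_major_version(version_output):
    """Extract the major version from `java -version` output.

    Java 8 reports itself as "1.8.0_x"; later releases report "11.0.x" and so on.
    Returns None when no version string is found.
    """
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if match is None:
        return None
    first, second = match.group(1), match.group(2)
    if first == "1" and second is not None:
        return int(second)
    return int(first)


def installed_java_major():
    """Run `java -version` (which prints to stderr) and parse the result."""
    try:
        result = subprocess.run(
            ["java", "-version"], capture_output=True, text=True, check=False
        )
    except FileNotFoundError:  # java is not on the PATH
        return None
    return java_major_version(result.stderr + result.stdout)


if __name__ == "__main__":
    # Assumption: "/" is the filesystem that needs the 50GB.
    print("free space OK:", free_space_gb("/") >= MIN_FREE_GB)
    print("Java 8 found:", installed_java_major() == 8)
```

Adjust the path passed to `free_space_gb` if the installation directory lives on a different mount.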

Hortonworks

If security such as Ranger and Kerberos is enabled, the following prerequisites apply:

  • Kerberos: A principal and keytab must be created for the <Infoworks_User>, and the keytab must be available on the edge node.
  • Ranger: <Infoworks_User> must have permissions to Hadoop policy for the /user/<Infoworks_User> directory.
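Once the keytab is on the edge node, one way to verify it is to request a ticket with the standard MIT Kerberos `kinit -kt` command. The sketch below wraps that call in Python; the keytab path and principal in the usage section are placeholders rather than values defined by Infoworks, and the helper names are hypothetical.

```python
import subprocess


def kinit_command(keytab_path, principal):
    """Build the `kinit` invocation that obtains a ticket from a keytab."""
    return ["kinit", "-kt", keytab_path, principal]


def obtain_ticket(keytab_path, principal):
    """Run kinit and return True when a ticket was obtained."""
    try:
        result = subprocess.run(kinit_command(keytab_path, principal), check=False)
    except FileNotFoundError:  # Kerberos client tools are not installed
        return False
    return result.returncode == 0


if __name__ == "__main__":
    # Placeholder values: substitute the real keytab location and principal.
    ok = obtain_ticket("/path/to/infoworks.keytab", "infoworks-user@EXAMPLE.COM")
    print("ticket obtained:", ok)
```

A successful run can then be confirmed with `klist`, which lists the cached ticket.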

Supported Version: HDP-2.6.4.0

Cloudera

If security such as Sentry and Kerberos is enabled, the following prerequisites apply:

  • Kerberos: A principal and keytab must be created for the <Infoworks_User>, and the keytab must be available on the edge node.
  • Sentry: <Infoworks_User> must have permissions to Hive policy.

Supported Version: CDH 5.13.0

MapR

If security is enabled, perform the following on the edge node to generate a ticket from the <Infoworks_User> terminal:

  • Run the maprlogin password command.
  • Enter the password of <Infoworks_User> when prompted.

Supported Version: MapR 6.0.1.20180404222005.GA

Cloud

GCP

  • The quota limit for CPU cores must be greater than 72 in the region where Infoworks is spun up.
  • The APIs for the DataProc, Compute Engine, Deployment Manager and Runtime Configuration services must be enabled.
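The GCP requirements above can be expressed as a small pre-flight helper. This is a sketch under stated assumptions: the service identifiers are the `googleapis.com` names that correspond to the four services listed, the command is the standard `gcloud services enable`, and the project ID and helper names are hypothetical. The quota value itself must be read from the GCP console or API; the function only applies the "greater than 72" rule.

```python
# Mapping from the service names in the prerequisites to their API identifiers.
REQUIRED_SERVICES = {
    "DataProc": "dataproc.googleapis.com",
    "Compute Engine": "compute.googleapis.com",
    "Deployment Manager": "deploymentmanager.googleapis.com",
    "Runtime Configuration": "runtimeconfig.googleapis.com",
}

MIN_CPU_QUOTA = 72  # the regional CPU quota must be strictly greater than this


def enable_services_command(project_id):
    """Build the gcloud command that enables all required APIs for a project."""
    return [
        "gcloud", "services", "enable",
        *REQUIRED_SERVICES.values(),
        "--project", project_id,
    ]


def cpu_quota_sufficient(regional_cpu_limit):
    """Return True when the regional CPU quota exceeds the stated minimum."""
    return regional_cpu_limit > MIN_CPU_QUOTA


if __name__ == "__main__":
    # "my-project" is a placeholder project ID.
    print(" ".join(enable_services_command("my-project")))
    print("quota OK for a limit of 96:", cpu_quota_sufficient(96))
```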

Microsoft Azure

  • The quota limit for CPU cores must be greater than 60 in the region where Infoworks is spun up.
  • ADL storage, if used, must be created in a resource group before spinning up Infoworks.
  • A VNet must be created in a resource group before spinning up Infoworks.

EMR

EMR Version: 5.17

Components

The following components (for EMR 5.17) must be selected when spinning up the cluster:

  • Hadoop 2.8.4
  • HBase 1.4.6
  • Hive and HCatalog 2.3.3
  • Spark 2.3.1
  • Tez 0.8.4
  • Zookeeper 3.4.12
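When the cluster is created programmatically rather than through the console, the component list above corresponds to the `ReleaseLabel` and `Applications` fields of an EMR cluster request (for example, the boto3 `run_job_flow` call). The sketch below only builds that fragment. It assumes that "EMR 5.17" maps to the release label `emr-5.17.0`; the other fields a real request needs (name, instances, roles) are deliberately omitted, and the function name is hypothetical.

```python
EMR_RELEASE_LABEL = "emr-5.17.0"  # assumption: the label for "EMR 5.17"

# Components and versions as listed in the prerequisites (for reference only;
# EMR selects the version from the release label, not from the request).
REQUIRED_COMPONENTS = {
    "Hadoop": "2.8.4",
    "HBase": "1.4.6",
    "Hive": "2.3.3",
    "HCatalog": "2.3.3",
    "Spark": "2.3.1",
    "Tez": "0.8.4",
    "ZooKeeper": "3.4.12",
}


def cluster_request_fragment():
    """Return the release and application fields of an EMR cluster request."""
    return {
        "ReleaseLabel": EMR_RELEASE_LABEL,
        "Applications": [{"Name": name} for name in REQUIRED_COMPONENTS],
    }


if __name__ == "__main__":
    fragment = cluster_request_fragment()
    print(fragment["ReleaseLabel"])
    print([app["Name"] for app in fragment["Applications"]])
```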

Node Services

Ensure that the following node services are running on the respective nodes:

  • Master Node: Name Node, Resource Manager, Hive Servers, Application Timeline Server, Spark History Server and Zookeeper Server.
  • Core Nodes: Data Nodes, Node Managers and Region Servers.
  • Edge Node: All Clients and IWX.

Edge Node

Infoworks requires a compute instance to be set up as an edge node.

  • This instance must be in the same subnet as the EMR cluster.
  • This instance must have an IAM role associated with EMR and S3.
  • If security is enabled, Infoworks requires a principal and keytab for the user that installs and runs Infoworks.
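A quick sanity check for the subnet requirement is to confirm that the edge node's private IP address falls inside the CIDR range of the EMR cluster's subnet. The Python sketch below uses only the standard `ipaddress` module; the addresses shown are made-up examples, and the function name is hypothetical. It is a proxy check only, since the authoritative test is that both instances report the same subnet ID in AWS.

```python
import ipaddress


def in_subnet(private_ip, subnet_cidr):
    """Return True when `private_ip` belongs to the network `subnet_cidr`."""
    return ipaddress.ip_address(private_ip) in ipaddress.ip_network(subnet_cidr)


if __name__ == "__main__":
    # Example values only: replace with the edge node IP and the EMR subnet CIDR.
    print(in_subnet("10.0.1.25", "10.0.1.0/24"))
```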

Additional Information

  • The user must provide the AWS account ID so that Infoworks can grant access to the edge node image.
  • The user must provide the private DNS name of the master node of the EMR cluster.