Prerequisites

On-premise

  • A high-configuration machine must be available as the edge node so that jobs run faster.
  • Hadoop, Hive, Spark2 and HBase must be installed and running in the cluster.
  • Minimum disk space of 50GB must be available.
  • Java 8 must be installed on the cluster.
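
The disk-space and Java checks above can be sketched as a small script. This is only an illustration: the function names are hypothetical, the source does not say which filesystem the 50GB applies to (the sketch defaults to "/"), and treating 1 GB as 10**9 bytes is an assumption.

```python
import re
import shutil

MIN_FREE_GB = 50         # disk-space prerequisite from the list above
REQUIRED_JAVA_MAJOR = 8  # Java 8 prerequisite from the list above


def free_disk_gb(path="/"):
    """Free space, in GB (10**9 bytes), on the filesystem that holds `path`."""
    return shutil.disk_usage(path).free / 10**9


def java_major_version(version_output):
    """Parse the major version from the text printed by `java -version`.

    Java 8 and earlier report "1.8.0_292"; later releases report "11.0.2".
    Returns None when no version string is found.
    """
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if match is None:
        return None
    first, second = match.group(1), match.group(2)
    if first == "1" and second is not None:
        return int(second)  # legacy "1.x" scheme: the major version is x
    return int(first)


def unmet_prerequisites(free_gb, java_major):
    """Return human-readable problems; an empty list means both checks pass."""
    problems = []
    if free_gb < MIN_FREE_GB:
        problems.append(f"only {free_gb:.1f} GB free, need {MIN_FREE_GB} GB")
    if java_major != REQUIRED_JAVA_MAJOR:
        problems.append(f"Java {REQUIRED_JAVA_MAJOR} required, found {java_major}")
    return problems
```

Note that `java -version` writes to standard error, so when collecting its output (for example with `subprocess.run(["java", "-version"], capture_output=True, text=True)`), read the `stderr` field and pass it to `java_major_version`.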

Hortonworks

If security features such as Ranger and Kerberos are enabled, the following prerequisites apply:

  • Kerberos: A principal and keytab must be created for <Infoworks_User>, and the keytab must be available on the edge node.
  • Ranger: <Infoworks_User> must have permissions in the Hadoop policy for the /user/<Infoworks_User> directory.
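
As a rough illustration of the Kerberos prerequisite, the sketch below checks that a keytab file is readable on the edge node and builds a keytab-based `kinit` invocation. The helper names, the principal and the keytab path are hypothetical placeholders, not values from Infoworks; `-k` and `-t` are the standard MIT Kerberos `kinit` options for authenticating from a keytab.

```python
import os


def keytab_is_readable(keytab_path):
    """True when the keytab exists as a regular file the current user can read."""
    return os.path.isfile(keytab_path) and os.access(keytab_path, os.R_OK)


def kinit_command(principal, keytab_path):
    """Build the argument list for a keytab-based (password-free) kinit."""
    return ["kinit", "-k", "-t", keytab_path, principal]


# Placeholder values for illustration only; substitute the real principal
# and keytab location created for <Infoworks_User>.
EXAMPLE_PRINCIPAL = "infoworks@EXAMPLE.COM"
EXAMPLE_KEYTAB = "/etc/security/keytabs/infoworks.keytab"
```

The list returned by `kinit_command(EXAMPLE_PRINCIPAL, EXAMPLE_KEYTAB)` could be passed to `subprocess.run` while logged in as <Infoworks_User>; a zero exit status would indicate that the principal and keytab match.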

Supported Version: HDP-2.6.4.0

Cloudera

If security features such as Sentry and Kerberos are enabled, the following prerequisites apply:

  • Kerberos: A principal and keytab must be created for <Infoworks_User>, and the keytab must be available on the edge node.
  • Sentry: <Infoworks_User> must have permissions in the Hive policy.

Supported Version: CDH 5.13.0

MapR

If security is enabled, perform the following steps on the edge node, in a terminal session of <Infoworks_User>, to generate a ticket:

  • Run the maprlogin password command.
  • Enter the password of <Infoworks_User> when prompted.

Supported Version: MapR 6.0.1.20180404222005.GA

Cloud

GCP

  • The quota limit for CPU cores must be greater than 72 in the region where Infoworks is spun up.
  • The APIs for the DataProc, Compute Engine, Deployment Manager and Runtime Configuration services must be enabled.
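
One way to check the API prerequisite is sketched below. It assumes the four services map to the Google Cloud service names listed in `REQUIRED_SERVICES`, and that the list of enabled services has been captured as plain text, one name per line (for example from `gcloud services list --enabled --format="value(config.name)"`). The function names and the project ID used in the usage note are hypothetical.

```python
# Assumed mapping from the services named above to Google Cloud service names.
REQUIRED_SERVICES = (
    "dataproc.googleapis.com",           # DataProc
    "compute.googleapis.com",            # Compute Engine
    "deploymentmanager.googleapis.com",  # Deployment Manager
    "runtimeconfig.googleapis.com",      # Runtime Configuration
)


def missing_services(enabled_listing):
    """Return the required services absent from a one-name-per-line listing."""
    enabled = {line.strip() for line in enabled_listing.splitlines() if line.strip()}
    return [name for name in REQUIRED_SERVICES if name not in enabled]


def enable_command(services, project_id):
    """Build a `gcloud services enable` argument list for the given services."""
    return ["gcloud", "services", "enable", *services, "--project", project_id]
```

For instance, `enable_command(missing_services(listing), "my-project")` yields a command that enables only the services that are still missing; it should be run only when `missing_services` returns a non-empty list.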

Microsoft Azure

  • The quota limit for CPU cores must be greater than 60 in the region where Infoworks is spun up.
  • Azure Data Lake (ADL) storage, if used, must be created in a resource group before spinning up Infoworks.
  • A virtual network (VNet) must be created in a resource group before spinning up Infoworks.

EMR

EMR Version: 5.17

Components

The following components (for EMR 5.17) must be selected when spinning up the cluster:

  • Hadoop 2.8.4
  • HBase 1.4.6
  • Hive and HCatalog 2.3.3
  • Spark 2.3.1
  • Tez 0.8.4
  • Zookeeper 3.4.12
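
The component selection above can be expressed as a request fragment in the shape accepted by the `ReleaseLabel` and `Applications` parameters of the boto3 EMR `run_job_flow` call. This is a sketch under assumptions: the release label `emr-5.17.0` is assumed to correspond to "EMR 5.17", the function name is hypothetical, and the other required parameters (cluster name, instances, roles) are deliberately left out. On EMR the component versions follow from the release label, so only application names are listed.

```python
# Assumption: "EMR 5.17" corresponds to the release label "emr-5.17.0".
EMR_RELEASE_LABEL = "emr-5.17.0"

# Application names for the components listed above. Versions are not
# specified here because the release label determines them.
REQUIRED_APPLICATIONS = (
    "Hadoop",
    "HBase",
    "Hive",
    "HCatalog",
    "Spark",
    "Tez",
    "ZooKeeper",
)


def cluster_request_fragment():
    """Return the release label and application list as a keyword-argument dict."""
    return {
        "ReleaseLabel": EMR_RELEASE_LABEL,
        "Applications": [{"Name": name} for name in REQUIRED_APPLICATIONS],
    }
```

The returned dictionary could be merged with the remaining cluster settings and unpacked into the cluster-creation call.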

Node Services

Ensure that the following node services are running on the respective nodes:

  • Master Node: Name Node, Resource Manager, Hive Servers, Application Timeline Server, Spark History Server and Zookeeper Server.
  • Core Nodes: Data Nodes, Node Managers and Region Servers.
  • Edge Node: All Clients and IWX.

Edge Node

Infoworks requires a compute instance that is set up as an edge node.

  • This instance must be in the same subnet as the EMR cluster.
  • This instance must have an IAM role associated with EMR and S3.
  • If security is enabled, Infoworks requires a principal and keytab for the user that installs and runs Infoworks.
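
The subnet requirement above can be sanity-checked by testing whether the edge node's private IP address falls inside the CIDR block of the EMR cluster's subnet. The sketch below uses Python's standard `ipaddress` module; the function name and the example addresses are hypothetical. It is only a quick check: comparing the subnet IDs in the AWS console remains the authoritative test.

```python
import ipaddress


def in_subnet(ip, cidr):
    """True when `ip` belongs to the network described by `cidr`."""
    return ipaddress.ip_address(ip) in ipaddress.ip_network(cidr)


# Hypothetical values: edge node private IP and the EMR subnet's CIDR block.
EDGE_NODE_IP = "10.0.1.25"
EMR_SUBNET_CIDR = "10.0.1.0/24"
```

With these example values, `in_subnet(EDGE_NODE_IP, EMR_SUBNET_CIDR)` is true, while an address such as `10.0.2.25` would fall outside the block.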

Additional Information

  • The user must provide the AWS account ID so that Infoworks can grant access to the edge node image.
  • The user must provide the private DNS name of the master node of the EMR cluster.