Prerequisites
On-premise
- A high-configuration machine must be available as the edge node so that jobs run faster.
- Hadoop, Hive, Spark2 and HBase must be installed and must be running in the cluster.
- At least 50 GB of disk space must be available.
- Java 8 must be installed on the cluster.
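The checks above can be partly automated. The following is a minimal sketch, not an Infoworks tool: the function names, the default path and the list of client commands are assumptions chosen for illustration. It only confirms that the clients are on the PATH and that enough disk space is free; it does not verify that the cluster services are running or that the Java version is 8.

```python
import shutil

MIN_FREE_GB = 50  # minimum disk space from the prerequisite list


def free_disk_gb(path="/"):
    """Return the free space, in GiB, of the filesystem containing `path`."""
    return shutil.disk_usage(path).free / (1024 ** 3)


def check_edge_node(path="/", required_tools=("java", "hadoop", "hive", "hbase")):
    """Return a list of problems found; an empty list means the checks passed.

    The tool names are assumed client command names, not taken from the
    Infoworks documentation.
    """
    problems = []
    free = free_disk_gb(path)
    if free < MIN_FREE_GB:
        problems.append(f"only {free:.1f} GB free on {path}, need {MIN_FREE_GB} GB")
    for tool in required_tools:
        # shutil.which only confirms the command is on PATH,
        # not that the corresponding service is running.
        if shutil.which(tool) is None:
            problems.append(f"{tool} not found on PATH")
    return problems


if __name__ == "__main__":
    for problem in check_edge_node():
        print("WARN:", problem)
```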
Hortonworks
If security such as Ranger and Kerberos is enabled, the following prerequisites apply:
- Kerberos: A principal and keytab must be created for the <Infoworks_User>, and the keytab must be available on the edge node.
- Ranger: <Infoworks_User> must have permissions in the Hadoop policy for the /user/<Infoworks_User> directory.
Supported version: HDP-2.6.4.0
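Once the keytab is on the edge node, a ticket is typically obtained with the standard MIT Kerberos `kinit -kt` form. The sketch below only builds and prints that command; the principal, realm and keytab path are hypothetical placeholders, and the helper name `kinit_command` is an assumption for illustration.

```python
import shlex


def kinit_command(principal, keytab_path):
    """Build the argv for obtaining a Kerberos ticket from a keytab."""
    # -k: use a keytab, -t: path of the keytab file
    return ["kinit", "-kt", keytab_path, principal]


# Hypothetical values; substitute the real principal and keytab location.
cmd = kinit_command("infoworks-user@EXAMPLE.COM", "/path/to/infoworks-user.keytab")
print(shlex.join(cmd))
```

Running `klist` afterwards as the same user is a common way to confirm that a ticket was granted.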
Cloudera
If security such as Sentry and Kerberos is enabled, the following prerequisites apply:
- Kerberos: A principal and keytab must be created for the <Infoworks_User>, and the keytab must be available on the edge node.
- Sentry: <Infoworks_User> must have permissions in the Hive policy.
Supported Version: CDH 5.13.0
MapR
If security is enabled, perform the following steps on the edge node, in a terminal session of <Infoworks_User>, to generate a ticket:
- Run the maprlogin password command.
- Enter the password of <Infoworks_User> when prompted.
Supported Version: MapR 6.0.1.20180404222005.GA
Cloud
GCP
- The CPU core quota must be greater than 72 in the region where Infoworks is being spun up.
- The APIs for the Dataproc, Compute Engine, Deployment Manager and Runtime Configuration services must be enabled.
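The required APIs can be enabled with `gcloud services enable`. The sketch below only assembles that command. The service identifiers are an assumed mapping from the service names listed above and should be checked against the project's API library; the project ID is a hypothetical placeholder.

```python
import shlex

# Assumed service identifiers for Dataproc, Compute Engine,
# Deployment Manager and Runtime Configuration.
REQUIRED_SERVICES = [
    "dataproc.googleapis.com",
    "compute.googleapis.com",
    "deploymentmanager.googleapis.com",
    "runtimeconfig.googleapis.com",
]


def enable_services_command(project_id):
    """Build the gcloud argv that enables all required services for a project."""
    return ["gcloud", "services", "enable", *REQUIRED_SERVICES, "--project", project_id]


# "my-project" is a placeholder project ID.
print(shlex.join(enable_services_command("my-project")))
```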
Microsoft Azure
- The CPU core quota must be greater than 60 in the region where Infoworks is being spun up.
- ADL storage, if used, must be created in a resource group before spinning up Infoworks.
- A VNet must be created in a resource group before spinning up Infoworks.
EMR
EMR Version: 5.17
Components
The following components (for EMR 5.17) must be selected when spinning up the cluster:
- Hadoop 2.8.4
- HBase 1.4.6
- Hive and HCatalog 2.3.3
- Spark 2.3.1
- Tez 0.8.4
- Zookeeper 3.4.12
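If the cluster is created through the EMR API rather than the console, the component selection above corresponds to a release label plus an application list. The sketch below builds only that fragment of a request. It assumes the `ReleaseLabel` and `Applications` parameter shape used by calls such as boto3's `run_job_flow`, and it assumes the release label `emr-5.17.0`, since the source states only "5.17". Other required arguments (cluster name, instances, roles) are omitted.

```python
# Assumed release label; the source states only "EMR Version: 5.17".
EMR_RELEASE_LABEL = "emr-5.17.0"

# Application names as EMR spells them; versions are fixed by the release label.
REQUIRED_APPLICATIONS = [
    "Hadoop",
    "HBase",
    "Hive",
    "HCatalog",
    "Spark",
    "Tez",
    "ZooKeeper",
]


def emr_application_params():
    """Return the release and application fragment of a cluster-creation request."""
    return {
        "ReleaseLabel": EMR_RELEASE_LABEL,
        "Applications": [{"Name": name} for name in REQUIRED_APPLICATIONS],
    }


if __name__ == "__main__":
    print(emr_application_params())
```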
Node Services
Ensure that the following node services are running on the respective nodes:
- Master Node: Name Node, Resource Manager, Hive Servers, Application Timeline Server, Spark History Server and Zookeeper Server.
- Core Nodes: Data Nodes, Node Managers and Region Servers.
- Edge Node: All Clients and IWX.
Edge Node
Infoworks requires a compute instance to be set up as an edge node.
- This instance must be in the same subnet as the EMR cluster.
- This instance must have an IAM role associated with EMR and S3.
- If security is enabled, Infoworks requires a principal and keytab for the user that installs and runs Infoworks.
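The subnet requirement can be sanity-checked from the private IP addresses before installation. This is a minimal sketch using Python's standard `ipaddress` module; the function name and the example addresses and CIDR are hypothetical, and the real values come from the VPC configuration.

```python
import ipaddress


def in_same_subnet(edge_node_ip, master_node_ip, subnet_cidr):
    """Return True if both addresses fall inside the given subnet CIDR."""
    subnet = ipaddress.ip_network(subnet_cidr)
    return (
        ipaddress.ip_address(edge_node_ip) in subnet
        and ipaddress.ip_address(master_node_ip) in subnet
    )


if __name__ == "__main__":
    # Hypothetical private addresses and subnet.
    print(in_same_subnet("10.0.1.15", "10.0.1.200", "10.0.1.0/24"))
```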
Additional Information
- The user must provide the AWS account ID so that Infoworks can grant access to the edge node image.
- The user must provide the private DNS name of the master node of the EMR cluster.
(C) 2015-2022 Infoworks.io, Inc. and Confidential