Prerequisites for Spinning Up Infoworks on Azure HDInsight

Most deployments require more than 60 CPU cores. Check the core limit for your subscription in the required region. If the limit is too low, contact Microsoft support to request an increase.
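The core check above is simple arithmetic, and a minimal sketch in Python follows. Only the 16 vCPUs of the D14 v2 edge node come from this document; the other node counts and sizes, the quota figures, and the function names are hypothetical and are used purely for illustration.

```python
def required_cores(plan):
    """Sum the cores for a plan of {role: (node_count, cores_per_node)}."""
    return sum(count * cores for count, cores in plan.values())

def has_headroom(limit, in_use, needed):
    """True if the unused part of the regional core quota covers `needed`."""
    return limit - in_use >= needed

# Hypothetical sizing; only the edge node figure is taken from this document.
plan = {
    "edge": (1, 16),       # D14 v2 edge node: 16 vCPUs
    "head": (2, 4),        # assumed
    "worker": (4, 8),      # assumed
    "zookeeper": (3, 2),   # assumed
}

needed = required_cores(plan)
print("cores needed:", needed)
# Quota limit and current usage are made-up numbers; read the real ones
# from your subscription's quota view for the target region.
print("fits in quota:", has_headroom(limit=100, in_use=20, needed=needed))
```

If the check fails, the quota has to be raised before the template is deployed.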

Infoworks Spin-up with HDInsight Cluster

NOTE: Use the Infoworks Marketplace offer (Infoworks Installer), which spins up a new HDInsight cluster with an Infoworks edge node.

HDInsight Requirements

The Infoworks installation depends on the successful deployment of the Azure HDInsight cluster, which is invoked by the solution template. Infoworks assumes that the Azure environment is configured so that the HDInsight cluster can be spun up.

Network Security Group/Route Table Requirements

  • Azure health and management services in the Azure cloud must be able to make inbound connections to the HDInsight cluster on port 443. These inbound connections come from a few public IP addresses, which Azure predefines for each region. The IP addresses published by Microsoft are available here.
  • HDInsight does not work if forced tunneling is enabled in the subnet. Microsoft recommends either removing forced tunneling or creating a new subnet for HDInsight.

To validate other rules related to network security, see Add HDInsight to an existing virtual network.
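The inbound rule on port 443 can be checked against a planned set of Network Security Group rules before deployment. The sketch below assumes a simplified, hypothetical rule format (one CIDR source and one port per rule) and uses placeholder addresses from the 203.0.113.0/24 documentation range rather than real Azure management IPs. It follows the NSG convention that the rule with the lowest priority number is evaluated first and the first match wins.

```python
import ipaddress

def allows_inbound_443(rules, source_ip):
    """Return True if the first matching inbound rule allows port 443."""
    ip = ipaddress.ip_address(source_ip)
    # Lower priority numbers are evaluated first.
    for rule in sorted(rules, key=lambda r: r["priority"]):
        if rule["direction"] != "Inbound":
            continue
        if rule["port"] not in (443, "*"):
            continue
        if ip in ipaddress.ip_network(rule["source"]):
            return rule["access"] == "Allow"
    return False  # no rule matched; treat as blocked

# Hypothetical rule set: allow one management IP, deny everything else.
rules = [
    {"priority": 300, "direction": "Inbound", "port": 443,
     "source": "203.0.113.10/32", "access": "Allow"},
    {"priority": 4096, "direction": "Inbound", "port": "*",
     "source": "0.0.0.0/0", "access": "Deny"},
]

print(allows_inbound_443(rules, "203.0.113.10"))
print(allows_inbound_443(rules, "203.0.113.11"))
```

Running the check once per published management IP for the region shows which addresses still need an allow rule.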

Storage Account

The wizard provides options to create a new Blob Storage account or use an existing account.

NOTE: If an existing Blob Storage account is used, it must be of type Standard_LRS. HDInsight does not support Premium_LRS.

Infoworks Spin-up on Existing HDInsight Cluster

The recommended configuration for the Infoworks edge node is D14 v2 (16 vCPUs and 112 GB RAM), unless the Infoworks team advises otherwise.

Infoworks is working on supporting installation on an existing HDInsight cluster. This imposes the following constraints on the HDInsight cluster:

Cluster Configurations

  • Cluster Type: HBase
  • Operating System: Linux
  • HDInsight Version: 3.6
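These three constraints can be checked up front against the details of an existing cluster. The sketch below is illustrative only: the dictionary keys and the sample version string are hypothetical, and real cluster metadata would need to be mapped into this shape first.

```python
# Constraints listed above; the dictionary keys are hypothetical names.
REQUIRED = {"cluster_type": "hbase", "os_type": "linux", "version": "3.6"}

def config_mismatches(cluster):
    """Return a list of readable mismatches (empty if the cluster complies)."""
    problems = []
    for key, expected in REQUIRED.items():
        actual = str(cluster.get(key, "")).strip().lower()
        if key == "version":
            # Compare major.minor only, so a longer build string still matches.
            actual = ".".join(actual.split(".")[:2])
        if actual != expected:
            problems.append(
                f"{key}: expected {expected}, found {actual or 'nothing'}"
            )
    return problems

# Illustrative inputs, not taken from a real cluster.
ok_cluster = {"cluster_type": "HBase", "os_type": "Linux", "version": "3.6.1000.67"}
bad_cluster = {"cluster_type": "Spark", "os_type": "Linux", "version": "4.0"}

print(config_mismatches(ok_cluster))
print(config_mismatches(bad_cluster))
```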

Spark 2 External Installation

Some Infoworks functionality requires an external installation of Spark 2 on the cluster. Infoworks must be allowed to perform this installation with root access.

Existence of Vnet

The cluster must be spun up in a virtual network (VNet). A shared VNet (usually an existing one) or a dedicated VNet (usually newly created) is recommended, provided it meets the network security requirements described above.

Cluster Access Details

Infoworks installation requires the following:

  • Name of the cluster
  • Cluster access credentials (username and password for Ambari)

Ingesting Data from External Azure Data Lake Store

Infoworks uses service-to-service authorization, in which a service principal provides access to Azure Data Lake Store.

Access to Ambari

Infoworks might need to add some properties manually through Ambari for authorization.

NOTE: This step is also required for ingesting data from an external Azure Blob Storage.

Service Principal

Infoworks requires the following:

  • A service principal object, which is listed under the registered apps in Active Directory
  • Privileges to manually add the service principal to the Data Lake Store access settings
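This document does not list the authorization properties that Infoworks adds. As an illustration only, the sketch below assumes they are the client-credential settings of the Apache Hadoop Azure Data Lake (Gen1) connector, which use the fs.adl.oauth2 prefix in current Hadoop documentation. Older Hadoop releases used a dfs.adls.oauth2 prefix, so the prefix is left as a parameter. The function name and the placeholder values are hypothetical.

```python
def adls_oauth_properties(tenant_id, client_id, client_secret,
                          prefix="fs.adl.oauth2"):
    """Build core-site style properties for service principal access.

    Assumes the Hadoop ADLS Gen1 connector's ClientCredential settings;
    the actual properties Infoworks sets may differ.
    """
    return {
        f"{prefix}.access.token.provider.type": "ClientCredential",
        f"{prefix}.refresh.url":
            f"https://login.microsoftonline.com/{tenant_id}/oauth2/token",
        f"{prefix}.client.id": client_id,
        f"{prefix}.credential": client_secret,
    }

# Placeholder values; substitute the registered app's real identifiers.
props = adls_oauth_properties("<tenant-id>", "<client-id>", "<client-secret>")
for key, value in props.items():
    print(f"{key} = {value}")
```

The client ID and secret belong to the registered app behind the service principal, and the same principal must be granted access on the Data Lake Store itself.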

Outbound Internet Access from Cluster

The cluster connects to Azure Data Lake Store over the Internet. A reference document is available here.

NOTE: Infoworks uses Blob storage as the underlying storage system for HDInsight.
