Prerequisites to Spin Up Infoworks on Azure HDInsight
Most deployments require more than 60 CPU cores. Check the core limit for your subscription in the required region. If the limit is too low, contact Microsoft support to request an increase.
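The core-limit check can be sketched in Python as below. This is a minimal illustration, not part of the Infoworks installer: it assumes usage records shaped like the JSON that `az vm list-usage --location <region>` prints (a `name.value` key plus `currentValue` and `limit`), and the sample numbers are invented. The function name `remaining_cores` is hypothetical.

```python
import json

REQUIRED_CORES = 60  # from the text: most deployments need more than 60 cores


def remaining_cores(usage_records, quota_name="cores"):
    """Return limit minus current usage for the named quota, or None if absent."""
    for record in usage_records:
        if record["name"]["value"] == quota_name:
            # int() tolerates values that arrive as strings.
            return int(record["limit"]) - int(record["currentValue"])
    return None


# Hypothetical sample; the record shape is an assumption, the numbers are made up.
sample = json.loads("""
[
  {"name": {"value": "cores", "localizedValue": "Total Regional vCPUs"},
   "currentValue": 12, "limit": 100},
  {"name": {"value": "standardDv2Family", "localizedValue": "Standard Dv2 Family vCPUs"},
   "currentValue": 0, "limit": 50}
]
""")

free = remaining_cores(sample)
print(f"Free regional cores: {free}; enough headroom: {free > REQUIRED_CORES}")
```

If the per-family quota (for example the family of the edge node size) is lower than the regional quota, the same function can be called with that family's quota name.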
Infoworks Spin-up with HDInsight Cluster
NOTE: Use the Infoworks Marketplace offer (Infoworks Installer), which spins up a new HDInsight cluster with an Infoworks edge node.
HDInsight Requirements
The Infoworks installation depends on the successful deployment of an Azure HDInsight cluster, which is invoked by the solution template. Infoworks assumes that the Azure environment is configured so that the HDInsight cluster can be spun up.
Network Security Group/Route Table Requirements
- Azure health and management services in the Azure cloud must be able to make inbound connections to the HDInsight cluster on port 443. These connections originate from a few public IP addresses that Azure predefines for each region. The IP addresses published by Microsoft are available here.
- HDInsight does not work if forced tunneling is enabled on the subnet. Microsoft recommends either removing forced tunneling or creating a new subnet for HDInsight.
To validate other rules related to network security, see Add HDInsight to an existing virtual network.
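The port 443 requirement above can be checked against a rule set with a small Python sketch. It is a simplified model, not an Azure API call: the rule dictionaries (`priority`, `direction`, `port`, `source`, `access`) are a hypothetical flattening of NSG rules, and the IP addresses are documentation placeholders, not the addresses Microsoft publishes. It relies on the fact that NSG rules are evaluated in priority order, lowest number first, and the first match decides.

```python
import ipaddress


def allows_inbound_443(rules, source_ip):
    """Return True if the first matching inbound rule allows port 443 from source_ip."""
    addr = ipaddress.ip_address(source_ip)
    for rule in sorted(rules, key=lambda r: r["priority"]):
        if rule["direction"] != "Inbound":
            continue
        if rule["port"] not in ("443", "*"):
            continue
        prefix = rule["source"]
        if prefix != "*" and addr not in ipaddress.ip_network(prefix):
            continue
        # First matching rule wins.
        return rule["access"] == "Allow"
    return False


# Hypothetical rules and placeholder management IPs (TEST-NET-3 range).
rules = [
    {"priority": 300, "direction": "Inbound", "port": "443",
     "source": "203.0.113.10/32", "access": "Allow"},
    {"priority": 4096, "direction": "Inbound", "port": "*",
     "source": "*", "access": "Deny"},
]
management_ips = ["203.0.113.10", "203.0.113.20"]

blocked = [ip for ip in management_ips if not allows_inbound_443(rules, ip)]
print("Management IPs still blocked on 443:", blocked)
```

Any address reported as blocked would need its own allow rule with a lower priority number than the deny rule.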
Storage Account
The wizard provides options to create a new Blob Storage account or use an existing account.
NOTE: If an existing Blob Storage account is used, it must be of type Standard_LRS. HDInsight does not support Premium_LRS.
Infoworks Spin-up on Existing HDInsight Cluster
The recommended configuration for the Infoworks edge node is D14 v2 (16 vCPUs and 112 GB RAM), unless the Infoworks team advises otherwise.
Support for installing Infoworks on an existing HDInsight cluster is in progress. Such an installation imposes the following constraints on the HDInsight cluster:
Cluster Configurations
- Cluster Type: HBase
- Operating System: Linux
- HDInsight Version: 3.6
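The three constraints above can be expressed as a quick pre-flight check. This is a sketch under assumptions: the dictionary keys (`cluster_type`, `os_type`, `version`) are hypothetical names chosen for illustration, and how the values are collected from the cluster is left out.

```python
# Constraints taken from the list above; the key names are hypothetical.
REQUIRED = {"cluster_type": "hbase", "os_type": "linux", "version": "3.6"}


def config_problems(cluster):
    """Return human-readable mismatches between a cluster description and REQUIRED."""
    problems = []
    for key, expected in REQUIRED.items():
        actual = str(cluster.get(key, "")).strip().lower()
        if key == "version":
            # Accept "3.6" as well as longer build strings that start with "3.6.".
            matches = actual == expected or actual.startswith(expected + ".")
        else:
            matches = actual == expected
        if not matches:
            problems.append(f"{key}: expected {expected!r}, found {actual!r}")
    return problems


good = {"cluster_type": "HBase", "os_type": "Linux", "version": "3.6"}
bad = {"cluster_type": "Spark", "os_type": "Linux", "version": "4.0"}
print(config_problems(good))
print(config_problems(bad))
```

An empty list means the described cluster satisfies all three constraints.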

Spark 2 External Installation
Some Infoworks features require an external installation of Spark 2 on the cluster. Infoworks must be allowed to perform this installation with root access.
Existence of VNet
The cluster must be spun up in a virtual network. Either a shared VNet (typically an existing one) or a dedicated VNet (typically newly created) can be used, provided it meets the network security requirements described above.
Cluster Access Details
Infoworks installation requires the following:
- Name of the cluster
- Cluster access credentials (username and password for Ambari)
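Before handing over the access details, it can help to confirm that the cluster name and Ambari credentials work together. The sketch below only builds the request; it assumes the cluster exposes the Ambari REST API at `https://<cluster name>.azurehdinsight.net/api/v1/clusters` with HTTP basic authentication. The cluster name, username, and password shown are placeholders, and `ambari_clusters_request` is a hypothetical helper name.

```python
import base64
import urllib.request


def ambari_clusters_request(cluster_name, username, password):
    """Build a GET request for the Ambari cluster list using HTTP basic auth."""
    url = f"https://{cluster_name}.azurehdinsight.net/api/v1/clusters"
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(url, headers={"Authorization": f"Basic {token}"})


# Placeholder values; substitute the real cluster name and Ambari credentials.
req = ambari_clusters_request("mycluster", "admin", "example-password")
print(req.full_url)

# To actually probe the cluster (requires network access to it):
# with urllib.request.urlopen(req, timeout=30) as resp:
#     print(resp.status)
```

A successful response to the commented-out call would indicate that the name and credentials are valid; an HTTP 401 or 403 error would point to a credentials problem.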
Ingesting Data from External Azure Data Lake Store
Infoworks uses service-to-service authorization, in which a service principal provides access to Azure Data Lake Store.
Access to Ambari
Infoworks might need to add some properties manually through Ambari for authorization.
NOTE: This step is also required for ingesting data from an external Azure Blob Storage.
Service Principal
Infoworks requires the following:
- A service principal object, which is listed under the registered apps in Azure Active Directory
- Privileges to manually add the service principal to the Data Lake Store access settings
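To illustrate how a service principal is used for service-to-service access, the sketch below builds an OAuth 2.0 client-credentials token request. It is an assumption-laden example, not the Infoworks implementation: it assumes Azure Data Lake Store Gen1 (resource `https://datalake.azure.net/`) and the Azure Active Directory token endpoint `https://login.microsoftonline.com/<tenant>/oauth2/token`. The tenant ID, client ID, and secret are placeholders, and `adls_token_request` is a hypothetical helper name.

```python
import urllib.parse
import urllib.request


def adls_token_request(tenant_id, client_id, client_secret):
    """Build a client-credentials token request for a service principal."""
    url = f"https://login.microsoftonline.com/{tenant_id}/oauth2/token"
    body = urllib.parse.urlencode({
        "grant_type": "client_credentials",
        "resource": "https://datalake.azure.net/",  # assumes Data Lake Store Gen1
        "client_id": client_id,
        "client_secret": client_secret,
    }).encode("ascii")
    return urllib.request.Request(url, data=body, method="POST")


# Placeholder identifiers; real values come from the registered app.
token_req = adls_token_request("example-tenant-id", "example-client-id", "example-secret")
print(token_req.full_url)

# Sending the request requires outbound Internet access:
# with urllib.request.urlopen(token_req, timeout=30) as resp:
#     print(resp.status)
```

The token returned by such a request is only useful if the service principal has also been granted access on the Data Lake Store itself, which is why the privileges listed above are needed.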
Outbound Internet Access from Cluster
The cluster connects to Azure Data Lake Store over the Internet. A reference document is available here.
NOTE: Infoworks uses Blob storage as the underlying storage system for HDInsight.
Locating Subscription ID
To deploy Infoworks in the Azure cloud using the Infoworks installer, you must share your Azure subscription ID with the Infoworks support team.
To obtain the subscription ID:
- Log in to the Azure portal.
- Search for subscriptions in the search box and click Subscriptions.

- Copy the Subscription ID and send it to the Infoworks support team.
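Because a subscription ID is a GUID, a copied value can be sanity-checked before it is sent. The sketch below is a minimal format check only; it does not confirm that the subscription exists, and the function name `is_subscription_id` and the sample value are hypothetical.

```python
import uuid


def is_subscription_id(text):
    """Return True if text is a GUID in the canonical hyphenated form."""
    text = text.strip()
    try:
        # uuid.UUID accepts several spellings; compare to the canonical one.
        return str(uuid.UUID(text)) == text.lower()
    except ValueError:
        return False


print(is_subscription_id("1b2c3d4e-0000-4000-8000-123456789abc"))
print(is_subscription_id("my-subscription"))
```

A check like this mainly catches truncated copies or a subscription name pasted in place of the ID.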

