On-premises Installation
Prerequisites
Supported Operating Systems
- CentOS - Versions 6.6+, 7.3
- Red Hat Enterprise Linux - Version 7.5
- Ubuntu - Version 16.04 (supported for HDInsight only)
- Debian - Version 8.1 (supported for Dataproc only)
- SUSE Linux Enterprise Server - Version 12
- EMR Operating System - Amazon Linux
Supported Hadoop Distributions
- HDP - Versions 2.5.5, 2.6.4, 3.x
- MapR - Version 6.0.1
- Azure - HDI 3.6
- GCP - Dataproc 1.2 (unsecured), 1.3 (secured)
- EMR - Version 5.17.0
Installation Procedure
Perform the following:
Step 1: Download Installer
- Navigate to a temporary directory:
cd <temporary_installer_directory>
- Download the installer tarball by running the following command:
wget <link-to-download>
NOTE: Contact support@infoworks.io to get the <link-to-download>.
Step 2: Extract the installer by running the following command: tar -xf deploy_<version_number>.tar.gz
This creates a directory named iw-installer.
Step 3: Navigate to the installer directory by running the following command: cd iw-installer
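Taken together, Steps 1 through 3 amount to the following sequence. This is only a sketch; the temporary directory, download link, and version number are placeholders you must fill in.

```sh
cd <temporary_installer_directory>        # any scratch directory works
wget <link-to-download>                   # link provided by support@infoworks.io
tar -xf deploy_<version_number>.tar.gz    # creates ./iw-installer
cd iw-installer
```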
Step 4: Configure installation
- Run the following command:
./configure_install.sh
Enter the details for the following prompts:
- Hadoop distribution name and installation path (if not auto-detected)
- Infoworks user
- Infoworks user group
- Infoworks installation path (this location is referred to as IW_HOME)
- Infoworks HDFS home (path of home folder for Infoworks artifacts)
- Hive schema for Infoworks sample data
- IP address for accessing Infoworks UI (when in doubt use the FQDN of the Infoworks host)
- HiveServer2 thrift server hostname: Hostname of the instance where the HiveServer2 service is running.
- Hive username
- Hive user password
If the Hadoop distribution is GCP:
- Managed MongoDB URL, if MongoDB is not hosted on the same machine.
- Are Infoworks directories already extracted in IW_HOME?
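For orientation, a hypothetical set of answers is shown below as shell variables. Every value is an example for a fictional cluster, not a default; the actual script prompts for these interactively.

```shell
# All values below are illustrative assumptions, not product defaults.
IW_USER=infoworks                     # Infoworks user
IW_GROUP=infoworks                    # Infoworks user group
IW_HOME=/opt/infoworks                # installation path, referred to as IW_HOME
IW_HDFS_HOME=/user/infoworks          # HDFS home for Infoworks artifacts
IW_HIVE_SCHEMA=iw_sample              # Hive schema for Infoworks sample data
IW_UI_HOST=iw-edge.example.com        # FQDN used to reach the Infoworks UI
HIVE_SERVER2_HOST=hive.example.com    # HiveServer2 thrift server hostname
HIVE_USER=hive                        # Hive username
```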
Run Installation
Run the following command to start the Infoworks installation: ./install.sh -v <version_number>
NOTE: For machines without a certificate setup, the --certificate-check parameter can be set to false, as in the following syntax: ./install.sh -v <version_number> --certificate-check <true/false>. The default value is true. Setting it to false performs insecure request calls, which is not a recommended setup.
To exclude a particular service, use the --exclude-services option. For example, to exclude the Cube engine, use ./install.sh -v <version_number> --exclude-services cube-engine.
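Putting the flags together, typical invocations look like the sketch below. The version string shown is the HDP-on-RHEL 7 value from the list in this section; substitute the one for your distribution.

```sh
./install.sh -v 2.9.0-hdp-rhel7                                  # standard install
./install.sh -v 2.9.0-hdp-rhel7 --certificate-check false        # insecure calls; not recommended
./install.sh -v 2.9.0-hdp-rhel7 --exclude-services cube-engine   # skip the Cube engine
```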
- For HDP on CentOS/RHEL 6, replace <version_number> with 2.9.0-hdp-rhel6
- For HDP on CentOS/RHEL 7, replace <version_number> with 2.9.0-hdp-rhel7
- For MapR on CentOS/RHEL 6, replace <version_number> with 2.9.0-rhel6
- For MapR on CentOS/RHEL 7, replace <version_number> with 2.9.0-rhel7
- For Azure, replace <version_number> with 2.9.0-azure
- For GCP, replace <version_number> with 2.9.0-gcp
- For EMR, replace <version_number> with 2.9.0-emr
NOTE: To find the RHEL version, run the following command: cat /etc/os-release or lsb_release -r
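As a sketch, the mapping above can be expressed as a small shell helper. The 2.9.0 base and suffixes come from the list above; the function name and the VERSION_ID parsing are this document's assumptions (CentOS/RHEL record the major version in /etc/os-release).

```shell
# Derive the install.sh version string from the distribution and the
# major OS version found in an os-release style file.
iw_version() {
  distro="$1"       # one of: hdp | mapr | azure | gcp | emr
  os_release="$2"   # normally /etc/os-release
  # Extract the leading digits of VERSION_ID, with or without quotes.
  major=$(sed -n 's/^VERSION_ID="\{0,1\}\([0-9]*\).*/\1/p' "$os_release")
  case "$distro" in
    hdp)   echo "2.9.0-hdp-rhel${major}" ;;
    mapr)  echo "2.9.0-rhel${major}" ;;
    azure) echo "2.9.0-azure" ;;
    gcp)   echo "2.9.0-gcp" ;;
    emr)   echo "2.9.0-emr" ;;
  esac
}
```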
The installation logs are available at <temporary_installer_directory>/iw-installer/logs/installer.log.
Silent Installation Procedure
To perform the installation offline, follow the steps below:
Step 1: Get the installer tarball locally.
Step 2: Extract the installer by running the following command: tar -xf deploy_<version_number>.tar.gz
Step 3: Get the Infoworks DataFoundry tarball.
Step 4: Run the following commands to place the Infoworks DataFoundry tarball in the correct location:
mkdir iw-installer/downloads
cp infoworks-x.tar.gz iw-installer/downloads/
Step 5: Navigate to the installer directory by running the following command: cd iw-installer
Step 6: Go to Step 4 of the Installation Procedure.
Post Installation
If the target machine is Kerberos enabled, perform the following post-installation steps:
- Open the <IW_HOME>/conf/conf.properties file.
- Edit the Kerberos security settings and ensure that these settings are uncommented:
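The exact keys ship in conf.properties itself. The fragment below is only an illustrative sketch: every property name and value in it is an assumption, to be checked against the keys actually present in your file.

```
# Illustrative sketch - verify the real key names in <IW_HOME>/conf/conf.properties
kerberos_security_enabled=true
kerberos_principal=infoworks@EXAMPLE.REALM
kerberos_keytab_file=/etc/security/keytabs/infoworks.keytab
```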

NOTE: Kerberos tickets are renewed before running all Infoworks DataFoundry jobs. The Infoworks DataFoundry platform supports a single Kerberos principal for a Kerberized cluster. Hence, all Infoworks DataFoundry jobs run using the same Kerberos principal, which must have access to all the artifacts in Hive, Spark, and HDFS.