On-premise Installation
Prerequisites
Supported Operating Systems
- CentOS - Versions 6.6+, 7.3
- Red Hat Enterprise Linux - Versions 6.6+, 7.3
- Ubuntu - Version 16.04 (supported for HDInsight only)
- Debian - Version 8.1 (supported for Dataproc only)
Supported Hadoop Distributions
- HDP - Versions 2.5.5, 2.6.4, 3.x
- MapR - Version 6.0.1
- Cloudera - Version 5.13
- Azure - HDI 3.6
- GCP - Dataproc versions 1.2 (unsecured), 1.3 (secured)
- EMR - Version 5.17.0
Installation Procedure
The installation logs are available in <path_to_Infoworks_home>/iw-installer/logs/installer.log.
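For example, to follow the log while the installer runs (assuming a hypothetical Infoworks home of /opt/infoworks):
tail -f /opt/infoworks/iw-installer/logs/installer.log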
Perform the following:
Download and Extract Installer
- Download the installer tarball:
wget <link-to-download>
- Extract the installer:
tar -xf deploy_<version_number>.tar.gz
- Navigate to installer directory:
cd iw-installer
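For example, using a hypothetical download URL and the 2.9.0-hdp-rhel7 build named in the notes further below:
wget https://downloads.example.com/deploy_2.9.0-hdp-rhel7.tar.gz
tar -xf deploy_2.9.0-hdp-rhel7.tar.gz
cd iw-installer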
Configure Installation
- Run the following command:
./configure_install.sh
Enter the details for each prompt (an illustrative example follows the list below):
- Hadoop distro name and installation path (if not auto-detected)
- Infoworks user
- Infoworks user group
- Infoworks installation path
- Infoworks HDFS home (path of home folder for Infoworks artifacts)
- Hive schema for Infoworks sample data
- IP address for accessing the Infoworks UI (when in doubt, use the FQDN of the Infoworks host)
- HiveServer2 Thrift server hostname
- Hive user name
- Hive user password
If Hadoop distro is Cloudera (CDH):
- Impala hostname
- Impala port number
- Impala user name
- Impala password
- Is Impala Kerberized?
If Impala is Kerberized:
- Kerberos Realm
- Kerberos host FQDN
If Hadoop distro is GCP:
- Managed Mongo URL
- Are Infoworks directories already extracted in IW_HOME?
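As an illustration, a non-Cloudera, non-GCP configuration might be answered with values like these (all values are hypothetical and environment-specific, not defaults):
Hadoop distro name and installation path: hdp, /usr/hdp/2.6.4.0-91
Infoworks user: infoworks
Infoworks user group: infoworks
Infoworks installation path: /opt/infoworks
Infoworks HDFS home: /user/infoworks
Hive schema for sample data: iw_sample
IP address for accessing Infoworks UI: iwhost.example.com
HiveServer2 Thrift server hostname: hs2.example.com
Hive user name: hive
Hive user password: ********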
Run Installation
- Install Infoworks:
./install.sh -v <version_number>
NOTE: For machines without certificate setup, the --certificate-check parameter can be set to false, using the following syntax:
./install.sh -v <version_number> --certificate-check <true/false>
The default value is true. Setting it to false performs insecure request calls, which is not a recommended setup.
NOTE:
- For HDP on CentOS/RHEL 6, replace <version_number> with 2.9.0-hdp-rhel6
- For HDP on CentOS/RHEL 7, replace <version_number> with 2.9.0-hdp-rhel7
- For MapR or Cloudera on CentOS/RHEL 6, replace <version_number> with 2.9.0-rhel6
- For MapR or Cloudera on CentOS/RHEL 7, replace <version_number> with 2.9.0-rhel7
- For Azure, replace <version_number> with 2.9.0-azure
- For GCP, replace <version_number> with 2.9.0-gcp
- For EMR, replace <version_number> with 2.9.0-emr
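For example, on an HDP cluster running CentOS/RHEL 7 with the default certificate check:
./install.sh -v 2.9.0-hdp-rhel7
Or, on a machine without certificate setup (insecure, not recommended):
./install.sh -v 2.9.0-hdp-rhel7 --certificate-check false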
Post Installation
If the target machine is Kerberos enabled, perform the following post-installation steps:
- Go to <IW_HOME>/conf/conf.properties
- Edit the Kerberos security settings for your cluster, ensuring these settings are uncommented.
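The exact property names are release-specific and are not reproduced here; the following is only an illustrative sketch of such entries, with hypothetical key names and values (use the keys actually present in your conf.properties):
# Hypothetical example only; these key names are assumptions, not documented Infoworks keys
kerberos_enabled=true
kerberos_principal=infoworks@EXAMPLE.COM
kerberos_keytab_location=/etc/security/keytabs/infoworks.keytab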
- Restart the Infoworks services.
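The restart mechanism depends on your deployment; as an assumption-labeled example, a release that ships service scripts under <IW_HOME>/bin might be restarted along these lines (script names are illustrative, not confirmed by this document):
<IW_HOME>/bin/stop.sh all
<IW_HOME>/bin/start.sh all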
NOTE: Kerberos tickets are renewed before running each Infoworks DataFoundry job. The Infoworks DataFoundry platform supports a single Kerberos principal for a Kerberized cluster. Hence, all Infoworks DataFoundry jobs run using the same Kerberos principal, which must have access to all the artifacts in Hive, Spark, and HDFS.