Release Notes 3.1

Date: 26 MAR 2020

New Features and Enhancements

Component: Data Ingestion and Synchronization

  • IPD-8635 - SQL Server BCP Export Optimization: Infoworks DataFoundry now supports faster, multi-threaded export for SQL Server BCP sources. The advanced configuration BCP_POOL_SIZE parallelizes the transfer to SQL Server, ensuring that export jobs run faster.
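The BCP_POOL_SIZE key name comes from the note above; the value and the key=value syntax shown here are illustrative assumptions for an advanced-configuration entry, not a confirmed file format.

```properties
# Illustrative only: a pool of 8 parallel threads for BCP export.
# The exact location and syntax of advanced configurations may differ
# per deployment; consult Infoworks support for your environment.
BCP_POOL_SIZE=8
```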
  • IPD-9040 - BigQuery Ingestion Improvements: Infoworks DataFoundry now supports BigQuery ingestion, which allows crawling data in a highly parallelized manner. For high data volumes, the load can be parallelized using SplitBy. BigQuery ingestion also now supports the Segmented Load feature.
  • IPD-8977 - Handling Uncommitted Transactions in Oracle Log-Based Ingestion: Oracle log-based ingestion now supports ingestion of uncommitted data. The uncommitted transactions are staged in a table in Oracle. When subsequent jobs run, the uncommitted transactions that have been rolled back are discarded and the transactions that have been committed are brought into the datalake.
  • IPD-9101 - Enabling Schema Synchronization in CSV and Fixed Width Ingestion: An option, Enable Schema Synchronization, has been added to synchronize new sources and tables that have been truncated and reloaded. After this option is enabled, the job fails if a new column is added to the files being crawled. In the Infoworks DataFoundry user interface, users can then add the new column name and datatype, and restart the ingestion job to bring the data for the new column into the datalake.

Component: Data Transformation

  • IPD-9063 - Azure SQL Datawarehouse Target Support: A new target node, Azure SQL Datawarehouse, has been added. Azure SQL Data Warehouse is a cloud-based enterprise data warehouse that leverages Massively Parallel Processing (MPP) to quickly run complex queries across petabytes of data, and can be used as a key component of a big data solution.
  • IPD-9039 - Pipeline Utility Script Support: Users can now run a script with table lists and properties as parameters. This script can be used to easily create multiple pipelines with the same structure.
  • IPD-9789 - Support for WITH SQL Statements: SQL import now supports WITH SQL statements. Statements which contain WITH and one level of subquery are also supported.

Component: Orchestration

  • IPD-8959 - Enhanced Options To Schedule Workflow: Users can now schedule a workflow job to run once or recurrently, define a specific time for job execution, and define the frequency and interval of the recurrence.

Component: User Interface

  • IPD-8956 - Engagement Metrics Dashboard Enhancement: The Engagement Metrics Dashboard has been enhanced to let users view aggregated data for all artifacts, starting from the time Infoworks DataFoundry was first launched.

Installation

Refer to Installation and Configuration to install Infoworks DataFoundry 3.1.

Upgrading to This Release

To upgrade your current Infoworks DataFoundry version, execute the following commands on the edge node:

NOTE: Before starting the upgrade, ensure that no Infoworks jobs are running.

  • Run the following command: source $IW_HOME/bin/env.sh
  • Navigate to the scripts directory using the following command: cd $IW_HOME/scripts; where $IW_HOME is the directory in which Infoworks DataFoundry is installed. If the scripts folder is not available (2.4.x, 2.5.x, 2.6.x base versions), create a scripts folder in $IW_HOME.
  • Download the update script using the following command: wget <link-to-download>; reach out to your Infoworks support representative to obtain the download link, and replace <link-to-download> with it.
  • Upgrade the Infoworks DataFoundry version using the following command: ./update.sh -v <version_number>

NOTE: For machines without certificate setup, the --certificate-check parameter can be set to false, as in the following syntax: ./update.sh -v <version_number> --certificate-check <true/false>. The default value is true. Setting it to false makes insecure request calls and is not a recommended setup.

NOTES:

  • For HDP, CentOS/RHEL6, replace <version_number> with 3.1-hdp-rhel6
  • For HDP, CentOS/RHEL7, replace <version_number> with 3.1-hdp-rhel7
  • For MapR or Cloudera, CentOS/RHEL6, replace <version_number> with 3.1-rhel6
  • For MapR or Cloudera, CentOS/RHEL7, replace <version_number> with 3.1-rhel7
  • For Azure, replace <version_number> with 3.1-azure
  • For GCP, replace <version_number> with 3.1-gcp
  • For EMR, replace <version_number> with 3.1-emr

NOTE: If MongoDB is not managed locally, the MongoDB server must be updated to the latest version (4.0) manually.
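The platform-to-version-string mapping above can be sketched as a small helper function. The function name and structure are hypothetical, for illustration only; they are not part of the product.

```shell
# Hypothetical helper: picks the 3.1 version string for ./update.sh -v
# from the platform and, where relevant, the OS family.
iw_version_tag() {
  platform="$1"   # hdp | mapr | cloudera | azure | gcp | emr
  os="$2"         # rhel6 | rhel7 (ignored for cloud platforms)
  case "$platform" in
    hdp)           echo "3.1-hdp-$os" ;;
    mapr|cloudera) echo "3.1-$os" ;;
    azure|gcp|emr) echo "3.1-$platform" ;;
    *)             echo "unknown" ;;
  esac
}

iw_version_tag hdp rhel7      # prints 3.1-hdp-rhel7
iw_version_tag cloudera rhel6 # prints 3.1-rhel6
iw_version_tag gcp            # prints 3.1-gcp
```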

If the base version is below 2.7.0, the upgrade procedure upgrades the Metadata DB (MongoDB) from version 3.6 to 4.0. The metadata DB upgrade includes the following:

  • updating the metadata DB binaries
  • setting up the feature compatibility version
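The feature compatibility step corresponds to MongoDB's setFeatureCompatibilityVersion admin command. The upgrade procedure performs this for a locally managed metadata DB; the fragment below shows what the equivalent mongo shell command looks like, for reference only.

```javascript
// Run against the metadata DB after the 4.0 binaries are in place
// (illustrative; requires a running mongod and appropriate privileges).
db.adminCommand({ setFeatureCompatibilityVersion: "4.0" })
```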

NOTE: If you are upgrading from Infoworks DataFoundry version 2.8 or 2.9 to Infoworks DataFoundry version 3.1, perform the following post-upgrade procedure.

Post-upgrade Procedure

Modifications must be made to the $IW_HOME/conf/conf.properties file when upgrading Infoworks DataFoundry from a previous version to version 3.1.

NOTE: New installations of Infoworks DataFoundry 3.1 work without these modifications.

Environments: HDP 2.x, CDH, GCP, EMR

Step 1

  • Navigate to the $IW_HOME/conf/conf.properties file.
  • Remove $IW_HOME/lib/shared/* from the df_batch_classpath key value.
  • Ensure that the additional : at the end of the value is removed.

Step 2

  • Navigate to the $IW_HOME/conf/conf.properties file.
  • Remove $IW_HOME/lib/shared/* from the df_tomcat_classpath key value.
  • Ensure that the additional : at the end of the value is removed.
  • Stop and start the transformation service using the following commands: source $IW_HOME/bin/env.sh; $IW_HOME/bin/stop.sh df; $IW_HOME/bin/start.sh df
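The edits in Step 1 and Step 2 can be scripted with sed. The sketch below runs against a throwaway sample file: the key names come from the steps above, but the sample classpath values (and the /opt/iw paths) are invented for demonstration, so adapt the patterns to your actual conf.properties before using them.

```shell
# Build a sample conf.properties with the two classpath keys.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
df_batch_classpath=/opt/iw/lib/df/*:/opt/iw/lib/shared/*:
df_tomcat_classpath=/opt/iw/lib/tomcat/*:/opt/iw/lib/shared/*:
EOF

# Strip the lib/shared/* entry, then the leftover trailing ':' on each line.
sed -i -e 's#:/opt/iw/lib/shared/\*##' -e 's#:$##' "$tmp"
cat "$tmp"
```

After the edit, each key keeps its remaining entries with no trailing colon, matching the end state the two steps describe.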

Environment: HDP 3.1.4

Post-installation Steps for HDP 3.1.4

  • Navigate to the $IW_HOME/conf/conf.properties file.
  • Append hive.optimize.index.filter=false to the hiveConfigurationVariables property.
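The property name and the appended value come from the step above; the comma separator and the placeholder for existing values are assumptions about the property's list format, so verify against your existing conf.properties entry.

```properties
# Illustrative: keep the values already present and append the new one.
hiveConfigurationVariables=<existing values>,hive.optimize.index.filter=false
```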

For SQL Server log-based ingestion and OGG-based ingestion, perform Initialize and Ingest (full load ingestion) for all tables.

Custom Jar Files

Starting with Infoworks DataFoundry version 2.7.1, the $IW_HOME/lib/extras directory and its subdirectories are preserved during an upgrade so that custom jar files can be placed there. Users with custom jar files must manually place them in these folders. For more details, see Custom Jar Files.
