Release Notes 2.9.0
Date: 06 NOV 2019
New Features and Enhancements
Component: Data Ingestion and Synchronization
- IPD-7868 - RestAPI Ingestion Enhancements: The RestAPI ingestion mechanism has been enhanced to provide a better user experience, including User Interface and functional improvements. For details, see Generic REST API Ingestion and Custom REST API Ingestion.
- IPD-8594 - Support to Ingest User-defined Subsets of Data for Teradata Sources: Users can now specify complex filter conditions to extract a subset of data from Teradata source tables. For details, see Filter Query for Teradata Sources.
- IPD-8926 - SQL Server Source Support on GCP Platform: SQL Server ingestion via JDBC is now supported on GCP Dataproc Platform.
Component: Data Transformation
- IPD-8388 - Fuzzy Match Transformation: Transformation designers can now use a pre-built Fuzzy Matching transformation component to match and score similar data by overcoming spelling, phonetic, and other data quality issues. For more details, see Fuzzy Match.
- IPD-8364 - Python Custom Transformation for Spark Pipelines: Data Transformation now supports custom Python and Java extensions in Spark pipelines. These can be used to create custom transformations that are executed as part of pipelines, allowing integration with proprietary or third-party libraries. For more details, see Python Custom Transformation.
- IPD-8468 - Support for CosmosDB Target: Data transformation pipelines can now create and incrementally synchronize data models and tables to Azure CosmosDB. For more details, see Cosmos DB Target.
Component: Cube
- IPD-8390 - Cube Support for HDP 3.1: Cubes are now supported in HDP 3.1.
Component: Orchestration
- IPD-8160 - DFS Directory Replicator Support: A new workflow node, DFS Directory Replicator, has been added, which compares and copies data from the source DFS directory to the destination DFS directory. For details, see DFS Directory Replicator.
Component: Data Export
- IPD-7863 - Delimited File Export to GCS, HDFS and Azure: Data can now be exported as delimited files to Google Cloud Storage, HDFS, Azure WASB, and Azure ADLS. Previously, only export to S3 was supported. For more details, see Delimited File Export.
- IPD-8413 - Support for PostgreSQL Export: Ingested tables and data models can now be exported and incrementally synchronized to PostgreSQL databases. For more details, see PostgreSQL Export.
Component: User Interface
- IPD-8232 - Infoworks DataFoundry Engagement Dashboard: The new dashboard displays a summary of key metrics and actions performed by users on the Infoworks platform. It provides a statistical view for analyzing platform usage, along with details on data and user engagement across different roles within Infoworks DataFoundry. For more details, see Engagement Dashboard.
Bug Fix
- IPD-8920 - Segmented Load Ingestion Issue Fix: During segmented load ingestion, if the Select All option was used and if any of the segments was already loaded, the unloaded segments were skipped. This issue has now been fixed.
Limitations
- Support for HDP 2.5 has been discontinued.
- Netezza and Teradata exports are not currently supported in HDP 3.1.
Installation
Refer to Installation to install Infoworks DataFoundry 2.9.0.
Upgrading to This Release
To upgrade your current Infoworks DataFoundry version, execute the following commands on the edge node:
NOTE: Before starting the upgrade, ensure that no Infoworks jobs are running.
- Run the following command:
source $IW_HOME/bin/env.sh
- Navigate to the scripts directory using the following command:
cd $IW_HOME/scripts
where $IW_HOME is the directory in which Infoworks DataFoundry is installed. If the scripts folder is not available (2.4.x, 2.5.x, and 2.6.x base versions), create the scripts folder in $IW_HOME.
- Download the update script using the following command:
wget <link-to-download>
Reach out to your Infoworks support representative to get the download link, and replace <link-to-download> with it.
- Upgrade the Infoworks DataFoundry version using the following command:
./update.sh -v <version_number>
NOTE: For machines without certificate setup, the --certificate-check parameter can be set to false, as described in the following syntax: ./update.sh -v <version_number> --certificate-check <true/false>. The default value is true. Setting it to false performs insecure request calls and is not a recommended setup.
NOTES:
- For HDP, CentOS/RHEL6, replace <version_number> with 2.9.0-hdp-rhel6.
- For HDP, CentOS/RHEL7, replace <version_number> with 2.9.0-hdp-rhel7.
- For MapR or Cloudera, CentOS/RHEL6, replace <version_number> with 2.9.0-rhel6.
- For MapR or Cloudera, CentOS/RHEL7, replace <version_number> with 2.9.0-rhel7.
- For Azure, replace <version_number> with 2.9.0-azure.
- For GCP, replace <version_number> with 2.9.0-gcp.
- For EMR, replace <version_number> with 2.9.0-emr.
NOTE: If MongoDB is not managed locally, the MongoDB server must be updated to the latest version (4.0) manually.
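The platform-to-version mapping above can be sketched as a small shell helper. This is illustrative only; the iw_version function is not part of the product, and update.sh simply expects the resulting string as its -v argument.

```shell
# Illustrative helper (not part of Infoworks): map platform and OS family
# to the 2.9.0 version string expected by update.sh.
iw_version() {
  case "$1-$2" in
    hdp-rhel6)            echo "2.9.0-hdp-rhel6" ;;
    hdp-rhel7)            echo "2.9.0-hdp-rhel7" ;;
    mapr-rhel6|cdh-rhel6) echo "2.9.0-rhel6" ;;
    mapr-rhel7|cdh-rhel7) echo "2.9.0-rhel7" ;;
    azure-*)              echo "2.9.0-azure" ;;
    gcp-*)                echo "2.9.0-gcp" ;;
    emr-*)                echo "2.9.0-emr" ;;
    *)                    echo "unsupported: $1/$2" >&2; return 1 ;;
  esac
}

iw_version hdp rhel7   # prints 2.9.0-hdp-rhel7
```

The result would then be passed as ./update.sh -v "$(iw_version hdp rhel7)".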
If the base version is below 2.7.0, the upgrade procedure upgrades the metadata DB (MongoDB) from version 3.6 to 4.0. The metadata DB upgrade includes the following:
- updating the metadata DB binaries
- setting up the feature compatibility version
Post-upgrade Procedure
Infoworks DataFoundry now supports HDP 3.1, in addition to HDP 2.5.5 and HDP 2.6.4. The Python version has also been upgraded to Python 3.6.9. This requires modifications to the $IW_HOME/conf/conf.properties file.
The properties must be modified when upgrading Infoworks DataFoundry from any previous version to 2.9.0 (irrespective of whether the HDP distribution is 3.1 or not).
NOTE: New installations of Infoworks DataFoundry 2.9.0 work automatically, without these modifications.
Environment: HDP
Change Request 1
- Navigate to the $IW_HOME/conf/conf.properties file.
- Remove $IW_HOME//lib/parquet-support/* from the iw_jobs_classpath key value.
- Ensure that the additional : at the end of the value is removed.
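As a hypothetical sketch of Change Request 1 (the surrounding classpath entries /a/* and /b/* below are made up), the entry removal and trailing-colon cleanup can be expressed as two sed passes. The same expressions, with the key and entry swapped, also fit Change Requests 3 and 4.

```shell
# Illustrative only: remove the $IW_HOME//lib/parquet-support/* entry from
# the iw_jobs_classpath value, then collapse the colon it leaves behind.
# Shown against a sample line; apply the same expressions to
# $IW_HOME/conf/conf.properties after taking a backup.
line='iw_jobs_classpath=/a/*:$IW_HOME//lib/parquet-support/*:/b/*'
echo "$line" \
  | sed 's|\$IW_HOME//lib/parquet-support/\*||' \
  | sed 's/::*/:/g; s/:$//; s/=:/=/'
# prints iw_jobs_classpath=/a/*:/b/*
```

The second pass also covers the cases where the removed entry was first or last in the value (a leading "=:" or a trailing ":").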
Change Request 2
- Navigate to the $IW_HOME/conf/conf.properties file.
- If the base version is below 2.8, replace the Hive client libraries (like /usr/hdp/current/hive-client/lib/…) in the iw_jobs_classpath key value with /usr/hdp/current/hive-client/lib/*.
- Ensure that the additional : at the end of the value is removed.
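As a hypothetical sketch of Change Request 2 (the jar names and surrounding entries below are invented for demonstration), the enumerated Hive client jar entries can be collapsed into the single directory wildcard with sed:

```shell
# Illustrative only: replace each /usr/hdp/current/hive-client/lib/<jar>
# entry in iw_jobs_classpath with the lib/* wildcard, then deduplicate
# consecutive wildcard entries. Apply the same expressions to
# $IW_HOME/conf/conf.properties after taking a backup.
line='iw_jobs_classpath=/a/*:/usr/hdp/current/hive-client/lib/hive-exec.jar:/usr/hdp/current/hive-client/lib/hive-metastore.jar:/b/*'
echo "$line" \
  | sed 's|/usr/hdp/current/hive-client/lib/[^:]*|/usr/hdp/current/hive-client/lib/*|g' \
  | sed 's|\(/usr/hdp/current/hive-client/lib/\*:\)\{1,\}|\1|g'
# prints iw_jobs_classpath=/a/*:/usr/hdp/current/hive-client/lib/*:/b/*
```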
Environments: HDP, MapR, CDH, Azure, GCP, EMR
Change Request 3
- Navigate to the $IW_HOME/conf/conf.properties file.
- Remove $IW_HOME/lib/shared/* from the df_batch_classpath key value.
- Ensure that the additional : at the end of the value is removed.
Change Request 4
- Navigate to the $IW_HOME/conf/conf.properties file.
- Remove $IW_HOME/lib/shared/* from the df_tomcat_classpath key value.
- Ensure that the additional : at the end of the value is removed.
- Stop and start the transformation service using the following commands:
source $IW_HOME/bin/env.sh; $IW_HOME/bin/stop.sh df; $IW_HOME/bin/start.sh df
For SQL Server log-based ingestion and OGG-based ingestion, perform Initialize and Ingest (full load ingestion) for all tables.
Custom Jar Files
From Infoworks DataFoundry version 2.7.1 onwards, the $IW_HOME/lib/extras directory and its subdirectories are preserved during an upgrade as the location for custom jar files. Users with custom jar files must manually place them in these folders. For more details, see Custom Jar Files.