Release Notes 2.9.1
Date: 03 JAN 2020
New Features and Enhancements
Component: Replication
- IPD-8953 - Support for Data Replication from S3 to GCS: Infoworks Replicator now supports replication of data from S3 to Google Cloud Storage (GCS). This provides an easy and fast means to migrate data from AWS to GCP, facilitating cloud migration. A user with the admin or data modeller privilege can select the files to be replicated as a part of the replication job.
Following are the modifications for this feature:
- Admin - A new tab, Object Storage, has been added to the Admin > Replicator page, which allows the user to create and manage replication entities. When creating a replication entity, the user can now select the Object Storage Type as S3 or Google Cloud Storage and configure the authentication details. These connection objects can be used in the object storage copy node in a workflow.
- Replicator - A new tab, Object Storage, has been added to the Replicator page, which allows the user to view all the replication entities.
- Orchestrator - A new node, Object Store Copy, has been added to orchestrator which allows the user to configure the source and destination connection details.
Component: Data Export
- IPD-9714 - SQL Server BCP Export Optimization: Infoworks DataFoundry now supports multi-threaded export for SQL Server BCP sources. The advanced configuration, BCP_POOL_SIZE, can be used to parallelize the transfer to SQL Server so that export jobs run faster.
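As an illustrative sketch only (the key name comes from this note, but the value shown and the key-value form are assumptions; consult the product documentation for where the key is applied), the advanced configuration entry might look like:

```
# Hypothetical advanced-configuration entry: number of parallel BCP
# transfer threads. The value 4 is an illustrative assumption, not a
# recommended default.
BCP_POOL_SIZE=4
```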
Bug Fixes
- IPD-9099 - Issue in Executing Node Under a Workflow: A Governor Operation Timed Out error occurred while executing a node under a workflow, as a result of a communication failure between Infoworks DataFoundry services. This issue has now been fixed by adding a new advanced system configuration parameter, governor_retry_count, which defines the number of retries to initiate communication between the Infoworks DataFoundry services. The retry option enables a node in a workflow to attempt to retrieve the status of Infoworks DataFoundry jobs. For more details, see the System Configuration Keys and Descriptions topic on the System Configuration page of the Infoworks DataFoundry 2.9.1 Product Documentation.
- IPD-8991 - Issue with Workflow Nodes: Some workflow nodes failed instantly even though the underlying Infoworks DataFoundry jobs had been triggered and were still running. This occurred when a node was unable to retrieve the status of the Infoworks DataFoundry jobs. This issue has now been fixed by adding a new advanced system configuration parameter, polling_failure_retries, which defines the number of retries to retrieve the status of the Infoworks DataFoundry jobs. For more details, see the System Configuration Keys and Descriptions topic on the System Configuration page of the Infoworks DataFoundry 2.9.1 Product Documentation.
- IPD-9638 - Nginx Version Upgrade: Infoworks DataFoundry now supports Nginx version 1.16.1.
- IPD-9288 - Issue with Workflow Leaf Node: Infoworks DataFoundry was unable to stop or cancel a workflow leaf node, or one of the intermediate nodes, from the running state. This issue has now been fixed.
- IPD-9796 - Issue in Creating Directory or File Through WASB: An error occurred when a directory or file was created with a colon in the WASB name. This issue occurred because the colon character is not supported by the file system. The advanced configuration, ENCODE_PRIMARY_PARTITION, can now be set to true at the source level for encoding.
- IPD-9719 - Issue in Using Interactive Pipeline in Spark Yarn-Cluster Mode: An issue occurred while using interactive pipeline in Spark yarn-cluster mode. This issue occurred because the same Spark configuration file was being used for both the batch and interactive modes. Modifications have been made to use separate Spark configuration files for batch and interactive modes. For details, see Submitting Spark Pipelines.
- IPD-9091 - MapReduce Job Call Timeout Issue: The MapReduce Job status calls failed intermittently due to no response from the Hadoop NameNode. This issue has been fixed by adding retries to get the job status.
Limitation
- The file preview feature functions only if the number of files and folders is less than 5000.
Installation
Refer to Installation and Configuration to install Infoworks DataFoundry 2.9.1.
Upgrading to This Release
To upgrade your current Infoworks DataFoundry version, execute the following commands on the edge node:
NOTE: Before starting the upgrade, ensure that no Infoworks jobs are running.
- Run the following command:
source $IW_HOME/bin/env.sh
- Navigate to the scripts directory using the following command:
cd $IW_HOME/scripts
where $IW_HOME is the directory in which Infoworks DataFoundry is installed. If the scripts folder is not available (2.4.x, 2.5.x, and 2.6.x base versions), create the scripts folder in $IW_HOME.
- Download the update script using the following command:
wget <link-to-download>
Reach out to your Infoworks support representative to get the download link and replace <link-to-download> with it.
- Upgrade the Infoworks DataFoundry version using the following command:
./update.sh -v <version_number>
NOTE: For machines without a certificate setup, the --certificate-check parameter can be set to false, as described in the following syntax: ./update.sh -v <version_number> --certificate-check <true/false>. The default value is true. Setting it to false performs insecure request calls and is not a recommended setup.
NOTES:
- For HDP, CentOS/RHEL6, replace <version_number> with 2.9.1-hdp-rhel6.
- For HDP, CentOS/RHEL7, replace <version_number> with 2.9.1-hdp-rhel7.
- For MapR or Cloudera, CentOS/RHEL6, replace <version_number> with 2.9.1-rhel6.
- For MapR or Cloudera, CentOS/RHEL7, replace <version_number> with 2.9.1-rhel7.
- For Azure, replace <version_number> with 2.9.1-azure.
- For GCP, replace <version_number> with 2.9.1-gcp.
- For EMR, replace <version_number> with 2.9.1-emr.
NOTE: If MongoDB is not managed locally, the MongoDB server must be updated to the latest version (4.0) manually.
If the base version is below 2.7.0, the upgrade procedure upgrades the Metadata DB (Mongo) from version 3.6 to 4.0. The upgrade of the metadata DB includes the following:
- updating the metadata DB binaries
- setting up the feature compatibility version
Post-upgrade Procedure
Infoworks DataFoundry now supports HDP 3.1, in addition to the HDP 2.5.5 and HDP 2.6.4 versions. In addition, the Python version has been upgraded to Python 3.6.9. This requires modifications in the $IW_HOME/conf/conf.properties file.
The properties must be modified when upgrading Infoworks DataFoundry from previous versions to 2.9.1 (irrespective of whether the HDP version is the 3.1 distro or not).
NOTE: New installations of Infoworks DataFoundry 2.9.1 work automatically, without these modifications.
Environment: HDP
Change Request 1
- Navigate to the $IW_HOME/conf/conf.properties file.
- Remove $IW_HOME//lib/parquet-support/* from the iw_jobs_classpath key value.
- Ensure that the additional : at the end of the value is removed.
Change Request 2
- Navigate to the $IW_HOME/conf/conf.properties file.
- If the base version is below 2.8, replace the Hive client libraries (like /usr/hdp/current/hive-client/lib/…) in the iw_jobs_classpath key value with /usr/hdp/current/hive-client/lib/*.
- Ensure that the additional : at the end of the value is removed.
Environments: HDP, MapR, CDH, Azure, GCP, EMR
Change Request 3
- Navigate to the $IW_HOME/conf/conf.properties file.
- Remove $IW_HOME/lib/shared/* from the df_batch_classpath key value.
- Ensure that the additional : at the end of the value is removed.
Change Request 4
- Navigate to the $IW_HOME/conf/conf.properties file.
- Remove $IW_HOME/lib/shared/* from the df_tomcat_classpath key value.
- Ensure that the additional : at the end of the value is removed.
- Stop and start the transformation service using the following commands:
source $IW_HOME/bin/env.sh; $IW_HOME/bin/stop.sh df; $IW_HOME/bin/start.sh df
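Change Requests 1, 3, and 4 above all follow the same pattern: delete one entry from a colon-separated classpath value in conf.properties and remove any trailing colon left behind. A minimal sketch of that edit with sed, run against a throwaway sample file rather than a live installation (the df_batch_classpath key name is from Change Request 3, but the sample paths and the /tmp location are illustrative assumptions):

```shell
# Illustrative only: build a scratch copy of one conf.properties entry;
# real edits would target $IW_HOME/conf/conf.properties on the edge node.
cat > /tmp/conf.properties.sample <<'EOF'
df_batch_classpath=/opt/iw/lib/shared/*:/opt/iw/lib/df/*:/opt/iw/lib/extras/*
EOF

# Remove the lib/shared/* entry wherever it appears in the colon-separated
# value, then strip any trailing ':' left at the end of the line.
sed -i 's|/opt/iw/lib/shared/\*:||; s|:/opt/iw/lib/shared/\*||; s|:$||' /tmp/conf.properties.sample

cat /tmp/conf.properties.sample
```

Always take a backup of conf.properties before editing, and restart the affected service afterwards as shown in Change Request 4.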
Custom Jar Files
From Infoworks DataFoundry version 2.7.1 onwards, the $IW_HOME/lib/extras directory and its subdirectories are preserved during an upgrade so that custom jar files can be placed there. Users with custom jar files must manually place them in these folders. For more details, see Custom Jar Files.
Release Notes 2.9.1.1
Date: 03 MAR 2020
Enhancement
Component: Data Export
IPD-10206 - Support for Column Names in Header Rows for Delimited Files Exported to S3: While exporting delimited files to S3, column names in the header rows of the .csv file are now supported in S3. To enable this, the user must navigate to the Admin > Configuration page and add a new configuration with Key as append_header and Value as True. If the key is not set, the system defaults the append_header parameter to False.
Bug Fixes
- IPD-10201 - Issue in LDAP Support for Cubes: Cube connectivity from Kylin ODBC driver failed when the LDAP authentication was enabled. This issue has now been fixed.
- IPD-10240 - Issue with Incremental Export: While processing an incremental export in table groups having more than one table, the last table always ran a full export. This issue has now been fixed.
- IPD-10257 - Issue with Incremental Export on Alternate Runs: Incremental export incorrectly ran as full export on alternate runs. This occurred because the data was overwritten in the export table. This issue has now been fixed.
References
- Refer to Installation and Configuration to install Infoworks DataFoundry 2.9.1.
- Refer to Upgradation to upgrade to Infoworks DataFoundry 2.9.1.1.
NOTE: Infoworks DataFoundry 2.9.1.1 is currently only supported for installation on SUSE Linux edge nodes.
Release Notes 2.9.1.2
Date: 09 MAR 2020
Bug Fixes
- IPD-10321 - Issue in Export of Delimited Files to S3: While exporting delimited files to S3, the export failed for tables with large data, with the following exception: <Directory_name> already exists. This issue has now been fixed.
Limitation
In RDBMS ingestion, tables that include a period ( . ) in the table name are not supported. For example, name.table.
References
- Refer to Installation and Configuration to install Infoworks DataFoundry 2.9.1.
- Refer to Upgradation to upgrade to Infoworks DataFoundry 2.9.1.2.
NOTE: Infoworks DataFoundry 2.9.1.2 is currently only supported for installation on SUSE Linux edge nodes.