Additional Configurations

Configurations can be added either in the mr.conf file, for batch and incremental replication, or in the file_transfer.xml file, for HDFS file transfer. The mr.conf file follows the key=value format, and file_transfer.xml follows the Hadoop configuration file format.
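For illustration, the two file formats look like this. The values shown are placeholders, not recommendations, and which file a given property belongs to follows the batch/HDFS split described above:

```properties
# mr.conf — one key=value pair per line
use.temp.path=true
zookeeper.connection.string=zk-host1:2181,zk-host2:2181
```

```xml
<!-- file_transfer.xml — standard Hadoop configuration layout -->
<configuration>
  <property>
    <name>infoworks.replication.encryption.zones.rb.cksum</name>
    <value>false</value>
  </property>
</configuration>
```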

The following configurations are available:

  • use.temp.path: This configuration indicates the temporary path specified while creating the destination cluster entity in Infoworks ADE. The data is first written to the temp directory, and the file is then renamed to the actual path. This is not applicable to encryption zones and hence must be set to false for them. The default value is true.
  • zookeeper.connection.string: If this property is set, dynamic throttling obtains the latest configuration properties from zookeeper servers specified in this connection string. This value is not set by default. For more details, see the Throttling section.
  • infoworks.replication.encryption.zones: Set this to a JSON array of encryption zones. If this value is not set and checksum checking is ON, transfer to encryption zone fails. Sample value: ["/user/hive/warehouse/tpcds_bin_partitioned_parquet_3.db","/user/ec2-user/encr"]
  • infoworks.replication.encryption.zones.rb.cksum: This configuration is only applicable to encryption zones.

When this configuration is OFF (the default), checksum validation is performed as follows:

  • Before the file transfer, the in-memory checksum of the file is calculated.
  • The transfer is started, and the in-memory checksum is calculated again while the data is being transferred.
  • After the transfer, the two checksums are compared. If the values do not match, the transfer is marked as failed and the transferred file is deleted.
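The steps above can be sketched as follows. This is a minimal illustration using Python's hashlib with a hypothetical helper name; the actual Replicator implementation is internal:

```python
import hashlib


def transfer_with_inline_checksum(src: bytes, chunk_size: int = 4096) -> bytes:
    """rb.cksum OFF (default): checksum the source up front, then
    checksum the stream a second time while writing it out."""
    before = hashlib.md5(src).hexdigest()   # checksum before the transfer
    during = hashlib.md5()
    dest = bytearray()
    for i in range(0, len(src), chunk_size):
        chunk = src[i:i + chunk_size]
        during.update(chunk)                # checksum while transferring
        dest.extend(chunk)                  # write to the destination
    if before != during.hexdigest():
        # mismatch: mark the transfer as failed and delete the file
        raise IOError("checksum mismatch: deleting transferred file")
    return bytes(dest)
```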

When this configuration is turned ON, checksum validation is performed as follows:

  • Before the file transfer, the in-memory checksum of the file is calculated.
  • After the transfer, the destination file is read back to calculate the second checksum.
  • The two checksums are compared. If the values do not match, the transfer is marked as failed and the transferred file is deleted.
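The read-back variant can be sketched the same way (again a minimal illustration with a hypothetical helper name, not the product's internal code). Reading the destination back verifies what was actually persisted, which is presumably why this mode is the one required for encryption zones:

```python
import hashlib


def transfer_with_readback_checksum(src: bytes) -> bytes:
    """rb.cksum ON (encryption zones): transfer first, then read the
    destination file back and checksum what was actually persisted."""
    before = hashlib.md5(src).hexdigest()   # checksum before the transfer
    dest = bytes(src)                       # the transfer itself (simulated)
    after = hashlib.md5(dest).hexdigest()   # read back, second checksum
    if before != after:
        # mismatch: mark the transfer as failed and delete the file
        raise IOError("checksum mismatch: deleting transferred file")
    return dest
```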

Limitation

Infoworks Replicator does not support replication of Hive Managed Tables from ADLS (Azure Data Lake Storage) as source to ADLS as destination. This is because, by default, the location of a Managed Table on the source ADLS is adl://home/, and the destination misinterprets the source's home directory as its own home directory. This issue does not occur with External Tables created with a fully qualified path.
