System Configuration

The System Configuration page allows you to add/delete configurations and change the values and descriptions of configuration keys. The autocomplete feature allows you to select the required configuration from the drop-down list based on your input.

  • Click the Admin menu and click the Configuration icon. The System Configuration page will be displayed.

The following table lists all the configuration parameters and their descriptions.

System Configuration Keys and Descriptions

Configuration Parameter | Description | Default Value
hiveConfigurationVariables | Extra parameters for the Hive connection. Other values can include: hive.mapred.supports.subdirectories=true; mapred.input.dir.recursive=true; hive.optimize.insert.dest.volume=true; hive.exec.parallel=true. | hive.auto.convert.join=false;hive.insert.into.multilevel.dirs=true
numRowsSampleData | Number of rows to be retrieved as part of sample data. | 100
numGrpsRange | Number of groups to be created while computing categorical data.
dw_join_numreducers | Number of reducers for data warehouse join. | 10
dw_join_parallelism | Number of parallel joins for data warehouse build. | 3
disable_categorical_data | If true, categorical data computation will be skipped. | false
source_ingest_parallelism | Number of tables to crawl in parallel. | 2
disable_additional_table_info | If true, fetching sample data and computing categorical data will be skipped. | false
source_ingest_mappers | Number of mappers to crawl a table. | 1
sqoopNumberOfReducers | Number of reducers to crawl a table. | 20
number_of_secondary_partition | Number of secondary partitions in each primary partition. | 2
extract_row_limit | Number of rows to crawl from a table per mapper; -1 performs a full table crawl without any limit. | 0
END_DATE_STRING | Represents the expiration date for each record on Hive. This is an IW audit column. | 9999-09-09
END_TIMESTAMP_STRING | Represents the expiration timestamp for each record on Hive. This is an IW audit column. | 9999-09-09 09:09:09
disable_row_count | If true, metadata crawl will not fetch the row count for source tables. | false
iw_job_timeout_secs | Job timeout in milliseconds for Infoworks MR jobs.
iw_jobs_default_mr_map_mem_mb | Default map memory for Infoworks jobs.
iw_jobs_default_mr_red_mem_mb | Default reducer memory for Infoworks jobs.
fetch_size | In RDBMS ingestion using JDBC, this parameter can be added as a constant at the table level. It takes an integer value and sets the number of rows fetched at once over JDBC, which improves performance for large table ingestions. | 5000
cube_star_schema_job_map_mem_mb | Cube star schema job mapper memory.
cube_star_schema_job_red_mem_mb | Cube star schema job reducer memory.
max_number_of_chunks | Maximum number of chunks allowed while chunking a table. | 2000
SKIP_RELOADED_CHUNKS | If true, already loaded chunks will be skipped during chunk loading. | true
enable_schema_synchronization | If true, table schema synchronization will be triggered before data synchronization. | false
db_time_zone | Database server time zone. | MST
NUM_PARALLEL_MERGE_TASKS | Number of parallel merge tasks while merging a table. | 1
MERGE_CACHE_MB | Cache size for merging delta files. | 0
source_crawl_cdc_mapmerge_mr_map_mem_mb
check_number_of_partitions | If false, checking the available number of partitions during full load of a table is disabled. | false
NUM_PARALLEL_MERGE_INMEM_LOADS | Number of parallel merge jobs that will be loaded in memory. | 1
USE_COMBINE_INPUT_FORMAT | | true
source_crawl_cdc_mapmerge_mr_red_mem_mb
parallel_table_merge | Number of different table merges that can happen in parallel. | 5
num_parallel_jobs_per_entity | Number of jobs that can run in parallel per entity (source crawl, data warehouse build, cube build). | 2
disable_row_count_for_hive | | false
recrawl_table_schema_on_table_truncate | If set to true, on table truncate, the table schema will be re-crawled to get the latest schema from the source. | true
validate_chunks | If set to true, after every chunk load, data validation at the chunk level will be triggered. | true
enable_job_logs_to_mongo | If set to true, job logs will go to both file and Mongo. | false
hive_conn_string_delimiter | Delimiter to use while forming the Hive JDBC connection string. | ?
df_workspace_schema | Schema where Data Transformation stores all intermediate tables. | iw_df_workspace
df_workspace_base_path | HDFS path where Data Transformation stores all intermediate data. | /iw/df/workspace
df_shared_connection_enabled | Whether to reuse the connection for interactive mode. Typically true for Hive+Tez and false for Hive+MR. | true
df_shared_connection_timeout_ms | How long an idle shared connection stays open before being closed. Shared connection must be enabled. | 180000
df_interactive_exec_pool_size | Thread pool size for executing tasks per request in the interactive mode. | 10
df_interactive_table_maximum_size | Maximum size of any of the intermediate tables in interactive mode. | 10000
df_batch_exec_pool_size | Thread pool size for executing tasks per request in the batch mode. | 5
df_sampling_num_threads | Number of sampling tasks to run in parallel. | 5
df_interactive_hive_settings | Semicolon-separated list of entity-level Hive configurations, that is, the Hive configuration parameters that can be modified. | key1=value1;key2=value2;key3=value3
df_batch_hive_settings | If builds are slow, use this key to set or change Hive parameters at the pipeline level. Semicolon-separated list of entity-level Hive configurations (an example of this format follows the table). | key1=value1;key2=value2;key3=value3
df_compute_stats_enabled | When enabled, table statistics are computed for merge or overwrite targets at the end of the pipeline build job. | true
PIPELINE_SOURCES_AUTO_SYNC_CHECK_DISABLED | If set to true, disables the error icon on the pipeline editor page indicating a source schema mismatch. If set to false, the error icon is displayed on the pipeline editor page.
random_sampling_enabled | Determines whether sample generation for a source or pipeline target is random. Disable this when sampling takes too long to complete or fails. | true
CUBE_DIMENSIONS_ENABLE_DICTIONARY | Whether dimension tables need to be stored in memory. | true
sqoop_job_map_mem_mb | Mapper memory setting for Sqoop. Might need to be increased for tables with huge rows.
sqoop_job_red_mem_mb | Reducer memory setting for Sqoop.
append_date_to_logfile | Append the current date to the IW job log file name. | true
LOGMINER_TABLESPACE_NAME | Tablespace name.
use_new_tablespace | Whether to start LogMiner in a different tablespace. | false
use_temp_table_for_log_based_cdc | Whether to use the temp table approach for log-based CDC. | true
TEMP_DATABASE_NAME | Temp database name, if a temp database is to be used for log-based CDC.
build_dictionary_before_cdc | Whether to build the database dictionary before every CDC for log-based CDC. | false
oracle_logminer_dictionary_file_name | Oracle dictionary file name. | dictionary.ora
oracle_logminer_dictionary_file_path | Path on the Oracle server to the dictionary file.
SOURCE_TIME_FORMAT | Time format of data for log-based CDC.
SOURCE_DATE_FORMAT | Date format of data for log-based CDC.
sqoop_export_job_map_mem_mb | Mapper memory setting for Sqoop. Might need to be increased for tables with huge rows.
sqoop_export_job_red_mem_mb | Export jobs currently have no reducer, so this is not needed now; it may be useful in the future.
export_multiplication_factor | Multiplication factor arrays for creation of the target table. | 2.0,1.5,1.0
shorten_td_export_destination_column_name | Specifies whether column names should be shortened while creating the destination table. | false
shorten_td_export_destination_table_name | Specifies whether table names should be shortened while creating the destination table. | false
td_table_and_column_name_limit | The limit for the table name length. | 128
netezza_export_escape_char | The escape character for the staging file in Netezza export; only applicable for external tables. | /
netezza_export_enclose_char | The enclose character for the staging file in Netezza export; only applicable for external tables. Possible options are single quote and double quote. | "
netezza_export_null_value | The null value for the staging file in Netezza export; only applicable for external tables. | ""
netezza_export_should_allow_control_characters_in_data | Whether control characters (ASCII 0-31) can be part of the data. | false
fail_job_on_post_hook_failure | Whether the job should fail when the post hook fails. | true
notification_mail_authentication_enabled | For the SMTP server, specifies whether authentication is enabled. | true
notification_mail_tls_enabled | For the SMTP server, specifies whether TLS is enabled. | true
default_driver_xmx_mb | Default driver memory setting for any job without job configurations. | 512
iw_jobs_default_mr_java_opts_ratio | Default ratio of container Xmx to container memory. | 0.8
iw_jobs_default_mr_map_mem_mb | Default map memory for Infoworks jobs. | 2048
iw_jobs_default_mr_red_mem_mb | Default reducer memory for Infoworks jobs. | 2048
iw_jobs_default_mr_io_sort_ratio | Default ratio of the MR job io.sort.mb to the minimum of mapper and reducer memories. | 0.3
source_fetch_metadata_rdbms_driver_xmx_mb | Driver Xmx memory for RDBMS metadata crawl.
source_fetch_metadata_sftp_driver_xmx_mb | Driver Xmx memory for SFTP metadata crawl.
source_crawl_rdbms_driver_xmx_mb | Driver Xmx memory for RDBMS and DFI crawl.
source_crawl_sftp_driver_xmx_mb | Driver Xmx memory for SFTP crawl.
source_cdc_driver_xmx_mb | Driver Xmx memory for CDC.
source_merge_driver_xmx_mb | Driver Xmx memory for merge.
source_export_driver_xmx_mb | Driver Xmx memory for export.
default_driver_xmx_mb | Driver Xmx memory for sample data generation.
pipeline_build_driver_xmx_mb | Driver Xmx memory for the DF batch build driver.
cube_build_driver_xmx_mb | Driver Xmx memory for cube build.
default_driver_xmx_mb | Driver Xmx memory for delete entity.
source_crawl_cdc_mapmerge_mr_map_mem_mb | Mapper memory for CDC.
source_crawl_cdc_mapmerge_mr_red_mem_mb | Reducer memory for CDC.
dfi_job_map_mem | Mapper memory for DFI crawl.
dfi_job_red_mem | Reducer memory for DFI crawl.
json_job_map_mem | Mapper memory for JSON crawl.
json_job_red_mem | Reducer memory for JSON crawl.
cube_star_schema_job_map_mem_mb | Mapper memory for cube build.
cube_star_schema_job_red_mem_mb | Reducer memory for cube build.
sqoop_export_job_map_mem_mb | Mapper memory for Sqoop export data.
sqoop_export_job_red_mem_mb | Reducer memory for Sqoop export data.
td_export_mr_map_mem_mb | Mapper memory for Teradata export data.
td_export_mr_red_mem_mb | Reducer memory for Teradata export data.
ADVANCED_ANALYTICS_DISABLED | To disable the advanced analytics node, set this to true. | false
XML_ERROR_THRESHHOLD | If the number of error records exceeds this threshold, the MR job will fail. | 100
XML_KEEP_FILES | If the host type is local, the XML files are copied to the tableId/xml directory before the MR job runs. If this configuration is true, the files are not deleted after the crawl. | true
xml_job_map_mem | Mapper memory for the crawl MapReduce job. | Value of iw_jobs_default_mr_map_mem_mb in conf.properties
xml_job_red_mem | Reducer memory for the crawl MapReduce job. | Value of iw_jobs_default_mr_red_mem_mb in conf.properties
CSV_ERROR_THRESHHOLD | If the number of error records exceeds this threshold, the MR job will fail. | 100
CSV_KEEP_FILES | If the host type is local, the CSV files are copied to the tableId/csv directory before the MR job runs. If this configuration is true, the files are not deleted after the crawl. | true
CSV_TYPE_DETECTION_ROW_COUNT | Number of rows to be read for type detection/metacrawl. | 100
CSV_PARSER_LIB | The parser to be used for CSV crawl. | COMMONS (UNIVOCITY recommended)
CSV_SPLIT_SIZE_MB | Split size to be used by MR for every file. | 128
dfi_job_map_mem | Mapper memory for the crawl MapReduce job. | Value of iw_jobs_default_mr_map_mem_mb in conf.properties
dfi_job_red_mem | Reducer memory for the crawl MapReduce job. | Value of iw_jobs_default_mr_red_mem_mb in conf.properties
calc_file_level_ing_metrics | If set to true, file-level ingestion metrics are calculated at the end of the job. | true
modified_time_as_cksum | If true, the modified time is used to determine whether the file has changed. If set to false, the actual checksum is calculated. | false
delete_table_query_enabled | By default, the Delete Query feature is available at the table level. Set the IW constant delete_table_query_enabled to false from the UI to hide the Delete Query feature. | true
Ufi_max_failure_percentage_per_table | The percentage of files for which file ingestion must fail before the job is marked as failed. | 0.0
fetch_null_timestamped_records | If true, records having NULL values will be fetched for timestamp-based incremental tables during full ingestion. | false
MAP_ORACLE_DATE_TO_TIMESTAMP | In earlier versions of Oracle, a date can also hold timestamps. If the user wants to reflect this in the target as well, this constant must be set to true. | false
DFI_DELETE_ORIGINAL_FILE_AFTER_DECOMPRESSION | After file ingestion, the original file will be deleted if this constant is set to true. | False
MIN_ROWS_FOR_MERGEMR | Minimum number of rows in the current data in a secondary partition for the merge. If a secondary partition has fewer rows than this value, the merge jobs will be combined until they reach this threshold. | 1000000
pipeline_interactivity_mode | Pipeline validation configuration. When set to 'manual', automatic data type validation while saving node properties is removed and a Validate option is enabled in the node properties page. When set to 'auto', node data is validated automatically.
cdh_impala_support | Set to true if Impala support is required.
fail_ingestion_on_impala_cache_refresh_failure | Set to true if the ingestion job should fail when the Impala metadata cache is not refreshed.
iw_hive_ssl_enabled | Enables or disables SSL on Hive. | false
iw_hive_ssl_truststore_path | Path to the Hive truststore file location.
iw_hive_ssl_truststore_passwd | Encrypted password for the truststore.
bcp_crawl_separator | The field separator for BCP import/export.
bcp_row_delimiter | The row separator for BCP export.
filesystem_scheme | The HDFS path will be pre-populated with the value set in this configuration. The values can be: s3, s3a, s3n, adls, wasb, gs. | -
df_batch_sparkapp_settings | Set this configuration to overwrite Spark configurations, such as executor memory and driver memory, during pipeline batch build.
df_disable_sample_job | Set this configuration to disable sample jobs after pipeline build. | False
df_disable_cache_job | Set this configuration to disable cache jobs after pipeline build. | False
CREATE_DROP_TARGET_SCHEMA | Allows Infoworks to run create database and drop database commands. If this value is set to false, the create database and drop database commands will not be executed. | True
target_schema_permission | Enables or disables creation of databases in pipelines. If this value is set to false, creation of databases in pipelines will be disabled. | True
df_dynamic_hive_udf_enabled | Enables or disables UDFs on Hive for pipelines. If this value is set to false, UDFs on Hive for pipelines will be disabled. | True
number_of_parallel_jobs_per_entity | The number of jobs that can run in parallel for each entity. Entity refers to either of the following: Source, Datamodel, Cube. For example, if this value is set to 3, then a maximum of 3 source crawls, 3 data model builds, or 3 cube builds can occur in parallel. | NA
max_number_of_connections | The maximum number of connections allowed between each table and the source, for example, an RDBMS. | NA
df_auto_exclusion_enabled | Sets pipeline target column projection optimization on all nodes. | True
df_merge_exec_pool_size | Number of concurrent tasks to run while performing merge on pipeline targets. | 5
df_fail_on_null_or_empty_partition_value | Fails the pipeline jobs when partition values are null or empty. | False
iw_udfs_dir | Indicates the UDFs directory. | NA
iw_hdfs_prefix | Default HDFS access protocol prefix. | hdfs://
df_hive_analyze_works | Enables analyze tables for Hive targets. | True
df_label_auto_cast_enabled | Enables Auto Cast mode for the Advanced Analytics model label column. | True
df_label_cast_type | Indicates the default value for the Advanced Analytics model label column. | Double
df_spark_master | Indicates the Spark master mode. | local
df_hive_logging_freq_ms | Interval used to poll for Hive logs and update the job logs. | 60000
df_target_hdfs_cleanup | To retain data on HDFS when the pipeline is removed, set this to false. | True
df_batch_spark_settings | Semicolon-separated list of entity-level Hive configurations. Applicable for batch mode pipeline build and for generating sample data in a source. | key1=value1;key2=value2;key3=value3
df_scd2_granularity | SCD2 record change granularity on timestamp; can be Second, Minute, Hour, Day, Month, or Year. This can be set using advanced configuration to overwrite the configuration for all targets at once. | second
df_custom_udfs_force_copy | Custom UDFs in pipelines are only copied to HDFS when changes are detected. This configuration is used to overwrite changes to custom UDFs. | False
df_disable_sample_job | Disables the sample job for pipeline targets after pipeline build. | False
df_disable_cache_job | Disables the cache job for pipeline targets after pipeline build. | False
df_spark_configfile | Spark configuration file path for interactive mode Spark pipelines. | /etc/spark2/conf/spark-defaults.conf
df_spark_configfile_batch | Spark configuration file path for batch mode Spark pipelines. | /etc/spark2/conf/spark-defaults.conf
df_batch_spark_coalesce_partitions | Spark coalesce configuration to create fewer files while writing to disk. | NA
df_disable_current_loader | Custom transformations cannot be loaded from the current class loader. To load custom transformations from the current class loader, set this configuration to False. | True
df_overwrite_log_level | Log level overwrite at the pipeline level. For example, rootLogger=ERROR;io.infoworks=TRACE;infoworks=DEBUG;org.apache.spark=ERROR. | NA
df_dynamic_hive_udf_enabled | Disables Hive UDFs from loading for every job or request when add jar permissions are disabled. Set this configuration to false when permissions are disabled. | True
df_validation_progress_percent | Sets the validation progress percent in pipeline batch jobs. | 10
df_schemasync_progress_percent | Sets the schema sync progress percent in pipeline batch jobs. | 10
df_spark_merge_file_num | Spark configuration to merge files using the Coalesce option on the dataframe during the merge process. | 1
iw_df_ext_prefix | Pipeline extension prefix. | hdfs://
storage_format | Sets the default storage format value for the pipeline target. | ORC
user_extensions_base_path | Pipeline extension base path. | NA
source_crawl_schema_merge_mr_map_mem_mb | Mapper memory for schema merge jobs. | NA
source_crawl_schema_merge_mr_red_mem_mb | Reducer memory for schema merge jobs. | NA
source_crawl_schema_mapmerge_mr_map_mem_mb | Mapper memory for map-only schema merge jobs. | NA
SUPPORT_RESERVED_KEYWORDS | Enables or disables support for reserved keywords. To enable Hive connections on HDP 3.1, this value must be set to false. | True
df_cutpoint_optimization_enabled | Enables caching when the value is set to true. | False
CSV_NULL_STRING | Sets the NULL string. This configuration is available at the table, source, and global levels. | NULL
logpath | Path to the log file for Mongo. | $IW_HOME/logs/mongod.log
logappend | - | true
fork | If set to true, fork and run in the background. | true
port | Mongo port. | 27017
dbpath | - | $IW_HOME/resources/mongodb/data
pidfilepath | Location of the PID file. | $IW_HOME/resources/mongodb/mongod
bind_ip | If set to the loopback address, listen on the local interface only. If not specified, listen on all interfaces. | 0.0.0.0
auth | Enables authentication for Mongo when set. | true
noauth | Disables authentication for Mongo when set. | true
wiredTigerCacheSizeGB | - | 4
replSet | In replicated Mongo databases, specifies the replica set name. | rs0
oplogSize | Maximum size in megabytes for the replication operation log. | 128
keyFile | Path to a key file storing authentication info for connections between replica set members. | $IW_HOME/resources/mongodb/mongodb/
NETEZZA_EXPORT_DELIMITER | The record delimiter in the temporary file while transferring (only applicable for external tables mode).
df_impala_incremental_stats_enabled | -
max_connections | Determines the maximum number of concurrent connections to the database server. | 100
listen_addresses | Specifies the TCP/IP address(es) on which the server listens for connections from applications. It can be a comma-separated list of addresses (host names and/or numeric IP addresses). Use * to specify all available IP interfaces, 0.0.0.0 for all IPv4 addresses, localhost to allow only local TCP/IP "loopback" connections, or empty to specify that the server does not listen on any IP interface. | localhost
shared_buffers | Sets the amount of memory the database server uses for shared memory buffers. Minimum 128kB. | 128MB
dynamic_shared_memory_type | Specifies the dynamic shared memory implementation that the server should use: the first option supported by the OS (posix, sysv, windows, mmap). Use none to disable dynamic shared memory.
log_timezone | Sets the time zone used for timestamps written in the server log. | UTC
datestyle | Sets the display format for date and time values, as well as the rules for interpreting ambiguous date input values. It contains two independent components: the output format specification (ISO, Postgres, SQL, or German) and the input/output specification for year/month/day ordering (DMY, MDY, or YMD). | 'iso, mdy'
timezone | Sets the time zone for displaying and interpreting time stamps. | UTC
lc_messages | Sets the language in which messages are displayed. | en_US
lc_monetary | Sets the locale to use for formatting monetary amounts. | en_US
lc_numeric | Sets the locale to use for formatting numbers. | en_US
lc_time | Sets the locale to use for formatting dates and times. | en_US
default_text_search_config | Default configuration for text search. | pg_catalog.english
wal_level | Determines how much information is written to the WAL. | minimal
archive_mode | Enables archiving. When archive_mode is enabled, completed WAL segments are sent to archive storage by setting archive_command. | off
archive_command | Command to use to archive a log file segment. Placeholders: %p = path of file to archive, %f = file name only. (e.g., 'test ! -f /mnt/server/archivedir/%f && cp %p /mnt/server/archivedir/%f')
max_wal_senders | Specifies the maximum number of concurrent connections from standby servers or streaming base backup clients. Set this on the master and on any standby that will send replication data. Must be less than max_connections. | 0
data_directory | Allows use of data in another directory. | ConfigDir
hba_file | Specifies the file location for the host-based authentication configuration. | ConfigDir/pg_hba.conf
ident_file | - | ConfigDir/pg_ident.conf
external_pid_file | Specifies the location of an extra PID file. If external_pid_file is not explicitly set, no extra PID file is written.
minConnections | Controls the number of connections on Postgres. If a large number of workflows execute at the same time, increase this value. | 100
superuser_reserved_connections | - | 3
unix_socket_directories | Comma-separated list of directories. | /tmp
unix_socket_group | Restart postgres and orchestrator services.
unix_socket_permissions | Begin with 0 to use octal notation. | 0777
bonjour | Set this configuration to 'on' to advertise the server via Bonjour. | off
bonjour_name | - | computer_name
authentication_timeout | - | 1min
ssl | - | off
ssl_ciphers | Allowed SSL ciphers. | HIGH:MEDIUM:+3DES:!aNULL
ssl_prefer_server_ciphers | - | on
ssl_ecdh_curve | - | prime256v1
ssl_cert_file | - | server.crt
ssl_key_file | - | server.key
ssl_ca_file | -
ssl_crl_file | -
password_encryption | - | on
db_user_namespace | - | off
row_security | - | on
krb_server_keyfile | - | off
krb_caseins_users | - | off
tcp_keepalives_idle | TCP_KEEPIDLE (in seconds): the time the connection needs to remain idle before TCP starts sending keepalive probes. 0 selects the system default. | 0
tcp_keepalives_interval | TCP_KEEPINTVL (in seconds): the time between individual keepalive probes. 0 selects the system default. | 0
tcp_keepalives_count | TCP_KEEPCNT: the maximum number of keepalive probes TCP should send before dropping the connection. 0 selects the system default. | 0
shared_buffers | Minimum 128kB. | 128MB
huge_pages | - | try
temp_buffers | Minimum 800kB. | 8MB
SQOOPLIMIT | -
FETCHSIZE | -
ORACLE_LOGMINER_DICTIONARY_FILE_PATH | -
ORACLE_ARCHIVE_LOG_PATH | The absolute path to the directory where archive logs are stored.
ORACLE_LOGMINER_DICTIONARY_FILE_NAME | -
BUILD_DICTIONARY_BEFOR_CDC | Setting the value to true builds a dictionary before every CDC. | false
INCLUDE_CURRENT_LOG | -
DATABASE_OBJECT_TYPES | -
TEMP_LOG_TABLE_NAME | -
LOGMINER_TEMP_TABLE_TABLESPACE_NAME | -
CREATE_TEMP_TABLE_PARALLELISM | -
TEMP_DATABASE_NAME | -
TEMP_TABLE_INDEX_NAME | -
CREATE_INDEX_ON_TEMP_TABLE | -
USE_NEW_TABLESPACE_FOR_LOGMINER | -
USE_REDO_LOG_DICTIONARY | Setting the value to true uses the redo log dictionary to read archive logs with DDL tracking. The value must be set to true in case of schema synchronization. | False
FETCH_NULL_TIMESTAMPED_RECORDS | -
ORACLE_ARCHIVE_LOG_INFO_OBJECT_NAME | -
ENABLE_DDL_TRACKING | -
IS_STAGING_ORACLE_SERVER | -
DROP_HIVE_SCHEMA | -
DISABLECATEGORICAL | -
DISABLEADDITIONALTABLEINFO | -
DISABLEROWCOUNT | -
TPT_SCRIPT_PATH | -
TPT_CHECKPOINTS_PATH | -
TPT_LOG_PATH | -
TPT_CHARACTERSET | -
SOURCE_DATE_FORMAT | -
SOURCE_TIME_FORMAT | -
SFTP_BUFFER_SIZE | -
GZIP_FILE_EXTENSION | -
ENCODE_PRIMARY_PARTITION | -
UFI_INGEST_HIDDEN_FILES | -
UFI_MAX_FAILURE_PERCENTAGE_PER_TABLE | -
XML_INPUT_FILE_ENCODING | -
FIXED_WIDTH_INPUT_FILE_ENCODING | -
FIXED_WIDTH_RECORD_SEPARATOR | -
FIXED_WIDTH_PADDING_CHARACTER | -
FIXED_WIDTH_PARSER_LIB | -
MAX_FIXED_WIDTH_RECORD_SIZE | -
FIXED_WIDTH_ERROR_THRESHHOLD | If the number of error records exceeds this threshold, the MR job fails. | 100
FIXED_WIDTH_KEEP_FILES | If the host type is local, the CSV files are copied to the tableId/CSV directory before the MR job runs. If this configuration is true, the files are not deleted after the crawl. | True
FIXED_WIDTH_COMMENT_START_CHARACTER | -
FIXED_WIDTH_SKIP_CHARACTERS_UNTIL_NEW_LINE | -
FIXED_WIDTH_ROW_ENDS_WITH_NEW_LINE | -
FIXED_WIDTH_TYPE_DETECTION_ROW_COUNT | -
CDC_START_TIMESTAMP | -
CDC_END_TIMESTAMP | -
LOG_BASED_CDC_SPLIT_BY_COLUMN | -
IS_WIDE_TABLE | -
VALIDATECHUNKS | -
FAILCHUNKWHENCOUNTDOESNTMATCH | -
CSV_INPUT_FILE_ENCODING | -
JSON_INPUT_FILE_ENCODING | -
JSON_ERROR_THRESHHOLD | If the number of error records exceeds this threshold, the MR job fails. | 100
JSON_KEEP_FILES | If the host type is local, the CSV files are copied to the tableId/CSV directory before the MR job runs. If this value is true, the files are not deleted after the crawl. | True
JSON_TYPE_DETECTION_ROW_COUNT | Number of rows to be read for type detection/metacrawl. | 100
CONTROL_FILE_READER | -
VALIDATE_AFTER_SFTP_TO_LOCAL | -
VALIDATE_AFTER_LOCAL_TO_HDFS | -
SEC_PARTITION_MERGE_RED | -
VALIDATE_AFTER_HDFS_TO_HIVE | -
CSV_MULTILINE_MODE | -
TPT_DELIMITER | -
TPT_FILE_ESCAPE_CHAR | -
TPT_FILE_QUOTE_ESCAPE_CHAR | -
TPT_FILE_ENCLOSE_CHAR | -
TPT_EXPORT_SPOOL_MODE | -
TPT_JOB_RESTART_LIMIT | -
TPT_CHECKPOINT_INTERVAL_IN_SECONDS | -
TPT_EXPORT_BLOCK_SIZE | -
TPT_EXPORT_TENACITY_HOURS | -
TPT_EXPORT_RETRY_INTERVAL_MINS | -
TPT_IO_BUFFER_SIZE | -
TPT_HADOOP_BLOCK_SIZE | -
TPT_EXPORT_MAX_SESSIONS | -
TPT_EXPORT_MIN_SESSIONS | -
TPT_EXPORT_OPERATOR_MAX_DECIMALDIGITS | -
TPT_EXPORT_READER_INSTANCES | -
TPT_EXPORT_WRITER_INSTANCES | -
TPT_UNICODE_MULTIPLICATION_FACTOR | -
TPT_ASCII_MULTIPLICATION_FACTOR | -
TPT_UTF8_MULTIPLICATION_FACTOR | -
TPT_UTF16_MULTIPLICATION_FACTOR | -
USE_TPT_SELECTOR_OPERATOR | -
USE_TPT_GENERATED_SCHEMA | -
ENABLE_TPT_TRACE_LEVEL | -
USE_TPT_DESTINATION_AS_HDFS | -
USE_TPT_TDCH_INTERFACE | -
TPT_FILE_FORMAT | -
SFTP_STAGING_BASE_PATH | -
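
Several of the keys above (hiveConfigurationVariables, df_interactive_hive_settings, df_batch_hive_settings, df_batch_spark_settings) take a semicolon-separated list of key=value pairs. The following is a minimal illustration of that format, using only Hive settings already mentioned in this table; the "key = value" assignment style and the particular combination of settings are illustrative, and the values your cluster needs may differ.

hiveConfigurationVariables = hive.auto.convert.join=false;hive.insert.into.multilevel.dirs=true;hive.mapred.supports.subdirectories=true;mapred.input.dir.recursive=true

df_batch_hive_settings = hive.exec.parallel=true;hive.optimize.insert.dest.volume=true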

Admin Configurations Moved from Global to Entity Levels

The following table lists the admin configurations that have been moved from global to entity levels:

IW Constant | Current Level | Moved to
UFI_INGEST_HIDDEN_FILES | Global | Table
UFI_MAX_FAILURE_PERCENTAGE_PER_TABLE | Global | Table
EXPORT_PARALLELIZATION_FACTOR | Global | Global
XML_INPUT_FILE_ENCODING | Global | Table
XML_ERROR_THRESHHOLD | Global | Table
XML_KEEP_FILES | Global | Table
FIXED_WIDTH_INPUT_FILE_ENCODING | Global | Table
FIXED_WIDTH_RECORD_SEPARATOR | Global | Table
FIXED_WIDTH_PADDING_CHARACTER | Global | Table
FIXED_WIDTH_PARSER_LIB | Global | Table
MAX_FIXED_WIDTH_RECORD_SIZE | Global | Table
FIXED_WIDTH_THRESHHOLD | Global | Table
FIXED_WIDTH_KEEP_FILES | Global | Table
FIXED_WIDTH_COMMENT_START_CHARACTER | Global | Table
FIXED_WIDTH_SKIP_CHARACTERS_UNTIL_NEW_LINE | Global | Table
FIXED_WIDTH_ROW_ENDS_WITH_NEW_LINE | Global | Table
FIXED_WIDTH_TYPE_DETECTION_ROW_COUNT | Global | Table
sqoopLimit | | Source
fetchSize | | Source
CDC_START_TIMESTAMP | | Table
CDC_END_TIMESTAMP | | Table
LOG_BASED_CDC_SPLIT_BY_COLUMN | | Table
ORACLE_LOGMINER_DICTIONARY_FILE_PATH | | Source
ORACLE_ARCHIVE_LOG_PATH | | Source
ORACLE_LOGMINER_DICTIONARY_FILE_NAME | | Source
BUILD_DICTIONARY_BEFOR_CDC | | Source
TABLE_SCHEMA_FULL_REFRESH | | Global
ORACLE_DATE_FUNCTIONS | Global | Global
INCLUDE_CURRENT_LOG | Global | Source
DATABASE_OBJECT_TYPES | Global | Source
TEMP_LOG_TABLE_NAME | Source | Source
LOGMINER_TABLESPACE_NAME | | Source
LOGMINER_TEMP_TABLE_TABLESPACE_NAME | | Source
CREATE_TEMP_TABLE_PARALLELISM | | Source
TEMP_DATABASE_NAME | Source | Source
TEMP_TABLE_INDEX_NAME | | Source
CREATE_INDEX_ON_TEMP_TABLE | | Source
USE_TEMP_TABLE_FOR_LOG_BASED_CDC | Source | Source
USE_NEW_TABLESPACE_FOR_LOGMINER | | Source
IS_WIDE_TABLE | Global | Table
USE_REDO_LOG_DICTIONARY | Source | Source
MAP_ORACLE_DATE_TO_TIMESTAMP | Source | Source
FETCH_NULL_TIMESTAMPED_RECORDS | | Source
ORACLE_ARCHIVE_LOG_INFO_OBJECT_NAME | Source | Source
ENABLE_DDL_TRACKING | | Source
IS_STAGING_ORACLE_SERVER | Source | Source
DROP_HIVE_SCHEMA | Global | Source
disableCategorical | Global | Source
disableAdditionalTableInfo | Global | Source
disableRowCount | Global | Source
SKIP_RELOADED_CHUNKS | Table | Table
enable_schema_synchronization | Table | Table
validateChunks | Admin | Table
failChunkWhenCountDoesntMatch | Table | Table
CSV_PARSER_LIB | Admin | Table
CSV_SPLIT_SIZE_MB | Admin | Table
CSV_INPUT_FILE_ENCODING | Admin | Table
JSON_INPUT_FILE_ENCODING | Admin | Table
CSV_ERROR_THRESHHOLD | Admin | Table
JSON_ERROR_THRESHHOLD | Admin | Table
CSV_KEEP_FILES | Admin | Table
JSON_KEEP_FILES | Admin | Table
CSV_TYPE_DETECTION_ROW_COUNT | Admin | Table
JSON_TYPE_DETECTION_ROW_COUNT | Admin | Table
CONTROL_FILE_READER | Admin | Table
VALIDATE_AFTER_SFTP_TO_LOCAL | Table | Table
VALIDATE_AFTER_LOCAL_TO_HDFS | Admin | Table
SEC_PARTITION_MERGE_RED | Admin | Table
VALIDATE_AFTER_HDFS_TO_HIVE | Admin | Table
CSV_MULTILINE_MODE | Admin | Table
TPT_DEFAULT_CHARSET | | Global
TPT_DELIMITER | | Table
TPT_FILE_ESCAPE_CHAR | | Table
TPT_FILE_QUOTE_ESCAPE_CHAR | | Table
TPT_FILE_ENCLOSE_CHAR | | Table
TPT_SCRIPT_PATH | | Source
TPT_EXPORT_SPOOL_MODE | | Table
TPT_CHECKPOINTS_PATH | | Source
TPT_LOG_PATH | | Source
TPT_CHARACTERSET | | Source
TPT_JOB_RESTART_LIMIT | | Table
TPT_CHECKPOINT_INTERVAL_IN_SECONDS | | Table
TPT_EXPORT_BLOCK_SIZE | | Table
TPT_EXPORT_TENACITY_HOURS | | Table
TPT_EXPORT_RETRY_INTERVAL_MINS | | Table
TPT_IO_BUFFER_SIZE | | Table
TPT_HADOOP_BLOCK_SIZE | | Table
TPT_EXPORT_MAX_SESSIONS | | Table
TPT_EXPORT_MIN_SESSIONS | | Table
TPT_EXPORT_OPERATOR_MAX_DECIMALDIGITS | | Table
TPT_EXPORT_READER_INSTANCES | | Table
TPT_EXPORT_WRITER_INSTANCES | | Table
TPT_UNICODE_MULTIPLICATION_FACTOR | | Table
TPT_ASCII_MULTIPLICATION_FACTOR | | Table
TPT_UTF8_MULTIPLICATION_FACTOR | | Table
TPT_UTF16_MULTIPLICATION_FACTOR | | Table
USE_TPT_SELECTOR_OPERATOR | | Table
IS_TIME_FORAMT_NEEDED | | Global
USE_TPT_GENERATED_SCHEMA | | Table
ENABLE_TPT_TRACE_LEVEL | | Table
USE_TPT_DESTINATION_AS_HDFS | | Table
USE_TPT_TDCH_INTERFACE | | Table
TPT_FILE_FORMAT | | Table
SOURCE_DATE_FORMAT | | Source
SOURCE_TIME_FORMAT | | Source
SFTP_STAGING_BASE_PATH | | Table
SFTP_BUFFER_SIZE | | Source
GZIP_FILE_EXTENSION | | Source
ENCODE_PRIMARY_PARTITION | | Source
DFI_DELETE_ORIGINAL_FILE_AFTER_DECOMPRESSION | | Source
export_multiplication_factor | | Table/Pipeline
netezza_export_escape_char | | Table/Pipeline
netezza_export_enclose_char | | Table/Pipeline
netezza_export_null_value | | Table/Pipeline
netezza_export_delimiter | | Table/Pipeline
bcp_crawl_separator | | Table/Pipeline
bcp_row_delimiter | | Table/Pipeline
DF_SNOWFLAKE_VALIDATE_ROW_COUNT | | Pipeline

Updating Password for MongoDB and HIVE

Perform the following steps to encrypt and update the password for MongoDB or Hive in the Infoworks DataFoundry system:

  • Execute the following interactive bash script: $IW_HOME/apricot-meteor/infoworks_python/infoworks/bin/infoworks_security.sh -encrypt -p
  • Enter the new plain text password when prompted.
  • Copy the encrypted password displayed in the result and update it in the Infoworks conf.properties file located in the $IW_HOME/conf folder, as shown in the illustrative session below.
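
The following is an illustrative session, assuming a standard $IW_HOME layout; the encrypted value shown is a placeholder, and the property you update in conf.properties depends on whether you are changing the MongoDB or the Hive password.

# Run the interactive encryption script; it prompts for the new plain-text password
$IW_HOME/apricot-meteor/infoworks_python/infoworks/bin/infoworks_security.sh -encrypt -p
# Enter password: ********
# Encrypted output (placeholder): aGVsbG8td29ybGQtZXhhbXBsZQ==
# Paste the printed value against the corresponding password property in $IW_HOME/conf/conf.properties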

Backup

This feature allows administrators to take a backup of the current metadata stored in MongoDB.

  • In the System Configuration page, click Backup.

The Metadata Database Backup page includes two sections: Backup Schedule and Recent Backups.

Backup Schedule

The backup can be taken immediately or it can be scheduled to run whenever required.

  • The Backup Now button takes an immediate backup of the metadata and is disabled by default until the target path to store the backup is specified.
  • To specify the target path, click Edit Schedule, add the absolute target path on the local file system, and, if required, configure a schedule by selecting Enabled in the Status dropdown.
  • Click Save to save the configuration.
  • Click Backup Now to run the backup metadata job in the background. After successful completion of the backup, an entry is added to the Recent Backups view.

Recent Backups

The Recent Backups view displays the list of all previous backups of the Infoworks DataFoundry metadata.

  • Date: Timestamp on which the backup was taken.
  • Filename: Path of the file where the backup was taken.
  • Status: Whether the backup was successful or not.

Notification

The Notification feature allows the admin to configure email notifications for the success or failure of various jobs and for any issues with the Infoworks DataFoundry installation.

NOTE: Email notifications require an SMTP server and an email account to send emails from.

This section describes the steps to configure the notifications.

  • Open the platform-config.json file from the $IW_HOME/platform/conf folder.
  • Set the following configurations in the messaging-service section as required.
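A representative sketch of the messaging-service section is shown below; the JSON layout and the sample values are assumptions, and only the smtpUsername, smtpHost, and smtpPassword keys referenced in these steps are included — keep any other settings already present in your platform-config.json unchanged and edit only the relevant values.

"messaging-service": {
  "smtpHost": "smtp.gmail.com",
  "smtpUsername": "your-email@gmail.com",
  "smtpPassword": "<AES-encrypted password>"
}
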
  • Edit the smtpUsername and smtpHost parameters with the required values. If the email address provided in the smtpUsername parameter is a Gmail account, do not change the smtpHost value.
  • Provide the AES-encrypted email password for the specified email address as the smtpPassword. If no password is set up for the specified email ID, provide the AES-encrypted empty string.
  • To encrypt the password, run the following command: $IW_HOME/apricot-meteor/infoworks_python/infoworks/bin/infoworks_security.sh -encrypt -p <yourpassword>
  • Restart services using the following commands:

$IW_HOME/bin/stop.sh platform

$IW_HOME/bin/start.sh platform
