System Configuration

The System Configuration page allows you to add/delete configurations and change the values and descriptions of configuration keys. The autocomplete feature allows you to select the required configuration from the drop-down list based on your input.

  • Click the Admin menu and click the Configuration icon. The System Configuration page will be displayed.

The following table lists all the configuration parameters and their descriptions.

System Configuration Keys and Descriptions

Configuration Parameter | Description | Default Value
hiveConfigurationVariables | Extra parameters for the Hive connection. Other values can include: hive.mapred.supports.subdirectories=true; mapred.input.dir.recursive=true; hive.optimize.insert.dest.volume=true; hive.exec.parallel=true. | hive.auto.convert.join=false;hive.insert.into.multilevel.dirs=true
numRowsSampleData | Number of rows to be retrieved as part of sample data. | 100
numGrpsRange | Number of groups to be created while computing categorical data.
dw_join_numreducers | Number of reducers for data warehouse join. | 10
dw_join_parallelism | Number of parallel joins for data warehouse build. | 3
disable_categorical_data | If true, categorical data computation will be skipped. | false
source_ingest_parallelism | Number of tables to crawl in parallel. | 2
disable_additional_table_info | If true, fetching sample data and computing categorical data will be skipped. | false
source_ingest_mappers | Number of mappers to crawl a table. | 1
sqoopNumberOfReducers | Number of reducers to crawl a table. | 20
number_of_secondary_partition | Number of secondary partitions in each primary partition. | 2
extract_row_limit | Number of rows to crawl from a table per mapper; -1 performs a full table crawl without any limit. | 0
END_DATE_STRING | Represents the expiration date for each record on Hive. This is an IW audit column. | 9999-09-09
END_TIMESTAMP_STRING | Represents the expiration timestamp for each record on Hive. This is an IW audit column. | 9999-09-09 09:09:09
disable_row_count | If true, metadata crawl will not fetch the row count for source tables. | false
iw_job_timeout_secs | Job timeout in milliseconds for Infoworks MR jobs.
iw_jobs_default_mr_map_mem_mb | Default map memory for Infoworks jobs.
iw_jobs_default_mr_red_mem_mb | Default reducer memory for Infoworks jobs.
fetch_size | In RDBMS ingestion using JDBC, this parameter can be added as a constant at the table level. It takes an integer value and sets the number of rows fetched at once over JDBC, which improves performance for large table ingestions. | 5000
cube_star_schema_job_map_mem_mb | Cube star schema job mapper memory.
cube_star_schema_job_red_mem_mb | Cube star schema job reducer memory.
max_number_of_chunks | Maximum number of chunks allowed while chunking a table. | 2000
SKIP_RELOADED_CHUNKS | If true, already loaded chunks will be skipped during chunk loading. | true
enable_schema_synchronization | If true, table schema synchronization will be triggered before data synchronization. | false
db_time_zone | Database server time zone. | MST
NUM_PARALLEL_MERGE_TASKS | Number of parallel merge tasks while merging a table. | 1
MERGE_CACHE_MB | Cache size for merging delta files. | 0
source_crawl_cdc_mapmerge_mr_map_mem_mb
check_number_of_partitions | If false, checking the available number of partitions during full load of a table is disabled. | false
NUM_PARALLEL_MERGE_INMEM_LOADS | Number of parallel merge jobs that will be loaded in memory. | 1
USE_COMBINE_INPUT_FORMAT | | true
source_crawl_cdc_mapmerge_mr_red_mem_mb
parallel_table_merge | Number of different table merges that can happen in parallel. | 5
num_parallel_jobs_per_entity | Number of jobs that can run in parallel per entity (source crawl, data warehouse build, cube build). | 2
disable_row_count_for_hive | | false
recrawl_table_schema_on_table_truncate | If set to true, on table truncate, the table schema will be re-crawled to get the latest schema from the source. | true
validate_chunks | If set to true, after every chunk load, data validation at the chunk level will be triggered. | true
enable_job_logs_to_mongo | If set to true, job logs will go to both file and Mongo. | false
hive_conn_string_delimiter | Delimiter to use while forming the Hive JDBC connection string. | ?
df_workspace_schema | Schema where Data Transformation stores all intermediate tables. | iw_df_workspace
df_workspace_base_path | HDFS path where Data Transformation stores all intermediate data. | /iw/df/workspace
df_shared_connection_enabled | Whether to reuse the connection for interactive mode. Typically true for Hive+Tez and false for Hive+MR. | true
df_shared_connection_timeout_ms | How long an idle shared connection stays open before being closed. Shared connection must be enabled. | 180000
df_interactive_exec_pool_size | Thread pool size for executing tasks per request in the interactive mode. | 10
df_interactive_table_maximum_size | Maximum size of any of the intermediate tables in interactive mode. | 10000
df_batch_exec_pool_size | Thread pool size for executing tasks per request in the batch mode. | 5
df_sampling_num_threads | Number of sampling tasks to run in parallel. | 5
df_interactive_hive_settings | Semicolon-separated list of entity-level Hive configurations, that is, the Hive configuration parameters that can be modified. | key1=value1;key2=value2;key3=value3
df_batch_hive_settings | If builds are slow, use this key to set or change Hive parameters at the pipeline level. Semicolon-separated list of entity-level Hive configurations (an example of this format follows the table). | key1=value1;key2=value2;key3=value3
df_compute_stats_enabled | When enabled, table statistics are computed for merge or overwrite targets at the end of the pipeline build job. | true
PIPELINE_SOURCES_AUTO_SYNC_CHECK_DISABLED | If set to true, disables the error icon on the pipeline editor page indicating a source schema mismatch. If set to false, the error icon is displayed on the pipeline editor page.
random_sampling_enabled | Determines whether sample generation for a source or pipeline target is random. Disable this when sampling takes too long to complete or fails. | true
CUBE_DIMENSIONS_ENABLE_DICTIONARY | Whether dimension tables need to be stored in memory. | true
sqoop_job_map_mem_mb | Mapper memory setting for Sqoop. Might need to be increased for tables with huge rows.
sqoop_job_red_mem_mb | Reducer memory setting for Sqoop.
append_date_to_logfile | Append the current date to the IW job log file name. | true
LOGMINER_TABLESPACE_NAME | Tablespace name.
use_new_tablespace | Whether to start LogMiner in a different tablespace. | false
use_temp_table_for_log_based_cdc | Whether to use the temp table approach for log-based CDC. | true
TEMP_DATABASE_NAME | Temp database name, if a temp database is to be used for log-based CDC.
build_dictionary_before_cdc | Whether to build the database dictionary before every CDC for log-based CDC. | false
oracle_logminer_dictionary_file_name | Oracle dictionary file name. | dictionary.ora
oracle_logminer_dictionary_file_path | Path on the Oracle server to the dictionary file.
SOURCE_TIME_FORMAT | Time format of data for log-based CDC.
SOURCE_DATE_FORMAT | Date format of data for log-based CDC.
sqoop_export_job_map_mem_mb | Mapper memory setting for Sqoop. Might need to be increased for tables with huge rows.
sqoop_export_job_red_mem_mb | Export jobs currently have no reducer, so this is not needed now; it may be useful in the future.
export_multiplication_factor | Multiplication factor arrays for creation of the target table. | 2.0,1.5,1.0
shorten_td_export_destination_column_name | Specifies whether column names should be shortened while creating the destination table. | false
shorten_td_export_destination_table_name | Specifies whether table names should be shortened while creating the destination table. | false
td_table_and_column_name_limit | The limit for the table name length. | 128
netezza_export_escape_char | The escape character for the staging file in Netezza export; only applicable for external tables. | /
netezza_export_enclose_char | The enclose character for the staging file in Netezza export; only applicable for external tables. Possible options are single quote and double quote. | "
netezza_export_null_value | The null value for the staging file in Netezza export; only applicable for external tables. | ""
netezza_export_should_allow_control_characters_in_data | Whether control characters (ASCII 0-31) can be part of the data. | false
fail_job_on_post_hook_failure | Whether the job should fail when the post hook fails. | true
notification_mail_authentication_enabled | For the SMTP server, specifies whether authentication is enabled. | true
notification_mail_tls_enabled | For the SMTP server, specifies whether TLS is enabled. | true
default_driver_xmx_mb | Default driver memory setting for any job without job configurations. | 512
iw_jobs_default_mr_java_opts_ratio | Default ratio of container Xmx to container memory. | 0.8
iw_jobs_default_mr_map_mem_mb | Default map memory for Infoworks jobs. | 2048
iw_jobs_default_mr_red_mem_mb | Default reducer memory for Infoworks jobs. | 2048
iw_jobs_default_mr_io_sort_ratio | Default ratio of the MR job io.sort.mb to the minimum of mapper and reducer memories. | 0.3
source_fetch_metadata_rdbms_driver_xmx_mb | Driver Xmx memory for RDBMS metadata crawl.
source_fetch_metadata_sftp_driver_xmx_mb | Driver Xmx memory for SFTP metadata crawl.
source_crawl_rdbms_driver_xmx_mb | Driver Xmx memory for RDBMS and DFI crawl.
source_crawl_sftp_driver_xmx_mb | Driver Xmx memory for SFTP crawl.
source_cdc_driver_xmx_mb | Driver Xmx memory for CDC.
source_merge_driver_xmx_mb | Driver Xmx memory for merge.
source_export_driver_xmx_mb | Driver Xmx memory for export.
default_driver_xmx_mb | Driver Xmx memory for sample data generation.
pipeline_build_driver_xmx_mb | Driver Xmx memory for the DF batch build driver.
cube_build_driver_xmx_mb | Driver Xmx memory for cube build.
default_driver_xmx_mb | Driver Xmx memory for delete entity.
source_crawl_cdc_mapmerge_mr_map_mem_mb | Mapper memory for CDC.
source_crawl_cdc_mapmerge_mr_red_mem_mb | Reducer memory for CDC.
dfi_job_map_mem | Mapper memory for DFI crawl.
dfi_job_red_mem | Reducer memory for DFI crawl.
json_job_map_mem | Mapper memory for JSON crawl.
json_job_red_mem | Reducer memory for JSON crawl.
cube_star_schema_job_map_mem_mb | Mapper memory for cube build.
cube_star_schema_job_red_mem_mb | Reducer memory for cube build.
sqoop_export_job_map_mem_mb | Mapper memory for Sqoop export data.
sqoop_export_job_red_mem_mb | Reducer memory for Sqoop export data.
td_export_mr_map_mem_mb | Mapper memory for Teradata export data.
td_export_mr_red_mem_mb | Reducer memory for Teradata export data.
ADVANCED_ANALYTICS_DISABLED | To disable the advanced analytics node, set this to true. | false
XML_ERROR_THRESHHOLD | If the number of error records exceeds this threshold, the MR job will fail. | 100
XML_KEEP_FILES | If the host type is local, the XML files are copied to the tableId/xml directory before the MR job runs. If this configuration is true, the files are not deleted after the crawl. | true
xml_job_map_mem | Mapper memory for the crawl MapReduce job. | Value of iw_jobs_default_mr_map_mem_mb in conf.properties
xml_job_red_mem | Reducer memory for the crawl MapReduce job. | Value of iw_jobs_default_mr_red_mem_mb in conf.properties
CSV_ERROR_THRESHHOLD | If the number of error records exceeds this threshold, the MR job will fail. | 100
CSV_KEEP_FILES | If the host type is local, the CSV files are copied to the tableId/csv directory before the MR job runs. If this configuration is true, the files are not deleted after the crawl. | true
CSV_TYPE_DETECTION_ROW_COUNT | Number of rows to be read for type detection/metacrawl. | 100
CSV_PARSER_LIB | The parser to be used for CSV crawl. | COMMONS (UNIVOCITY recommended)
CSV_SPLIT_SIZE_MB | Split size to be used by MR for every file. | 128
dfi_job_map_mem | Mapper memory for the crawl MapReduce job. | Value of iw_jobs_default_mr_map_mem_mb in conf.properties
dfi_job_red_mem | Reducer memory for the crawl MapReduce job. | Value of iw_jobs_default_mr_red_mem_mb in conf.properties
calc_file_level_ing_metrics | If set to true, file-level ingestion metrics are calculated at the end of the job. | true
modified_time_as_cksum | If true, the modified time is used to determine whether the file has changed. If set to false, the actual checksum is calculated. | false
delete_table_query_enabled | By default, the Delete Query feature is available at the table level. Set the IW constant delete_table_query_enabled to false from the UI to hide the Delete Query feature. | true
Ufi_max_failure_percentage_per_table | The percentage of files for which file ingestion must fail before the job is marked as failed. | 0.0
fetch_null_timestamped_records | If true, records having NULL values will be fetched for timestamp-based incremental tables during full ingestion. | false
MAP_ORACLE_DATE_TO_TIMESTAMP | In earlier versions of Oracle, a date can also hold timestamps. If the user wants to reflect this in the target as well, this constant must be set to true. | false
DFI_DELETE_ORIGINAL_FILE_AFTER_DECOMPRESSION | After file ingestion, the original file will be deleted if this constant is set to true. | False
MIN_ROWS_FOR_MERGEMR | Minimum number of rows in the current data in a secondary partition for the merge. If a secondary partition has fewer rows than this value, the merge jobs will be combined until they reach this threshold. | 1000000
pipeline_interactivity_mode | Pipeline validation configuration. When set to 'manual', automatic data type validation while saving node properties is removed and a Validate option is enabled in the node properties page. When set to 'auto', node data is validated automatically.
cdh_impala_support | Set to true if Impala support is required.
fail_ingestion_on_impala_cache_refresh_failure | Set to true if the ingestion job should fail when the Impala metadata cache is not refreshed.
iw_hive_ssl_enabled | Enables or disables SSL on Hive. | false
iw_hive_ssl_truststore_path | Path to the Hive truststore file location.
iw_hive_ssl_truststore_passwd | Encrypted password for the truststore.
bcp_crawl_separator | The field separator for BCP import/export.
bcp_row_delimiter | The row separator for BCP export.
filesystem_scheme | The HDFS path will be pre-populated with the value set in this configuration. The values can be: s3, s3a, s3n, adls, wasb, gs. | -
df_batch_sparkapp_settings | Set this configuration to overwrite Spark configurations, such as executor memory and driver memory, during pipeline batch build.
df_disable_sample_job | Set this configuration to disable sample jobs after pipeline build. | False
df_disable_cache_job | Set this configuration to disable cache jobs after pipeline build. | False
CREATE_DROP_TARGET_SCHEMA | Allows Infoworks to run create database and drop database commands. If this value is set to false, the create database and drop database commands will not be executed. | True
target_schema_permission | Enables or disables creation of databases in pipelines. If this value is set to false, creation of databases in pipelines will be disabled. | True
df_dynamic_hive_udf_enabled | Enables or disables UDFs on Hive for pipelines. If this value is set to false, UDFs on Hive for pipelines will be disabled. | True
number_of_parallel_jobs_per_entity | The number of jobs that can run in parallel for each entity. Entity refers to either of the following: Source, Datamodel, Cube. For example, if this value is set to 3, then a maximum of 3 source crawls, 3 data model builds, or 3 cube builds can occur in parallel. | NA
max_number_of_connections | The maximum number of connections allowed between each table and the source, for example, an RDBMS. | NA
df_auto_exclusion_enabled | Sets pipeline target column projection optimization on all nodes. | True
df_merge_exec_pool_size | Number of concurrent tasks to run while performing merge on pipeline targets. | 5
df_fail_on_null_or_empty_partition_value | Fails the pipeline jobs when partition values are null or empty. | False
iw_udfs_dir | Indicates the UDFs directory. | NA
iw_hdfs_prefix | Default HDFS access protocol prefix. | hdfs://
df_hive_analyze_works | Enables analyze tables for Hive targets. | True
df_label_auto_cast_enabled | Enables Auto Cast mode for the Advanced Analytics model label column. | True
df_label_cast_type | Indicates the default value for the Advanced Analytics model label column. | Double
df_spark_master | Indicates the Spark master mode. | local
df_hive_logging_freq_ms | Interval used to poll for Hive logs and update the job logs. | 60000
df_target_hdfs_cleanup | To retain data on HDFS when the pipeline is removed, set this to false. | True
df_batch_spark_settings | Semicolon-separated list of entity-level Hive configurations. Applicable for batch mode pipeline build and for generating sample data in a source. | key1=value1;key2=value2;key3=value3
df_scd2_granularity | SCD2 record change granularity on timestamp; can be Second, Minute, Hour, Day, Month, or Year. This can be set using advanced configuration to overwrite the configuration for all targets at once. | second
df_custom_udfs_force_copy | Custom UDFs in pipelines are only copied to HDFS when changes are detected. This configuration is used to overwrite changes to custom UDFs. | False
df_disable_sample_job | Disables the sample job for pipeline targets after pipeline build. | False
df_disable_cache_job | Disables the cache job for pipeline targets after pipeline build. | False
df_spark_configfile | Spark configuration file path for interactive mode Spark pipelines. | /etc/spark2/conf/spark-defaults.conf
df_spark_configfile_batch | Spark configuration file path for batch mode Spark pipelines. | /etc/spark2/conf/spark-defaults.conf
df_batch_spark_coalesce_partitions | Spark coalesce configuration to create fewer files while writing to disk. | NA
df_disable_current_loader | Custom transformations cannot be loaded from the current class loader. To load custom transformations from the current class loader, set this configuration to False. | True
df_overwrite_log_level | Log level overwrite at the pipeline level. For example, rootLogger=ERROR;io.infoworks=TRACE;infoworks=DEBUG;org.apache.spark=ERROR. | NA
df_dynamic_hive_udf_enabled | Disables Hive UDFs from loading for every job or request when add jar permissions are disabled. Set this configuration to false when permissions are disabled. | True
df_validation_progress_percent | Sets the validation progress percent in pipeline batch jobs. | 10
df_schemasync_progress_percent | Sets the schema sync progress percent in pipeline batch jobs. | 10
df_spark_merge_file_num | Spark configuration to merge files using the Coalesce option on the dataframe during the merge process. | 1
iw_df_ext_prefix | Pipeline extension prefix. | hdfs://
storage_format | Sets the default storage format value for the pipeline target. | ORC
user_extensions_base_path | Pipeline extension base path. | NA
source_crawl_schema_merge_mr_map_mem_mb | Mapper memory for schema merge jobs. | NA
source_crawl_schema_merge_mr_red_mem_mb | Reducer memory for schema merge jobs. | NA
source_crawl_schema_mapmerge_mr_map_mem_mb | Mapper memory for map-only schema merge jobs. | NA
SUPPORT_RESERVED_KEYWORDS | Enables or disables support for reserved keywords. To enable Hive connections on HDP 3.1, this value must be set to false. | True
df_cutpoint_optimization_enabled | Enables caching when the value is set to true. | False
CSV_NULL_STRING | Sets the NULL string. This configuration is available at the table, source, and global levels. | NULL
logpath | Path to the log file for Mongo. | $IW_HOME/logs/mongod.log
logappend | - | true
fork | If set to true, fork and run in the background. | true
port | Mongo port. | 27017
dbpath | - | $IW_HOME/resources/mongodb/data
pidfilepath | Location of the PID file. | $IW_HOME/resources/mongodb/mongod
bind_ip | If set to the loopback address, listen on the local interface only. If not specified, listen on all interfaces. | 0.0.0.0
auth | Enables authentication for Mongo when set. | true
noauth | Disables authentication for Mongo when set. | true
wiredTigerCacheSizeGB | - | 4
replSet | In replicated Mongo databases, specifies the replica set name. | rs0
oplogSize | Maximum size in megabytes for the replication operation log. | 128
keyFile | Path to a key file storing authentication info for connections between replica set members. | $IW_HOME/resources/mongodb/mongodb/
NETEZZA_EXPORT_DELIMITER | The record delimiter in the temporary file while transferring (only applicable for external tables mode).
df_impala_incremental_stats_enabled | -
max_connections | Determines the maximum number of concurrent connections to the database server. | 100
listen_addresses | Specifies the TCP/IP address(es) on which the server listens for connections from applications. It can be a comma-separated list of addresses (host names and/or numeric IP addresses). Use * to specify all available IP interfaces, 0.0.0.0 for all IPv4 addresses, localhost to allow only local TCP/IP "loopback" connections, or empty to specify that the server does not listen on any IP interface. | localhost
shared_buffers | Sets the amount of memory the database server uses for shared memory buffers. Minimum 128kB. | 128MB
dynamic_shared_memory_type | Specifies the dynamic shared memory implementation that the server should use: the first option supported by the OS (posix, sysv, windows, mmap). Use none to disable dynamic shared memory.
log_timezone | Sets the time zone used for timestamps written in the server log. | UTC
datestyle | Sets the display format for date and time values, as well as the rules for interpreting ambiguous date input values. It contains two independent components: the output format specification (ISO, Postgres, SQL, or German) and the input/output specification for year/month/day ordering (DMY, MDY, or YMD). | 'iso, mdy'
timezone | Sets the time zone for displaying and interpreting time stamps. | UTC
lc_messages | Sets the language in which messages are displayed. | en_US
lc_monetary | Sets the locale to use for formatting monetary amounts. | en_US
lc_numeric | Sets the locale to use for formatting numbers. | en_US
lc_time | Sets the locale to use for formatting dates and times. | en_US
default_text_search_config | Default configuration for text search. | pg_catalog.english
wal_level | Determines how much information is written to the WAL. | minimal
archive_mode | Enables archiving. When archive_mode is enabled, completed WAL segments are sent to archive storage by setting archive_command. | off
archive_command | Command to use to archive a log file segment. Placeholders: %p = path of file to archive, %f = file name only. (e.g., 'test ! -f /mnt/server/archivedir/%f && cp %p /mnt/server/archivedir/%f')
max_wal_senders | Specifies the maximum number of concurrent connections from standby servers or streaming base backup clients. Set this on the master and on any standby that will send replication data. Must be less than max_connections. | 0
data_directory | Allows use of data in another directory. | ConfigDir
hba_file | Specifies the file location for the host-based authentication configuration. | ConfigDir/pg_hba.conf
ident_file | - | ConfigDir/pg_ident.conf
external_pid_file | Specifies the location of an extra PID file. If external_pid_file is not explicitly set, no extra PID file is written.
minConnections | Controls the number of connections on Postgres. If a large number of workflows execute at the same time, increase this value. | 100
superuser_reserved_connections | - | 3
unix_socket_directories | Comma-separated list of directories. | /tmp
unix_socket_group | Restart postgres and orchestrator services.
unix_socket_permissions | Begin with 0 to use octal notation. | 0777
bonjour | Set this configuration to 'on' to advertise the server via Bonjour. | off
bonjour_name | - | computer_name
authentication_timeout | - | 1min
ssl | - | off
ssl_ciphers | Allowed SSL ciphers. | HIGH:MEDIUM:+3DES:!aNULL
ssl_prefer_server_ciphers | - | on
ssl_ecdh_curve | - | prime256v1
ssl_cert_file | - | server.crt
ssl_key_file | - | server.key
ssl_ca_file | -
ssl_crl_file | -
password_encryption | - | on
db_user_namespace | - | off
row_security | - | on
krb_server_keyfile | - | off
krb_caseins_users | - | off
tcp_keepalives_idle | TCP_KEEPIDLE (in seconds): the time the connection needs to remain idle before TCP starts sending keepalive probes. 0 selects the system default. | 0
tcp_keepalives_interval | TCP_KEEPINTVL (in seconds): the time between individual keepalive probes. 0 selects the system default. | 0
tcp_keepalives_count | TCP_KEEPCNT: the maximum number of keepalive probes TCP should send before dropping the connection. 0 selects the system default. | 0
shared_buffers | Minimum 128kB. | 128MB
huge_pages | - | try
temp_buffers | Minimum 800kB. | 8MB
SQOOPLIMIT | -
FETCHSIZE | -
ORACLE_LOGMINER_DICTIONARY_FILE_PATH | -
ORACLE_ARCHIVE_LOG_PATH | The absolute path to the directory where archive logs are stored.
ORACLE_LOGMINER_DICTIONARY_FILE_NAME | -
BUILD_DICTIONARY_BEFOR_CDC | Setting the value to true builds a dictionary before every CDC. | false
INCLUDE_CURRENT_LOG | -
DATABASE_OBJECT_TYPES | -
TEMP_LOG_TABLE_NAME | -
LOGMINER_TEMP_TABLE_TABLESPACE_NAME | -
CREATE_TEMP_TABLE_PARALLELISM | -
TEMP_DATABASE_NAME | -
TEMP_TABLE_INDEX_NAME | -
CREATE_INDEX_ON_TEMP_TABLE | -
USE_NEW_TABLESPACE_FOR_LOGMINER | -
USE_REDO_LOG_DICTIONARY | Setting the value to true uses the redo log dictionary to read archive logs with DDL tracking. The value must be set to true in case of schema synchronization. | False
FETCH_NULL_TIMESTAMPED_RECORDS | -
ORACLE_ARCHIVE_LOG_INFO_OBJECT_NAME | -
ENABLE_DDL_TRACKING | -
IS_STAGING_ORACLE_SERVER | -
DROP_HIVE_SCHEMA | -
DISABLECATEGORICAL | -
DISABLEADDITIONALTABLEINFO | -
DISABLEROWCOUNT | -
TPT_SCRIPT_PATH | -
TPT_CHECKPOINTS_PATH | -
TPT_LOG_PATH | -
TPT_CHARACTERSET | -
SOURCE_DATE_FORMAT | -
SOURCE_TIME_FORMAT | -
SFTP_BUFFER_SIZE | -
GZIP_FILE_EXTENSION | -
ENCODE_PRIMARY_PARTITION | -
UFI_INGEST_HIDDEN_FILES | -
UFI_MAX_FAILURE_PERCENTAGE_PER_TABLE | -
XML_INPUT_FILE_ENCODING | -
FIXED_WIDTH_INPUT_FILE_ENCODING | -
FIXED_WIDTH_RECORD_SEPARATOR | -
FIXED_WIDTH_PADDING_CHARACTER | -
FIXED_WIDTH_PARSER_LIB | -
MAX_FIXED_WIDTH_RECORD_SIZE | -
FIXED_WIDTH_ERROR_THRESHHOLD | If the number of error records exceeds this threshold, the MR job fails. | 100
FIXED_WIDTH_KEEP_FILES | If the host type is local, the CSV files are copied to the tableId/CSV directory before the MR job runs. If this configuration is true, the files are not deleted after the crawl. | True
FIXED_WIDTH_COMMENT_START_CHARACTER | -
FIXED_WIDTH_SKIP_CHARACTERS_UNTIL_NEW_LINE | -
FIXED_WIDTH_ROW_ENDS_WITH_NEW_LINE | -
FIXED_WIDTH_TYPE_DETECTION_ROW_COUNT | -
CDC_START_TIMESTAMP | -
CDC_END_TIMESTAMP | -
LOG_BASED_CDC_SPLIT_BY_COLUMN | -
IS_WIDE_TABLE | -
VALIDATECHUNKS | -
FAILCHUNKWHENCOUNTDOESNTMATCH | -
CSV_INPUT_FILE_ENCODING | -
JSON_INPUT_FILE_ENCODING | -
JSON_ERROR_THRESHHOLD | If the number of error records exceeds this threshold, the MR job fails. | 100
JSON_KEEP_FILES | If the host type is local, the CSV files are copied to the tableId/CSV directory before the MR job runs. If this value is true, the files are not deleted after the crawl. | True
JSON_TYPE_DETECTION_ROW_COUNT | Number of rows to be read for type detection/metacrawl. | 100
CONTROL_FILE_READER | -
VALIDATE_AFTER_SFTP_TO_LOCAL | -
VALIDATE_AFTER_LOCAL_TO_HDFS | -
SEC_PARTITION_MERGE_RED | -
VALIDATE_AFTER_HDFS_TO_HIVE | -
CSV_MULTILINE_MODE | -
TPT_DELIMITER | -
TPT_FILE_ESCAPE_CHAR | -
TPT_FILE_QUOTE_ESCAPE_CHAR | -
TPT_FILE_ENCLOSE_CHAR | -
TPT_EXPORT_SPOOL_MODE | -
TPT_JOB_RESTART_LIMIT | -
TPT_CHECKPOINT_INTERVAL_IN_SECONDS | -
TPT_EXPORT_BLOCK_SIZE | -
TPT_EXPORT_TENACITY_HOURS | -
TPT_EXPORT_RETRY_INTERVAL_MINS | -
TPT_IO_BUFFER_SIZE | -
TPT_HADOOP_BLOCK_SIZE | -
TPT_EXPORT_MAX_SESSIONS | -
TPT_EXPORT_MIN_SESSIONS | -
TPT_EXPORT_OPERATOR_MAX_DECIMALDIGITS | -
TPT_EXPORT_READER_INSTANCES | -
TPT_EXPORT_WRITER_INSTANCES | -
TPT_UNICODE_MULTIPLICATION_FACTOR | -
TPT_ASCII_MULTIPLICATION_FACTOR | -
TPT_UTF8_MULTIPLICATION_FACTOR | -
TPT_UTF16_MULTIPLICATION_FACTOR | -
USE_TPT_SELECTOR_OPERATOR | -
USE_TPT_GENERATED_SCHEMA | -
ENABLE_TPT_TRACE_LEVEL | -
USE_TPT_DESTINATION_AS_HDFS | -
USE_TPT_TDCH_INTERFACE | -
TPT_FILE_FORMAT | -
SFTP_STAGING_BASE_PATH | -
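
Several of the keys above (hiveConfigurationVariables, df_interactive_hive_settings, df_batch_hive_settings, df_batch_spark_settings) take a semicolon-separated list of key=value pairs. The following is a minimal illustration of that format, using only Hive settings already mentioned in this table; the "key = value" assignment style and the particular combination of settings are illustrative, and the values your cluster needs may differ.

hiveConfigurationVariables = hive.auto.convert.join=false;hive.insert.into.multilevel.dirs=true;hive.mapred.supports.subdirectories=true;mapred.input.dir.recursive=true

df_batch_hive_settings = hive.exec.parallel=true;hive.optimize.insert.dest.volume=true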

Admin Configurations Moved from Global to Entity Levels

The following table lists the admin configurations that have been moved from global to entity levels:

IW Constant | Current Level | Moved to
UFI_INGEST_HIDDEN_FILES | Global | Table
UFI_MAX_FAILURE_PERCENTAGE_PER_TABLE | Global | Table
EXPORT_PARALLELIZATION_FACTOR | Global | Global
XML_INPUT_FILE_ENCODING | Global | Table
XML_ERROR_THRESHHOLD | Global | Table
XML_KEEP_FILES | Global | Table
FIXED_WIDTH_INPUT_FILE_ENCODING | Global | Table
FIXED_WIDTH_RECORD_SEPARATOR | Global | Table
FIXED_WIDTH_PADDING_CHARACTER | Global | Table
FIXED_WIDTH_PARSER_LIB | Global | Table
MAX_FIXED_WIDTH_RECORD_SIZE | Global | Table
FIXED_WIDTH_THRESHHOLD | Global | Table
FIXED_WIDTH_KEEP_FILES | Global | Table
FIXED_WIDTH_COMMENT_START_CHARACTER | Global | Table
FIXED_WIDTH_SKIP_CHARACTERS_UNTIL_NEW_LINE | Global | Table
FIXED_WIDTH_ROW_ENDS_WITH_NEW_LINE | Global | Table
FIXED_WIDTH_TYPE_DETECTION_ROW_COUNT | Global | Table
sqoopLimit | | Source
fetchSize | | Source
CDC_START_TIMESTAMP | | Table
CDC_END_TIMESTAMP | | Table
LOG_BASED_CDC_SPLIT_BY_COLUMN | | Table
ORACLE_LOGMINER_DICTIONARY_FILE_PATH | | Source
ORACLE_ARCHIVE_LOG_PATH | | Source
ORACLE_LOGMINER_DICTIONARY_FILE_NAME | | Source
BUILD_DICTIONARY_BEFOR_CDC | | Source
TABLE_SCHEMA_FULL_REFRESH | | Global
ORACLE_DATE_FUNCTIONS | Global | Global
INCLUDE_CURRENT_LOG | Global | Source
DATABASE_OBJECT_TYPES | Global | Source
TEMP_LOG_TABLE_NAME | Source | Source
LOGMINER_TABLESPACE_NAME | | Source
LOGMINER_TEMP_TABLE_TABLESPACE_NAME | | Source
CREATE_TEMP_TABLE_PARALLELISM | | Source
TEMP_DATABASE_NAME | Source | Source
TEMP_TABLE_INDEX_NAME | | Source
CREATE_INDEX_ON_TEMP_TABLE | | Source
USE_TEMP_TABLE_FOR_LOG_BASED_CDC | Source | Source
USE_NEW_TABLESPACE_FOR_LOGMINER | | Source
IS_WIDE_TABLE | Global | Table
USE_REDO_LOG_DICTIONARY | Source | Source
MAP_ORACLE_DATE_TO_TIMESTAMP | Source | Source
FETCH_NULL_TIMESTAMPED_RECORDS | | Source
ORACLE_ARCHIVE_LOG_INFO_OBJECT_NAME | Source | Source
ENABLE_DDL_TRACKING | | Source
IS_STAGING_ORACLE_SERVER | Source | Source
DROP_HIVE_SCHEMA | Global | Source
disableCategorical | Global | Source
disableAdditionalTableInfo | Global | Source
disableRowCount | Global | Source
SKIP_RELOADED_CHUNKS | Table | Table
enable_schema_synchronization | Table | Table
validateChunks | Admin | Table
failChunkWhenCountDoesntMatch | Table | Table
CSV_PARSER_LIB | Admin | Table
CSV_SPLIT_SIZE_MB | Admin | Table
CSV_INPUT_FILE_ENCODING | Admin | Table
JSON_INPUT_FILE_ENCODING | Admin | Table
CSV_ERROR_THRESHHOLD | Admin | Table
JSON_ERROR_THRESHHOLD | Admin | Table
CSV_KEEP_FILES | Admin | Table
JSON_KEEP_FILES | Admin | Table
CSV_TYPE_DETECTION_ROW_COUNT | Admin | Table
JSON_TYPE_DETECTION_ROW_COUNT | Admin | Table
CONTROL_FILE_READER | Admin | Table
VALIDATE_AFTER_SFTP_TO_LOCAL | Table | Table
VALIDATE_AFTER_LOCAL_TO_HDFS | Admin | Table
SEC_PARTITION_MERGE_RED | Admin | Table
VALIDATE_AFTER_HDFS_TO_HIVE | Admin | Table
CSV_MULTILINE_MODE | Admin | Table
TPT_DEFAULT_CHARSET | | Global
TPT_DELIMITER | | Table
TPT_FILE_ESCAPE_CHAR | | Table
TPT_FILE_QUOTE_ESCAPE_CHAR | | Table
TPT_FILE_ENCLOSE_CHAR | | Table
TPT_SCRIPT_PATH | | Source
TPT_EXPORT_SPOOL_MODE | | Table
TPT_CHECKPOINTS_PATH | | Source
TPT_LOG_PATH | | Source
TPT_CHARACTERSET | | Source
TPT_JOB_RESTART_LIMIT | | Table
TPT_CHECKPOINT_INTERVAL_IN_SECONDS | | Table
TPT_EXPORT_BLOCK_SIZE | | Table
TPT_EXPORT_TENACITY_HOURS | | Table
TPT_EXPORT_RETRY_INTERVAL_MINS | | Table
TPT_IO_BUFFER_SIZE | | Table
TPT_HADOOP_BLOCK_SIZE | | Table
TPT_EXPORT_MAX_SESSIONS | | Table
TPT_EXPORT_MIN_SESSIONS | | Table
TPT_EXPORT_OPERATOR_MAX_DECIMALDIGITS | | Table
TPT_EXPORT_READER_INSTANCES | | Table
TPT_EXPORT_WRITER_INSTANCES | | Table
TPT_UNICODE_MULTIPLICATION_FACTOR | | Table
TPT_ASCII_MULTIPLICATION_FACTOR | | Table
TPT_UTF8_MULTIPLICATION_FACTOR | | Table
TPT_UTF16_MULTIPLICATION_FACTOR | | Table
USE_TPT_SELECTOR_OPERATOR | | Table
IS_TIME_FORAMT_NEEDED | | Global
USE_TPT_GENERATED_SCHEMA | | Table
ENABLE_TPT_TRACE_LEVEL | | Table
USE_TPT_DESTINATION_AS_HDFS | | Table
USE_TPT_TDCH_INTERFACE | | Table
TPT_FILE_FORMAT | | Table
SOURCE_DATE_FORMAT | | Source
SOURCE_TIME_FORMAT | | Source
SFTP_STAGING_BASE_PATH | | Table
SFTP_BUFFER_SIZE | | Source
GZIP_FILE_EXTENSION | | Source
ENCODE_PRIMARY_PARTITION | | Source
DFI_DELETE_ORIGINAL_FILE_AFTER_DECOMPRESSION | | Source
export_multiplication_factor | | Table/Pipeline
netezza_export_escape_char | | Table/Pipeline
netezza_export_enclose_char | | Table/Pipeline
netezza_export_null_value | | Table/Pipeline
netezza_export_delimiter | | Table/Pipeline
bcp_crawl_separator | | Table/Pipeline
bcp_row_delimiter | | Table/Pipeline
DF_SNOWFLAKE_VALIDATE_ROW_COUNT | | Pipeline

Updating Password for MongoDB and HIVE

Perform the following steps to encrypt and update the password for MongoDB or Hive in the Infoworks DataFoundry system:

  • Execute the following interactive bash script: $IW_HOME/apricot-meteor/infoworks_python/infoworks/bin/infoworks_security.sh -encrypt -p
  • Enter the new plain text password when prompted.
  • Copy the encrypted password displayed in the result and update it in the Infoworks conf.properties file located in the $IW_HOME/conf folder, as shown in the illustrative session below.
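
The following is an illustrative session, assuming a standard $IW_HOME layout; the encrypted value shown is a placeholder, and the property you update in conf.properties depends on whether you are changing the MongoDB or the Hive password.

# Run the interactive encryption script; it prompts for the new plain-text password
$IW_HOME/apricot-meteor/infoworks_python/infoworks/bin/infoworks_security.sh -encrypt -p
# Enter password: ********
# Encrypted output (placeholder): aGVsbG8td29ybGQtZXhhbXBsZQ==
# Paste the printed value against the corresponding password property in $IW_HOME/conf/conf.properties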

Backup

This feature allows administrators to take a backup of the current metadata stored in MongoDB.

  • In the System Configuration page, click Backup.

The Metadata Database Backup page includes two sections: Backup Schedule and Recent Backups.

Backup Schedule

The backup can be taken immediately or it can be scheduled to run whenever required.

  • The Backup Now button takes an immediate backup of the metadata and is disabled by default until the target path to store the backup is specified.
  • To specify the target path, click Edit Schedule, add the absolute target path on the local file system, and, if required, configure a schedule by selecting Enabled in the Status dropdown.
  • Click Save to save the configuration.
  • Click Backup Now to run the backup metadata job in the background. After successful completion of the backup, an entry is added to the Recent Backups view.

Recent Backups

The Recent Backups view displays the list of all previous backups of the Infoworks DataFoundry metadata.

  • Date: Timestamp on which the backup was taken.
  • Filename: Path of the file where the backup was taken.
  • Status: Whether the backup was successful or not.

Notification

The Notification feature allows the admin to configure email notifications for the success or failure of various jobs and for any issues with the Infoworks DataFoundry installation.

NOTE: Email notifications require an SMTP server and an email account to send emails from.

This section describes the steps to configure the notifications.

  • Open the platform-config.json file from the $IW_HOME/platform/conf folder.
  • Set the following configurations in the messaging-service section as required.
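A representative sketch of the messaging-service section is shown below; the JSON layout and the sample values are assumptions, and only the smtpUsername, smtpHost, and smtpPassword keys referenced in these steps are included — keep any other settings already present in your platform-config.json unchanged and edit only the relevant values.

"messaging-service": {
  "smtpHost": "smtp.gmail.com",
  "smtpUsername": "your-email@gmail.com",
  "smtpPassword": "<AES-encrypted password>"
}
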
  • Edit the smtpUsername and smtpHost parameters with the required values. If the email address provided in the smtpUsername parameter is a Gmail account, do not change the smtpHost value.
  • Provide the AES-encrypted email password for the specified email address as the smtpPassword. If no password is set up for the specified email ID, provide the AES-encrypted empty string.
  • To encrypt the password, run the following command: $IW_HOME/apricot-meteor/infoworks_python/infoworks/bin/infoworks_security.sh -encrypt -p <yourpassword>
  • Restart services using the following commands:

$IW_HOME/bin/stop.sh platform

$IW_HOME/bin/start.sh platform
