RDBMS Configuration Options
Editing Schema
You can edit the target datatype of columns using the following steps:
- In the Source Configuration page, select the table to be ingested.
- Click the Edit Schema tab and edit the target datatype for the required columns. The Hive table is created with the specified datatypes.
The following datatype conversions are supported:
Source Datatype | Target Datatype |
---|---|
String | Boolean, Integer, Float, Double, Long, Decimal, Date, Timestamp. Supported formats: Date `yyyy-MM-dd`, Timestamp `yy-MM-dd hh:mm:ss` |
Boolean | String |
Integer | Float, Double, Long, Decimal, String |
Float | Double, Long, Decimal, String |
Double | Float, Long, Decimal, String |
Long | Float, Double, Decimal, String |
Decimal | Integer, Float, Long, String |
Date | String |
Timestamp | String |
Byte | String |
WARNING: The job might fail if any datatype conversion is not possible or if any error record is found.
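The conversion matrix above can be captured as a simple lookup to validate an edited schema before submitting it. This is an illustrative sketch, not part of the product; the names `SUPPORTED_CONVERSIONS`, `is_supported`, and `parse_string_value` are hypothetical, and the date patterns are the documented Java-style formats translated to Python `strptime` codes.

```python
from datetime import datetime

# Supported source -> target conversions, transcribed from the table above.
SUPPORTED_CONVERSIONS = {
    "String":    {"Boolean", "Integer", "Float", "Double", "Long",
                  "Decimal", "Date", "Timestamp"},
    "Boolean":   {"String"},
    "Integer":   {"Float", "Double", "Long", "Decimal", "String"},
    "Float":     {"Double", "Long", "Decimal", "String"},
    "Double":    {"Float", "Long", "Decimal", "String"},
    "Long":      {"Float", "Double", "Decimal", "String"},
    "Decimal":   {"Integer", "Float", "Long", "String"},
    "Date":      {"String"},
    "Timestamp": {"String"},
    "Byte":      {"String"},
}

def is_supported(source: str, target: str) -> bool:
    """Return True if editing `source` to `target` is a supported conversion."""
    return target in SUPPORTED_CONVERSIONS.get(source, set())

def parse_string_value(value: str, target: str):
    """Parse a String value into the documented Date/Timestamp formats."""
    if target == "Date":
        return datetime.strptime(value, "%Y-%m-%d").date()    # yyyy-MM-dd
    if target == "Timestamp":
        return datetime.strptime(value, "%y-%m-%d %I:%M:%S")  # yy-MM-dd hh:mm:ss
    raise ValueError(f"unexpected target {target!r}")
```

Checking conversions up front avoids submitting a schema edit that would later fail the job on an unsupported conversion or an unparsable value.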
- The Options field allows you to edit the precision and scale of the column. You can use the Bulk Edit option to edit the precision and scale of all the columns.
NOTE: The precision and scale values can be edited only for the Decimal datatype columns.
Limitation
- Setting a scale value greater than the precision value might cause issues.
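The limitation above can be guarded with a simple check before applying precision and scale edits (including Bulk Edit). This is an illustrative sketch; the function name is hypothetical.

```python
def validate_decimal_options(precision: int, scale: int) -> None:
    """Reject precision/scale combinations the documentation warns about.

    For a Decimal column, scale (digits to the right of the decimal
    point) must not exceed precision (total number of digits).
    """
    if precision <= 0:
        raise ValueError(f"precision must be positive, got {precision}")
    if scale < 0:
        raise ValueError(f"scale must be non-negative, got {scale}")
    if scale > precision:
        raise ValueError(
            f"scale ({scale}) must not exceed precision ({precision})"
        )
```

For example, `validate_decimal_options(10, 2)` passes, while `validate_decimal_options(5, 7)` raises an error instead of producing a problematic column definition.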
Configurations
The following configurations must be used when performing a metadata crawl of very large sources:
- crawl_connection_threshold (integer): The number of tables after which the connection to the source database system is re-established. The default value is 1000.
- METADATA_CRAWL_BATCH_SIZE (integer): The number of tables for which the metadata is held in memory. The default value is 1000.
- enable_autofill_fk_relations (boolean): When set to true, natural keys are auto-filled and foreign keys and relations are fetched during the metadata crawl. The default value is false.
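As a sketch, the three settings above might be grouped as follows for a large source. The variable name and the values are illustrative, not recommendations; only the configuration keys and their types come from the list above.

```python
# Hypothetical crawl settings for a very large source. Keys are the
# documented configuration names; values are illustrative only.
large_source_crawl_config = {
    "crawl_connection_threshold": 2000,    # reconnect after every 2000 tables
    "METADATA_CRAWL_BATCH_SIZE": 500,      # hold metadata for 500 tables in memory
    "enable_autofill_fk_relations": False, # skip FK/relation fetch to speed up the crawl
}
```

Lowering the batch size trades crawl speed for a smaller memory footprint, while leaving foreign-key fetching disabled (the default) avoids extra round trips to the source.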
RDBMS Schema Synchronization
This feature detects new columns added to RDBMS tables and applies the changes to the target table in Hive (for both the current view and the historic view) during incremental ingestion. Previously ingested records are backfilled with the default value of the newly added columns. Schema synchronization (addition of new columns only) is enabled using a table-level configuration.
Currently, schema synchronization is supported for the following databases, for different types of incremental delta fetches:
- Query-Based Incremental Ingestion: The schema changes are read directly from the table metadata during the crawl. Currently, schema synchronization is supported for Teradata (JDBC), Teradata (TPT), Oracle, SQL Server, DB2, Netezza, SAP HANA, MySQL, Sybase, MariaDB, Vertica, Redshift, and Apache Ignite.
- Log-Based Incremental Ingestion: In this approach, the schema changes are read from the table metadata and not from the DDL statements in the database log. Currently, schema synchronization is supported for Oracle.
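The backfill behavior described above can be sketched as follows: when the metadata crawl finds a column in the source that is missing from the target, existing target rows are filled with that column's default value. This is an illustrative model only; the function and parameter names are hypothetical.

```python
def synchronize_schema(target_rows, target_columns, source_columns, defaults):
    """Add columns present in the source but missing from the target.

    target_rows:    list of dicts, one per already-ingested record
    target_columns: columns currently in the target (Hive) table
    source_columns: columns found by the latest metadata crawl
    defaults:       default value per newly added column
    """
    new_columns = [c for c in source_columns if c not in target_columns]
    for column in new_columns:
        default = defaults.get(column)  # None when no default is defined
        for row in target_rows:
            row[column] = default       # backfill already-ingested records
    return target_columns + new_columns
```

For example, when a `status` column with default `"active"` appears in the source, every previously ingested row is backfilled with `"active"` before new incremental records arrive.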