RDBMS Configuration Options
Editing Schema
You can edit the target datatype of columns using the following steps:
- In the Source Configuration page, select the table to be ingested.
- Click the Edit Schema tab and edit the target datatype for the required columns. The Hive table is created with the specified datatypes.
The following datatype conversions are supported:
Source Datatype | Target Datatype |
---|---|
String | Boolean, Integer, Float, Double, Long, Decimal, Date, Timestamp. Supported formats: Date `yyyy-MM-dd`, Timestamp `yy-MM-dd hh:mm:ss` |
Boolean | String |
Integer | Float, Double, Long, Decimal, String |
Float | Double, Long, Decimal, String |
Double | Float, Long, Decimal, String |
Long | Float, Double, Decimal, String |
Decimal | Integer, Float, Long, String |
Date | String |
Timestamp | String |
Byte | String |
WARNING: The job might fail if any datatype conversion is not possible or if any error record is found.
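The conversion matrix above can be captured as a simple lookup to validate an edited schema before submitting it. This is an illustrative sketch, not part of the product; the names `SUPPORTED_CONVERSIONS`, `is_supported`, and `parse_string_value` are hypothetical, and the date patterns are the documented Java-style formats translated to Python `strptime` codes.

```python
from datetime import datetime

# Supported source -> target conversions, transcribed from the table above.
SUPPORTED_CONVERSIONS = {
    "String":    {"Boolean", "Integer", "Float", "Double", "Long",
                  "Decimal", "Date", "Timestamp"},
    "Boolean":   {"String"},
    "Integer":   {"Float", "Double", "Long", "Decimal", "String"},
    "Float":     {"Double", "Long", "Decimal", "String"},
    "Double":    {"Float", "Long", "Decimal", "String"},
    "Long":      {"Float", "Double", "Decimal", "String"},
    "Decimal":   {"Integer", "Float", "Long", "String"},
    "Date":      {"String"},
    "Timestamp": {"String"},
    "Byte":      {"String"},
}

def is_supported(source: str, target: str) -> bool:
    """Return True if editing `source` to `target` is a supported conversion."""
    return target in SUPPORTED_CONVERSIONS.get(source, set())

def parse_string_value(value: str, target: str):
    """Parse a String value into the documented Date/Timestamp formats."""
    if target == "Date":
        return datetime.strptime(value, "%Y-%m-%d").date()    # yyyy-MM-dd
    if target == "Timestamp":
        return datetime.strptime(value, "%y-%m-%d %I:%M:%S")  # yy-MM-dd hh:mm:ss
    raise ValueError(f"unexpected target {target!r}")
```

Checking conversions up front avoids submitting a schema edit that would later fail the job on an unsupported conversion or an unparsable value.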
- The Options field allows you to edit the precision and scale of the column. You can use the Bulk Edit option to edit the precision and scale of all the columns.
NOTE: The precision and scale values can be edited only for the Decimal datatype columns.
Limitation
- Setting a scale value greater than the precision value might cause issues.
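The limitation above can be guarded with a simple check before applying precision and scale edits (including Bulk Edit). This is an illustrative sketch; the function name is hypothetical.

```python
def validate_decimal_options(precision: int, scale: int) -> None:
    """Reject precision/scale combinations the documentation warns about.

    For a Decimal column, scale (digits to the right of the decimal
    point) must not exceed precision (total number of digits).
    """
    if precision <= 0:
        raise ValueError(f"precision must be positive, got {precision}")
    if scale < 0:
        raise ValueError(f"scale must be non-negative, got {scale}")
    if scale > precision:
        raise ValueError(
            f"scale ({scale}) must not exceed precision ({precision})"
        )
```

For example, `validate_decimal_options(10, 2)` passes, while `validate_decimal_options(5, 7)` raises an error instead of producing a problematic column definition.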
Configurations
The following configurations must be used when performing a metadata crawl of very large sources:
- crawl_connection_threshold (integer): The number of tables after which the connection to the source database system is re-established. The default value is 1000.
- METADATA_CRAWL_BATCH_SIZE (integer): The number of tables for which the metadata is held in memory. The default value is 1000.
- enable_autofill_fk_relations (boolean): When set to true, natural keys are auto-filled and foreign keys and relations are fetched during the metadata crawl. The default value is false.
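As a sketch, the three settings above might be grouped as follows for a large source. The variable name and the values are illustrative, not recommendations; only the configuration keys and their types come from the list above.

```python
# Hypothetical crawl settings for a very large source. Keys are the
# documented configuration names; values are illustrative only.
large_source_crawl_config = {
    "crawl_connection_threshold": 2000,    # reconnect after every 2000 tables
    "METADATA_CRAWL_BATCH_SIZE": 500,      # hold metadata for 500 tables in memory
    "enable_autofill_fk_relations": False, # skip FK/relation fetch to speed up the crawl
}
```

Lowering the batch size trades crawl speed for a smaller memory footprint, while leaving foreign-key fetching disabled (the default) avoids extra round trips to the source.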
RDBMS Schema Synchronization
This feature detects new columns added to RDBMS tables and applies the changes to the target table in Hive (for both the current view and the historic view) during incremental ingestion. Previously ingested records are backfilled with the default value of the newly added columns. Schema synchronization (addition of new columns only) is enabled using a table-level configuration.
Currently, schema synchronization is supported for the following databases, for different types of incremental delta fetches:
- Query-Based Incremental Ingestion: The schema changes are read directly from the table metadata during the crawl. Currently, schema synchronization is supported for Teradata (JDBC), Teradata (TPT), Oracle, SQL Server, DB2, Netezza, SAP HANA, MySQL, Sybase, MariaDB, Vertica, Redshift, and Apache Ignite.
- Log-Based Incremental Ingestion: In this approach, the schema changes are read from the table metadata and not from the DDL statements in the database log. Currently, schema synchronization is supported for Oracle.
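The backfill behavior described above can be sketched as follows: when the metadata crawl finds a column in the source that is missing from the target, existing target rows are filled with that column's default value. This is an illustrative model only; the function and parameter names are hypothetical.

```python
def synchronize_schema(target_rows, target_columns, source_columns, defaults):
    """Add columns present in the source but missing from the target.

    target_rows:    list of dicts, one per already-ingested record
    target_columns: columns currently in the target (Hive) table
    source_columns: columns found by the latest metadata crawl
    defaults:       default value per newly added column
    """
    new_columns = [c for c in source_columns if c not in target_columns]
    for column in new_columns:
        default = defaults.get(column)  # None when no default is defined
        for row in target_rows:
            row[column] = default       # backfill already-ingested records
    return target_columns + new_columns
```

For example, when a `status` column with default `"active"` appears in the source, every previously ingested row is backfilled with `"active"` before new incremental records arrive.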