Log-Based Incremental Ingestion
Prerequisite
Ensure that transactional logs are enabled on the source database.
Overview
Log-based incremental ingestion allows ingesting data with lesser load on the source databases, as compared to other ingestion methods. The incremental data is read from the log statements (Data Manipulation Language - DMLs) stored in database log files (archival log in Oracle). Tables in the log-based incremental ingestion require natural keys to be configured like query-based CDC. As incremental data is read from logs, deletion of records is handled from source tables. Infoworks supports Oracle, MSSQL, Sybase IQ and DB2 log-based CDC.
If the incremental run has multiple transactions for the same record, then the transaction with the latest watermark value will be written to the DataLake Hive table. In the event that the watermark values are same for multiple latest transactions of the same record, then any one of these entries will be written to the DataLake Hive table. When there are only two transactions for the same record (an insert and an update) with the same watermark, the update entry will be ingested.
For Oracle log-based ingestion, the watermark is System Change Number (SCN).
For OGG the watermark is a combination of the transaction timestamp and the sequence number. For Sybase, DB2 and SQL Server ingestion, the watermark is transaction timestamp.
