Batch-ID Based Incremental Ingestion

Prerequisite

Ensure that an incremental numeric column is available that identifies the batch in which each record was inserted or updated.

Overview

Tables ingested using this method are loaded in full during the first ingestion. Subsequent ingestions are incremental: the delta is fetched and then merged with, or appended to, the data that already exists on the target. The delta is fetched using a numeric value from the Batch ID column (configured by the user for inserts and updates during table configuration). These columns are expected to be in the same table.
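The flow described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the product's implementation: the in-memory rows, the `id` and `batch_id` column names, and the `ingest` helper are all hypothetical.

```python
from typing import Optional


def ingest(source_rows, target, last_batch_id: Optional[int]):
    """Load everything on the first run, then only the delta.

    Returns the new watermark (the highest batch ID seen so far).
    """
    if last_batch_id is None:
        # First ingestion: full load.
        delta = list(source_rows)
    else:
        # Incremental ingestion: fetch rows at or after the last batch ID.
        delta = [r for r in source_rows if r["batch_id"] >= last_batch_id]
    for row in delta:
        # Merge: upsert on the (hypothetical) key column "id".
        target[row["id"]] = row
    return max((r["batch_id"] for r in delta), default=last_batch_id)


source = [
    {"id": 1, "batch_id": 1, "value": "a"},
    {"id": 2, "batch_id": 1, "value": "b"},
]
target = {}
watermark = ingest(source, target, None)  # full load

# The source changes: one update and one insert, both in batch 2.
source[1] = {"id": 2, "batch_id": 2, "value": "b2"}
source.append({"id": 3, "batch_id": 2, "value": "c"})
watermark = ingest(source, target, watermark)  # incremental load
```

After the second call, the target holds three rows, the row with `id` 2 carries the updated value, and the watermark has advanced to 2.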

NOTE: Sliding window-based ad hoc incremental loads are not supported for Batch-ID based CDC.

Configurations

Configuration to Set Comparator to Fetch CDC Records

The USE_GTE_FOR_CDC configuration allows you to fetch the CDC records based on the use case.

  • true: CDC records are fetched using the >= comparator. This is the default behaviour and must be used for merge use cases to ensure that data from the last batch is fetched again. This is needed when some data for the last batch or timestamp is still being populated in the source system when the ingestion job finishes.
  • false: CDC records are fetched using the > comparator. This behaviour must be used for append-mode scenarios where the data for the last batch or timestamp in the source system is fully populated and the user does not want the old data fetched again.

This configuration is applicable for Timestamp and BatchID sync type tables.
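The effect of USE_GTE_FOR_CDC on the fetch condition can be pictured as follows. The SQL that the product actually generates is not shown here, so the `cdc_predicate` helper and the `batch_id` column name are assumptions made purely for illustration.

```python
def cdc_predicate(column: str, last_value: int, use_gte_for_cdc: bool = True) -> str:
    """Build a WHERE-clause fragment for fetching CDC records.

    use_gte_for_cdc=True  -> re-read the last batch (merge use cases)
    use_gte_for_cdc=False -> skip the last batch (append use cases)
    """
    comparator = ">=" if use_gte_for_cdc else ">"
    # int() guards against non-numeric watermark values.
    return f"{column} {comparator} {int(last_value)}"


merge_filter = cdc_predicate("batch_id", 42)
append_filter = cdc_predicate("batch_id", 42, use_gte_for_cdc=False)
```

With a last ingested batch ID of 42, the merge case yields `batch_id >= 42` and the append case yields `batch_id > 42`.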

NOTE: Data might be lost when the > comparator is used. If records with the same batch ID are still being inserted in the source system while the ingestion job is running, the records inserted with that batch ID just after the job has run are missed by the next CDC job.
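The scenario in this note can be reproduced with a small simulation. The rows, the `id` and `batch_id` names, and the `fetch_delta` helper are hypothetical and only model the comparator behaviour.

```python
def fetch_delta(rows, last_batch_id, use_gte):
    """Return rows at or after (>=), or strictly after (>), the last batch ID."""
    if use_gte:
        return [r for r in rows if r["batch_id"] >= last_batch_id]
    return [r for r in rows if r["batch_id"] > last_batch_id]


# The first job runs while batch 7 is still being written and sees only id 1.
source = [{"id": 1, "batch_id": 7}]
watermark = max(r["batch_id"] for r in source)  # 7

# After the job finishes, a late record lands in batch 7, then batch 8 starts.
source.append({"id": 2, "batch_id": 7})
source.append({"id": 3, "batch_id": 8})

ids_with_gt = [r["id"] for r in fetch_delta(source, watermark, use_gte=False)]
ids_with_gte = [r["id"] for r in fetch_delta(source, watermark, use_gte=True)]
```

With the > comparator, the next job fetches only the record with `id` 3, so the late record with `id` 2 is never ingested. With the >= comparator, all three records are fetched, which is why that setting suits merge use cases where re-read records are deduplicated on the target.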
