DB2 Log-Based Incremental Ingestion
In DB2 log-based incremental ingestion, data is continuously synchronized into Hive using CDC log tables for the corresponding DB2 tables. Logging in DB2 database server is enabled using the SQL Replication Capture program (asnclp, asncap and apply). The SQL Replication program creates CDC and target tables (replication tables) for the corresponding source table.
For details on how to enable logging in DB2, see How to Enable SQL Replication for DB2.
Before you configure DB2 CDC, ensure that the following prerequisites and user authority requirements are met. Also, review the restrictions to configure CDC properly.
Prerequisites
Following are the Infoworks CDC DB2 server prerequisites:
- Archive logging must be active for the database that contains the source tables from which change data is to be captured.
- DB2 source tables must be defined with the DATA CAPTURE CHANGES clause for capture processing to occur.
- A valid DB2 environment must exist for the Infoworks DataFoundry user.
User Authority Required for DB2 CDC
For Infoworks DataFoundry to read change data from DB2 logs, the user ID that you specify for database access must have SYSADM or DBADM authority. Usually, this user ID is specified in the UDB CAPI_CONNECTION statement in the dbmover.cfg file.
DB2 Datatypes Supported for CDC
Infoworks DataFoundry supports most DB2 datatypes for CDC. The following table identifies the DB2 source datatypes supported by Infoworks DataFoundry for CDC:
| DB2 Datatypes | Supported | Comments | 
|---|---|---|
| bigint | Y | - | 
| BLOB | N | If you register a table with large object (LOB) columns, Infoworks DataFoundry does not capture changes for the LOB columns but can capture changes for other columns in the table. | 
| CHAR | Y | - | 
| CLOB | N | If you register a table with LOB columns, Infoworks DataFoundry does not capture changes for the LOB columns but can capture changes for other columns in the table. | 
| DATE | N | If you register a table with DATE columns, Infoworks DataFoundry does not capture changes for the DATE columns but can capture changes for other columns in the table. | 
| DECFLOAT | N | If you register a table with DECFLOAT columns, Infoworks DataFoundry does not capture changes for the DECFLOAT columns but can capture changes for other columns in the table. | 
| DECIMAL | Y | - | 
| DOUBLE | Y | - | 
| GRAPHIC | Y | - | 
| INTEGER | Y | - | 
| LONG VARCHAR | Y | - | 
| LONG VARGRAPHIC | Y | - | 
| REAL | Y | - | 
| REF | N | DB2 does not allow change data capture for tables with REF columns. | 
| SMALLINT | Y | - | 
| TIME | Y | - | 
| TIMESTAMP | Y | - | 
| UDTs (User-defined datatypes, such as DISTINCT and STRUCT.) | N | - | 
| VARCHAR | Y | - | 
| VARGRAPHIC | Y | - | 
| XML | N | - | 
DB2 CDC Considerations
Consider the following CDC capabilities and restrictions when planning DB2 CDC processing:
- To extract change data on a DB2 client machine that is remote from the DB2 server where the change data is captured, both machines must have the same architecture. Else, change data capture processing might fail.
- If the source tables are compressed, ensure that you have a compression dictionary that is compatible with the compressed DB2 log records from which Infoworks reads change data for the tables. Else, DB2 will not be able to decompress the log records for Infoworks read requests. Usually, the compatible compression dictionary is available because DB2 maintains the current compression dictionary and a backup of the previous compression dictionary on disk.
- If you run the DB2 REORG TABLE utility or the DB2 LOAD utility with the REPLACE or RESUME NO option against compressed source tables, Infoworks recommends that you specify the KEEPDICTIONARY option for the utility. The KEEPDICTIONARY option forces DB2 to retain the current compression dictionary, if it exists. If you use the RESETDICTIONARY option, DB2 rebuilds compression dictionary. In this case, the previous compression dictionary that matches the DB2 log records might not be available any longer.
- Infoworks cannot capture change data for the following DB2 datatypes:
- DECFLOAT, LOB, and XML datatypes: You can create a capture registration for a table that includes columns with DECFLOAT, LOB, and XML datatypes. However, the registration does not include these columns, and Infoworks does not capture change data for them. Infoworks does capture change data for the other columns in the registered table that have supported datatypes.
- User-defined datatypes: Tables that include columns with user-defined datatypes cannot be registered for change data capture. Infoworks cannot capture change data for these tables.
- To add or drop partitions in a partitioned database and then redistribute table data across the updated partition group, or to reconfigure a database partition group, you must use a special procedure. Else, Infoworks might not be able to resume change data capture properly.
- If you alter a column datatype to or from FOR BIT DATA, Infoworks does not detect the datatype change. Infoworks continues to use the datatype that is specified in the existing capture registration.
- If you alter a source table to change the default value of a DB2 column of CDC interest, Infoworks does not detect this DDL change during capture processing. As a result, the correct default value is not available when Infoworks performs the following operations:
- Delivers the pre-existing short rows for a table to which columns were added.
- Delivers rows for source tables that use the VALUE COMPRESSION option and that include a column with the COMPRESS SYSTEM DEFAULT option and a default value.
- In a partitioned database, if an UPDATE to a table row changes the partition key and that change causes the row to move to another partition, Infoworks processes the UPDATE as two operations, DELETE and INSERT. However, based on the DB2 log information, Infoworks cannot predictably determine the order in which to perform the DELETE and INSERT operations. If the INSERT is processed first, both the original row and the updated row appear on the target until the DELETE is processed.
- Infoworks uses multithreaded processing for change data capture. By default, Infoworks uses up to nine threads. To configure the number of threads, specify the THREADING parameter in the UDB CAPI CONNECTION statement. If you have a DB2 partitioned database, you can use a maximum of one thread for each database partition node plus two additional threads for the CAPI and merge processing.
Configuring DB2 for CDC
Following are the tasks to configure DB2 for Infoworks CDC:
- In the DB2 Control Center Configure Database Logging Wizard, enable archive logging for the DB2 database. For more details, see the IBM DB2 documentation.
- Set the following user environment variables in any process that runs Infoworks CDC or the DTLUCUDB program:
- Set DB2NOEXITLIST to ON.
- Set DB2CODEPAGE to 1208.
- Verify that the DB2 source tables are defined with the DATA CAPTURE CHANGES clause.
- To enable Infoworks to report the authorization ID and application that is associated with a DB2 transaction, set the DB2_LOGGING_DETAIL registry variable to APPLINFO in DB2.
- To set this variable for the current DB2 instance, enter the following command: db2set DB2_LOGGING_DETAIL=APPLINFO
- To set this variable for all DB2 instances on the system, enter the following command: db2set -g DB2_LOGGING_DETAIL=APPLINFO
- If a table that is selected for change data capture includes columns with LONG datatype, use the INCLUDE LONGVAR COLUMNS clause to alter the table so that Infoworks can capture data for the LONG columns.
Enabling Logs in DB2 Server
Logging in DB2 database server is enabled using the SQL Replication Capture program (asnclp, asncap and apply). The SQL Replication program creates CDC and target tables (replication tables) for the corresponding source table.
To set up the replication, perform the following steps from command line:
- Run the following command to create CDC, target tables from the DB2 command line:
$DB2HOME/bin/asnclp -f sqlrep.asnclp
Following are sample contents of the ansclp script. In the script below, capture program is run for a test_rep database with userid as DB2INST2 and password as infoworks. CDC is specifically enabled for test_tbl table. The change data capture for the table will be stored in the test_tbl_cdc table and the corresponding replication table will be stored in the test_tbl_target in the same schema. You can specify the target schema for creating CDC and target tables.

- After running the asnclp program, start the SQL Replication Capture (asncap) program for the database test_rep from a DB2 command window (run in the background): asncap capture_server=test_rep &
- Run Apply program in the background: asnapply control_server=sample apply_qual=MYQUAL20.
- Drop the target database created if the replication of the table is not required.
- Query the test_tbl_cdc table to see CDC.
NOTE: This may take up to a minute. By default, the Apply program sleeps one minute when it finds no data to replicate.
- For more details on SQL Replication, see Introduction to SQL Replication and Setting Up SQL Replication.
