Introduction
Data ingestion is the process of obtaining data from various source formats and moving it onto Hadoop/Hive, where it can be stored and analyzed further. Ingestion is the first step in performing data preparation and analytics with Infoworks.
Data can be streamed in real time or ingested in batches. Infoworks supports loading an entire large source data set at once and then loading the incremental changes to that source data.
Ingestion Data Source Types
Ingestion is classified based on the data source type as follows:
RDBMS Ingestion
- Teradata Ingestion
- Oracle Ingestion
- MySQL
- MariaDB
- SQL Server
- DB2
- Netezza
- SAP HANA
- Hive
- Sybase IQ
- Apache Ignite
- Vertica
NoSQL Ingestion
- MapR-DB Ingestion
CRM Ingestion
- Salesforce Ingestion
File Ingestion
- Structured File Ingestion: Delimited File Ingestion, Fixed-Width Ingestion, Mainframe Data File Ingestion
- JSON Ingestion
- XML Ingestion
- Unstructured File Ingestion
Ingestion Sync Types
Ingestion is classified based on the sync type as follows:
- Full Ingestion - fetches the complete data every time the ingestion job is run.
- Incremental Load Ingestion - fetches the complete data only in the first run, and in subsequent runs, fetches only the changed data.
Segmented Ingestion allows data to be loaded in segments defined by the values of a column. It can be performed on both full load and incremental load.
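The idea behind segmented ingestion can be sketched in a few lines of Python. This is only an illustration of splitting rows by the values of a column and loading each segment separately. It is not the Infoworks implementation, and the names `split_into_segments`, `load_in_segments`, and `load_segment` are hypothetical.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, Iterable, List

Row = Dict[str, Any]


def split_into_segments(rows: Iterable[Row], segment_column: str) -> Dict[Any, List[Row]]:
    """Group rows by the value of segment_column; each group is one segment."""
    segments: Dict[Any, List[Row]] = defaultdict(list)
    for row in rows:
        segments[row[segment_column]].append(row)
    return dict(segments)


def load_in_segments(
    rows: Iterable[Row],
    segment_column: str,
    load_segment: Callable[[Any, List[Row]], None],
) -> List[Any]:
    """Load one segment at a time and return the segment values in load order.

    load_segment is a hypothetical callback that writes one segment to the target.
    """
    segments = split_into_segments(rows, segment_column)
    loaded = []
    for value in sorted(segments):
        load_segment(value, segments[value])
        loaded.append(value)
    return loaded
```

For example, passing rows with a `region` column and `segment_column="region"` produces one load call per distinct region value.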

Full Ingestion
NOTE: Tables or sources that are fully ingested are always truncated and reloaded on the target.
Perform the following steps to run a full ingestion for a table:
- Click the Sources menu and click the required source.
- Click the Configure button for the required table.
- On the Configuration page, set the Ingest Type to Full Load, and enter the required values.
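The truncate-and-reload behavior described in the note can be illustrated with a minimal Python sketch. It assumes SQLite connections stand in for the source and the target, and a hypothetical table with `id` and `name` columns. It shows the general pattern only and does not reflect how Infoworks performs the load internally.

```python
import sqlite3


def full_load(source: sqlite3.Connection, target: sqlite3.Connection, table: str) -> int:
    """Empty the target table and reload every row from the source table.

    The table name is interpolated directly, so it must come from trusted
    configuration, never from user input.
    """
    rows = source.execute(f"SELECT id, name FROM {table}").fetchall()
    with target:  # commits both statements together, or rolls back on error
        target.execute(f"DELETE FROM {table}")  # SQLite has no TRUNCATE statement
        target.executemany(f"INSERT INTO {table} (id, name) VALUES (?, ?)", rows)
    return len(rows)
```

Because the delete and the reload run in one transaction, a failure partway through leaves the previous target contents in place.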
Incremental Ingestion
When the incremental ingestion sync type is selected and ingestion is performed for the first time, the entire data set is crawled. In subsequent ingestions, only the records that have been inserted or updated are crawled.
Incremental load ingestion includes the following:
- Timestamp-Based Incremental Ingestion
- Query-Based Incremental Ingestion
- Batch ID Based Incremental Ingestion
- The Oracle, SQL Server, DB2 and Sybase databases additionally support Log-Based Incremental Ingestion.
- The Oracle database additionally supports OGG-Based Incremental Ingestion.
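As an illustration of the timestamp-based variant, the following Python sketch fetches everything on the first run and only newer rows afterwards. It assumes each source row carries an `updated_at` column and that the job remembers the highest timestamp it has seen (a "watermark"). The function name, the column name, and the in-memory row list are hypothetical and not part of Infoworks.

```python
from datetime import datetime
from typing import Any, Dict, List, Optional, Tuple

Row = Dict[str, Any]


def timestamp_incremental_fetch(
    source_rows: List[Row], last_watermark: Optional[datetime]
) -> Tuple[List[Row], Optional[datetime]]:
    """Return the rows to ingest in this run and the watermark for the next run.

    last_watermark is None on the first run, in which case every row is returned.
    """
    if last_watermark is None:
        changed = list(source_rows)
    else:
        # Only rows inserted or updated after the previous run qualify.
        changed = [r for r in source_rows if r["updated_at"] > last_watermark]
    if changed:
        new_watermark = max(r["updated_at"] for r in changed)
    else:
        new_watermark = last_watermark  # nothing new; keep the old watermark
    return changed, new_watermark
```

The strict `>` comparison means that a row written with exactly the watermark timestamp after the previous run would be skipped. Real systems usually handle that boundary case explicitly.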
Perform the following steps to run an incremental ingestion for a table:
- Click the Sources menu and click the required source.
- Click the Configure button for the required table.
- On the Configuration page, set the Ingest Type to one of the following and enter the required values: Timestamp-Based Incremental Load, Query Based SCD1 Incremental Load, Query Based SCD2 Incremental Load, Batch ID Based Incremental Load, or Log-Based Incremental Load.
WARNING: Switching incremental ingestion from Append mode to Merge mode might cause some previously ingested records to go missing. It is therefore strongly recommended to perform Initialize and Ingest and run a full load immediately after switching modes.
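This document does not define the Append and Merge modes. The Python sketch below assumes their common meaning: append adds every change record as a new row, while merge keeps one row per key. The function names and the `id` key column are hypothetical. The sketch only shows how the two modes shape the target differently and does not reproduce Infoworks behavior.

```python
from typing import Any, Dict, Iterable, List

Row = Dict[str, Any]


def apply_append(target: List[Row], changes: Iterable[Row]) -> List[Row]:
    """Append mode (assumed semantics): add every change record as a new row."""
    return target + list(changes)


def apply_merge(target: List[Row], changes: Iterable[Row], key: str = "id") -> List[Row]:
    """Merge mode (assumed semantics): keep one row per key; a change replaces the old row."""
    merged = {row[key]: row for row in target}
    for row in changes:
        merged[row[key]] = row
    return list(merged.values())
```

Under these assumptions, a target built in one mode has a different shape from one built in the other, which is consistent with the recommendation to reload fully after switching.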