Ingestion Process
This chapter describes the basic steps involved in the Infoworks ingestion process.

Creating Source
- Log in to Infoworks.
- Click Admin > Sources > New Source.
- Enter the Source Name, Source Type, Target Hive Schema, and Target HDFS Location. Some sources include additional options, such as setting the JDBC Driver Name, enabling the ECB Agent, or selecting the source data format. See the specific ingestion sections for their respective settings.
- Enable Make Publicly Available to make the source available for anyone to use.
- Click Save Settings.
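The settings gathered in the steps above can be sketched as a simple payload builder. This is a hypothetical illustration only; the field names are assumptions for readability, not the actual Infoworks API or metadata schema.

```python
# Hypothetical sketch of the settings saved when clicking Save Settings.
# Field names are illustrative assumptions, not the Infoworks API.

def build_source_settings(name, source_type, hive_schema, hdfs_location,
                          publicly_available=False, **extra):
    """Assemble the new-source settings described in the steps above."""
    settings = {
        "source_name": name,
        "source_type": source_type,
        "target_hive_schema": hive_schema,
        "target_hdfs_location": hdfs_location,
        "publicly_available": publicly_available,
    }
    # Some sources take extra options, e.g. a JDBC driver name or data format.
    settings.update(extra)
    return settings

src = build_source_settings(
    "orders_db", "RDBMS", "analytics", "/iw/sources/orders_db",
    publicly_available=True, jdbc_driver_name="com.mysql.jdbc.Driver")
```

The `**extra` catch-all mirrors the fact that different source types expose different optional fields.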
Setting Source
- Click the Sources menu and click the required source.
- Click the Click here to enter them link or the Settings icon.
- Enter the source configuration details.
- Click the Test Connection option to verify the connection details entered.
NOTE: Click the Ingestion Logs icon to view the progress and logs of the test connection.
- Click Save Settings to save the source configuration in metadata storage.
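A script performing the equivalent of Test Connection would first check that the required connection details are filled in. The sketch below is a hypothetical pre-check; the field names are assumptions, not Infoworks' actual configuration schema.

```python
# Hypothetical pre-check mirroring the Test Connection step: verify that
# all required connection fields are present before saving. Field names
# are illustrative assumptions, not Infoworks' actual schema.

REQUIRED_FIELDS = ("connection_url", "username", "password")

def missing_connection_fields(config):
    """Return the required fields that are empty or absent."""
    return [f for f in REQUIRED_FIELDS if not config.get(f)]

# Password not yet entered, so the check flags it.
config = {"connection_url": "jdbc:mysql://db:3306/orders", "username": "iw"}
missing = missing_connection_fields(config)
```

Running the check before saving avoids a failed test-connection round trip for obviously incomplete settings.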
Crawling Metadata
- Click the Sources menu and click the required source.
- In the Source Configuration page, click the Crawl Metadata button.
NOTE: Click the Ingestion Logs icon to view the progress and logs of the metadata crawl.
- After a successful crawl, the Source Configuration page is displayed.
- Click the View button and click the Schema tab. The table metadata will be displayed.
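In the UI, crawl progress is followed via the Ingestion Logs icon; a script would instead poll the crawl status until it finishes. The sketch below is a generic polling loop under that assumption, with `fetch_status` as a hypothetical stand-in rather than a real Infoworks call.

```python
# Hypothetical polling loop for the metadata crawl. `fetch_status` is a
# stand-in callable (not a real Infoworks API) that returns the current
# crawl state each time it is invoked.

def wait_for_crawl(fetch_status, max_polls=100):
    """Poll until the crawl reports a terminal state, then return it."""
    for _ in range(max_polls):
        status = fetch_status()
        if status in ("completed", "failed"):
            return status
    raise TimeoutError("crawl did not finish within max_polls")

# Simulate a crawl that runs for a few polls and then completes.
statuses = iter(["pending", "running", "running", "completed"])
result = wait_for_crawl(lambda: next(statuses))
```

A real script would add a sleep between polls; it is omitted here to keep the sketch self-contained.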
Configuring Tables
- Click the Sources menu and click the required source.
- Click the Configure button for the required table.
- In the Configuration page, set the Ingest Type to Full Load, and enter the required values.
- Click Save Configuration.
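The per-table configuration above can be sketched as a small validated builder. The keys and the set of ingest types here are illustrative assumptions, not the actual Infoworks metadata schema.

```python
# Hypothetical per-table configuration for a Full Load ingest. Keys and
# the ingest-type values are illustrative assumptions only.

VALID_INGEST_TYPES = {"full_load", "incremental"}

def configure_table(table_name, ingest_type="full_load", **options):
    """Build the per-table configuration saved by Save Configuration."""
    if ingest_type not in VALID_INGEST_TYPES:
        raise ValueError(f"unknown ingest type: {ingest_type}")
    return {"table": table_name, "ingest_type": ingest_type, **options}

cfg = configure_table("orders", ingest_type="full_load")
```

Validating the ingest type up front mirrors the UI, which only offers the supported choices.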
Ingesting Data
- Click the Table Group tab in the Tables page.
- Click the Add Table Group button, enter the table group details and click Add Tables.
- Add the required tables and click the Add Tables button.
- Click Save Configuration.
- Click the View Table Group icon for the table group created.
- For first time ingestion or for a clean crawl, click Initialize and Ingest Now.
- To fetch new data from the crawled source, click Ingest Now. From the second crawl onwards, only new and changed data will be picked up.
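The last two steps amount to a simple decision: a first-time or clean crawl uses Initialize and Ingest Now, while later runs use Ingest Now and fetch only new and changed data. A minimal sketch of that decision, with illustrative names that are not Infoworks commands:

```python
# Hypothetical helper choosing the ingest action for a table group.
# The action names are illustrative, not actual Infoworks commands.

def ingest_action(first_time_or_clean):
    """Pick the action matching the two buttons described above."""
    if first_time_or_clean:
        return "initialize_and_ingest"  # full crawl of the table group
    return "ingest_now"                 # incremental: new/changed data only

action = ingest_action(first_time_or_clean=False)
```

Encapsulating the choice in one place keeps a scheduling script from accidentally re-initializing an already-crawled table group.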