Ingestion Process
This chapter describes the basic steps involved in the Infoworks ingestion process.

Creating Source
- Log in to Infoworks.
- Click Admin > Sources > New Source.
- Enter the Source Name, Source Type, Target Hive Schema, and Target HDFS Location. Some sources include additional options, such as setting the JDBC Driver Name, enabling the ECB Agent, or selecting the source data format. See the specific ingestion sections for their respective settings.
- Enable Make Publicly Available to make the source available for anyone to use.
- Click Save Settings.
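The settings gathered in the steps above can be sketched as a simple payload builder. This is a hypothetical illustration only; the field names are assumptions for readability, not the actual Infoworks API or metadata schema.

```python
# Hypothetical sketch of the settings saved when clicking Save Settings.
# Field names are illustrative assumptions, not the Infoworks API.

def build_source_settings(name, source_type, hive_schema, hdfs_location,
                          publicly_available=False, **extra):
    """Assemble the new-source settings described in the steps above."""
    settings = {
        "source_name": name,
        "source_type": source_type,
        "target_hive_schema": hive_schema,
        "target_hdfs_location": hdfs_location,
        "publicly_available": publicly_available,
    }
    # Some sources take extra options, e.g. a JDBC driver name or data format.
    settings.update(extra)
    return settings

src = build_source_settings(
    "orders_db", "RDBMS", "analytics", "/iw/sources/orders_db",
    publicly_available=True, jdbc_driver_name="com.mysql.jdbc.Driver")
```

The `**extra` catch-all mirrors the fact that different source types expose different optional fields.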
Setting Source
- Click the Sources menu and click the required source.
- Click the Click here to enter them link or the Settings icon.
- Enter the source configuration details.
- Click the Test Connection option to verify the connection details entered.
NOTE: Click the Ingestion Logs icon to view the progress and logs of the test connection.
- Click Save Settings to save the source configuration in metadata storage.
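A script performing the equivalent of Test Connection would first check that the required connection details are filled in. The sketch below is a hypothetical pre-check; the field names are assumptions, not Infoworks' actual configuration schema.

```python
# Hypothetical pre-check mirroring the Test Connection step: verify that
# all required connection fields are present before saving. Field names
# are illustrative assumptions, not Infoworks' actual schema.

REQUIRED_FIELDS = ("connection_url", "username", "password")

def missing_connection_fields(config):
    """Return the required fields that are empty or absent."""
    return [f for f in REQUIRED_FIELDS if not config.get(f)]

# Password not yet entered, so the check flags it.
config = {"connection_url": "jdbc:mysql://db:3306/orders", "username": "iw"}
missing = missing_connection_fields(config)
```

Running the check before saving avoids a failed test-connection round trip for obviously incomplete settings.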
Crawling Metadata
- Click the Sources menu and click the required source.
- In the Source Configuration page, click the Crawl Metadata button.
NOTE: Click the Ingestion Logs icon to view the progress and logs of the metadata crawl.
- After a successful crawl, the Source Configuration page is displayed.
- Click the View button and click the Schema tab. The table metadata will be displayed.
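In the UI, crawl progress is followed via the Ingestion Logs icon; a script would instead poll the crawl status until it finishes. The sketch below is a generic polling loop under that assumption, with `fetch_status` as a hypothetical stand-in rather than a real Infoworks call.

```python
# Hypothetical polling loop for the metadata crawl. `fetch_status` is a
# stand-in callable (not a real Infoworks API) that returns the current
# crawl state each time it is invoked.

def wait_for_crawl(fetch_status, max_polls=100):
    """Poll until the crawl reports a terminal state, then return it."""
    for _ in range(max_polls):
        status = fetch_status()
        if status in ("completed", "failed"):
            return status
    raise TimeoutError("crawl did not finish within max_polls")

# Simulate a crawl that runs for a few polls and then completes.
statuses = iter(["pending", "running", "running", "completed"])
result = wait_for_crawl(lambda: next(statuses))
```

A real script would add a sleep between polls; it is omitted here to keep the sketch self-contained.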
Configuring Tables
- Click the Sources menu and click the required source.
- Click the Configure button for the required table.
- In the Configuration page, set the Ingest Type to Full Load, and enter the required values.
- Click Save Configuration.
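The per-table configuration above can be sketched as a small validated builder. The keys and the set of ingest types here are illustrative assumptions, not the actual Infoworks metadata schema.

```python
# Hypothetical per-table configuration for a Full Load ingest. Keys and
# the ingest-type values are illustrative assumptions only.

VALID_INGEST_TYPES = {"full_load", "incremental"}

def configure_table(table_name, ingest_type="full_load", **options):
    """Build the per-table configuration saved by Save Configuration."""
    if ingest_type not in VALID_INGEST_TYPES:
        raise ValueError(f"unknown ingest type: {ingest_type}")
    return {"table": table_name, "ingest_type": ingest_type, **options}

cfg = configure_table("orders", ingest_type="full_load")
```

Validating the ingest type up front mirrors the UI, which only offers the supported choices.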
Ingesting Data
- Click the Table Group tab in the Tables page.
- Click the Add Table Group button, enter the table group details and click Add Tables.
- Add the required tables and click the Add Tables button.
- Click Save Configuration.
- Click the View Table Group icon for the table group created.
- For first time ingestion or for a clean crawl, click Initialize and Ingest Now.
- To fetch new data from the crawled source, click Ingest Now. From the second crawl onwards, only new and changed data will be picked up.
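The last two steps amount to a simple decision: a first-time or clean crawl uses Initialize and Ingest Now, while later runs use Ingest Now and fetch only new and changed data. A minimal sketch of that decision, with illustrative names that are not Infoworks commands:

```python
# Hypothetical helper choosing the ingest action for a table group.
# The action names are illustrative, not actual Infoworks commands.

def ingest_action(first_time_or_clean):
    """Pick the action matching the two buttons described above."""
    if first_time_or_clean:
        return "initialize_and_ingest"  # full crawl of the table group
    return "ingest_now"                 # incremental: new/changed data only

action = ingest_action(first_time_or_clean=False)
```

Encapsulating the choice in one place keeps a scheduling script from accidentally re-initializing an already-crawled table group.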