Generic REST API Ingestion
Many sources expose their data through REST APIs. To ingest this data, both the metadata and the data are obtained from the REST API response. Currently, only JSON responses are supported.
Feature List
Generic REST API ingestion supports the following features:
- Schema crawl
- Data crawl
- CDC and Append
Reference Video
The demo video of REST API Ingestion is available here.
Creating Source
In the Admin section (Admin > Source > New Source), create a source and select the source type as Rest-Generic. Enter the Hive schema name and HDFS location.

Configuring Source
- Click the Sources menu and select the Generic REST API source you created.
- In the Source Configuration page, click the Click here to enter them link.

In the Settings page, enter the following:
- Authentication Mechanism: The authentication mechanism used to connect to the REST API authentication server. The options are OAuth, OAuth 2.0, Basic Authentication, and None.
- Request Type: The HTTP request method. The options are GET and POST.
- OAuth URL: URL for the OAuth server for the specified client.
- OAuth Token JSON Path: Path of the OAuth token in the JSON response returned by the OAuth URL. For details on JSON path, see JSON Path Syntax.
- Secret Key: The secret key is static for the client and is sent as part of the Authorization headers. This field is displayed if the Authentication Mechanism selected is Basic Authentication.
- Test Connection URL: URL used to verify the basic authentication mechanism. The test connection succeeds if the response to a request carrying the secret key and the other headers is OK.
- Request Content: Authentication data to be sent to the OAuth server if the Request Type selected is POST.
- Request Headers: HTTP request header key-value pair.
- Request Params: HTTP request parameter key-value pair.

- Click Save Settings and perform a Test Connection.
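As a rough illustration of how the OAuth Token JSON Path and the Request Headers settings relate to each other, the following Python sketch reads a token from a JSON response and places it in the Authorization header. The function names, the sample response body, the `Bearer` prefix, and the reduced path syntax (dotted keys and numeric indexes only) are assumptions made for this example; they do not describe the product's internal implementation.

```python
import json
import re


def extract_json_path(document, path):
    """Resolve a minimal JSON path such as '$.data.tokens[0].value'.

    Only dotted keys and numeric list indexes are handled. This is a
    simplified subset, not a full JSON path implementation.
    """
    if not path.startswith("$"):
        raise ValueError("path must start with '$'")
    current = document
    # Each match is either a key (.name) or a list index ([3]).
    for key, index in re.findall(r"\.([^.\[\]]+)|\[(\d+)\]", path[1:]):
        current = current[key] if key else current[int(index)]
    return current


def build_request_headers(token, extra_headers=None):
    """Combine the auth token with user-supplied request headers."""
    headers = dict(extra_headers or {})
    # The 'Bearer' prefix is an assumption; the server may expect another scheme.
    headers["Authorization"] = f"Bearer {token}"
    return headers


# Hypothetical response body returned by an OAuth URL.
oauth_response = json.loads('{"data": {"tokens": [{"value": "abc123"}]}}')
token = extract_json_path(oauth_response, "$.data.tokens[0].value")
headers = build_request_headers(token, {"Accept": "application/json"})
print(headers)
```

In this sketch the value entered as OAuth Token JSON Path would correspond to the `path` argument, and the Request Headers key-value pairs to `extra_headers`.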
Schema Crawl
- Click the Source Configuration icon.
Click Add New Table and enter the following details:
- Table Name: Hive table name
- Target HDFS Path: Target HDFS path relative to the source base path.
- Base URL group: A group of URLs in which every URL has the same JSON schema, request headers and request params. The base URL group is used only for full load ingestion.
- Request Headers: HTTP request header key-value pair.
To send Auth Token in the request, mention the header as Authorization. No other auth headers are supported.
- Request Params: HTTP request parameter key-value pair.
- URL Groups for CDC: Multiple URLs separated by newlines. One URL group is processed in each phase of CDC. Multiple groups can be configured for use during CDC ingestion.
- Meta URL: URL used to retrieve a metadata response that has the same JSON schema as the base or CDC URL groups.

- Click Save and Crawl Schema.
A tree representing the schema is displayed. The tree is created by crawling the response from one of the URLs in the base URL group.
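To give a feel for what a schema tree derived from a JSON response can look like, the following Python sketch walks a sample response and lists each path with its JSON type. The function name, the path notation (`[*]` for array elements), and the sample response are assumptions for illustration only; the tree shown in the product may be built and displayed differently.

```python
def build_schema_tree(value, path="$"):
    """Return a list of (json_path, type_name) pairs describing a JSON value."""
    if isinstance(value, dict):
        nodes = [(path, "object")]
        for key, child in value.items():
            nodes.extend(build_schema_tree(child, f"{path}.{key}"))
        return nodes
    if isinstance(value, list):
        nodes = [(path, "array")]
        # Only the first element is inspected, assuming all elements share one schema.
        if value:
            nodes.extend(build_schema_tree(value[0], f"{path}[*]"))
        return nodes
    if value is None:
        return [(path, "null")]
    # bool is checked before numbers because bool is a subclass of int in Python.
    if isinstance(value, bool):
        return [(path, "boolean")]
    if isinstance(value, (int, float)):
        return [(path, "number")]
    return [(path, "string")]


# Hypothetical response from one URL in the base URL group.
sample_response = {
    "results": [{"id": 1, "title": "Home", "archived": False}],
    "size": 1,
}
schema = build_schema_tree(sample_response)
for json_path, type_name in schema:
    print(json_path, type_name)
```

In this sketch, selecting a path such as `$.results[*]` would correspond to choosing the node whose children become the columns of the extracted table.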

- Select a path from the tree and click Extract Schema.
Enter the following details:
- Table Name: Hive table name.
- Target HDFS Path: Target HDFS path relative to the source base path.

- Fetch Mechanism: The options are Linked List, Pagination, and None (URL list).
Linked List: REST API data access pattern in which the URL of the next segment of data is present in the current JSON response.
Pagination: Method of splitting a large dataset into pages so that each request returns quickly. The pagination parameter must be specified in the request parameters. For example, localhost:8080/confluence/rest/api/space/ds/content?page=1&size=10.
URL List: Collection of URLs that have the same JSON schema, request headers and request params. The response from each URL is obtained and ingested.
- Next URL JSON Path: JSON path for the next URL in the current URL response. This field is displayed when the fetch mechanism selected is Linked List.
- Base Group URL Prefix: The next URL in the current URL response can be full or partial. For a partial URL, enter the static prefix to be prepended. This field is displayed when the fetch mechanism selected is Linked List.
- Pagination Type: The options include URI Path and Request Params. This field is displayed when the fetch mechanism selected is Pagination.
- Page Param Key: The key that identifies the page parameter in the request. For example, page in localhost:8080/confluence/rest/api/space/ds/content?page=1&size=10. This field is displayed when the fetch mechanism selected is Pagination.
- Param Initial Value: The value at which pagination starts. This field is displayed when the fetch mechanism selected is Pagination.
Limitation: Currently only numeric values are supported.
- Click Recrawl Source for Schema.
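The three fetch mechanisms described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the product's implementation: the function names, the `next` and `results` response fields, the empty-page stop condition, and the in-memory `fake_api` standing in for a real HTTP client are all hypothetical. For brevity, the next URL is read from a top-level key rather than a full JSON path.

```python
from urllib.parse import urlencode


def fetch_linked_list(start_url, next_key, fetch, url_prefix=""):
    """Follow the next-URL field in each response until it is absent."""
    url = start_url
    while url:
        response = fetch(url)
        yield response
        next_url = response.get(next_key)
        # A partial next URL is completed with the static prefix.
        url = url_prefix + next_url if next_url else None


def fetch_paginated(base_url, page_key, initial_value, fetch,
                    extra_params=None, max_pages=1000):
    """Request successive numeric pages until a page has no results."""
    page = initial_value
    for _ in range(max_pages):
        params = dict(extra_params or {})
        params[page_key] = page
        response = fetch(f"{base_url}?{urlencode(params)}")
        # Stopping on an empty 'results' list is an assumption of this sketch.
        if not response.get("results"):
            break
        yield response
        page += 1


def fetch_url_list(urls, fetch):
    """Fetch every URL in a fixed list; all are expected to share one schema."""
    for url in urls:
        yield fetch(url)


# In-memory stand-in for a REST API, so the sketch runs without a network.
fake_api = {
    "http://localhost:8080/items": {"results": [1, 2], "next": "/items?cursor=a"},
    "http://localhost:8080/items?cursor=a": {"results": [3]},
    "http://localhost:8080/content?size=10&page=1": {"results": ["x"]},
    "http://localhost:8080/content?size=10&page=2": {"results": []},
}


def fake_fetch(url):
    return fake_api[url]


linked = list(fetch_linked_list("http://localhost:8080/items", "next",
                                fake_fetch, url_prefix="http://localhost:8080"))
pages = list(fetch_paginated("http://localhost:8080/content", "page", 1,
                             fake_fetch, extra_params={"size": 10}))
listed = list(fetch_url_list(["http://localhost:8080/items"], fake_fetch))
```

Here `next_key` and `url_prefix` play the roles of Next URL JSON Path and Base Group URL Prefix, while `page_key` and `initial_value` play the roles of Page Param Key and Param Initial Value.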
Data Crawl Full Load
Following are the steps to perform a full load data crawl:
- Click the Configure button for the table that requires a full load data crawl.
- Select the Ingest Type as Full Load and enter the required values. For descriptions of fields, see Source Table Configuration Field Descriptions.

- Click Save Configuration.
- Click the Table Groups tab and add a table group.
- Click the View Table Group icon for the table group.
- For first-time ingestion or for a clean crawl, click Initialize and Ingest Now.
- To append new data to the crawled source, click Ingest Now from the second crawl onwards. Only the new and changed data is picked up.
Data Crawl Incremental Load
Following are the steps to perform an incremental load data crawl:
- Click the Configure button for the table that requires an incremental load data crawl.
- Select the required incremental load Ingest Type.
- Check the Incremental Append Mode option to perform incremental ingestion.

- Enter the other required values and click Save Configuration. For descriptions of fields, see Source Table Configuration Field Descriptions.
- Click the Table Groups tab and add a table group.
- Click the View Table Group icon for the table group.
- For first-time ingestion or for a clean crawl, click Initialize and Ingest Now.
- To append new data to the crawled source, click Ingest Now from the second crawl onwards. Only the new and changed data is picked up.
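The idea of picking up only new and changed data can be illustrated with a short Python sketch. It assumes, purely for illustration, that each record carries a numeric `updated` watermark and an `id` key, and that append mode adds the selected records without replacing older versions. The function names and the merge behaviour are hypothetical; the actual behaviour is determined by the Ingest Type and the table configuration.

```python
def select_incremental(records, watermark_key, last_watermark):
    """Keep only records whose watermark is newer than the last ingested value."""
    return [r for r in records if r[watermark_key] > last_watermark]


def apply_increment(existing, delta, key, append_mode):
    """Append the delta as-is, or merge it so changed records replace old ones."""
    if append_mode:
        return existing + delta
    merged = {r[key]: r for r in existing}
    for record in delta:
        merged[record[key]] = record
    return list(merged.values())


# Hypothetical state after the first crawl, and a later API response.
existing = [
    {"id": 1, "updated": 10, "value": "a"},
    {"id": 2, "updated": 11, "value": "b"},
]
response = [
    {"id": 1, "updated": 10, "value": "a"},   # unchanged, filtered out
    {"id": 2, "updated": 15, "value": "b2"},  # changed
    {"id": 3, "updated": 16, "value": "c"},   # new
]

delta = select_incremental(response, "updated", last_watermark=11)
appended = apply_increment(existing, delta, "id", append_mode=True)
merged = apply_increment(existing, delta, "id", append_mode=False)
```

With these inputs, append mode keeps both versions of record 2, whereas the merge variant keeps only the latest version of each `id`.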
REST Settings
To edit the table details, perform the following:
- Click the Source Configuration icon and click Configure on the required table.
- Click REST Settings and edit the table details.

- Click Save Entry.
Limitation
- Configuration migration is not supported for REST API sources.