Mainframe Data File Ingestion

Mainframe data file ingestion provides the following features:

  • Schema Crawl
  • Data Crawl
  • Append Mode
  • CDC and Merge

Creating Mainframe Data File Source

To create a Mainframe Data File source, see Creating Source. Ensure that the selected Source Type is Structured Files (CSV, Fixed-width, Mainframe Data Files).

Configuring Mainframe Data File Source

To configure a Mainframe Data File source, see Configuring Source.

Creating Table and Crawling Metadata

  • Click the Source Settings icon.
  • In the File Mapping section, click Add Entry to add a folder as a table.

Configure the following table details:

  • Table: Table name.
  • Hive Table Name: Name of the Hive table that holds the crawled data.
  • Source Path: Folder path of the table. This is relative to the source base path.
  • Relative Target HDFS Path: Target HDFS path. This is relative to the target base path.
  • Include/Exclude Files From Directory: Regex pattern to include or skip files.
  • Ingest sub-directories: Specifies whether to also crawl files in the sub-directories of the specified source path, recursively.
  • File Type: Type of structured file. Select Copybook.
  • Path to Copybook Layout: Location of the Copybook layout file that defines the schema for the table.
  • File Dialect: COBOL dialect used. The default value is Mainframe.
  • File ORG: Format in which the records are organized in the files.
  • Font of Layout: Font or character set.
  • Cobol Splits: Option to split records when the data includes hierarchy.

After entering the table details:

  • Click Save and Crawl Schema. The Edit Schema page is displayed.
  • Edit the schema and click Save Schema.
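
The Source Path, Include/Exclude Files From Directory, and Ingest sub-directories settings together determine which files belong to a table. The following Python sketch illustrates that selection logic; it is not the product's implementation. The function name select_files, its parameters, and the sample paths are hypothetical, and the sketch assumes that the regex patterns are applied to the file name (not the full path) as a partial match.

```python
import re
from pathlib import PurePosixPath


def select_files(paths, base_path, source_path,
                 include=None, exclude=None, recursive=False):
    """Return the paths that would belong to a table (illustrative only).

    base_path   -- source base path (assumed to be absolute)
    source_path -- table folder, relative to base_path
    include     -- optional regex; a file name must match it to be kept
    exclude     -- optional regex; a file name matching it is skipped
    recursive   -- whether files in sub-directories are also considered
    """
    folder = PurePosixPath(base_path) / source_path
    include_re = re.compile(include) if include else None
    exclude_re = re.compile(exclude) if exclude else None

    selected = []
    for path in map(PurePosixPath, paths):
        try:
            relative = path.relative_to(folder)
        except ValueError:
            continue  # the file is not under the table's folder
        if not relative.parts:
            continue  # the path is the folder itself, not a file in it
        if not recursive and len(relative.parts) != 1:
            continue  # nested file, and sub-directories are not ingested
        # Assumption: patterns are matched against the file name only.
        if include_re and not include_re.search(path.name):
            continue
        if exclude_re and exclude_re.search(path.name):
            continue
        selected.append(str(path))
    return selected


files = [
    "/data/accounts/a.dat",
    "/data/accounts/a.dat.tmp",
    "/data/accounts/2024/b.dat",
    "/data/other/c.dat",
]
print(select_files(files, "/data", "accounts", include=r"\.dat$"))
print(select_files(files, "/data", "accounts", include=r"\.dat$", recursive=True))
```

With these sample paths, the first call keeps only the file directly inside the accounts folder; the second call also keeps the file in the nested 2024 folder.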

Crawling Mainframe Data File Data

To crawl a Mainframe Data File source, see Crawling Data.
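
As a rough illustration of what reading mainframe data files involves, the following Python sketch decodes fixed-length EBCDIC records into named fields. It is a simplified example under stated assumptions, not the product's crawler: the two-field layout, the record length, and the function name decode_records are hypothetical; the records are assumed to be fixed-length (one possible File ORG value); and code page cp037 is assumed for the character set. Only character fields (PIC X) are handled; packed or binary numeric fields would need additional decoding.

```python
# Hypothetical layout, as (field name, byte offset, byte length).
# It corresponds to a copybook such as:
#   01 CUSTOMER-REC.
#      05 CUST-ID    PIC X(6).
#      05 CUST-NAME  PIC X(10).
LAYOUT = [("CUST-ID", 0, 6), ("CUST-NAME", 6, 10)]
RECORD_LENGTH = 16


def decode_records(data, layout=LAYOUT, record_length=RECORD_LENGTH,
                   encoding="cp037"):
    """Split raw bytes into fixed-length records and decode each field."""
    if len(data) % record_length != 0:
        raise ValueError("data length is not a multiple of the record length")
    rows = []
    for start in range(0, len(data), record_length):
        record = data[start:start + record_length]
        row = {}
        for name, offset, length in layout:
            raw = record[offset:offset + length]
            # Decode EBCDIC bytes to text and drop the trailing padding spaces.
            row[name] = raw.decode(encoding).rstrip()
        rows.append(row)
    return rows


# Build two sample records by encoding padded text as EBCDIC (cp037).
sample = (
    ("000042" + "ALICE".ljust(10)).encode("cp037")
    + ("000043" + "BOB".ljust(10)).encode("cp037")
)
for row in decode_records(sample):
    print(row)
```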
