Bulk Pipeline Creation Script

Infoworks DataFoundry lets you run a script that creates multiple pipelines with the same structure, in bulk.

Following are the steps to run the script for bulk pipeline creation:

  • Navigate to the $IW_HOME/scripts/pipeline folder.
  • Run the script using the following command: python pipeline_create.py -s <input_sql> -c <input_csv> -t <TOKEN> -o <output_csv>

where,

  • <input_sql> is the path of the SQL template based on which the new pipelines will be created.
  • <input_csv> is the path of the CSV file that includes the specifics of the pipelines to be created.
  • <TOKEN> is the user authentication token obtained from the user settings page.
  • <output_csv> is the path of the output CSV file generated once the script is run.

Sample Query

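As an illustration, a minimal SQL template might look like the following; the column names and join are hypothetical, and only the {table1}/{table2} placeholder convention is taken from this document:

```sql
-- Hypothetical template: {table1} and {table2} are placeholders that the
-- script replaces with the tables listed in the input CSV file.
SELECT t1.customer_id, t1.customer_name, t2.order_total
FROM {table1} t1
JOIN {table2} t2 ON t1.customer_id = t2.customer_id
```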

where,

{table1}, {table2}...{tableN} are aliases for the actual tables given in the table_names column in the input CSV file.
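The alias substitution described above can be sketched in Python; the template and table names below are illustrative, not taken from the product:

```python
# Minimal sketch of how {table1}...{tableN} aliases might be resolved.
# The template and table names are hypothetical examples.
template = "SELECT * FROM {table1} t1 JOIN {table2} t2 ON t1.id = t2.id"

# table_names column from one CSV row, comma separated
table_names = "sales.customers,sales.orders"

# Build a mapping like {"table1": "sales.customers", "table2": "sales.orders"}
aliases = {f"table{i + 1}": name for i, name in enumerate(table_names.split(","))}

# Substitute the aliases into the template to get the pipeline's SQL
sql = template.format(**aliases)
print(sql)
# SELECT * FROM sales.customers t1 JOIN sales.orders t2 ON t1.id = t2.id
```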

Sample CSV Input


The CSV file must contain the following columns:

  • Domain Name
  • Pipeline Name
  • Schema Name
  • Table Name
  • Target Schema
  • Target Table
  • Target HDFS Location
  • Target Mode
  • Target Natural Keys (comma separated)
  • Target Partition Keys (comma separated)
  • Target Number of Secondary Partitions
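As an illustration, an input CSV covering the columns above might look like the following; the header names and all values are assumptions for the example, not the script's required headers:

```csv
domain_name,pipeline_name,schema_name,table_names,target_schema,target_table,target_hdfs_location,target_mode,target_natural_keys,target_partition_keys,target_num_secondary_partitions
Sales,sales_pipeline_1,sales,"customers,orders",analytics,customer_orders,/user/hive/analytics,overwrite,customer_id,region,2
```

Note that multi-valued fields such as the table names and keys are comma separated and therefore quoted so they remain a single CSV field.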

The output CSV file includes the following columns:

  • PipelineName
  • Pipeline ID (created)
  • Error Description
  • Pipeline Name Already Exists
  • Table Not Found (Table Details)
  • Input Error