Bulk Pipeline Creation Script
Infoworks DataFoundry allows running a script to create pipelines with the same structure, in bulk.
Following are the steps to run the script for bulk pipeline creation:
- Navigate to the $IW_HOME/scripts/pipeline folder.
- Run the script using the following command:
python pipeline_create.py -s <input_sql> -c <input_csv> -t <TOKEN> -o <output_csv>
where,
- <input_sql> is the path of the SQL template based on which the new pipelines will be created.
- <input_csv> is the path of the CSV file that includes the specifics of the pipelines to be created.
- <TOKEN> is the user authentication token obtained from the user settings page.
- <output_csv> is the path of the output CSV file generated once the script is run.
Sample Query
select * from {table1} UNION select * from {table2}
where,
{table1}, {table2}, ..., {tableN} are the aliases for the actual tables given in the table_names column of the input CSV file.
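The alias substitution can be sketched in Python. This is only a minimal illustration of the templating idea (the template and table names below are hypothetical sample data), not the script's actual implementation:

```python
# Sketch: map {table1}, {table2}, ... placeholders in the SQL template
# to the comma-separated values of the table_names CSV column, in order.
sql_template = "select * from {table1} UNION select * from {table2}"

# table_names value from one CSV row (hypothetical example data)
table_names = "catalog_sales,item"

# Build the alias -> table mapping: table1, table2, ...
aliases = {f"table{i + 1}": name
           for i, name in enumerate(table_names.split(","))}

sql = sql_template.format(**aliases)
print(sql)  # select * from catalog_sales UNION select * from item
```

The number of placeholders in the template must match the number of tables listed in the row, or the substitution fails.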
Sample CSV Input
domain_name,pipeline_name,source_name,table_names,target_schema,target_table,target_hdfs,target_mode,target_natural_keys,target_partition_keys,target_no_of_sec_partitions
ImportTest,test1,salesDB,"catalog_sales,item,date_dim",dev_testing,big_ticket_sales1,/iw/pipelines/dev_testing/big_ticket_sales1,OVERWRITE,i_item_id,i_category,1
ImportTest,test2,salesDB,"catalog_sales,item,date_dim",dev_testing,big_ticket_sales11,/iw/pipelines/dev_testing/big_ticket_sales11,OVERWRITE,"i_item_id,i_item_desc",,1
The CSV file must contain the following columns:
- Domain Name
- Pipeline Name
- Source Name
- Table Names (comma separated)
- Target Schema
- Target Table
- Target HDFS Location
- Target Mode
- Target Natural Keys (comma separated)
- Target Partition Keys (comma separated)
- Target Number of Secondary Partitions
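Reading a row of this CSV can be sketched in Python; the field names follow the sample header shown above, and the quoted, comma-separated fields (such as table_names) split into lists. This is an illustration only, not the script's internal parsing:

```python
import csv
import io

# Sample input CSV, trimmed to one row from the example above.
sample = """\
domain_name,pipeline_name,source_name,table_names,target_schema,target_table,target_hdfs,target_mode,target_natural_keys,target_partition_keys,target_no_of_sec_partitions
ImportTest,test1,salesDB,"catalog_sales,item,date_dim",dev_testing,big_ticket_sales1,/iw/pipelines/dev_testing/big_ticket_sales1,OVERWRITE,i_item_id,i_category,1
"""

for row in csv.DictReader(io.StringIO(sample)):
    # The quoted table_names field becomes a list of tables for the template.
    tables = row["table_names"].split(",")
    keys = row["target_natural_keys"].split(",")
    print(row["pipeline_name"], tables, keys)
```

Note that fields containing commas (table_names, target_natural_keys, target_partition_keys) must be quoted, as in the sample input.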
The output CSV file includes the following columns:
- PipelineName
- Pipeline ID (created)
- Error Description
- Pipeline Name Already Exists
- Table Not Found (Table Details)
- Input Error
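The output CSV can be scanned to separate successful pipelines from failures. The sketch below assumes a trimmed, hypothetical output file whose column names follow the list above, with an empty Error Description indicating success; the pipeline ID shown is invented sample data:

```python
import csv
import io

# Hypothetical output CSV, trimmed to three of the columns listed above.
output = """\
PipelineName,Pipeline ID (created),Error Description
test1,5f1c2a,
test2,,Pipeline Name Already Exists
"""

# Collect pipelines whose rows carry a non-empty error description.
failed = [r["PipelineName"] for r in csv.DictReader(io.StringIO(output))
          if r["Error Description"]]
print(failed)  # ['test2']
```

A run can then be treated as clean only when this list is empty.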