How to Set HDFS Path for Pipeline
Issue
This article describes how to set the HDFS path for external tables while building pipelines.
An error message related to the target HDFS path used in pipeline targets might be displayed. This happens because Infoworks might not allow the user to set certain paths due to conflicts.
The details of the possible issues are as follows.
The user must provide the target HDFS path in the following cases:
- Providing the path for a DF pipeline
- Resolving path issues while building pipelines
Consider a pipeline, pipelines_1_customer, created with the target table customer and the following settings:
- Schema Name: Customer_details
- HDFS path: /storage/data/analytics/customer
Consider creating another pipeline, pipelines_2_customer, with the target table customer_analysis and the following settings:
- Schema Name: Customer_details
- HDFS path: /storage/data/analytics/customer
After the pipeline is saved and run, the following error message might be displayed:
The HDFS path you provided is conflicting with an existing entity (path: //storage/data/analytics/customer target: “/storage/data/analytics//customer” in pipeline pipelines_2_customer” [domain: “Customer_details”]). Please choose a different HDFS path).
Cause
The reason is as follows:
The Hive metastore stores the path for each Hive table, and the path /storage/data/analytics/customer is already used by the target table customer in pipelines_1_customer. When pipelines_2_customer is built, it uses the same path, /storage/data/analytics/customer. The Hive metastore does not support this, so an issue occurs in the backend when the pipeline is run.
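To see which path the metastore has recorded for an existing table, the table can be described from the command line. The following is a minimal sketch, not an Infoworks feature: it assumes the hive CLI is available on the node, and hive_table_location is a hypothetical helper name introduced here for illustration.

```shell
# Hypothetical helper: print the storage location that the Hive metastore
# records for a table. Assumes the `hive` CLI is on PATH; on clusters that
# only provide beeline, the same DESCRIBE FORMATTED statement can be run there.
hive_table_location() {
    local schema="$1" table="$2"
    # DESCRIBE FORMATTED prints a "Location:" row followed by the table's path.
    hive -e "DESCRIBE FORMATTED ${schema}.${table};" 2>/dev/null \
        | awk '$1 == "Location:" { print $2 }'
}

# Example (schema and table names taken from the scenario above):
# hive_table_location Customer_details customer
```

If the printed location matches the HDFS path planned for a new target table, that path is already in use.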
Solution
Ensure that you provide a new target HDFS path instead of the existing path.
In the example above, a path such as /storage/data/analytics/customer_1 or /storage/data/analytics/customer_analysis can be used to avoid the conflict.
Also, before choosing the path, check the directories and files that already exist in the target path by using the following command:
hadoop fs -ls /path/target_directory_for_target_table
This command lists the files present in the directory, which helps to avoid conflicts with existing names.
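The same check can be scripted so that a candidate path is tested before it is entered in the pipeline settings. The following is a minimal sketch that relies on the exit status of hadoop fs -test -e; hdfs_path_is_free is a hypothetical helper name. It checks only what exists in HDFS, so it does not detect paths that Infoworks has reserved for another pipeline but not yet written to.

```shell
# Hypothetical helper: report whether something already exists at an HDFS path.
# Returns 0 when nothing is found and 1 when the path already exists.
hdfs_path_is_free() {
    local candidate="$1"
    # `hadoop fs -test -e` exits with 0 when the path exists.
    # A non-zero exit can also mean the check itself failed (for example,
    # the cluster is unreachable), so treat "No entry found" as a hint only.
    if hadoop fs -test -e "$candidate"; then
        echo "Already exists: $candidate"
        return 1
    fi
    echo "No entry found: $candidate"
}

# Example (paths taken from the scenario above):
# hdfs_path_is_free /storage/data/analytics/customer_analysis
```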