Knowledge Base
How to Read Hive Table Data Ingested in Parquet Format through Spark Shell
Issue
Hive table data ingested in Parquet format returns no results when read through the Spark shell.
Cause
Infoworks can ingest data into Hive in Parquet format. The data is stored recursively in nested HDFS directories, so when a user tries to read it through the Spark shell, no results are displayed.
Solution
To read Hive table data stored in recursive HDFS directories through the Spark shell, set the following configurations in the df_spark-defaults.conf file in the $IW_HOME/conf directory, and then run the Spark shell command:
spark.sql.hive.convertMetastoreParquet false
spark.sql.parquet.writeLegacyFormat true
spark.mapreduce.input.fileinputformat.input.dir.recursive true
spark.hive.mapred.supports.subdirectories true
spark.mapred.input.dir.recursive true
spark.hadoop.mapreduce.input.fileinputformat.input.dir.recursive true
spark.sql.crossJoin.enabled true
Spark Shell Command
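As a convenience, the edit above can be scripted. This is a minimal sketch of a helper that appends the listed settings to a Spark defaults file; the function name and the commented install path are illustrative assumptions, not part of the product:

```shell
# Sketch: append the recursive-read settings to a Spark defaults file.
# The target path is passed as the first argument.
append_recursive_settings() {
  cat >> "$1" <<'EOF'
spark.sql.hive.convertMetastoreParquet false
spark.sql.parquet.writeLegacyFormat true
spark.mapreduce.input.fileinputformat.input.dir.recursive true
spark.hive.mapred.supports.subdirectories true
spark.mapred.input.dir.recursive true
spark.hadoop.mapreduce.input.fileinputformat.input.dir.recursive true
spark.sql.crossJoin.enabled true
EOF
}

# Example (assumed install location; adjust to your environment):
# append_recursive_settings "$IW_HOME/conf/df_spark-defaults.conf"
```

Note that `cat >>` appends, so running the helper twice duplicates the settings; check the file first if re-running.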
spark-shell --properties-file <absolute_path_for_df_spark-defaults.conf file>
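If editing the defaults file is inconvenient, standard Spark also accepts the same settings as `--conf` flags at launch. This sketch passes the properties listed above directly; whether your Infoworks version requires the file-based approach is not confirmed here:

```shell
spark-shell \
  --conf spark.sql.hive.convertMetastoreParquet=false \
  --conf spark.sql.parquet.writeLegacyFormat=true \
  --conf spark.mapreduce.input.fileinputformat.input.dir.recursive=true \
  --conf spark.hive.mapred.supports.subdirectories=true \
  --conf spark.mapred.input.dir.recursive=true \
  --conf spark.hadoop.mapreduce.input.fileinputformat.input.dir.recursive=true \
  --conf spark.sql.crossJoin.enabled=true
```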
Next to read:
How to Set HDFS Path for Pipeline
(C) 2015-2022 Infoworks.io, Inc. and Confidential