Troubleshooting Delimited File Export
Authentication and Authorization
Authentication and authorization configuration depends on the cloud storage provider and the driver being used. On EMR clusters, EMRFS and authentication are configured by default on all servers. To change permissions, see Configuring IAM Roles.
NOTES:
- If the security configurations in EMR are set as described in Configuring IAM Roles, they cannot be modified after the EMR cluster is created.
- If the role used to export data to S3 and the IAM role of the EMR cluster are different, ensure that a trust relationship is defined between the two roles. For details, see Modifying a Role.
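As an illustration of the trust relationship mentioned in the note above, the following minimal sketch builds an IAM trust policy document in Python. It assumes that the export role is the one that trusts the EMR cluster role; the ARN and the function name build_trust_policy are hypothetical placeholders, and the actual setup for your account should follow Modifying a Role.

```python
import json


def build_trust_policy(trusted_role_arn: str) -> dict:
    """Return a trust policy that lets trusted_role_arn assume the role it is attached to."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"AWS": trusted_role_arn},
                "Action": "sts:AssumeRole",
            }
        ],
    }


# Hypothetical ARN of the EMR cluster role; replace with the real one.
EMR_ROLE_ARN = "arn:aws:iam::123456789012:role/example-emr-cluster-role"

policy = build_trust_policy(EMR_ROLE_ARN)
print(json.dumps(policy, indent=2))
```

The printed JSON is the kind of document that would be attached as the trust relationship of the export role.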
Overwriting Existing Directory
If the final export path already exists (for example, when a full export is performed twice), the default behaviour is to fail the job. To overwrite the target directory instead, set the export_fs_data_overwrite configuration to true in the table advanced configuration, the source advanced configuration, or the global configuration, depending on the required scope.
WARNING: All existing data in the target directory will be deleted.
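The behaviour described in this section can be sketched as follows. This is not the product's implementation: it uses a local directory as a stand-in for the export path, and the function name prepare_export_dir and its overwrite parameter are hypothetical, with overwrite playing the role of export_fs_data_overwrite.

```python
import shutil
from pathlib import Path


def prepare_export_dir(path: Path, overwrite: bool = False) -> Path:
    """Create the export directory, failing if it already exists unless overwrite is set."""
    if path.exists():
        if not overwrite:
            # Default behaviour: an existing target fails the job.
            raise FileExistsError(f"Export path already exists: {path}")
        # Overwrite enabled: all existing data under the target is deleted.
        shutil.rmtree(path)
    path.mkdir(parents=True)
    return path
```

Calling prepare_export_dir twice on the same path raises FileExistsError on the second call, while passing overwrite=True empties the directory first, which is why the warning above applies.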
S3 HDFS Schemes
Multiple implementations of the S3 HDFS filesystem are available, and the appropriate one depends on the environment. The filesystem is selected by the scheme, which is the first part of the path, such as hdfs://, s3://, or s3a://.
EMR supports the proprietary EMRFS filesystem (s3://), which is the recommended scheme on EMR. This scheme is not available in other environments.
NOTE: An older implementation of the S3 HDFS filesystem also uses s3://. In newer versions of EMR, s3:// refers to EMRFS, while in non-EMR environments s3:// refers to this deprecated older implementation.
On non-EMR systems, it is recommended to use the s3a:// scheme. S3A is an open source implementation and is readily available in most Hadoop distributions.
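The scheme recommendations above can be expressed as a small check. This is a minimal sketch: the function names and the on_emr flag are hypothetical, and the bucket and host names are placeholders.

```python
from urllib.parse import urlparse


def path_scheme(path: str) -> str:
    """Return the scheme of a filesystem path, e.g. 's3a' for 's3a://bucket/dir'."""
    return urlparse(path).scheme


def recommended_s3_scheme(on_emr: bool) -> str:
    """EMRFS (s3://) is recommended on EMR; S3A (s3a://) elsewhere."""
    return "s3" if on_emr else "s3a"


def uses_recommended_scheme(path: str, on_emr: bool) -> bool:
    """Return True if an S3 path uses the recommended scheme for the environment."""
    scheme = path_scheme(path)
    if scheme not in ("s3", "s3a"):
        # Not an S3 path (for example hdfs://), so the recommendation does not apply.
        return True
    return scheme == recommended_s3_scheme(on_emr)


print(uses_recommended_scheme("s3://example-bucket/exports/orders", on_emr=True))
print(uses_recommended_scheme("s3://example-bucket/exports/orders", on_emr=False))
```

A check like this can flag an s3:// path on a non-EMR system, where it would resolve to the deprecated implementation.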
Spark Configuration
- For the following error message: "Inputs Invalid master URL: spark://{{SPARKMASTER}}:7077", do the following:
Add a new parameter, spark_master, with the value yarn in the advanced configuration section.
- For the following error message: "java.lang.NoSuchMethodError: org.apache.spark.network.client.TransportClient.getChannel()Lio/netty/channel/Channel;", do the following:
Set spark.master to local in the {SPARK_HOME}/conf/spark-defaults.conf file.
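For the spark-defaults.conf change above, the edit can be sketched as a small text transformation. This assumes the file uses the usual whitespace-separated "key value" lines; the helper name set_spark_property is hypothetical, and the sample contents are invented for illustration.

```python
def set_spark_property(conf_text: str, key: str, value: str) -> str:
    """Return conf_text with key set to value, replacing an existing entry or appending one."""
    updated = []
    replaced = False
    for line in conf_text.splitlines():
        stripped = line.strip()
        is_entry = bool(stripped) and not stripped.startswith("#")
        if is_entry and stripped.split(None, 1)[0] == key:
            if not replaced:
                updated.append(f"{key} {value}")
                replaced = True
            # Any further duplicate entries for the same key are dropped.
            continue
        updated.append(line)
    if not replaced:
        updated.append(f"{key} {value}")
    return "\n".join(updated) + "\n"


# Invented sample contents of spark-defaults.conf.
sample = "# Spark defaults\nspark.master spark://host:7077\nspark.executor.memory 2g\n"
print(set_spark_property(sample, "spark.master", "local"))
```

Comments and unrelated properties are left untouched; only the spark.master line changes.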
(C) 2015-2022 Infoworks.io, Inc. and Confidential