MapR-DB Export

This feature exports data from Hive to a MapR-DB JSON table.

Field Description

  • Export Type: Select Full Export or Incremental Export.
  • Target Database: Select the database type to be exported as MapR-DB JSON.

Connection Parameters

  • Target Table: Enter the name of the target table with an optional path specified. For example, /tmp/centos/table-name.
  • Table Exist: Check this option if the table already exists in MapR-DB; if it does not, a new table with the name you provide is created. By default, the first load into a target table drops and re-creates the table with the parameters specified; selecting this checkbox prevents that.

NOTE: If you select this option, ensure that the existing table includes all the required columns and properties. In this case, Infoworks does not create the table or verify its properties, and the export job might fail if the corresponding columns or properties are not found.

  • Row Delete Enabled: Select this option to enable deletion of records in incremental export jobs. Records marked as deleted in the source tables and pipeline targets are deleted from MapR-DB; a record is considered deleted when its audit column ziw_is_deleted is set to true. The export job first deletes these records from MapR-DB and then upserts the incremental data. Deletes are supported only when the ID field is of string datatype.

NOTE: A delete and an update/insert for the same natural key value within the same export job might cause inconsistency.
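The delete-then-upsert ordering described above can be sketched as follows. This is an illustrative model only, not the actual Infoworks implementation; the function name, the in-memory table shape, and the "id" key are assumptions.

```python
# Hypothetical sketch of an incremental export: rows flagged with the
# ziw_is_deleted audit column are removed first, then the remaining
# rows are upserted by their natural key.

def apply_increment(target, increment, key="id"):
    """target: dict keyed by natural key; increment: list of row dicts."""
    deletes = [r for r in increment if r.get("ziw_is_deleted")]
    upserts = [r for r in increment if not r.get("ziw_is_deleted")]
    # Deletes run first, as the export job does...
    for row in deletes:
        target.pop(row[key], None)
    # ...then the incremental data is upserted.
    for row in upserts:
        target[row[key]] = row
    return target
```

Note that because deletes run strictly before upserts, a delete and an upsert for the same key in one batch leaves the upserted row in place, which is one way the inconsistency noted above can arise.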

  • Export Columns: Select the columns that must be exported to the target table. At least one column must be selected.
  • Natural Key Columns: Enter the columns to be used as the natural key in the MapR-DB target table.

NOTE: MapR-DB supports only columns of string or binary datatype as the natural key.

  • Export Method: Select the method used to export data to the MapR-DB table: sequential or bulk.
  • Sequential export reads the data from Hive and sequentially exports it to MapR-DB.
  • Bulk export sorts the data and then exports it in bulk.

Both methods use Spark to perform the export. Performance is generally comparable, but it is recommended to try both and choose the method that works more efficiently for your data.
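As a rough sketch of the difference between the two methods (illustrative logic only, not the actual implementation; `write_to_maprdb` is a hypothetical sink callback):

```python
# Illustrative contrast between the two export methods described above.

def sequential_export(rows, write_to_maprdb):
    # Reads rows in Hive order and writes them one by one.
    for row in rows:
        write_to_maprdb(row)

def bulk_export(rows, write_to_maprdb, key="id"):
    # Sorts by the natural key first, then writes the sorted batch,
    # so the target ingests contiguous key ranges.
    for row in sorted(rows, key=lambda r: r[key]):
        write_to_maprdb(row)
```

The sort step is the extra cost bulk export pays up front; whether it pays off depends on the data, which is why trying both is recommended.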

MapR-DB Export Datatypes

The export feature supports the following datatypes in Hive:

  • TINYINT
  • SMALLINT
  • INTEGER
  • BIGINT
  • TIMESTAMP
  • DATE
  • STRING
  • BOOLEAN
  • BINARY
  • CHAR
  • VARCHAR
  • STRUCT
  • ARRAY

MapR-DB does not support the following datatypes:

  • UNION
  • MAP

Troubleshooting

IW Constants and Configurations

  • spark_master: the Spark master address. By default, the value is picked from the conf.properties file located at {IW_HOME}/conf/conf.properties.

MapR-DB export is performed via Spark. Any extra Spark configuration can be added by editing the file located at {IW_HOME}/conf/spark-export.conf.
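For reference, a spark-export.conf edit would typically set standard Spark properties such as the following. This is a hedged example using well-known Spark setting names; verify which settings your Infoworks version and deployment actually honor.

```properties
# Illustrative spark-export.conf entries (standard Spark property names).
spark.executor.memory=4g
spark.executor.cores=2
spark.driver.memory=2g
```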