DataReplicator

Overview

Infoworks Replicator is a feature for Hive data warehouse migration. It can migrate Hive data and metadata to another cluster or to the cloud, and it can keep two clusters synchronized with respect to their Hive data and metadata. It replicates data between HDFS and any cloud provider that implements the HDFS API for data access.

The Replicator can replicate data between secure and non-secure clusters. It verifies data integrity with checksum checks after transfers. Fine-grained control over replication comes from user-configurable parallelism and network bandwidth throttling, which is applied to every parallel unit that transfers data. The Replicator also preserves all attributes of the transferred files and directories.
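The ideas in the paragraph above (parallel transfer units, a bandwidth cap per unit, a checksum check after each transfer, and attribute preservation) can be illustrated with a minimal local-filesystem sketch. This is not the Replicator's implementation or API: the function names `sha256_of`, `throttled_copy`, and `replicate`, the choice of SHA-256, and the parameters `workers` and `max_bytes_per_sec` are all assumptions made for illustration only.

```python
import hashlib
import shutil
import time
from concurrent.futures import ThreadPoolExecutor

CHUNK = 64 * 1024  # bytes read per iteration (illustrative value)


def sha256_of(path):
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(CHUNK), b""):
            digest.update(block)
    return digest.hexdigest()


def throttled_copy(src, dst, max_bytes_per_sec):
    """Copy src to dst, sleeping as needed to stay under the rate cap."""
    start = time.monotonic()
    sent = 0
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        for block in iter(lambda: fin.read(CHUNK), b""):
            fout.write(block)
            sent += len(block)
            # Earliest moment at which `sent` bytes may have been written.
            earliest = start + sent / max_bytes_per_sec
            delay = earliest - time.monotonic()
            if delay > 0:
                time.sleep(delay)
    # Carry over permission bits and timestamps from the source file.
    shutil.copystat(src, dst)
    # Post-transfer integrity check: source and copy must hash the same.
    if sha256_of(src) != sha256_of(dst):
        raise IOError(f"checksum mismatch for {src}")
    return dst


def replicate(pairs, workers, max_bytes_per_sec):
    """Copy (src, dst) pairs in parallel; each worker has its own rate cap."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [
            pool.submit(throttled_copy, src, dst, max_bytes_per_sec)
            for src, dst in pairs
        ]
        # result() re-raises any checksum or I/O error from a worker.
        return [f.result() for f in futures]
```

In this sketch the cap applies to each worker separately, so the aggregate rate is at most `workers * max_bytes_per_sec`. That mirrors the description of throttling "every parallel unit", but how the Replicator itself accounts for bandwidth is not specified here.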

The Replicator can be installed and run on both the source and the destination cluster.

This chapter walks you through the following procedures involved in replication:

Creating a cluster and crawling the source cluster metadata.
Creating a domain.
Creating a workflow in the domain and configuring the replicator nodes in the workflow with the clusters created.
Executing or scheduling the workflow.
Configuring and starting the replication from one Hive Data Warehouse to another.
