Home

Awesome

[IMPORTANT] Dynamodb now provides server-side support for cross-region replication using Global Tables. Please use that instead of this client-side library. For more details about Global Tables, please see https://aws.amazon.com/dynamodb/global-tables/

DynamoDB Cross-Region Replication

The DynamoDB cross-region replication process consists of 2 distinct steps:

Requirements

Step 1 (Optional): Table copy (bootstrapping existing data)

This step is necessary if your source table contains existing data, and you would like to sync the data first. Please use the following steps to complete the table copy:

  1. (Optional) If your source table is not receiving live traffic, you may skip this step. Otherwise, if your source table is being continuously updated, you must enable DynamoDB Streams to record these live writes while table copy is ongoing. Enable DynamoDB Streams on your source table with StreamViewType set to "New and old images". For more information on how to do this, please refer to our offical DynamoDB Streams documentation.
  2. Check the read provisioned throughput (RCU) on your source table, and the write provisioned throughput (WCU) on your destination table. Ensure they are set high enough to allow table copy to complete well within 24 hours.
    • Rough calculation: table copy completion time ~= # of items in source table * ceiling(average item size / 1KB) / WCU of destination table.
  3. Start the table copy process, there are a few options:
    • Use the Import/Export option available via the official AWS DynamoDB Console, which exports data to S3 then imports it back to a different DynamoDB table. For more information, please refer to our official Import/Export documentation
    • Use a custom Java tool on awslabs that performs a parallel table scan then writes scanned items to the destination table, also available on Github.
    • Write your own tool to perform the table copy, essentially scanning items in the source table and using parallel PutItem calls to write items into the destination table.

WARNING: If your source table has live writes, make sure the table copy process completes well within 24 hours, because DynamoDB Streams records are only available for 24 hours. If your table copy process takes more than 24 hours, you can potentially end up with inconsistent data across your tables!

Step 2: Real-time updates (applying live stream records)

This step sets up a replication process that continuously consumes DynamoDB stream records from the source table and applies them to the destination table in real-time.

  1. Enable DynamoDB Streams on your source table with StreamViewType set to "New and old images". For more information on how to do this, please refer to our offical DynamoDB Streams documentation.

  2. Build the library:

    mvn install
  1. This produces the target jar in the target/ directory, to start the replication process:
    java -jar target/dynamodb-cross-region-replication-1.2.1.jar --sourceRegion <source_region> --sourceTable <source_table_name> --destinationRegion <destination_region> --destinationTable <destination_table_name>

Use the --help option to view all available arguments to the connector executable jar. The connector process accomplishes a few things:

NOTE: More information on the design and internal structure of the connector library can be found in the design doc. Please note it is your responsibility to ensure the connector process is up and running at all times - replication stops as soon as the process is killed, though upon resuming the process automatically uses the checkpoint table in DynamoDB to restore progress.

Advanced: running replication process across multiple machines

With extremely large tables or tables with high throughput, it might be necessary to split the replication process across multiple machines. In this case, simply kick off the target executable jar with the same command on each machine (i.e. one KCL worker per machine). The processes use the DynamoDB checkpoint table to coordinate and distribute work among them, as a result, it is essential that you use the same taskName for each process, or if you did not specify a taskName, a default one is computed.

Advanced: replicating multiple tables

Each instantiation of the jar executable is for a single replication path only (i.e. one source DynamoDB table to one destination DynamoDB table). To enable replication for multiple tables or create multiple replicas of the same table, a separate instantiation of the cross-region replication library is required. Some examples of replication setup:

Replication Scenario 1: One source table in us-east-1, one replica in each of us-west-2, us-west-1, and eu-west-1

Replication Scenario 2: Two source tables (table1 & table2) in us-east-1, both replicated separately to us-west-2

Can multiple cross-region replication processes run on the same machine?

How can I ensure the process is always up and running?

How can I build the library and run tests? Execute mvn clean verify -Pintegration-tests on the command line. This will download DynamoDB Local and run an integration test against the local instance with CloudWatch metrics disabled.