SB-OSC

Sendbird's online schema migration for Aurora MySQL

SB-OSC is an online schema change tool for Aurora MySQL databases, designed to dramatically improve performance on large tables by leveraging multithreading in all stages of the schema migration process.

It also provides seamless pausing and resuming of tasks to adeptly handle extended operation times of large table schema changes, along with a built-in monitoring system to dynamically control its heavy DML load based on Aurora's performance metrics.

SB-OSC is designed to overcome the limitations that existing migration tools face with large-scale tables, significantly reducing the operational overhead associated with managing large tables.

Please visit our blog for more information.

Takeaways

SB-OSC has its own unique features that differentiate it from existing schema migration tools such as pt-osc and gh-ost.

Multithreading

SB-OSC is designed to leverage multithreading in all stages of the schema migration process: bulk import (initial table copy), binlog event processing, and DML event application.

For binlog event processing, SB-OSC processes binlog files in parallel, which enables it to handle large tables with heavy write loads.
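The idea can be sketched with a thread pool that parses binlog files concurrently and merges the results back in file order. This is an illustrative sketch, not SB-OSC's actual internals; `parse_file` stands in for the real event parser.

```python
from concurrent.futures import ThreadPoolExecutor

def process_binlog_files(files, parse_file, max_workers=4):
    # Parse each binlog file in its own worker thread; `parse_file` is a
    # placeholder for the real parser, which would yield DML events per file.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        per_file_events = list(pool.map(parse_file, files))
    # pool.map preserves input order, so events can still be re-sequenced
    # by (file, position) before they are applied to the destination table.
    return [event for events in per_file_events for event in events]
```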

Resumable

SB-OSC is resumable at any stage of the schema migration process. It saves the current state of each stage to the database and Redis, allowing users to pause and resume the process at any time, as long as binlog retention is sufficient.
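A minimal sketch of such a checkpoint, assuming a simple key/value layout (the key naming below is hypothetical, and `store` would be a Redis client wrapper in production rather than the dict-like backend used here):

```python
import json

class StageCheckpoint:
    """Save and restore migration progress so a paused run can resume.

    `store` is any dict-like backend; in production this would wrap a
    Redis client. The key schema is illustrative, not SB-OSC's actual one.
    """
    def __init__(self, store, migration_id):
        self.store = store
        self.key = f"sbosc:{migration_id}:state"

    def save(self, stage, position):
        # Persist the current stage and the position within it
        # (e.g. last copied id, or last applied binlog position).
        self.store[self.key] = json.dumps({"stage": stage, "position": position})

    def load(self):
        raw = self.store.get(self.key)
        return json.loads(raw) if raw else None
```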

Operation Class

SB-OSC supports operation classes that can override the main queries used in the schema migration process. This feature allows users to customize queries for table-specific use cases such as data retention, table redesign, and more.

It also provides an operation class that supports replication across different Aurora clusters, which can be used in various scenarios such as cross-region replication, cross-account replication, clone cluster replication, etc.

Guide for operation class
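The pattern amounts to subclassing a base operation and overriding a query builder. The class and method names below are illustrative, not SB-OSC's actual API:

```python
class BaseOperation:
    """Hypothetical base operation; the real SB-OSC classes differ in detail."""
    def select_query(self, table, start_id, end_id):
        # Default chunked copy query used during bulk import.
        return (f"SELECT * FROM {table} "
                f"WHERE id BETWEEN {start_id} AND {end_id}")

class RetentionOperation(BaseOperation):
    """Example override: copy only recent rows for a data-retention migration."""
    def select_query(self, table, start_id, end_id):
        base = super().select_query(table, start_id, end_id)
        return base + " AND created_at >= NOW() - INTERVAL 90 DAY"
```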

Data Validation

SB-OSC provides strong data validation features to ensure data consistency between the source and destination tables. It validates both the bulk import and DML event application stages, and attempts to recover from any inconsistencies.
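A common shape for this kind of validation (a sketch, not SB-OSC's exact implementation) is to compare per-chunk checksums between source and destination and flag only the ranges that differ, so recovery can re-copy just those chunks:

```python
def find_mismatched_chunks(source_checksums, dest_checksums):
    # Both inputs map an id range, e.g. (start_id, end_id), to a checksum
    # computed over that range (for instance via an aggregate of row hashes).
    # Chunks missing from the destination also count as mismatches.
    return sorted(
        chunk for chunk, checksum in source_checksums.items()
        if dest_checksums.get(chunk) != checksum
    )
```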

Index Creation Strategy

SB-OSC allows users to create indexes after the bulk import stage, which can significantly reduce the time required for the initial table copy. This feature is especially useful for large tables with many indexes.
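The strategy amounts to copying rows into an index-free destination table and issuing `ALTER TABLE ... ADD INDEX` statements afterwards, so inserts during the copy do not pay the cost of maintaining secondary indexes. A sketch of generating the deferred DDL (table and index names are illustrative):

```python
def deferred_index_ddl(table, indexes):
    # `indexes` maps an index name to its column list; these statements are
    # executed only after the bulk import stage finishes.
    return [
        f"ALTER TABLE {table} ADD INDEX {name} ({', '.join(columns)})"
        for name, columns in indexes.items()
    ]
```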

Monitoring

SB-OSC has a built-in monitoring system that dynamically controls its heavy DML load based on Aurora's performance metrics. This makes SB-OSC more reliable in production environments, since it automatically reduces its DML load when production traffic increases.
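A simplified sketch of such a control loop, which shrinks the write batch when the cluster runs hot and ramps it back up when load subsides. The thresholds and batch sizes are made-up values, not SB-OSC's defaults, and `cpu_util` would come from an Aurora performance metric:

```python
def next_batch_size(current, cpu_util, high=80.0, low=50.0,
                    min_batch=500, max_batch=50_000):
    # Back off when cluster CPU utilization crosses the high-water mark,
    # and ramp back up once it falls below the low-water mark.
    if cpu_util >= high:
        current //= 2
    elif cpu_util <= low:
        current *= 2
    return max(min_batch, min(current, max_batch))
```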

Requirements

SB-OSC is designed to work with Aurora MySQL databases. It's a containerized application that can run in both Kubernetes and Docker environments.

SB-OSC requires the binlog format to be set to ROW. It is recommended to set binlog-ignore-db to sbosc to prevent SB-OSC from processing its own binlog events.
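A quick pre-flight check for the binlog format requirement might look like this (a sketch; `cursor` is any DB-API cursor connected to the cluster):

```python
def check_binlog_format(cursor):
    # Row-based replication is required; fail fast before starting a migration.
    cursor.execute("SELECT @@binlog_format")
    (fmt,) = cursor.fetchone()
    if fmt != "ROW":
        raise RuntimeError(f"binlog_format must be ROW, got {fmt}")
    return fmt
```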

Detailed requirements and setup instructions can be found in the deployment guide.

Performance

SB-OSC shows high performance on both binlog event processing and bulk import. The following are the specs of the tables used for performance testing:

| Table Alias | Avg Row Length (Bytes) | Write IOPS (IOPS/m) |
|---|---|---|
| A | 57 | 149 |
| B | 912 | 502 |
| C | 61 | 3.38 K |
| D | 647 | 17.9 K |
| E | 1042 | 24.4 K |
| F | 861 | 51 K |
| G | 1211 | 60.7 K |

Avg Row Length: avg_row_length from information_schema.TABLES
Write IOPS: Average increase of count_write from performance_schema.table_io_waits_summary_by_table per minute.

All tables were in the same Aurora MySQL v3 cluster.
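The Write IOPS figure can be reproduced from two samples of `count_write` taken some minutes apart (a sketch of the arithmetic; the sample values below are hypothetical):

```python
def write_iops_per_minute(count_write_start, count_write_end, minutes):
    # Average per-minute increase of count_write from
    # performance_schema.table_io_waits_summary_by_table.
    return (count_write_end - count_write_start) / minutes
```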

Binlog Event Processing

The following shows the read throughput of binlog event processing in bytes read per minute. By comparing read throughput to the total binlog creation rate of the cluster, we can see whether SB-OSC can keep up with DML events.

Total Binlog Creation Rate: 144 (MB/m)

| Table Alias | A | B | C | D | E | F | G |
|---|---|---|---|---|---|---|---|
| Read Throughput (MB/m) | 513 | 589 | 591 | 402 | 466 | 361 | 305 |

The results show that SB-OSC can keep up with DML events on tables with very high write loads.
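The comparison above boils down to checking that each table's read throughput exceeds the cluster-wide binlog creation rate of 144 MB/m; even the slowest table (G, at 305 MB/m) clears it with room to spare:

```python
def can_catch_up(read_throughput_mb_per_min, binlog_creation_mb_per_min):
    # SB-OSC keeps up when it reads binlog at least as fast as the
    # cluster produces it.
    return read_throughput_mb_per_min >= binlog_creation_mb_per_min

BINLOG_RATE_MB_PER_MIN = 144  # total binlog creation rate of the test cluster
READ_THROUGHPUT = {"A": 513, "B": 589, "C": 591, "D": 402,
                   "E": 466, "F": 361, "G": 305}
```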

Bulk Import

To provide general insight into bulk import performance, the test was conducted on table A with no secondary indexes and no additional traffic.

The actual performance of bulk import can vary depending on the number of secondary indexes, the number of rows, column types, production traffic, etc.

The following are the results of bulk import performance by instance size:

| Instance Type | Insert Rate (rows/s) | Network Throughput (Bytes/s) | Storage Throughput (Bytes/s) | CPU Utilization (%) |
|---|---|---|---|---|
| r6g.2xlarge | 42.3 K | 27.2 K | 457 M | 55.0 |
| r6g.4xlarge | 94.0 K | 45.9 K | 900 M | 51.9 |
| r6g.8xlarge | 158 K | 72.2 K | 1.39 G | 44.6 |

Insert rate, network throughput, and storage throughput are average values calculated from CloudWatch metrics.

Comparison with gh-ost

We've compared the total migration time of SB-OSC and gh-ost under the following conditions:

w/o traffic

| Tool | Total Migration Time | CPU Utilization (%) |
|---|---|---|
| SB-OSC | 22m | 60.6 |
| gh-ost | 1h 52m | 19.7 |

w/ traffic

Traffic was generated only to table C during the migration. (~1.0K inserts/s, ~0.33K updates/s, ~0.33K deletes/s)

| Tool | Total Migration Time | CPU Utilization (%) |
|---|---|---|
| SB-OSC | 27m | 62.7 |
| gh-ost | 1d+ | 27.4 |

For gh-ost, we interrupted the migration at 50% (~12h) since the ETA kept increasing.

Limitations