Awesome

sql-sync

Offline replication between SQLite (clients) and MySQL (master). This project is not a library, rather it is example code. Some parts of it (code in lib) could be reused in other projects. The code mainly just tests that the approach described here works. The code is built for Node.JS.

Assumes the following:

Primary keys are UUIDs (here 36-character strings);
No foreign key constraints.

It is possible to use foreign key constraints but then table updates must be reordered correctly (complicates the code a lot!).

It is possible to use natural keys instead of UUIDs but there do not always exist good natural keys. In most cases large composite keys would have to be used. It is not possible to use autoincremented keys because of key value conflicts.

Metainfo

On the client side table actions (INSERT/UPDATE/DELETE) are recorded into the metadata table sync with the following structure:

CREATE TABLE sync (
    action INTEGER NOT NULL,
    keyval CHARACTER(36) NOT NULL,
    tid INTEGER NOT NULL,
    PRIMARY KEY (keyval) ON CONFLICT REPLACE
);

In the table:

action - 0 marks insert/update, 1 marks delete;
keyval - primary key value in the row;
tid - table id (used by triggers below).

Synced tables are kept in the table sync_table:

CREATE TABLE sync_table (
    tid INTEGER NOT NULL,
    name VARCHAR(255) NOT NULL,
    keycol VARCHAR(255) NOT NULL,
    PRIMARY KEY (tid),
    UNIQUE (name)
);

In the table:

tid - table id.
name - table name.
keycol - name of primary key column in the table.

For each table in sync_table the following triggers have to be created:

CREATE TRIGGER <table>_insert
AFTER INSERT ON <table> FOR EACH ROW
BEGIN
    INSERT INTO sync (action, keyval, tid)
    VALUES (0, NEW.<keycol>, <tableid>);
END;

CREATE TRIGGER <table>_update
AFTER UPDATE ON <table> FOR EACH ROW
BEGIN
    INSERT INTO sync(action, keyval, tid)
    VALUES (1, OLD.<keycol>, <tableid>);
    INSERT INTO sync(action, keyval, tid)
    VALUES (0, NEW.<keycol>, <tableid>);
END;

CREATE TRIGGER <table>_delete
AFTER DELETE ON <table> FOR EACH ROW
BEGIN
    INSERT INTO sync(action, keyval, tid)
    VALUES (1, OLD.<keycol>, <tableid>);
END;

A special table is used for storing the last revision number (given by the server at the end of sync). This is sent with each sync request (but is not updated on each data table action on the client):

CREATE TABLE revision (
    rev UNSIGNED BIG INT NOT NULL
);

Metainfo tables on the server are similar. The main difference is in the sync table:

CREATE TABLE sync (
    action TINYINT UNSIGNED NOT NULL,
    keyval CHAR(36) NOT NULL,
    tid TINYINT UNSIGNED NOT NULL,
    rev BIGINT UNSIGNED NOT NULL,
    PRIMARY KEY (keyval)
);

The rev field is updated with each data table action. This is done with the help of the following stored procedure:

CREATE PROCEDURE sync_mark (
    in_keyval CHAR(36),
    in_tid TINYINT UNSIGNED,
    in_action VARCHAR(10)
)
BEGIN
    INSERT INTO sync (action, keyval, tid, rev)
    VALUES (
        in_action,
        in_keyval,
        in_tid,
        (SELECT rev + 1 FROM revision)
    ) ON DUPLICATE KEY UPDATE action = VALUES(action), rev = VALUES(rev);
    UPDATE revision SET rev = rev + 1;
END

The procedure is called by triggers. They have to be created for each data table:

CREATE TRIGGER <table>_insert
AFTER INSERT ON <table> FOR EACH ROW
BEGIN CALL sync_mark(NEW.<keycol>, <tableid>, 0); END

CREATE TRIGGER <table>_update
AFTER UPDATE ON <table> FOR EACH ROW
BEGIN
    CALL sync_mark(OLD.<keycol>, <tableid>, 1);
    CALL sync_mark(NEW.<keycol>, <tableid>, 0);
END

CREATE TRIGGER <table>_delete
AFTER DELETE ON <table> FOR EACH ROW
BEGIN CALL sync_mark(OLD.<keycol>, <tableid>, 1); END

General algorithm

Client finds all changes.
Client finds all deletes.
Client finds clrev, the last revision the client synced with the server.
Client sends changes, deletes and clrev to server.
Server locks meta and data tables.
Server finds current revision crrev.
Server applies client changes and deletes.
Server finds current revison again, nwrev.
Server finds all changes between clrev and crrev.
Server finds all deletes between clrev and crrev.
Server unlocks tables.
Server sends changes, deletes and nwrev back to the client.
Client stores nwrev.

Finding changes on the client (this and others have to be executed per data table):

SELECT <table>.* FROM <table>
JOIN sync ON (<table>.<keycol> = sync.keyval)
WHERE sync.action = 0 AND sync.tid = <tableid>
ORDER BY <keycol>;

Finding deletes on the client:

SELECT keyval FROM sync
WHERE action = 1 AND tid = <tableid> ORDER BY keyval;

Applying changes on the server (per row):

INSERT INTO <table> (col1, col2, ...)
VALUES (val1, val2, ...)
ON DUPLICATE KEY UPDATE col1 = VALUES(col1),
col2 = VALUES(col2), ...;

Applying deletes on the server (per deleted row):

DELETE FROM <table> WHERE <keycol> = value;

Finding changes on the server:

SELECT <table>.* FROM <table>
JOIN sync ON (<table>.<keycol> = sync.keyval
WHERE sync.action = 0 AND sync.rev > <clrev>
AND sync.rev <= <crrev>
AND sync.tid = <tableid> ORDER BY <keycol>;

Finding deletes on the server:

SELECT keyval FROM sync
WHERE action = 1 AND rev > <clrev>
AND rev <= <crrev> AND tid = <tableid>
ORDER BY keyval;

Applying changes on the client (per row, both queries are needed):

INSERT OR IGNORE INTO <table> (col1, col2, ...)
VALUES (val1, val2, ...);
UPDATE <table> SET col1 = val1, col2 = val2, ...
WHERE <keycol> = value;

Applying deletes on the server (per deleted row):

DELETE FROM <table>
WHERE <keycol> = value;

Multiuser case

When data is kept by user (data tables contain some sort of user id) then the revision table on the server side must also contain user id. Queries on the server must take it into account (add to WHERE clauses or SET user id when inserting). The procedure sync_mark has to be rewritten to update by-user rev value.

Sync data over HTTP

Both the server and the client send JSON object in the form:

{
    "<table>": {
        "deletes": [ "keyval1", ... ],
        "changes": [
            {
                "prop1": "value1",
                "prop2": 23
            }
        ]
    },
    "revision": 120
}

Running tests

Install dependencies:

npm install

And install Mocha:

npm install mocha -g

Modify MySQL connection details in Makefile and tests/helpers/server.js. Run make test. Tests will create test schema in two clients and the server and run various random operations in each. Then synchronize and check for data consistency.

License

The MIT License.

Copyright (c) 2013 Raivo Laanemets

Permission is hereby granted, free of charge, to any person
obtaining a copy of this software and associated documentation
files (the "Software"), to deal in the Software without restriction,
including without limitation the rights to use, copy, modify, merge,
publish, distribute, sublicense, and/or sell copies of the Software,
and to permit persons to whom the Software is furnished to do so,
subject to the following conditions:

The above copyright notice and this permission notice shall be included
in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS
OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
IN THE SOFTWARE.