<h1 align="center">DataToken</h1>

Overview

<div align="center"> <img src="./docs/figures/tree.png" width="95%"> </div>

This project implements DataToken SDK, a new middleware for decentralized data management and off-chain trusted computing. It is developed by Ownership Labs and supported by the LatticeX Foundation. The design philosophy is described in the grants and the paper. The SDK leverages the trust properties of blockchains to return data ownership to its owners while keeping the data computable.

Motivation

Our vision is to make data flows more transparent. To achieve this, we design a new data service specification for traceable computation and hierarchical aggregation. Data owners can declare a list of permitted trusted operators, along with related constraints, in the data service terms. Data aggregators can define trusted, distributed computing workflows over multiple data assets, organizing data from different domains into an aggregated data union. Data buyers can directly purchase aggregated datasets and verify the origin of every data item inside them.
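
To make hierarchical aggregation and origin tracing concrete, here is a minimal Python sketch. It assumes a data union can be modeled as a tree whose leaves are raw assets registered by data owners and whose inner nodes are aggregated unions; `DataAsset` and `trace_origins` are hypothetical names for illustration, not the API of the SDK's Tracer module.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class DataAsset:
    """Hypothetical node in a data-union tree (not the SDK's data model).

    Leaves are raw assets registered by data owners; inner nodes are
    aggregated unions produced by an aggregator's workflow.
    """
    asset_id: str
    owner: str
    children: List["DataAsset"] = field(default_factory=list)


def trace_origins(asset: DataAsset) -> Set[str]:
    """Return the ids of every raw (leaf) asset that an asset was built from."""
    if not asset.children:
        return {asset.asset_id}  # a raw asset is its own origin
    origins: Set[str] = set()
    for child in asset.children:
        origins |= trace_origins(child)  # recurse through nested unions
    return origins


# Two levels of aggregation: a regional union is itself aggregated again.
hospital_a = DataAsset("a1", "hospital-A")
hospital_b = DataAsset("b1", "hospital-B")
regional = DataAsset("union-1", "aggregator-1", [hospital_a, hospital_b])
clinic_c = DataAsset("c1", "clinic-C")
national = DataAsset("union-2", "aggregator-2", [regional, clinic_c])

print(sorted(trace_origins(national)))  # ['a1', 'b1', 'c1']
```

A buyer of `union-2` can thus recover every original asset, even though one of its inputs is itself an aggregated union.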

Specifically, assets are authorized for aggregated computation only when their pre-declared constraints are satisfied. This check can run automatically, without manual audits, which ultimately lets a data asset be defined once and sold multiple times. The design mirrors the structure of real-world data flows and makes the whole lifecycle of data sharing and utilization more transparent, compliant, and traceable.
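
The constraint-gated authorization above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the SDK's Verifier implementation: `ServiceTerms`, `WorkflowRequest` and `authorize` are hypothetical, and the single example constraint (`min_union_size`, a minimum number of assets aggregated together) stands in for whatever constraints the real service terms support.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Tuple


@dataclass(frozen=True)
class ServiceTerms:
    """Hypothetical terms a data owner declares once, when registering an asset."""
    permitted_operators: FrozenSet[str]  # trusted operators the owner allows
    min_union_size: int = 1              # example constraint: aggregate with at least N assets


@dataclass(frozen=True)
class WorkflowRequest:
    """Hypothetical aggregation request submitted by a data aggregator."""
    operator: str
    asset_ids: Tuple[str, ...]


def authorize(request: WorkflowRequest, registry: Dict[str, ServiceTerms]) -> bool:
    """Authorize only if every involved asset's pre-declared terms are satisfied."""
    if not request.asset_ids:
        return False
    union_size = len(set(request.asset_ids))
    for asset_id in request.asset_ids:
        terms = registry.get(asset_id)
        if terms is None:
            return False  # unregistered asset: no terms, no authorization
        if request.operator not in terms.permitted_operators:
            return False  # operator not on the owner's permitted list
        if union_size < terms.min_union_size:
            return False  # owner's aggregation constraint not met
    return True


registry = {
    "a1": ServiceTerms(frozenset({"fedavg-v1"}), min_union_size=2),
    "b1": ServiceTerms(frozenset({"fedavg-v1", "stats-v1"})),
}

# The same registered terms are checked for every request, with no manual audit.
print(authorize(WorkflowRequest("fedavg-v1", ("a1", "b1")), registry))  # True
print(authorize(WorkflowRequest("stats-v1", ("a1", "b1")), registry))   # False
print(authorize(WorkflowRequest("fedavg-v1", ("a1",)), registry))       # False
```

Because the terms are declared once and evaluated mechanically, asset `b1` can take part in any number of aggregations that respect them, which is the "defined once, sold multiple times" property.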

System Design

| Module | Description |
| --- | --- |
| dt-contracts | smart contracts for data token |
| DataToken | access control for decentralized data and runtime for computation monetization |
| Compute-to-Data | smart data grid and on-premise computing system |
| AuthComputa | data science framework for constrained, authorized, privacy-preserving ML |

SDK Guides

highlights

The repo provides several key services for data collaboration: the System, Asset, Job, Tracer, and Verifier modules. Different modules are designed for different participants.

The definition of data unions and trusted workflow service specification can be found in the AuthComputa repository.

play with it

First, deploy dt-contracts by following the Deployment Tutorial. Then set up config.ini in the DataToken directory (e.g., artifacts_path and address_file), and modify the accounts in the test files, for example using four of the private keys provided by ganache-cli.
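
Before running the tests, it can help to confirm that config.ini actually defines the two keys mentioned above. The sketch below uses Python's standard `configparser`; only the key names `artifacts_path` and `address_file` come from this README. The section name `[keeper]` and the example paths are assumptions, so check the config.ini shipped with the repo for the real layout.

```python
import configparser
from typing import List

# Hypothetical config.ini content: the section name and the paths are assumptions;
# only the key names artifacts_path and address_file come from the README.
SAMPLE_CONFIG = """
[keeper]
artifacts_path = ../dt-contracts/artifacts
address_file = ../dt-contracts/artifacts/address.json
"""

REQUIRED_KEYS = ("artifacts_path", "address_file")


def missing_keys(text: str, section: str = "keeper") -> List[str]:
    """Return the required keys that are absent or empty in the given section."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    if not parser.has_section(section):
        return list(REQUIRED_KEYS)
    return [k for k in REQUIRED_KEYS if not parser.get(section, k, fallback="").strip()]


print(missing_keys(SAMPLE_CONFIG))  # []
```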

Run the following commands:

```shell
$ git clone https://github.com/ownership-labs/DataToken
$ git clone https://github.com/ownership-labs/dt-contracts
$ cd DataToken
$ export PYTHONPATH=$PYTHONPATH:../DataToken
$ pip install -r requirements.txt --no-deps
$ python tests/test.py
```

Running the test multiple times, or with modified constraint parameters, prints the whole lifecycle of data sharing and utilization to the command line.

<div align="center"> <img src="./docs/figures/test.png" width="95%"> </div>

examples and tutorials

We provide several use cases, including cross-site data collaboration (between enterprises) and edge federated learning (between users); see the [examples](./examples). We also design a smart data grid that serves private machine learning on sensitive data assets; see Compute-to-Data. Combined with DataToken, data owners can quickly define the allowed AI services, and the data grid automatically verifies external data usage requests. Third-party scientists can start remote executions and obtain results on data they cannot see. In other words, data owners run the code on-premise and thereby monetize the computation rights of their private data.
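
The compute-to-data idea (verify the request, run the code where the data lives, return only the result) can be illustrated with a toy Python class. `DataGridNode`, `allow` and `execute` are hypothetical names and do not reflect the Compute-to-Data API; a real grid would identify services by something tamper-evident such as a code hash rather than a plain string id.

```python
from typing import Callable, Dict, List


class DataGridNode:
    """Toy stand-in for an on-premise data grid node (hypothetical API)."""

    def __init__(self, private_rows: List[float]):
        self._rows = private_rows  # stays on-premise; never returned to callers
        self._allowed: Dict[str, Callable[[List[float]], float]] = {}

    def allow(self, service_id: str, fn: Callable[[List[float]], float]) -> None:
        """The data owner whitelists a service once."""
        self._allowed[service_id] = fn

    def execute(self, service_id: str) -> float:
        """Verify a remote request, run it locally, and return only the result."""
        if service_id not in self._allowed:
            raise PermissionError(f"service {service_id!r} is not permitted by the data owner")
        return self._allowed[service_id](self._rows)


node = DataGridNode([3.0, 5.0, 10.0])
node.allow("mean-v1", lambda rows: sum(rows) / len(rows))
print(node.execute("mean-v1"))  # 6.0
```

The remote scientist sees the mean but never the rows, and any service the owner has not whitelisted is rejected before it touches the data.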