Documentation

All kinds of 📚 for Open Mined

This top-level project provides helpful links to all available documentation.

Overview

Watch the video

Glossary

Projects

components

Repositories

Workflow

The Open Mined system starts with a Data Scientist creating a request to Sonar in order to begin a mining campaign. The Data Scientist defines some basic parameters:

Bounty
The amount of money the Data Scientist is willing to spend in order to train their model.

Input
A description of the data being requested from the mines (e.g. tweets of a user), including the schema.

Output
A description of the desired output data given the input (e.g. recommended hashtags for a tweet), including the schema.

Starting Accuracy
A claim of the initial accuracy of the default model, evaluated on a validation dataset owned by the Data Scientist.

Target Accuracy
(Default: 0% error)
The desired accuracy the model should attain once it has been trained by mines, evaluated on the same validation dataset owned by the Data Scientist.

Algorithm (optional)
Which of the algorithms in Syft the Data Scientist would like to use, optionally including an initial weight position (in case the Data Scientist wants to start the campaign with a model from a different campaign).
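The parameters above could be grouped into a single request object. The following is only a minimal sketch: the class name `CampaignRequest`, its field names, and the validation rules are assumptions made for illustration and are not part of Sonar or Syft.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CampaignRequest:
    """Hypothetical container for the parameters a Data Scientist defines."""

    bounty: float                 # amount the Data Scientist is willing to spend
    input_schema: dict            # description and schema of the requested data
    output_schema: dict           # description and schema of the desired output
    starting_accuracy: float      # claimed accuracy of the default model, in [0, 1]
    target_accuracy: float = 1.0  # default of 0% error, i.e. 100% accuracy
    algorithm: Optional[str] = None          # optional name of a Syft algorithm
    initial_weights: Optional[list] = None   # optional weights from another campaign

    def __post_init__(self):
        # Assumed sanity checks; the source does not specify any validation.
        if self.bounty <= 0:
            raise ValueError("bounty must be positive")
        if not 0.0 <= self.starting_accuracy <= 1.0:
            raise ValueError("starting_accuracy must be between 0 and 1")
        if not 0.0 < self.target_accuracy <= 1.0:
            raise ValueError("target_accuracy must be in (0, 1]")
        if self.starting_accuracy > self.target_accuracy:
            raise ValueError("starting_accuracy cannot exceed target_accuracy")


# Example request using the tweet/hashtag illustration from the parameter list.
request = CampaignRequest(
    bounty=1000.0,
    input_schema={"tweet": "string"},
    output_schema={"hashtags": "list of strings"},
    starting_accuracy=0.0,
)
```

Leaving `target_accuracy` at its default mirrors the "0% error" default described above.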

A neural network is generated from this campaign request and placed on the Sonar blockchain. Sonar then communicates with Capsule, which generates a public key and a private key (for Homomorphic Encryption) using PGP encryption. The public key is sent back to Sonar and stored with the neural network; the private key is kept secret in Capsule. At this point, Mines are able to pull down the Model (the neural network) and attempt to train it. When a mine finishes training, it uploads the computed gradient back to the Sonar blockchain, where it can be determined how much that gradient affected the accuracy of the Model.

For instance, if the accuracy of the Model before any training takes place is 0% (meaning it has never been trained before) and the first mine to attempt the training increases the accuracy to 2%, then that mine is responsible for a 2% increase in the accuracy of the Model. With the default target accuracy of 100% (0% error), it will receive 2% of the bounty.

After all mining has been completed and the Model has been sufficiently trained by numerous mines, the encrypted Model is sent to Capsule to be decrypted using the private key and is then delivered to the Data Scientist. At that point, payouts are issued to all the responsible mines. For example, if the bounty was set at $1,000 with a target accuracy of 80%, then a 2% increase in accuracy is a 2.5% relative accuracy increase (2 / 80), and that mine’s payout would be $25. The formula is: (accuracy increase attributed to the mine / target accuracy) x total bounty.
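The payout formula can be checked with a few lines of Python. This is a sketch of the arithmetic only: the function name `mine_payout` and its parameter names are hypothetical, and accuracies are given in percentage points to match the examples above.

```python
def mine_payout(accuracy_increase, target_accuracy, bounty):
    """Payout for one mine: (accuracy increase / target accuracy) x bounty.

    accuracy_increase and target_accuracy are in percentage points
    (e.g. 2 and 80); bounty is in the campaign's currency.
    """
    if target_accuracy <= 0:
        raise ValueError("target_accuracy must be positive")
    if accuracy_increase < 0:
        raise ValueError("accuracy_increase cannot be negative")
    # Multiply before dividing so the worked examples stay exact in floating point.
    return accuracy_increase * bounty / target_accuracy


# $1,000 bounty, 80% target accuracy, 2% increase from one mine.
print(mine_payout(2, 80, 1000))   # 25.0

# Same increase with the default target accuracy of 100% (0% error).
print(mine_payout(2, 100, 1000))  # 20.0
```

The second call reproduces the earlier case in which a 2% increase earns 2% of the bounty.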

Incentives

Data Scientists are incentivized to offer high bounties because a higher bounty means that the competing mines reap a larger payout. Mines will see such a campaign as advantageous, so its Model will be trained more quickly. Low bounties would be seen as a “waste of resources” by mines, so those campaigns would take much longer to train.

It’s also advantageous for a mine to begin training a Model as soon as possible, since the potential accuracy gain from a gradient is higher at the beginning of training.

Sonar is not aware of the mines before training; only the mines are aware of Sonar. The mines are also not aware of each other’s resources, yet they are in direct competition. This means that the larger and more varied the data a mine contains, the greater its chance of making an impact on a neural network. It also means that miners (average people) are incentivized to upload as much personal data as possible into the mine. The earliest creators of mines will also reap the largest benefits, as is commonly the case with distributed blockchains.

Sequential view

workflow

Website

The official Open Mined website is available at openmined.org.

Meetings

All public meetings are posted to the meetings folder.

License

CC BY-SA 4.0