
DAWNBench Submission Instructions

Thank you for your interest in DAWNBench!

To add your model to our leaderboard, open a Pull Request with title <Model name> || <Task name> || <Author name> (example PR), with JSON (and TSV where applicable) result files in the format outlined below.

Tasks

CIFAR10 Training

Task Description

We evaluate image classification performance on the CIFAR10 dataset.

For training, we have two metrics:

Including cost is optional and will only be calculated if the costPerHour field is included in the JSON file. Submissions that only aim for time aren't restricted to public cloud infrastructure.
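As a rough illustration of the optional cost metric, the sketch below returns a cost only when costPerHour is present. The function name `training_cost` is hypothetical, and the formula (hourly price multiplied by hours to reach the target) is an assumption, not a statement of how the leaderboard computes cost.

```python
import json
from typing import Optional


def training_cost(result_json: str, hours_to_target: float) -> Optional[float]:
    """Estimate training cost, or return None for a time-only submission."""
    result = json.loads(result_json)
    cost_per_hour = result.get("costPerHour")
    if cost_per_hour is None:
        # No costPerHour field: only the time metric applies.
        return None
    # Assumption: cost = hourly price x hours until the target was reached.
    return cost_per_hour * hours_to_target
```

For example, `training_cost('{"costPerHour": 0.90}', 2.0)` gives roughly 1.8 under this assumption, while a JSON file without costPerHour gives `None`.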

JSON Format

Results for the CIFAR10 training tasks can be reported using a JSON file with the following fields,

In addition, report training progress at the end of every epoch in a TSV with the following format,

epoch\thours\ttop1Accuracy

We will compute time to reach a test set accuracy of 94% by reading off the first entry in the above TSV with a top-1 test set accuracy of at least 94%.
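The rule above can be sketched as a short scan over the TSV. This is a minimal illustration, not the leaderboard's own script; the function name `time_to_accuracy` and its parameters are hypothetical, and it assumes the file starts with the header row shown above.

```python
import csv
import io
from typing import Optional


def time_to_accuracy(tsv_text: str, column: str = "top1Accuracy",
                     threshold: float = 94.0) -> Optional[float]:
    """Return hours at the first epoch whose metric meets the threshold."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    for row in reader:
        # Rows are in epoch order, so the first match is the one that counts.
        if float(row[column]) >= threshold:
            return float(row["hours"])
    return None  # the run never reached the threshold
```

A later epoch that dips back below the threshold does not change the result, because only the first qualifying entry is read.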

JSON and TSV files are named [author name]_[model name]_[hardware tag]_[framework].[json|tsv], similar to dawn_resnet56_1k80-gc_tensorflow.[json|tsv]. Put the JSON and TSV files in the CIFAR10/train/ sub-directory.
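The naming pattern can be assembled mechanically. The helper below is a hypothetical sketch; lower-casing each part and dropping spaces is an assumption inferred from the example file name, not a stated rule.

```python
from typing import Tuple


def result_filenames(author: str, model: str, hardware_tag: str,
                     framework: str) -> Tuple[str, str]:
    """Build the JSON and TSV file names from the four name parts."""
    def norm(part: str) -> str:
        # Assumption: lower-case and remove spaces, mirroring the example.
        return part.lower().replace(" ", "")

    stem = "_".join(norm(p) for p in (author, model, hardware_tag, framework))
    return f"{stem}.json", f"{stem}.tsv"
```

Calling `result_filenames("dawn", "ResNet 56", "1k80-gc", "TensorFlow")` yields the pair of names used in the example above.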

Example JSON and TSV

JSON

{
    "version": "v1.0",
    "author": "Stanford DAWN",
    "authorEmail": "dawn-bench@cs.stanford.edu",
    "framework": "TensorFlow",
    "codeURL": "https://github.com/stanford-futuredata/dawn-benchmark/tree/master/tensorflow",
    "model": "ResNet 56",
    "hardware": "1 K80 / 30 GB / 8 CPU (Google Cloud)",
    "costPerHour": 0.90,
    "timestamp": "2017-08-14",
    "misc": {}
}

TSV

epoch   hours                   top1Accuracy
1       0.07166666666666667     33.57
2       0.1461111111111111      52.51
3       0.21805555555555556     61.71
4       0.2902777777777778      69.46
5       0.3622222222222222      71.47
6       0.43416666666666665     69.64
7       0.5061111111111111      75.81

CIFAR10 Inference

Task Description

We evaluate image classification performance on the CIFAR10 dataset.

For inference, we have two metrics:

JSON Format

Results for the CIFAR10 inference tasks can be reported using a JSON file with the following fields,

Note that only one of the latency and cost fields outlined above is required. However, we encourage you to specify both (if available) in a single JSON result file.
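Before opening a Pull Request, a submitter could check this requirement locally. The sketch below is a hypothetical pre-submission check, not part of the DAWNBench tooling; the function name and the messages it returns are assumptions.

```python
import json
from typing import List


def check_inference_result(result_json: str) -> List[str]:
    """Return a list of problems found; an empty list means the check passed."""
    result = json.loads(result_json)
    problems = []
    # At least one of the two metric fields has to be present.
    if "latency" not in result and "cost" not in result:
        problems.append("specify at least one of 'latency' or 'cost'")
    # Any metric field that is present should be numeric.
    for key in ("latency", "cost"):
        if key in result and not isinstance(result[key], (int, float)):
            problems.append(f"'{key}' should be a number")
    return problems
```

A file with only `"latency": 43.45` passes this check, while a file with neither field is reported.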

JSON files are named [author name]_[model name]_[hardware tag]_[framework].json, similar to dawn_resnet56_1k80-gc_tensorflow.json. Put the JSON file in the CIFAR10/inference/ sub-directory.

Example JSON

{
    "version": "v1.0",
    "author": "Stanford DAWN",
    "authorEmail": "dawn-bench@cs.stanford.edu",
    "framework": "TensorFlow",
    "codeURL": "https://github.com/stanford-futuredata/dawn-benchmark/tree/master/tensorflow",
    "model": "ResNet 56",
    "hardware": "1 K80 / 30 GB / 8 CPU (Google Cloud)",
    "latency": 43.45,
    "cost": 1e-6,
    "accuracy": 94.45,
    "timestamp": "2017-08-14",
    "misc": {}
}

ImageNet Training

Task Description

We evaluate image classification performance on the ImageNet dataset.

For training, we have two metrics:

Including cost is optional and will only be calculated if the costPerHour field is included in the JSON file. Submissions that only aim for time aren't restricted to public cloud infrastructure.

JSON Format

Results for the ImageNet training tasks can be reported using a JSON file with the following fields,

In addition, report training progress at the end of every epoch in a TSV with the following format,

epoch\thours\ttop1Accuracy\ttop5Accuracy

We will compute time to reach a top-5 validation accuracy of 93% by reading off the first entry in the above TSV with a top-5 validation accuracy of at least 93%.
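The per-epoch TSV described above could be produced by a small logger called from the training loop. This is a sketch only: the class name `EpochLogger` is hypothetical, and measuring hours as wall-clock time since the logger was created is an assumption about what the hours column should contain.

```python
import time


class EpochLogger:
    """Append one tab-separated row per epoch to a TSV file."""

    def __init__(self, path, columns=("top1Accuracy", "top5Accuracy")):
        self.path = path
        self.columns = tuple(columns)
        self.start = time.monotonic()  # assumption: the clock starts here
        with open(path, "w") as f:
            f.write("\t".join(("epoch", "hours") + self.columns) + "\n")

    def log(self, epoch, **metrics):
        # Elapsed wall-clock time in hours since the logger was created.
        hours = (time.monotonic() - self.start) / 3600.0
        values = [str(metrics[c]) for c in self.columns]
        with open(self.path, "a") as f:
            f.write("\t".join([str(epoch), str(hours)] + values) + "\n")
```

In a training loop this would be used as `logger.log(epoch, top1Accuracy=..., top5Accuracy=...)` after each evaluation pass.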

JSON and TSV files are named [author name]_[model name]_[hardware tag]_[framework].[json|tsv], similar to dawn_resnet56_1k80-gc_tensorflow.[json|tsv]. Put the JSON and TSV files in the ImageNet/train/ sub-directory.

Example JSON and TSV

JSON

{
    "version": "v1.0",
    "author": "Stanford DAWN",
    "authorEmail": "dawn-bench@cs.stanford.edu",
    "framework": "TensorFlow",
    "codeURL": "https://github.com/stanford-futuredata/dawn-benchmark/tree/master/tensorflow",
    "model": "ResNet 50",
    "hardware": "1 K80 / 30 GB / 8 CPU (Google Cloud)",
    "costPerHour": 0.90,
    "timestamp": "2017-08-14",
    "misc": {}
}

TSV

epoch   hours                   top1Accuracy   top5Accuracy
1       0.07166666666666667     33.57          68.93
2       0.1461111111111111      52.51          72.48
3       0.21805555555555556     61.71          81.46
4       0.2902777777777778      69.46          81.92
5       0.3622222222222222      71.47          82.17
6       0.43416666666666665     69.64          83.68
7       0.5061111111111111      75.81          84.31

ImageNet Inference

Task Description

We evaluate image classification performance on the ImageNet dataset.

For inference, we have two metrics:

JSON Format

Results for the ImageNet inference tasks can be reported using a JSON file with the following fields,

Note that only one of the latency and cost fields outlined above is required. However, we encourage you to specify both (if available) in a single JSON result file.

JSON files are named [author name]_[model name]_[hardware tag]_[framework].json, similar to dawn_resnet56_1k80-gc_tensorflow.json. Put the JSON file in the ImageNet/inference/ sub-directory.

Example JSON

{
    "version": "v1.0",
    "author": "Stanford DAWN",
    "authorEmail": "dawn-bench@cs.stanford.edu",
    "framework": "TensorFlow",
    "codeURL": "https://github.com/stanford-futuredata/dawn-benchmark/tree/master/tensorflow",
    "model": "ResNet 50",
    "hardware": "1 K80 / 30 GB / 8 CPU (Google Cloud)",
    "latency": 43.45,
    "cost": 4.27e-6,
    "top5Accuracy": 93.45,
    "timestamp": "2017-08-14",
    "misc": {}
}

SQuAD Training

Task Description

We evaluate question answering performance on the SQuAD dataset.

For training, we have two metrics:

Including cost is optional and will only be calculated if the costPerHour field is included in the JSON file. Submissions that only aim for time aren't restricted to public cloud infrastructure.

JSON Format

Results for the SQuAD training tasks can be reported using a JSON file with the following fields,

In addition, report training progress at the end of every epoch in a TSV with the following format,

epoch\thours\tf1Score

We will compute time to reach an F1 score of 0.73 by reading off the first entry in the above TSV with an F1 score of at least 0.73.

JSON and TSV files are named [author name]_[model name]_[hardware tag]_[framework].[json|tsv], similar to dawn_bidaf_1k80-gc_tensorflow.[json|tsv]. Put the JSON and TSV files in the SQuAD/train/ sub-directory.

Example JSON and TSV

JSON

{
    "version": "v1.0",
    "author": "Stanford DAWN",
    "authorEmail": "dawn-bench@cs.stanford.edu",
    "framework": "TensorFlow",
    "codeURL": "https://github.com/stanford-futuredata/dawn-benchmark/tree/master/tensorflow_qa/bi-att-flow",
    "model": "BiDAF",
    "hardware": "1 K80 / 30 GB / 8 CPU (Google Cloud)",
    "costPerHour": 0.90,
    "timestamp": "2017-08-14",
    "misc": {}
}

TSV

epoch   hours                   f1Score
1       0.7638888888888888      0.5369029640999999
2       1.5238381055555557      0.6606892943
3       2.2855751               0.700419426
4       3.0448481305555557      0.7229908705
5       3.806446388888889       0.731013
6       4.5750864               0.7370445132
7       5.346703258333334       0.7413719296

SQuAD Inference

Task Description

We evaluate question answering performance on the SQuAD dataset.

For inference, we have two metrics:

JSON Format

Results for the SQuAD inference tasks can be reported using a JSON file with the following fields,

Note that only one of the latency and cost fields outlined above is required. However, we encourage you to specify both (if available) in a single JSON result file.

JSON files are named [author name]_[model name]_[hardware tag]_[framework].json, similar to dawn_bidaf_1k80-gc_tensorflow.json. Put the JSON file in the SQuAD/inference/ sub-directory.

Example JSON

{
    "version": "v1.0",
    "author": "Stanford DAWN",
    "authorEmail": "dawn-bench@cs.stanford.edu",
    "framework": "TensorFlow",
    "codeURL": "https://github.com/stanford-futuredata/dawn-benchmark/tree/master/tensorflow_qa/bi-att-flow",
    "model": "BiDAF",
    "hardware": "1 K80 / 30 GB / 8 CPU (Google Cloud)",
    "latency": 590.0,
    "cost": 2e-6,
    "f1Score": 0.7524165510999999,
    "timestamp": "2017-08-14",
    "misc": {}
}

FAQ

Disclosure: The Stanford DAWN research project is a five-year industrial affiliates program at Stanford University and is financially supported in part by founding members including Intel, Microsoft, NEC, Teradata, VMware, and Google. For more information, including information regarding Stanford’s policies on openness in research and policies affecting industrial affiliates program membership, please see DAWN's membership page.