awesome-dl-hw-resources

A curated list of awesome hardware/chip design resources for deep learning

Step 1: Get Inspired

Simon Knowles's talk on developing intelligent machines https://www.youtube.com/watch?v=tyW9x5ROl2E

Future of AI Hardware panel discussion https://vimeo.com/238818665

Existing lists:

https://amundtveit.com/2017/07/12/deep-learning-for-embedded-systems/

https://github.com/Piyush3dB/awesome-deep-computation

Energy Estimation

From Vivienne Sze's Lab https://energyestimation.mit.edu/

Machine Learning for chip design

  1. Self-Optimizing Memory Controllers: A Reinforcement Learning Approach https://people.inf.ethz.ch/omutlu/pub/rlmc_isca08.pdf

Chip Design for Machine Learning

Surveys:

Efficient Processing of Deep Neural Networks: A Tutorial and Survey https://arxiv.org/abs/1703.09039

Recent advances in efficient computation of deep convolutional neural networks https://link.springer.com/content/pdf/10.1631%2FFITEE.1700789.pdf

Dissertation: EFFICIENT METHODS AND HARDWARE FOR DEEP LEARNING https://stacks.stanford.edu/file/druid:qf934gh3708/EFFICIENT%20METHODS%20AND%20HARDWARE%20FOR%20DEEP%20LEARNING-augmented.pdf

Neural-inspired & neuromorphic computing http://www.sciencedirect.com/science/article/pii/S2212683X16300561

16 Views of Hot Chips ‘17 http://www.eetimes.com/document.asp?doc_id=1332192

Papers

Google TPU1: https://arxiv.org/abs/1704.04760

Optimizing for Fisher's bound by bringing in HPC concepts on a chip https://arxiv.org/pdf/1705.05983.pdf

An Architecture to Accelerate Convolution in Deep Neural Networks https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=8070363

Implementation

CNN Hardware Accelerator (report) http://cs231n.stanford.edu/reports/2017/pdfs/116.pdf (code) https://github.com/kkiningh/cs231n-project

Lectures

Systolic Arrays https://www.youtube.com/watch?v=m_-zjdX7Lmw&t=2668s

Talks

Graphcore:

  1. https://www.youtube.com/watch?v=Gh-Tff7DdzU
  2. https://www.youtube.com/watch?v=cSXbhEsUUGo

DeePhi:

  1. Efficient Methods and Hardware for Deep Learning https://www.youtube.com/watch?v=eZdOkDtYMoo&index=69

Google TPU:

  1. Dave Patterson's Berkeley Talk https://www.youtube.com/watch?v=fhHAArxwzvQ
  2. Jeff Dean's Systems & Machine Learning Talk https://www.youtube.com/watch?v=PWv4ROEvqmk

Nvidia:

  1. High-Performance Hardware for Machine Learning https://www.youtube.com/watch?v=6oofOSxwUvA
  2. Bill Dally's Talk https://www.youtube.com/watch?v=h3QKvUPg_AI

Companies

Graphcore:

  1. Preliminary IPU benchmarks https://www.graphcore.ai/posts/preliminary-ipu-benchmarks-providing-previously-unseen-performance-for-a-range-of-machine-learning-applications

Nvidia:

  1. Volta https://devblogs.nvidia.com/parallelforall/inside-volta/

Wave Computing:

  1. https://www.nextplatform.com/2017/08/23/first-depth-view-wave-computings-dpu-architecture-systems/

Microsoft:

  1. Brainwave Slides at https://www.microsoft.com/en-us/research/blog/microsoft-unveils-project-brainwave/?utm_source=t.co&utm_medium=referral
  2. At the edge https://www.youtube.com/watch?v=5ZDYWFXrhl8&t=176s

ThinCI:

  1. Graph Processor http://www.eetimes.com/document.asp?doc_id=1332176
  2. Q&A http://www.eetimes.com/document.asp?doc_id=1332159

Baidu:

  1. XPU https://www.nextplatform.com/2017/08/22/first-look-baidus-custom-ai-analytics-processor/
  2. Mixed Precision Training https://arxiv.org/abs/1710.03740

Huawei:

  1. NPU: http://newatlas.com/huawei-kirin-970-ai-chip/51186/

DL on Embedded Devices

Anirudh Koul's presentation: https://www.slideshare.net/anirudhkoul/squeezing-deep-learning-into-mobile-phones

Pete Warden's blog https://petewarden.com/2017/06/22/what-ive-learned-about-neural-network-quantization/

Pete Warden's book: http://www.oreilly.com/data/free/building-mobile-applications-with-tensorflow.csp

Discussion of bandwidth and compute restrictions on embedded devices https://www.youtube.com/watch?v=FATXK4yyaD0

Stick it

Movidius Neural Compute Stick http://uk.rs-online.com/web/p/processor-microcontroller-development-kits/1393655/

Quantization:

https://www.tensorflow.org/performance/quantization

Low Precision Math:

General Matrix Multiplication in Low Precision https://github.com/google/gemmlowp

Arm's Math Library http://arm-software.github.io/CMSIS_5/DSP/html/index.html

8-bit compression https://arxiv.org/abs/1511.04561

Self Driving Car Compute

Talk by Daniel Rosenband on the compute requirements of Google's (Waymo) self-driving cars https://www.youtube.com/watch?v=V_KLfSClcHg

Voyage's CEO Oliver Cameron's write-up on compute requirements: https://news.voyage.auto/under-the-hood-of-a-self-driving-car-78e8bbce62a6

Udacity Carla's internals https://medium.com/udacity/how-the-udacity-self-driving-car-works-575365270a40

DL Hardware Choice

Which GPU for deep learning? http://timdettmers.com/2017/04/09/which-gpu-for-deep-learning/