SPIN
This repository contains the official implementation for the ECCV'22 paper, "SPIN: An Empirical Evaluation on Sharing Parameters of Isotropic Networks".
Code Overview
We provide an implementation of the weight-sharing version of the ConvMixer model. The main code for the implementation is in the models directory. The model can be configured through the files in configs. We provide three example configs:
- configs/ConvMixer.yaml for the vanilla ConvMixer model.
- configs/WS-ConvMixer.yaml for the Weight-shared ConvMixer (WS-ConvMixer) model.
- configs/WFWS-ConvMixer.yaml for the Weight-fusion Weight-shared ConvMixer (WFWS-ConvMixer) model.
Note that in order to run configs/WFWS-ConvMixer.yaml, you must have a corresponding pretrained ConvMixer model. Please refer to our paper for details on each technique.
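To make the idea of weight sharing concrete, here is a minimal sketch of how layers could be assigned to shared weight sets under a sequential mapping with a uniform share distribution (the settings listed in the results table further down). The function name `sequential_sharing_map` and its parameters are hypothetical and are not taken from the models directory. Reading "sequential" as "consecutive layers reuse the same weights" is an assumption, so please check the paper for the exact definition.

```python
from typing import List


def sequential_sharing_map(depth: int, sharing_rate: int) -> List[int]:
    """Map each layer index to the index of the weight set it uses.

    Assumption: with a uniform distribution, every weight set is reused
    by `sharing_rate` consecutive layers.
    """
    if sharing_rate < 1:
        raise ValueError("sharing_rate must be at least 1")
    if depth % sharing_rate != 0:
        raise ValueError("depth must be divisible by sharing_rate")
    # Integer division groups consecutive layers onto the same weight set.
    return [layer // sharing_rate for layer in range(depth)]


mapping = sequential_sharing_map(depth=32, sharing_rate=2)
num_unique = len(set(mapping))

print(mapping[:6])  # [0, 0, 1, 1, 2, 2]
print(num_unique)   # 16
```

Under these assumptions, a depth-32 model with a sharing rate of 2 stores 16 distinct weight sets instead of 32.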
Installation
First, clone this repo with
git clone https://github.com/apple/ml-spin.git
The implementation of SPIN reuses the infrastructure of Meta Research's open-source project SlowFast. Our modifications to the SlowFast code are stored in spin-slowfast.patch. To download the SlowFast code and apply our changes, run
bash setup.sh
After getting the codebase ready, follow the installation instructions in the SlowFast repo to set up your environment and install the other dependencies.
Training
After the environment is set up, you can run the following example training script to train a weight-sharing ConvMixer model. The script assumes you have a machine with 4 GPUs.
bash run.sh
Pre-trained ConvMixer Models on ImageNet1K
We provide our pretrained ConvMixer, WS-ConvMixer, and WFWS-ConvMixer models in the following table. For the WFWS-ConvMixer, we first initialized the model using the proposed weight fusion technique with the mean strategy, and then ran models/fuse_weights.py to export the fused model after training. To re-run the model, please use the WS-ConvMixer configuration. Please note that we did some light hyperparameter tuning, so the accuracy is slightly higher than the numbers reported in the paper.
C/D/P/K | Weight Sharing? | Weight Fusion? | Sharing Rate | Share Distribution | Sharing Mapping | Accuracy | Model Size |
---|---|---|---|---|---|---|---|
768/32/14/3 | No | No | - | - | - | 76.32% | 79MB |
768/32/14/3 | Yes | No | 2 | Uniform | Sequential | 74.27% | 43MB |
768/32/14/3 | Yes | Mean | 2 | Uniform | Sequential | 75.21% | 43MB |
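As a rough illustration of the mean strategy for weight fusion, the sketch below averages the weights of each group of pretrained layers that will share one weight set. It is not the code in models/fuse_weights.py: the function name `fuse_mean` is hypothetical, flat lists of floats stand in for real weight tensors, and grouping consecutive layers is an assumption that matches a sequential mapping.

```python
from typing import List


def fuse_mean(layer_weights: List[List[float]], sharing_rate: int) -> List[List[float]]:
    """Average the weights of each group of `sharing_rate` consecutive layers.

    Each inner list is a flattened stand-in for one layer's weights.
    """
    if sharing_rate < 1:
        raise ValueError("sharing_rate must be at least 1")
    if len(layer_weights) % sharing_rate != 0:
        raise ValueError("number of layers must be divisible by sharing_rate")
    fused = []
    for start in range(0, len(layer_weights), sharing_rate):
        group = layer_weights[start:start + sharing_rate]
        # zip(*group) pairs up the same weight position across the group.
        fused.append([sum(values) / len(group) for values in zip(*group)])
    return fused


# Four toy "pretrained" layers fused into two shared weight sets.
pretrained = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
fused = fuse_mean(pretrained, sharing_rate=2)
print(fused)  # [[2.0, 3.0], [6.0, 7.0]]
```

In this sketch the fused weights only serve as an initialization for the weight-shared model, which is then trained as usual.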
Citation
If you find our code or paper helpful, please consider citing:
@inproceedings{spin_eccv22,
author = {Lin, Chien-Yu and Prabhu, Anish and Merth, Thomas and Mehta, Sachin and Ranjan, Anurag and Horton, Maxwell and Rastegari, Mohammad},
title = {SPIN: An Empirical Evaluation on Sharing Parameters of Isotropic Networks},
booktitle = {Proceedings of the European Conference on Computer Vision (ECCV)},
year = {2022}
}