Meta Gradient Fusion: Training Dynamic-Depth Neural Networks Harmoniously
Important Modification:
If you want to apply Meta-GF to your own multi-task learning program, please see the guideline in QuicklyApplication_MetaGF_ForMultitask.zip.
Install:
PyTorch >= 1.9.0
Dataset:
CIFAR-10, CIFAR-100, ImageNet
Usage:
Run the scripts in the "Script" folder (select the one for the corresponding network).
After training finishes, run the scripts in "Script/TestScript" (again, select the corresponding network).
Please download the CIFAR datasets and place them under "./data/cifar":
```
data
└── cifar
    ├── cifar-100-python
    │   ├── meta
    │   ├── test
    │   └── train
    ├── cifar-100-python.tar.gz
    ├── cifar-10-batches-py
    │   ├── batches.meta
    │   ├── data_batch_1
    │   ├── data_batch_2
    │   ├── data_batch_3
    │   ├── data_batch_4
    │   ├── data_batch_5
    │   ├── readme.html
    │   └── test_batch
    └── cifar-10-python.tar.gz
```
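If the archives are not already on disk, the layout above can also be produced with torchvision. This snippet is an optional convenience, not part of the provided scripts; the `ToTensor` transform is only a placeholder:

```python
from torchvision import datasets, transforms

# Downloading with torchvision creates the cifar-10-batches-py / cifar-100-python
# folders (and the .tar.gz archives) under ./data/cifar automatically.
transform = transforms.ToTensor()
datasets.CIFAR10(root="./data/cifar", train=True, download=True, transform=transform)
datasets.CIFAR100(root="./data/cifar", train=True, download=True, transform=transform)
```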
Notes:
- We adopt channel-wise weighting for VGG because VGG has too few layers (see the sketch after this list).
- We adopt a layer-wise weighting policy for ResNet and MSDNet.
- We adopt an EMA updating policy for training the meta-weights.
- We train on ImageNet in distributed mode (see the example after this list).
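The toy sketch below illustrates what layer-wise vs. channel-wise meta-weighting and the EMA update could look like. It is a minimal sketch under our own assumptions (a two-exit toy model, softmax-normalized weights, `ema_decay = 0.99`), not the repository's actual implementation:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Toy two-exit model: one shared conv layer feeding two classification exits.
shared = nn.Conv2d(3, 16, 3, padding=1)
exits = nn.ModuleList([nn.Linear(16, 10), nn.Linear(16, 10)])

# Layer-wise policy: one learnable scalar per exit for the shared layer.
layer_w = nn.Parameter(torch.zeros(len(exits)))
# Channel-wise policy: one learnable weight per exit and output channel.
channel_w = nn.Parameter(torch.zeros(len(exits), shared.out_channels))
# EMA copy of the layer-wise meta-weights.
layer_w_ema = layer_w.detach().clone()

x = torch.randn(4, 3, 8, 8)
targets = torch.randint(0, 10, (4,))
feat = shared(x).mean(dim=(2, 3))                     # pooled shared feature
exit_losses = [F.cross_entropy(e(feat), targets) for e in exits]

# Per-exit gradients of the shared layer.
per_exit_grads = [torch.autograd.grad(l, shared.weight, retain_graph=True)[0]
                  for l in exit_losses]

# Layer-wise fusion: softmax over exits gives the fusion coefficients.
coef = torch.softmax(layer_w, dim=0)
shared.weight.grad = sum(c * g for c, g in zip(coef, per_exit_grads))

# Channel-wise fusion broadcasts one coefficient per output channel instead.
coef_c = torch.softmax(channel_w, dim=0)              # (num_exits, out_channels)
fused_c = sum(coef_c[i].view(-1, 1, 1, 1) * per_exit_grads[i]
              for i in range(len(exits)))

# EMA update of the meta-weights.
ema_decay = 0.99
layer_w_ema.mul_(ema_decay).add_(layer_w.detach(), alpha=1 - ema_decay)
```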
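For distributed ImageNet training, the standard PyTorch DistributedDataParallel setup applies. The sketch below assumes the process is launched with torchrun and uses placeholder model/dataset objects; it does not reproduce the repository's launch scripts:

```python
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.utils.data import DataLoader, TensorDataset
from torch.utils.data.distributed import DistributedSampler

# Assumes launch via torchrun, which sets RANK / WORLD_SIZE / MASTER_ADDR.
dist.init_process_group(backend="nccl")
local_rank = dist.get_rank() % torch.cuda.device_count()
torch.cuda.set_device(local_rank)

# Placeholder model and dataset; replace with the multi-exit network and ImageNet.
model = torch.nn.Linear(3 * 224 * 224, 1000).cuda(local_rank)
model = DDP(model, device_ids=[local_rank])

train_dataset = TensorDataset(torch.randn(128, 3 * 224 * 224),
                              torch.randint(0, 1000, (128,)))
sampler = DistributedSampler(train_dataset)   # shards the data across processes
loader = DataLoader(train_dataset, batch_size=64, sampler=sampler, num_workers=8)
```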
This version of the code may still contain bugs; we will continue to update it.
New results:
The accuracy of CAGrad on ImageNet: 58.37 / 64.21 / 66.88 / 68.22 / 69.42
Acknowledgements
We thank the authors of the following works for their publicly available code:
Li, H., Zhang, H., Qi, X., Yang, R., Huang, G.: Improved techniques for training adaptive deep networks. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. pp. 1891–1900 (2019)
Kaya, Y., Hong, S., Dumitras, T.: Shallow-deep networks: Understanding and mitigating network overthinking. In: International Conference on Machine Learning. pp. 3301–3310. PMLR (2019)