TF-NAS

Official PyTorch code for the paper "TF-NAS: Rethinking Three Search Freedoms of Latency-Constrained Differentiable Neural Architecture Search" (ECCV 2020).

With the flourish of differentiable neural architecture search (NAS), automatically searching latency-constrained architectures gives a new perspective to reduce human labor and expertise. However, the searched architectures are usually suboptimal in accuracy and may have large jitters around the target latency. In this paper, we rethink three freedoms of differentiable NAS, i.e. operation-level, depth-level and width-level, and propose a novel method, named Three-Freedom NAS (TF-NAS), to achieve both good classification accuracy and precise latency constraint. For the operation-level, we present a bi-sampling search algorithm to moderate the operation collapse. For the depth-level, we introduce a sink-connecting search space to ensure the mutual exclusion between skip and other candidate operations, as well as eliminate the architecture redundancy. For the width-level, we propose an elasticity-scaling strategy that achieves precise latency constraint in a progressively fine-grained manner. Experiments on ImageNet demonstrate the effectiveness of TF-NAS. Particularly, our searched TF-NAS-A obtains 76.9% top-1 accuracy, achieving state-of-the-art results with less latency. The total search time is only 1.8 days on 1 Titan RTX GPU.

[Figure: Overall framework of TF-NAS]

Requirements

Python 3.7
PyTorch >= 1.1.0
torchvision >= 0.3.0
(Optional) apex from this link

Model Zoo

Our pretrained models can be downloaded from the links below. The complete list of models can be found here.

Name	FLOPs	Top-1 (%)	Top-5 (%)	GPU Latency	CPU Latency	Pretrained Model
TF-NAS-A	457M	76.87	93.11	18.03ms	80.14ms	Google Drive
TF-NAS-B	361M	76.28	92.88	15.06ms	72.10ms	Google Drive
TF-NAS-C	284M	75.15	92.13	11.95ms	51.87ms	Google Drive
TF-NAS-D	219M	74.19	91.45	10.08ms	46.09ms	Google Drive
TF-NAS-CPU-A	305M	75.83	92.57	14.00ms	60.11ms	Google Drive
TF-NAS-CPU-B	230M	74.44	91.82	10.29ms	40.09ms	Google Drive

GPU and CPU latencies are measured on a Titan RTX 24G GPU and an Intel Xeon Gold 6130 @ 2.10GHz CPU, respectively.
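
The measurement protocol behind these numbers is not described here, so the following is only a minimal sketch of one common way to time a workload: discard a few warm-up calls, then average the wall-clock time of repeated calls. The names `measure_latency_ms`, `warmup` and `runs` are hypothetical and are not part of this repository.

```python
import statistics
import time


def measure_latency_ms(fn, warmup=10, runs=50):
    """Return the mean wall-clock time of fn() in milliseconds."""
    # Warm-up calls are discarded so one-off costs (caches, lazy
    # initialisation) do not skew the average.
    for _ in range(warmup):
        fn()
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        timings.append((time.perf_counter() - start) * 1000.0)
    return statistics.mean(timings)


if __name__ == "__main__":
    # Stand-in workload (an assumption); replace it with a model forward pass.
    latency = measure_latency_ms(lambda: sum(range(10000)))
    print(f"{latency:.3f} ms")
```

When timing a PyTorch model on a GPU, CUDA kernels run asynchronously, so `fn` should call `torch.cuda.synchronize()` after the forward pass; otherwise the clock may stop before the work has finished.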

Search

To run a search, use the following script as an example:

CUDA_VISIBLE_DEVICES=0 python -u train_search.py \
	--img_root "Your ImageNet Train Set Path" \
	--train_list "./dataset/ImageNet-100-effb0_train_cls_ratio0.8.txt" \
	--val_list "./dataset/ImageNet-100-effb0_val_cls_ratio0.8.txt" \
	--lookup_path "./latency_pkl/latency_gpu.pkl" \
	--target_lat 15.0

For GPU latency, set --lookup_path to ./latency_pkl/latency_gpu.pkl. For CPU latency, set --lookup_path to ./latency_pkl/latency_cpu.pkl.
You can search with different target latencies by changing --target_lat. Please refer to example.sh for more details.
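
To illustrate what a latency lookup table is for, here is a minimal sketch that estimates the latency of an architecture by summing per-operation entries and comparing the total with a target. It assumes the `.pkl` file is a standard Python pickle; the flat `{operation name: milliseconds}` layout, the operation names and the helper functions are all hypothetical, and the real files in `./latency_pkl/` may be structured differently.

```python
import pickle


def load_lookup(path):
    """Load a latency lookup table, assuming it is a standard pickle file."""
    with open(path, "rb") as f:
        return pickle.load(f)


def estimate_latency(lookup, op_keys):
    """Sum the per-operation latencies (ms) of the chosen operations."""
    return sum(lookup[key] for key in op_keys)


if __name__ == "__main__":
    # Hypothetical table: keys and values are made up for illustration only.
    toy_lookup = {
        "stem": 1.0,
        "mbconv_k3_e3": 0.5,
        "mbconv_k5_e6": 1.5,
        "head": 2.0,
    }
    arch = ["stem", "mbconv_k3_e3", "mbconv_k5_e6", "head"]
    target_lat = 15.0  # same role as --target_lat
    total = estimate_latency(toy_lookup, arch)
    print(f"estimated {total:.2f} ms, target {target_lat:.2f} ms")
```

Only unpickle files from sources you trust, since loading a pickle can execute arbitrary code.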

After searching, you can parse the searched architecture with:

CUDA_VISIBLE_DEVICES=3 python -u parsing_model.py \
	--model_path "Searched Model Path" \
	--save_path "./model.config" \
	--lookup_path "./latency_pkl/latency_gpu.pkl"

You will get a model config file for training and testing, as well as some model profile information.

Train

If apex is not installed, use train_eval.py.

Set --model_path to the path of the searched model. The script will parse and train the searched architecture.

CUDA_VISIBLE_DEVICES=0,1 python -u train_eval.py \
	--train_root "Your ImageNet Train Set Path" \
	--val_root "Your ImageNet Val Set Path" \
	--train_list "ImageNet Train List" \
	--val_list "ImageNet Val List" \
	--model_path "Searched Model Path"

Or set --config_path to the parsed model config file.

CUDA_VISIBLE_DEVICES=0,1 python -u train_eval.py \
	--train_root "Your ImageNet Train Set Path" \
	--val_root "Your ImageNet Val Set Path" \
	--train_list "ImageNet Train List" \
	--val_list "ImageNet Val List" \
	--config_path "./model.config"

If apex is installed, use train_eval_amp.py. We highly recommend using the mixed precision and distributed training provided by apex.

Automatic Mixed Precision

CUDA_VISIBLE_DEVICES=0,1 python -u train_eval_amp.py \
	--train_root "Your ImageNet Train Set Path" \
	--val_root "Your ImageNet Val Set Path" \
	--train_list "ImageNet Train List" \
	--val_list "ImageNet Val List" \
	--config_path "./model.config" \
	--opt_level "O1"

Automatic Mixed Precision + DistributedDataParallel

CUDA_VISIBLE_DEVICES=0,1 python -u -m torch.distributed.launch --nproc_per_node=2 train_eval_amp.py \
	--train_root "Your ImageNet Train Set Path" \
	--val_root "Your ImageNet Val Set Path" \
	--train_list "ImageNet Train List" \
	--val_list "ImageNet Val List" \
	--config_path "./model.config" \
	--opt_level "O1"

Please refer to example.sh for more details.

Test

After training, you can test the trained model with:

CUDA_VISIBLE_DEVICES=0 python -u test.py \
	--val_root "Your ImageNet Val Set Path" \
	--val_list "ImageNet Val List" \
	--model_path "./model.config" \
	--weights "Pretrained Weights"
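
For reference, the Top-1 and Top-5 metrics reported in the Model Zoo count a prediction as correct when the true label is among the k highest-scoring classes. The sketch below shows that computation in plain Python; `topk_accuracy` is a hypothetical helper and is not taken from test.py.

```python
def topk_accuracy(scores, labels, k=1):
    """Percentage of samples whose true label is among the k top scores."""
    hits = 0
    for row, label in zip(scores, labels):
        # Class indices sorted from highest to lowest score.
        ranked = sorted(range(len(row)), key=lambda i: row[i], reverse=True)
        if label in ranked[:k]:
            hits += 1
    return 100.0 * hits / len(labels)


if __name__ == "__main__":
    # Toy scores for four samples over three classes (made-up values).
    scores = [
        [0.1, 0.7, 0.2],
        [0.5, 0.3, 0.2],
        [0.2, 0.3, 0.5],
        [0.9, 0.05, 0.05],
    ]
    labels = [1, 1, 0, 0]
    print(topk_accuracy(scores, labels, k=1))
    print(topk_accuracy(scores, labels, k=2))
```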

Other

If you are interested in ImageNet training or want to try more tricks, schedulers and properties, please browse this repo.

License

TF-NAS is released under the MIT license. Please see the LICENSE file for more information.

Citation

Please consider citing our paper in your publications if the project helps your research. The BibTeX reference is as follows.

@inproceedings{Hu2020TFNAS,
  title     =  {TF-NAS: Rethinking Three Search Freedoms of Latency-Constrained Differentiable Neural Architecture Search},
  author    =  {Yibo Hu and Xiang Wu and Ran He},
  booktitle =  {Proc. Eur. Conf. Computer Vision (ECCV)},
  year      =  {2020}
}