# Recycling dormant neurons

PyTorch reimplementation of ReDo (The Dormant Neuron Phenomenon in Deep Reinforcement Learning). The paper establishes the dormant neuron phenomenon: over the course of training a network on nonstationary targets, a significant portion of the neurons in a deep network become dormant, i.e. their activations become minimal compared to the other neurons in the layer. This phenomenon is particularly prevalent in value-based deep reinforcement learning algorithms such as DQN and its variants. As a solution, the authors propose to periodically check for dormant neurons and reinitialize them.

## Dormant neurons

The score $s_i^{\ell}$ of a neuron $i$ in layer $\ell$ is its average absolute activation $\mathbb{E}_{x \in D} |h_i^{\ell}(x)|$ divided by the layer-wide average of absolute activations $\frac{1}{H^{\ell}} \sum_{k=1}^{H^{\ell}} \mathbb{E}_{x \in D}|h_k^{\ell}(x)|$:

$$s_i^{\ell}=\frac{\mathbb{E}_{x \in D}|h_i^{\ell}(x)|}{\frac{1}{H^{\ell}} \sum_{k=1}^{H^{\ell}} \mathbb{E}_{x \in D}|h_k^{\ell}(x)|}$$

A neuron is defined as $\tau$-dormant when $s_i^{\ell} \leq \tau$.
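
For concreteness, here is a minimal sketch of how this score can be computed from a batch of activations. The function name, the batch approximation of the expectation, and the small epsilon are my own additions for illustration, not necessarily how this repo implements it:

```python
import torch


def dormant_mask(activations: torch.Tensor, tau: float) -> torch.Tensor:
    """Boolean mask of the tau-dormant neurons in one layer.

    `activations` is a [batch, H] tensor of post-activation outputs;
    the batch average approximates the expectation over x ~ D.
    """
    per_neuron = activations.abs().mean(dim=0)   # E_x |h_i(x)|, shape [H]
    layer_mean = per_neuron.mean()               # (1/H) * sum_k E_x |h_k(x)|
    scores = per_neuron / (layer_mean + 1e-9)    # s_i; epsilon guards an all-zero layer
    return scores <= tau
```

With $\tau = 0$ this reduces to flagging neurons whose activations are exactly zero on the batch.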

## ReDo

<img src="./img/ReDo_algorithm_pseudocode.png" width="300">

Every $F$-th time step:

1. Check whether a neuron $i$ is $\tau$-dormant.
2. If neuron $i$ is $\tau$-dormant:
   - Re-initialize the input weights and bias of $i$.
   - Set the outgoing weights of $i$ to $0$ (see the sketch below).
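
As a rough sketch, the reset for two consecutive fully connected layers could look as follows. The function name, the Kaiming initialization (PyTorch's default for `nn.Linear`), and zeroing the bias are assumptions for illustration:

```python
import torch
import torch.nn as nn


@torch.no_grad()
def redo_reset(layer: nn.Linear, next_layer: nn.Linear, mask: torch.Tensor) -> None:
    """Reset the tau-dormant neurons of `layer` (mask: [H] bool)."""
    # Re-initialize the incoming weights of the dormant units.
    fresh = torch.empty_like(layer.weight)
    nn.init.kaiming_uniform_(fresh, a=5**0.5)  # PyTorch's default nn.Linear init
    layer.weight[mask] = fresh[mask]
    # Reset the bias (zeroed here for simplicity).
    if layer.bias is not None:
        layer.bias[mask] = 0.0
    # Zero the outgoing weights, i.e. the columns of the next layer
    # that read from the reset units.
    next_layer.weight[:, mask] = 0.0
```

Zeroing the outgoing weights ensures the reset leaves the network's output unchanged at the moment it is applied.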

## Results

These results were generated using 3 seeds on DemonAttack-v4. Note that I did not use typical DQN hyperparameters; instead, I chose a hyperparameter set that exaggerates the dormant neuron phenomenon.

### Episodic Return

<img src="./img/redo_episodic_returns.png" width="300">

### Dormant count $\tau=0.0$

<img src="./img/redo_tau_0_0_dormant_fraction.png" width="300">

### Dormant count $\tau=0.1$

<img src="./img/redo_tau_0_1_dormant_fraction.png" width="300">

I've skipped the 10M- and 100M-step experiments because they are very expensive in terms of compute.

## Implementation progress

Update 1: Fixed and simplified the for-loop in the ReDo resets.

Update 2: The reset check in the main function was at the wrong level; the re-initializations are now properly done in-place and work.

Update 3: Resetting Adam's moment estimates and step count is crucial for performance. Otherwise, the Adam updates will immediately create dormant neurons again.
Preliminary results now look promising.
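
As an illustration of what Update 3 describes, here is a hedged sketch assuming the standard `torch.optim.Adam` state layout (`exp_avg`, `exp_avg_sq`, `step`); the helper is my own, not this repo's API:

```python
import torch


@torch.no_grad()
def reset_adam_state(optimizer: torch.optim.Adam, param: torch.Tensor, mask: torch.Tensor) -> None:
    """Zero Adam's moment estimates and step count for the reset rows of `param`."""
    state = optimizer.state.get(param, {})
    if not state:
        return  # no optimizer steps taken yet, nothing to reset
    state["exp_avg"][mask] = 0.0     # first moment
    state["exp_avg_sq"][mask] = 0.0  # second moment
    # `step` is stored per parameter; resetting it restarts bias correction.
    if torch.is_tensor(state["step"]):
        state["step"].zero_()
    else:
        state["step"] = 0
```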

Update 4: Fixed the outgoing weight resets, where the mask was generated incorrectly and not applied to the outgoing weights. See this issue. Thanks @SaminYeasar!

## Citations

Paper:

```bibtex
@inproceedings{sokar2023dormant,
  title={The dormant neuron phenomenon in deep reinforcement learning},
  author={Sokar, Ghada and Agarwal, Rishabh and Castro, Pablo Samuel and Evci, Utku},
  booktitle={International Conference on Machine Learning},
  pages={32145--32168},
  year={2023},
  organization={PMLR}
}
```

Training code is based on CleanRL:

```bibtex
@article{huang2022cleanrl,
  author  = {Shengyi Huang and Rousslan Fernand Julien Dossa and Chang Ye and Jeff Braga and Dipam Chakraborty and Kinal Mehta and João G.M. Araújo},
  title   = {CleanRL: High-quality Single-file Implementations of Deep Reinforcement Learning Algorithms},
  journal = {Journal of Machine Learning Research},
  year    = {2022},
  volume  = {23},
  number  = {274},
  pages   = {1--18},
  url     = {http://jmlr.org/papers/v23/21-1342.html}
}
```

Replay buffer and wrappers are from Stable Baselines 3:

```bibtex
@misc{raffin2019stable,
  title={Stable baselines3},
  author={Raffin, Antonin and Hill, Ashley and Ernestus, Maximilian and Gleave, Adam and Kanervisto, Anssi and Dormann, Noah},
  year={2019}
}
```