CorrNet+

This repo holds the code for the paper: CorrNet+: Sign Language Recognition and Translation via Spatial-Temporal Correlation [paper], an extension of our previous work CorrNet (CVPR 2023) [paper].

For the code supporting continuous sign language recognition, refer to CorrNet_Plus_CSLR.

The code of CorrNet_Plus_SLT (sign language translation) is currently reserved and not yet released.

Performance

<table align="center">
  <tbody align="center" valign="middle">
    <tr>
      <td rowspan="3">Method</td>
      <td colspan="4">PHOENIX2014</td>
      <td colspan="2">PHOENIX2014-T</td>
      <td colspan="2">CSL-Daily</td>
    </tr>
    <tr>
      <td colspan="2">Dev (%)</td>
      <td colspan="2">Test (%)</td>
      <td rowspan="2">Dev (%)</td>
      <td rowspan="2">Test (%)</td>
      <td rowspan="2">Dev (%)</td>
      <td rowspan="2">Test (%)</td>
    </tr>
    <tr>
      <td>del/ins</td>
      <td>WER</td>
      <td>del/ins</td>
      <td>WER</td>
    </tr>
    <tr> <td>CVT-SLR (CVPR 2023)</td> <td>6.4/2.6</td> <td>19.8</td> <td>6.1/2.3</td> <td>20.1</td> <td>19.4</td> <td>20.3</td> <td>-</td> <td>-</td> </tr>
    <tr> <td>CoSign-2s (ICCV 2023)</td> <td>-</td> <td>19.7</td> <td>-</td> <td>20.1</td> <td>19.5</td> <td>20.1</td> <td>-</td> <td>-</td> </tr>
    <tr> <td>AdaSize (PR 2024)</td> <td>7.0/2.6</td> <td>19.7</td> <td>7.2/3.1</td> <td>20.9</td> <td>19.7</td> <td>21.2</td> <td>31.3</td> <td>30.9</td> </tr>
    <tr> <td>AdaBrowse+ (ACM MM 2023)</td> <td>6.0/2.5</td> <td>19.6</td> <td>5.9/2.6</td> <td>20.7</td> <td>19.5</td> <td>20.6</td> <td>31.2</td> <td>30.7</td> </tr>
    <tr> <td>SEN (AAAI 2023)</td> <td>5.8/2.6</td> <td>19.5</td> <td>7.3/4.0</td> <td>21.0</td> <td>19.3</td> <td>20.7</td> <td>31.1</td> <td>30.7</td> </tr>
    <tr> <td>CTCA (CVPR 2023)</td> <td>6.2/2.9</td> <td>19.5</td> <td>6.1/2.6</td> <td>20.1</td> <td>19.3</td> <td>20.3</td> <td>31.3</td> <td>29.4</td> </tr>
    <tr> <td>C2SLR (CVPR 2022)</td> <td>-</td> <td>20.5</td> <td>-</td> <td>20.4</td> <td>20.2</td> <td>20.4</td> <td>-</td> <td>-</td> </tr>
    <tr> <th>CorrNet+</th> <td>5.3/2.7</td> <th>18.0</th> <td>5.6/2.4</td> <th>18.2</th> <th>17.2</th> <th>19.1</th> <th>28.6</th> <th>28.2</th> </tr>
  </tbody>
</table>

<table align="center">
  <tbody align="center" valign="middle">
    <tr> <td colspan="11">PHOENIX2014-T</td> </tr>
    <tr>
      <td rowspan="2">Method</td>
      <td colspan="5">Dev (%)</td>
      <td colspan="5">Test (%)</td>
    </tr>
    <tr>
      <td>ROUGE</td> <td>BLEU-1</td> <td>BLEU-2</td> <td>BLEU-3</td> <td>BLEU-4</td>
      <td>ROUGE</td> <td>BLEU-1</td> <td>BLEU-2</td> <td>BLEU-3</td> <td>BLEU-4</td>
    </tr>
    <tr> <td>SignBT (CVPR 2021)</td> <td>50.29</td> <td>51.11</td> <td>37.90</td> <td>29.80</td> <td>24.45</td> <td>49.54</td> <td>50.80</td> <td>37.75</td> <td>29.72</td> <td>24.32</td> </tr>
    <tr> <td>MMTLB (CVPR 2022)</td> <td>53.10</td> <td>53.95</td> <td>41.12</td> <td>33.14</td> <td>27.61</td> <td>52.65</td> <td>53.97</td> <td>41.75</td> <td>33.84</td> <td>28.39</td> </tr>
    <tr> <td>SLTUNET (ICLR 2023)</td> <td>52.23</td> <td>-</td> <td>-</td> <td>-</td> <td>27.87</td> <td>52.11</td> <td>52.92</td> <td>41.76</td> <td>33.99</td> <td>28.47</td> </tr>
    <tr> <td>TwoStream-SLT (NeurIPS 2022)</td> <td>54.08</td> <td>54.32</td> <td>41.99</td> <td>34.15</td> <td>28.66</td> <td>53.48</td> <td>54.90</td> <td>42.43</td> <td>34.46</td> <td>28.95</td> </tr>
    <tr> <td>CorrNet+</td> <th>54.54</th> <th>54.56</th> <th>42.31</th> <th>34.48</th> <th>29.13</th> <th>53.76</th> <th>55.32</th> <th>42.74</th> <th>34.86</th> <th>29.42</th> </tr>
    <tr> <td colspan="11">CSL-Daily</td> </tr>
    <tr>
      <td rowspan="2">Method</td>
      <td colspan="5">Dev (%)</td>
      <td colspan="5">Test (%)</td>
    </tr>
    <tr>
      <td>ROUGE</td> <td>BLEU-1</td> <td>BLEU-2</td> <td>BLEU-3</td> <td>BLEU-4</td>
      <td>ROUGE</td> <td>BLEU-1</td> <td>BLEU-2</td> <td>BLEU-3</td> <td>BLEU-4</td>
    </tr>
    <tr> <td>SignBT (CVPR 2021)</td> <td>49.49</td> <td>51.46</td> <td>37.23</td> <td>27.51</td> <td>20.80</td> <td>49.31</td> <td>51.42</td> <td>37.26</td> <td>27.76</td> <td>21.34</td> </tr>
    <tr> <td>MMTLB (CVPR 2022)</td> <td>53.38</td> <td>53.81</td> <td>40.84</td> <td>31.29</td> <td>24.42</td> <td>53.25</td> <td>53.31</td> <td>40.41</td> <td>30.87</td> <td>23.92</td> </tr>
    <tr> <td>SLTUNET (ICLR 2023)</td> <td>53.58</td> <td>-</td> <td>-</td> <td>-</td> <td>23.99</td> <td>54.08</td> <td>54.98</td> <td>41.44</td> <td>31.84</td> <td>25.01</td> </tr>
    <tr> <td>TwoStream-SLT (NeurIPS 2022)</td> <td>55.10</td> <td>55.21</td> <td>42.31</td> <td>32.71</td> <td>25.76</td> <td>55.72</td> <td>55.44</td> <td>42.59</td> <td>32.87</td> <td>25.79</td> </tr>
    <tr> <td>CorrNet+</td> <th>55.52</th> <th>55.64</th> <th>42.78</th> <th>33.13</th> <th>26.14</th> <th>55.84</th> <th>55.82</th> <th>42.96</th> <th>33.26</th> <th>26.14</th> </tr>
  </tbody>
</table>
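
For reference, the WER reported in the first table is the word error rate: the edit distance between the predicted and reference gloss sequences, normalized by reference length, with del/ins giving the deletion and insertion portions of that distance (lower is better). Below is a minimal Python sketch of this metric, illustrative only and not the evaluation script shipped with this repo:

```python
# Minimal WER sketch: edit distance between reference and hypothesis gloss
# sequences, decomposed into substitutions, deletions, and insertions.
# Illustrative only -- not the evaluation script used in this repo.

def wer_counts(ref: list[str], hyp: list[str]) -> dict:
    n, m = len(ref), len(hyp)
    # dp[i][j] = (cost, subs, dels, ins) for aligning ref[:i] with hyp[:j]
    dp = [[(0, 0, 0, 0)] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = (i, 0, i, 0)          # delete all remaining reference words
    for j in range(1, m + 1):
        dp[0][j] = (j, 0, 0, j)          # insert all remaining hypothesis words
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if ref[i - 1] == hyp[j - 1]:
                dp[i][j] = dp[i - 1][j - 1]
            else:
                sub, dele, ins = dp[i - 1][j - 1], dp[i - 1][j], dp[i][j - 1]
                dp[i][j] = min(
                    (sub[0] + 1, sub[1] + 1, sub[2], sub[3]),      # substitution
                    (dele[0] + 1, dele[1], dele[2] + 1, dele[3]),  # deletion
                    (ins[0] + 1, ins[1], ins[2], ins[3] + 1),      # insertion
                )
    cost, subs, dels, inss = dp[n][m]
    return {"WER": 100.0 * cost / max(n, 1),
            "del": 100.0 * dels / max(n, 1),
            "ins": 100.0 * inss / max(n, 1)}

print(wer_counts("A B C D".split(), "A X C".split()))
# {'WER': 50.0, 'del': 25.0, 'ins': 0.0}  -- one substitution, one deletion
```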

Visualizations

As shown below, our method models human body trajectories across adjacent frames and pays special attention to the moving body parts.

Visualizations of spatial-temporal correlation maps
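
For intuition, a correlation map of this kind scores how strongly each spatial location in one frame responds to locations in the adjacent frame, which is what draws attention to moving regions such as the hands and face. Below is a minimal PyTorch sketch of such a frame-to-frame affinity computation; the function name, tensor shapes, and cosine normalization are illustrative assumptions, not the paper's exact correlation module:

```python
# Sketch of a spatial-temporal correlation map between adjacent frames.
# Shapes and normalization are illustrative assumptions; the paper's
# correlation module differs in its exact design.
import torch
import torch.nn.functional as F

def adjacent_frame_correlation(feats: torch.Tensor) -> torch.Tensor:
    """feats: (T, C, H, W) per-frame feature maps from a 2D CNN backbone.
    Returns (T-1, H*W, H*W): affinity of every spatial location in frame t
    to every spatial location in frame t+1."""
    cur = F.normalize(feats[:-1].flatten(2), dim=1)  # (T-1, C, H*W), unit channel vectors
    nxt = F.normalize(feats[1:].flatten(2), dim=1)   # (T-1, C, H*W)
    return torch.einsum("tcm,tcn->tmn", cur, nxt)    # cosine similarity per location pair

feats = torch.randn(8, 64, 14, 14)       # e.g. 8 frames of 14x14 backbone features
corr = adjacent_frame_correlation(feats)
print(corr.shape)                        # torch.Size([7, 196, 196])
# Aggregating over target locations gives one map per frame pair that can be
# upsampled and overlaid on the input video for visualization:
heat = corr.max(dim=2).values.view(-1, 14, 14)  # (7, 14, 14)
```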

Data Preparation, Environment, Training, Inference and Visualizations

For detailed instructions on data preparation, environment setup, training, inference, and visualizations, please refer to the corresponding sub-repo.