Awesome Human Visual Attention
This repository contains a curated list of research papers and resources on saliency and scanpath prediction, human attention, and human visual search.
Latest Update: 10 September 2024. This repo is a work in progress. New updates coming soon, stay tuned!! :construction:
📣 Latest News 📣
20 April 2024
Our survey paper has been accepted for publication at the IJCAI 2024 Survey Track!
Our Survey on Human Visual Attention
🔥🔥 Trends, Applications, and Challenges in Human Attention Modelling 🔥🔥
Authors:
Giuseppe Cartella,
Marcella Cornia,
Vittorio Cuculo,
Alessandro D'Amelio,
Dario Zanca,
Giuseppe Boccignone,
Rita Cucchiara
Table of Contents
- Human Attention Modelling
- <details>
  <summary>Saliency Prediction</summary>

  | Year | Conference / Journal | Title | Authors | Links |
  |------|----------------------|-------|---------|-------|
  | 2025 | WACV | SUM: Saliency Unification through Mamba for Visual Attention Modeling | Alireza Hosseini et al. | Paper / Code :octocat: / Project Page |
  | 2024 | WACV | Learning Saliency from Fixations | Yasser Abdelaziz Dahou Djilali et al. | Paper / Code :octocat: |
  | 2023 | CVPR | Learning from Unique Perspectives: User-aware Saliency Modeling | Shi Chen et al. | Paper |
  | 2023 | CVPR | TempSAL - Uncovering Temporal Information for Deep Saliency Prediction | Bahar Aydemir et al. | Paper / Code :octocat: |
  | 2023 | BMVC | Clustered Saliency Prediction | Rezvan Sherkat et al. | Paper |
  | 2023 | NeurIPS | What Do Deep Saliency Models Learn about Visual Attention? | Shi Chen et al. | Paper / Code :octocat: |
  | 2022 | Neurocomputing | TranSalNet: Towards perceptually relevant visual saliency prediction | Jianxun Lou et al. | Paper / Code :octocat: |
  | 2020 | CVPR | STAViS: Spatio-Temporal AudioVisual Saliency Network | Antigoni Tsiami et al. | Paper / Code :octocat: |
  | 2020 | CVPR | How much time do you have? Modeling multi-duration saliency | Camilo Fosco et al. | Paper / Code :octocat: / Project Page |
  | 2018 | IEEE Transactions on Image Processing | Predicting Human Eye Fixations via an LSTM-based Saliency Attentive Model | Marcella Cornia et al. | Paper / Code :octocat: |
  | 2015 | CVPR | SALICON: Saliency in Context | Ming Jiang et al. | Paper / Project Page |
  | 2009 | ICCV | Learning to Predict Where Humans Look | Tilke Judd et al. | Paper |
  | 1998 | TPAMI | A Model of Saliency-Based Visual Attention for Rapid Scene Analysis | Laurent Itti et al. | Paper |

  </details>
- <details>
  <summary>Scanpath Prediction</summary>

  | Year | Conference / Journal | Title | Authors | Links |
  |------|----------------------|-------|---------|-------|
  | 2024 | ECCV | GazeXplain: Learning to Predict Natural Language Explanations of Visual Scanpaths | Xianyu Chen et al. | Paper / Code :octocat: |
  | 2024 | ECCV | Look Hear: Gaze Prediction for Speech-directed Human Attention | Sounak Mondal et al. | Paper / Code :octocat: |
  | 2024 | CVPR | Beyond Average: Individualized Visual Scanpath Prediction | Xianyu Chen et al. | Paper |
  | 2024 | CVPR | Unifying Top-down and Bottom-up Scanpath Prediction Using Transformers | Zhibo Yang et al. | Paper / Code :octocat: |
  | 2023 | arXiv | Contrastive Language-Image Pretrained Models are Zero-Shot Human Scanpath Predictors | Dario Zanca et al. | Paper / Code + Dataset :octocat: |
  | 2023 | CVPR | Gazeformer: Scalable, Effective and Fast Prediction of Goal-Directed Human Attention | Sounak Mondal et al. | Paper / Code :octocat: |
  | 2022 | ECCV | Target-absent Human Attention | Zhibo Yang et al. | Paper / Code :octocat: |
  | 2022 | TMLR | Behind the Machine's Gaze: Neural Networks with Biologically-inspired Constraints Exhibit Human-like Visual Attention | Leo Schwinn et al. | Paper / Code :octocat: |
  | 2022 | Journal of Vision | DeepGaze III: Modeling free-viewing human scanpaths with deep learning | Matthias Kümmerer et al. | Paper / Code :octocat: |
  | 2021 | CVPR | Predicting Human Scanpaths in Visual Question Answering | Xianyu Chen et al. | Paper / Code :octocat: |
  | 2019 | TPAMI | Gravitational Laws of Focus of Attention | Dario Zanca et al. | Paper / Code :octocat: |
  | 2015 | Vision Research | Saccadic model of eye movements for free-viewing condition | Olivier Le Meur et al. | Paper |

  </details>
- Integrating Human Attention in AI models
- Image and Video Processing
- <details>
  <summary>Visual Recognition</summary>

  | Year | Conference / Journal | Title | Authors | Links |
  |------|----------------------|-------|---------|-------|
  | 2023 | IJCV | Joint Learning of Visual-Audio Saliency Prediction and Sound Source Localization on Multi-face Videos | Minglang Qiao et al. | Paper / Code :octocat: |
  | 2022 | ECML PKDD | Foveated Neural Computation | Matteo Tiezzi et al. | Paper / Code :octocat: |
  | 2021 | WACV | Integrating Human Gaze into Attention for Egocentric Activity Recognition | Kyle Min et al. | Paper / Code :octocat: |
  | 2019 | CVPR | Learning Unsupervised Video Object Segmentation through Visual Attention | Wenguan Wang et al. | Paper / Code :octocat: |
  | 2019 | CVPR | Shifting more attention to video salient object detection | Deng-Ping Fan et al. | Paper / Code :octocat: |

  </details>
- <details>
  <summary>Graphic Design</summary>

  | Year | Conference / Journal | Title | Authors | Links |
  |------|----------------------|-------|---------|-------|
  | 2020 | ACM Symposium on UIST (User Interface Software and Technology) | Predicting Visual Importance Across Graphic Design Types | Camilo Fosco et al. | Paper / Code :octocat: |
  | 2020 | ACM MobileHCI | Understanding Visual Saliency in Mobile User Interfaces | Luis A. Leiva et al. | Paper |
  | 2017 | ACM Symposium on UIST (User Interface Software and Technology) | Learning Visual Importance for Graphic Designs and Data Visualizations | Zoya Bylinskii et al. | Paper / Code :octocat: |

  </details>
- <details>
  <summary>Image Enhancement and Manipulation</summary>

  | Year | Conference / Journal | Title | Authors | Links |
  |------|----------------------|-------|---------|-------|
  | 2023 | CVPR | Realistic saliency guided image enhancement | S. Mahdi H. Miangoleh et al. | Paper / Code :octocat: / Project Page |
  | 2022 | CVPR | Deep saliency prior for reducing visual distraction | Kfir Aberman et al. | Paper / Project Page |
  | 2021 | CVPR | Saliency-guided image translation | Lai Jiang et al. | Paper |
  | 2017 | arXiv | Guiding human gaze with convolutional neural networks | Leon A. Gatys et al. | Paper |

  </details>
- <details>
  <summary>Image Quality Assessment</summary>

  | Year | Conference / Journal | Title | Authors | Links |
  |------|----------------------|-------|---------|-------|
  | 2023 | CVPR | ScanDMM: A Deep Markov Model of Scanpath Prediction for 360° Images | Xiangjie Sui et al. | Paper / Code :octocat: |
  | 2021 | ICCV Workshops | Saliency-Guided Transformer Network combined with Local Embedding for No-Reference Image Quality Assessment | Mengmeng Zhu et al. | Paper |
  | 2019 | ACMMM | SGDNet: An End-to-End Saliency-Guided Deep Neural Network for No-Reference Image Quality Assessment | Sheng Yang et al. | Paper / Code :octocat: |

  </details>
- Vision-and-Language Applications
- <details>
  <summary>Automatic Captioning</summary>

  | Year | Conference / Journal | Title | Authors | Links |
  |------|----------------------|-------|---------|-------|
  | 2020 | EMNLP | Generating Image Descriptions via Sequential Cross-Modal Alignment Guided by Human Gaze | Ece Takmaz et al. | Paper / Code :octocat: |
  | 2019 | ICCV | Human Attention in Image Captioning: Dataset and Analysis | Sen He et al. | Paper / Code :octocat: |
  | 2018 | ACM TOMM | Paying More Attention to Saliency: Image Captioning with Saliency and Context Attention | Marcella Cornia et al. | Paper |
  | 2017 | CVPR | Supervising Neural Attention Models for Video Captioning by Human Gaze Data | Youngjae Yu et al. | Paper / Code :octocat: |
  | 2016 | arXiv | Seeing with Humans: Gaze-Assisted Neural Image Captioning | Yusuke Sugano et al. | Paper |

  </details>
- <details>
  <summary>Visual Question Answering</summary>

  | Year | Conference / Journal | Title | Authors | Links |
  |------|----------------------|-------|---------|-------|
  | 2023 | EMNLP | GazeVQA: A Video Question Answering Dataset for Multiview Eye-Gaze Task-Oriented Collaborations | Muhammet Furkan Ilaslan et al. | Paper / Code :octocat: |
  | 2023 | CVPR Workshops | Multimodal Integration of Human-Like Attention in Visual Question Answering | Ekta Sood et al. | Paper / Project Page |
  | 2021 | CoNLL | VQA-MHUG: A Gaze Dataset to Study Multimodal Neural Attention in Visual Question Answering | Ekta Sood et al. | Paper / Dataset + Project Page |
  | 2020 | ECCV | AiR: Attention with Reasoning Capability | Shi Chen et al. | Paper / Code :octocat: |
  | 2018 | AAAI | Exploring Human-like Attention Supervision in Visual Question Answering | Tingting Qiao et al. | Paper / Code :octocat: |
  | 2016 | EMNLP | Human Attention in Visual Question Answering: Do Humans and Deep Networks Look at the Same Regions? | Abhishek Das et al. | Paper |

  </details>
- Language Modelling
- <details>
  <summary>Machine Reading Comprehension</summary>

  | Year | Conference / Journal | Title | Authors | Links |
  |------|----------------------|-------|---------|-------|
  | 2023 | ACL Workshops | Native Language Prediction from Gaze: a Reproducibility Study | Lina Skerath et al. | Paper / Code :octocat: |
  | 2022 | ETRA | Inferring Native and Non-Native Human Reading Comprehension and Subjective Text Difficulty from Scanpaths | David R. Reich et al. | Paper / Code :octocat: |
  | 2017 | ACL | Predicting Native Language from Gaze | Yevgeni Berzak et al. | Paper |

  </details>
- <details>
  <summary>Natural Language Understanding</summary>

  | Year | Conference / Journal | Title | Authors | Links |
  |------|----------------------|-------|---------|-------|
  | 2023 | EMNLP | Pre-Trained Language Models Augmented with Synthetic Scanpaths for Natural Language Understanding | Shuwen Deng et al. | Paper / Code :octocat: |
  | 2023 | EACL | Synthesizing Human Gaze Feedback for Improved NLP Performance | Varun Khurana et al. | Paper |
  | 2020 | NeurIPS | Improving Natural Language Processing Tasks with Human Gaze-Guided Neural Attention | Ekta Sood et al. | Paper / Project Page |

  </details>
- Domain-Specific Applications
- <details>
  <summary>Robotics</summary>

  | Year | Conference / Journal | Title | Authors | Links |
  |------|----------------------|-------|---------|-------|
  | 2023 | IEEE RA-L | GVGNet: Gaze-Directed Visual Grounding for Learning Under-Specified Object Referring Intention | Kun Qian et al. | Paper |
  | 2022 | RSS | Gaze Complements Control Input for Goal Prediction During Assisted Teleoperation | Reuben M. Aronson et al. | Paper |
  | 2019 | CoRL | Understanding Teacher Gaze Patterns for Robot Learning | Akanksha Saran et al. | Paper / Code :octocat: |
  | 2019 | CoRL | Nonverbal Robot Feedback for Human Teachers | Sandy H. Huang et al. | Paper |

  </details>
- <details>
  <summary>Autonomous Driving</summary>

  | Year | Conference / Journal | Title | Authors | Links |
  |------|----------------------|-------|---------|-------|
  | 2023 | ICCV | FBLNet: FeedBack Loop Network for Driver Attention Prediction | Yilong Chen et al. | Paper |
  | 2022 | IEEE Transactions on Intelligent Transportation Systems | DADA: Driver Attention Prediction in Driving Accident Scenarios | Jianwu Fang et al. | Paper / Code :octocat: |
  | 2021 | ICCV | MEDIRL: Predicting the Visual Attention of Drivers via Deep Inverse Reinforcement Learning | Sonia Baee et al. | Paper / Code :octocat: / Project Page |
  | 2020 | CVPR | “Looking at the right stuff” - Guided semantic-gaze for autonomous driving | Anwesan Pal et al. | Paper / Code :octocat: |
  | 2019 | ITSC | DADA-2000: Can Driving Accident be Predicted by Driver Attention? Analyzed by A Benchmark | Jianwu Fang et al. | Paper / Code :octocat: |
  | 2018 | ACCV | Predicting Driver Attention in Critical Situations | Ye Xia et al. | Paper / Code :octocat: |
  | 2018 | TPAMI | Predicting the Driver’s Focus of Attention: the DR(eye)VE Project | Andrea Palazzi et al. | Paper / Code :octocat: |

  </details>
- <details>
  <summary>Medicine</summary>

  | Year | Conference / Journal | Title | Authors | Links |
  |------|----------------------|-------|---------|-------|
  | 2024 | MICCAI | Weakly-supervised Medical Image Segmentation with Gaze Annotations | Yuan Zhong et al. | Paper / Code :octocat: |
  | 2024 | AAAI | Mining Gaze for Contrastive Learning toward Computer-Assisted Diagnosis | Zihao Zhao et al. | Paper / Code :octocat: |
  | 2024 | WACV | GazeGNN: A Gaze-Guided Graph Neural Network for Chest X-ray Classification | Bin Wang et al. | Paper / Code :octocat: |
  | 2023 | WACV | Probabilistic Integration of Object Level Annotations in Chest X-ray Classification | Tom van Sonsbeek et al. | Paper |
  | 2023 | IEEE Transactions on Medical Imaging | Eye-gaze-guided Vision Transformer for Rectifying Shortcut Learning | Chong Ma et al. | Paper |
  | 2023 | Transactions on Neural Networks and Learning Systems | Rectify ViT Shortcut Learning by Visual Saliency | Chong Ma et al. | Paper |
  | 2022 | IEEE Transactions on Medical Imaging | Follow My Eye: Using Gaze to Supervise Computer-Aided Diagnosis | Sheng Wang et al. | Paper / Code :octocat: |
  | 2022 | MICCAI | GazeRadar: A Gaze and Radiomics-Guided Disease Localization Framework | Moinak Bhattacharya et al. | Paper / Code :octocat: |
  | 2022 | ECCV | RadioTransformer: A Cascaded Global-Focal Transformer for Visual Attention–guided Disease Classification | Moinak Bhattacharya et al. | Paper / Code :octocat: |
  | 2021 | Nature Scientific Data | Creation and validation of a chest X-ray dataset with eye-tracking and report dictation for AI development | Alexandros Karargyris et al. | Paper / Code :octocat: |
  | 2021 | BMVC | Human Attention in Fine-grained Classification | Yao Rong et al. | Paper / Code :octocat: |
  | 2018 | Journal of Medical Imaging | Modeling visual search behavior of breast radiologists using a deep convolution neural network | Suneeta Mall et al. | Paper |

  </details>
- Datasets & Benchmarks
- <details> <summary>Datasets</summary> </details>
How to Contribute
- Fork this repository and clone it locally.
- Create a new branch for your changes: `git checkout -b feature-name`.
- Make your changes and commit them: `git commit -m 'Description of the changes'`.
- Push to your fork: `git push origin feature-name`.
- Open a pull request on the original repository with a description of your changes.
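The steps above can be tried end to end before touching your real fork. The following is a minimal sketch, not an official workflow: it assumes a local bare repository (`fork.git`, a hypothetical stand-in for your fork on GitHub) so it runs offline, and the contributor identity and the table row it appends are placeholders, not real entries.

```shell
#!/bin/sh
set -eu

# Work in a throwaway directory so nothing outside it is touched.
workdir=$(mktemp -d)
cd "$workdir"

# Assumption: a local bare repository stands in for your fork on GitHub.
git init --quiet --bare fork.git

# Clone the (stand-in) fork locally.
git clone --quiet fork.git repo
cd repo
# Placeholder identity, set only for this throwaway clone.
git config user.name "Example Contributor"
git config user.email "contributor@example.com"

# Create a new branch for your changes.
git checkout --quiet -b feature-name

# Make a change (a placeholder table row) and commit it.
echo "| 2024 | Venue | Paper title | First Author et al. | Paper |" >> README.md
git add README.md
git commit --quiet -m 'Description of the changes'

# Push the branch to the fork.
git push --quiet origin feature-name

# List the branch on the stand-in fork to confirm the push arrived.
git --git-dir="$workdir/fork.git" branch --list feature-name
```

With a real fork, replace `fork.git` in the clone step with your fork's URL and open the pull request from the GitHub web interface once the push completes.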
This project is in constant development, and we welcome contributions that add the latest research papers in the field or report issues 🔥🔥.