Enhancing Interactive Image Retrieval With Query Rewriting Using Large Language Models and Vision Language Models
Official implementation of the paper, accepted to ACM ICMR 2024.
We propose an interactive image retrieval system that refines queries based on user relevance feedback in a multi-turn setting. The system incorporates a vision-language model (VLM)-based image captioner to enhance the quality of text-based queries, producing more informative queries with each iteration. Moreover, we introduce a large language model (LLM)-based denoiser that refines the text-based query expansions, mitigating inaccuracies in the image descriptions generated by the captioning model.
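The snippet below is a minimal sketch of this multi-turn query-rewriting loop under the assumptions described above: retrieve with the current query, collect a relevant image from user feedback, caption it with the VLM, denoise the caption with the LLM, and append the cleaned caption to the query. All function names (`caption_image`, `denoise`, `retrieve`, `get_feedback`) are hypothetical placeholders for illustration, not the repository's actual API.

```python
from typing import Callable, List

def interactive_retrieval(
    initial_query: str,
    caption_image: Callable[[str], str],       # hypothetical VLM-based image captioner
    denoise: Callable[[str, str], str],        # hypothetical LLM-based caption denoiser
    retrieve: Callable[[str], List[str]],      # hypothetical text-to-image retriever
    get_feedback: Callable[[List[str]], str],  # hypothetical user relevance feedback (returns a selected image)
    max_turns: int = 5,
) -> List[str]:
    """Multi-turn retrieval: expand the query each turn with a denoised caption
    of the image the user marked as relevant."""
    query = initial_query
    results = retrieve(query)
    for _ in range(max_turns):
        feedback_image = get_feedback(results)        # user picks a relevant result
        raw_caption = caption_image(feedback_image)   # VLM describes the feedback image
        clean_caption = denoise(query, raw_caption)   # LLM removes inaccurate details
        query = f"{query}. {clean_caption}"           # more informative query each iteration
        results = retrieve(query)
    return results
```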