On Enriching Image Captions by Fine-Tuning Large Vision-Language Models with Caption Rewrites

Figure: Illustration of image caption rewriting using ChatGPT.

In Stage 1, the Keyword Extraction Prompt instructs ChatGPT to extract the verbs, nouns, and adjectives (highlighted in brown) from the original caption. In Stage 2, the Caption Generation Prompt guides ChatGPT to write a rewritten caption from those keywords. Applying the Stage 2 prompt repeatedly yields multiple rewritten captions for the same image.

To generate the rewritten captions, run:

python gen_augdata.py
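
The two-stage rewriting described above can be sketched roughly as follows. This is a minimal illustration, not the contents of gen_augdata.py: the prompt wording, the function name `rewrite_caption`, and the `llm` callable (any function that maps a prompt string to a completion string, such as a thin wrapper around a chat API) are all assumptions made for the example.

```python
from typing import Callable, List

# Hypothetical prompt templates; the wording used in the repository may differ.
KEYWORD_PROMPT = (
    "Extract the verbs, nouns, and adjectives from this image caption "
    "as a comma-separated list: {caption}"
)
CAPTION_PROMPT = "Write one new image caption that uses these keywords: {keywords}"


def rewrite_caption(
    caption: str, llm: Callable[[str], str], n_rewrites: int = 3
) -> List[str]:
    """Return up to n_rewrites distinct rewrites of caption.

    llm is an assumed helper: any function mapping a prompt to a completion.
    """
    # Stage 1: ask the model for the keywords of the original caption.
    raw = llm(KEYWORD_PROMPT.format(caption=caption))
    keywords = [k.strip() for k in raw.split(",") if k.strip()]

    # Stage 2: apply the caption generation prompt repeatedly,
    # keeping only non-empty rewrites that are new.
    rewrites: List[str] = []
    prompt = CAPTION_PROMPT.format(keywords=", ".join(keywords))
    for _ in range(n_rewrites):
        text = llm(prompt).strip()
        if text and text != caption and text not in rewrites:
            rewrites.append(text)
    return rewrites
```

Passing the model as a plain callable keeps the sketch independent of any particular API client, so the same loop could in principle be driven by ChatGPT or by a locally deployed model.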

Use a different model

First deploy the model you want to use locally, then run the corresponding script:

python {use_llava/owl/minigpt4}.py

Generated example

image

Result

image