Awesome

Apollo-Cap

Pytorch implementation of [Apollo: Zero-Shot Multimodal Reasoning with Multiple Experts](todo put here link arxiv)

Stylized Image Caption Generation

Examples from SentiCap and Flickrstyle10k datasets:

Ablation study of our approach for positive caption

Audio-Aware Image Caption Generation

APOLLO-CAP-P caption examples for images and audio clips featuring children’s laughter:

Set up environment:

$ pip install requirements.txt
$ pip install ftfy regex tqdm
$ pip install git+https://github.com/openai/CLIP.git

The code was tested successfully on Intel Xeon with NVIDIA RTX 2080 Ti and CUDA 11.4.

In order to use Apollo-Cap-PD:

git clone https://github.com/danielabd/CLIP

Usage

To execute the following commands, go to zero_shot_style/.

To generate stylized image caption:

Apollo-Cap:

positive:

python run.py --style positive --ce_scale 1.96 --clip_scale 2.19 --text_style_scale 9.68 --sentiment_temperature 0.001 --use_img_path ./example_images/000000276434.jpg

negative:

python run.py --style negative --ce_scale 2.855 --clip_scale 5.036 --text_style_scale 11.9 --sentiment_temperature 0.001 --use_img_path ./example_images/000000155617.jpg

humor:

python run.py --style humor --ce_scale 0.6604141408776456 --clip_scale 1 --text_style_scale 2.9876837003652907 --sentiment_temperature 0.001 --use_img_path ./example_images/2211593099_4a4f1c85d2.jpg

romantic:

python run.py --style romantic --ce_scale 0.7097647446401579 --clip_scale 1 --text_style_scale 4.332869432646197 --sentiment_temperature 0.001 --use_img_path ./example_images/1579287915_4257c54451.jpg

Apollo-Cap-P:

positive:

python run.py --style positive --mul_clip_style --ce_scale 4 --clip_scale 8 --text_style_scale 0 --sentiment_temperature 0.01 --use_img_path ./example_images/000000274455.jpg

negative:

python run.py --style negative --mul_clip_style --ce_scale 0.6209475551271303 --clip_scale 2 --text_style_scale 0 --sentiment_temperature 0.08555964306820746 --use_img_path ./example_images/000000217303.jpg

humor:

python run.py --style humor --mul_clip_style --ce_scale 0.4173438996507689 --clip_scale 1 --text_style_scale 0 --sentiment_temperature 0.05089738868653932 --use_img_path ./example_images/311267421_e204e643cf.jpg

romantic:

python run.py --style romantic --mul_clip_style --ce_scale 0.5 --clip_scale 1 --text_style_scale 0 --sentiment_temperature 0.05364761206623257 --use_img_path ./example_images/3457315666_b943111dec.jpg

Apollo-Cap-PD:

positive

python run.py --style positive --update_ViT --mul_clip_style --ce_scale 0.2214432225421577 --clip_scale 1 --text_style_scale 0 --sentiment_temperature 0.1430339855494212 --num_iterations_clip_style 1 --use_img_path ./example_images/000000276434.jpg

negative

python run.py --style negative --update_ViT --mul_clip_style --ce_scale 0.6070550610590508 --clip_scale 2 --text_style_scale 0 --sentiment_temperature 0.17425402664880124 --num_iterations_clip_style 1 --use_img_path ./example_images/000000077954.jpg

humor:

python run.py --style humor --update_ViT --mul_clip_style --ce_scale 0.3426740175716766 --clip_scale 1 --text_style_scale 0 --sentiment_temperature 0.05655316717625009 --num_iterations_clip_style 1 --use_img_path ./example_images/940973925_a2e6d7951c.jpg

romantic:

python run.py --style romantic --update_ViT --mul_clip_style --ce_scale 0.3735 --clip_scale 1 --text_style_scale 0 --sentiment_temperature 0.0672676323972359 --num_iterations_clip_style 1 --use_img_path ./example_images/1489286545_8df476fa26.jpg

To generate audio-aware image caption:

Apollo-Cap-P:

python run.py --mul_clip_style --ce_scale 4 --clip_scale 8 --text_style_scale 0 --sentiment_temperature 0.01 --use_audio_model --audio_path ./child_laughing.wav --audio_sampling_rate 24000 --use_img_path ./example_images/000000155617.jpg

Citation

If you use our work for your research, please cite us.