Awesome

🎞 VLog: Video as a Long Document

Given a long video, we turn it into a doc containing visual + audio info. By sending this doc to ChatGPT, we can chat over the video!

vlog

News

23/April/2023: We release Huggingface gradio demo!
20/April/2023: We release our project on github and local gradio demo!

To Do List

Done

LLM Reasoner: ChatGPT (multilingual) + LangChain
Vision Captioner: BLIP2 + GRIT
ASR Translator: Whisper (multilingual)
Video Segmenter: KTS
Huggingface Space

Doing

Optimize the codebase efficiency
Improve Vision Models: MiniGPT-4 / LLaVA, Family of Segment-anything
Improve ASR Translator for better alignment
Introduce Temporal dependency
Replace ChatGPT with own trained LLM

🧸 Examples

<details open><summary>[ News - GPT4 launch event ]</summary><img src="./figures/case5.png" alt="GPT4 launch event" style="width: 100%; height: auto;"> </details> <details open><summary>[ TV series - 征服之华强买瓜 ]</summary><img src="./figures/case2.png" alt="华强买瓜" style="width: 100%; height: auto;"> </details> <details><summary>[ TV series - The Big Bang Theory ]</summary><img src="./figures/case4.png" alt="The Big Bang Theory" style="width: 100%; height: auto;"> </details> <details><summary>[ Travel video - Travel in Rome ]</summary><img src="./figures/case1.png" alt="Travel in Rome" style="width: 100%; height: auto;"> </details> <details><summary>[ Vlog - Basketball training ]</summary><img src="./figures/case3.png" alt="Basketball training" style="width: 100%; height: auto;"> </details>

🔨 Preparation

Please find installation instructions in install.md.

🌟 Start here

Run in cmd

python main.py --video_path examples/buy_watermelon.mp4 --openai_api_key xxxxx

The generated video document will be generated and saved in examples/buy_watermelon.log

Run in Gradio

python main_gradio.py --openai_api_key xxxxx

🙋 Suggestion

Stay tuned for our project 🔥

If you have more suggestions or functions need to be implemented in this codebase, feel free to drop us an email kevin.qh.lin@gmail.com, leiwx52@gmail.com or open an issue.

😊 Acknowledgment

This work is based on ChatGPT, BLIP2, GRIT, KTS, Whisper, LangChain, Image2Paragraph.