VLog: Video as a Long Document
<a src="https://img.shields.io/badge/%F0%9F%A4%97-Open%20in%20Spaces-blue" href="https://huggingface.co/spaces/TencentARC/Vlog"> <img src="https://img.shields.io/badge/%F0%9F%A4%97-Open%20in%20Spaces-blue" alt="Open in Spaces"> </a> <a src="https://img.shields.io/twitter/url?color=blue&label=Tweet&logo=twitter&url=https%3A%2F%2Ftwitter.com%2FKevinQHLin%2Fstatus%2F1649124447037841408" href="https://twitter.com/KevinQHLin/status/1649124447037841408"> <img src="https://img.shields.io/twitter/url?color=blue&label=Tweet&logo=twitter&url=https%3A%2F%2Ftwitter.com%2FKevinQHLin%2Fstatus%2F1649124447037841408" alt="Tweet"> </a>

Given a long video, we turn it into a document containing visual and audio information. By sending this document to ChatGPT, we can chat about the video!
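To make the idea concrete, here is a minimal sketch of how per-segment captions and transcripts might be flattened into a text document and wrapped into a chat prompt. The `Segment` fields, the `build_document` and `build_prompt` helpers, and the line format are all hypothetical illustrations, not the project's actual data structures or output format.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """One video segment (hypothetical structure, not VLog's own)."""
    start: float      # segment start time in seconds
    end: float        # segment end time in seconds
    caption: str      # visual description of the segment
    transcript: str   # speech recognized within the segment


def build_document(segments):
    """Flatten segments into one text document, one block per segment."""
    blocks = []
    for seg in segments:
        blocks.append(
            f"[{seg.start:.0f}s-{seg.end:.0f}s]\n"
            f"Visual: {seg.caption}\n"
            f"Audio: {seg.transcript or '(no speech)'}"
        )
    return "\n\n".join(blocks)


def build_prompt(document, question):
    """Wrap the document and a user question into a single prompt string."""
    return (
        "Answer the question using only the video document below.\n\n"
        f"{document}\n\nQuestion: {question}"
    )


# Made-up sample data, for illustration only.
segments = [
    Segment(0, 12, "a man walks up to a fruit stand", "How much is the melon?"),
    Segment(12, 30, "the vendor weighs a watermelon", ""),
]
doc = build_document(segments)
prompt = build_prompt(doc, "What is the man buying?")
```

The resulting `prompt` string is what would be handed to a chat model. Keeping timestamps in each block lets the model refer back to specific moments in the video.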
News
- 23/April/2023: We released a Hugging Face Gradio demo!
- 20/April/2023: We released the project on GitHub, along with a local Gradio demo!
To Do List
Done
- LLM Reasoner: ChatGPT (multilingual) + LangChain
- Vision Captioner: BLIP2 + GRIT
- ASR Translator: Whisper (multilingual)
- Video Segmenter: KTS
- Hugging Face Space
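The components above produce two kinds of timed output: segment boundaries from the segmenter and timestamped speech from the ASR model. One plausible way to merge them, shown here as a sketch only, is to attach each speech chunk to the video segment it overlaps most. The function name, the tuple layout, and the overlap rule are assumptions for illustration; the repository may combine these outputs differently.

```python
def assign_speech_to_segments(segments, speech_chunks):
    """Attach each speech chunk to the segment with the largest time overlap.

    segments: list of (start, end) tuples in seconds.
    speech_chunks: list of (start, end, text) tuples in seconds.
    Returns one joined transcript string per segment.
    """
    texts = [[] for _ in segments]
    for c_start, c_end, text in speech_chunks:
        best_index, best_overlap = None, 0.0
        for i, (s_start, s_end) in enumerate(segments):
            # Length of the shared time interval; non-positive means no overlap.
            overlap = min(c_end, s_end) - max(c_start, s_start)
            if overlap > best_overlap:
                best_index, best_overlap = i, overlap
        if best_index is not None:  # chunks overlapping no segment are dropped
            texts[best_index].append(text)
    return [" ".join(parts) for parts in texts]


# Made-up sample data, for illustration only.
segments = [(0.0, 10.0), (10.0, 25.0)]
speech = [
    (1.0, 4.0, "Hello."),
    (8.0, 13.0, "How much is this?"),
    (30.0, 31.0, "Bye."),
]
transcripts = assign_speech_to_segments(segments, speech)
```

A chunk that straddles a boundary, like the second one here, goes to whichever segment holds more of it.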
Doing
- Optimize codebase efficiency
- Improve vision models: MiniGPT-4 / LLaVA, the Segment Anything family
- Improve the ASR translator for better alignment
- Introduce temporal dependency
- Replace ChatGPT with our own trained LLM
Examples
<details open><summary>[ News - GPT4 launch event ]</summary><img src="./figures/case5.png" alt="GPT4 launch event" style="width: 100%; height: auto;"> </details>
<details open><summary>[ TV series - Conquest: Huaqiang Buys a Watermelon (征服之华强买瓜) ]</summary><img src="./figures/case2.png" alt="Huaqiang Buys a Watermelon" style="width: 100%; height: auto;"> </details>
<details><summary>[ TV series - The Big Bang Theory ]</summary><img src="./figures/case4.png" alt="The Big Bang Theory" style="width: 100%; height: auto;"> </details>
<details><summary>[ Travel video - Travel in Rome ]</summary><img src="./figures/case1.png" alt="Travel in Rome" style="width: 100%; height: auto;"> </details>
<details><summary>[ Vlog - Basketball training ]</summary><img src="./figures/case3.png" alt="Basketball training" style="width: 100%; height: auto;"> </details>

Preparation
Please find installation instructions in install.md.
Start here
Run from the command line
python main.py --video_path examples/buy_watermelon.mp4 --openai_api_key xxxxx
The generated video document is saved to examples/buy_watermelon.log
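Putting an API key directly on the command line can leave it in shell history. As a small sketch, the wrapper below reads the key from an environment variable and builds the same command. It also derives the expected `.log` path by swapping the video's extension, which matches the example above but is an assumption about the general naming rule. The `OPENAI_API_KEY` variable name and all three helper functions are hypothetical, not part of the repository.

```python
import os
import subprocess
import sys
from pathlib import Path


def build_command(video_path, api_key):
    """Build the argument list for main.py using the two flags shown above."""
    return [
        sys.executable, "main.py",  # run with the current Python interpreter
        "--video_path", str(video_path),
        "--openai_api_key", api_key,
    ]


def expected_log_path(video_path):
    """Guess the output path by replacing the video extension with .log."""
    return Path(video_path).with_suffix(".log")


def run(video_path):
    """Run main.py on one video, reading the key from the environment."""
    key = os.environ.get("OPENAI_API_KEY")  # assumed variable name
    if not key:
        raise RuntimeError("Set OPENAI_API_KEY before running.")
    subprocess.run(build_command(video_path, key), check=True)
    return expected_log_path(video_path)
```

Calling `run("examples/buy_watermelon.mp4")` from the repository root would then launch the same command as above and return the path where the document is expected.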
Run in Gradio
python main_gradio.py --openai_api_key xxxxx
Suggestion
Stay tuned for our project!
If you have suggestions or features you would like implemented in this codebase, feel free to email us at kevin.qh.lin@gmail.com or leiwx52@gmail.com, or open an issue.
Acknowledgment
This work is based on ChatGPT, BLIP2, GRIT, KTS, Whisper, LangChain, Image2Paragraph.