Fashion Matrix: Editing Photos by Just Talking

[Project page] [ArXiv] [PDF] [Video] [Demo(temporarily offline)]

Fashion Matrix is dedicated to bridging various visual and language models and to continuously refining its capabilities as a comprehensive fashion AI assistant. The project will keep receiving new features and optimizations.

Updates

2023/08/01: Code of v1.1 is released. The details differ slightly from the original version described in the paper.
2023/08/01: Demo(Label) v1.1 with the new AI model function and security updates is released.
2023/07/28: Demo(Label) v1.0 is released.
2023/07/26: Video and Project Page are released.
2023/07/25: arXiv preprint is released.

Versions

August 01, 2023

Fashion Matrix (Label version) v1.1

Updated the use of ControlNet; the pipeline currently uses inpaint, openpose, lineart and (softedge).

Added the AI model task, which replaces the model while keeping the pose and outfit.
Added NSFW (Not Safe For Work) detection to prevent inappropriate use.

July 28, 2023

Fashion Matrix (Label version) v1.0

Basic functions: replace, remove, add, and recolor.

Installation

Follow the steps in the Installation Guide for environment configuration and model deployment. All models except the LLM can be deployed on a single GPU with 13G+ VRAM. (A simplified version of Fashion Matrix can be realized without the LLM, at the cost of some functions; it may be released in the future.)
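Before deploying, it can help to confirm that a local GPU meets the VRAM figure above. The following is a minimal sketch and not part of the Fashion Matrix code: the helper names are hypothetical, "13G+" is assumed to mean 13 GiB (13312 MiB), and the check relies on `nvidia-smi` reporting `memory.total` in MiB, one line per GPU.

```python
import subprocess
from typing import List

# Assumption: the README's "13G+" is read as 13 GiB, i.e. 13 * 1024 MiB.
REQUIRED_MIB = 13 * 1024

# nvidia-smi prints one total-memory value (in MiB) per GPU with these flags.
NVIDIA_SMI_CMD = [
    "nvidia-smi",
    "--query-gpu=memory.total",
    "--format=csv,noheader,nounits",
]


def parse_total_vram_mib(smi_output: str) -> List[int]:
    """Parse one integer (MiB) per non-empty line of nvidia-smi output."""
    return [int(line.strip()) for line in smi_output.splitlines() if line.strip()]


def gpus_meeting_requirement(smi_output: str, required_mib: int = REQUIRED_MIB) -> List[int]:
    """Return the indices of GPUs whose total memory is at least required_mib."""
    totals = parse_total_vram_mib(smi_output)
    return [index for index, mib in enumerate(totals) if mib >= required_mib]


def query_nvidia_smi() -> str:
    """Run nvidia-smi and return its raw stdout (raises if the tool is missing or fails)."""
    result = subprocess.run(NVIDIA_SMI_CMD, capture_output=True, text=True, check=True)
    return result.stdout


if __name__ == "__main__":
    try:
        output = query_nvidia_smi()
    except (FileNotFoundError, subprocess.CalledProcessError):
        print("nvidia-smi is not available; cannot check VRAM.")
    else:
        suitable = gpus_meeting_requirement(output)
        if suitable:
            print(f"GPUs with at least {REQUIRED_MIB} MiB: {suitable}")
        else:
            print(f"No GPU reports at least {REQUIRED_MIB} MiB of VRAM.")
```

Total memory is only an upper bound: other processes may already occupy part of it, so a GPU that passes this check can still run out of memory at load time.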

Acknowledgement

Our work is based on the following excellent works:

Realistic Vision is a finely calibrated model derived from Stable Diffusion v1.5, designed to enhance the realism of generated images, with a particular focus on human portraits.
ControlNet v1.1 offers more comprehensive and user-friendly conditional control models and allows multiple ControlNets to be used concurrently, which significantly broadens the potential and applicability of text-to-image techniques.
BLIP provides fast visual question answering within our system.
Grounded-SAM combines Grounding DINO and Segment Anything to detect and segment anything from text inputs.
Matting Anything Model (MAM) is an efficient and versatile framework for estimating the alpha matte of any instance in an image with flexible and interactive visual or linguistic user prompt guidance.
Detectron2 is a next-generation library that provides state-of-the-art detection and segmentation algorithms. The DensePose code we adopted is based on Detectron2.
Graphonomy quickly parses diverse anatomical regions of the human body.

Citation

 @misc{chong2023fashion,
      title={Fashion Matrix: Editing Photos by Just Talking},
      author={Zheng Chong and Xujie Zhang and Fuwei Zhao and Zhenyu Xie and Xiaodan Liang},
      year={2023},
      eprint={2307.13240},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
    }