
XL Textual Inversion for Stable Diffusion XL 1.0 (SDXL) and SDXL-Turbo on a 24 GB GPU

XL_Inversion.ipynb is Copyright © 2023 HANS ROETTGER and distributed under the terms of AGPLv3.

This is an implementation of the textual inversion algorithm for incorporating your own objects, faces or styles into Stable Diffusion XL 1.0. Input: a few template images. Output: a concept (an "Embedding") that can be used in the standard Stable Diffusion XL pipeline to generate your artefacts. (Please also note my implementation variant for DeepFloyd IF.)

Please appreciate weeks of intensive work and leave me a ⭐ star in the top right!

[!NOTE] 2023-12-09 Embeddings created by XL_Inversion work without problems with SDXL-Turbo (diffusers 0.24.0)!

pipe = AutoPipelineForText2Image.from_pretrained(...
set_XLembedding(pipe,embedding,token=learn_token)
prompt="The 3D rendering of a {}".format(learn_token)
image = pipe(...

2023-11-16 "time_ids" problem with newer diffusers versions solved

(update tested with diffusers 0.19.3, 0.23.1)
Thanks for the notice, Bo Sun!

Input Images ➡ XL Embedding

<table style="width: 100%"> <tr> <td colspan=2><img src="./Samples/input.png" alt="" width=384 border=3></img></td> </tr> </table>

The XL_Inversion.ipynb notebook creates an Embedding for your input images within 15 minutes on an NVIDIA RTX 3090 GPU. Starting from a random Embedding, XL Textual Inversion quickly optimizes it to match your input images:

<table style="width: 100%"> <tr> <td colspan=2><img src="./Samples/myPuppet768.gif" alt="" height=256 width=256 border=3></img></td> </tr> </table>
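
The idea behind this optimization can be illustrated with a toy sketch: the model stays frozen, and only a single embedding vector, starting from random values, is updated by gradient descent until the model's output matches a target. This is not the notebook's training loop. The linear `frozen_model`, the `invert` helper, and all numbers below are assumptions chosen for illustration, and plain Python lists stand in for tensors.

```python
import random

def frozen_model(embedding, weights):
    """Frozen linear 'model': maps an embedding vector to an output vector."""
    return [sum(w * e for w, e in zip(row, embedding)) for row in weights]

def mse_and_grad(embedding, weights, target):
    """Mean squared error and its gradient with respect to the embedding only."""
    output = frozen_model(embedding, weights)
    errors = [o - t for o, t in zip(output, target)]
    n = len(errors)
    loss = sum(err * err for err in errors) / n
    # d(loss)/d(embedding[j]) = (2/n) * sum_i errors[i] * weights[i][j]
    grad = [2.0 / n * sum(errors[i] * weights[i][j] for i in range(n))
            for j in range(len(embedding))]
    return loss, grad

def invert(weights, target, steps=500, lr=0.1, seed=0):
    """Start from a random embedding and update it; the weights never change."""
    rng = random.Random(seed)
    embedding = [rng.gauss(0.0, 1.0) for _ in range(len(weights[0]))]
    losses = []
    for _ in range(steps):
        loss, grad = mse_and_grad(embedding, weights, target)
        losses.append(loss)
        embedding = [e - lr * g for e, g in zip(embedding, grad)]
    return embedding, losses

# Hypothetical frozen weights and a target produced by a "true" embedding.
weights = [[1.0, 0.5, 0.0], [0.0, 1.0, 0.5], [0.5, 0.0, 1.0]]
true_embedding = [0.3, -0.7, 1.2]
target = frozen_model(true_embedding, weights)

learned_embedding, losses = invert(weights, target)
```

In the real setting the target is given by the input images and the frozen model is the SDXL pipeline, but the division of roles is the same: only the embedding is trainable.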

XL Embedding ➡ Stable Diffusion XL 1.0 Pipeline

Load the XL Embedding into a single token (e.g. "my") and use it in standard Stable Diffusion XL prompts (see XL_Apply_Inversion.ipynb).

import torch

def set_XLembedding(base,emb,token="my"):
    # Overwrite the embedding row of `token` in both SDXL text encoders
    with torch.no_grad():
        # Embeddings[tokenNo] to learn
        # encode() returns [BOS, token, EOS], so a single token yields 3 ids
        tokens=base.components["tokenizer"].encode(token)
        assert len(tokens)==3, "token is not a single token in 'tokenizer'"
        tokenNo=tokens[1]
        tokens=base.components["tokenizer_2"].encode(token)
        assert len(tokens)==3, "token is not a single token in 'tokenizer_2'"
        tokenNo2=tokens[1]
        embs=base.components["text_encoder"].text_model.embeddings.token_embedding.weight
        embs2=base.components["text_encoder_2"].text_model.embeddings.token_embedding.weight
        # The stored vectors must match the embedding size of each text encoder
        assert embs[tokenNo].shape==emb["emb"].shape, "different 'text_encoder'"
        assert embs2[tokenNo2].shape==emb["emb2"].shape, "different 'text_encoder_2'"
        embs[tokenNo]=emb["emb"].to(embs.dtype).to(embs.device)
        embs2[tokenNo2]=emb["emb2"].to(embs2.dtype).to(embs2.device)

def load_XLembedding(base,token="my",embedding_file="myToken.pt",path="./Embeddings/"):
    emb=torch.load(path+embedding_file)
    set_XLembedding(base,emb,token)  

learned="my"
embs_path="./Embeddings/"
emb_file="myPuppet768.pt"
load_XLembedding(base,token=learned,embedding_file=emb_file,path=embs_path)
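
The `len(tokens)==3` assertions in `set_XLembedding` rely on the tokenizer wrapping its output in a start and an end marker, so a word that maps to exactly one vocabulary entry encodes to three ids. The following sketch illustrates that check with a stand-in tokenizer. `ToyTokenizer`, its three-entry vocabulary, and `single_token_id` are hypothetical and only mimic the behaviour; they are not the tokenizers used by SDXL.

```python
BOS, EOS = "<|startoftext|>", "<|endoftext|>"

class ToyTokenizer:
    """Hypothetical stand-in for a subword tokenizer (not the SDXL tokenizer)."""

    def __init__(self, vocab):
        # ids 0 and 1 are reserved for the start and end markers
        self.ids = {piece: i for i, piece in enumerate([BOS, EOS, *vocab])}

    def encode(self, text):
        pieces, rest = [], text
        while rest:
            # Greedy longest match against the vocabulary
            candidates = [p for p in self.ids
                          if p not in (BOS, EOS) and rest.startswith(p)]
            if not candidates:
                raise ValueError(f"cannot tokenize {rest!r}")
            match = max(candidates, key=len)
            pieces.append(match)
            rest = rest[len(match):]
        return [self.ids[BOS], *(self.ids[p] for p in pieces), self.ids[EOS]]

def single_token_id(tokenizer, token):
    """Return the id of `token`, or raise if it splits into several pieces."""
    ids = tokenizer.encode(token)
    if len(ids) != 3:  # expected layout: [BOS, token, EOS]
        raise ValueError(f"{token!r} is not a single token ({len(ids) - 2} pieces)")
    return ids[1]

tokenizer = ToyTokenizer(["my", "pup", "pet"])
my_id = single_token_id(tokenizer, "my")   # one piece: accepted
multi_ids = tokenizer.encode("mypuppet")   # three pieces: would be rejected
```

A word that splits into several pieces has no single embedding row to overwrite, which is why the token for a learned concept should be a short word that already exists in both vocabularies.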

prompt="The {} doll at the beach".format(learned)
<table style="width: 100%"> <tr> <td colspan=2><img src="./Samples/20.png" alt="" height=192 width=192 border=3></img></td> <td colspan=2><img src="./Samples/30.png" alt="" height=192 width=192 border=3></img></td> <td colspan=2><img src="./Samples/40.png" alt="" height=192 width=192 border=3></img></td> </tr> </table>
prompt="The 3D rendering of a group of {} figurines dressed in red-striped bathing suits having fun at the beach".format(learned)
<table style="width: 100%"> <tr> <td colspan=2><img src="./Samples/1.png" alt="" height=192 width=192 border=3></img></td> <td colspan=2><img src="./Samples/8.png" alt="" height=192 width=192 border=3></img></td> <td colspan=2><img src="./Samples/9.png" alt="" height=192 width=192 border=3></img></td> </tr> </table>
prompt="The 3D rendering of a group of {} figurines dressed in dirndl wearing sunglasses drinking beer and having fun at the Oktoberfest".format(learned)
<table style="width: 100%"> <tr> <td colspan=2><img src="./Samples/45.png" alt="" height=192 width=192 border=3></img></td> <td colspan=2><img src="./Samples/75.png" alt="" height=192 width=192 border=3></img></td> <td colspan=2><img src="./Samples/90.png" alt="" height=192 width=192 border=3></img></td> </tr> </table>

Prerequisites

Installation