first commit

This commit is contained in:
ailab 2024-06-08 02:42:16 +08:00
commit 381603471e
12 changed files with 232 additions and 0 deletions

40
.gitattributes vendored Normal file
View File

@ -0,0 +1,40 @@
*.7z filter=lfs diff=lfs merge=lfs -text
*.arrow filter=lfs diff=lfs merge=lfs -text
*.bin filter=lfs diff=lfs merge=lfs -text
*.bz2 filter=lfs diff=lfs merge=lfs -text
*.ckpt filter=lfs diff=lfs merge=lfs -text
*.ftz filter=lfs diff=lfs merge=lfs -text
*.gz filter=lfs diff=lfs merge=lfs -text
*.h5 filter=lfs diff=lfs merge=lfs -text
*.joblib filter=lfs diff=lfs merge=lfs -text
*.lfs.* filter=lfs diff=lfs merge=lfs -text
*.mlmodel filter=lfs diff=lfs merge=lfs -text
*.model filter=lfs diff=lfs merge=lfs -text
*.msgpack filter=lfs diff=lfs merge=lfs -text
*.npy filter=lfs diff=lfs merge=lfs -text
*.npz filter=lfs diff=lfs merge=lfs -text
*.onnx filter=lfs diff=lfs merge=lfs -text
*.ot filter=lfs diff=lfs merge=lfs -text
*.parquet filter=lfs diff=lfs merge=lfs -text
*.pb filter=lfs diff=lfs merge=lfs -text
*.pickle filter=lfs diff=lfs merge=lfs -text
*.pkl filter=lfs diff=lfs merge=lfs -text
*.pt filter=lfs diff=lfs merge=lfs -text
*.pth filter=lfs diff=lfs merge=lfs -text
*.rar filter=lfs diff=lfs merge=lfs -text
*.safetensors filter=lfs diff=lfs merge=lfs -text
saved_model/**/* filter=lfs diff=lfs merge=lfs -text
*.tar.* filter=lfs diff=lfs merge=lfs -text
*.tar filter=lfs diff=lfs merge=lfs -text
*.tflite filter=lfs diff=lfs merge=lfs -text
*.tgz filter=lfs diff=lfs merge=lfs -text
*.wasm filter=lfs diff=lfs merge=lfs -text
*.xz filter=lfs diff=lfs merge=lfs -text
*.zip filter=lfs diff=lfs merge=lfs -text
*.zst filter=lfs diff=lfs merge=lfs -text
*tfevents* filter=lfs diff=lfs merge=lfs -text
cann-small-couple.png filter=lfs diff=lfs merge=lfs -text
cann-small-hf-ofice.png filter=lfs diff=lfs merge=lfs -text
cann-small-megatron.png filter=lfs diff=lfs merge=lfs -text
cann-small-woman.png filter=lfs diff=lfs merge=lfs -text
hug_lab_grid.png filter=lfs diff=lfs merge=lfs -text

108
README.md Normal file
View File

@ -0,0 +1,108 @@
---
license: openrail++
base_model: stabilityai/stable-diffusion-xl-base-1.0
tags:
- stable-diffusion-xl
- stable-diffusion-xl-diffusers
- text-to-image
- diffusers
- controlnet
inference: false
---
# Small SDXL-controlnet: Canny
These are small controlnet weights trained on stabilityai/stable-diffusion-xl-base-1.0 with canny conditioning. This checkpoint is 7x smaller than the original XL controlnet checkpoint.
You can find some example images in the following.
prompt: aerial view, a futuristic research complex in a bright foggy jungle, hard lighting
![images_0)](./cann-small-hf-ofice.png)
prompt: a woman, close up, detailed, beautiful, street photography, photorealistic, detailed, Kodak ektar 100, natural, candid shot
![images_1)](./cann-small-woman.png)
prompt: megatron in an apocalyptic world ground, runied city in the background, photorealistic
![images_2)](./cann-small-megatron.png)
prompt: a couple watching sunset, 4k photo
![images_3)](./cann-small-couple.png)
## Usage
Make sure to first install the libraries:
```bash
pip install accelerate transformers safetensors opencv-python diffusers
```
And then we're ready to go:
```python
from diffusers import ControlNetModel, StableDiffusionXLControlNetPipeline, AutoencoderKL
from diffusers.utils import load_image
from PIL import Image
import torch
import numpy as np
import cv2
prompt = "aerial view, a futuristic research complex in a bright foggy jungle, hard lighting"
negative_prompt = "low quality, bad quality, sketches"
image = load_image("https://huggingface.co/datasets/hf-internal-testing/diffusers-images/resolve/main/sd_controlnet/hf-logo.png")
controlnet_conditioning_scale = 0.5 # recommended for good generalization
controlnet = ControlNetModel.from_pretrained(
"diffusers/controlnet-canny-sdxl-1.0-small",
torch_dtype=torch.float16
)
vae = AutoencoderKL.from_pretrained("madebyollin/sdxl-vae-fp16-fix", torch_dtype=torch.float16)
pipe = StableDiffusionXLControlNetPipeline.from_pretrained(
"stabilityai/stable-diffusion-xl-base-1.0",
controlnet=controlnet,
vae=vae,
torch_dtype=torch.float16,
)
pipe.enable_model_cpu_offload()
image = np.array(image)
image = cv2.Canny(image, 100, 200)
image = image[:, :, None]
image = np.concatenate([image, image, image], axis=2)
image = Image.fromarray(image)
images = pipe(
prompt, negative_prompt=negative_prompt, image=image, controlnet_conditioning_scale=controlnet_conditioning_scale,
).images
images[0].save(f"hug_lab.png")
```
![hug_lab_grid)](./hug_lab_grid.png)
To more details, check out the official documentation of [`StableDiffusionXLControlNetPipeline`](https://huggingface.co/docs/diffusers/main/en/api/pipelines/controlnet_sdxl).
🚨 Please note that this checkpoint is experimental and there's a lot of room for improvement. We encourage the community to build on top of it, improve it, and provide us with feedback. 🚨
### Training
Our training script was built on top of the official training script that we provide [here](https://github.com/huggingface/diffusers/blob/main/examples/controlnet/README_sdxl.md).
You can refer to [this script](https://github.com/huggingface/diffusers/blob/7b93c2a882d8e12209fbaeffa51ee2b599ab5349/examples/research_projects/controlnet/train_controlnet_webdataset.py) for full discolsure.
* This checkpoint does not perform distillation. We just use a smaller ControlNet initialized from the SDXL UNet. We
encourage the community to try and conduct distillation too. This resource might be of help in [this regard](https://huggingface.co/blog/sd_distillation).
* To learn more about how the ControlNet was initialized, refer to [this code block](https://github.com/huggingface/diffusers/blob/7b93c2a882d8e12209fbaeffa51ee2b599ab5349/examples/research_projects/controlnet/train_controlnet_webdataset.py#L981C1-L999C36).
* It does not have any attention blocks.
* The model works pretty good on most conditioning images. But for more complex conditionings, the bigger checkpoints might be better. We are still working on improving the quality of this checkpoint and looking for feedback from the community.
* We recommend playing around with the `controlnet_conditioning_scale` and `guidance_scale` arguments for potentially better
image generation quality.
#### Training data
The model was trained on 3M images from LAION aesthetic 6 plus subset, with batch size of 256 for 50k steps with constant learning rate of 3e-5.
#### Compute
One 8xA100 machine
#### Mixed precision
FP16

BIN
cann-small-couple.png (Stored with Git LFS) Normal file

Binary file not shown.

BIN
cann-small-hf-ofice.png (Stored with Git LFS) Normal file

Binary file not shown.

BIN
cann-small-megatron.png (Stored with Git LFS) Normal file

Binary file not shown.

BIN
cann-small-woman.png (Stored with Git LFS) Normal file

Binary file not shown.

57
config.json Normal file
View File

@ -0,0 +1,57 @@
{
"_class_name": "ControlNetModel",
"_diffusers_version": "0.20.0.dev0",
"_name_or_path": "valhalla/c-n-a-fixed",
"act_fn": "silu",
"addition_embed_type": "text_time",
"addition_embed_type_num_heads": 64,
"addition_time_embed_dim": 256,
"attention_head_dim": [
5,
10,
20
],
"block_out_channels": [
320,
640,
1280
],
"class_embed_type": null,
"conditioning_channels": 3,
"conditioning_embedding_out_channels": [
16,
32,
96,
256
],
"controlnet_conditioning_channel_order": "rgb",
"cross_attention_dim": 2048,
"down_block_types": [
"DownBlock2D",
"DownBlock2D",
"DownBlock2D"
],
"downsample_padding": 1,
"encoder_hid_dim": null,
"encoder_hid_dim_type": null,
"flip_sin_to_cos": true,
"freq_shift": 0,
"global_pool_conditions": false,
"in_channels": 4,
"layers_per_block": 2,
"mid_block_scale_factor": 1,
"norm_eps": 1e-05,
"norm_num_groups": 32,
"num_attention_heads": null,
"num_class_embeds": null,
"only_cross_attention": false,
"projection_class_embeddings_input_dim": 2816,
"resnet_time_scale_shift": "default",
"transformer_layers_per_block": [
0,
0,
0
],
"upcast_attention": null,
"use_linear_projection": true
}

BIN
diffusion_pytorch_model.bin (Stored with Git LFS) Normal file

Binary file not shown.

BIN
diffusion_pytorch_model.fp16.bin (Stored with Git LFS) Normal file

Binary file not shown.

BIN
diffusion_pytorch_model.fp16.safetensors (Stored with Git LFS) Normal file

Binary file not shown.

BIN
diffusion_pytorch_model.safetensors (Stored with Git LFS) Normal file

Binary file not shown.

BIN
hug_lab_grid.png (Stored with Git LFS) Normal file

Binary file not shown.