first commit

Charles95 2024-10-21 03:29:15 +00:00
commit 4e9a8a796a
19 changed files with 659 additions and 0 deletions

38
.gitattributes vendored Normal file

@ -0,0 +1,38 @@
*.7z filter=lfs diff=lfs merge=lfs -text
*.arrow filter=lfs diff=lfs merge=lfs -text
*.bin filter=lfs diff=lfs merge=lfs -text
*.bz2 filter=lfs diff=lfs merge=lfs -text
*.ckpt filter=lfs diff=lfs merge=lfs -text
*.ftz filter=lfs diff=lfs merge=lfs -text
*.gz filter=lfs diff=lfs merge=lfs -text
*.h5 filter=lfs diff=lfs merge=lfs -text
*.joblib filter=lfs diff=lfs merge=lfs -text
*.lfs.* filter=lfs diff=lfs merge=lfs -text
*.mlmodel filter=lfs diff=lfs merge=lfs -text
*.model filter=lfs diff=lfs merge=lfs -text
*.msgpack filter=lfs diff=lfs merge=lfs -text
*.npy filter=lfs diff=lfs merge=lfs -text
*.npz filter=lfs diff=lfs merge=lfs -text
*.onnx filter=lfs diff=lfs merge=lfs -text
*.ot filter=lfs diff=lfs merge=lfs -text
*.parquet filter=lfs diff=lfs merge=lfs -text
*.pb filter=lfs diff=lfs merge=lfs -text
*.pickle filter=lfs diff=lfs merge=lfs -text
*.pkl filter=lfs diff=lfs merge=lfs -text
*.pt filter=lfs diff=lfs merge=lfs -text
*.pth filter=lfs diff=lfs merge=lfs -text
*.rar filter=lfs diff=lfs merge=lfs -text
*.safetensors filter=lfs diff=lfs merge=lfs -text
saved_model/**/* filter=lfs diff=lfs merge=lfs -text
*.tar.* filter=lfs diff=lfs merge=lfs -text
*.tar filter=lfs diff=lfs merge=lfs -text
*.tflite filter=lfs diff=lfs merge=lfs -text
*.tgz filter=lfs diff=lfs merge=lfs -text
*.wasm filter=lfs diff=lfs merge=lfs -text
*.xz filter=lfs diff=lfs merge=lfs -text
*.zip filter=lfs diff=lfs merge=lfs -text
*.zst filter=lfs diff=lfs merge=lfs -text
*tfevents* filter=lfs diff=lfs merge=lfs -text
tokenizer/vocab.txt filter=lfs diff=lfs merge=lfs -text
imgs/head_final3.png filter=lfs diff=lfs merge=lfs -text
imgs/t2i.png filter=lfs diff=lfs merge=lfs -text
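
The patterns above decide which paths are stored through Git LFS. As a rough illustration, the sketch below approximates that matching in Python with `fnmatch`. It assumes the usual gitattributes rule that a pattern without a slash is tested against the file's basename, while a pattern with a slash is tested against the path relative to the repository root. The helper `is_lfs_tracked` and the shortened pattern list are hypothetical, and `**` patterns such as `saved_model/**/*` are deliberately left out because `fnmatch` does not implement them.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# A subset of the 38 patterns from the .gitattributes file above.
LFS_PATTERNS = [
    "*.safetensors",
    "*.bin",
    "*.tar.*",
    "*tfevents*",
    "tokenizer/vocab.txt",
    "imgs/t2i.png",
]


def is_lfs_tracked(path: str, patterns=LFS_PATTERNS) -> bool:
    """Approximate check of whether `path` matches one of the LFS patterns.

    Patterns without a slash are compared with the basename only;
    patterns with a slash are compared with the full repository-relative path.
    """
    name = PurePosixPath(path).name
    for pattern in patterns:
        target = path if "/" in pattern else name
        if fnmatchcase(target, pattern):
            return True
    return False


if __name__ == "__main__":
    for candidate in [
        "text_encoder/model.fp16-00001-of-00003.safetensors",
        "imgs/t2i.png",
        "model_index.json",
    ]:
        print(candidate, is_lfs_tracked(candidate))
```

For authoritative answers on a real checkout, `git check-attr filter -- <path>` reports the attribute Git itself resolves.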

93
MODEL_LICENSE Normal file

@ -0,0 +1,93 @@
模型许可协议
模型发布日期2024/7/6
通过点击同意或使用、复制、修改、分发、表演或展示模型作品的任何部分或元素,您将被视为已承认并接受本协议的内容,本协议立即生效。
1.定义。
a. “协议”指本协议中所规定的使用、复制、分发、修改、表演和展示模型作品或其任何部分或元素的条款和条件。
b. “材料”是指根据本协议提供的专有的模型和文档(及其任何部分)的统称。
c. “模型”指大型语言模型、图像/视频/音频/3D 生成模型、多模态大型语言模型及其软件和算法,包括训练后的模型权重、参数(包括优化器状态)、机器学习模型代码、推理支持代码、训练支持代码、微调支持代码以及我们公开提供的前述其他元素。
d. “输出”是指通过操作或以其他方式使用模型或模型衍生品而产生的模型或模型衍生品的信息和/或内容输出。
e. “模型衍生品”包括:(i)对模型或任何模型衍生物的修改;(ii)基于模型的任何模型衍生物的作品;或(iii)通过将模型或模型的任何模型衍生物的权重、参数、操作或输出的模式转移到该模型而创建的任何其他机器学习模型,以使该模型的性能类似于模型或模型衍生物。为清楚起见,输出本身不被视为模型衍生物。
f. “模型作品”包括:(i)材料;(ii)模型衍生品;及(iii)其所有衍生作品。
g. “许可人”或“我们”指作品所有者或作品所有者授权的授予许可的实体,包括可能对模型和/或分发模型拥有权利的个人或实体。
h.“被许可人”、“您”或“您的”是指行使本协议授予的权利和/或为任何目的和在任何使用领域使用模型作品的自然人或法人实体。
i.“第三方”是指不受我们或您共同控制的个人或法人实体。
2. 许可内容。
a.我们授予您非排他性的、全球性的、不可转让的、免版税的许可(在我们的知识产权或我们拥有的体现在材料中或利用材料的其他权利的范围内),允许您仅根据本协议的条款使用、复制、分发、创作衍生作品(包括模型衍生品)和对材料进行修改,并且您不得违反(或鼓励、或允许任何其他人违反)本协议的任何条款。
b.在遵守本协议的前提下,您可以分发或向第三方提供模型作品,您须满足以下条件:
i您必须向所有该模型作品或使用该作品的产品或服务的任何第三方接收者提供模型作品的来源和本协议的副本
ii您必须在任何修改过的文档上附加明显的声明说明您更改了这些文档
iii您可以在您的修改中添加您自己的版权声明并且在您对该作品的使用、复制、修改、分发、表演和展示符合本协议的条款和条件的前提下您可以为您的修改或任何此类模型衍生品的使用、复制或分发提供额外或不同的许可条款和条件。
c. 附加商业条款: 若您希望将模型及模型衍生品用作商业用途,则您必须向许可人申请许可,许可人可自行决定向您授予许可。除非许可人另行明确授予您该等权利,否则您无权行使本协议项下的任何权利。
3.使用限制。
a. 您对本模型作品的使用必须遵守适用法律法规(包括贸易合规法律法规),并遵守《服务协议》(https://kolors.kuaishou.com/agreement)。您必须将本第 3(a) 和 3(b) 条中提及的使用限制作为可执行条款纳入任何规范本模型作品使用和/或分发的协议(例如许可协议、使用条款等),并且您必须向您分发的后续用户发出通知,告知其本模型作品受本第 3(a) 和 3(b) 条中的使用限制约束。
b. 您不得使用本模型作品或本模型作品的任何输出或成果来改进任何其他模型(本模型或其模型衍生品除外)。
4.知识产权。
a. 我们保留模型的所有权及其相关知识产权。在遵守本协议条款和条件的前提下,对于您制作的材料的任何衍生作品和修改,您是且将是此类衍生作品和修改的所有者。
b. 本协议不授予任何商标、商号、服务标记或产品名称的标识许可,除非出于描述和分发本模型作品的合理和惯常用途。
c. 如果您对我们或任何个人或实体提起诉讼或其他程序(包括诉讼中的交叉索赔或反索赔),声称材料或任何输出或任何上述内容的任何部分侵犯您拥有或可许可的任何知识产权或其他权利,则根据本协议授予您的所有许可应于提起此类诉讼或其他程序之日起终止。
5. 免责声明和责任限制。
a. 本模型作品及其任何输出和结果按“原样”提供,不作任何明示或暗示的保证,包括适销性、非侵权性或适用于特定用途的保证。我们不对材料及其任何输出的安全性或稳定性作任何保证,也不承担任何责任。
b. 在任何情况下,我们均不对您承担任何损害赔偿责任,包括但不限于因您使用或无法使用材料或其任何输出而造成的任何直接、间接、特殊或后果性损害赔偿责任,无论该损害赔偿责任是如何造成的。
6. 存续和终止。
a. 本协议期限自您接受本协议或访问材料之日起开始,并将持续完全有效,直至根据本协议条款和条件终止。
b. 如果您违反本协议的任何条款或条件,我们可终止本协议。本协议终止后,您必须立即删除并停止使用本模型作品。第 4(a)、4(c)、5和 7 条在本协议终止后仍然有效。
7. 适用法律和管辖权。
a. 本协议及由本协议引起的或与本协议有关的任何争议均受中华人民共和国大陆地区(仅为本协议目的,不包括香港、澳门和台湾)法律管辖,并排除冲突法的适用,且《联合国国际货物销售合同公约》不适用于本协议。
b. 因本协议引起或与本协议有关的任何争议,由许可人住所地人民法院管辖。
请注意,许可证可能会更新到更全面的版本。 有关许可和版权的任何问题,请通过 kwai-kolors@kuaishou.com 与我们联系。
英文版
MODEL LICENSE AGREEMENT
Release Date: 2024/7/6
By clicking to agree or by using, reproducing, modifying, distributing, performing or displaying any portion or element of the Model Works, You will be deemed to have recognized and accepted the content of this Agreement, which is effective immediately.
1. DEFINITIONS.
a. “Agreement” shall mean the terms and conditions for use, reproduction, distribution, modification, performance and displaying of the Model Works or any portion or element thereof set forth herein.
b. “Materials” shall mean, collectively, Our proprietary Model and Documentation (and any portion thereof) as made available by Us under this Agreement.
c. “Model” shall mean the large language models, image/video/audio/3D generation models, and multimodal large language models and their software and algorithms, including trained model weights, parameters (including optimizer states), machine-learning model code, inference-enabling code, training-enabling code, fine-tuning enabling code and other elements of the foregoing made publicly available by Us .
d. “Output” shall mean the information and/or content output of Model or a Model Derivative that results from operating or otherwise using Model or a Model Derivative.
e. “Model Derivatives” shall mean all: (i) modifications to the Model or any Model Derivative; (ii) works based on the Model or any Model Derivative; or (iii) any other machine learning model which is created by transfer of patterns of the weights, parameters, operations, or Output of the Model or any Model Derivative, to that model in order to cause that model to perform similarly to the Model or a Model Derivative, including distillation methods, methods that use intermediate data representations, or methods based on the generation of synthetic data Outputs or a Model Derivative for training that model. For clarity, Outputs by themselves are not deemed Model Derivatives.
f. “Model Works” shall mean: (i) the Materials; (ii) Model Derivatives; and (iii) all derivative works thereof.
g. “Licensor” , “We” or “Us” shall mean the copyright owner or entity authorized by the copyright owner that is granting the License, including the persons or entities that may have rights in the Model and/or distributing the Model.
h. “Licensee”, “You” or “Your” shall mean a natural person or legal entity exercising the rights granted by this Agreement and/or using the Model Works for any purpose and in any field of use.
i. “Third Party” or “Third Parties” shall mean individuals or legal entities that are not under common control with Us or You.
2. LICENSE CONTENT.
a. We grant You a non-exclusive, worldwide, non-transferable and royalty-free limited license under the intellectual property or other rights owned by Us embodied in or utilized by the Materials to use, reproduce, distribute, create derivative works of (including Model Derivatives), and make modifications to the Materials, only in accordance with the terms of this Agreement and the Acceptable Use Policy, and You must not violate (or encourage or permit anyone else to violate) any term of this Agreement or the Acceptable Use Policy.
b. You may, subject to Your compliance with this Agreement, distribute or make available to Third Parties the Model Works, provided that You meet all of the following conditions:
(i) You must provide all such Third Party recipients of the Model Works or products or services using them the source of the Model and a copy of this Agreement;
(ii) You must cause any modified documents to carry prominent notices stating that You changed the documents;
(iii) You may add Your own copyright statement to Your modifications and, may provide additional or different license terms and conditions for use, reproduction, or distribution of Your modifications, or for any such Model Derivatives as a whole, provided Your use, reproduction, modification, distribution, performance and display of the work otherwise complies with the terms and conditions of this Agreement.
c. Additional commercial terms: If, on the Model version release date, the monthly active users of all products or services made available by or for Licensee is greater than 100 million monthly active users in the preceding calendar month, or Licensee is a cloud computing platform vendor, You must request a license from Licensor, which Licensor may grant to You in its sole discretion, and You are not authorized to exercise any of the rights under this Agreement unless or until We otherwise expressly grant You such rights.
3. LICENSE RESTRICTIONS.
a. Your use of the Model Works must comply with applicable laws and regulations (including trade compliance laws and regulations) and adhere to the Service Agreement. You must include the use restrictions referenced in these Sections 3(a) and 3(b) as an enforceable provision in any agreement (e.g., license agreement, terms of use, etc.) governing the use and/or distribution of Model Works and You must provide notice to subsequent users to whom You distribute that Model Works are subject to the use restrictions in these Sections 3(a) and 3(b).
b. You must not use the Model Works or any Output or results of the Model Works to improve any other large model (other than Model or Model Derivatives thereof).
4. INTELLECTUAL PROPERTY.
a. We retain ownership of all intellectual property rights in and to the Model and derivatives. Conditioned upon compliance with the terms and conditions of this Agreement, with respect to any derivative works and modifications of the Materials that are made by you, you are and will be the owner of such derivative works and modifications.
b. No trademark license is granted to use the trade names, trademarks, service marks, or product names of Us, except as required to fulfill notice requirements under this Agreement or as required for reasonable and customary use in describing and redistributing the Materials.
c. If You commence a lawsuit or other proceedings (including a cross-claim or counterclaim in a lawsuit) against Us or any person or entity alleging that the Materials or any Output, or any portion of any of the foregoing, infringe any intellectual property or other right owned or licensable by You, then all licenses granted to You under this Agreement shall terminate as of the date such lawsuit or other proceeding is filed.
5. DISCLAIMERS OF WARRANTY AND LIMITATIONS OF LIABILITY.
a. THE MODEL WORKS AND ANY OUTPUT AND RESULTS THEREFROM ARE PROVIDED "AS IS" WITHOUT ANY EXPRESS OR IMPLIED WARRANTY OF ANY KIND INCLUDING WARRANTIES OF MERCHANTABILITY, NONINFRINGEMENT, OR FITNESS FOR A PARTICULAR PURPOSE. WE MAKE NO WARRANTY AND ASSUME NO RESPONSIBILITY FOR THE SAFETY OR STABILITY OF THE MATERIALS AND ANY OUTPUT THEREFROM.
b. IN NO EVENT SHALL WE BE LIABLE TO YOU FOR ANY DAMAGES, INCLUDING, BUT NOT LIMITED TO, ANY DIRECT, INDIRECT, SPECIAL OR CONSEQUENTIAL DAMAGES ARISING FROM YOUR USE OR INABILITY TO USE THE MATERIALS OR ANY OUTPUT OF THEM, NO MATTER HOW THEY ARE CAUSED.
c. You will defend, indemnify and hold harmless Us from and against any claim by any third party arising out of or related to your use or distribution of the Materials.
6. SURVIVAL AND TERMINATION.
a. The term of this Agreement shall commence upon Your acceptance of this Agreement or access to the Materials and will continue in full force and effect until terminated in accordance with the terms and conditions herein.
b. We may terminate this Agreement if You breach any of the terms or conditions of this Agreement. Upon termination of this Agreement, You must promptly delete and cease use of the Model Works. Sections 4(a), 4(c), 5 and 7 shall survive the termination of this Agreement.
7. GOVERNING LAW AND JURISDICTION.
a. This Agreement and any dispute arising out of or relating to it will be governed by the laws of China (for the purpose of this agreement only, excluding Hong Kong, Macau, and Taiwan), without regard to conflict of law principles, and the UN Convention on Contracts for the International Sale of Goods does not apply to this Agreement.
b. Any disputes arising from or related to this Agreement shall be under the jurisdiction of the People's Court where the Licensor is located.
Note that the license is subject to update to a more comprehensive version. For any questions related to the license and copyright, please contact us at kwai-kolors@kuaishou.com.

85
README.md Normal file

@ -0,0 +1,85 @@
---
license: apache-2.0
language:
- zh
- en
tags:
- Kolors
---
# Kolors: Effective Training of Diffusion Model for Photorealistic Text-to-Image Synthesis
<div align="center" style="display: flex; justify-content: center; flex-wrap: wrap;">
<a href="https://github.com/Kwai-Kolors/Kolors"><img src="https://img.shields.io/static/v1?label=Kolors Code&message=Github&color=blue&logo=github-pages"></a> &ensp;
<a href="https://kwai-kolors.github.io/"><img src="https://img.shields.io/static/v1?label=Team%20Page&message=Page&color=green"></a> &ensp;
<a href="https://github.com/Kwai-Kolors/Kolors/blob/master/imgs/Kolors_paper.pdf"><img src="https://img.shields.io/static/v1?label=Tech Report&message=Arxiv:Kolors&color=red&logo=arxiv"></a> &ensp;
<a href="https://kolors.kuaishou.com/"><img src="https://img.shields.io/static/v1?label=Official Website&message=Page&color=green"></a>
</div>
<figure>
<img src="imgs/head_final3.png">
</figure>
## 📖 Introduction
Kolors is a large-scale text-to-image generation model based on latent diffusion, developed by the Kuaishou Kolors team. Trained on billions of text-image pairs, Kolors exhibits significant advantages over both open-source and proprietary models in visual quality, complex semantic accuracy, and text rendering for both Chinese and English characters. Furthermore, Kolors supports both Chinese and English inputs, demonstrating strong performance in understanding and generating Chinese-specific content. For more details, please refer to this <a href="https://github.com/Kwai-Kolors/Kolors/blob/master/imgs/Kolors_paper.pdf">technical report</a>.
## 🚀 Quick Start
### Using with Diffusers
Make sure you install the latest development version of diffusers (0.30.0.dev0) from source:
```
git clone https://github.com/huggingface/diffusers
cd diffusers
python3 setup.py install
```
**Notes:**
- The pipeline uses the `EulerDiscreteScheduler` by default. We recommend using this scheduler with `guidance_scale=5.0` and `num_inference_steps=50`.
- The pipeline also supports the `EDMDPMSolverMultistepScheduler`. `guidance_scale=5.0` and `num_inference_steps=25` are good defaults for this scheduler.
- In addition to text-to-image generation with `KolorsPipeline`, image-to-image generation is supported through `KolorsImg2ImgPipeline`.
You can then run:
```python
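import torch
from diffusers import KolorsPipeline

# Load the fp16 weights and move the pipeline to the GPU.
pipe = KolorsPipeline.from_pretrained(
    "Kwai-Kolors/Kolors-diffusers",
    torch_dtype=torch.float16,
    variant="fp16",
).to("cuda")

# Prompt (Chinese): "A photo of a ladybug, macro, zoom, high quality, cinematic,
# holding a sign that reads '可图'" (可图 is the Chinese name of Kolors).
prompt = '一张瓢虫的照片,微距,变焦,高质量,电影,拿着一个牌子,写着"可图"'
image = pipe(
    prompt=prompt,
    negative_prompt="",
    guidance_scale=5.0,
    num_inference_steps=50,
    # Fixed seed so the result is reproducible.
    generator=torch.Generator(pipe.device).manual_seed(66),
).images[0]
image.show()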
```
## 📜 License&Citation
### License
Kolors is fully open-sourced for academic research. For commercial use, please fill out this [questionnaire](https://github.com/Kwai-Kolors/Kolors/blob/master/imgs/可图KOLORS模型商业授权申请书.docx) and send it to kwai-kolors@kuaishou.com for registration.
We open-source Kolors to promote the development of large text-to-image models in collaboration with the open-source community. The code of this project is open-sourced under the Apache-2.0 license. We sincerely urge all developers and users to strictly adhere to the [open-source license](MODEL_LICENSE), and to avoid using the open-source model, code, and its derivatives for any purposes that may harm the country and society or for any services not evaluated and registered for safety. Note that despite our best efforts to ensure the compliance, accuracy, and safety of the data during training, the diversity and combinability of generated content and the probabilistic randomness affecting the model mean that we cannot guarantee the accuracy and safety of the output content, and the model is susceptible to being misled. This project does not assume any legal responsibility for any data security issues, public opinion risks, or risks and liabilities arising from the model being misled, abused, misused, or improperly utilized due to the use of the open-source model and code.
### Citation
If you find our work helpful, please cite it!
```
@article{kolors,
title={Kolors: Effective Training of Diffusion Model for Photorealistic Text-to-Image Synthesis},
author={Kolors Team},
journal={arXiv preprint},
year={2024}
}
```
### Acknowledgments
- Thanks to [Diffusers](https://github.com/huggingface/diffusers) for providing the codebase.
- Thanks to [ChatGLM3](https://github.com/THUDM/ChatGLM3) for providing the powerful Chinese language model.
### Contact Us
If you want to leave a message for our R&D team and product team, feel free to join our [WeChat group](https://github.com/Kwai-Kolors/Kolors/blob/main/imgs/wechat.png). You can also contact us via email (kwai-kolors@kuaishou.com).

BIN
imgs/head_final3.png (Stored with Git LFS) Normal file

Binary file not shown.

BIN
imgs/t2i.png (Stored with Git LFS) Normal file

Binary file not shown.

25
model_index.json Normal file

@ -0,0 +1,25 @@
{
"_class_name": "KolorsPipeline",
"_diffusers_version": "0.30.0.dev0",
"force_zeros_for_empty_prompt": false,
"scheduler": [
"diffusers",
"EulerDiscreteScheduler"
],
"text_encoder": [
"kolors",
"ChatGLMModel"
],
"tokenizer": [
"kolors",
"ChatGLMTokenizer"
],
"unet": [
"diffusers",
"UNet2DConditionModel"
],
"vae": [
"diffusers",
"AutoencoderKL"
]
}
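
To make the layout of `model_index.json` concrete, the following minimal sketch parses the file content shown above and lists each pipeline component with the library and class it names. It assumes that components are exactly the entries whose value is a two-element list, and that underscore-prefixed keys and plain flags are metadata. The helper `list_components` is hypothetical and is not part of diffusers.

```python
import json

# Content of model_index.json as added in this commit.
MODEL_INDEX = """
{
  "_class_name": "KolorsPipeline",
  "_diffusers_version": "0.30.0.dev0",
  "force_zeros_for_empty_prompt": false,
  "scheduler": ["diffusers", "EulerDiscreteScheduler"],
  "text_encoder": ["kolors", "ChatGLMModel"],
  "tokenizer": ["kolors", "ChatGLMTokenizer"],
  "unet": ["diffusers", "UNet2DConditionModel"],
  "vae": ["diffusers", "AutoencoderKL"]
}
"""


def list_components(index_text: str) -> dict:
    """Return {component_name: (library, class_name)} for every component entry."""
    index = json.loads(index_text)
    return {
        name: tuple(value)
        for name, value in index.items()
        if isinstance(value, list) and len(value) == 2
    }


if __name__ == "__main__":
    for name, (library, class_name) in list_components(MODEL_INDEX).items():
        print(f"{name}: {library}.{class_name}")
```

Each component name also corresponds to a subfolder in the repository, such as `text_encoder/`, which holds that component's own config and weights.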


@ -0,0 +1,22 @@
{
"_class_name": "EulerDiscreteScheduler",
"_diffusers_version": "0.18.0.dev0",
"beta_schedule": "scaled_linear",
"beta_start": 0.00085,
"beta_end": 0.014,
"clip_sample": false,
"clip_sample_range": 1.0,
"dynamic_thresholding_ratio": 0.995,
"interpolation_type": "linear",
"num_train_timesteps": 1100,
"prediction_type": "epsilon",
"rescale_betas_zero_snr": false,
"sample_max_value": 1.0,
"set_alpha_to_one": false,
"skip_prk_steps": true,
"steps_offset": 1,
"thresholding": false,
"timestep_spacing": "leading",
"trained_betas": null,
"use_karras_sigmas": false
}
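
As an illustration of what the `beta_schedule`, `beta_start`, `beta_end`, and `num_train_timesteps` values above imply, the sketch below rebuilds the noise schedule in plain Python. It assumes the common definition of `scaled_linear` (betas linearly spaced in square-root space, then squared) and the usual conversion from cumulative alphas to sigmas, sigma_t = sqrt((1 - alpha_bar_t) / alpha_bar_t). The function names are hypothetical, and this is an approximation of the scheduler's setup rather than its actual code.

```python
import math


def scaled_linear_betas(beta_start: float, beta_end: float, num_train_timesteps: int) -> list:
    """Betas spaced linearly between sqrt(beta_start) and sqrt(beta_end), then squared."""
    low, high = math.sqrt(beta_start), math.sqrt(beta_end)
    step = (high - low) / (num_train_timesteps - 1)
    return [(low + i * step) ** 2 for i in range(num_train_timesteps)]


def sigmas_from_betas(betas: list) -> list:
    """Convert betas to sigmas via the cumulative product of alphas = 1 - beta."""
    sigmas = []
    alpha_cumprod = 1.0
    for beta in betas:
        alpha_cumprod *= 1.0 - beta
        sigmas.append(math.sqrt((1.0 - alpha_cumprod) / alpha_cumprod))
    return sigmas


# Values taken from the scheduler config above.
betas = scaled_linear_betas(beta_start=0.00085, beta_end=0.014, num_train_timesteps=1100)
sigmas = sigmas_from_betas(betas)

if __name__ == "__main__":
    print(f"first beta:  {betas[0]:.6f}")
    print(f"last beta:   {betas[-1]:.6f}")
    print(f"first sigma: {sigmas[0]:.4f}")
    print(f"last sigma:  {sigmas[-1]:.4f}")
```

Because each beta is positive, the cumulative alpha shrinks at every step, so the sigmas grow monotonically from the first training timestep to the last.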

47
text_encoder/config.json Normal file

@ -0,0 +1,47 @@
{
"_name_or_path": "models/kolors",
"add_bias_linear": false,
"add_qkv_bias": true,
"apply_query_key_layer_scaling": true,
"apply_residual_connection_post_layernorm": false,
"architectures": [
"ChatGLMModel"
],
"attention_dropout": 0.0,
"attention_softmax_in_fp32": true,
"auto_map": {
"AutoConfig": "configuration_chatglm.ChatGLMConfig",
"AutoModel": "modeling_chatglm.ChatGLMForConditionalGeneration",
"AutoModelForCausalLM": "modeling_chatglm.ChatGLMForConditionalGeneration",
"AutoModelForSeq2SeqLM": "modeling_chatglm.ChatGLMForConditionalGeneration",
"AutoModelForSequenceClassification": "modeling_chatglm.ChatGLMForSequenceClassification"
},
"bias_dropout_fusion": true,
"classifier_dropout": null,
"eos_token_id": 2,
"ffn_hidden_size": 13696,
"fp32_residual_connection": false,
"hidden_dropout": 0.0,
"hidden_size": 4096,
"kv_channels": 128,
"layernorm_epsilon": 1e-05,
"model_type": "chatglm",
"multi_query_attention": true,
"multi_query_group_num": 2,
"num_attention_heads": 32,
"num_layers": 28,
"original_rope": true,
"pad_token_id": 0,
"padded_vocab_size": 65024,
"post_layer_norm": true,
"pre_seq_len": null,
"prefix_projection": false,
"quantization_bit": 0,
"rmsnorm": true,
"seq_length": 32768,
"tie_word_embeddings": false,
"torch_dtype": "float16",
"transformers_version": "4.41.2",
"use_cache": true,
"vocab_size": 65024
}
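
The attention settings in this config interact in a way that is easy to misread, so here is a small sketch that derives the projection sizes from them. It assumes the ChatGLM-style multi-query layout, in which `kv_channels` is the per-head dimension, queries use `num_attention_heads` heads, and keys and values share `multi_query_group_num` groups inside one fused `query_key_value` projection. The helper `attention_shapes` is hypothetical and only reproduces the arithmetic.

```python
# Relevant values copied from text_encoder/config.json above.
CONFIG = {
    "hidden_size": 4096,
    "kv_channels": 128,
    "num_attention_heads": 32,
    "multi_query_attention": True,
    "multi_query_group_num": 2,
}


def attention_shapes(cfg: dict) -> dict:
    """Derive query/key/value widths and the fused QKV output width."""
    head_dim = cfg["kv_channels"]
    query_size = cfg["num_attention_heads"] * head_dim
    if cfg["multi_query_attention"]:
        # Keys and values are shared across groups of query heads.
        kv_size = cfg["multi_query_group_num"] * head_dim
    else:
        kv_size = query_size
    return {
        "head_dim": head_dim,
        "query": query_size,
        "key": kv_size,
        "value": kv_size,
        "qkv_out": query_size + 2 * kv_size,
    }


if __name__ == "__main__":
    for name, size in attention_shapes(CONFIG).items():
        print(f"{name}: {size}")
```

Under these assumptions the fused projection is much narrower than the three full-width projections of standard multi-head attention, which is the main saving multi-query attention provides.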

BIN
text_encoder/model.fp16-00001-of-00003.safetensors (Stored with Git LFS) Normal file

Binary file not shown.

BIN
text_encoder/model.fp16-00002-of-00003.safetensors (Stored with Git LFS) Normal file

Binary file not shown.

BIN
text_encoder/model.fp16-00003-of-00003.safetensors (Stored with Git LFS) Normal file

Binary file not shown.


@ -0,0 +1,207 @@
{
"metadata": {
"total_size": 12487168064
},
"weight_map": {
"embedding.word_embeddings.weight": "model.fp16-00001-of-00003.safetensors",
"encoder.final_layernorm.weight": "model.fp16-00003-of-00003.safetensors",
"encoder.layers.0.input_layernorm.weight": "model.fp16-00001-of-00003.safetensors",
"encoder.layers.0.mlp.dense_4h_to_h.weight": "model.fp16-00001-of-00003.safetensors",
"encoder.layers.0.mlp.dense_h_to_4h.weight": "model.fp16-00001-of-00003.safetensors",
"encoder.layers.0.post_attention_layernorm.weight": "model.fp16-00001-of-00003.safetensors",
"encoder.layers.0.self_attention.dense.weight": "model.fp16-00001-of-00003.safetensors",
"encoder.layers.0.self_attention.query_key_value.bias": "model.fp16-00001-of-00003.safetensors",
"encoder.layers.0.self_attention.query_key_value.weight": "model.fp16-00001-of-00003.safetensors",
"encoder.layers.1.input_layernorm.weight": "model.fp16-00001-of-00003.safetensors",
"encoder.layers.1.mlp.dense_4h_to_h.weight": "model.fp16-00001-of-00003.safetensors",
"encoder.layers.1.mlp.dense_h_to_4h.weight": "model.fp16-00001-of-00003.safetensors",
"encoder.layers.1.post_attention_layernorm.weight": "model.fp16-00001-of-00003.safetensors",
"encoder.layers.1.self_attention.dense.weight": "model.fp16-00001-of-00003.safetensors",
"encoder.layers.1.self_attention.query_key_value.bias": "model.fp16-00001-of-00003.safetensors",
"encoder.layers.1.self_attention.query_key_value.weight": "model.fp16-00001-of-00003.safetensors",
"encoder.layers.10.input_layernorm.weight": "model.fp16-00001-of-00003.safetensors",
"encoder.layers.10.mlp.dense_4h_to_h.weight": "model.fp16-00002-of-00003.safetensors",
"encoder.layers.10.mlp.dense_h_to_4h.weight": "model.fp16-00001-of-00003.safetensors",
"encoder.layers.10.post_attention_layernorm.weight": "model.fp16-00001-of-00003.safetensors",
"encoder.layers.10.self_attention.dense.weight": "model.fp16-00001-of-00003.safetensors",
"encoder.layers.10.self_attention.query_key_value.bias": "model.fp16-00001-of-00003.safetensors",
"encoder.layers.10.self_attention.query_key_value.weight": "model.fp16-00001-of-00003.safetensors",
"encoder.layers.11.input_layernorm.weight": "model.fp16-00002-of-00003.safetensors",
"encoder.layers.11.mlp.dense_4h_to_h.weight": "model.fp16-00002-of-00003.safetensors",
"encoder.layers.11.mlp.dense_h_to_4h.weight": "model.fp16-00002-of-00003.safetensors",
"encoder.layers.11.post_attention_layernorm.weight": "model.fp16-00002-of-00003.safetensors",
"encoder.layers.11.self_attention.dense.weight": "model.fp16-00002-of-00003.safetensors",
"encoder.layers.11.self_attention.query_key_value.bias": "model.fp16-00002-of-00003.safetensors",
"encoder.layers.11.self_attention.query_key_value.weight": "model.fp16-00002-of-00003.safetensors",
"encoder.layers.12.input_layernorm.weight": "model.fp16-00002-of-00003.safetensors",
"encoder.layers.12.mlp.dense_4h_to_h.weight": "model.fp16-00002-of-00003.safetensors",
"encoder.layers.12.mlp.dense_h_to_4h.weight": "model.fp16-00002-of-00003.safetensors",
"encoder.layers.12.post_attention_layernorm.weight": "model.fp16-00002-of-00003.safetensors",
"encoder.layers.12.self_attention.dense.weight": "model.fp16-00002-of-00003.safetensors",
"encoder.layers.12.self_attention.query_key_value.bias": "model.fp16-00002-of-00003.safetensors",
"encoder.layers.12.self_attention.query_key_value.weight": "model.fp16-00002-of-00003.safetensors",
"encoder.layers.13.input_layernorm.weight": "model.fp16-00002-of-00003.safetensors",
"encoder.layers.13.mlp.dense_4h_to_h.weight": "model.fp16-00002-of-00003.safetensors",
"encoder.layers.13.mlp.dense_h_to_4h.weight": "model.fp16-00002-of-00003.safetensors",
"encoder.layers.13.post_attention_layernorm.weight": "model.fp16-00002-of-00003.safetensors",
"encoder.layers.13.self_attention.dense.weight": "model.fp16-00002-of-00003.safetensors",
"encoder.layers.13.self_attention.query_key_value.bias": "model.fp16-00002-of-00003.safetensors",
"encoder.layers.13.self_attention.query_key_value.weight": "model.fp16-00002-of-00003.safetensors",
"encoder.layers.14.input_layernorm.weight": "model.fp16-00002-of-00003.safetensors",
"encoder.layers.14.mlp.dense_4h_to_h.weight": "model.fp16-00002-of-00003.safetensors",
"encoder.layers.14.mlp.dense_h_to_4h.weight": "model.fp16-00002-of-00003.safetensors",
"encoder.layers.14.post_attention_layernorm.weight": "model.fp16-00002-of-00003.safetensors",
"encoder.layers.14.self_attention.dense.weight": "model.fp16-00002-of-00003.safetensors",
"encoder.layers.14.self_attention.query_key_value.bias": "model.fp16-00002-of-00003.safetensors",
"encoder.layers.14.self_attention.query_key_value.weight": "model.fp16-00002-of-00003.safetensors",
"encoder.layers.15.input_layernorm.weight": "model.fp16-00002-of-00003.safetensors",
"encoder.layers.15.mlp.dense_4h_to_h.weight": "model.fp16-00002-of-00003.safetensors",
"encoder.layers.15.mlp.dense_h_to_4h.weight": "model.fp16-00002-of-00003.safetensors",
"encoder.layers.15.post_attention_layernorm.weight": "model.fp16-00002-of-00003.safetensors",
"encoder.layers.15.self_attention.dense.weight": "model.fp16-00002-of-00003.safetensors",
"encoder.layers.15.self_attention.query_key_value.bias": "model.fp16-00002-of-00003.safetensors",
"encoder.layers.15.self_attention.query_key_value.weight": "model.fp16-00002-of-00003.safetensors",
"encoder.layers.16.input_layernorm.weight": "model.fp16-00002-of-00003.safetensors",
"encoder.layers.16.mlp.dense_4h_to_h.weight": "model.fp16-00002-of-00003.safetensors",
"encoder.layers.16.mlp.dense_h_to_4h.weight": "model.fp16-00002-of-00003.safetensors",
"encoder.layers.16.post_attention_layernorm.weight": "model.fp16-00002-of-00003.safetensors",
"encoder.layers.16.self_attention.dense.weight": "model.fp16-00002-of-00003.safetensors",
"encoder.layers.16.self_attention.query_key_value.bias": "model.fp16-00002-of-00003.safetensors",
"encoder.layers.16.self_attention.query_key_value.weight": "model.fp16-00002-of-00003.safetensors",
"encoder.layers.17.input_layernorm.weight": "model.fp16-00002-of-00003.safetensors",
"encoder.layers.17.mlp.dense_4h_to_h.weight": "model.fp16-00002-of-00003.safetensors",
"encoder.layers.17.mlp.dense_h_to_4h.weight": "model.fp16-00002-of-00003.safetensors",
"encoder.layers.17.post_attention_layernorm.weight": "model.fp16-00002-of-00003.safetensors",
"encoder.layers.17.self_attention.dense.weight": "model.fp16-00002-of-00003.safetensors",
"encoder.layers.17.self_attention.query_key_value.bias": "model.fp16-00002-of-00003.safetensors",
"encoder.layers.17.self_attention.query_key_value.weight": "model.fp16-00002-of-00003.safetensors",
"encoder.layers.18.input_layernorm.weight": "model.fp16-00002-of-00003.safetensors",
"encoder.layers.18.mlp.dense_4h_to_h.weight": "model.fp16-00002-of-00003.safetensors",
"encoder.layers.18.mlp.dense_h_to_4h.weight": "model.fp16-00002-of-00003.safetensors",
"encoder.layers.18.post_attention_layernorm.weight": "model.fp16-00002-of-00003.safetensors",
"encoder.layers.18.self_attention.dense.weight": "model.fp16-00002-of-00003.safetensors",
"encoder.layers.18.self_attention.query_key_value.bias": "model.fp16-00002-of-00003.safetensors",
"encoder.layers.18.self_attention.query_key_value.weight": "model.fp16-00002-of-00003.safetensors",
"encoder.layers.19.input_layernorm.weight": "model.fp16-00002-of-00003.safetensors",
"encoder.layers.19.mlp.dense_4h_to_h.weight": "model.fp16-00002-of-00003.safetensors",
"encoder.layers.19.mlp.dense_h_to_4h.weight": "model.fp16-00002-of-00003.safetensors",
"encoder.layers.19.post_attention_layernorm.weight": "model.fp16-00002-of-00003.safetensors",
"encoder.layers.19.self_attention.dense.weight": "model.fp16-00002-of-00003.safetensors",
"encoder.layers.19.self_attention.query_key_value.bias": "model.fp16-00002-of-00003.safetensors",
"encoder.layers.19.self_attention.query_key_value.weight": "model.fp16-00002-of-00003.safetensors",
"encoder.layers.2.input_layernorm.weight": "model.fp16-00001-of-00003.safetensors",
"encoder.layers.2.mlp.dense_4h_to_h.weight": "model.fp16-00001-of-00003.safetensors",
"encoder.layers.2.mlp.dense_h_to_4h.weight": "model.fp16-00001-of-00003.safetensors",
"encoder.layers.2.post_attention_layernorm.weight": "model.fp16-00001-of-00003.safetensors",
"encoder.layers.2.self_attention.dense.weight": "model.fp16-00001-of-00003.safetensors",
"encoder.layers.2.self_attention.query_key_value.bias": "model.fp16-00001-of-00003.safetensors",
"encoder.layers.2.self_attention.query_key_value.weight": "model.fp16-00001-of-00003.safetensors",
"encoder.layers.20.input_layernorm.weight": "model.fp16-00002-of-00003.safetensors",
"encoder.layers.20.mlp.dense_4h_to_h.weight": "model.fp16-00002-of-00003.safetensors",
"encoder.layers.20.mlp.dense_h_to_4h.weight": "model.fp16-00002-of-00003.safetensors",
"encoder.layers.20.post_attention_layernorm.weight": "model.fp16-00002-of-00003.safetensors",
"encoder.layers.20.self_attention.dense.weight": "model.fp16-00002-of-00003.safetensors",
"encoder.layers.20.self_attention.query_key_value.bias": "model.fp16-00002-of-00003.safetensors",
"encoder.layers.20.self_attention.query_key_value.weight": "model.fp16-00002-of-00003.safetensors",
"encoder.layers.21.input_layernorm.weight": "model.fp16-00002-of-00003.safetensors",
"encoder.layers.21.mlp.dense_4h_to_h.weight": "model.fp16-00002-of-00003.safetensors",
"encoder.layers.21.mlp.dense_h_to_4h.weight": "model.fp16-00002-of-00003.safetensors",
"encoder.layers.21.post_attention_layernorm.weight": "model.fp16-00002-of-00003.safetensors",
"encoder.layers.21.self_attention.dense.weight": "model.fp16-00002-of-00003.safetensors",
"encoder.layers.21.self_attention.query_key_value.bias": "model.fp16-00002-of-00003.safetensors",
"encoder.layers.21.self_attention.query_key_value.weight": "model.fp16-00002-of-00003.safetensors",
"encoder.layers.22.input_layernorm.weight": "model.fp16-00002-of-00003.safetensors",
"encoder.layers.22.mlp.dense_4h_to_h.weight": "model.fp16-00003-of-00003.safetensors",
"encoder.layers.22.mlp.dense_h_to_4h.weight": "model.fp16-00002-of-00003.safetensors",
"encoder.layers.22.post_attention_layernorm.weight": "model.fp16-00002-of-00003.safetensors",
"encoder.layers.22.self_attention.dense.weight": "model.fp16-00002-of-00003.safetensors",
"encoder.layers.22.self_attention.query_key_value.bias": "model.fp16-00002-of-00003.safetensors",
"encoder.layers.22.self_attention.query_key_value.weight": "model.fp16-00002-of-00003.safetensors",
"encoder.layers.23.input_layernorm.weight": "model.fp16-00003-of-00003.safetensors",
"encoder.layers.23.mlp.dense_4h_to_h.weight": "model.fp16-00003-of-00003.safetensors",
"encoder.layers.23.mlp.dense_h_to_4h.weight": "model.fp16-00003-of-00003.safetensors",
"encoder.layers.23.post_attention_layernorm.weight": "model.fp16-00003-of-00003.safetensors",
"encoder.layers.23.self_attention.dense.weight": "model.fp16-00003-of-00003.safetensors",
"encoder.layers.23.self_attention.query_key_value.bias": "model.fp16-00003-of-00003.safetensors",
"encoder.layers.23.self_attention.query_key_value.weight": "model.fp16-00003-of-00003.safetensors",
"encoder.layers.24.input_layernorm.weight": "model.fp16-00003-of-00003.safetensors",
"encoder.layers.24.mlp.dense_4h_to_h.weight": "model.fp16-00003-of-00003.safetensors",
"encoder.layers.24.mlp.dense_h_to_4h.weight": "model.fp16-00003-of-00003.safetensors",
"encoder.layers.24.post_attention_layernorm.weight": "model.fp16-00003-of-00003.safetensors",
"encoder.layers.24.self_attention.dense.weight": "model.fp16-00003-of-00003.safetensors",
"encoder.layers.24.self_attention.query_key_value.bias": "model.fp16-00003-of-00003.safetensors",
"encoder.layers.24.self_attention.query_key_value.weight": "model.fp16-00003-of-00003.safetensors",
"encoder.layers.25.input_layernorm.weight": "model.fp16-00003-of-00003.safetensors",
"encoder.layers.25.mlp.dense_4h_to_h.weight": "model.fp16-00003-of-00003.safetensors",
"encoder.layers.25.mlp.dense_h_to_4h.weight": "model.fp16-00003-of-00003.safetensors",
"encoder.layers.25.post_attention_layernorm.weight": "model.fp16-00003-of-00003.safetensors",
"encoder.layers.25.self_attention.dense.weight": "model.fp16-00003-of-00003.safetensors",
"encoder.layers.25.self_attention.query_key_value.bias": "model.fp16-00003-of-00003.safetensors",
"encoder.layers.25.self_attention.query_key_value.weight": "model.fp16-00003-of-00003.safetensors",
"encoder.layers.26.input_layernorm.weight": "model.fp16-00003-of-00003.safetensors",
"encoder.layers.26.mlp.dense_4h_to_h.weight": "model.fp16-00003-of-00003.safetensors",
"encoder.layers.26.mlp.dense_h_to_4h.weight": "model.fp16-00003-of-00003.safetensors",
"encoder.layers.26.post_attention_layernorm.weight": "model.fp16-00003-of-00003.safetensors",
"encoder.layers.26.self_attention.dense.weight": "model.fp16-00003-of-00003.safetensors",
"encoder.layers.26.self_attention.query_key_value.bias": "model.fp16-00003-of-00003.safetensors",
"encoder.layers.26.self_attention.query_key_value.weight": "model.fp16-00003-of-00003.safetensors",
"encoder.layers.27.input_layernorm.weight": "model.fp16-00003-of-00003.safetensors",
"encoder.layers.27.mlp.dense_4h_to_h.weight": "model.fp16-00003-of-00003.safetensors",
"encoder.layers.27.mlp.dense_h_to_4h.weight": "model.fp16-00003-of-00003.safetensors",
"encoder.layers.27.post_attention_layernorm.weight": "model.fp16-00003-of-00003.safetensors",
"encoder.layers.27.self_attention.dense.weight": "model.fp16-00003-of-00003.safetensors",
"encoder.layers.27.self_attention.query_key_value.bias": "model.fp16-00003-of-00003.safetensors",
"encoder.layers.27.self_attention.query_key_value.weight": "model.fp16-00003-of-00003.safetensors",
"encoder.layers.3.input_layernorm.weight": "model.fp16-00001-of-00003.safetensors",
"encoder.layers.3.mlp.dense_4h_to_h.weight": "model.fp16-00001-of-00003.safetensors",
"encoder.layers.3.mlp.dense_h_to_4h.weight": "model.fp16-00001-of-00003.safetensors",
"encoder.layers.3.post_attention_layernorm.weight": "model.fp16-00001-of-00003.safetensors",
"encoder.layers.3.self_attention.dense.weight": "model.fp16-00001-of-00003.safetensors",
"encoder.layers.3.self_attention.query_key_value.bias": "model.fp16-00001-of-00003.safetensors",
"encoder.layers.3.self_attention.query_key_value.weight": "model.fp16-00001-of-00003.safetensors",
"encoder.layers.4.input_layernorm.weight": "model.fp16-00001-of-00003.safetensors",
"encoder.layers.4.mlp.dense_4h_to_h.weight": "model.fp16-00001-of-00003.safetensors",
"encoder.layers.4.mlp.dense_h_to_4h.weight": "model.fp16-00001-of-00003.safetensors",
"encoder.layers.4.post_attention_layernorm.weight": "model.fp16-00001-of-00003.safetensors",
"encoder.layers.4.self_attention.dense.weight": "model.fp16-00001-of-00003.safetensors",
"encoder.layers.4.self_attention.query_key_value.bias": "model.fp16-00001-of-00003.safetensors",
"encoder.layers.4.self_attention.query_key_value.weight": "model.fp16-00001-of-00003.safetensors",
"encoder.layers.5.input_layernorm.weight": "model.fp16-00001-of-00003.safetensors",
"encoder.layers.5.mlp.dense_4h_to_h.weight": "model.fp16-00001-of-00003.safetensors",
"encoder.layers.5.mlp.dense_h_to_4h.weight": "model.fp16-00001-of-00003.safetensors",
"encoder.layers.5.post_attention_layernorm.weight": "model.fp16-00001-of-00003.safetensors",
"encoder.layers.5.self_attention.dense.weight": "model.fp16-00001-of-00003.safetensors",
"encoder.layers.5.self_attention.query_key_value.bias": "model.fp16-00001-of-00003.safetensors",
"encoder.layers.5.self_attention.query_key_value.weight": "model.fp16-00001-of-00003.safetensors",
"encoder.layers.6.input_layernorm.weight": "model.fp16-00001-of-00003.safetensors",
"encoder.layers.6.mlp.dense_4h_to_h.weight": "model.fp16-00001-of-00003.safetensors",
"encoder.layers.6.mlp.dense_h_to_4h.weight": "model.fp16-00001-of-00003.safetensors",
"encoder.layers.6.post_attention_layernorm.weight": "model.fp16-00001-of-00003.safetensors",
"encoder.layers.6.self_attention.dense.weight": "model.fp16-00001-of-00003.safetensors",
"encoder.layers.6.self_attention.query_key_value.bias": "model.fp16-00001-of-00003.safetensors",
"encoder.layers.6.self_attention.query_key_value.weight": "model.fp16-00001-of-00003.safetensors",
"encoder.layers.7.input_layernorm.weight": "model.fp16-00001-of-00003.safetensors",
"encoder.layers.7.mlp.dense_4h_to_h.weight": "model.fp16-00001-of-00003.safetensors",
"encoder.layers.7.mlp.dense_h_to_4h.weight": "model.fp16-00001-of-00003.safetensors",
"encoder.layers.7.post_attention_layernorm.weight": "model.fp16-00001-of-00003.safetensors",
"encoder.layers.7.self_attention.dense.weight": "model.fp16-00001-of-00003.safetensors",
"encoder.layers.7.self_attention.query_key_value.bias": "model.fp16-00001-of-00003.safetensors",
"encoder.layers.7.self_attention.query_key_value.weight": "model.fp16-00001-of-00003.safetensors",
"encoder.layers.8.input_layernorm.weight": "model.fp16-00001-of-00003.safetensors",
"encoder.layers.8.mlp.dense_4h_to_h.weight": "model.fp16-00001-of-00003.safetensors",
"encoder.layers.8.mlp.dense_h_to_4h.weight": "model.fp16-00001-of-00003.safetensors",
"encoder.layers.8.post_attention_layernorm.weight": "model.fp16-00001-of-00003.safetensors",
"encoder.layers.8.self_attention.dense.weight": "model.fp16-00001-of-00003.safetensors",
"encoder.layers.8.self_attention.query_key_value.bias": "model.fp16-00001-of-00003.safetensors",
"encoder.layers.8.self_attention.query_key_value.weight": "model.fp16-00001-of-00003.safetensors",
"encoder.layers.9.input_layernorm.weight": "model.fp16-00001-of-00003.safetensors",
"encoder.layers.9.mlp.dense_4h_to_h.weight": "model.fp16-00001-of-00003.safetensors",
"encoder.layers.9.mlp.dense_h_to_4h.weight": "model.fp16-00001-of-00003.safetensors",
"encoder.layers.9.post_attention_layernorm.weight": "model.fp16-00001-of-00003.safetensors",
"encoder.layers.9.self_attention.dense.weight": "model.fp16-00001-of-00003.safetensors",
"encoder.layers.9.self_attention.query_key_value.bias": "model.fp16-00001-of-00003.safetensors",
"encoder.layers.9.self_attention.query_key_value.weight": "model.fp16-00001-of-00003.safetensors",
"output_layer.weight": "model.fp16-00003-of-00003.safetensors",
"rotary_pos_emb.inv_freq": "model.fp16-00001-of-00003.safetensors"
}
}
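
The weight map above assigns every tensor name to one of three shard files. A single layer can straddle a shard boundary: layer 22's `mlp.dense_4h_to_h.weight` is stored in shard 3, while the rest of layer 22 is in shard 2. The following is a minimal sketch of how such a map could be inspected, using a few entries copied from the map; the variable and function names are illustrative and are not part of this repository.

```python
from collections import defaultdict

# A few entries copied from the weight map above (not the full index).
weight_map = {
    "encoder.layers.22.input_layernorm.weight": "model.fp16-00002-of-00003.safetensors",
    "encoder.layers.22.mlp.dense_4h_to_h.weight": "model.fp16-00003-of-00003.safetensors",
    "encoder.layers.22.mlp.dense_h_to_4h.weight": "model.fp16-00002-of-00003.safetensors",
    "encoder.layers.23.input_layernorm.weight": "model.fp16-00003-of-00003.safetensors",
    "output_layer.weight": "model.fp16-00003-of-00003.safetensors",
    "rotary_pos_emb.inv_freq": "model.fp16-00001-of-00003.safetensors",
}


def group_by_shard(mapping):
    """Return {shard_file: sorted list of tensor names stored in that shard}."""
    shards = defaultdict(list)
    for tensor_name, shard_file in mapping.items():
        shards[shard_file].append(tensor_name)
    return {shard: sorted(names) for shard, names in shards.items()}


def shards_for_layer(mapping, layer_index):
    """Return the set of shard files holding tensors of one encoder layer."""
    prefix = f"encoder.layers.{layer_index}."
    return {shard for name, shard in mapping.items() if name.startswith(prefix)}


grouped = group_by_shard(weight_map)
layer_22_shards = shards_for_layer(weight_map, 22)
```

Here `layer_22_shards` contains two file names, so a loader that reads one layer at a time would need to open both shards for that layer.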

BIN
tokenizer/tokenizer.model (Stored with Git LFS) Normal file

Binary file not shown.


@@ -0,0 +1,12 @@
{
"name_or_path": "THUDM/chatglm3-6b-base",
"remove_space": false,
"do_lower_case": false,
"tokenizer_class": "ChatGLMTokenizer",
"auto_map": {
"AutoTokenizer": [
"tokenization_chatglm.ChatGLMTokenizer",
null
]
}
}
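
The `auto_map` entry above points `AutoTokenizer` at a custom class in a Python module shipped alongside the config, not at a built-in tokenizer. The sketch below splits that reference into its module and class parts. It assumes the usual transformers convention that the two list slots are the slow and fast tokenizer classes, with `null` meaning no fast variant is provided. The helper name `split_class_reference` is hypothetical.

```python
import json

# The tokenizer config shown above, reproduced as a JSON string.
config_text = """
{
  "name_or_path": "THUDM/chatglm3-6b-base",
  "remove_space": false,
  "do_lower_case": false,
  "tokenizer_class": "ChatGLMTokenizer",
  "auto_map": {
    "AutoTokenizer": [
      "tokenization_chatglm.ChatGLMTokenizer",
      null
    ]
  }
}
"""
config = json.loads(config_text)


def split_class_reference(reference):
    """Split 'module.ClassName' into ('module', 'ClassName')."""
    module_name, _, class_name = reference.rpartition(".")
    return module_name, class_name


# Assumed slot order: [slow tokenizer class, fast tokenizer class].
slow_ref, fast_ref = config["auto_map"]["AutoTokenizer"]
module_name, class_name = split_class_reference(slow_ref)
```

Because the class is imported from the repository's own module, loading it through `AutoTokenizer.from_pretrained` generally requires passing `trust_remote_code=True`.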

BIN
tokenizer/vocab.txt (Stored with Git LFS) Normal file

Binary file not shown.

72
unet/config.json Normal file

@@ -0,0 +1,72 @@
{
"_class_name": "UNet2DConditionModel",
"_diffusers_version": "0.27.0.dev0",
"act_fn": "silu",
"addition_embed_type": "text_time",
"addition_embed_type_num_heads": 64,
"addition_time_embed_dim": 256,
"attention_head_dim": [
5,
10,
20
],
"attention_type": "default",
"block_out_channels": [
320,
640,
1280
],
"center_input_sample": false,
"class_embed_type": null,
"class_embeddings_concat": false,
"conv_in_kernel": 3,
"conv_out_kernel": 3,
"cross_attention_dim": 2048,
"cross_attention_norm": null,
"down_block_types": [
"DownBlock2D",
"CrossAttnDownBlock2D",
"CrossAttnDownBlock2D"
],
"downsample_padding": 1,
"dropout": 0.0,
"dual_cross_attention": false,
"encoder_hid_dim": 4096,
"encoder_hid_dim_type": "text_proj",
"flip_sin_to_cos": true,
"freq_shift": 0,
"in_channels": 4,
"layers_per_block": 2,
"mid_block_only_cross_attention": null,
"mid_block_scale_factor": 1,
"mid_block_type": "UNetMidBlock2DCrossAttn",
"norm_eps": 1e-05,
"norm_num_groups": 32,
"num_attention_heads": null,
"num_class_embeds": null,
"only_cross_attention": false,
"out_channels": 4,
"projection_class_embeddings_input_dim": 5632,
"resnet_out_scale_factor": 1.0,
"resnet_skip_time_act": false,
"resnet_time_scale_shift": "default",
"reverse_transformer_layers_per_block": null,
"sample_size": 128,
"time_cond_proj_dim": null,
"time_embedding_act_fn": null,
"time_embedding_dim": null,
"time_embedding_type": "positional",
"timestep_post_act": null,
"transformer_layers_per_block": [
1,
2,
10
],
"up_block_types": [
"CrossAttnUpBlock2D",
"CrossAttnUpBlock2D",
"UpBlock2D"
],
"upcast_attention": false,
"use_linear_projection": true
}
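
Two sets of numbers in the UNet config above can be cross-checked with simple arithmetic. The sketch below rests on two assumptions that this file does not confirm. First, when `num_attention_heads` is null, diffusers' `UNet2DConditionModel` uses the `attention_head_dim` values as head counts. Second, the `text_time` conditioning follows the SDXL layout of six time ids, each embedded to `addition_time_embed_dim` and concatenated with a pooled text embedding.

```python
# Values copied from unet/config.json above.
unet_config = {
    "attention_head_dim": [5, 10, 20],
    "block_out_channels": [320, 640, 1280],
    "num_attention_heads": None,
    "addition_time_embed_dim": 256,
    "encoder_hid_dim": 4096,
    "projection_class_embeddings_input_dim": 5632,
}

# Assumption 1: with num_attention_heads unset, attention_head_dim is
# interpreted as the number of heads per block.
heads = unet_config["num_attention_heads"] or unet_config["attention_head_dim"]
per_head_width = [
    channels // num_heads
    for channels, num_heads in zip(unet_config["block_out_channels"], heads)
]

# Assumption 2: SDXL-style micro-conditioning with six time ids
# (original size, crop top-left, target size; two values each).
NUM_TIME_IDS = 6
time_embed_width = NUM_TIME_IDS * unet_config["addition_time_embed_dim"]
pooled_text_width = (
    unet_config["projection_class_embeddings_input_dim"] - time_embed_width
)
```

Under these assumptions each block uses 64 channels per head, and the width left over for the pooled text embedding equals `encoder_hid_dim`.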

BIN
unet/diffusion_pytorch_model.fp16.safetensors (Stored with Git LFS) Normal file

Binary file not shown.

31
vae/config.json Normal file

@@ -0,0 +1,31 @@
{
"_class_name": "AutoencoderKL",
"_diffusers_version": "0.18.0.dev0",
"_name_or_path": "./vae",
"act_fn": "silu",
"block_out_channels": [
128,
256,
512,
512
],
"down_block_types": [
"DownEncoderBlock2D",
"DownEncoderBlock2D",
"DownEncoderBlock2D",
"DownEncoderBlock2D"
],
"in_channels": 3,
"latent_channels": 4,
"layers_per_block": 2,
"norm_num_groups": 32,
"out_channels": 3,
"sample_size": 1024,
"scaling_factor": 0.13025,
"up_block_types": [
"UpDecoderBlock2D",
"UpDecoderBlock2D",
"UpDecoderBlock2D",
"UpDecoderBlock2D"
]
}
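
The VAE config above can be related to the UNet config earlier in this commit. The sketch assumes the common diffusers convention that every encoder block except the last halves the spatial resolution, so the downscale factor is `2 ** (len(block_out_channels) - 1)`. The variable names are illustrative.

```python
# Values copied from vae/config.json and unet/config.json above.
vae_config = {
    "block_out_channels": [128, 256, 512, 512],
    "latent_channels": 4,
    "sample_size": 1024,
    "scaling_factor": 0.13025,
}
unet_config = {"in_channels": 4, "sample_size": 128}

# Assumption: each encoder block except the last downsamples by 2.
downscale = 2 ** (len(vae_config["block_out_channels"]) - 1)

# Side length of the latent produced from a sample_size x sample_size image.
latent_size = vae_config["sample_size"] // downscale

# The two configs are consistent if the latent shape matches the UNet input.
shapes_match = (
    latent_size == unet_config["sample_size"]
    and vae_config["latent_channels"] == unet_config["in_channels"]
)
```

Under that assumption a 1024-pixel image maps to a 128 x 128 latent with 4 channels, matching the UNet's `sample_size` and `in_channels`.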

BIN
vae/diffusion_pytorch_model.fp16.safetensors (Stored with Git LFS) Normal file

Binary file not shown.