first commit

2024-11-01 14:49:37 +08:00 · 2024-11-01 14:49:37 +08:00 · 041c0f8fc6
parent 591805406c
commit 041c0f8fc6
18 changed files with 2273 additions and 2 deletions
--- a/65
+++ b/65
@ -0,0 +1,65 @@
 The ChatGLM3-6B License
 1. 定义
 “许可方”是指分发其软件的 ChatGLM3-6B 模型团队。
 “软件”是指根据本许可提供的 ChatGLM3-6B 模型参数。
 2. 许可授予
 根据本许可的条款和条件，许可方特此授予您非排他性、全球性、不可转让、不可再许可、可撤销、免版税的版权许可。
 上述版权声明和本许可声明应包含在本软件的所有副本或重要部分中。
 3.限制
 您不得出于任何军事或非法目的使用、复制、修改、合并、发布、分发、复制或创建本软件的全部或部分衍生作品。
 您不得利用本软件从事任何危害国家安全和国家统一、危害社会公共利益、侵犯人身权益的行为。
 4.免责声明
 本软件“按原样”提供，不提供任何明示或暗示的保证，包括但不限于对适销性、特定用途的适用性和非侵权性的保证。 在任何情况下，作者或版权持有人均不对任何索赔、损害或其他责任负责，无论是在合同诉讼、侵权行为还是其他方面，由软件或软件的使用或其他交易引起、由软件引起或与之相关 软件。
 5. 责任限制
 除适用法律禁止的范围外，在任何情况下且根据任何法律理论，无论是基于侵权行为、疏忽、合同、责任或其他原因，任何许可方均不对您承担任何直接、间接、特殊、偶然、示范性、 或间接损害，或任何其他商业损失，即使许可人已被告知此类损害的可能性。
 6.争议解决
 本许可受中华人民共和国法律管辖并按其解释。 因本许可引起的或与本许可有关的任何争议应提交北京市海淀区人民法院。
 请注意，许可证可能会更新到更全面的版本。 有关许可和版权的任何问题，请通过 license@zhipuai.cn 与我们联系。
 1. Definitions
 “Licensor” means the ChatGLM3-6B Model Team that distributes its Software.
 “Software” means the ChatGLM3-6B model parameters made available under this license.
 2. License Grant
 Subject to the terms and conditions of this License, the Licensor hereby grants to you a non-exclusive, worldwide, non-transferable, non-sublicensable, revocable, royalty-free copyright license to use the Software.
 The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
 3. Restriction
 You will not use, copy, modify, merge, publish, distribute, reproduce, or create derivative works of the Software, in whole or in part, for any military, or illegal purposes.
 You will not use the Software for any act that may undermine China's national security and national unity, harm the public interest of society, or infringe upon the rights and interests of human beings.
 4. Disclaimer
 THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
 5. Limitation of Liability
 EXCEPT TO THE EXTENT PROHIBITED BY APPLICABLE LAW, IN NO EVENT AND UNDER NO LEGAL THEORY, WHETHER BASED IN TORT, NEGLIGENCE, CONTRACT, LIABILITY, OR OTHERWISE WILL ANY LICENSOR BE LIABLE TO YOU FOR ANY DIRECT, INDIRECT, SPECIAL, INCIDENTAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES, OR ANY OTHER COMMERCIAL LOSSES, EVEN IF THE LICENSOR HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.
 6. Dispute Resolution
 This license shall be governed and construed in accordance with the laws of People’s Republic of China. Any dispute arising from or in connection with this License shall be submitted to Haidian District People's Court in Beijing.
 Note that the license is subject to update to a more comprehensive version.  For any questions related to the license and copyright, please contact us at license@zhipuai.cn.
--- a/README.md
+++ b/README.md
@ -1,3 +1,98 @@
-# chatglm3_a13445518638182400940117
+---
 language:
 - zh
 - en
 tags:
 - glm
 - chatglm
 - thudm
 ---
 # ChatGLM3-6B
 <p align="center">
  💻 <a href="https://github.com/THUDM/ChatGLM" target="_blank">Github Repo</a> • 🐦 <a href="https://twitter.com/thukeg" target="_blank">Twitter</a> • 📃 <a href="https://arxiv.org/abs/2103.10360" target="_blank">[GLM@ACL 22]</a> <a href="https://github.com/THUDM/GLM" target="_blank">[GitHub]</a> • 📃 <a href="https://arxiv.org/abs/2210.02414" target="_blank">[GLM-130B@ICLR 23]</a> <a href="https://github.com/THUDM/GLM-130B" target="_blank">[GitHub]</a> <br>
 </p>
-chatglm3
+<p align="center">
    👋 Join our <a href="https://join.slack.com/t/chatglm/shared_invite/zt-25ti5uohv-A_hs~am_D3Q8XPZMpj7wwQ" target="_blank">Slack</a> and <a href="https://github.com/THUDM/ChatGLM/blob/main/resources/WECHAT.md" target="_blank">WeChat</a>
 </p>
 <p align="center">
 📍Experience the larger-scale ChatGLM model at <a href="https://www.chatglm.cn">chatglm.cn</a>
 </p>
 ## 介绍
 ChatGLM3-6B 是 ChatGLM 系列最新一代的开源模型，在保留了前两代模型对话流畅、部署门槛低等众多优秀特性的基础上，ChatGLM3-6B 引入了如下特性：
 1. **更强大的基础模型：** ChatGLM3-6B 的基础模型 ChatGLM3-6B-Base 采用了更多样的训练数据、更充分的训练步数和更合理的训练策略。在语义、数学、推理、代码、知识等不同角度的数据集上测评显示，ChatGLM3-6B-Base 具有在 10B 以下的预训练模型中最强的性能。
 2. **更完整的功能支持：** ChatGLM3-6B 采用了全新设计的 [Prompt 格式](PROMPT.md)，除正常的多轮对话外。同时原生支持[工具调用](tool_using/README.md)（Function Call）、代码执行（Code Interpreter）和 Agent 任务等复杂场景。
 3. **更全面的开源序列：** 除了对话模型 ChatGLM3-6B 外，还开源了基础模型 ChatGLM-6B-Base、长文本对话模型 ChatGLM3-6B-32K。以上所有权重对学术研究**完全开放**，在填写[问卷](https://open.bigmodel.cn/mla/form)进行登记后**亦允许免费商业使用**。
 ## 软件依赖
 ```shell
 pip install protobuf 'transformers>=4.30.2' cpm_kernels 'torch>=2.0' gradio mdtex2html sentencepiece accelerate
 ```
 ## 模型下载
 modelscope API下载
 ```shell
 pip install modelscope
 ```
 ```python
 from modelscope import snapshot_download
 model_dir = snapshot_download("ZhipuAI/chatglm3-6b", revision = "v1.0.0")
 ```
 git下载
 ```shell
 git lfs install
 git clone https://www.modelscope.cn/ZhipuAI/chatglm3-6b.git
 ```
 ## 代码调用 
 可以通过如下代码调用 ChatGLM3-6B 模型来生成对话：
 ```python
 from modelscope import AutoTokenizer, AutoModel, snapshot_download
 model_dir = snapshot_download("ZhipuAI/chatglm3-6b", revision = "v1.0.0")
 tokenizer = AutoTokenizer.from_pretrained(model_dir, trust_remote_code=True)
 model = AutoModel.from_pretrained(model_dir, trust_remote_code=True).half().cuda()
 model = model.eval()
 response, history = model.chat(tokenizer, "你好", history=[])
 print(response)
 response, history = model.chat(tokenizer, "晚上睡不着应该怎么办", history=history)
 print(response)
 ```
 关于更多的使用说明，包括如何运行命令行和网页版本的 DEMO，以及使用模型量化以节省显存，请参考我们的 [Github Repo](https://github.com/THUDM/ChatGLM)。
 For more instructions, including how to run CLI and web demos, and model quantization, please refer to our [Github Repo](https://github.com/THUDM/ChatGLM).
 ## 协议
 本仓库的代码依照 [Apache-2.0](LICENSE) 协议开源，ChatGLM3-6B 模型的权重的使用则需要遵循 [Model License](MODEL_LICENSE)。
 ## 引用
 如果你觉得我们的工作有帮助的话，请考虑引用下列论文。
 ```
@article{zeng2022glm,
  title={Glm-130b: An open bilingual pre-trained model},
  author={Zeng, Aohan and Liu, Xiao and Du, Zhengxiao and Wang, Zihan and Lai, Hanyu and Ding, Ming and Yang, Zhuoyi and Xu, Yifan and Zheng, Wendi and Xia, Xiao and others},
  journal={arXiv preprint arXiv:2210.02414},
  year={2022}
 }
 ```
 ```
@inproceedings{du2022glm,
  title={GLM: General Language Model Pretraining with Autoregressive Blank Infilling},
  author={Du, Zhengxiao and Qian, Yujie and Liu, Xiao and Ding, Ming and Qiu, Jiezhong and Yang, Zhilin and Tang, Jie},
  booktitle={Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)},
  pages={320--335},
  year={2022}
 }
 ```
--- a/config.json
+++ b/config.json
@ -0,0 +1,42 @@
 {
  "_name_or_path": "THUDM/chatglm3-6b",
  "model_type": "chatglm",
  "architectures": [
    "ChatGLMModel"
  ],
  "auto_map": {
    "AutoConfig": "configuration_chatglm.ChatGLMConfig",
    "AutoModel": "modeling_chatglm.ChatGLMForConditionalGeneration",
    "AutoModelForCausalLM": "modeling_chatglm.ChatGLMForConditionalGeneration",
    "AutoModelForSeq2SeqLM": "modeling_chatglm.ChatGLMForConditionalGeneration",
    "AutoModelForSequenceClassification": "modeling_chatglm.ChatGLMForSequenceClassification"
  },
  "add_bias_linear": false,
  "add_qkv_bias": true,
  "apply_query_key_layer_scaling": true,
  "apply_residual_connection_post_layernorm": false,
  "attention_dropout": 0.0,
  "attention_softmax_in_fp32": true,
  "bias_dropout_fusion": true,
  "ffn_hidden_size": 13696,
  "fp32_residual_connection": false,
  "hidden_dropout": 0.0,
  "hidden_size": 4096,
  "kv_channels": 128,
  "layernorm_epsilon": 1e-05,
  "multi_query_attention": true,
  "multi_query_group_num": 2,
  "num_attention_heads": 32,
  "num_layers": 28,
  "original_rope": true,
  "padded_vocab_size": 65024,
  "post_layer_norm": true,
  "rmsnorm": true,
  "seq_length": 8192,
  "use_cache": true,
  "torch_dtype": "float16",
  "transformers_version": "4.30.2",
  "tie_word_embeddings": false,
  "eos_token_id": 2,
  "pad_token_id": 0
 }
--- a/configuration.json
+++ b/configuration.json
@ -0,0 +1 @@
 {"framework":"pytorch","task":"chat"}
--- a/configuration_chatglm.py
+++ b/configuration_chatglm.py
@ -0,0 +1,61 @@
 from transformers import PretrainedConfig
 class ChatGLMConfig(PretrainedConfig):
    model_type = "chatglm"
    def __init__(
        self,
        num_layers=28,
        padded_vocab_size=65024,
        hidden_size=4096,
        ffn_hidden_size=13696,
        kv_channels=128,
        num_attention_heads=32,
        seq_length=2048,
        hidden_dropout=0.0,
        classifier_dropout=None,
        attention_dropout=0.0,
        layernorm_epsilon=1e-5,
        rmsnorm=True,
        apply_residual_connection_post_layernorm=False,
        post_layer_norm=True,
        add_bias_linear=False,
        add_qkv_bias=False,
        bias_dropout_fusion=True,
        multi_query_attention=False,
        multi_query_group_num=1,
        apply_query_key_layer_scaling=True,
        attention_softmax_in_fp32=True,
        fp32_residual_connection=False,
        quantization_bit=0,
        pre_seq_len=None,
        prefix_projection=False,
        **kwargs
    ):
        self.num_layers = num_layers
        self.vocab_size = padded_vocab_size
        self.padded_vocab_size = padded_vocab_size
        self.hidden_size = hidden_size
        self.ffn_hidden_size = ffn_hidden_size
        self.kv_channels = kv_channels
        self.num_attention_heads = num_attention_heads
        self.seq_length = seq_length
        self.hidden_dropout = hidden_dropout
        self.classifier_dropout = classifier_dropout
        self.attention_dropout = attention_dropout
        self.layernorm_epsilon = layernorm_epsilon
        self.rmsnorm = rmsnorm
        self.apply_residual_connection_post_layernorm = apply_residual_connection_post_layernorm
        self.post_layer_norm = post_layer_norm
        self.add_bias_linear = add_bias_linear
        self.add_qkv_bias = add_qkv_bias
        self.bias_dropout_fusion = bias_dropout_fusion
        self.multi_query_attention = multi_query_attention
        self.multi_query_group_num = multi_query_group_num
        self.apply_query_key_layer_scaling = apply_query_key_layer_scaling
        self.attention_softmax_in_fp32 = attention_softmax_in_fp32
        self.fp32_residual_connection = fp32_residual_connection
        self.quantization_bit = quantization_bit
        self.pre_seq_len = pre_seq_len
        self.prefix_projection = prefix_projection
        super().__init__(**kwargs)
--- a/modeling_chatglm.py
+++ b/modeling_chatglm.py
--- a/pytorch_model-00001-of-00007.bin
+++ b/pytorch_model-00001-of-00007.bin
--- a/pytorch_model-00002-of-00007.bin
+++ b/pytorch_model-00002-of-00007.bin
--- a/pytorch_model-00003-of-00007.bin
+++ b/pytorch_model-00003-of-00007.bin
--- a/pytorch_model-00004-of-00007.bin
+++ b/pytorch_model-00004-of-00007.bin
--- a/pytorch_model-00005-of-00007.bin
+++ b/pytorch_model-00005-of-00007.bin
--- a/pytorch_model-00006-of-00007.bin
+++ b/pytorch_model-00006-of-00007.bin
--- a/pytorch_model-00007-of-00007.bin
+++ b/pytorch_model-00007-of-00007.bin
--- a/pytorch_model.bin.index.json
+++ b/pytorch_model.bin.index.json
@ -0,0 +1,207 @@
 {
  "metadata": {
    "total_size": 12487168064
  },
  "weight_map": {
    "transformer.embedding.word_embeddings.weight": "pytorch_model-00001-of-00007.bin",
    "transformer.encoder.final_layernorm.weight": "pytorch_model-00007-of-00007.bin",
    "transformer.encoder.layers.0.input_layernorm.weight": "pytorch_model-00001-of-00007.bin",
    "transformer.encoder.layers.0.mlp.dense_4h_to_h.weight": "pytorch_model-00001-of-00007.bin",
    "transformer.encoder.layers.0.mlp.dense_h_to_4h.weight": "pytorch_model-00001-of-00007.bin",
    "transformer.encoder.layers.0.post_attention_layernorm.weight": "pytorch_model-00001-of-00007.bin",
    "transformer.encoder.layers.0.self_attention.dense.weight": "pytorch_model-00001-of-00007.bin",
    "transformer.encoder.layers.0.self_attention.query_key_value.bias": "pytorch_model-00001-of-00007.bin",
    "transformer.encoder.layers.0.self_attention.query_key_value.weight": "pytorch_model-00001-of-00007.bin",
    "transformer.encoder.layers.1.input_layernorm.weight": "pytorch_model-00001-of-00007.bin",
    "transformer.encoder.layers.1.mlp.dense_4h_to_h.weight": "pytorch_model-00001-of-00007.bin",
    "transformer.encoder.layers.1.mlp.dense_h_to_4h.weight": "pytorch_model-00001-of-00007.bin",
    "transformer.encoder.layers.1.post_attention_layernorm.weight": "pytorch_model-00001-of-00007.bin",
    "transformer.encoder.layers.1.self_attention.dense.weight": "pytorch_model-00001-of-00007.bin",
    "transformer.encoder.layers.1.self_attention.query_key_value.bias": "pytorch_model-00001-of-00007.bin",
    "transformer.encoder.layers.1.self_attention.query_key_value.weight": "pytorch_model-00001-of-00007.bin",
    "transformer.encoder.layers.10.input_layernorm.weight": "pytorch_model-00003-of-00007.bin",
    "transformer.encoder.layers.10.mlp.dense_4h_to_h.weight": "pytorch_model-00003-of-00007.bin",
    "transformer.encoder.layers.10.mlp.dense_h_to_4h.weight": "pytorch_model-00003-of-00007.bin",
    "transformer.encoder.layers.10.post_attention_layernorm.weight": "pytorch_model-00003-of-00007.bin",
    "transformer.encoder.layers.10.self_attention.dense.weight": "pytorch_model-00003-of-00007.bin",
    "transformer.encoder.layers.10.self_attention.query_key_value.bias": "pytorch_model-00003-of-00007.bin",
    "transformer.encoder.layers.10.self_attention.query_key_value.weight": "pytorch_model-00003-of-00007.bin",
    "transformer.encoder.layers.11.input_layernorm.weight": "pytorch_model-00003-of-00007.bin",
    "transformer.encoder.layers.11.mlp.dense_4h_to_h.weight": "pytorch_model-00003-of-00007.bin",
    "transformer.encoder.layers.11.mlp.dense_h_to_4h.weight": "pytorch_model-00003-of-00007.bin",
    "transformer.encoder.layers.11.post_attention_layernorm.weight": "pytorch_model-00003-of-00007.bin",
    "transformer.encoder.layers.11.self_attention.dense.weight": "pytorch_model-00003-of-00007.bin",
    "transformer.encoder.layers.11.self_attention.query_key_value.bias": "pytorch_model-00003-of-00007.bin",
    "transformer.encoder.layers.11.self_attention.query_key_value.weight": "pytorch_model-00003-of-00007.bin",
    "transformer.encoder.layers.12.input_layernorm.weight": "pytorch_model-00003-of-00007.bin",
    "transformer.encoder.layers.12.mlp.dense_4h_to_h.weight": "pytorch_model-00004-of-00007.bin",
    "transformer.encoder.layers.12.mlp.dense_h_to_4h.weight": "pytorch_model-00003-of-00007.bin",
    "transformer.encoder.layers.12.post_attention_layernorm.weight": "pytorch_model-00003-of-00007.bin",
    "transformer.encoder.layers.12.self_attention.dense.weight": "pytorch_model-00003-of-00007.bin",
    "transformer.encoder.layers.12.self_attention.query_key_value.bias": "pytorch_model-00003-of-00007.bin",
    "transformer.encoder.layers.12.self_attention.query_key_value.weight": "pytorch_model-00003-of-00007.bin",
    "transformer.encoder.layers.13.input_layernorm.weight": "pytorch_model-00004-of-00007.bin",
    "transformer.encoder.layers.13.mlp.dense_4h_to_h.weight": "pytorch_model-00004-of-00007.bin",
    "transformer.encoder.layers.13.mlp.dense_h_to_4h.weight": "pytorch_model-00004-of-00007.bin",
    "transformer.encoder.layers.13.post_attention_layernorm.weight": "pytorch_model-00004-of-00007.bin",
    "transformer.encoder.layers.13.self_attention.dense.weight": "pytorch_model-00004-of-00007.bin",
    "transformer.encoder.layers.13.self_attention.query_key_value.bias": "pytorch_model-00004-of-00007.bin",
    "transformer.encoder.layers.13.self_attention.query_key_value.weight": "pytorch_model-00004-of-00007.bin",
    "transformer.encoder.layers.14.input_layernorm.weight": "pytorch_model-00004-of-00007.bin",
    "transformer.encoder.layers.14.mlp.dense_4h_to_h.weight": "pytorch_model-00004-of-00007.bin",
    "transformer.encoder.layers.14.mlp.dense_h_to_4h.weight": "pytorch_model-00004-of-00007.bin",
    "transformer.encoder.layers.14.post_attention_layernorm.weight": "pytorch_model-00004-of-00007.bin",
    "transformer.encoder.layers.14.self_attention.dense.weight": "pytorch_model-00004-of-00007.bin",
    "transformer.encoder.layers.14.self_attention.query_key_value.bias": "pytorch_model-00004-of-00007.bin",
    "transformer.encoder.layers.14.self_attention.query_key_value.weight": "pytorch_model-00004-of-00007.bin",
    "transformer.encoder.layers.15.input_layernorm.weight": "pytorch_model-00004-of-00007.bin",
    "transformer.encoder.layers.15.mlp.dense_4h_to_h.weight": "pytorch_model-00004-of-00007.bin",
    "transformer.encoder.layers.15.mlp.dense_h_to_4h.weight": "pytorch_model-00004-of-00007.bin",
    "transformer.encoder.layers.15.post_attention_layernorm.weight": "pytorch_model-00004-of-00007.bin",
    "transformer.encoder.layers.15.self_attention.dense.weight": "pytorch_model-00004-of-00007.bin",
    "transformer.encoder.layers.15.self_attention.query_key_value.bias": "pytorch_model-00004-of-00007.bin",
    "transformer.encoder.layers.15.self_attention.query_key_value.weight": "pytorch_model-00004-of-00007.bin",
    "transformer.encoder.layers.16.input_layernorm.weight": "pytorch_model-00004-of-00007.bin",
    "transformer.encoder.layers.16.mlp.dense_4h_to_h.weight": "pytorch_model-00004-of-00007.bin",
    "transformer.encoder.layers.16.mlp.dense_h_to_4h.weight": "pytorch_model-00004-of-00007.bin",
    "transformer.encoder.layers.16.post_attention_layernorm.weight": "pytorch_model-00004-of-00007.bin",
    "transformer.encoder.layers.16.self_attention.dense.weight": "pytorch_model-00004-of-00007.bin",
    "transformer.encoder.layers.16.self_attention.query_key_value.bias": "pytorch_model-00004-of-00007.bin",
    "transformer.encoder.layers.16.self_attention.query_key_value.weight": "pytorch_model-00004-of-00007.bin",
    "transformer.encoder.layers.17.input_layernorm.weight": "pytorch_model-00004-of-00007.bin",
    "transformer.encoder.layers.17.mlp.dense_4h_to_h.weight": "pytorch_model-00005-of-00007.bin",
    "transformer.encoder.layers.17.mlp.dense_h_to_4h.weight": "pytorch_model-00005-of-00007.bin",
    "transformer.encoder.layers.17.post_attention_layernorm.weight": "pytorch_model-00004-of-00007.bin",
    "transformer.encoder.layers.17.self_attention.dense.weight": "pytorch_model-00004-of-00007.bin",
    "transformer.encoder.layers.17.self_attention.query_key_value.bias": "pytorch_model-00004-of-00007.bin",
    "transformer.encoder.layers.17.self_attention.query_key_value.weight": "pytorch_model-00004-of-00007.bin",
    "transformer.encoder.layers.18.input_layernorm.weight": "pytorch_model-00005-of-00007.bin",
    "transformer.encoder.layers.18.mlp.dense_4h_to_h.weight": "pytorch_model-00005-of-00007.bin",
    "transformer.encoder.layers.18.mlp.dense_h_to_4h.weight": "pytorch_model-00005-of-00007.bin",
    "transformer.encoder.layers.18.post_attention_layernorm.weight": "pytorch_model-00005-of-00007.bin",
    "transformer.encoder.layers.18.self_attention.dense.weight": "pytorch_model-00005-of-00007.bin",
    "transformer.encoder.layers.18.self_attention.query_key_value.bias": "pytorch_model-00005-of-00007.bin",
    "transformer.encoder.layers.18.self_attention.query_key_value.weight": "pytorch_model-00005-of-00007.bin",
    "transformer.encoder.layers.19.input_layernorm.weight": "pytorch_model-00005-of-00007.bin",
    "transformer.encoder.layers.19.mlp.dense_4h_to_h.weight": "pytorch_model-00005-of-00007.bin",
    "transformer.encoder.layers.19.mlp.dense_h_to_4h.weight": "pytorch_model-00005-of-00007.bin",
    "transformer.encoder.layers.19.post_attention_layernorm.weight": "pytorch_model-00005-of-00007.bin",
    "transformer.encoder.layers.19.self_attention.dense.weight": "pytorch_model-00005-of-00007.bin",
    "transformer.encoder.layers.19.self_attention.query_key_value.bias": "pytorch_model-00005-of-00007.bin",
    "transformer.encoder.layers.19.self_attention.query_key_value.weight": "pytorch_model-00005-of-00007.bin",
    "transformer.encoder.layers.2.input_layernorm.weight": "pytorch_model-00001-of-00007.bin",
    "transformer.encoder.layers.2.mlp.dense_4h_to_h.weight": "pytorch_model-00001-of-00007.bin",
    "transformer.encoder.layers.2.mlp.dense_h_to_4h.weight": "pytorch_model-00001-of-00007.bin",
    "transformer.encoder.layers.2.post_attention_layernorm.weight": "pytorch_model-00001-of-00007.bin",
    "transformer.encoder.layers.2.self_attention.dense.weight": "pytorch_model-00001-of-00007.bin",
    "transformer.encoder.layers.2.self_attention.query_key_value.bias": "pytorch_model-00001-of-00007.bin",
    "transformer.encoder.layers.2.self_attention.query_key_value.weight": "pytorch_model-00001-of-00007.bin",
    "transformer.encoder.layers.20.input_layernorm.weight": "pytorch_model-00005-of-00007.bin",
    "transformer.encoder.layers.20.mlp.dense_4h_to_h.weight": "pytorch_model-00005-of-00007.bin",
    "transformer.encoder.layers.20.mlp.dense_h_to_4h.weight": "pytorch_model-00005-of-00007.bin",
    "transformer.encoder.layers.20.post_attention_layernorm.weight": "pytorch_model-00005-of-00007.bin",
    "transformer.encoder.layers.20.self_attention.dense.weight": "pytorch_model-00005-of-00007.bin",
    "transformer.encoder.layers.20.self_attention.query_key_value.bias": "pytorch_model-00005-of-00007.bin",
    "transformer.encoder.layers.20.self_attention.query_key_value.weight": "pytorch_model-00005-of-00007.bin",
    "transformer.encoder.layers.21.input_layernorm.weight": "pytorch_model-00005-of-00007.bin",
    "transformer.encoder.layers.21.mlp.dense_4h_to_h.weight": "pytorch_model-00005-of-00007.bin",
    "transformer.encoder.layers.21.mlp.dense_h_to_4h.weight": "pytorch_model-00005-of-00007.bin",
    "transformer.encoder.layers.21.post_attention_layernorm.weight": "pytorch_model-00005-of-00007.bin",
    "transformer.encoder.layers.21.self_attention.dense.weight": "pytorch_model-00005-of-00007.bin",
    "transformer.encoder.layers.21.self_attention.query_key_value.bias": "pytorch_model-00005-of-00007.bin",
    "transformer.encoder.layers.21.self_attention.query_key_value.weight": "pytorch_model-00005-of-00007.bin",
    "transformer.encoder.layers.22.input_layernorm.weight": "pytorch_model-00005-of-00007.bin",
    "transformer.encoder.layers.22.mlp.dense_4h_to_h.weight": "pytorch_model-00006-of-00007.bin",
    "transformer.encoder.layers.22.mlp.dense_h_to_4h.weight": "pytorch_model-00006-of-00007.bin",
    "transformer.encoder.layers.22.post_attention_layernorm.weight": "pytorch_model-00006-of-00007.bin",
    "transformer.encoder.layers.22.self_attention.dense.weight": "pytorch_model-00006-of-00007.bin",
    "transformer.encoder.layers.22.self_attention.query_key_value.bias": "pytorch_model-00006-of-00007.bin",
    "transformer.encoder.layers.22.self_attention.query_key_value.weight": "pytorch_model-00006-of-00007.bin",
    "transformer.encoder.layers.23.input_layernorm.weight": "pytorch_model-00006-of-00007.bin",
    "transformer.encoder.layers.23.mlp.dense_4h_to_h.weight": "pytorch_model-00006-of-00007.bin",
    "transformer.encoder.layers.23.mlp.dense_h_to_4h.weight": "pytorch_model-00006-of-00007.bin",
    "transformer.encoder.layers.23.post_attention_layernorm.weight": "pytorch_model-00006-of-00007.bin",
    "transformer.encoder.layers.23.self_attention.dense.weight": "pytorch_model-00006-of-00007.bin",
    "transformer.encoder.layers.23.self_attention.query_key_value.bias": "pytorch_model-00006-of-00007.bin",
    "transformer.encoder.layers.23.self_attention.query_key_value.weight": "pytorch_model-00006-of-00007.bin",
    "transformer.encoder.layers.24.input_layernorm.weight": "pytorch_model-00006-of-00007.bin",
    "transformer.encoder.layers.24.mlp.dense_4h_to_h.weight": "pytorch_model-00006-of-00007.bin",
    "transformer.encoder.layers.24.mlp.dense_h_to_4h.weight": "pytorch_model-00006-of-00007.bin",
    "transformer.encoder.layers.24.post_attention_layernorm.weight": "pytorch_model-00006-of-00007.bin",
    "transformer.encoder.layers.24.self_attention.dense.weight": "pytorch_model-00006-of-00007.bin",
    "transformer.encoder.layers.24.self_attention.query_key_value.bias": "pytorch_model-00006-of-00007.bin",
    "transformer.encoder.layers.24.self_attention.query_key_value.weight": "pytorch_model-00006-of-00007.bin",
    "transformer.encoder.layers.25.input_layernorm.weight": "pytorch_model-00006-of-00007.bin",
    "transformer.encoder.layers.25.mlp.dense_4h_to_h.weight": "pytorch_model-00006-of-00007.bin",
    "transformer.encoder.layers.25.mlp.dense_h_to_4h.weight": "pytorch_model-00006-of-00007.bin",
    "transformer.encoder.layers.25.post_attention_layernorm.weight": "pytorch_model-00006-of-00007.bin",
    "transformer.encoder.layers.25.self_attention.dense.weight": "pytorch_model-00006-of-00007.bin",
    "transformer.encoder.layers.25.self_attention.query_key_value.bias": "pytorch_model-00006-of-00007.bin",
    "transformer.encoder.layers.25.self_attention.query_key_value.weight": "pytorch_model-00006-of-00007.bin",
    "transformer.encoder.layers.26.input_layernorm.weight": "pytorch_model-00006-of-00007.bin",
    "transformer.encoder.layers.26.mlp.dense_4h_to_h.weight": "pytorch_model-00007-of-00007.bin",
    "transformer.encoder.layers.26.mlp.dense_h_to_4h.weight": "pytorch_model-00006-of-00007.bin",
    "transformer.encoder.layers.26.post_attention_layernorm.weight": "pytorch_model-00006-of-00007.bin",
    "transformer.encoder.layers.26.self_attention.dense.weight": "pytorch_model-00006-of-00007.bin",
    "transformer.encoder.layers.26.self_attention.query_key_value.bias": "pytorch_model-00006-of-00007.bin",
    "transformer.encoder.layers.26.self_attention.query_key_value.weight": "pytorch_model-00006-of-00007.bin",
    "transformer.encoder.layers.27.input_layernorm.weight": "pytorch_model-00007-of-00007.bin",
    "transformer.encoder.layers.27.mlp.dense_4h_to_h.weight": "pytorch_model-00007-of-00007.bin",
    "transformer.encoder.layers.27.mlp.dense_h_to_4h.weight": "pytorch_model-00007-of-00007.bin",
    "transformer.encoder.layers.27.post_attention_layernorm.weight": "pytorch_model-00007-of-00007.bin",
    "transformer.encoder.layers.27.self_attention.dense.weight": "pytorch_model-00007-of-00007.bin",
    "transformer.encoder.layers.27.self_attention.query_key_value.bias": "pytorch_model-00007-of-00007.bin",
    "transformer.encoder.layers.27.self_attention.query_key_value.weight": "pytorch_model-00007-of-00007.bin",
    "transformer.encoder.layers.3.input_layernorm.weight": "pytorch_model-00001-of-00007.bin",
    "transformer.encoder.layers.3.mlp.dense_4h_to_h.weight": "pytorch_model-00002-of-00007.bin",
    "transformer.encoder.layers.3.mlp.dense_h_to_4h.weight": "pytorch_model-00002-of-00007.bin",
    "transformer.encoder.layers.3.post_attention_layernorm.weight": "pytorch_model-00001-of-00007.bin",
    "transformer.encoder.layers.3.self_attention.dense.weight": "pytorch_model-00001-of-00007.bin",
    "transformer.encoder.layers.3.self_attention.query_key_value.bias": "pytorch_model-00001-of-00007.bin",
    "transformer.encoder.layers.3.self_attention.query_key_value.weight": "pytorch_model-00001-of-00007.bin",
    "transformer.encoder.layers.4.input_layernorm.weight": "pytorch_model-00002-of-00007.bin",
    "transformer.encoder.layers.4.mlp.dense_4h_to_h.weight": "pytorch_model-00002-of-00007.bin",
    "transformer.encoder.layers.4.mlp.dense_h_to_4h.weight": "pytorch_model-00002-of-00007.bin",
    "transformer.encoder.layers.4.post_attention_layernorm.weight": "pytorch_model-00002-of-00007.bin",
    "transformer.encoder.layers.4.self_attention.dense.weight": "pytorch_model-00002-of-00007.bin",
    "transformer.encoder.layers.4.self_attention.query_key_value.bias": "pytorch_model-00002-of-00007.bin",
    "transformer.encoder.layers.4.self_attention.query_key_value.weight": "pytorch_model-00002-of-00007.bin",
    "transformer.encoder.layers.5.input_layernorm.weight": "pytorch_model-00002-of-00007.bin",
    "transformer.encoder.layers.5.mlp.dense_4h_to_h.weight": "pytorch_model-00002-of-00007.bin",
    "transformer.encoder.layers.5.mlp.dense_h_to_4h.weight": "pytorch_model-00002-of-00007.bin",
    "transformer.encoder.layers.5.post_attention_layernorm.weight": "pytorch_model-00002-of-00007.bin",
    "transformer.encoder.layers.5.self_attention.dense.weight": "pytorch_model-00002-of-00007.bin",
    "transformer.encoder.layers.5.self_attention.query_key_value.bias": "pytorch_model-00002-of-00007.bin",
    "transformer.encoder.layers.5.self_attention.query_key_value.weight": "pytorch_model-00002-of-00007.bin",
    "transformer.encoder.layers.6.input_layernorm.weight": "pytorch_model-00002-of-00007.bin",
    "transformer.encoder.layers.6.mlp.dense_4h_to_h.weight": "pytorch_model-00002-of-00007.bin",
    "transformer.encoder.layers.6.mlp.dense_h_to_4h.weight": "pytorch_model-00002-of-00007.bin",
    "transformer.encoder.layers.6.post_attention_layernorm.weight": "pytorch_model-00002-of-00007.bin",
    "transformer.encoder.layers.6.self_attention.dense.weight": "pytorch_model-00002-of-00007.bin",
    "transformer.encoder.layers.6.self_attention.query_key_value.bias": "pytorch_model-00002-of-00007.bin",
    "transformer.encoder.layers.6.self_attention.query_key_value.weight": "pytorch_model-00002-of-00007.bin",
    "transformer.encoder.layers.7.input_layernorm.weight": "pytorch_model-00002-of-00007.bin",
    "transformer.encoder.layers.7.mlp.dense_4h_to_h.weight": "pytorch_model-00002-of-00007.bin",
    "transformer.encoder.layers.7.mlp.dense_h_to_4h.weight": "pytorch_model-00002-of-00007.bin",
    "transformer.encoder.layers.7.post_attention_layernorm.weight": "pytorch_model-00002-of-00007.bin",
    "transformer.encoder.layers.7.self_attention.dense.weight": "pytorch_model-00002-of-00007.bin",
    "transformer.encoder.layers.7.self_attention.query_key_value.bias": "pytorch_model-00002-of-00007.bin",
    "transformer.encoder.layers.7.self_attention.query_key_value.weight": "pytorch_model-00002-of-00007.bin",
    "transformer.encoder.layers.8.input_layernorm.weight": "pytorch_model-00002-of-00007.bin",
    "transformer.encoder.layers.8.mlp.dense_4h_to_h.weight": "pytorch_model-00003-of-00007.bin",
    "transformer.encoder.layers.8.mlp.dense_h_to_4h.weight": "pytorch_model-00003-of-00007.bin",
    "transformer.encoder.layers.8.post_attention_layernorm.weight": "pytorch_model-00003-of-00007.bin",
    "transformer.encoder.layers.8.self_attention.dense.weight": "pytorch_model-00003-of-00007.bin",
    "transformer.encoder.layers.8.self_attention.query_key_value.bias": "pytorch_model-00003-of-00007.bin",
    "transformer.encoder.layers.8.self_attention.query_key_value.weight": "pytorch_model-00003-of-00007.bin",
    "transformer.encoder.layers.9.input_layernorm.weight": "pytorch_model-00003-of-00007.bin",
    "transformer.encoder.layers.9.mlp.dense_4h_to_h.weight": "pytorch_model-00003-of-00007.bin",
    "transformer.encoder.layers.9.mlp.dense_h_to_4h.weight": "pytorch_model-00003-of-00007.bin",
    "transformer.encoder.layers.9.post_attention_layernorm.weight": "pytorch_model-00003-of-00007.bin",
    "transformer.encoder.layers.9.self_attention.dense.weight": "pytorch_model-00003-of-00007.bin",
    "transformer.encoder.layers.9.self_attention.query_key_value.bias": "pytorch_model-00003-of-00007.bin",
    "transformer.encoder.layers.9.self_attention.query_key_value.weight": "pytorch_model-00003-of-00007.bin",
    "transformer.output_layer.weight": "pytorch_model-00007-of-00007.bin",
    "transformer.rotary_pos_emb.inv_freq": "pytorch_model-00001-of-00007.bin"
  }
 }
--- a/quantization.py
+++ b/quantization.py
--- a/tokenization_chatglm.py
+++ b/tokenization_chatglm.py
@ -0,0 +1,283 @@
 import json
 import os
 import torch
 from typing import List, Optional, Union, Dict
 from sentencepiece import SentencePieceProcessor
 from transformers import PreTrainedTokenizer
 from transformers.utils import logging, PaddingStrategy
 from transformers.tokenization_utils_base import EncodedInput, BatchEncoding
 class SPTokenizer:
    def __init__(self, model_path: str):
        # reload tokenizer
        assert os.path.isfile(model_path), model_path
        self.sp_model = SentencePieceProcessor(model_file=model_path)
        # BOS / EOS token IDs
        self.n_words: int = self.sp_model.vocab_size()
        self.bos_id: int = self.sp_model.bos_id()
        self.eos_id: int = self.sp_model.eos_id()
        self.pad_id: int = self.sp_model.unk_id()
        assert self.sp_model.vocab_size() == self.sp_model.get_piece_size()
        special_tokens = ["[MASK]", "[gMASK]", "[sMASK]", "sop", "eop", "<|system|>", "<|user|>", "<|assistant|>",
                          "<|observation|>"]
        self.special_tokens = {}
        self.index_special_tokens = {}
        for token in special_tokens:
            self.special_tokens[token] = self.n_words
            self.index_special_tokens[self.n_words] = token
            self.n_words += 1
    def tokenize(self, s: str):
        return self.sp_model.EncodeAsPieces(s)
    def encode(self, s: str, bos: bool = False, eos: bool = False) -> List[int]:
        assert type(s) is str
        t = self.sp_model.encode(s)
        if bos:
            t = [self.bos_id] + t
        if eos:
            t = t + [self.eos_id]
        return t
    def decode(self, t: List[int]) -> str:
        text, buffer = "", []
        for token in t:
            if token in self.index_special_tokens:
                if buffer:
                    text += self.sp_model.decode(buffer)
                    buffer = []
                text += self.index_special_tokens[token]
            else:
                buffer.append(token)
        if buffer:
            text += self.sp_model.decode(buffer)
        return text
    def decode_tokens(self, tokens: List[str]) -> str:
        text = self.sp_model.DecodePieces(tokens)
        return text
    def convert_token_to_id(self, token):
        """ Converts a token (str) in an id using the vocab. """
        if token in self.special_tokens:
            return self.special_tokens[token]
        return self.sp_model.PieceToId(token)
    def convert_id_to_token(self, index):
        """Converts an index (integer) in a token (str) using the vocab."""
        if index in self.index_special_tokens:
            return self.index_special_tokens[index]
        if index in [self.eos_id, self.bos_id, self.pad_id] or index < 0:
            return ""
        return self.sp_model.IdToPiece(index)
 class ChatGLMTokenizer(PreTrainedTokenizer):
    vocab_files_names = {"vocab_file": "tokenizer.model"}
    model_input_names = ["input_ids", "attention_mask", "position_ids"]
    def __init__(self, vocab_file, padding_side="left", clean_up_tokenization_spaces=False, **kwargs):
        self.name = "GLMTokenizer"
        self.vocab_file = vocab_file
        self.tokenizer = SPTokenizer(vocab_file)
        self.special_tokens = {
            "<bos>": self.tokenizer.bos_id,
            "<eos>": self.tokenizer.eos_id,
            "<pad>": self.tokenizer.pad_id
        }
        super().__init__(padding_side=padding_side, clean_up_tokenization_spaces=clean_up_tokenization_spaces, **kwargs)
    def get_command(self, token):
        if token in self.special_tokens:
            return self.special_tokens[token]
        assert token in self.tokenizer.special_tokens, f"{token} is not a special token for {self.name}"
        return self.tokenizer.special_tokens[token]
    @property
    def unk_token(self) -> str:
        return "<unk>"
    @property
    def pad_token(self) -> str:
        return "<unk>"
    @property
    def pad_token_id(self):
        return self.get_command("<pad>")
    @property
    def eos_token(self) -> str:
        return "</s>"
    @property
    def eos_token_id(self):
        return self.get_command("<eos>")
    @property
    def vocab_size(self):
        return self.tokenizer.n_words
    def get_vocab(self):
        """ Returns vocab as a dict """
        vocab = {self._convert_id_to_token(i): i for i in range(self.vocab_size)}
        vocab.update(self.added_tokens_encoder)
        return vocab
    def _tokenize(self, text, **kwargs):
        return self.tokenizer.tokenize(text)
    def _convert_token_to_id(self, token):
        """ Converts a token (str) in an id using the vocab. """
        return self.tokenizer.convert_token_to_id(token)
    def _convert_id_to_token(self, index):
        """Converts an index (integer) in a token (str) using the vocab."""
        return self.tokenizer.convert_id_to_token(index)
    def convert_tokens_to_string(self, tokens: List[str]) -> str:
        return self.tokenizer.decode_tokens(tokens)
    def save_vocabulary(self, save_directory, filename_prefix=None):
        """
        Save the vocabulary and special tokens file to a directory.
        Args:
            save_directory (`str`):
                The directory in which to save the vocabulary.
            filename_prefix (`str`, *optional*):
                An optional prefix to add to the named of the saved files.
        Returns:
            `Tuple(str)`: Paths to the files saved.
        """
        if os.path.isdir(save_directory):
            vocab_file = os.path.join(
                save_directory, self.vocab_files_names["vocab_file"]
            )
        else:
            vocab_file = save_directory
        with open(self.vocab_file, 'rb') as fin:
            proto_str = fin.read()
        with open(vocab_file, "wb") as writer:
            writer.write(proto_str)
        return (vocab_file,)
    def get_prefix_tokens(self):
        prefix_tokens = [self.get_command("[gMASK]"), self.get_command("sop")]
        return prefix_tokens
    def build_single_message(self, role, metadata, message):
        assert role in ["system", "user", "assistant", "observation"], role
        role_tokens = [self.get_command(f"<|{role}|>")] + self.tokenizer.encode(f"{metadata}\n")
        message_tokens = self.tokenizer.encode(message)
        tokens = role_tokens + message_tokens
        return tokens
    def build_chat_input(self, query, history=None, role="user"):
        if history is None:
            history = []
        input_ids = []
        for item in history:
            content = item["content"]
            if item["role"] == "system" and "tools" in item:
                content = content + "\n" + json.dumps(item["tools"], indent=4, ensure_ascii=False)
            input_ids.extend(self.build_single_message(item["role"], item.get("metadata", ""), content))
        input_ids.extend(self.build_single_message(role, "", query))
        input_ids.extend([self.get_command("<|assistant|>")])
        return self.batch_encode_plus([input_ids], return_tensors="pt", is_split_into_words=True)
    def build_inputs_with_special_tokens(
            self, token_ids_0: List[int], token_ids_1: Optional[List[int]] = None
    ) -> List[int]:
        """
        Build model inputs from a sequence or a pair of sequence for sequence classification tasks by concatenating and
        adding special tokens. A BERT sequence has the following format:
        - single sequence: `[CLS] X [SEP]`
        - pair of sequences: `[CLS] A [SEP] B [SEP]`
        Args:
            token_ids_0 (`List[int]`):
                List of IDs to which the special tokens will be added.
            token_ids_1 (`List[int]`, *optional*):
                Optional second list of IDs for sequence pairs.
        Returns:
            `List[int]`: List of [input IDs](../glossary#input-ids) with the appropriate special tokens.
        """
        prefix_tokens = self.get_prefix_tokens()
        token_ids_0 = prefix_tokens + token_ids_0
        if token_ids_1 is not None:
            token_ids_0 = token_ids_0 + token_ids_1 + [self.get_command("<eos>")]
        return token_ids_0
    def _pad(
            self,
            encoded_inputs: Union[Dict[str, EncodedInput], BatchEncoding],
            max_length: Optional[int] = None,
            padding_strategy: PaddingStrategy = PaddingStrategy.DO_NOT_PAD,
            pad_to_multiple_of: Optional[int] = None,
            return_attention_mask: Optional[bool] = None,
    ) -> dict:
        """
        Pad encoded inputs (on left/right and up to predefined length or max length in the batch)
        Args:
            encoded_inputs:
                Dictionary of tokenized inputs (`List[int]`) or batch of tokenized inputs (`List[List[int]]`).
            max_length: maximum length of the returned list and optionally padding length (see below).
                Will truncate by taking into account the special tokens.
            padding_strategy: PaddingStrategy to use for padding.
                - PaddingStrategy.LONGEST Pad to the longest sequence in the batch
                - PaddingStrategy.MAX_LENGTH: Pad to the max length (default)
                - PaddingStrategy.DO_NOT_PAD: Do not pad
                The tokenizer padding sides are defined in self.padding_side:
                    - 'left': pads on the left of the sequences
                    - 'right': pads on the right of the sequences
            pad_to_multiple_of: (optional) Integer if set will pad the sequence to a multiple of the provided value.
                This is especially useful to enable the use of Tensor Core on NVIDIA hardware with compute capability
                `>= 7.5` (Volta).
            return_attention_mask:
                (optional) Set to False to avoid returning attention mask (default: set to model specifics)
        """
        # Load from model defaults
        assert self.padding_side == "left"
        required_input = encoded_inputs[self.model_input_names[0]]
        seq_length = len(required_input)
        if padding_strategy == PaddingStrategy.LONGEST:
            max_length = len(required_input)
        if max_length is not None and pad_to_multiple_of is not None and (max_length % pad_to_multiple_of != 0):
            max_length = ((max_length // pad_to_multiple_of) + 1) * pad_to_multiple_of
        needs_to_be_padded = padding_strategy != PaddingStrategy.DO_NOT_PAD and len(required_input) != max_length
        # Initialize attention mask if not present.
        if "attention_mask" not in encoded_inputs:
            encoded_inputs["attention_mask"] = [1] * seq_length
        if "position_ids" not in encoded_inputs:
            encoded_inputs["position_ids"] = list(range(seq_length))
        if needs_to_be_padded:
            difference = max_length - len(required_input)
            if "attention_mask" in encoded_inputs:
                encoded_inputs["attention_mask"] = [0] * difference + encoded_inputs["attention_mask"]
            if "position_ids" in encoded_inputs:
                encoded_inputs["position_ids"] = [0] * difference + encoded_inputs["position_ids"]
            encoded_inputs[self.model_input_names[0]] = [self.pad_token_id] * difference + required_input
        return encoded_inputs
--- a/tokenizer.model
+++ b/tokenizer.model
--- a/tokenizer_config.json
+++ b/tokenizer_config.json
@ -0,0 +1,12 @@
 {
  "name_or_path": "THUDM/chatglm3-6b",
  "remove_space": false,
  "do_lower_case": false,
  "tokenizer_class": "ChatGLMTokenizer",
  "auto_map": {
    "AutoTokenizer": [
      "tokenization_chatglm.ChatGLMTokenizer",
      null
      ]
  }
 }