comment with trust_remote_code=True

zR 2024-11-01 18:49:55 +08:00
parent bca86f8c8e
commit d71b8c2284
4 changed files with 58 additions and 88 deletions

@@ -11,38 +11,16 @@ Read this in [English](README_en.md)
 ## Project Updates
-- 🔥🔥 **News**: ```2024/11/01```: Added support for running the GLM-4-9B-Chat-hf and GLM-4v-9B models on vLLM 0.6.3 or later and transformers 4.46.0 or later
-- 🔥🔥 **News**: ```2024/10/25```: We open-sourced the end-to-end Chinese-English voice dialogue model [GLM-4-Voice](https://github.com/THUDM/GLM-4-Voice)
-- 🔥 **News**: ```2024/10/12```: Added vLLM framework support for the GLM-4v-9B model
-- 🔥 **News**: ```2024/09/06```: Added an OpenAI API-compatible server built on the GLM-4v-9B model
-- 🔥 **News**: ```2024/09/05``` We open-sourced a model that enables LLMs to generate fine-grained citations in long-context Q&A, [longcite-glm4-9b](https://huggingface.co/THUDM/LongCite-glm4-9b),
-along with the dataset [LongCite-45k](https://huggingface.co/datasets/THUDM/LongCite-45k).
-Try it online at [Huggingface Space](https://huggingface.co/spaces/THUDM/LongCite).
-- 🔥**News**: ```2024/09/04```: Added demo code for using vLLM with a LoRA adapter on the GLM-4-9B-Chat model
-- 🔥**News**: ```2024/08/15```: We open-sourced a model with long-text output capability (a single-turn response can exceed 10,000 tokens),
-[longwriter-glm4-9b](https://huggingface.co/THUDM/LongWriter-glm4-9b),
-along with the dataset [LongWriter-6k](https://huggingface.co/datasets/THUDM/LongWriter-6k).
-Try it online at [Huggingface Space](https://huggingface.co/spaces/THUDM/LongWriter)
-or the [ModelScope Community Space](https://modelscope.cn/studios/ZhipuAI/LongWriter-glm4-9b-demo).
-- 🔥 **News**: ```2024/08/12```: The `transformers` version required by GLM-4-9B-Chat has been upgraded to `4.44.0`. Please re-pull all files except the model weights (
-`*.safetensor` files and `tokenizer.model`) and strictly update the dependencies according to `basic_demo/requirements.txt`.
-- 🔥 **News**: ```2024/07/24```:
-We published our latest technical write-up on long-context techniques. See [here](https://medium.com/@ChatGLM/glm-long-scaling-pre-trained-model-contexts-to-millions-caa3c48dea85)
-for our technical report on the long-context techniques used to train the open-source GLM-4-9B model.
-- 🔥 **News**: ``2024/7/16``: The `transformers` version required by GLM-4-9B-Chat has been upgraded to `4.42.4`.
-Please update the model configuration files and update the dependencies according to `basic_demo/requirements.txt`.
-- 🔥 **News**: ``2024/7/9``: The GLM-4-9B-Chat
-model has been adapted for [Ollama](https://github.com/ollama/ollama) and [Llama.cpp](https://github.com/ggerganov/llama.cpp).
-See the details in this [PR](https://github.com/ggerganov/llama.cpp/pull/8031).
-- 🔥 **News**: ``2024/7/1``: We updated fine-tuning for GLM-4V-9B. You need to update the runtime and configuration files in our model repository
-to support this feature. For more fine-tuning details (e.g., dataset format and GPU memory requirements), see [here](finetune_demo).
-- 🔥 **News**: ``2024/6/28``: We worked with Intel's technical team to improve the ITREX and OpenVINO deployment tutorials for GLM-4-9B-Chat. You can use Intel
-CPU/GPU devices to deploy the open-source GLM-4-9B model efficiently. See [here](intel_device_demo).
-- 🔥 **News**: ``2024/6/24``: We updated the runtime and configuration files in the model repository to support Flash Attention 2.
-Please update the model configuration files and refer to the sample code in `basic_demo/trans_cli_demo.py`.
-- 🔥 **News**: ``2024/6/19``: We updated the runtime and configuration files in the model repository and fixed some known model inference issues. Please clone the latest model repository.
-- 🔥 **News**: ``2024/6/18``: We released a [technical report](https://arxiv.org/pdf/2406.12793).
-- 🔥 **News**: ``2024/6/05``: We released the GLM-4-9B series of open-source models
+- 🔥🔥 **News**: ```2024/11/01```: The dependencies of this repository have been upgraded. Please update the dependencies in `requirements.txt` to ensure the models run correctly. [glm-4-9b-chat-hf](https://huggingface.co/THUDM/glm-4-9b-chat-hf) is the set of model weights adapted for `transformers>=4.46`, implemented with the `GlmModel` class in the transformers library.
+In addition, `tokenzier_chatglm.py` in [glm-4-9b-chat](https://huggingface.co/THUDM/glm-4-9b-chat) and [glm-4v-9b](https://huggingface.co/THUDM/glm-4v-9b) has been updated to support the latest version of `transformers`. Please update these files from HuggingFace.
+- 🔥 **News**: ```2024/10/27```: We open-sourced [LongReward](https://github.com/THUDM/LongReward), which uses AI feedback to improve long-context large language models.
+- 🔥 **News**: ```2024/10/25```: We open-sourced the end-to-end Chinese-English voice dialogue model [GLM-4-Voice](https://github.com/THUDM/GLM-4-Voice).
+- 🔥 **News**: ```2024/09/05``` We open-sourced [longcite-glm4-9b](https://huggingface.co/THUDM/LongCite-glm4-9b), a model that enables LLMs to generate fine-grained citations in long-context Q&A, along with the dataset [LongCite-45k](https://huggingface.co/datasets/THUDM/LongCite-45k). Try it online at [Huggingface Space](https://huggingface.co/spaces/THUDM/LongCite).
+- 🔥**News**: ```2024/08/15```: We open-sourced [longwriter-glm4-9b](https://huggingface.co/THUDM/LongWriter-glm4-9b), a model with long-text output capability (a single-turn response can exceed 10,000 tokens), along with the dataset [LongWriter-6k](https://huggingface.co/datasets/THUDM/LongWriter-6k). Try it online at [Huggingface Space](https://huggingface.co/spaces/THUDM/LongWriter) or the [ModelScope Community Space](https://modelscope.cn/studios/ZhipuAI/LongWriter-glm4-9b-demo).
+- 🔥 **News**: ```2024/07/24```: We published our latest technical write-up on long-context techniques. See [here](https://medium.com/@ChatGLM/glm-long-scaling-pre-trained-model-contexts-to-millions-caa3c48dea85) for our technical report on the long-context techniques used to train the open-source GLM-4-9B model.
+- 🔥 **News**: ``2024/07/09``: The GLM-4-9B-Chat model has been adapted for [Ollama](https://github.com/ollama/ollama) and [Llama.cpp](https://github.com/ggerganov/llama.cpp). See the details in this [PR](https://github.com/ggerganov/llama.cpp/pull/8031).
+- 🔥 **News**: ``2024/06/18``: We released a [technical report](https://arxiv.org/pdf/2406.12793).
+- 🔥 **News**: ``2024/06/05``: We released the GLM-4-9B series of open-source models.
 ## Model Introduction

@@ -8,47 +8,36 @@
 </p>
 ## Update
-- 🔥🔥 **News**: ```2024/11/01```: Support for GLM-4-9B-Chat-hf and GLM-4v-9B models on vLLM >= 0.6.3 and transformers >= 4.46.0
-- 🔥🔥 **News**: ```2024/10/25```: We have open-sourced the end-to-end Chinese-English voice dialogue model [GLM-4-Voice](https://github.com/THUDM/GLM-4-Voice).
-- 🔥 **News**: ```2024/10/12```: Add GLM-4v-9B model support for vllm framework.
-- 🔥 **News**: ```2024/09/06```: Add support for OpenAI API server on the GLM-4v-9B model.
-- 🔥 **News**: ```2024/09/05```: We open-sourced a model enabling LLMs to generate fine-grained citations in
-long-context Q&A: [longcite-glm4-9b](https://huggingface.co/THUDM/LongCite-glm4-9b), along with the
-dataset [LongCite-14k](https://huggingface.co/datasets/THUDM/LongCite-45k). You are welcome to experience it online
+- 🔥🔥 **News**: ```2024/11/01```: Dependencies have been updated in this repository. Please update the dependencies in
+`requirements.txt` to ensure the model runs correctly. The model weights
+for [glm-4-9b-chat-hf](https://huggingface.co/THUDM/glm-4-9b-chat-hf) are compatible with `transformers>=4.46` and can
+be implemented using the `GlmModel` class in the transformers library. Additionally, `tokenizer_chatglm.py`
+in [glm-4-9b-chat](https://huggingface.co/THUDM/glm-4-9b-chat) and [glm-4v-9b](https://huggingface.co/THUDM/glm-4v-9b)
+has been updated for the latest version of `transformers`. Please update the files on HuggingFace.
+- 🔥 **News**: ```2024/10/27```: We have open-sourced [LongReward](https://github.com/THUDM/LongReward), a model that
+uses AI feedback to enhance long-context large language models.
+- 🔥 **News**: ```2024/10/25```: We have open-sourced the end-to-end Mandarin-English voice dialogue
+model [GLM-4-Voice](https://github.com/THUDM/GLM-4-Voice).
+- 🔥 **News**: ```2024/09/05```: We have open-sourced [longcite-glm4-9b](https://huggingface.co/THUDM/LongCite-glm4-9b),
+a model enabling LLMs to produce fine-grained citations in long-context Q&A, along with the
+dataset [LongCite-45k](https://huggingface.co/datasets/THUDM/LongCite-45k). Try it out online
 at [Huggingface Space](https://huggingface.co/spaces/THUDM/LongCite).
-- 🔥 **News**: ```2024/09/04```: Add demo code for using vLLM with LoRA adapter on the GLM-4-9B-Chat model.
-- 🔥 **News**: ```2024/08/15```: We have open-sourced a model with long-text output capability (single turn LLM output
-can exceed
-10K tokens) [longwriter-glm4-9b](https://huggingface.co/THUDM/LongWriter-glm4-9b) and the
-dataset [LongWriter-6k](https://huggingface.co/datasets/THUDM/LongWriter-6k). You're welcome
-to [try it online](https://huggingface.co/spaces/THUDM/LongWriter).
-- 🔥 **News**: ```2024/08/12```: The `transformers` version required for the GLM-4-9B-Chat model has been upgraded
-to `4.44.0`. Please pull all files again except for the model weights (`*.safetensor` files and `tokenizer.model`),
-and strictly update the dependencies as per `basic_demo/requirements.txt`.
-- 🔥 **News**: ```2024/07/24```: we released the latest technical interpretation related to long texts. Check
-out [here](https://medium.com/@ChatGLM/glm-long-scaling-pre-trained-model-contexts-to-millions-caa3c48dea85) to view
-our
-technical report on long context technology in the training of the open-source GLM-4-9B model.
-- 🔥 **News**: ``2024/7/16``: The `transformers` version that the GLM-4-9B-Chat model depends on has been upgraded
-to `4.42.4`. Please update the model configuration file and refer to `basic_demo/requirements.txt` to update the
-dependencies.
-- 🔥 **News**: ``2024/7/9``: The GLM-4-9B-Chat model has been adapted to [Ollama](https://github.com/ollama/ollama)
-and [Llama.cpp](https://github.com/ggerganov/llama.cpp), you can check the specific details
-in [PR](https://github.com/ggerganov/llama.cpp/pull/8031).
-- 🔥 **News**: ``2024/7/1``: We have updated the multimodal fine-tuning of GLM-4V-9B. You need to update the run file and
-configuration file of our model repository to support this feature. For more fine-tuning details (such as dataset
-format, video memory requirements), please go to [view](finetune_demo).
-- 🔥 **News**: ``2024/6/28``: We have worked with the Intel technical team to improve the ITREX and OpenVINO deployment
-tutorials for GLM-4-9B-Chat. You can use Intel CPU/GPU devices to efficiently deploy the GLM-4-9B open source model.
-Welcome to [view](intel_device_demo).
-- 🔥 **News**: ``2024/6/24``: We have updated the running files and configuration files of the model repository to
-support Flash Attention 2, Please update the model configuration file and refer to the sample code
-in `basic_demo/trans_cli_demo.py`.
-- 🔥 **News**: ``2024/6/19``: We updated the running files and configuration files of the model repository and fixed some
-model inference issues. Welcome to clone the latest model repository.
-- 🔥 **News**: ``2024/6/18``: We released a [technical report](https://arxiv.org/pdf/2406.12793), welcome to check it
-out.
-- 🔥 **News**: ``2024/6/05``: We released the GLM-4-9B series of open source models
+- 🔥 **News**: ```2024/08/15```: We have
+open-sourced [longwriter-glm4-9b](https://huggingface.co/THUDM/LongWriter-glm4-9b), a model capable of generating over
+10,000 tokens in single-turn dialogue, along with the
+dataset [LongWriter-6k](https://huggingface.co/datasets/THUDM/LongWriter-6k). Experience it online
+at [Huggingface Space](https://huggingface.co/spaces/THUDM/LongWriter) or
+the [ModelScope Community Space](https://modelscope.cn/studios/ZhipuAI/LongWriter-glm4-9b-demo).
+- 🔥 **News**: ```2024/07/24```: We published the latest technical insights on long-text processing. Check out our
+technical report on training the open-source GLM-4-9B model for long
+texts [here](https://medium.com/@ChatGLM/glm-long-scaling-pre-trained-model-contexts-to-millions-caa3c48dea85).
+- 🔥 **News**: ```2024/07/09```: The GLM-4-9B-Chat model is now compatible
+with [Ollama](https://github.com/ollama/ollama) and [Llama.cpp](https://github.com/ggerganov/llama.cpp). See detailed
+information in this [PR](https://github.com/ggerganov/llama.cpp/pull/8031).
+- 🔥 **News**: ```2024/06/18```: We have released a [technical report](https://arxiv.org/pdf/2406.12793), available for
+viewing.
+- 🔥 **News**: ```2024/06/05```: We released the GLM-4-9B series of open-source models.
 ## Model Introduction
@@ -67,15 +56,14 @@ GPT-4-turbo-2024-04-09, Gemini 1.0 Pro, Qwen-VL-Max, and Claude 3 Opus.
## Model List ## Model List
| Model | Type | Seq Length | Transformers | vLLM | Download | Online Demo | | Model | Type | Seq Length | Transformers | Download | Online Demo |
|:-------------------:|:----:|:----------:|:------------:|:--------:|:-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------:|:------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------:| |:-------------------:|:----:|:----------:|:-------------:|:-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------:|:------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------:|
| GLM-4-9B | Base | 8K | <= 4.45 | <= 0.6.2 | [🤗 Huggingface](https://huggingface.co/THUDM/glm-4-9b)<br> [🤖 ModelScope](https://modelscope.cn/models/ZhipuAI/glm-4-9b)<br> [🟣 WiseModel](https://wisemodel.cn/models/ZhipuAI/glm-4-9b) | / | | GLM-4-9B | Base | 8K | `4.44 - 4.45` | [🤗 Huggingface](https://huggingface.co/THUDM/glm-4-9b)<br> [🤖 ModelScope](https://modelscope.cn/models/ZhipuAI/glm-4-9b)<br> [🟣 WiseModel](https://wisemodel.cn/models/ZhipuAI/glm-4-9b) | / |
| GLM-4-9B-Chat | Chat | 128K | <= 4.45 | <= 0.6.2 | [🤗 Huggingface](https://huggingface.co/THUDM/glm-4-9b-chat)<br> [🤖 ModelScope](https://modelscope.cn/models/ZhipuAI/glm-4-9b-chat)<br> [🟣 WiseModel](https://wisemodel.cn/models/ZhipuAI/GLM-4-9B-Chat) | [🤖 ModelScope CPU](https://modelscope.cn/studios/dash-infer/GLM-4-Chat-DashInfer-Demo/summary)<br> [🤖 ModelScope vLLM](https://modelscope.cn/studios/ZhipuAI/glm-4-9b-chat-vllm/summary) | | GLM-4-9B-Chat | Chat | 128K | `4.44 - 4.45` | [🤗 Huggingface](https://huggingface.co/THUDM/glm-4-9b-chat)<br> [🤖 ModelScope](https://modelscope.cn/models/ZhipuAI/glm-4-9b-chat)<br> [🟣 WiseModel](https://wisemodel.cn/models/ZhipuAI/GLM-4-9B-Chat) | [🤖 ModelScope CPU](https://modelscope.cn/studios/dash-infer/GLM-4-Chat-DashInfer-Demo/summary)<br> [🤖 ModelScope vLLM](https://modelscope.cn/studios/ZhipuAI/glm-4-9b-chat-vllm/summary) |
| GLM-4-9B-Chat-HF | Chat | 128K | >= 4.46 | <= 0.6.2 | [🤗 Huggingface](https://huggingface.co/THUDM/glm-4-9b-chat-hf)<br> [🤖 ModelScope](https://modelscope.cn/models/ZhipuAI/glm-4-9b-chat-hf) | [🤖 ModelScope CPU](https://modelscope.cn/studios/dash-infer/GLM-4-Chat-DashInfer-Demo/summary)<br> [🤖 ModelScope vLLM](https://modelscope.cn/studios/ZhipuAI/glm-4-9b-chat-vllm/summary) | | GLM-4-9B-Chat-HF | Chat | 128K | `>= 4.46.0` | [🤗 Huggingface](https://huggingface.co/THUDM/glm-4-9b-chat-hf)<br> [🤖 ModelScope](https://modelscope.cn/models/ZhipuAI/glm-4-9b-chat-hf) | [🤖 ModelScope CPU](https://modelscope.cn/studios/dash-infer/GLM-4-Chat-DashInfer-Demo/summary)<br> [🤖 ModelScope vLLM](https://modelscope.cn/studios/ZhipuAI/glm-4-9b-chat-vllm/summary) |
| GLM-4-9B-Chat-1M | Chat | 1M | <= 4.45 | <= 0.6.2 | [🤗 Huggingface](https://huggingface.co/THUDM/glm-4-9b-chat-1m)<br> [🤖 ModelScope](https://modelscope.cn/models/ZhipuAI/glm-4-9b-chat-1m)<br> [🟣 WiseModel](https://wisemodel.cn/models/ZhipuAI/GLM-4-9B-Chat-1M) | / | | GLM-4-9B-Chat-1M | Chat | 1M | `4.44 - 4.45` | [🤗 Huggingface](https://huggingface.co/THUDM/glm-4-9b-chat-1m)<br> [🤖 ModelScope](https://modelscope.cn/models/ZhipuAI/glm-4-9b-chat-1m)<br> [🟣 WiseModel](https://wisemodel.cn/models/ZhipuAI/GLM-4-9B-Chat-1M) | / |
| GLM-4-9B-Chat-1M-HF | Chat | 1M | >= 4.46 | <= 0.6.2 | [🤗 Huggingface](https://huggingface.co/THUDM/glm-4-9b-chat-1m-hf)<br> [🤖 ModelScope](https://modelscope.cn/models/ZhipuAI/glm-4-9b-chat-1m-hf) | / | | GLM-4-9B-Chat-1M-HF | Chat | 1M | `>= 4.46.0` | [🤗 Huggingface](https://huggingface.co/THUDM/glm-4-9b-chat-1m-hf)<br> [🤖 ModelScope](https://modelscope.cn/models/ZhipuAI/glm-4-9b-chat-1m-hf) | / |
| GLM-4V-9B | Chat | 8K | >= 4.46 | >= 0.6.3 | [🤗 Huggingface](https://huggingface.co/THUDM/glm-4v-9b)<br> [🤖 ModelScope](https://modelscope.cn/models/ZhipuAI/glm-4v-9b)<br> [🟣 WiseModel](https://wisemodel.cn/models/ZhipuAI/GLM-4V-9B) | [🤖 ModelScope](https://modelscope.cn/studios/ZhipuAI/glm-4v-9b-Demo/summary) | | GLM-4V-9B | Chat | 8K | `>= 4.46.0` | [🤗 Huggingface](https://huggingface.co/THUDM/glm-4v-9b)<br> [🤖 ModelScope](https://modelscope.cn/models/ZhipuAI/glm-4v-9b)<br> [🟣 WiseModel](https://wisemodel.cn/models/ZhipuAI/GLM-4V-9B) | [🤖 ModelScope](https://modelscope.cn/studios/ZhipuAI/glm-4v-9b-Demo/summary) |
## BenchMarkß ## BenchMarkß
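After this change the Transformers column splits the models into two groups: the original checkpoints (`4.44 - 4.45`) and the `-HF` checkpoints plus GLM-4V-9B (`>= 4.46.0`). A minimal sketch of checking that rule in plain Python before loading a model, assuming `4.44 - 4.45` means any 4.44.x or 4.45.x release. `MODEL_REQUIREMENTS`, `parse_version`, and `is_supported` are hypothetical names, not part of the repository.

```python
from typing import Dict, Optional, Tuple

Version = Tuple[int, int, int]

# Hypothetical table derived from the "Transformers" column above:
# (minimum inclusive, maximum exclusive, or None for no upper bound).
# Assumption: "4.44 - 4.45" means any 4.44.x or 4.45.x release.
MODEL_REQUIREMENTS: Dict[str, Tuple[Version, Optional[Version]]] = {
    "GLM-4-9B": ((4, 44, 0), (4, 46, 0)),
    "GLM-4-9B-Chat": ((4, 44, 0), (4, 46, 0)),
    "GLM-4-9B-Chat-1M": ((4, 44, 0), (4, 46, 0)),
    "GLM-4-9B-Chat-HF": ((4, 46, 0), None),
    "GLM-4-9B-Chat-1M-HF": ((4, 46, 0), None),
    "GLM-4V-9B": ((4, 46, 0), None),
}


def parse_version(text: str) -> Version:
    """Turn '4.46.0', '4.45' or '4.46.0.dev0' into a comparable (major, minor, patch)."""
    numbers = []
    for piece in text.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break  # stop at suffixes such as 'rc1' or 'dev0'
            digits += ch
        numbers.append(int(digits) if digits else 0)
    while len(numbers) < 3:
        numbers.append(0)  # '4.45' is treated as '4.45.0'
    return (numbers[0], numbers[1], numbers[2])


def is_supported(model: str, installed: str) -> bool:
    """Return True if the installed transformers version fits the model's range."""
    minimum, maximum = MODEL_REQUIREMENTS[model]
    version = parse_version(installed)
    if version < minimum:
        return False
    return maximum is None or version < maximum
```

To check the current environment, pass `importlib.metadata.version("transformers")` (standard library) as the second argument. The hand-rolled parser only handles plain `X.Y.Z`-style strings; `packaging.version.Version` is the more robust choice when that package is available.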

@@ -11,21 +11,24 @@ ensuring that the CLI interface displays formatted text correctly.
 If you use flash attention, you should install the flash-attn and add attn_implementation="flash_attention_2" in model loading.
+Note:
+Using glm-4-9b-chat-hf requires `transformers>=4.46.0`.
 """
 import torch
 from threading import Thread
 from transformers import AutoTokenizer, AutoModelForCausalLM, StoppingCriteria, StoppingCriteriaList, TextIteratorStreamer
-MODEL_PATH = "THUDM/glm-4-9b-chat-hf"
-tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH)
+MODEL_PATH = "THUDM/glm-4-9b-chat"
+# trust_remote_code=True is required when using `glm-4-9b-chat`.
+# It is not needed when using `glm-4-9b-chat-hf`.
+# This applies to both the tokenizer and the model.
+tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH, trust_remote_code=True)
 model = AutoModelForCausalLM.from_pretrained(
     MODEL_PATH,  # attn_implementation="flash_attention_2", # Use Flash Attention
     torch_dtype=torch.bfloat16,  # using flash-attn must use bfloat16 or float16
+    trust_remote_code=True,
     device_map="auto").eval()
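This commit switches the demo default to `glm-4-9b-chat` and passes `trust_remote_code=True` to both the tokenizer and the model, noting that the flag is not needed for `glm-4-9b-chat-hf`. A minimal sketch of deriving the flag from the checkpoint name instead of hard-coding it, assuming the `-hf` suffix reliably marks the checkpoints that use the classes built into transformers. `remote_code_kwargs` is a hypothetical helper, not part of the repository.

```python
def remote_code_kwargs(model_path: str) -> dict:
    """Extra from_pretrained() kwargs to pass to BOTH the tokenizer and the model.

    Heuristic (assumption): checkpoints whose name ends in "-hf" use the GLM
    classes built into transformers>=4.46, so no remote code is needed; the
    original checkpoints (e.g. THUDM/glm-4-9b-chat) ship their own modeling
    and tokenizer code and need trust_remote_code=True.
    """
    # Works for hub ids ("THUDM/glm-4-9b-chat-hf") and local paths alike,
    # as long as the last path component keeps the original checkpoint name.
    name = model_path.replace("\\", "/").rstrip("/").split("/")[-1].lower()
    if name.endswith("-hf"):
        return {}  # native transformers implementation, no remote code
    return {"trust_remote_code": True}
```

In the demo this would replace the hard-coded flag: compute `extra = remote_code_kwargs(MODEL_PATH)` once, then call `AutoTokenizer.from_pretrained(MODEL_PATH, **extra)` and `AutoModelForCausalLM.from_pretrained(MODEL_PATH, torch_dtype=torch.bfloat16, device_map="auto", **extra)`. Keying on the name is fragile for renamed local copies, so treat it as a convenience rather than a guarantee.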

@@ -10,19 +10,20 @@ Note: The script includes a modification to handle markdown to plain text conver
 ensuring that the CLI interface displays formatted text correctly.
 """
-import os
 import torch
 from threading import Thread
 from transformers import (
     AutoTokenizer,
     StoppingCriteria,
     StoppingCriteriaList,
-    TextIteratorStreamer, AutoModel, BitsAndBytesConfig
+    TextIteratorStreamer,
+    AutoModel,
+    BitsAndBytesConfig
 )
 from PIL import Image
-MODEL_PATH = os.environ.get('MODEL_PATH', 'THUDM/glm-4v-9b')
+MODEL_PATH = "THUDM/glm-4v-9b"
 tokenizer = AutoTokenizer.from_pretrained(
     MODEL_PATH,