diff --git a/README.md b/README.md index 9af3a60..7937d60 100644 --- a/README.md +++ b/README.md @@ -11,38 +11,16 @@ Read this in [English](README_en.md) ## 项目更新 -- 🔥🔥 **News**: ```2024/11/01```: 支持了 GLM-4-9B-Chat-hf 和 GLM-4v-9B 模型在 vLLM 0.6.3 以上版本和 transformers 4.46.0 以上版本运行 -- 🔥🔥 **News**: ```2024/10/25```: 我们开源了端到端中英语音对话模型 [GLM-4-Voice](https://github.com/THUDM/GLM-4-Voice) -- 🔥 **News**: ```2024/10/12```: 增加了 GLM-4v-9B 模型对vllm框架的支持 -- 🔥 **News**: ```2024/09/06```: 增加了在 GLM-4v-9B 模型上构建OpenAI API兼容的服务端 -- 🔥 **News**: ```2024/09/05``` 我们开源了使LLMs能够在长上下文问答中生成细粒度引用的模型 [longcite-glm4-9b](https://huggingface.co/THUDM/LongCite-glm4-9b) - 以及数据集 [LongCite-45k](https://huggingface.co/datasets/THUDM/LongCite-45k), - 欢迎在 [Huggingface Space](https://huggingface.co/spaces/THUDM/LongCite) 在线体验。 -- 🔥**News**: ```2024/09/04```: 增加了在 GLM-4-9B-Chat 模型上使用带有 Lora adapter 的 vLLM 演示代码 -- 🔥**News**: ```2024/08/15```: 我们开源具备长文本输出能力(单轮对话大模型输出可超过1万token) - 的模型 [longwriter-glm4-9b](https://huggingface.co/THUDM/LongWriter-glm4-9b) - 以及数据集 [LongWriter-6k](https://huggingface.co/datasets/THUDM/LongWriter-6k), - 欢迎在 [Huggingface Space](https://huggingface.co/spaces/THUDM/LongWriter) - 或 [魔搭社区空间](https://modelscope.cn/studios/ZhipuAI/LongWriter-glm4-9b-demo) 在线体验。 -- 🔥 **News**: ```2024/08/12```: GLM-4-9B-Chat 模型依赖的`transformers`版本升级到 `4.44.0`,请重新拉取除模型权重( - `*.safetensor` 文件 和 `tokenizer.model`)外的文件并参考 `basic_demo/requirements.txt` 严格更新依赖。 -- 🔥 **News**: ```2024/07/24```: - 我们发布了与长文本相关的最新技术解读,关注 [这里](https://medium.com/@ChatGLM/glm-long-scaling-pre-trained-model-contexts-to-millions-caa3c48dea85) - 查看我们在训练 GLM-4-9B 开源模型中关于长文本技术的技术报告。 -- 🔥 **News**: ``2024/7/16``: GLM-4-9B-Chat 模型依赖的`transformers`版本升级到 `4.42.4`, - 请更新模型配置文件并参考 `basic_demo/requirements.txt` 更新依赖。 -- 🔥 **News**: ``2024/7/9``: GLM-4-9B-Chat - 模型已适配 [Ollama](https://github.com/ollama/ollama),[Llama.cpp](https://github.com/ggerganov/llama.cpp) - ,您可以在[PR](https://github.com/ggerganov/llama.cpp/pull/8031) 查看具体的细节。 -- 🔥 **News**: ``2024/7/1``: 我们更新了 GLM-4V-9B 的微调,您需要更新我们的模型仓库的运行文件和配置文件, - 以支持这个功能,更多微调细节 (例如数据集格式,显存要求),请前往 [查看](finetune_demo)。 -- 🔥 **News**: ``2024/6/28``: 我们与英特尔技术团队合作,改进了 GLM-4-9B-Chat 的 ITREX 和 OpenVINO 部署教程。您可以使用英特尔 - CPU/GPU 设备高效部署 GLM-4-9B 开源模型。欢迎访问 [查看](intel_device_demo)。 -- 🔥 **News**: ``2024/6/24``: 我们更新了模型仓库的运行文件和配置文件,支持 Flash Attention 2, - 请更新模型配置文件并参考 `basic_demo/trans_cli_demo.py` 中的示例代码。 -- 🔥 **News**: ``2024/6/19``: 我们更新了模型仓库的运行文件和配置文件,修复了部分已知的模型推理的问题,欢迎大家克隆最新的模型仓库。 -- 🔥 **News**: ``2024/6/18``: 我们发布 [技术报告](https://arxiv.org/pdf/2406.12793), 欢迎查看。 -- 🔥 **News**: ``2024/6/05``: 我们发布 GLM-4-9B 系列开源模型 +- 🔥🔥 **News**: ```2024/11/01```: 本仓库的依赖已升级,请更新 `requirements.txt` 中的依赖以保证模型正常运行。[glm-4-9b-chat-hf](https://huggingface.co/THUDM/glm-4-9b-chat-hf) 是适配 `transformers>=4.46` 的模型权重,使用 transformers 库中的 `GlmModel` 类实现。 +同时,[glm-4-9b-chat](https://huggingface.co/THUDM/glm-4-9b-chat), [glm-4v-9b](https://huggingface.co/THUDM/glm-4v-9b) 中的 `tokenizer_chatglm.py` 已经更新以适配最新版本的 `transformers` 库。请前往 HuggingFace 获取更新后的文件。 +- 🔥 **News**: ```2024/10/27```: 我们开源了 [LongReward](https://github.com/THUDM/LongReward),这是一个使用 AI 反馈改进长上下文大语言模型的方法。 +- 🔥 **News**: ```2024/10/25```: 我们开源了端到端中英语音对话模型 [GLM-4-Voice](https://github.com/THUDM/GLM-4-Voice)。 +- 🔥 **News**: ```2024/09/05```: 我们开源了使 LLMs 能够在长上下文问答中生成细粒度引用的模型 [longcite-glm4-9b](https://huggingface.co/THUDM/LongCite-glm4-9b) 以及数据集 [LongCite-45k](https://huggingface.co/datasets/THUDM/LongCite-45k),欢迎在 [Huggingface Space](https://huggingface.co/spaces/THUDM/LongCite) 在线体验。 +- 🔥**News**: ```2024/08/15```: 
我们开源了具备长文本输出能力(单轮对话大模型输出可超过1万token)的模型 [longwriter-glm4-9b](https://huggingface.co/THUDM/LongWriter-glm4-9b) 以及数据集 [LongWriter-6k](https://huggingface.co/datasets/THUDM/LongWriter-6k),欢迎在 [Huggingface Space](https://huggingface.co/spaces/THUDM/LongWriter) 或 [魔搭社区空间](https://modelscope.cn/studios/ZhipuAI/LongWriter-glm4-9b-demo) 在线体验。 +- 🔥 **News**: ```2024/07/24```: 我们发布了与长文本相关的最新技术解读,关注 [这里](https://medium.com/@ChatGLM/glm-long-scaling-pre-trained-model-contexts-to-millions-caa3c48dea85) 查看我们在训练 GLM-4-9B 开源模型中关于长文本技术的技术报告。 +- 🔥 **News**: ```2024/07/09```: GLM-4-9B-Chat 模型已适配 [Ollama](https://github.com/ollama/ollama), [Llama.cpp](https://github.com/ggerganov/llama.cpp),您可以在 [PR](https://github.com/ggerganov/llama.cpp/pull/8031) 查看具体的细节。 +- 🔥 **News**: ```2024/06/18```: 我们发布 [技术报告](https://arxiv.org/pdf/2406.12793),欢迎查看。 +- 🔥 **News**: ```2024/06/05```: 我们发布 GLM-4-9B 系列开源模型。 ## 模型介绍 diff --git a/README_en.md b/README_en.md index 9f4cd42..a9b2e59 100644 --- a/README_en.md +++ b/README_en.md @@ -8,47 +8,36 @@

## Update -- 🔥🔥 **News**: ```2024/11/01```: Support for GLM-4-9B-Chat-hf and GLM-4v-9B models on vLLM >= 0.6.3 and transformers >= 4.46.0 -- 🔥🔥 **News**: ```2024/10/25```: We have open-sourced the end-to-end Chinese-English voice dialogue model [GLM-4-Voice](https://github.com/THUDM/GLM-4-Voice). -- 🔥 **News**: ```2024/10/12```: Add GLM-4v-9B model support for vllm framework. -- 🔥 **News**: ```2024/09/06```: Add support for OpenAI API server on the GLM-4v-9B model. -- 🔥 **News**: ```2024/09/05```: We open-sourced a model enabling LLMs to generate fine-grained citations in - long-context Q&A: [longcite-glm4-9b](https://huggingface.co/THUDM/LongCite-glm4-9b), along with the - dataset [LongCite-14k](https://huggingface.co/datasets/THUDM/LongCite-45k). You are welcome to experience it online + +- 🔥🔥 **News**: ```2024/11/01```: Dependencies have been updated in this repository. Please update the dependencies in + `requirements.txt` to ensure the model runs correctly. The model weights + for [glm-4-9b-chat-hf](https://huggingface.co/THUDM/glm-4-9b-chat-hf) are compatible with `transformers>=4.46` and + are implemented using the `GlmModel` class in the transformers library. Additionally, `tokenizer_chatglm.py` + in [glm-4-9b-chat](https://huggingface.co/THUDM/glm-4-9b-chat) and [glm-4v-9b](https://huggingface.co/THUDM/glm-4v-9b) + has been updated for the latest version of `transformers`. Please pull the updated files from Hugging Face. +- 🔥 **News**: ```2024/10/27```: We have open-sourced [LongReward](https://github.com/THUDM/LongReward), a method that + uses AI feedback to improve long-context large language models. +- 🔥 **News**: ```2024/10/25```: We have open-sourced the end-to-end Mandarin-English voice dialogue + model [GLM-4-Voice](https://github.com/THUDM/GLM-4-Voice). +- 🔥 **News**: ```2024/09/05```: We have open-sourced [longcite-glm4-9b](https://huggingface.co/THUDM/LongCite-glm4-9b), + a model enabling LLMs to produce fine-grained citations in long-context Q&A, along with the + dataset [LongCite-45k](https://huggingface.co/datasets/THUDM/LongCite-45k). Try it out online at [Huggingface Space](https://huggingface.co/spaces/THUDM/LongCite). -- 🔥 **News**: ```2024/09/04```: Add demo code for using vLLM with LoRA adapter on the GLM-4-9B-Chat model. -- 🔥 **News**: ```2024/08/15```: We have open-sourced a model with long-text output capability (single turn LLM output - can exceed - 10K tokens) [longwriter-glm4-9b](https://huggingface.co/THUDM/LongWriter-glm4-9b) and the - dataset [LongWriter-6k](https://huggingface.co/datasets/THUDM/LongWriter-6k). You're welcome - to [try it online](https://huggingface.co/spaces/THUDM/LongWriter). -- 🔥 **News**: ```2024/08/12```: The `transformers` version required for the GLM-4-9B-Chat model has been upgraded - to `4.44.0`. Please pull all files again except for the model weights (`*.safetensor` files and `tokenizer.model`), - and strictly update the dependencies as per `basic_demo/requirements.txt`. -- 🔥 **News**: ```2024/07/24```: we released the latest technical interpretation related to long texts. Check - out [here](https://medium.com/@ChatGLM/glm-long-scaling-pre-trained-model-contexts-to-millions-caa3c48dea85) to view - our - technical report on long context technology in the training of the open-source GLM-4-9B model. -- 🔥 **News**: ``2024/7/16``: The `transformers` version that the GLM-4-9B-Chat model depends on has been upgraded - to `4.42.4`. 
Please update the model configuration file and refer to `basic_demo/requirements.txt` to update the - dependencies. -- 🔥 **News**: ``2024/7/9``: The GLM-4-9B-Chat model has been adapted to [Ollama](https://github.com/ollama/ollama) - and [Llama.cpp](https://github.com/ggerganov/llama.cpp), you can check the specific details - in [PR](https://github.com/ggerganov/llama.cpp/pull/8031). -- 🔥 **News**: ``2024/7/1``: We have updated the multimodal fine-tuning of GLM-4V-9B. You need to update the run file and - configuration file of our model repository to support this feature. For more fine-tuning details (such as dataset - format, video memory requirements), please go to [view](finetune_demo). -- 🔥 **News**: ``2024/6/28``: We have worked with the Intel technical team to improve the ITREX and OpenVINO deployment - tutorials for GLM-4-9B-Chat. You can use Intel CPU/GPU devices to efficiently deploy the GLM-4-9B open source model. - Welcome to [view](intel_device_demo). -- 🔥 **News**: ``2024/6/24``: We have updated the running files and configuration files of the model repository to - support Flash Attention 2, Please update the model configuration file and refer to the sample code - in `basic_demo/trans_cli_demo.py`. -- 🔥 **News**: ``2024/6/19``: We updated the running files and configuration files of the model repository and fixed some - model inference issues. Welcome to clone the latest model repository. -- 🔥 **News**: ``2024/6/18``: We released a [technical report](https://arxiv.org/pdf/2406.12793), welcome to check it - out. -- 🔥 **News**: ``2024/6/05``: We released the GLM-4-9B series of open source models +- 🔥 **News**: ```2024/08/15```: We have + open-sourced [longwriter-glm4-9b](https://huggingface.co/THUDM/LongWriter-glm4-9b), a model capable of generating over + 10,000 tokens in single-turn dialogue, along with the + dataset [LongWriter-6k](https://huggingface.co/datasets/THUDM/LongWriter-6k). Experience it online + at [Huggingface Space](https://huggingface.co/spaces/THUDM/LongWriter) or + the [ModelScope Community Space](https://modelscope.cn/studios/ZhipuAI/LongWriter-glm4-9b-demo). +- 🔥 **News**: ```2024/07/24```: We published the latest technical insights on long-text processing. Check out our + technical report on long-context training of the open-source GLM-4-9B + model [here](https://medium.com/@ChatGLM/glm-long-scaling-pre-trained-model-contexts-to-millions-caa3c48dea85). +- 🔥 **News**: ```2024/07/09```: The GLM-4-9B-Chat model is now compatible + with [Ollama](https://github.com/ollama/ollama) and [Llama.cpp](https://github.com/ggerganov/llama.cpp). See detailed + information in this [PR](https://github.com/ggerganov/llama.cpp/pull/8031). +- 🔥 **News**: ```2024/06/18```: We have released a [technical report](https://arxiv.org/pdf/2406.12793); feel free to + check it out. +- 🔥 **News**: ```2024/06/05```: We released the GLM-4-9B series of open-source models. ## Model Introduction @@ -67,15 +56,14 @@ GPT-4-turbo-2024-04-09, Gemini 1.0 Pro, Qwen-VL-Max, and Claude 3 Opus. 
## Model List -| Model | Type | Seq Length | Transformers | vLLM | Download | Online Demo | -|:-------------------:|:----:|:----------:|:------------:|:--------:|:-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------:|:------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------:| -| GLM-4-9B | Base | 8K | <= 4.45 | <= 0.6.2 | [🤗 Huggingface](https://huggingface.co/THUDM/glm-4-9b)
[🤖 ModelScope](https://modelscope.cn/models/ZhipuAI/glm-4-9b)
[🟣 WiseModel](https://wisemodel.cn/models/ZhipuAI/glm-4-9b) | / | -| GLM-4-9B-Chat | Chat | 128K | <= 4.45 | <= 0.6.2 | [🤗 Huggingface](https://huggingface.co/THUDM/glm-4-9b-chat)
[🤖 ModelScope](https://modelscope.cn/models/ZhipuAI/glm-4-9b-chat)
[🟣 WiseModel](https://wisemodel.cn/models/ZhipuAI/GLM-4-9B-Chat) | [🤖 ModelScope CPU](https://modelscope.cn/studios/dash-infer/GLM-4-Chat-DashInfer-Demo/summary)
[🤖 ModelScope vLLM](https://modelscope.cn/studios/ZhipuAI/glm-4-9b-chat-vllm/summary) | -| GLM-4-9B-Chat-HF | Chat | 128K | >= 4.46 | <= 0.6.2 | [🤗 Huggingface](https://huggingface.co/THUDM/glm-4-9b-chat-hf)
[🤖 ModelScope](https://modelscope.cn/models/ZhipuAI/glm-4-9b-chat-hf) | [🤖 ModelScope CPU](https://modelscope.cn/studios/dash-infer/GLM-4-Chat-DashInfer-Demo/summary)
[🤖 ModelScope vLLM](https://modelscope.cn/studios/ZhipuAI/glm-4-9b-chat-vllm/summary) | -| GLM-4-9B-Chat-1M | Chat | 1M | <= 4.45 | <= 0.6.2 | [🤗 Huggingface](https://huggingface.co/THUDM/glm-4-9b-chat-1m)
[🤖 ModelScope](https://modelscope.cn/models/ZhipuAI/glm-4-9b-chat-1m)
[🟣 WiseModel](https://wisemodel.cn/models/ZhipuAI/GLM-4-9B-Chat-1M) | / | -| GLM-4-9B-Chat-1M-HF | Chat | 1M | >= 4.46 | <= 0.6.2 | [🤗 Huggingface](https://huggingface.co/THUDM/glm-4-9b-chat-1m-hf)
[🤖 ModelScope](https://modelscope.cn/models/ZhipuAI/glm-4-9b-chat-1m-hf) | / | -| GLM-4V-9B | Chat | 8K | >= 4.46 | >= 0.6.3 | [🤗 Huggingface](https://huggingface.co/THUDM/glm-4v-9b)
[🤖 ModelScope](https://modelscope.cn/models/ZhipuAI/glm-4v-9b)
[🟣 WiseModel](https://wisemodel.cn/models/ZhipuAI/GLM-4V-9B) | [🤖 ModelScope](https://modelscope.cn/studios/ZhipuAI/glm-4v-9b-Demo/summary) | - +| Model | Type | Seq Length | Transformers | Download | Online Demo | +|:-------------------:|:----:|:----------:|:-------------:|:-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------:|:------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------:| +| GLM-4-9B | Base | 8K | `4.44 - 4.45` | [🤗 Huggingface](https://huggingface.co/THUDM/glm-4-9b)
[🤖 ModelScope](https://modelscope.cn/models/ZhipuAI/glm-4-9b)
[🟣 WiseModel](https://wisemodel.cn/models/ZhipuAI/glm-4-9b) | / | +| GLM-4-9B-Chat | Chat | 128K | `4.44 - 4.45` | [🤗 Huggingface](https://huggingface.co/THUDM/glm-4-9b-chat)
[🤖 ModelScope](https://modelscope.cn/models/ZhipuAI/glm-4-9b-chat)
[🟣 WiseModel](https://wisemodel.cn/models/ZhipuAI/GLM-4-9B-Chat) | [🤖 ModelScope CPU](https://modelscope.cn/studios/dash-infer/GLM-4-Chat-DashInfer-Demo/summary)
[🤖 ModelScope vLLM](https://modelscope.cn/studios/ZhipuAI/glm-4-9b-chat-vllm/summary) | +| GLM-4-9B-Chat-HF | Chat | 128K | `>= 4.46.0` | [🤗 Huggingface](https://huggingface.co/THUDM/glm-4-9b-chat-hf)
[🤖 ModelScope](https://modelscope.cn/models/ZhipuAI/glm-4-9b-chat-hf) | [🤖 ModelScope CPU](https://modelscope.cn/studios/dash-infer/GLM-4-Chat-DashInfer-Demo/summary)
[🤖 ModelScope vLLM](https://modelscope.cn/studios/ZhipuAI/glm-4-9b-chat-vllm/summary) | +| GLM-4-9B-Chat-1M | Chat | 1M | `4.44 - 4.45` | [🤗 Huggingface](https://huggingface.co/THUDM/glm-4-9b-chat-1m)
[🤖 ModelScope](https://modelscope.cn/models/ZhipuAI/glm-4-9b-chat-1m)
[🟣 WiseModel](https://wisemodel.cn/models/ZhipuAI/GLM-4-9B-Chat-1M) | / | +| GLM-4-9B-Chat-1M-HF | Chat | 1M | `>= 4.46.0` | [🤗 Huggingface](https://huggingface.co/THUDM/glm-4-9b-chat-1m-hf)
[🤖 ModelScope](https://modelscope.cn/models/ZhipuAI/glm-4-9b-chat-1m-hf) | / | +| GLM-4V-9B | Chat | 8K | `>= 4.46.0` | [🤗 Huggingface](https://huggingface.co/THUDM/glm-4v-9b)
[🤖 ModelScope](https://modelscope.cn/models/ZhipuAI/glm-4v-9b)
| [🟣 WiseModel](https://wisemodel.cn/models/ZhipuAI/GLM-4V-9B) | [🤖 ModelScope](https://modelscope.cn/studios/ZhipuAI/glm-4v-9b-Demo/summary) | ## BenchMark diff --git a/basic_demo/trans_cli_demo.py b/basic_demo/trans_cli_demo.py index c843258..dfff5bd 100644 --- a/basic_demo/trans_cli_demo.py +++ b/basic_demo/trans_cli_demo.py @@ -11,21 +11,24 @@ ensuring that the CLI interface displays formatted text correctly. If you use flash attention, you should install the flash-attn and add attn_implementation="flash_attention_2" in model loading. -Note: - Using with glm-4-9b-chat-hf will require `transformers>=4.46.0". """ import torch from threading import Thread from transformers import AutoTokenizer, AutoModelForCausalLM, StoppingCriteria, StoppingCriteriaList, TextIteratorStreamer -MODEL_PATH = "THUDM/glm-4-9b-chat-hf" +MODEL_PATH = "THUDM/glm-4-9b-chat" -tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH) +# trust_remote_code=True is required when using `glm-4-9b-chat`. +# It is not needed when using `glm-4-9b-chat-hf`. +# This applies to both the tokenizer and the model. + +tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH, trust_remote_code=True) model = AutoModelForCausalLM.from_pretrained( MODEL_PATH, # attn_implementation="flash_attention_2", # Use Flash Attention torch_dtype=torch.bfloat16, # using flash-attn must use bfloat16 or float16 + trust_remote_code=True, device_map="auto").eval() diff --git a/basic_demo/trans_cli_vision_demo.py b/basic_demo/trans_cli_vision_demo.py index adca35d..671e701 100644 --- a/basic_demo/trans_cli_vision_demo.py +++ b/basic_demo/trans_cli_vision_demo.py @@ -10,19 +10,20 @@ Note: The script includes a modification to handle markdown to plain text conver ensuring that the CLI interface displays formatted text correctly. """ -import os import torch from threading import Thread from transformers import ( AutoTokenizer, StoppingCriteria, StoppingCriteriaList, - TextIteratorStreamer, AutoModel, BitsAndBytesConfig + TextIteratorStreamer, + AutoModel, + BitsAndBytesConfig ) from PIL import Image -MODEL_PATH = os.environ.get('MODEL_PATH', 'THUDM/glm-4v-9b') +MODEL_PATH = "THUDM/glm-4v-9b" tokenizer = AutoTokenizer.from_pretrained( MODEL_PATH,
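
For reference, here is a minimal loading sketch contrasting the two code paths the demo changes above distinguish. This is an illustrative sketch rather than code from the diff; it assumes `transformers>=4.46` is installed for the `-hf` weights, a GPU is available, and the model IDs match those in the table above.

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

# Original weights: the modeling code ships inside the HF repo,
# so trust_remote_code=True is required for both tokenizer and model.
tokenizer = AutoTokenizer.from_pretrained("THUDM/glm-4-9b-chat", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    "THUDM/glm-4-9b-chat",
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
    device_map="auto",
).eval()

# -hf weights: implemented natively in transformers>=4.46 (the Glm model
# classes), so trust_remote_code is not needed.
tokenizer_hf = AutoTokenizer.from_pretrained("THUDM/glm-4-9b-chat-hf")
model_hf = AutoModelForCausalLM.from_pretrained(
    "THUDM/glm-4-9b-chat-hf",
    torch_dtype=torch.bfloat16,
    device_map="auto",
).eval()

# Either variant generates through the standard chat-template API:
inputs = tokenizer_hf.apply_chat_template(
    [{"role": "user", "content": "你好"}],
    add_generation_prompt=True,
    tokenize=True,
    return_tensors="pt",
    return_dict=True,
).to(model_hf.device)
outputs = model_hf.generate(**inputs, max_new_tokens=128)
print(tokenizer_hf.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```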