zR 2024-06-24 23:45:04 +08:00
parent 5722878e25
commit e5b5630498
7 changed files with 19 additions and 6 deletions

View File

@@ -11,7 +11,9 @@ Read this in [English](README_en.md)
 ## Project Updates
-- 🔥🔥 **News**: ``2024/6/19``: We updated the running files and configuration files of the model repository and fixed some known model inference issues. You are welcome to clone the latest model repository.
+- 🔥🔥 **News**: ``2024/6/24``: We updated the running files and configuration files of the model repository to support Flash Attention 2.
+  Please update the model configuration file and refer to the sample code in `basic_demo/trans_cli_demo.py`.
+- 🔥 **News**: ``2024/6/19``: We updated the running files and configuration files of the model repository and fixed some known model inference issues. You are welcome to clone the latest model repository.
 - 🔥 **News**: ``2024/6/18``: We released a [technical report](https://arxiv.org/pdf/2406.12793), welcome to check it out.
 - 🔥 **News**: ``2024/6/05``: We released the GLM-4-9B series of open source models

View File

@@ -9,6 +9,8 @@
 </p>
 ## Update
+- 🔥🔥 **News**: ``2024/6/24``: We have updated the running files and configuration files of the model repository to support Flash Attention 2.
+  Please update the model configuration file and refer to the sample code in `basic_demo/trans_cli_demo.py`.
 - 🔥🔥 **News**: ``2024/6/19``: We updated the running files and configuration files of the model repository and fixed some model inference issues. Welcome to clone the latest model repository.
 - 🔥 **News**: ``2024/6/18``: We released a [technical report](https://arxiv.org/pdf/2406.12793), welcome to check it out.
 - 🔥 **News**: ``2024/6/05``: We released the GLM-4-9B series of open source models
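The update above boils down to two extra arguments when loading the model with `transformers`. A minimal sketch of the referenced sample usage, assuming `torch`, `transformers`, and `flash-attn` are installed; the `THUDM/glm-4-9b-chat` checkpoint id is an assumption, substitute your local path:

```python
import torch
from transformers import AutoModel, AutoTokenizer

MODEL_PATH = "THUDM/glm-4-9b-chat"  # assumption: Hub id or local path of the updated checkpoint

tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH, trust_remote_code=True)
model = AutoModel.from_pretrained(
    MODEL_PATH,
    trust_remote_code=True,
    attn_implementation="flash_attention_2",  # needs the flash-attn package
    torch_dtype=torch.bfloat16,               # flash-attn requires bfloat16 or float16
    device_map="auto",
).eval()
```

If flash-attn is not installed, dropping the `attn_implementation` argument falls back to the default attention implementation and the demos work unchanged.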

View File

@@ -91,10 +91,11 @@ python trans_cli_demo.py # GLM-4-9B-Chat
 python trans_cli_vision_demo.py # GLM-4V-9B
 ```
-+ Use the Gradio web UI to chat with the GLM-4-9B-Chat model.
++ Use the Gradio web UI to chat with the GLM-4-9B model.
 ```shell
-python trans_web_demo.py
+python trans_web_demo.py # GLM-4-9B-Chat
+python trans_web_vision_demo.py # GLM-4V-9B
 ```
 + Use batch inference.

View File

@@ -96,10 +96,11 @@ python trans_cli_demo.py # GLM-4-9B-Chat
 python trans_cli_vision_demo.py # GLM-4V-9B
 ```
-+ Use the Gradio web client to communicate with the GLM-4-9B-Chat model.
++ Use the Gradio web client to communicate with the GLM-4-9B model.
 ```shell
-python trans_web_demo.py
+python trans_web_demo.py # GLM-4-9B-Chat
+python trans_web_vision_demo.py # GLM-4V-9B
 ```
 + Use Batch inference.
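For orientation only (this is not the repository's actual `trans_web_demo.py`), a stripped-down sketch of a Gradio chat front-end over the same model, assuming `gradio`, `torch`, and `transformers` are installed; the checkpoint id is again an assumption:

```python
import gradio as gr
import torch
from transformers import AutoModel, AutoTokenizer

MODEL_PATH = "THUDM/glm-4-9b-chat"  # assumption: Hub id or local path

tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH, trust_remote_code=True)
model = AutoModel.from_pretrained(
    MODEL_PATH,
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
    device_map="auto",
).eval()

def chat(message, history):
    # Single-turn sketch: the Gradio history is ignored to keep the example short.
    messages = [{"role": "user", "content": message}]
    inputs = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, tokenize=True, return_tensors="pt"
    ).to(model.device)
    with torch.no_grad():
        outputs = model.generate(inputs, max_new_tokens=512)
    return tokenizer.decode(outputs[0][inputs.shape[1]:], skip_special_tokens=True)

gr.ChatInterface(chat).launch()
```

The actual demo scripts are more complete; this only illustrates the loading and chat-template flow behind the commands above.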

View File

@@ -8,6 +8,8 @@ Usage:
 Note: The script includes a modification to handle markdown to plain text conversion,
 ensuring that the CLI interface displays formatted text correctly.
+If you use flash attention, install the flash-attn package and add attn_implementation="flash_attention_2" when loading the model.
 """
 import os
@@ -40,9 +42,12 @@ tokenizer = AutoTokenizer.from_pretrained(
     trust_remote_code=True,
     encode_special_tokens=True
 )
 model = AutoModel.from_pretrained(
     MODEL_PATH,
     trust_remote_code=True,
+    # attn_implementation="flash_attention_2", # Use Flash Attention
+    # torch_dtype=torch.bfloat16, # using flash-attn requires bfloat16 or float16
     device_map="auto").eval()

View File

@@ -32,10 +32,12 @@ tokenizer = AutoTokenizer.from_pretrained(
 model = AutoModel.from_pretrained(
     MODEL_PATH,
     trust_remote_code=True,
+    # attn_implementation="flash_attention_2", # Use Flash Attention
+    # torch_dtype=torch.bfloat16, # using flash-attn requires bfloat16 or float16
     device_map="auto",
     torch_dtype=torch.bfloat16
 ).eval()
 ## For INT4 inference
 # model = AutoModel.from_pretrained(
 #     MODEL_PATH,
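The hunk ends inside the commented-out INT4 block. For context, a 4-bit loading sketch along the same lines, assuming `bitsandbytes` is installed; the exact arguments in the repository's commented block may differ:

```python
import torch
from transformers import AutoModel, BitsAndBytesConfig

MODEL_PATH = "THUDM/glm-4-9b-chat"  # assumption: Hub id or local path

# Weights are quantized to 4-bit on load; compute runs in bfloat16.
model = AutoModel.from_pretrained(
    MODEL_PATH,
    trust_remote_code=True,
    quantization_config=BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_compute_dtype=torch.bfloat16,
    ),
    low_cpu_mem_usage=True,
    device_map="auto",
).eval()
```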