fix #232
parent 5722878e25
commit e5b5630498
@@ -11,7 +11,9 @@ Read this in [English](README_en.md)
 
 ## Project Updates
 
-- 🔥🔥 **News**: ``2024/6/19``: We updated the running files and configuration files of the model repository and fixed some known model inference issues. Welcome to clone the latest model repository.
+- 🔥🔥 **News**: ``2024/6/24``: We have updated the running files and configuration files of the model repository to support Flash Attention 2,
+  please update the model configuration file and refer to the sample code in `basic_demo/trans_cli_demo.py`.
+- 🔥 **News**: ``2024/6/19``: We updated the running files and configuration files of the model repository and fixed some known model inference issues. Welcome to clone the latest model repository.
 - 🔥 **News**: ``2024/6/18``: We released a [technical report](https://arxiv.org/pdf/2406.12793), welcome to check it out.
 - 🔥 **News**: ``2024/6/05``: We released the GLM-4-9B series of open-source models.
@@ -9,6 +9,8 @@
 </p>
 
 ## Update
+- 🔥🔥 **News**: ``2024/6/24``: We have updated the running files and configuration files of the model repository to support Flash Attention 2,
+  Please update the model configuration file and refer to the sample code in `basic_demo/trans_cli_demo.py`.
 - 🔥🔥 **News**: ``2024/6/19``: We updated the running files and configuration files of the model repository and fixed some model inference issues. Welcome to clone the latest model repository.
 - 🔥 **News**: ``2024/6/18``: We released a [technical report](https://arxiv.org/pdf/2406.12793), welcome to check it out.
 - 🔥 **News**: ``2024/6/05``: We released the GLM-4-9B series of open source models
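As context for the Flash Attention 2 entry above, a minimal sketch of enabling it at load time with the transformers backend is shown below. The Hugging Face model id is an assumption (a local checkout works too); the bfloat16 dtype and `device_map="auto"` mirror the demo code referenced in the entry and are illustrative defaults, not the only supported configuration.

```python
import torch
from transformers import AutoModel, AutoTokenizer

MODEL_PATH = "THUDM/glm-4-9b-chat"  # assumed repo id; replace with a local path if preferred

tokenizer = AutoTokenizer.from_pretrained(
    MODEL_PATH,
    trust_remote_code=True,
    encode_special_tokens=True,
)

# flash-attn only runs in half precision, so bfloat16 (or float16) must be set explicitly.
model = AutoModel.from_pretrained(
    MODEL_PATH,
    trust_remote_code=True,
    attn_implementation="flash_attention_2",
    torch_dtype=torch.bfloat16,
    device_map="auto",
).eval()
```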
@@ -91,10 +91,11 @@ python trans_cli_demo.py # GLM-4-9B-Chat
 python trans_cli_vision_demo.py # GLM-4V-9B
 ```
 
-+ Chat with the GLM-4-9B-Chat model through the Gradio web UI.
++ Chat with the GLM-4-9B model through the Gradio web UI.
 
 ```shell
-python trans_web_demo.py
+python trans_web_demo.py # GLM-4-9B-Chat
+python trans_web_vision_demo.py # GLM-4V-9B
 ```
 
 + Use batch inference.
@@ -96,10 +96,11 @@ python trans_cli_demo.py # GLM-4-9B-Chat
 python trans_cli_vision_demo.py # GLM-4V-9B
 ```
 
-+ Use the Gradio web client to communicate with the GLM-4-9B-Chat model.
++ Use the Gradio web client to communicate with the GLM-4-9B model.
 
 ```shell
-python trans_web_demo.py
+python trans_web_demo.py # GLM-4-9B-Chat
+python trans_web_vision_demo.py # GLM-4V-9B
 ```
 
 + Use Batch inference.
@@ -8,6 +8,8 @@ Usage:
 
 Note: The script includes a modification to handle markdown to plain text conversion,
 ensuring that the CLI interface displays formatted text correctly.
+
+If you use flash attention, you should install flash-attn and add attn_implementation="flash_attention_2" in model loading.
 """
 
 import os
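The docstring above assumes the flash-attn package is already installed. A typical installation looks like the sketch below; the `--no-build-isolation` flag follows the package's own instructions, and the CUDA/PyTorch versions it builds against depend on your environment.

```shell
pip install flash-attn --no-build-isolation
```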
@@ -40,9 +42,12 @@ tokenizer = AutoTokenizer.from_pretrained(
     trust_remote_code=True,
     encode_special_tokens=True
 )
 
 model = AutoModel.from_pretrained(
     MODEL_PATH,
     trust_remote_code=True,
+    # attn_implementation="flash_attention_2", # Use Flash Attention
+    # torch_dtype=torch.bfloat16, # using flash-attn must use bfloat16 or float16
     device_map="auto").eval()
 
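Once the commented options are enabled as described, a short generation call can confirm that the model loads and responds. This is a sketch for a single-GPU setup; the prompt and generation settings are illustrative and not part of this commit.

```python
# Assumes `tokenizer` and `model` were created as in the demo above.
import torch

inputs = tokenizer.apply_chat_template(
    [{"role": "user", "content": "Hello, who are you?"}],
    add_generation_prompt=True,
    tokenize=True,
    return_tensors="pt",
    return_dict=True,
).to(model.device)  # single-GPU assumption; with multi-GPU dispatch, move inputs to the embedding device

with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=64)

# Decode only the newly generated tokens, skipping the prompt.
reply = tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
print(reply)
```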
@@ -32,10 +32,12 @@ tokenizer = AutoTokenizer.from_pretrained(
 model = AutoModel.from_pretrained(
     MODEL_PATH,
     trust_remote_code=True,
+    # attn_implementation="flash_attention_2", # Use Flash Attention
+    # torch_dtype=torch.bfloat16, # using flash-attn must use bfloat16 or float16
     device_map="auto",
     torch_dtype=torch.bfloat16
 ).eval()
 
 ## For INT4 inference
 # model = AutoModel.from_pretrained(
 #     MODEL_PATH,