diff --git a/README.md b/README.md
index ef96573..c3d4c2c 100644
--- a/README.md
+++ b/README.md
@@ -11,7 +11,9 @@ Read this in [English](README_en.md)
 
 ## Project Updates
 
-- 🔥🔥 **News**: ``2024/6/19``: We updated the running files and configuration files of the model repository and fixed some known model inference issues. You are welcome to clone the latest model repository.
+- 🔥🔥 **News**: ``2024/6/24``: We have updated the running files and configuration files of the model repository to support Flash Attention 2;
+please update the model configuration file and refer to the sample code in `basic_demo/trans_cli_demo.py`.
+- 🔥 **News**: ``2024/6/19``: We updated the running files and configuration files of the model repository and fixed some known model inference issues. You are welcome to clone the latest model repository.
 - 🔥 **News**: ``2024/6/18``: We released a [technical report](https://arxiv.org/pdf/2406.12793), feel free to take a look.
 - 🔥 **News**: ``2024/6/05``: We released the GLM-4-9B series of open-source models
 
diff --git a/README_en.md b/README_en.md
index 81292e5..1201e85 100644
--- a/README_en.md
+++ b/README_en.md
@@ -9,6 +9,8 @@
 </p>
 
 ## Update
+- 🔥🔥 **News**: ``2024/6/24``: We have updated the running files and configuration files of the model repository to support Flash Attention 2;
+please update the model configuration file and refer to the sample code in `basic_demo/trans_cli_demo.py`.
 - 🔥🔥 **News**: ``2024/6/19``: We updated the running files and configuration files of the model repository and fixed some model inference issues. Welcome to clone the latest model repository.
 - 🔥 **News**: ``2024/6/18``: We released a [technical report](https://arxiv.org/pdf/2406.12793), welcome to check it out.
 - 🔥 **News**: ``2024/6/05``: We released the GLM-4-9B series of open-source models
diff --git a/basic_demo/README.md b/basic_demo/README.md
index a56b3b1..0c69526 100644
--- a/basic_demo/README.md
+++ b/basic_demo/README.md
@@ -91,10 +91,11 @@
 python trans_cli_demo.py # GLM-4-9B-Chat
 python trans_cli_vision_demo.py # GLM-4V-9B
 ```
 
-+ Use the Gradio web client to chat with the GLM-4-9B-Chat model.
++ Use the Gradio web client to chat with the GLM-4-9B model.
 ```shell
-python trans_web_demo.py
+python trans_web_demo.py # GLM-4-9B-Chat
+python trans_web_vision_demo.py # GLM-4V-9B
 ```
 
 + Use Batch inference.
diff --git a/basic_demo/README_en.md b/basic_demo/README_en.md
index fd4b18b..570e446 100644
--- a/basic_demo/README_en.md
+++ b/basic_demo/README_en.md
@@ -96,10 +96,11 @@
 python trans_cli_demo.py # GLM-4-9B-Chat
 python trans_cli_vision_demo.py # GLM-4V-9B
 ```
 
-+ Use the Gradio web client to communicate with the GLM-4-9B-Chat model.
++ Use the Gradio web client to communicate with the GLM-4-9B model.
 ```shell
-python trans_web_demo.py
+python trans_web_demo.py # GLM-4-9B-Chat
+python trans_web_vision_demo.py # GLM-4V-9B
 ```
 
 + Use Batch inference.
diff --git a/basic_demo/trans_cli_demo.py b/basic_demo/trans_cli_demo.py
index c7d98e6..cbd0ba0 100644
--- a/basic_demo/trans_cli_demo.py
+++ b/basic_demo/trans_cli_demo.py
@@ -8,6 +8,8 @@ Usage:
 
 Note: The script includes a modification to handle markdown to plain text conversion,
 ensuring that the CLI interface displays formatted text correctly.
+
+If you want to use Flash Attention, install the flash-attn package and pass attn_implementation="flash_attention_2" when loading the model.
""" import os @@ -40,9 +42,12 @@ tokenizer = AutoTokenizer.from_pretrained( trust_remote_code=True, encode_special_tokens=True ) + model = AutoModel.from_pretrained( MODEL_PATH, trust_remote_code=True, + # attn_implementation="flash_attention_2", # Use Flash Attention + # torch_dtype=torch.bfloat16, #using flash-attn must use bfloat16 or float16 device_map="auto").eval() diff --git a/basic_demo/trans_cli_vision_demo.py b/basic_demo/trans_cli_vision_demo.py index 72e53db..d1dea76 100644 --- a/basic_demo/trans_cli_vision_demo.py +++ b/basic_demo/trans_cli_vision_demo.py @@ -32,10 +32,12 @@ tokenizer = AutoTokenizer.from_pretrained( model = AutoModel.from_pretrained( MODEL_PATH, trust_remote_code=True, + # attn_implementation="flash_attention_2", # Use Flash Attention + # torch_dtype=torch.bfloat16, # using flash-attn must use bfloat16 or float16, device_map="auto", - torch_dtype=torch.bfloat16 ).eval() + ## For INT4 inference # model = AutoModel.from_pretrained( # MODEL_PATH, diff --git a/basic_demo/trans_cli_vision_gradio_demo.py b/basic_demo/trans_web_vision_demo.py similarity index 100% rename from basic_demo/trans_cli_vision_gradio_demo.py rename to basic_demo/trans_web_vision_demo.py