Update vLLM demo parameters

2024-06-05 11:43:08 +08:00 · 2024-06-05 11:43:08 +08:00 · d95f131b03
parent 214634daf3
commit d95f131b03
3 changed files with 5 additions and 3 deletions
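
The change to `basic_demo/vllm_cli_demo.py` in this commit only adds two commented-out options, `enable_chunked_prefill` and `max_num_batched_tokens`, as a suggestion for out-of-memory situations. As a rough sketch of how they could be toggled, the hypothetical helper below (`build_engine_kwargs` and its `low_memory` flag are not part of the repository) collects the keyword arguments shown in the diff and adds the two options on request. It assumes the resulting dict would be passed to vLLM's `AsyncEngineArgs(**kwargs)`, and that the installed vLLM version still accepts `worker_use_ray` and `engine_use_ray`.

```python
from typing import Any, Dict


def build_engine_kwargs(model_dir: str, low_memory: bool = False) -> Dict[str, Any]:
    """Hypothetical helper: gather keyword arguments for the vLLM engine.

    The first three option values are copied from the diff; the "model"
    key is an assumption about how the model path is passed along.
    """
    kwargs: Dict[str, Any] = {
        "model": model_dir,  # assumption: path or hub id of the model
        "worker_use_ray": True,
        "engine_use_ray": False,
        "disable_log_requests": True,
    }
    if low_memory:
        # The two options this commit suggests enabling when OOM occurs.
        kwargs["enable_chunked_prefill"] = True
        kwargs["max_num_batched_tokens"] = 8192
    return kwargs


if __name__ == "__main__":
    # "path/to/glm-4-9b-chat" is a placeholder, not a real location.
    print(build_engine_kwargs("path/to/glm-4-9b-chat", low_memory=True))
```

If the two lines are instead uncommented directly in the demo, the preceding `disable_log_requests=True` needs a trailing comma, since the diff shows it as the last argument without one.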
--- a/README.md
+++ b/README.md
@@ -11,8 +11,7 @@ Read this in [English](README_en.md)

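 ## Model Introduction

-GLM-4-9B is the open-source version of GLM-4, the latest generation of pre-trained models released by Zhipu AI. In evaluations on datasets covering semantics, mathematics, reasoning, code, and knowledge, *
-*GLM-4-9B**
+GLM-4-9B is the open-source version of GLM-4, the latest generation of pre-trained models released by Zhipu AI. In evaluations on datasets covering semantics, mathematics, reasoning, code, and knowledge, **GLM-4-9B**
 and its human-preference-aligned version **GLM-4-9B-Chat** both outperform Llama-3-8B. Beyond multi-turn conversation, GLM-4-9B-Chat
 also offers advanced features such as web browsing, code execution, custom tool calling (Function Call), and long-text reasoning (up to 128K context). This generation adds multilingual support, covering
 26 languages including Japanese, Korean, and German. We have also released **GLM-4-9B-Chat-1M**, which supports a 1M context length (about 2 million Chinese characters), and a multimodal model based on GLM-4-9B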
--- a/README_en.md
+++ b/README_en.md
@@ -66,7 +66,7 @@ The long text capability was further evaluated on LongBench-Chat, and the result
 <img src="resources/longbench.png" alt="Description text" style="display: block; margin: auto; width: 65%;">
 </p>

-### 多语言能力
+### Multi Language

 The tests for GLM-4-9B-Chat and Llama-3-8B-Instruct are conducted on six multilingual datasets. The test results and the corresponding languages selected for each dataset are shown in the table below:

--- a/basic_demo/vllm_cli_demo.py
+++ b/basic_demo/vllm_cli_demo.py
@@ -30,6 +30,9 @@ def load_model_and_tokenizer(model_dir: str):
        worker_use_ray=True,
        engine_use_ray=False,
        disable_log_requests=True
+        # If you run into out-of-memory (OOM) errors, consider enabling the parameters below
+        # enable_chunked_prefill=True,
+        # max_num_batched_tokens=8192
    )
    tokenizer = AutoTokenizer.from_pretrained(
        model_dir,