update link

parent 4d5194d758, commit 0423d7ca6d

README.md (32 lines changed)

@@ -21,12 +21,12 @@ GLM-4V-9B。**GLM-4V-9B** 具备 1120 * 1120 高分辨率下的中英双语多
## Model List

| Model            | Type | Seq Length | Download | Online Demo |
|------------------|------|------------|----------|-------------|
| GLM-4-9B         | Base | 8K         | [🤗 Huggingface](https://huggingface.co/THUDM/glm-4-9b) [🤖 ModelScope](https://modelscope.cn/models/ZhipuAI/glm-4-9b) | / |
| GLM-4-9B-Chat    | Chat | 128K       | [🤗 Huggingface](https://huggingface.co/THUDM/glm-4-9b-chat) [🤖 ModelScope](https://modelscope.cn/models/ZhipuAI/glm-4-9b-chat) | [🤖 ModelScope CPU](https://modelscope.cn/studios/dash-infer/GLM-4-Chat-DashInfer-Demo/summary)<br> [🤖 ModelScope VLLM](https://modelscope.cn/studios/ZhipuAI/glm-4-9b-chat-vllm/summary) |
| GLM-4-9B-Chat-1M | Chat | 1M         | [🤗 Huggingface](https://huggingface.co/THUDM/glm-4-9b-chat-1m) [🤖 ModelScope](https://modelscope.cn/models/ZhipuAI/glm-4-9b-chat-1m) | / |
| GLM-4V-9B        | Chat | 8K         | [🤗 Huggingface](https://huggingface.co/THUDM/glm-4v-9b) [🤖 ModelScope](https://modelscope.cn/models/ZhipuAI/glm-4v-9b) | / |
## Evaluation Results

@@ -65,14 +65,14 @@ GLM-4V-9B。**GLM-4V-9B** 具备 1120 * 1120 高分辨率下的中英双语多
GLM-4-9B-Chat and Llama-3-8B-Instruct were tested on six multilingual datasets; the results and the languages selected for each dataset are shown in the table below:

| Dataset     | Llama-3-8B-Instruct | GLM-4-9B-Chat | Languages |
|:------------|:-------------------:|:-------------:|:---------:|
| M-MMLU      | 49.6                | 56.6          | all |
| FLORES      | 25.0                | 28.8          | ru, es, de, fr, it, pt, pl, ja, nl, ar, tr, cs, vi, fa, hu, el, ro, sv, uk, fi, ko, da, bg, no |
| MGSM        | 54.0                | 65.3          | zh, en, bn, de, es, fr, ja, ru, sw, te, th |
| XWinograd   | 61.7                | 73.1          | zh, en, fr, jp, ru, pt |
| XStoryCloze | 84.7                | 90.7          | zh, en, ar, es, eu, hi, id, my, ru, sw, te |
| XCOPA       | 73.3                | 80.1          | zh, et, ht, id, it, qu, sw, ta, th, tr, vi |
### Function Call Capability

@@ -106,6 +106,8 @@ GLM-4V-9B 是一个多模态语言模型,具备视觉理解能力,其相关
## Quick Start

**For hardware configuration and system requirements, please check [here](basic_demo/README.md).**

### Use the following method to quickly call the GLM-4-9B-Chat language model

Use the transformers backend for inference:
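The diff elides the snippet itself, so here is a minimal sketch of what transformers-backend inference looks like. The model id and chat-template call follow the public THUDM/glm-4-9b-chat model card; treat the exact arguments as assumptions and prefer the code in `basic_demo`.

```python
# A minimal sketch (not the repository's exact demo code): the model id and
# chat-template arguments are taken from the THUDM/glm-4-9b-chat model card.
MODEL_ID = "THUDM/glm-4-9b-chat"

def chat(query: str, max_new_tokens: int = 256) -> str:
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    device = "cuda" if torch.cuda.is_available() else "cpu"
    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, trust_remote_code=True)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID, torch_dtype=torch.bfloat16, trust_remote_code=True
    ).to(device).eval()

    # Build model inputs from a single-turn conversation.
    inputs = tokenizer.apply_chat_template(
        [{"role": "user", "content": query}],
        add_generation_prompt=True,
        tokenize=True,
        return_tensors="pt",
        return_dict=True,
    ).to(device)

    with torch.no_grad():
        out = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Strip the prompt tokens and keep only the generated continuation.
    return tokenizer.decode(
        out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
    )

if __name__ == "__main__":
    print(chat("Hello, introduce yourself briefly."))
```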
@@ -232,6 +234,10 @@ with torch.no_grad():

+ PEFT (LORA, P-Tuning) fine-tuning code
+ SFT fine-tuning code
## Friendly Links

+ [LLaMA-Factory](https://github.com/hiyouga/LLaMA-Factory): an efficient open-source fine-tuning framework that already supports fine-tuning of the GLM-4-9B-Chat language model.

## License

+ Use of the GLM-4 model weights must follow the [Model License](https://huggingface.co/THUDM/glm-4-9b/blob/main/LICENSE).
README_en.md (49 lines changed)

@@ -12,24 +12,25 @@
GLM-4-9B is the open-source version of the latest generation of pre-trained models in the GLM-4 series launched by Zhipu
AI. In the evaluation of data sets in semantics, mathematics, reasoning, code, and knowledge, **GLM-4-9B**
and its human preference-aligned version **GLM-4-9B-Chat** have shown superior performance beyond Llama-3-8B. In
addition to multi-round conversations, GLM-4-9B-Chat also has advanced features such as web browsing, code execution,
custom tool calls (Function Call), and long text
reasoning (supporting up to 128K context). This generation of models has added multi-language support, supporting 26
languages including Japanese, Korean, and German. We have also launched the **GLM-4-9B-Chat-1M** model that supports 1M
context length (about 2 million Chinese characters) and the multimodal model GLM-4V-9B based on GLM-4-9B.
**GLM-4V-9B** possesses dialogue capabilities in both Chinese and English at a high resolution of 1120*1120.
In various multimodal evaluations, including comprehensive abilities in Chinese and English, perception & reasoning,
text recognition, and chart understanding, GLM-4V-9B demonstrates superior performance compared to
GPT-4-turbo-2024-04-09, Gemini 1.0 Pro, Qwen-VL-Max, and Claude 3 Opus.
## Model List

| Model            | Type | Seq Length | Download | Online Demo |
|------------------|------|------------|----------|-------------|
| GLM-4-9B         | Base | 8K         | [🤗 Huggingface](https://huggingface.co/THUDM/glm-4-9b) [🤖 ModelScope](https://modelscope.cn/models/ZhipuAI/glm-4-9b) | / |
| GLM-4-9B-Chat    | Chat | 128K       | [🤗 Huggingface](https://huggingface.co/THUDM/glm-4-9b-chat) [🤖 ModelScope](https://modelscope.cn/models/ZhipuAI/glm-4-9b-chat) | [🤖 ModelScope CPU](https://modelscope.cn/studios/dash-infer/GLM-4-Chat-DashInfer-Demo/summary)<br> [🤖 ModelScope VLLM](https://modelscope.cn/studios/ZhipuAI/glm-4-9b-chat-vllm/summary) |
| GLM-4-9B-Chat-1M | Chat | 1M         | [🤗 Huggingface](https://huggingface.co/THUDM/glm-4-9b-chat-1m) [🤖 ModelScope](https://modelscope.cn/models/ZhipuAI/glm-4-9b-chat-1m) | / |
| GLM-4V-9B        | Chat | 8K         | [🤗 Huggingface](https://huggingface.co/THUDM/glm-4v-9b) [🤖 ModelScope](https://modelscope.cn/models/ZhipuAI/glm-4v-9b) | / |
## Benchmark

@@ -68,16 +69,17 @@ The long text capability was further evaluated on LongBench-Chat, and the result
### Multi Language

The tests for GLM-4-9B-Chat and Llama-3-8B-Instruct are conducted on six multilingual datasets. The test results and the corresponding languages selected for each dataset are shown in the table below:
| Dataset     | Llama-3-8B-Instruct | GLM-4-9B-Chat | Languages |
|:------------|:-------------------:|:-------------:|:---------:|
| M-MMLU      | 49.6                | 56.6          | all |
| FLORES      | 25.0                | 28.8          | ru, es, de, fr, it, pt, pl, ja, nl, ar, tr, cs, vi, fa, hu, el, ro, sv, uk, fi, ko, da, bg, no |
| MGSM        | 54.0                | 65.3          | zh, en, bn, de, es, fr, ja, ru, sw, te, th |
| XWinograd   | 61.7                | 73.1          | zh, en, fr, jp, ru, pt |
| XStoryCloze | 84.7                | 90.7          | zh, en, ar, es, eu, hi, id, my, ru, sw, te |
| XCOPA       | 73.3                | 80.1          | zh, et, ht, id, it, qu, sw, ta, th, tr, vi |
### Function Call

@@ -112,6 +114,8 @@ classic tasks are as follows:
## Quick Start

**For hardware configuration and system requirements, please check [here](basic_demo/README_en.md).**

### Use the following method to quickly call the GLM-4-9B-Chat language model

Use the transformers backend for inference:
@@ -241,6 +245,11 @@ with basic GLM-4-9B usage and development code through the following content

+ PEFT (LORA, P-Tuning) fine-tuning code
+ SFT fine-tuning code
## Friendly Links

+ [LLaMA-Factory](https://github.com/hiyouga/LLaMA-Factory): Efficient open-source fine-tuning framework, already supports GLM-4-9B-Chat language model fine-tuning.

## License

+ The use of GLM-4 model weights must follow
@@ -32,7 +32,6 @@ Read this in [English](README_en.md)

| BF16 | 20629MiB | 0.8199s  | 31.8613 tokens/s | Input length is 8000   |
| BF16 | 27779MiB | 4.3554s  | 14.4108 tokens/s | Input length is 32000  |
| BF16 | 57379MiB | 38.1467s | 3.4205 tokens/s  | Input length is 128000 |
| BF16 | 74497MiB | 98.4930s | 2.3653 tokens/s  | Input length is 200000 |

| Dtype | GPU Memory | Prefilling / First Response | Decode Speed | Remarks |
|-------|------------|-----------------------------|--------------|---------|
@@ -40,6 +39,14 @@ Read this in [English](README_en.md)

| Int4 | 9613MiB  | 0.8629s | 23.4248 tokens/s | Input length is 8000  |
| Int4 | 16065MiB | 4.3906s | 14.6553 tokens/s | Input length is 32000 |

### GLM-4-9B-Chat-1M

| Dtype | GPU Memory | Prefilling / First Response | Decode Speed    | Remarks                |
|-------|------------|-----------------------------|-----------------|------------------------|
| BF16  | 74497MiB   | 98.4930s                    | 2.3653 tokens/s | Input length is 200000 |
If your input exceeds 200K, we recommend using the vLLM backend with multiple GPUs for inference to get better performance.
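As a rough illustration of that recommendation, multi-GPU long-context inference with vLLM might look like the sketch below. The `tensor_parallel_size` and `max_model_len` values are illustrative assumptions, not settings validated by this repository.

```python
# Hypothetical vLLM multi-GPU sketch; parallelism and context-length settings
# are illustrative assumptions, not values validated by this repository.
MODEL_ID = "THUDM/glm-4-9b-chat-1m"

def generate(prompt: str, max_tokens: int = 256) -> str:
    from vllm import LLM, SamplingParams

    llm = LLM(
        model=MODEL_ID,
        trust_remote_code=True,
        tensor_parallel_size=4,  # shard the model across 4 GPUs
        max_model_len=262144,    # budget for very long inputs
    )
    params = SamplingParams(temperature=0.8, top_p=0.9, max_tokens=max_tokens)
    outputs = llm.generate([prompt], params)
    return outputs[0].outputs[0].text

if __name__ == "__main__":
    print(generate("Summarize the following document: ..."))
```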
#### GLM-4V-9B

| Dtype | GPU Memory | Prefilling / First Response | Decode Speed | Remarks |
@@ -23,34 +23,43 @@ Test hardware information:

The stress test data for the relevant inference runs are as follows:

**All tests are performed on a single GPU, and all GPU memory consumption figures are peak values**
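The "Prefilling" and "Decode Speed" columns below can be read as time-to-first-token and post-first-token throughput. A hypothetical harness (the streaming interface here is a stand-in, not the repository's actual benchmark code) might compute them like this:

```python
# Hypothetical measurement harness: "Prefilling" as time-to-first-token and
# "Decode Speed" as tokens/second after the first token. The token stream is a
# stand-in for a real iterator such as transformers' TextIteratorStreamer.
import time

def measure(stream):
    """Consume a token stream; return (prefill_seconds, decode_tokens_per_sec)."""
    start = time.perf_counter()
    first_token_at = None
    count = 0
    for _tok in stream:
        if first_token_at is None:
            first_token_at = time.perf_counter()
        count += 1
    end = time.perf_counter()
    prefill = first_token_at - start    # time until the first token arrived
    decode_time = end - first_token_at  # time spent emitting the rest
    decode_speed = (count - 1) / decode_time if decode_time > 0 else 0.0
    return prefill, decode_speed

def fake_stream(n_tokens=50, delay=0.001):
    # Dummy token generator used only to exercise the harness.
    for _ in range(n_tokens):
        time.sleep(delay)
        yield "tok"

prefill, speed = measure(fake_stream())
print(f"Prefilling: {prefill:.4f}s | Decode Speed: {speed:.1f} tokens/s")
```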
### GLM-4-9B-Chat
| Dtype | GPU Memory | Prefilling | Decode Speed     | Remarks                |
|-------|------------|------------|------------------|------------------------|
| BF16  | 19047MiB   | 0.1554s    | 27.8193 tokens/s | Input length is 1000   |
| BF16  | 20629MiB   | 0.8199s    | 31.8613 tokens/s | Input length is 8000   |
| BF16  | 27779MiB   | 4.3554s    | 14.4108 tokens/s | Input length is 32000  |
| BF16  | 57379MiB   | 38.1467s   | 3.4205 tokens/s  | Input length is 128000 |
| Dtype | GPU Memory | Prefilling | Decode Speed     | Remarks               |
|-------|------------|------------|------------------|-----------------------|
| Int4  | 8251MiB    | 0.1667s    | 23.3903 tokens/s | Input length is 1000  |
| Int4  | 9613MiB    | 0.8629s    | 23.4248 tokens/s | Input length is 8000  |
| Int4  | 16065MiB   | 4.3906s    | 14.6553 tokens/s | Input length is 32000 |
### GLM-4-9B-Chat-1M

| Dtype | GPU Memory | Prefilling | Decode Speed    | Remarks                |
|-------|------------|------------|-----------------|------------------------|
| BF16  | 74497MiB   | 98.4930s   | 2.3653 tokens/s | Input length is 200000 |

If your input exceeds 200K, we recommend using the vLLM backend with multiple GPUs for inference to get better performance.
#### GLM-4V-9B

| Dtype | GPU Memory | Prefilling | Decode Speed     | Remarks              |
|-------|------------|------------|------------------|----------------------|
| BF16  | 28131MiB   | 0.1016s    | 33.4660 tokens/s | Input length is 1000 |
| BF16  | 33043MiB   | 0.7935s    | 39.2444 tokens/s | Input length is 8000 |

| Dtype | GPU Memory | Prefilling | Decode Speed     | Remarks              |
|-------|------------|------------|------------------|----------------------|
| Int4  | 10267MiB   | 0.1685s    | 28.7101 tokens/s | Input length is 1000 |
| Int4  | 14105MiB   | 0.8629s    | 24.2370 tokens/s | Input length is 8000 |
### Minimum hardware requirements