update link

This commit is contained in:
zR 2024-06-05 16:24:36 +08:00
parent 4d5194d758
commit 0423d7ca6d
4 changed files with 85 additions and 54 deletions

View File

@ -21,12 +21,12 @@ GLM-4V-9B。**GLM-4V-9B** 具备 1120 * 1120 高分辨率下的中英双语多
## 模型列表
| Model | Type | Seq Length | Download | Online Demo |
|------------------|------|------------|-----------------------------------------------------------------------------------------------------------------------------------------|---------------------------------------------------------------------------------------------|
| GLM-4-9B | Base | 8K | [🤗 Huggingface](https://huggingface.co/THUDM/glm-4-9b) [🤖 ModelScope](https://modelscope.cn/models/ZhipuAI/glm-4-9b) | / |
| GLM-4-9B-Chat | Chat | 128K | [🤗 Huggingface](https://huggingface.co/THUDM/glm-4-9b-chat) [🤖 ModelScope](https://modelscope.cn/models/ZhipuAI/glm-4-9b-chat) | [🤖 ModelScope](https://modelscope.cn/studios/dash-infer/GLM-4-Chat-DashInfer-Demo/summary) |
| GLM-4-9B-Chat-1M | Chat | 1M | [🤗 Huggingface](https://huggingface.co/THUDM/glm-4-9b-chat-1m) [🤖 ModelScope](https://modelscope.cn/models/ZhipuAI/glm-4-9b-chat-1m) | / |
| GLM-4V-9B | Chat | 8K | [🤗 Huggingface](https://huggingface.co/THUDM/glm-4v-9b) [🤖 ModelScope](https://modelscope.cn/models/ZhipuAI/glm-4v-9b) | / |
| Model | Type | Seq Length | Download | Online Demo |
|------------------|------|------------|-----------------------------------------------------------------------------------------------------------------------------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| GLM-4-9B | Base | 8K | [🤗 Huggingface](https://huggingface.co/THUDM/glm-4-9b) [🤖 ModelScope](https://modelscope.cn/models/ZhipuAI/glm-4-9b) | / |
| GLM-4-9B-Chat | Chat | 128K | [🤗 Huggingface](https://huggingface.co/THUDM/glm-4-9b-chat) [🤖 ModelScope](https://modelscope.cn/models/ZhipuAI/glm-4-9b-chat) | [🤖 ModelScope CPU](https://modelscope.cn/studios/dash-infer/GLM-4-Chat-DashInfer-Demo/summary)<br> [🤖 ModelScope VLLM](https://modelscope.cn/studios/ZhipuAI/glm-4-9b-chat-vllm/summary) |
| GLM-4-9B-Chat-1M | Chat | 1M | [🤗 Huggingface](https://huggingface.co/THUDM/glm-4-9b-chat-1m) [🤖 ModelScope](https://modelscope.cn/models/ZhipuAI/glm-4-9b-chat-1m) | / |
| GLM-4V-9B | Chat | 8K | [🤗 Huggingface](https://huggingface.co/THUDM/glm-4v-9b) [🤖 ModelScope](https://modelscope.cn/models/ZhipuAI/glm-4v-9b) | / |
## 评测结果
@ -65,14 +65,14 @@ GLM-4V-9B。**GLM-4V-9B** 具备 1120 * 1120 高分辨率下的中英双语多
在六个多语言数据集上对 GLM-4-9B-Chat 和 Llama-3-8B-Instruct 进行了测试,测试结果及数据集对应选取语言如下表
| Dataset | Llama-3-8B-Instruct | GLM-4-9B-Chat | Languages
| Dataset | Llama-3-8B-Instruct | GLM-4-9B-Chat | Languages |
|:------------|:-------------------:|:-------------:|:----------------------------------------------------------------------------------------------:|
| M-MMLU | 49.6 | 56.6 | all
| FLORES | 25.0 | 28.8 | ru, es, de, fr, it, pt, pl, ja, nl, ar, tr, cs, vi, fa, hu, el, ro, sv, uk, fi, ko, da, bg, no
| MGSM | 54.0 | 65.3 | zh, en, bn, de, es, fr, ja, ru, sw, te, th
| XWinograd | 61.7 | 73.1 | zh, en, fr, jp, ru, pt
| XStoryCloze | 84.7 | 90.7 | zh, en, ar, es, eu, hi, id, my, ru, sw, te
| XCOPA | 73.3 | 80.1 | zh, et, ht, id, it, qu, sw, ta, th, tr, vi
| M-MMLU | 49.6 | 56.6 | all |
| FLORES | 25.0 | 28.8 | ru, es, de, fr, it, pt, pl, ja, nl, ar, tr, cs, vi, fa, hu, el, ro, sv, uk, fi, ko, da, bg, no |
| MGSM | 54.0 | 65.3 | zh, en, bn, de, es, fr, ja, ru, sw, te, th |
| XWinograd | 61.7 | 73.1 | zh, en, fr, jp, ru, pt |
| XStoryCloze | 84.7 | 90.7 | zh, en, ar, es, eu, hi, id, my, ru, sw, te |
| XCOPA | 73.3 | 80.1 | zh, et, ht, id, it, qu, sw, ta, th, tr, vi |
### 工具调用能力
@ -106,6 +106,8 @@ GLM-4V-9B 是一个多模态语言模型,具备视觉理解能力,其相关
## 快速调用
**硬件配置和系统要求,请查看[这里](basic_demo/README.md)。**
### 使用以下方法快速调用 GLM-4-9B-Chat 语言模型
使用 transformers 后端进行推理:
@ -232,6 +234,10 @@ with torch.no_grad():
+ PEFT (LORA, P-Tuning) 微调代码
+ SFT 微调代码
## 友情链接
+ [LLaMA-Factory](https://github.com/hiyouga/LLaMA-Factory): 高效开源微调框架,已支持 GLM-4-9B-Chat 语言模型微调。
## 协议
+ GLM-4 模型的权重的使用则需要遵循 [模型协议](https://huggingface.co/THUDM/glm-4-9b/blob/main/LICENSE)。

View File

@ -12,24 +12,25 @@
GLM-4-9B is the open-source version of the latest generation of pre-trained models in the GLM-4 series launched by Zhipu
AI. In the evaluation of data sets in semantics, mathematics, reasoning, code, and knowledge, **GLM-4-9B**
and its human preference-aligned version **GLM-4-9B-Chat** have shown superior performance beyond Llama-3-8B. In addition to
multi-round conversations, GLM-4-9B-Chat
also has advanced features such as web browsing, code execution, custom tool calls (Function Call), and long text
and its human preference-aligned version **GLM-4-9B-Chat** have shown superior performance beyond Llama-3-8B. In
addition to multi-round conversations, GLM-4-9B-Chat also has advanced features such as web browsing, code execution,
custom tool calls (Function Call), and long text
reasoning (supporting up to 128K context). This generation of models has added multi-language support, supporting 26
languages including Japanese, Korean, and German. We have also launched the **GLM-4-9B-Chat-1M** model that supports 1M
context length (about 2 million Chinese characters) and the multimodal model GLM-4V-9B based on GLM-4-9B.
**GLM-4V-9B** possesses dialogue capabilities in both Chinese and English at a high resolution of 1120*1120.
In various multimodal evaluations, including comprehensive abilities in Chinese and English, perception & reasoning, text recognition, and chart understanding, GLM-4V-9B demonstrates superior performance compared to GPT-4-turbo-2024-04-09, Gemini 1.0 Pro, Qwen-VL-Max, and Claude 3 Opus.
**GLM-4V-9B** possesses dialogue capabilities in both Chinese and English at a high resolution of 1120*1120.
In various multimodal evaluations, including comprehensive abilities in Chinese and English, perception & reasoning,
text recognition, and chart understanding, GLM-4V-9B demonstrates superior performance compared to
GPT-4-turbo-2024-04-09, Gemini 1.0 Pro, Qwen-VL-Max, and Claude 3 Opus.
## Model List
| Model | Type | Seq Length | Download | Online Demo |
|------------------|------|------------|-----------------------------------------------------------------------------------------------------------------------------------------|---------------------------------------------------------------------------------------------|
| GLM-4-9B | Base | 8K | [🤗 Huggingface](https://huggingface.co/THUDM/glm-4-9b) [🤖 ModelScope](https://modelscope.cn/models/ZhipuAI/glm-4-9b) | / |
| GLM-4-9B-Chat | Chat | 128K | [🤗 Huggingface](https://huggingface.co/THUDM/glm-4-9b-chat) [🤖 ModelScope](https://modelscope.cn/models/ZhipuAI/glm-4-9b-chat) | [🤖 ModelScope](https://modelscope.cn/studios/dash-infer/GLM-4-Chat-DashInfer-Demo/summary) |
| GLM-4-9B-Chat-1M | Chat | 1M | [🤗 Huggingface](https://huggingface.co/THUDM/glm-4-9b-chat-1m) [🤖 ModelScope](https://modelscope.cn/models/ZhipuAI/glm-4-9b-chat-1m) | / |
| GLM-4V-9B | Chat | 8K | [🤗 Huggingface](https://huggingface.co/THUDM/glm-4v-9b) [🤖 ModelScope](https://modelscope.cn/models/ZhipuAI/glm-4v-9b) | / |
| Model | Type | Seq Length | Download | Online Demo |
|------------------|------|------------|-----------------------------------------------------------------------------------------------------------------------------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| GLM-4-9B | Base | 8K | [🤗 Huggingface](https://huggingface.co/THUDM/glm-4-9b) [🤖 ModelScope](https://modelscope.cn/models/ZhipuAI/glm-4-9b) | / |
| GLM-4-9B-Chat | Chat | 128K | [🤗 Huggingface](https://huggingface.co/THUDM/glm-4-9b-chat) [🤖 ModelScope](https://modelscope.cn/models/ZhipuAI/glm-4-9b-chat) | [🤖 ModelScope CPU](https://modelscope.cn/studios/dash-infer/GLM-4-Chat-DashInfer-Demo/summary)<br> [🤖 ModelScope VLLM](https://modelscope.cn/studios/ZhipuAI/glm-4-9b-chat-vllm/summary) |
| GLM-4-9B-Chat-1M | Chat | 1M | [🤗 Huggingface](https://huggingface.co/THUDM/glm-4-9b-chat-1m) [🤖 ModelScope](https://modelscope.cn/models/ZhipuAI/glm-4-9b-chat-1m) | / |
| GLM-4V-9B | Chat | 8K | [🤗 Huggingface](https://huggingface.co/THUDM/glm-4v-9b) [🤖 ModelScope](https://modelscope.cn/models/ZhipuAI/glm-4v-9b) | / |
## BenchMark
@ -68,16 +69,17 @@ The long text capability was further evaluated on LongBench-Chat, and the result
### Multi Language
The tests for GLM-4-9B-Chat and Llama-3-8B-Instruct are conducted on six multilingual datasets. The test results and the corresponding languages selected for each dataset are shown in the table below:
The tests for GLM-4-9B-Chat and Llama-3-8B-Instruct are conducted on six multilingual datasets. The test results and the
corresponding languages selected for each dataset are shown in the table below:
| Dataset | Llama-3-8B-Instruct | GLM-4-9B-Chat | Languages
| Dataset | Llama-3-8B-Instruct | GLM-4-9B-Chat | Languages |
|:------------|:-------------------:|:-------------:|:----------------------------------------------------------------------------------------------:|
| M-MMLU | 49.6 | 56.6 | all
| FLORES | 25.0 | 28.8 | ru, es, de, fr, it, pt, pl, ja, nl, ar, tr, cs, vi, fa, hu, el, ro, sv, uk, fi, ko, da, bg, no
| MGSM | 54.0 | 65.3 | zh, en, bn, de, es, fr, ja, ru, sw, te, th
| XWinograd | 61.7 | 73.1 | zh, en, fr, jp, ru, pt
| XStoryCloze | 84.7 | 90.7 | zh, en, ar, es, eu, hi, id, my, ru, sw, te
| XCOPA | 73.3 | 80.1 | zh, et, ht, id, it, qu, sw, ta, th, tr, vi
| M-MMLU | 49.6 | 56.6 | all |
| FLORES | 25.0 | 28.8 | ru, es, de, fr, it, pt, pl, ja, nl, ar, tr, cs, vi, fa, hu, el, ro, sv, uk, fi, ko, da, bg, no |
| MGSM | 54.0 | 65.3 | zh, en, bn, de, es, fr, ja, ru, sw, te, th |
| XWinograd | 61.7 | 73.1 | zh, en, fr, jp, ru, pt |
| XStoryCloze | 84.7 | 90.7 | zh, en, ar, es, eu, hi, id, my, ru, sw, te |
| XCOPA | 73.3 | 80.1 | zh, et, ht, id, it, qu, sw, ta, th, tr, vi |
### Function Call
@ -112,6 +114,8 @@ classic tasks are as follows:
## Quick call
**硬件配置和系统要求,请查看[这里](basic_demo/README_en.md)。**
### Use the following method to quickly call the GLM-4-9B-Chat language model
Use the transformers backend for inference:
@ -241,6 +245,11 @@ with basic GLM-4-9B usage and development code through the following content
+ PEFT (LORA, P-Tuning) fine-tuning code
+ SFT fine-tuning code
## Friendly Links
+ [LLaMA-Factory](https://github.com/hiyouga/LLaMA-Factory): Efficient open-source fine-tuning framework,
already supports GLM-4-9B-Chat language model fine-tuning.
## License
+ The use of GLM-4 model weights must follow

View File

@ -32,7 +32,6 @@ Read this in [English](README_en.md)
| BF16 | 20629MiB | 0.8199s | 31.8613 tokens/s | 输入长度为 8000 |
| BF16 | 27779MiB | 4.3554s | 14.4108 tokens/s | 输入长度为 32000 |
| BF16 | 57379MiB | 38.1467s | 3.4205 tokens/s | 输入长度为 128000 |
| BF16 | 74497MiB | 98.4930s | 2.3653 tokens/s | 输入长度为 200000 |
| 精度 | 显存占用 | Prefilling / 首响 | Decode Speed | Remarks |
|------|----------|-----------------|------------------|-------------|
@ -40,6 +39,14 @@ Read this in [English](README_en.md)
| Int4 | 9613MiB | 0.8629s | 23.4248 tokens/s | 输入长度为 8000 |
| Int4 | 16065MiB | 4.3906s | 14.6553 tokens/s | 输入长度为 32000 |
### GLM-4-9B-Chat-1M
| 精度 | 显存占用 | Prefilling / 首响 | Decode Speed | Remarks |
|------|----------|-----------------|------------------|--------------|
| BF16 | 74497MiB | 98.4930s | 2.3653 tokens/s | 输入长度为 200000 |
如果您的输入超过200K我们建议您使用VLLM后端进行多卡推理以获得更好的性能。
#### GLM-4V-9B
| 精度 | 显存占用 | Prefilling / 首响 | Decode Speed | Remarks |

View File

@ -23,34 +23,43 @@ Test hardware information:
The stress test data of relevant inference are as follows:
**All tests are performed on a single GPU, and all video memory consumption is calculated based on the peak value**
#
### GLM-4-9B-Chat
| 精度 | 显存占用 | Prefilling / 首响 | Decode Speed | Remarks |
|------|----------|-----------------|------------------|--------------|
| BF16 | 19047MiB | 0.1554s | 27.8193 tokens/s | Input length is 1000 |
| BF16 | 20629MiB | 0.8199s | 31.8613 tokens/s | Input length is 8000 |
| BF16 | 27779MiB | 4.3554s | 14.4108 tokens/s | Input length is 32000 |
| BF16 | 57379MiB | 38.1467s | 3.4205 tokens/s | Input length is 128000 |
| BF16 | 74497MiB | 98.4930s | 2.3653 tokens/s | Input length is 200000 |
| Dtype | GPU Memory | Prefilling | Decode Speed | Remarks |
|-------|------------|------------|------------------|------------------------|
| BF16 | 19047MiB | 0.1554s | 27.8193 tokens/s | Input length is 1000 |
| BF16 | 20629MiB | 0.8199s | 31.8613 tokens/s | Input length is 8000 |
| BF16 | 27779MiB | 4.3554s | 14.4108 tokens/s | Input length is 32000 |
| BF16 | 57379MiB | 38.1467s | 3.4205 tokens/s | Input length is 128000 |
| 精度 | 显存占用 | Prefilling / 首响 | Decode Speed | Remarks |
|------|----------|-----------------|------------------|-----------------------|
| Int4 | 8251MiB | 0.1667s | 23.3903 tokens/s | Input length is 1000 |
| Int4 | 9613MiB | 0.8629s | 23.4248 tokens/s | Input length is 8000 |
| Int4 | 16065MiB | 4.3906s | 14.6553 tokens/s | Input length is 32000 |
| Dtype | GPU Memory | Prefilling | Decode Speed | Remarks |
|-------|------------|------------|------------------|-----------------------|
| Int4 | 8251MiB | 0.1667s | 23.3903 tokens/s | Input length is 1000 |
| Int4 | 9613MiB | 0.8629s | 23.4248 tokens/s | Input length is 8000 |
| Int4 | 16065MiB | 4.3906s | 14.6553 tokens/s | Input length is 32000 |
### GLM-4-9B-Chat-1M
| Dtype | GPU Memory | Prefilling | Decode Speed | Remarks |
|-------|------------|------------|------------------|--------------|
| BF16 | 74497MiB | 98.4930s | 2.3653 tokens/s | 输入长度为 200000 |
If your input exceeds 200K, we recommend that you use the VLLM backend with multi gpus for inference to get better performance.
#### GLM-4V-9B
| Dtype | GPU Memory | Prefilling | Decode Speed | Remarks |
|------|------------|-----------------|------------------|------------|
| BF16 | 28131MiB | 0.1016s | 33.4660 tokens/s | Input length is 1000 |
| BF16 | 33043MiB | 0.7935a | 39.2444 tokens/s | Input length is 8000 |
| Dtype | GPU Memory | Prefilling | Decode Speed | Remarks |
|-------|------------|------------|------------------|----------------------|
| BF16 | 28131MiB | 0.1016s | 33.4660 tokens/s | Input length is 1000 |
| BF16 | 33043MiB | 0.7935a | 39.2444 tokens/s | Input length is 8000 |
| Dtype | GPU Memory | Prefilling | Decode Speed | Remarks |
|-------|----------|-----------------|------------------|------------|
| Int4 | 10267MiB | 0.1685a | 28.7101 tokens/s | Input length is 1000 |
| Int4 | 14105MiB | 0.8629s | 24.2370 tokens/s | Input length is 8000 |
| Dtype | GPU Memory | Prefilling | Decode Speed | Remarks |
|-------|------------|------------|------------------|----------------------|
| Int4 | 10267MiB | 0.1685a | 28.7101 tokens/s | Input length is 1000 |
| Int4 | 14105MiB | 0.8629s | 24.2370 tokens/s | Input length is 8000 |
### Minimum hardware requirements