update link

parent 4d5194d758, commit 0423d7ca6d

README.md (32 lines changed)

@@ -21,12 +21,12 @@ GLM-4V-9B。**GLM-4V-9B** 具备 1120 * 1120 高分辨率下的中英双语多
## Model List

| Model            | Type | Seq Length | Download | Online Demo |
|------------------|------|------------|----------|-------------|
| GLM-4-9B         | Base | 8K         | [🤗 Huggingface](https://huggingface.co/THUDM/glm-4-9b) [🤖 ModelScope](https://modelscope.cn/models/ZhipuAI/glm-4-9b) | / |
| GLM-4-9B-Chat    | Chat | 128K       | [🤗 Huggingface](https://huggingface.co/THUDM/glm-4-9b-chat) [🤖 ModelScope](https://modelscope.cn/models/ZhipuAI/glm-4-9b-chat) | [🤖 ModelScope CPU](https://modelscope.cn/studios/dash-infer/GLM-4-Chat-DashInfer-Demo/summary)<br> [🤖 ModelScope VLLM](https://modelscope.cn/studios/ZhipuAI/glm-4-9b-chat-vllm/summary) |
| GLM-4-9B-Chat-1M | Chat | 1M         | [🤗 Huggingface](https://huggingface.co/THUDM/glm-4-9b-chat-1m) [🤖 ModelScope](https://modelscope.cn/models/ZhipuAI/glm-4-9b-chat-1m) | / |
| GLM-4V-9B        | Chat | 8K         | [🤗 Huggingface](https://huggingface.co/THUDM/glm-4v-9b) [🤖 ModelScope](https://modelscope.cn/models/ZhipuAI/glm-4v-9b) | / |
## Evaluation Results

@@ -65,14 +65,14 @@ GLM-4V-9B。**GLM-4V-9B** 具备 1120 * 1120 高分辨率下的中英双语多
GLM-4-9B-Chat and Llama-3-8B-Instruct were tested on six multilingual datasets; the results and the languages selected for each dataset are shown in the table below:

| Dataset     | Llama-3-8B-Instruct | GLM-4-9B-Chat | Languages |
|:------------|:-------------------:|:-------------:|:---------:|
| M-MMLU      | 49.6                | 56.6          | all |
| FLORES      | 25.0                | 28.8          | ru, es, de, fr, it, pt, pl, ja, nl, ar, tr, cs, vi, fa, hu, el, ro, sv, uk, fi, ko, da, bg, no |
| MGSM        | 54.0                | 65.3          | zh, en, bn, de, es, fr, ja, ru, sw, te, th |
| XWinograd   | 61.7                | 73.1          | zh, en, fr, jp, ru, pt |
| XStoryCloze | 84.7                | 90.7          | zh, en, ar, es, eu, hi, id, my, ru, sw, te |
| XCOPA       | 73.3                | 80.1          | zh, et, ht, id, it, qu, sw, ta, th, tr, vi |
### Function Call Capability

@@ -106,6 +106,8 @@ GLM-4V-9B 是一个多模态语言模型,具备视觉理解能力,其相关
## Quick Start

**For hardware configuration and system requirements, please check [here](basic_demo/README.md).**

### Use the following method to quickly call the GLM-4-9B-Chat language model

Use the transformers backend for inference:
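The diff elides the snippet itself, so here is a minimal sketch of what transformers-backend inference looks like. The model id and chat-template call follow the public THUDM/glm-4-9b-chat model card; treat the exact arguments as assumptions and prefer the code in `basic_demo`.

```python
# A minimal sketch (not the repository's exact demo code): the model id and
# chat-template arguments are taken from the THUDM/glm-4-9b-chat model card.
MODEL_ID = "THUDM/glm-4-9b-chat"

def chat(query: str, max_new_tokens: int = 256) -> str:
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    device = "cuda" if torch.cuda.is_available() else "cpu"
    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, trust_remote_code=True)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID, torch_dtype=torch.bfloat16, trust_remote_code=True
    ).to(device).eval()

    # Build model inputs from a single-turn conversation.
    inputs = tokenizer.apply_chat_template(
        [{"role": "user", "content": query}],
        add_generation_prompt=True,
        tokenize=True,
        return_tensors="pt",
        return_dict=True,
    ).to(device)

    with torch.no_grad():
        out = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Strip the prompt tokens and keep only the generated continuation.
    return tokenizer.decode(
        out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
    )

if __name__ == "__main__":
    print(chat("Hello, introduce yourself briefly."))
```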
@@ -232,6 +234,10 @@ with torch.no_grad():

+ PEFT (LORA, P-Tuning) fine-tuning code
+ SFT fine-tuning code
## Friendly Links

+ [LLaMA-Factory](https://github.com/hiyouga/LLaMA-Factory): an efficient open-source fine-tuning framework that already supports fine-tuning of the GLM-4-9B-Chat language model.

## License

+ Use of the GLM-4 model weights must follow the [Model License](https://huggingface.co/THUDM/glm-4-9b/blob/main/LICENSE).
README_en.md (49 lines changed)

@@ -12,24 +12,25 @@
GLM-4-9B is the open-source version of the latest generation of pre-trained models in the GLM-4 series launched by Zhipu
AI. In the evaluation of data sets in semantics, mathematics, reasoning, code, and knowledge, **GLM-4-9B**
and its human preference-aligned version **GLM-4-9B-Chat** have shown superior performance beyond Llama-3-8B. In
addition to multi-round conversations, GLM-4-9B-Chat also has advanced features such as web browsing, code execution,
custom tool calls (Function Call), and long text
reasoning (supporting up to 128K context). This generation of models has added multi-language support, supporting 26
languages including Japanese, Korean, and German. We have also launched the **GLM-4-9B-Chat-1M** model that supports 1M
context length (about 2 million Chinese characters) and the multimodal model GLM-4V-9B based on GLM-4-9B.
**GLM-4V-9B** possesses dialogue capabilities in both Chinese and English at a high resolution of 1120*1120.
In various multimodal evaluations, including comprehensive abilities in Chinese and English, perception & reasoning,
text recognition, and chart understanding, GLM-4V-9B demonstrates superior performance compared to
GPT-4-turbo-2024-04-09, Gemini 1.0 Pro, Qwen-VL-Max, and Claude 3 Opus.
## Model List

| Model            | Type | Seq Length | Download | Online Demo |
|------------------|------|------------|----------|-------------|
| GLM-4-9B         | Base | 8K         | [🤗 Huggingface](https://huggingface.co/THUDM/glm-4-9b) [🤖 ModelScope](https://modelscope.cn/models/ZhipuAI/glm-4-9b) | / |
| GLM-4-9B-Chat    | Chat | 128K       | [🤗 Huggingface](https://huggingface.co/THUDM/glm-4-9b-chat) [🤖 ModelScope](https://modelscope.cn/models/ZhipuAI/glm-4-9b-chat) | [🤖 ModelScope CPU](https://modelscope.cn/studios/dash-infer/GLM-4-Chat-DashInfer-Demo/summary)<br> [🤖 ModelScope VLLM](https://modelscope.cn/studios/ZhipuAI/glm-4-9b-chat-vllm/summary) |
| GLM-4-9B-Chat-1M | Chat | 1M         | [🤗 Huggingface](https://huggingface.co/THUDM/glm-4-9b-chat-1m) [🤖 ModelScope](https://modelscope.cn/models/ZhipuAI/glm-4-9b-chat-1m) | / |
| GLM-4V-9B        | Chat | 8K         | [🤗 Huggingface](https://huggingface.co/THUDM/glm-4v-9b) [🤖 ModelScope](https://modelscope.cn/models/ZhipuAI/glm-4v-9b) | / |
## Benchmark

@@ -68,16 +69,17 @@ The long text capability was further evaluated on LongBench-Chat, and the result
### Multi Language

The tests for GLM-4-9B-Chat and Llama-3-8B-Instruct are conducted on six multilingual datasets. The test results and the corresponding languages selected for each dataset are shown in the table below:
| Dataset     | Llama-3-8B-Instruct | GLM-4-9B-Chat | Languages |
|:------------|:-------------------:|:-------------:|:---------:|
| M-MMLU      | 49.6                | 56.6          | all |
| FLORES      | 25.0                | 28.8          | ru, es, de, fr, it, pt, pl, ja, nl, ar, tr, cs, vi, fa, hu, el, ro, sv, uk, fi, ko, da, bg, no |
| MGSM        | 54.0                | 65.3          | zh, en, bn, de, es, fr, ja, ru, sw, te, th |
| XWinograd   | 61.7                | 73.1          | zh, en, fr, jp, ru, pt |
| XStoryCloze | 84.7                | 90.7          | zh, en, ar, es, eu, hi, id, my, ru, sw, te |
| XCOPA       | 73.3                | 80.1          | zh, et, ht, id, it, qu, sw, ta, th, tr, vi |
### Function Call

@@ -112,6 +114,8 @@ classic tasks are as follows:
## Quick Start

**For hardware configuration and system requirements, please check [here](basic_demo/README_en.md).**

### Use the following method to quickly call the GLM-4-9B-Chat language model

Use the transformers backend for inference:
@@ -241,6 +245,11 @@ with basic GLM-4-9B usage and development code through the following content

+ PEFT (LORA, P-Tuning) fine-tuning code
+ SFT fine-tuning code
## Friendly Links

+ [LLaMA-Factory](https://github.com/hiyouga/LLaMA-Factory): Efficient open-source fine-tuning framework, already supports GLM-4-9B-Chat language model fine-tuning.

## License

+ The use of GLM-4 model weights must follow
@@ -32,7 +32,6 @@ Read this in [English](README_en.md)

| BF16 | 20629MiB | 0.8199s  | 31.8613 tokens/s | Input length is 8000   |
| BF16 | 27779MiB | 4.3554s  | 14.4108 tokens/s | Input length is 32000  |
| BF16 | 57379MiB | 38.1467s | 3.4205 tokens/s  | Input length is 128000 |
| BF16 | 74497MiB | 98.4930s | 2.3653 tokens/s  | Input length is 200000 |

| Dtype | GPU Memory | Prefilling / First Response | Decode Speed | Remarks |
|-------|------------|-----------------------------|--------------|---------|
@@ -40,6 +39,14 @@ Read this in [English](README_en.md)

| Int4 | 9613MiB  | 0.8629s | 23.4248 tokens/s | Input length is 8000  |
| Int4 | 16065MiB | 4.3906s | 14.6553 tokens/s | Input length is 32000 |

### GLM-4-9B-Chat-1M

| Dtype | GPU Memory | Prefilling / First Response | Decode Speed    | Remarks                |
|-------|------------|-----------------------------|-----------------|------------------------|
| BF16  | 74497MiB   | 98.4930s                    | 2.3653 tokens/s | Input length is 200000 |
If your input exceeds 200K, we recommend using the vLLM backend with multiple GPUs for inference to get better performance.
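As a rough illustration of that recommendation, multi-GPU long-context inference with vLLM might look like the sketch below. The `tensor_parallel_size` and `max_model_len` values are illustrative assumptions, not settings validated by this repository.

```python
# Hypothetical vLLM multi-GPU sketch; parallelism and context-length settings
# are illustrative assumptions, not values validated by this repository.
MODEL_ID = "THUDM/glm-4-9b-chat-1m"

def generate(prompt: str, max_tokens: int = 256) -> str:
    from vllm import LLM, SamplingParams

    llm = LLM(
        model=MODEL_ID,
        trust_remote_code=True,
        tensor_parallel_size=4,  # shard the model across 4 GPUs
        max_model_len=262144,    # budget for very long inputs
    )
    params = SamplingParams(temperature=0.8, top_p=0.9, max_tokens=max_tokens)
    outputs = llm.generate([prompt], params)
    return outputs[0].outputs[0].text

if __name__ == "__main__":
    print(generate("Summarize the following document: ..."))
```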
#### GLM-4V-9B

| Dtype | GPU Memory | Prefilling / First Response | Decode Speed | Remarks |
@@ -23,34 +23,43 @@ Test hardware information:

The stress test data for the relevant inference runs are as follows:

**All tests are performed on a single GPU, and all GPU memory consumption figures are peak values**
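The "Prefilling" and "Decode Speed" columns below can be read as time-to-first-token and post-first-token throughput. A hypothetical harness (the streaming interface here is a stand-in, not the repository's actual benchmark code) might compute them like this:

```python
# Hypothetical measurement harness: "Prefilling" as time-to-first-token and
# "Decode Speed" as tokens/second after the first token. The token stream is a
# stand-in for a real iterator such as transformers' TextIteratorStreamer.
import time

def measure(stream):
    """Consume a token stream; return (prefill_seconds, decode_tokens_per_sec)."""
    start = time.perf_counter()
    first_token_at = None
    count = 0
    for _tok in stream:
        if first_token_at is None:
            first_token_at = time.perf_counter()
        count += 1
    end = time.perf_counter()
    prefill = first_token_at - start    # time until the first token arrived
    decode_time = end - first_token_at  # time spent emitting the rest
    decode_speed = (count - 1) / decode_time if decode_time > 0 else 0.0
    return prefill, decode_speed

def fake_stream(n_tokens=50, delay=0.001):
    # Dummy token generator used only to exercise the harness.
    for _ in range(n_tokens):
        time.sleep(delay)
        yield "tok"

prefill, speed = measure(fake_stream())
print(f"Prefilling: {prefill:.4f}s | Decode Speed: {speed:.1f} tokens/s")
```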
### GLM-4-9B-Chat
| Dtype | GPU Memory | Prefilling | Decode Speed     | Remarks                |
|-------|------------|------------|------------------|------------------------|
| BF16  | 19047MiB   | 0.1554s    | 27.8193 tokens/s | Input length is 1000   |
| BF16  | 20629MiB   | 0.8199s    | 31.8613 tokens/s | Input length is 8000   |
| BF16  | 27779MiB   | 4.3554s    | 14.4108 tokens/s | Input length is 32000  |
| BF16  | 57379MiB   | 38.1467s   | 3.4205 tokens/s  | Input length is 128000 |
| Dtype | GPU Memory | Prefilling | Decode Speed     | Remarks               |
|-------|------------|------------|------------------|-----------------------|
| Int4  | 8251MiB    | 0.1667s    | 23.3903 tokens/s | Input length is 1000  |
| Int4  | 9613MiB    | 0.8629s    | 23.4248 tokens/s | Input length is 8000  |
| Int4  | 16065MiB   | 4.3906s    | 14.6553 tokens/s | Input length is 32000 |
### GLM-4-9B-Chat-1M

| Dtype | GPU Memory | Prefilling | Decode Speed    | Remarks                |
|-------|------------|------------|-----------------|------------------------|
| BF16  | 74497MiB   | 98.4930s   | 2.3653 tokens/s | Input length is 200000 |

If your input exceeds 200K, we recommend using the vLLM backend with multiple GPUs for inference to get better performance.
#### GLM-4V-9B

| Dtype | GPU Memory | Prefilling | Decode Speed     | Remarks              |
|-------|------------|------------|------------------|----------------------|
| BF16  | 28131MiB   | 0.1016s    | 33.4660 tokens/s | Input length is 1000 |
| BF16  | 33043MiB   | 0.7935s    | 39.2444 tokens/s | Input length is 8000 |

| Dtype | GPU Memory | Prefilling | Decode Speed     | Remarks              |
|-------|------------|------------|------------------|----------------------|
| Int4  | 10267MiB   | 0.1685s    | 28.7101 tokens/s | Input length is 1000 |
| Int4  | 14105MiB   | 0.8629s    | 24.2370 tokens/s | Input length is 8000 |
### Minimum hardware requirements