add glm-4v-9b stress test

This commit is contained in:
zR 2024-06-05 12:55:41 +08:00
parent 44a55818ef
commit 12dd318013
2 changed files with 40 additions and 12 deletions

View File

@ -24,6 +24,8 @@ Read this in [English](README_en.md)
**所有测试均在单张GPU上进行测试,所有显存消耗都按照峰值左右进行测算**
#### GLM-4-9B-Chat
| 精度 | 显存占用 | Prefilling / 首响 | Decode Speed | Remarks |
|------|----------|-----------------|------------------|--------------|
| BF16 | 19047MiB | 0.1554s | 27.8193 tokens/s | 输入长度为 1000 |
@ -38,6 +40,18 @@ Read this in [English](README_en.md)
| Int4 | 9613MiB | 0.8629s | 23.4248 tokens/s | 输入长度为 8000 |
| Int4 | 16065MiB | 4.3906s | 14.6553 tokens/s | 输入长度为 32000 |
#### GLM-4V-9B
| 精度 | 显存占用 | Prefilling / 首响 | Decode Speed | Remarks |
|------|----------|-----------------|------------------|------------|
| BF16 | 28131MiB | 0.1016s | 33.4660 tokens/s | 输入长度为 1000 |
| BF16 | 33043MiB | 0.7935a | 39.2444 tokens/s | 输入长度为 8000 |
| 精度 | 显存占用 | Prefilling / 首响 | Decode Speed | Remarks |
|------|----------|-----------------|------------------|------------|
| Int4 | 10267MiB | 0.1685a | 28.7101 tokens/s | 输入长度为 1000 |
| Int4 | 14105MiB | 0.8629s | 40.7134 tokens/s | 输入长度为 8000 |
### 最低硬件要求
如果您希望运行官方提供的最基础代码 (transformers 后端) 您需要:

View File

@ -23,20 +23,34 @@ Test hardware information:
The stress test data of relevant inference are as follows:
**All tests are performed on a single GPU, and all video memory consumption is calculated based on the peak value**
#
### GLM-4-9B-Chat
| Accuracy | Video memory usage | Prefilling / First ring | Decode Speed | Remarks |
|----------|--------------------|-------------------------|------------------|------------------------|
| BF16 | 19047MiB | 0.1554s | 27.8193 tokens/s | Input length is 1000 |
| BF16 | 20629MiB | 0.8199s | 31.8613 tokens/s | Input length is 8000 |
| BF16 | 27779MiB | 4.3554s | 14.4108 tokens/s | Input length is 32000 |
| BF16 | 57379MiB | 38.1467s | 3.4205 tokens/s | Input length is 128000 |
| BF16 | 74497MiB | 98.4930s | 2.3653 tokens/s | Input length is 200000 |
| 精度 | 显存占用 | Prefilling / 首响 | Decode Speed | Remarks |
|------|----------|-----------------|------------------|--------------|
| BF16 | 19047MiB | 0.1554s | 27.8193 tokens/s | Input length is 1000 |
| BF16 | 20629MiB | 0.8199s | 31.8613 tokens/s | Input length is 8000 |
| BF16 | 27779MiB | 4.3554s | 14.4108 tokens/s | Input length is 32000 |
| BF16 | 57379MiB | 38.1467s | 3.4205 tokens/s | Input length is 128000 |
| BF16 | 74497MiB | 98.4930s | 2.3653 tokens/s | Input length is 200000 |
| Precision | Video Memory | Prefilling / First Sound | Decode Speed | Remarks |
|-----------|--------------|--------------------------|------------------|-----------------------|
| Int4 | 8251MiB | 0.1667s | 23.3903 tokens/s | Input length is 1000 |
| Int4 | 9613MiB | 0.8629s | 23.4248 tokens/s | Input length is 8000 |
| Int4 | 16065MiB | 4.3906s | 14.6553 tokens/s | Input length is 32000 |
| 精度 | 显存占用 | Prefilling / 首响 | Decode Speed | Remarks |
|------|----------|-----------------|------------------|-----------------------|
| Int4 | 8251MiB | 0.1667s | 23.3903 tokens/s | Input length is 1000 |
| Int4 | 9613MiB | 0.8629s | 23.4248 tokens/s | Input length is 8000 |
| Int4 | 16065MiB | 4.3906s | 14.6553 tokens/s | Input length is 32000 |
#### GLM-4V-9B
| 精度 | 显存占用 | Prefilling / 首响 | Decode Speed | Remarks |
|------|----------|-----------------|------------------|------------|
| BF16 | 28131MiB | 0.1016s | 33.4660 tokens/s | Input length is 1000 |
| BF16 | 33043MiB | 0.7935a | 39.2444 tokens/s | Input length is 8000 |
| 精度 | 显存占用 | Prefilling / 首响 | Decode Speed | Remarks |
|------|----------|-----------------|------------------|------------|
| Int4 | 10267MiB | 0.1685a | 28.7101 tokens/s | Input length is 1000 |
| Int4 | 14105MiB | 0.8629s | 40.7134 tokens/s | Input length is 8000 |
### Minimum hardware requirements