add glm-4v-9b stress test
parent 44a55818ef
commit 12dd318013
@@ -24,6 +24,8 @@ Read this in [English](README_en.md)

**All tests were run on a single GPU, and all GPU memory figures are approximate peak usage.**

#### GLM-4-9B-Chat

| Precision | GPU memory usage | Prefilling / first response | Decode Speed | Remarks |
|-----------|------------------|-----------------------------|------------------|----------------------|
| BF16 | 19047MiB | 0.1554s | 27.8193 tokens/s | Input length is 1000 |
@@ -38,6 +40,18 @@ Read this in [English](README_en.md)
| Int4 | 9613MiB | 0.8629s | 23.4248 tokens/s | Input length is 8000 |
| Int4 | 16065MiB | 4.3906s | 14.6553 tokens/s | Input length is 32000 |

#### GLM-4V-9B

| Precision | GPU memory usage | Prefilling / first response | Decode Speed | Remarks |
|-----------|------------------|-----------------------------|------------------|----------------------|
| BF16 | 28131MiB | 0.1016s | 33.4660 tokens/s | Input length is 1000 |
| BF16 | 33043MiB | 0.7935s | 39.2444 tokens/s | Input length is 8000 |

| Precision | GPU memory usage | Prefilling / first response | Decode Speed | Remarks |
|-----------|------------------|-----------------------------|------------------|----------------------|
| Int4 | 10267MiB | 0.1685s | 28.7101 tokens/s | Input length is 1000 |
| Int4 | 14105MiB | 0.8629s | 40.7134 tokens/s | Input length is 8000 |

### Minimum hardware requirements

To run the most basic code officially provided (transformers backend), you need:

@@ -23,20 +23,34 @@ Test hardware information:
The inference stress test results are as follows:

**All tests were run on a single GPU, and all GPU memory figures are approximate peak usage.**
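
The README does not say how the two timing columns were measured. As a minimal sketch, assuming each generated token's arrival time is recorded, the prefilling latency and decode speed could be derived as below. The names `GenerationStats` and `summarize_generation` are hypothetical and are not part of the repository.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class GenerationStats:
    first_token_latency_s: float  # request start to first token (prefilling)
    decode_tokens_per_s: float    # throughput over the tokens after the first


def summarize_generation(start_time: float, token_times: Sequence[float]) -> GenerationStats:
    """Derive prefilling latency and decode speed from token arrival timestamps.

    Assumption: token_times holds one monotonically increasing timestamp
    (in seconds) per generated token, on the same clock as start_time.
    """
    if not token_times:
        raise ValueError("at least one generated token is required")
    first_token_latency = token_times[0] - start_time
    decode_tokens = len(token_times) - 1
    decode_elapsed = token_times[-1] - token_times[0]
    # With a single token there is no decode phase to time.
    if decode_tokens > 0 and decode_elapsed > 0:
        speed = decode_tokens / decode_elapsed
    else:
        speed = 0.0
    return GenerationStats(first_token_latency, speed)


# Synthetic timestamps for illustration only, not measured data.
stats = summarize_generation(0.0, [0.5, 0.75, 1.0, 1.25, 1.5])
print(stats)
```

If the tests run on PyTorch, `torch.cuda.max_memory_allocated()` is one way to read a peak memory figure, though the source does not state which tool produced the MiB values in the tables.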

#### GLM-4-9B-Chat

| Precision | GPU memory usage | Prefilling / first response | Decode Speed | Remarks |
|-----------|------------------|-----------------------------|------------------|------------------------|
| BF16 | 19047MiB | 0.1554s | 27.8193 tokens/s | Input length is 1000 |
| BF16 | 20629MiB | 0.8199s | 31.8613 tokens/s | Input length is 8000 |
| BF16 | 27779MiB | 4.3554s | 14.4108 tokens/s | Input length is 32000 |
| BF16 | 57379MiB | 38.1467s | 3.4205 tokens/s | Input length is 128000 |
| BF16 | 74497MiB | 98.4930s | 2.3653 tokens/s | Input length is 200000 |

| Precision | GPU memory usage | Prefilling / first response | Decode Speed | Remarks |
|-----------|------------------|-----------------------------|------------------|-----------------------|
| Int4 | 8251MiB | 0.1667s | 23.3903 tokens/s | Input length is 1000 |
| Int4 | 9613MiB | 0.8629s | 23.4248 tokens/s | Input length is 8000 |
| Int4 | 16065MiB | 4.3906s | 14.6553 tokens/s | Input length is 32000 |

#### GLM-4V-9B

| Precision | GPU memory usage | Prefilling / first response | Decode Speed | Remarks |
|-----------|------------------|-----------------------------|------------------|----------------------|
| BF16 | 28131MiB | 0.1016s | 33.4660 tokens/s | Input length is 1000 |
| BF16 | 33043MiB | 0.7935s | 39.2444 tokens/s | Input length is 8000 |

| Precision | GPU memory usage | Prefilling / first response | Decode Speed | Remarks |
|-----------|------------------|-----------------------------|------------------|----------------------|
| Int4 | 10267MiB | 0.1685s | 28.7101 tokens/s | Input length is 1000 |
| Int4 | 14105MiB | 0.8629s | 40.7134 tokens/s | Input length is 8000 |

### Minimum hardware requirements