fix readme error

zR 2024-06-06 10:00:11 +08:00
parent 3fa8a482c5
commit 3f26ccc208
9 changed files with 74 additions and 62 deletions

View File

@@ -24,7 +24,7 @@ GLM-4V-9B. **GLM-4V-9B** supports, at 1120 * 1120 high resolution, Chinese-English bilingual multi…
| Model | Type | Seq Length | Download | Online Demo |
|------------------|------|------------|-----------------------------------------------------------------------------------------------------------------------------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| GLM-4-9B | Base | 8K | [🤗 Huggingface](https://huggingface.co/THUDM/glm-4-9b) [🤖 ModelScope](https://modelscope.cn/models/ZhipuAI/glm-4-9b) | / |
| GLM-4-9B-Chat | Chat | 128K | [🤗 Huggingface](https://huggingface.co/THUDM/glm-4-9b-chat) [🤖 ModelScope](https://modelscope.cn/models/ZhipuAI/glm-4-9b-chat) | [🤖 ModelScope CPU](https://modelscope.cn/studios/dash-infer/GLM-4-Chat-DashInfer-Demo/summary)<br> [🤖 ModelScope VLLM](https://modelscope.cn/studios/ZhipuAI/glm-4-9b-chat-vllm/summary) |
| GLM-4-9B-Chat | Chat | 128K | [🤗 Huggingface](https://huggingface.co/THUDM/glm-4-9b-chat) [🤖 ModelScope](https://modelscope.cn/models/ZhipuAI/glm-4-9b-chat) | [🤖 ModelScope CPU](https://modelscope.cn/studios/dash-infer/GLM-4-Chat-DashInfer-Demo/summary)<br> [🤖 ModelScope vLLM](https://modelscope.cn/studios/ZhipuAI/glm-4-9b-chat-vllm/summary) |
| GLM-4-9B-Chat-1M | Chat | 1M | [🤗 Huggingface](https://huggingface.co/THUDM/glm-4-9b-chat-1m) [🤖 ModelScope](https://modelscope.cn/models/ZhipuAI/glm-4-9b-chat-1m) | / |
| GLM-4V-9B | Chat | 8K | [🤗 Huggingface](https://huggingface.co/THUDM/glm-4v-9b) [🤖 ModelScope](https://modelscope.cn/models/ZhipuAI/glm-4v-9b) | / |
@@ -152,11 +152,6 @@ from vllm import LLM, SamplingParams
# GLM-4-9B-Chat-1M
# max_model_len, tp_size = 1048576, 4
# GLM-4-9B-Chat
from transformers import AutoTokenizer
from vllm import LLM, SamplingParams
# If you encounter OOM, it is recommended to reduce max_model_len or increase tp_size
max_model_len, tp_size = 131072, 1
model_name = "THUDM/glm-4-9b-chat"
@@ -223,12 +218,12 @@ with torch.no_grad():
If you want to learn more about the GLM-4-9B series of open-source models, this open-source repository provides developers with basic GLM-4-9B usage and development code through the following content
+ [base](basic_demo/README.md): Contains
+ Interaction code using the transformers and VLLM backends
+ Interaction code using the transformers and vLLM backends
+ OpenAI API backend interaction code
+ Batch inference code
+ [composite_demo](composite_demo/README.md): Contains
+ Full-featured demo code for the GLM-4-9B and GLM-4V-9B open-source models, covering All Tools capabilities, long-document interpretation, and multimodal capabilities.
+ Full-featured demo code for the GLM-4-9B-Chat and GLM-4V-9B open-source models, covering All Tools capabilities, long-document interpretation, and multimodal capabilities.
+ [finetune_demo](finetune_demo/README.md): Contains
+ PEFT (LORA, P-Tuning) fine-tuning code

View File

@@ -28,7 +28,7 @@ GPT-4-turbo-2024-04-09, Gemini 1.0 Pro, Qwen-VL-Max, and Claude 3 Opus.
| Model | Type | Seq Length | Download | Online Demo |
|------------------|------|------------|-----------------------------------------------------------------------------------------------------------------------------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| GLM-4-9B | Base | 8K | [🤗 Huggingface](https://huggingface.co/THUDM/glm-4-9b) [🤖 ModelScope](https://modelscope.cn/models/ZhipuAI/glm-4-9b) | / |
| GLM-4-9B-Chat | Chat | 128K | [🤗 Huggingface](https://huggingface.co/THUDM/glm-4-9b-chat) [🤖 ModelScope](https://modelscope.cn/models/ZhipuAI/glm-4-9b-chat) | [🤖 ModelScope CPU](https://modelscope.cn/studios/dash-infer/GLM-4-Chat-DashInfer-Demo/summary)<br> [🤖 ModelScope VLLM](https://modelscope.cn/studios/ZhipuAI/glm-4-9b-chat-vllm/summary) |
| GLM-4-9B-Chat | Chat | 128K | [🤗 Huggingface](https://huggingface.co/THUDM/glm-4-9b-chat) [🤖 ModelScope](https://modelscope.cn/models/ZhipuAI/glm-4-9b-chat) | [🤖 ModelScope CPU](https://modelscope.cn/studios/dash-infer/GLM-4-Chat-DashInfer-Demo/summary)<br> [🤖 ModelScope vLLM](https://modelscope.cn/studios/ZhipuAI/glm-4-9b-chat-vllm/summary) |
| GLM-4-9B-Chat-1M | Chat | 1M | [🤗 Huggingface](https://huggingface.co/THUDM/glm-4-9b-chat-1m) [🤖 ModelScope](https://modelscope.cn/models/ZhipuAI/glm-4-9b-chat-1m) | / |
| GLM-4V-9B | Chat | 8K | [🤗 Huggingface](https://huggingface.co/THUDM/glm-4v-9b) [🤖 ModelScope](https://modelscope.cn/models/ZhipuAI/glm-4v-9b) | / |
@@ -158,13 +158,8 @@ Use the vLLM backend for inference:
from transformers import AutoTokenizer
from vllm import LLM, SamplingParams
# GLM-4-9B-Chat-1M
# max_model_len, tp_size = 1048576, 4
# GLM-4-9B-Chat
from transformers import AutoTokenizer
from vllm import LLM, SamplingParams
# If you encounter OOM, you can try to reduce max_model_len or increase tp_size
max_model_len, tp_size = 131072, 1
model_name = "THUDM/glm-4-9b-chat"
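For context, this hunk appears to remove the duplicated `transformers`/`vllm` imports from the README's vLLM example. Roughly, the cleaned-up snippet amounts to the self-contained sketch below; the sampling settings, the `enforce_eager` flag, and the `apply_chat_template` call are assumptions for illustration rather than an exact copy of the file, and GLM-4's specific `stop_token_ids` are omitted.

```python
from transformers import AutoTokenizer
from vllm import LLM, SamplingParams

# If you encounter OOM, reduce max_model_len or increase tp_size.
max_model_len, tp_size = 131072, 1
model_name = "THUDM/glm-4-9b-chat"

tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
llm = LLM(
    model=model_name,
    tensor_parallel_size=tp_size,
    max_model_len=max_model_len,
    trust_remote_code=True,
    enforce_eager=True,  # trades some speed for lower memory use; optional
)

# Build a chat-formatted prompt and generate a single reply.
messages = [{"role": "user", "content": "Hello, what can you do?"}]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
sampling_params = SamplingParams(temperature=0.95, max_tokens=1024)
outputs = llm.generate(prompts=[prompt], sampling_params=sampling_params)
print(outputs[0].outputs[0].text)
```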
@@ -233,7 +228,7 @@ If you want to learn more about the GLM-4-9B series open source models, this ope
with basic GLM-4-9B usage and development code through the following content
+ [base](basic_demo/README.md): Contains
+ Interaction code using transformers and VLLM backend
+ Interaction code using transformers and vLLM backend
+ OpenAI API backend interaction code
+ Batch inference code

View File

@@ -2,7 +2,7 @@
Read this in [English](README_en.md)
In this demo, you will experience how to use the glm-4-9b open-source model to perform basic tasks.
In this demo, you will experience how to use the GLM-4-9B open-source model to perform basic tasks.
Please follow the steps in this document strictly to avoid unnecessary errors.
@@ -45,7 +45,7 @@ Read this in [English](README_en.md)
|------|----------|-----------------|------------------|--------------|
| BF16 | 74497MiB | 98.4930s | 2.3653 tokens/s | Input length 200000 |
If your input exceeds 200K, we recommend using the VLLM backend with multiple GPUs for inference to get better performance.
If your input exceeds 200K, we recommend using the vLLM backend with multiple GPUs for inference to get better performance.
#### GLM-4V-9B
@@ -83,13 +83,15 @@ pip install -r requirements.txt
### Use the transformers backend code
+ Use the command line to chat with the glm-4-9b model.
+ Use the command line to chat with the GLM-4-9B model.
```shell
python trans_cli_demo.py
python trans_cli_demo.py # GLM-4-9B-Chat
python trans_cli_vision_demo.py # GLM-4V-9B
```
+ Use the Gradio web UI to chat with the glm-4-9b model.
+ Use the Gradio web UI to chat with the GLM-4-9B-Chat model.
```shell
python trans_web_demo.py
@@ -101,15 +103,15 @@ python trans_web_demo.py
python cli_batch_request_demo.py
```
### Use the VLLM backend code
### Use the vLLM backend code
+ Use the command line to chat with the glm-4-9b model.
+ Use the command line to chat with the GLM-4-9B-Chat model.
```shell
python vllm_cli_demo.py
```
+ Build your own server and chat with the glm-4-9b model using the `OpenAI API` request format. This demo supports Function Call and All Tools.
+ Build your own server and chat with the GLM-4-9B-Chat model using the `OpenAI API` request format. This demo supports Function Call and All Tools.
Start the server:

View File

@@ -1,6 +1,6 @@
# Basic Demo
In this demo, you will experience how to use the glm-4-9b open source model to perform basic tasks.
In this demo, you will experience how to use the GLM-4-9B open source model to perform basic tasks.
Please follow the steps in the document strictly to avoid unnecessary errors.
@@ -47,7 +47,7 @@ The stress test data of relevant inference are as follows:
|-------|------------|------------|------------------|--------------|
| BF16 | 74497MiB | 98.4930s | 2.3653 tokens/s | Input length 200000 |
If your input exceeds 200K, we recommend that you use the VLLM backend with multi gpus for inference to get better performance.
If your input exceeds 200K, we recommend that you use the vLLM backend with multi gpus for inference to get better performance.
#### GLM-4V-9B
@@ -87,13 +87,14 @@ pip install -r requirements.txt
### Use transformers backend code
+ Use the command line to communicate with the glm-4-9b model.
+ Use the command line to communicate with the GLM-4-9B model.
```shell
python trans_cli_demo.py
python trans_cli_demo.py # GLM-4-9B-Chat
python trans_cli_vision_demo.py # GLM-4V-9B
```
+ Use the Gradio web client to communicate with the glm-4-9b model.
+ Use the Gradio web client to communicate with the GLM-4-9B-Chat model.
```shell
python trans_web_demo.py
@@ -105,9 +106,9 @@ python trans_web_demo.py
python cli_batch_request_demo.py
```
### Use VLLM backend code
### Use vLLM backend code
+ Use the command line to communicate with the glm-4-9b model.
+ Use the command line to communicate with the GLM-4-9B-Chat model.
```shell
python vllm_cli_demo.py

View File

@@ -17,7 +17,7 @@ from transformers import AutoTokenizer, LogitsProcessor
from sse_starlette.sse import EventSourceResponse
EventSourceResponse.DEFAULT_PING_INTERVAL = 1000
MODEL_PATH = 'THUDM/glm-4-9b'
MODEL_PATH = 'THUDM/glm-4-9b-chat'
MAX_MODEL_LENGTH = 8192
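This hunk points `basic_demo/openai_api_server.py` at the chat checkpoint. Once that server is running, the "`OpenAI API` request format" interaction mentioned in the basic_demo README above is an ordinary chat-completions call. A minimal client sketch, assuming the server listens at `http://127.0.0.1:8000/v1` and registers the model under the name `glm-4` (check `openai_api_server.py` for the actual host, port, and model name):

```python
from openai import OpenAI

# Hypothetical endpoint and model name; adjust both to match openai_api_server.py.
client = OpenAI(base_url="http://127.0.0.1:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="glm-4",
    messages=[{"role": "user", "content": "Summarize what GLM-4-9B-Chat can do in one sentence."}],
    temperature=0.8,
    max_tokens=256,
)
print(response.choices[0].message.content)
```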

View File

@@ -30,7 +30,7 @@ from transformers import (
ModelType = Union[PreTrainedModel, PeftModelForCausalLM]
TokenizerType = Union[PreTrainedTokenizer, PreTrainedTokenizerFast]
MODEL_PATH = os.environ.get('MODEL_PATH', 'THUDM/glm-4-9b')
MODEL_PATH = os.environ.get('MODEL_PATH', 'THUDM/glm-4-9b-chat')
def load_model_and_tokenizer(

View File

@@ -79,7 +79,7 @@ def process_response(output, history):
return content, history
# glm-4v-9b is not available in VLLM backend, use HFClient instead.
# glm-4v-9b is not available in vLLM backend, use HFClient instead.
@st.cache_resource(max_entries=1, show_spinner="Loading model...")
def get_client(model_path, typ: ClientType) -> Client:
match typ:

View File

@@ -2,7 +2,7 @@
Read this in [English](README_en.md)
In this demo, you will experience how to fine-tune the glm-4-9b open-source chat model (the visual understanding model is not supported). Please follow the steps in this document strictly to avoid unnecessary errors.
In this demo, you will experience how to fine-tune the GLM-4-9B-Chat open-source chat model (the visual understanding model is not supported). Please follow the steps in this document strictly to avoid unnecessary errors.
## Hardware check

View File

@@ -1,10 +1,13 @@
# GLM-4-9B Chat dialogue model fine-tuning
In this demo, you will experience how to fine-tune the glm-4-9b dialogue open source model (visual understanding model is not supported). Please strictly follow the steps in the document to avoid unnecessary errors.
In this demo, you will experience how to fine-tune the GLM-4-9B-Chat open source model (visual understanding model is
not supported). Please strictly follow the steps in the document to avoid unnecessary errors.
## Hardware check
**The data in this document were tested in the following hardware environment. Actual requirements and the video memory used at runtime may differ slightly; please refer to your actual operating environment.**
Test hardware information:
+ OS: Ubuntu 22.04
@@ -14,13 +17,14 @@ Test hardware information:
+ GPU Driver: 535.104.05
+ GPU: NVIDIA A100-SXM4-80GB * 8
| Fine-tuning solution | Video memory usage                           | Weight save point size |
|----------------------|----------------------------------------------|------------------------|
| lora (PEFT)          | 21531MiB                                     | 17M                    |
| p-tuning v2 (PEFT)   | 21381MiB                                     | 121M                   |
| SFT (Zero3 method)   | 80935MiB<br/>(Each GPU, 8 GPUs are required) | 20G                    |
Before starting fine-tuning, please install the dependencies in `basic_demo` first. You also need to install the dependencies in this directory:
```bash
pip install -r requirements.txt
@@ -28,7 +32,8 @@ pip install -r requirements.txt
## Multi-round dialogue format
The multi-round dialogue fine-tuning example uses the GLM-4 dialogue format convention, adding a different `loss_mask` to each role so that `loss` is calculated for multiple rounds of replies in one pass (a simplified masking sketch follows the format notes below).
For data files, the sample uses the following format:
@@ -100,8 +105,11 @@ This is a sample with tools:
{"messages": [{"role": "system", "content": "", "tools": [{"type": "function", "function": {"name": "get_recommended_books", "description": "Get recommended books based on user's interests", "parameters": {"type": "object", "properties": {"interests": {"type": "array", "items": {"type": "string"}, "description": "The interests to recommend books for"}}, "required": ["interests"]}}}]}, {"role": "user", "content": "Hi, I am looking for some book recommendations. I am interested in history and science fiction."}, {"role": "assistant", "content": "{\"name\": \"get_recommended_books\", \"arguments\": {\"interests\": [\"history\", \"science fiction\"]}}"}, {"role": "observation", "content": "{\"books\": [\"Sapiens: A Brief History of Humankind by Yuval Noah Harari\", \"A Brief History of Time by Stephen Hawking\", \"Dune by Frank Herbert\", \"The Martian by Andy Weir\"]}"}, {"role": "assistant", "content": "Based on your interests in history and science fiction, I would recommend the following books: \"Sapiens: A Brief History of Humankind\" by Yuval Noah Harari, \"A Brief History of Time\" by Stephen Hawking, \"Dune\" by Frank Herbert, and \"The Martian\" by Andy Weir."}]}
```
- The `system` role is optional, but if present it must appear before the `user` role, and a complete piece of conversation data (whether single-round or multi-round) can contain only one `system` role.
- The `tools` field is optional. If present, it must appear after the `system` role, and a complete piece of conversation data (whether single-round or multi-round) can contain only one `tools` field. When the `tools` field is present, the `system` role must exist and its `content` field must be empty.
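To make the `loss_mask` convention concrete, here is a simplified sketch (not the repository's preprocessing code) of per-role masking, expressed with the usual `labels = -100` ignore convention: only assistant tokens contribute to the loss. The plain-text role tags are placeholders; the real script applies the GLM-4 chat template and its special tokens.

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("THUDM/glm-4-9b-chat", trust_remote_code=True)

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Recommend a history book."},
    {"role": "assistant", "content": "Sapiens: A Brief History of Humankind by Yuval Noah Harari."},
]

input_ids, labels = [], []
for message in messages:
    # Simplified role tag; the real template uses GLM-4's own special tokens.
    ids = tokenizer.encode(f"<|{message['role']}|>\n{message['content']}", add_special_tokens=False)
    input_ids += ids
    # Only assistant replies carry loss; tokens from every other role are masked out.
    labels += ids if message["role"] == "assistant" else [-100] * len(ids)

assert len(input_ids) == len(labels)
```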
## Configuration file
@@ -110,7 +118,9 @@ The fine-tuning configuration file is located in the `config` directory, includi
1. `ds_zereo_2 / ds_zereo_3.json`: deepspeed configuration file.
2. `lora.yaml / ptuning_v2.yaml / sft.yaml`: Configuration files for different modes of models, including model parameters, optimizer parameters, training parameters, etc. Some important parameters are explained as follows:
+ data_config section
+ train_file: File path of training dataset.
+ val_file: File path of validation dataset.
@@ -149,7 +159,8 @@ The fine-tuning configuration file is located in the `config` directory, includi
## Start fine-tuning
Execute a **single-node multi-GPU / multi-node multi-GPU** run with the following code, which uses `deepspeed` as the acceleration solution; you need to have `deepspeed` installed.
```shell
OMP_NUM_THREADS=1 torchrun --standalone --nnodes=1 --nproc_per_node=8 finetune_hf.py data/AdvertiseGen/ THUDM/glm-4-9b configs/lora.yaml
@@ -163,7 +174,8 @@ python finetune_hf.py data/AdvertiseGen/ THUDM/glm-4-9b-chat configs/lora.yaml
## Fine-tune from a saved point
If you train as described above, each fine-tuning will start from the beginning. If you want to fine-tune from a half-trained model, you can add a fourth parameter, which can be passed in two ways:
1. `yes`, automatically start training from the last saved Checkpoint
@@ -189,38 +201,45 @@ In this way, the answer you get is the fine-tuned answer.
### Use the fine-tuned model in other demos in this repository or external repositories
You can use our `LORA` and fully fine-tuned models in any demo. This requires you to modify the code yourself according to the following tutorial.
1. Replace the way to read the model in the demo with the way to read the model in `finetune_demo/inference.py`.
> Please note that for LORA and P-TuningV2, we did not merge the trained models, but recorded the fine-tuned path in `adapter_config.json`.
> If the location of your original model changes, you should modify the path of `base_model_name_or_path` in `adapter_config.json`.
```python
def load_model_and_tokenizer(
        model_dir: Union[str, Path], trust_remote_code: bool = True
) -> tuple[ModelType, TokenizerType]:
    model_dir = _resolve_path(model_dir)
    if (model_dir / 'adapter_config.json').exists():
        model = AutoPeftModelForCausalLM.from_pretrained(
            model_dir, trust_remote_code=trust_remote_code, device_map='auto'
        )
        tokenizer_dir = model.peft_config['default'].base_model_name_or_path
    else:
        model = AutoModelForCausalLM.from_pretrained(
            model_dir, trust_remote_code=trust_remote_code, device_map='auto'
        )
        tokenizer_dir = model_dir
    tokenizer = AutoTokenizer.from_pretrained(
        tokenizer_dir, trust_remote_code=trust_remote_code
    )
    return model, tokenizer
```
2. Read the fine-tuned model. Please note that you should use the location of the fine-tuned model. For example, if your model location is `/path/to/finetune_adapter_model` and the original model address is `path/to/base_model`, you should use `/path/to/finetune_adapter_model` as `model_dir`.
3. After completing the above operations, you can use the fine-tuned model normally; other calling methods remain unchanged. A rough usage sketch is shown below.
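The following sketch illustrates steps 2 and 3, assuming `load_model_and_tokenizer` from `finetune_demo/inference.py` is importable and that the adapter checkpoint lives at the placeholder path `/path/to/finetune_adapter_model`; the chat-template and generation arguments are ordinary transformers calls and may need adjusting for your setup.

```python
import torch

# Load the fine-tuned adapter (or a fully fine-tuned checkpoint) and run one chat turn.
model, tokenizer = load_model_and_tokenizer("/path/to/finetune_adapter_model")

inputs = tokenizer.apply_chat_template(
    [{"role": "user", "content": "Write a one-sentence advertisement for a hiking backpack."}],
    add_generation_prompt=True,
    tokenize=True,
    return_tensors="pt",
).to(model.device)

with torch.no_grad():
    outputs = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0][inputs.shape[1]:], skip_special_tokens=True))
```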
## Reference