# GLM-4-9B Chat dialogue model fine-tuning

In this demo, you will experience how to fine-tune the GLM-4-9B-Chat open source model (the visual understanding model is not supported). Please strictly follow the steps in this document to avoid unnecessary errors.

## Hardware check

**The data in this document were tested in the following hardware environment. Actual operating environments and GPU memory usage may differ slightly; please refer to your actual environment. The resource usage of fine-tuning is set according to the configuration files in the `configs` folder.**

Test hardware information:

+ OS: Ubuntu 22.04
+ Memory: 512GB
+ Python: 3.10.12 / 3.12.3 (with Python 3.12.3, you currently need to install nltk from the git source code)
+ CUDA Version: 12.3
+ GPU Driver: 535.104.05
+ GPU: NVIDIA A100-SXM4-80GB * 8

| Fine-tuning Model | Fine-tuning solution                | GPU memory usage             | Weight save point size |
|-------------------|-------------------------------------|------------------------------|------------------------|
| GLM-4-9B-Chat     | lora (PEFT)                         | 22G                          | 17M                    |
| GLM-4-9B-Chat     | p-tuning v2 (PEFT)                  | 21G                          | 121M                   |
| GLM-4-9B-Chat     | SFT (Zero3 method)                  | 80G (each GPU; needs 8 GPUs) | 20G                    |
| GLM-4V-9B         | lora (PEFT), includes EVA2CLIPModel | 75G                          | 37M                    |
| GLM-4V-9B         | SFT                                 | Not supported in this code   | 28G                    |

**GLM-4V-9B fine-tuning does not work with deepspeed; the official fine-tuning script only implements the most basic fine-tuning solution. Further optimizations are left for developers to explore on their own.**

Before starting fine-tuning, please install the dependencies in `basic_demo` and clone the latest model repos (Hugging Face) first. You also need to install the dependencies in this directory:

```bash
pip install -r requirements.txt
```

> NOTE: Some code in NLTK 3.8.1 might not yet be compatible with Python 3.12. For adaptation methods in such cases, please refer to [issue #38](https://github.com/THUDM/GLM-4/issues/38).

## Multi-round dialogue format

The multi-round dialogue fine-tuning example uses the GLM-4 dialogue format convention, applying a different `loss_mask` to each role so that the `loss` for all of the assistant's replies in a conversation can be computed in a single pass.
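
Conceptually, the masking works like the following minimal sketch (illustrative, not the repo's exact implementation): labels for non-assistant tokens are set to `-100`, the index that PyTorch's cross-entropy loss ignores, so only assistant replies contribute to the loss.

```python
# Minimal sketch of per-role loss masking (illustrative, not the repo's code).
IGNORE_INDEX = -100  # the label value ignored by torch.nn.CrossEntropyLoss


def build_labels(token_ids: list[int], from_assistant: list[bool]) -> list[int]:
    # Keep the token id as the label only where the token belongs to an
    # assistant reply; mask everything else (system/user/observation tokens).
    return [
        tid if keep else IGNORE_INDEX
        for tid, keep in zip(token_ids, from_assistant)
    ]
```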

For data files, the sample uses the following format:

```json
[
  {
    "messages": [
      {
        "role": "system",
        "content": "<system prompt text>",
        "tools": [
          {
            "name": "<tool name>",
            "args": {
              "<arg name>": "<arg value>"
            }
          }
          // Add more tools if needed
        ]
      },
      {
        "role": "user",
        "content": "<user prompt text>"
      },
      {
        "role": "assistant",
        "content": "<assistant response text>"
      },
      // If tools are used
      {
        "role": "user",
        "content": "<user prompt text>"
      },
      {
        "role": "assistant",
        "content": "<assistant response text>"
      },
      {
        "role": "observation",
        "content": "<observation prompt text>"
      },
      {
        "role": "assistant",
        "content": "<assistant response observation>"
      },
      // Multi-turn continuation
      {
        "role": "user",
        "content": "<user prompt text>"
      },
      {
        "role": "assistant",
        "content": "<assistant response text>"
      }
    ]
  }
]
```

This is a sample without tools:

```json
{
  "messages": [
    {
      "role": "user",
      "content": "类型#裤*材质#牛仔布*风格#性感"
    },
    {
      "role": "assistant",
      "content": "3x1的这款牛仔裤采用浅白的牛仔面料为裤身材质,其柔然的手感和细腻的质地,在穿着舒适的同时,透露着清纯甜美的个性气质。除此之外,流畅的裤身剪裁将性感的腿部曲线彰显的淋漓尽致,不失为一款随性出街的必备单品。"
    }
  ]
}
```

This is a sample with tools:

```json
{
  "messages": [
    {
      "role": "system",
      "content": "",
      "tools": [
        {
          "type": "function",
          "function": {
            "name": "get_recommended_books",
            "description": "Get recommended books based on user's interests",
            "parameters": {
              "type": "object",
              "properties": {
                "interests": {
                  "type": "array",
                  "items": {
                    "type": "string"
                  },
                  "description": "The interests to recommend books for"
                }
              },
              "required": [
                "interests"
              ]
            }
          }
        }
      ]
    },
    {
      "role": "user",
      "content": "Hi, I am looking for some book recommendations. I am interested in history and science fiction."
    },
    {
      "role": "assistant",
      "content": "{\"name\": \"get_recommended_books\", \"arguments\": {\"interests\": [\"history\", \"science fiction\"]}}"
    },
    {
      "role": "observation",
      "content": "{\"books\": [\"Sapiens: A Brief History of Humankind by Yuval Noah Harari\", \"A Brief History of Time by Stephen Hawking\", \"Dune by Frank Herbert\", \"The Martian by Andy Weir\"]}"
    },
    {
      "role": "assistant",
      "content": "Based on your interests in history and science fiction, I would recommend the following books: \"Sapiens: A Brief History of Humankind\" by Yuval Noah Harari, \"A Brief History of Time\" by Stephen Hawking, \"Dune\" by Frank Herbert, and \"The Martian\" by Andy Weir."
    }
  ]
}
```

This is a sample with a VQA task:

```json
{
  "messages": [
    {
      "role": "user",
      "content": "图片中的动物是什么?",
      "image": "/root/images/0001.jpg"
    },
    {
      "role": "assistant",
      "content": "图片中有一只猫。"
    },
    {
      "role": "user",
      "content": "图片中的猫在做什么?"
    },
    {
      "role": "assistant",
      "content": "这只猫坐在或站在桌子上,桌上有很多食物。"
    }
  ]
}
```

- The `system` role is optional. If present, it must appear before the `user` role, and it may appear only once in a complete conversation (whether single-round or multi-round).
- The `tools` field is optional. If present, it must appear after the `system` role, and it may appear only once in a complete conversation (whether single-round or multi-round). When the `tools` field exists, the `system` role must exist and its `content` field must be empty.
- `GLM-4V-9B` does not support the `tools` field or the `system` field, and `image` must be placed in the first message. The `image` field must contain the `absolute path` of the image (the sketch below checks these constraints).
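
A hypothetical validator (not part of this repo) that encodes these rules and can be run over a dataset before training:

```python
# Hypothetical helper: validate the conversation constraints listed above.
def check_conversation(messages: list[dict]) -> None:
    system_idx = [i for i, m in enumerate(messages) if m["role"] == "system"]
    assert len(system_idx) <= 1, "the system role may appear at most once"
    if system_idx:
        assert system_idx[0] == 0, "the system role must come before the user role"
    for m in messages:
        if "tools" in m:
            # tools may only sit on the (single) system message, whose
            # content must then be empty
            assert m["role"] == "system" and m["content"] == "", \
                "tools requires a system message with empty content"
```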

## Configuration file

The fine-tuning configuration files are located in the `configs` directory and include the following files:

1. `ds_zereo_2 / ds_zereo_3.json`: deepspeed configuration files.

2. `lora.yaml / ptuning_v2.yaml / sft.yaml`: configuration files for the different fine-tuning modes, covering model parameters, optimizer parameters, training parameters, etc. Some important parameters are explained as follows:
    + data_config section
        + train_file: file path of the training dataset.
        + val_file: file path of the validation dataset.
        + test_file: file path of the test dataset.
        + num_proc: number of processes to use when loading data.
    + max_input_length: maximum length of the input sequence.
    + max_output_length: maximum length of the output sequence.
    + training_args section
        + output_dir: directory for saving the model and other outputs.
        + max_steps: maximum number of training steps.
        + per_device_train_batch_size: training batch size per device (such as a GPU).
        + dataloader_num_workers: number of worker threads to use when loading data.
        + remove_unused_columns: whether to remove unused columns in the data.
        + save_strategy: model saving strategy (for example, save every how many steps).
        + save_steps: how many steps between model saves.
        + log_level: log level (such as info).
        + logging_strategy: logging strategy.
        + logging_steps: how many steps between log entries.
        + per_device_eval_batch_size: evaluation batch size per device.
        + evaluation_strategy: evaluation strategy (for example, evaluate every how many steps).
        + eval_steps: how many steps between evaluations.
        + predict_with_generate: whether to use generation mode for prediction.
    + generation_config section
        + max_new_tokens: maximum number of new tokens to generate.
    + peft_config section (the LoRA fields are illustrated in the sketch after this list)
        + peft_type: type of parameter-efficient tuning to use (supports LORA and PREFIX_TUNING).
        + task_type: task type, here causal language model (do not change).
    + LoRA parameters:
        + r: rank of LoRA.
        + lora_alpha: scaling factor of LoRA.
        + lora_dropout: dropout probability of the LoRA layer.
    + P-TuningV2 parameters:
        + num_virtual_tokens: number of virtual tokens.
        + num_attention_heads: 2: number of attention heads of P-TuningV2 (do not change).
        + token_dim: 256: token dimension of P-TuningV2 (do not change).
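
As an illustration of how the LoRA fields above map onto a `peft` configuration object, here is a minimal sketch (the numeric values are placeholders, not the values shipped in `lora.yaml`):

```python
# Illustrative only: how r / lora_alpha / lora_dropout from lora.yaml
# correspond to a peft.LoraConfig. The numeric values are placeholders.
from peft import LoraConfig, TaskType

lora_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,  # task_type: causal language model (do not change)
    r=8,                           # r: rank of LoRA
    lora_alpha=32,                 # lora_alpha: scaling factor of LoRA
    lora_dropout=0.1,              # lora_dropout: dropout probability of the LoRA layer
)
```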

## Start fine-tuning

Execute a **single-machine multi-card / multi-machine multi-card** run with the following code, which uses `deepspeed` as the acceleration solution; you need to install `deepspeed` first.

```shell
OMP_NUM_THREADS=1 torchrun --standalone --nnodes=1 --nproc_per_node=8 finetune.py data/AdvertiseGen/ THUDM/glm-4-9b-chat configs/lora.yaml # For Chat Fine-tune
OMP_NUM_THREADS=1 torchrun --standalone --nnodes=1 --nproc_per_node=8 finetune_vision.py data/CogVLM-311K/ THUDM/glm-4v-9b configs/lora.yaml # For VQA Fine-tune
```
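
The commands above run on a single node (`--standalone --nnodes=1`). For a genuine multi-machine run, a hypothetical two-node invocation might look like the following; the address, port, and rank are placeholders that you must set per node:

```shell
# Hypothetical two-node example: run on every node, with --node_rank=0 on the
# master node and --node_rank=1 on the second node.
OMP_NUM_THREADS=1 torchrun --nnodes=2 --node_rank=0 --master_addr=192.168.1.1 --master_port=29500 --nproc_per_node=8 finetune.py data/AdvertiseGen/ THUDM/glm-4-9b-chat configs/lora.yaml
```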

Execute a **single-machine single-card** run with the following code.

```shell
python finetune.py data/AdvertiseGen/ THUDM/glm-4-9b-chat configs/lora.yaml # For Chat Fine-tune
python finetune_vision.py data/CogVLM-311K/ THUDM/glm-4v-9b configs/lora.yaml # For VQA Fine-tune
```

## Fine-tune from a saved point

If you train as described above, each fine-tuning run will start from scratch. If you want to fine-tune from a partially trained model, you can add a fourth parameter, which can be passed in two ways:

1. `yes`: automatically resume training from the last saved checkpoint.

2. `XX`: a checkpoint number, for example `600`, to resume training from checkpoint 600.

For example, this is example code for continuing fine-tuning from the last saved point, together with the checkpoint-number variant described above:

```shell
python finetune.py data/AdvertiseGen/ THUDM/glm-4-9b-chat configs/lora.yaml yes  # resume from the last saved checkpoint
python finetune.py data/AdvertiseGen/ THUDM/glm-4-9b-chat configs/lora.yaml 600  # resume from checkpoint 600
```

## Use the fine-tuned model

### Verify the fine-tuned model in inference.py

You can use the fine-tuned model in `finetune_demo/inference.py`, and you can easily test it with just one line of code:

```shell
python inference.py your_finetune_path
```

In this way, the answer you get is the fine-tuned answer.

### Use the fine-tuned model in other demos in this repository or external repositories

You can use the `LORA` and fully fine-tuned models in any demo. This requires you to modify the code yourself, following the tutorial below.

1. Replace the model-loading code in the demo with the model-loading code from `finetune_demo/inference.py`.

> Please note that for LORA and P-TuningV2, we do not merge the trained models; instead, the path of the base model used for fine-tuning is recorded in `adapter_config.json`.
> If the location of your original model changes, you should modify the path of `base_model_name_or_path` in `adapter_config.json`.

```python
from pathlib import Path
from typing import Union

from peft import AutoPeftModelForCausalLM, PeftModelForCausalLM
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    PreTrainedModel,
    PreTrainedTokenizer,
    PreTrainedTokenizerFast,
)

# Type aliases as used in finetune_demo/inference.py
ModelType = Union[PreTrainedModel, PeftModelForCausalLM]
TokenizerType = Union[PreTrainedTokenizer, PreTrainedTokenizerFast]


def _resolve_path(path: Union[str, Path]) -> Path:
    # Expand '~' and resolve to an absolute path
    return Path(path).expanduser().resolve()


def load_model_and_tokenizer(
        model_dir: Union[str, Path], trust_remote_code: bool = True
) -> tuple[ModelType, TokenizerType]:
    model_dir = _resolve_path(model_dir)
    if (model_dir / 'adapter_config.json').exists():
        # LORA / P-TuningV2 checkpoint: load the adapter, then read the base
        # model path recorded in adapter_config.json for the tokenizer
        model = AutoPeftModelForCausalLM.from_pretrained(
            model_dir, trust_remote_code=trust_remote_code, device_map='auto'
        )
        tokenizer_dir = model.peft_config['default'].base_model_name_or_path
    else:
        # Fully fine-tuned (or base) model: load it directly
        model = AutoModelForCausalLM.from_pretrained(
            model_dir, trust_remote_code=trust_remote_code, device_map='auto'
        )
        tokenizer_dir = model_dir
    tokenizer = AutoTokenizer.from_pretrained(
        tokenizer_dir, trust_remote_code=trust_remote_code
    )
    return model, tokenizer
```

2. Read the fine-tuned model. Please note that you should use the location of the fine-tuned model. For example, if your model location is `/path/to/finetune_adapter_model` and the original model address is `path/to/base_model`, you should use `/path/to/finetune_adapter_model` as `model_dir` (see the usage sketch after this list).
3. After completing the above operations, you can use the fine-tuned model normally; other calling methods remain unchanged.
4. This fine-tuning script has not been tested on long texts of 128K or 1M tokens. Fine-tuning on long texts requires GPU devices with larger memory and more efficient fine-tuning solutions, which developers need to handle on their own.
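
A minimal usage sketch for step 2 (the checkpoint path is a placeholder, and the chat template is assumed to be the one shipped with the Hugging Face model repo):

```python
# Hypothetical adapter path; replace with your own fine-tuned checkpoint.
model, tokenizer = load_model_and_tokenizer('/path/to/finetune_adapter_model')

inputs = tokenizer.apply_chat_template(
    [{"role": "user", "content": "Hello"}],
    add_generation_prompt=True,
    tokenize=True,
    return_tensors="pt",
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=128)
# Strip the prompt tokens before decoding the reply.
print(tokenizer.decode(outputs[0][inputs.shape[1]:], skip_special_tokens=True))
```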

## Reference

```
@inproceedings{liu2022p,
    title={P-tuning: Prompt tuning can be comparable to fine-tuning across scales and tasks},
    author={Liu, Xiao and Ji, Kaixuan and Fu, Yicheng and Tam, Weng and Du, Zhengxiao and Yang, Zhilin and Tang, Jie},
    booktitle={Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)},
    pages={61--68},
    year={2022}
}

@misc{tang2023toolalpaca,
    title={ToolAlpaca: Generalized Tool Learning for Language Models with 3000 Simulated Cases},
    author={Qiaoyu Tang and Ziliang Deng and Hongyu Lin and Xianpei Han and Qiao Liang and Le Sun},
    year={2023},
    eprint={2306.05301},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}
```