Update README_en.md
parent 7422d118e8
commit cb038cd2d3
@@ -117,6 +117,13 @@ python trans_batch_demo.py
python vllm_cli_demo.py
```

+ Use LoRA adapters with vLLM on the GLM-4-9B-Chat model.

```python
# vllm_cli_demo.py
# add LORA_PATH = ''
```

+ Build the server yourself and use the `OpenAI API` request format to communicate with the glm-4-9b model. This demo supports Function Call and All Tools functions.
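To illustrate the `OpenAI API` request format mentioned above, the sketch below builds a chat-completion payload. The endpoint URL and the model name `"glm-4"` are assumptions, not taken from this repo; adjust them to match the server that `openai_api_request.py` talks to.

```python
import json

# Minimal OpenAI-style chat completion payload (a sketch; the model name
# "glm-4" and the endpoint below are assumptions, adjust to your server).
payload = {
    "model": "glm-4",
    "messages": [
        {"role": "user", "content": "Hello, GLM-4!"}
    ],
    "temperature": 0.8,
    "stream": False,
}

# Send it with any HTTP client, e.g.:
#   requests.post("http://127.0.0.1:8000/v1/chat/completions", json=payload)
print(json.dumps(payload, indent=2))
```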
@@ -132,17 +139,10 @@ Client request:
python openai_api_request.py
```

### LoRA adapters with vLLM

+ Use LoRA adapters with vLLM on the GLM-4-9B-Chat model.

```shell
python vllm_cli_lora_demo.py
```
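A hedged sketch of what `vllm_cli_lora_demo.py` does with the `LORA_PATH` variable: load the base model with LoRA enabled and pass a `LoRARequest` at generation time. The base-model id, adapter name, and path here are placeholders, not values from this repo; the `LoRARequest` API itself is vLLM's.

```python
# Placeholder adapter path; vllm_cli_lora_demo.py expects you to set LORA_PATH.
LORA_PATH = "/path/to/your/lora/adapter"

def generate_with_lora(prompt: str) -> str:
    # Imported lazily so the sketch can be read without vLLM installed.
    from vllm import LLM, SamplingParams
    from vllm.lora.request import LoRARequest

    llm = LLM(
        model="THUDM/glm-4-9b-chat",  # assumed base model id
        trust_remote_code=True,
        enable_lora=True,             # must be set before lora_request is accepted
    )
    outputs = llm.generate(
        [prompt],
        SamplingParams(temperature=0.8, max_tokens=256),
        lora_request=LoRARequest("glm4-lora", 1, LORA_PATH),
    )
    return outputs[0].outputs[0].text

# Usage (requires vLLM and a GPU):
#   print(generate_with_lora("Hello, GLM-4!"))
```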
## Stress test

Users can use this code to test the generation speed of the model on the transformers backend on their own devices:

```shell
python trans_stress_test.py
```