diff --git a/basic_demo/README_en.md b/basic_demo/README_en.md
index 07086d1..412ee90 100644
--- a/basic_demo/README_en.md
+++ b/basic_demo/README_en.md
@@ -117,6 +117,13 @@ python trans_batch_demo.py
 python vllm_cli_demo.py
 ```
 
++ Use LoRA adapters with vLLM on the GLM-4-9B-Chat model.
+
+```python
+# vllm_cli_demo.py
+# add LORA_PATH = ''
+```
+
 + Build the server by yourself and use the request format of `OpenAI API` to communicate with the glm-4-9b model. This demo supports Function Call and All Tools functions.
 
@@ -132,17 +139,10 @@ Client request:
 python openai_api_request.py
 ```
 
-### LoRA adapters with vLLM
-+ use LoRA adapters with vLLM on GLM-4-9B-Chat model.
-
-```shell
-python vllm_cli_lora_demo.py
-```
-
 ## Stress test
 
 Users can use this code to test the generation speed of the model on the transformers backend on their own devices:
 
 ```shell
 python trans_stress_test.py
-```
\ No newline at end of file
+```