diff --git a/README.md b/README.md index 9af3a60..7937d60 100644 --- a/README.md +++ b/README.md @@ -11,38 +11,16 @@ Read this in [English](README_en.md) ## 项目更新 -- 🔥🔥 **News**: ```2024/11/01```: 支持了 GLM-4-9B-Chat-hf 和 GLM-4v-9B 模型在 vLLM 0.6.3 以上版本和 transformers 4.46.0 以上版本运行 -- 🔥🔥 **News**: ```2024/10/25```: 我们开源了端到端中英语音对话模型 [GLM-4-Voice](https://github.com/THUDM/GLM-4-Voice) -- 🔥 **News**: ```2024/10/12```: 增加了 GLM-4v-9B 模型对vllm框架的支持 -- 🔥 **News**: ```2024/09/06```: 增加了在 GLM-4v-9B 模型上构建OpenAI API兼容的服务端 -- 🔥 **News**: ```2024/09/05``` 我们开源了使LLMs能够在长上下文问答中生成细粒度引用的模型 [longcite-glm4-9b](https://huggingface.co/THUDM/LongCite-glm4-9b) - 以及数据集 [LongCite-45k](https://huggingface.co/datasets/THUDM/LongCite-45k), - 欢迎在 [Huggingface Space](https://huggingface.co/spaces/THUDM/LongCite) 在线体验。 -- 🔥**News**: ```2024/09/04```: 增加了在 GLM-4-9B-Chat 模型上使用带有 Lora adapter 的 vLLM 演示代码 -- 🔥**News**: ```2024/08/15```: 我们开源具备长文本输出能力(单轮对话大模型输出可超过1万token) - 的模型 [longwriter-glm4-9b](https://huggingface.co/THUDM/LongWriter-glm4-9b) - 以及数据集 [LongWriter-6k](https://huggingface.co/datasets/THUDM/LongWriter-6k), - 欢迎在 [Huggingface Space](https://huggingface.co/spaces/THUDM/LongWriter) - 或 [魔搭社区空间](https://modelscope.cn/studios/ZhipuAI/LongWriter-glm4-9b-demo) 在线体验。 -- 🔥 **News**: ```2024/08/12```: GLM-4-9B-Chat 模型依赖的`transformers`版本升级到 `4.44.0`,请重新拉取除模型权重( - `*.safetensor` 文件 和 `tokenizer.model`)外的文件并参考 `basic_demo/requirements.txt` 严格更新依赖。 -- 🔥 **News**: ```2024/07/24```: - 我们发布了与长文本相关的最新技术解读,关注 [这里](https://medium.com/@ChatGLM/glm-long-scaling-pre-trained-model-contexts-to-millions-caa3c48dea85) - 查看我们在训练 GLM-4-9B 开源模型中关于长文本技术的技术报告。 -- 🔥 **News**: ``2024/7/16``: GLM-4-9B-Chat 模型依赖的`transformers`版本升级到 `4.42.4`, - 请更新模型配置文件并参考 `basic_demo/requirements.txt` 更新依赖。 -- 🔥 **News**: ``2024/7/9``: GLM-4-9B-Chat - 模型已适配 [Ollama](https://github.com/ollama/ollama),[Llama.cpp](https://github.com/ggerganov/llama.cpp) - ,您可以在[PR](https://github.com/ggerganov/llama.cpp/pull/8031) 查看具体的细节。 -- 🔥 **News**: ``2024/7/1``: 我们更新了 GLM-4V-9B 的微调,您需要更新我们的模型仓库的运行文件和配置文件, - 以支持这个功能,更多微调细节 (例如数据集格式,显存要求),请前往 [查看](finetune_demo)。 -- 🔥 **News**: ``2024/6/28``: 我们与英特尔技术团队合作,改进了 GLM-4-9B-Chat 的 ITREX 和 OpenVINO 部署教程。您可以使用英特尔 - CPU/GPU 设备高效部署 GLM-4-9B 开源模型。欢迎访问 [查看](intel_device_demo)。 -- 🔥 **News**: ``2024/6/24``: 我们更新了模型仓库的运行文件和配置文件,支持 Flash Attention 2, - 请更新模型配置文件并参考 `basic_demo/trans_cli_demo.py` 中的示例代码。 -- 🔥 **News**: ``2024/6/19``: 我们更新了模型仓库的运行文件和配置文件,修复了部分已知的模型推理的问题,欢迎大家克隆最新的模型仓库。 -- 🔥 **News**: ``2024/6/18``: 我们发布 [技术报告](https://arxiv.org/pdf/2406.12793), 欢迎查看。 -- 🔥 **News**: ``2024/6/05``: 我们发布 GLM-4-9B 系列开源模型 +- 🔥🔥 **News**: ```2024/11/01```: 本仓库的依赖已升级,请更新 `requirements.txt` 中的依赖以保证模型正常运行。[glm-4-9b-chat-hf](https://huggingface.co/THUDM/glm-4-9b-chat-hf) 是适配 `transformers>=4.46` 的模型权重,使用 transformers 库中的 `GlmModel` 类实现。 +同时,[glm-4-9b-chat](https://huggingface.co/THUDM/glm-4-9b-chat), [glm-4v-9b](https://huggingface.co/THUDM/glm-4v-9b) 中的 `tokenizer_chatglm.py` 已经更新以适配最新版本的 `transformers` 库。请前往 HuggingFace 获取更新后的文件。 +- 🔥 **News**: ```2024/10/27```: 我们开源了 [LongReward](https://github.com/THUDM/LongReward),这是一个使用 AI 反馈改进长上下文大语言模型的方法。 +- 🔥 **News**: ```2024/10/25```: 我们开源了端到端中英语音对话模型 [GLM-4-Voice](https://github.com/THUDM/GLM-4-Voice)。 +- 🔥 **News**: ```2024/09/05```: 我们开源了使 LLMs 能够在长上下文问答中生成细粒度引用的模型 [longcite-glm4-9b](https://huggingface.co/THUDM/LongCite-glm4-9b) 以及数据集 [LongCite-45k](https://huggingface.co/datasets/THUDM/LongCite-45k),欢迎在 [Huggingface Space](https://huggingface.co/spaces/THUDM/LongCite) 在线体验。 +- 🔥**News**: ```2024/08/15```: 
我们开源了具备长文本输出能力(单轮对话大模型输出可超过1万token)的模型 [longwriter-glm4-9b](https://huggingface.co/THUDM/LongWriter-glm4-9b) 以及数据集 [LongWriter-6k](https://huggingface.co/datasets/THUDM/LongWriter-6k),欢迎在 [Huggingface Space](https://huggingface.co/spaces/THUDM/LongWriter) 或 [魔搭社区空间](https://modelscope.cn/studios/ZhipuAI/LongWriter-glm4-9b-demo) 在线体验。 +- 🔥 **News**: ```2024/07/24```: 我们发布了与长文本相关的最新技术解读,关注 [这里](https://medium.com/@ChatGLM/glm-long-scaling-pre-trained-model-contexts-to-millions-caa3c48dea85) 查看我们在训练 GLM-4-9B 开源模型中关于长文本技术的技术报告。 +- 🔥 **News**: ```2024/07/09```: GLM-4-9B-Chat 模型已适配 [Ollama](https://github.com/ollama/ollama), [Llama.cpp](https://github.com/ggerganov/llama.cpp),您可以在 [PR](https://github.com/ggerganov/llama.cpp/pull/8031) 查看具体的细节。 +- 🔥 **News**: ```2024/06/18```: 我们发布 [技术报告](https://arxiv.org/pdf/2406.12793),欢迎查看。 +- 🔥 **News**: ```2024/06/05```: 我们发布 GLM-4-9B 系列开源模型。 ## 模型介绍 diff --git a/README_en.md b/README_en.md index 9f4cd42..a9b2e59 100644 --- a/README_en.md +++ b/README_en.md @@ -8,47 +8,36 @@

## Update -- 🔥🔥 **News**: ```2024/11/01```: Support for GLM-4-9B-Chat-hf and GLM-4v-9B models on vLLM >= 0.6.3 and transformers >= 4.46.0 -- 🔥🔥 **News**: ```2024/10/25```: We have open-sourced the end-to-end Chinese-English voice dialogue model [GLM-4-Voice](https://github.com/THUDM/GLM-4-Voice). -- 🔥 **News**: ```2024/10/12```: Add GLM-4v-9B model support for vllm framework. -- 🔥 **News**: ```2024/09/06```: Add support for OpenAI API server on the GLM-4v-9B model. -- 🔥 **News**: ```2024/09/05```: We open-sourced a model enabling LLMs to generate fine-grained citations in - long-context Q&A: [longcite-glm4-9b](https://huggingface.co/THUDM/LongCite-glm4-9b), along with the - dataset [LongCite-14k](https://huggingface.co/datasets/THUDM/LongCite-45k). You are welcome to experience it online + +- 🔥🔥 **News**: ```2024/11/01```: Dependencies have been updated in this repository. Please update the dependencies in + `requirements.txt` to ensure the model runs correctly. The model weights + for [glm-4-9b-chat-hf](https://huggingface.co/THUDM/glm-4-9b-chat-hf) are compatible with `transformers>=4.46` and + are implemented using the `GlmModel` class in the transformers library. Additionally, `tokenizer_chatglm.py` + in [glm-4-9b-chat](https://huggingface.co/THUDM/glm-4-9b-chat) and [glm-4v-9b](https://huggingface.co/THUDM/glm-4v-9b) + has been updated for the latest version of `transformers`. Please pull the updated files from Hugging Face. +- 🔥 **News**: ```2024/10/27```: We have open-sourced [LongReward](https://github.com/THUDM/LongReward), a method that + uses AI feedback to improve long-context large language models. +- 🔥 **News**: ```2024/10/25```: We have open-sourced the end-to-end Mandarin-English voice dialogue + model [GLM-4-Voice](https://github.com/THUDM/GLM-4-Voice). +- 🔥 **News**: ```2024/09/05```: We have open-sourced [longcite-glm4-9b](https://huggingface.co/THUDM/LongCite-glm4-9b), + a model enabling LLMs to produce fine-grained citations in long-context Q&A, along with the + dataset [LongCite-45k](https://huggingface.co/datasets/THUDM/LongCite-45k). Try it out online at [Huggingface Space](https://huggingface.co/spaces/THUDM/LongCite). -- 🔥 **News**: ```2024/09/04```: Add demo code for using vLLM with LoRA adapter on the GLM-4-9B-Chat model. -- 🔥 **News**: ```2024/08/15```: We have open-sourced a model with long-text output capability (single turn LLM output - can exceed - 10K tokens) [longwriter-glm4-9b](https://huggingface.co/THUDM/LongWriter-glm4-9b) and the - dataset [LongWriter-6k](https://huggingface.co/datasets/THUDM/LongWriter-6k). You're welcome - to [try it online](https://huggingface.co/spaces/THUDM/LongWriter). -- 🔥 **News**: ```2024/08/12```: The `transformers` version required for the GLM-4-9B-Chat model has been upgraded - to `4.44.0`. Please pull all files again except for the model weights (`*.safetensor` files and `tokenizer.model`), - and strictly update the dependencies as per `basic_demo/requirements.txt`. -- 🔥 **News**: ```2024/07/24```: we released the latest technical interpretation related to long texts. Check - out [here](https://medium.com/@ChatGLM/glm-long-scaling-pre-trained-model-contexts-to-millions-caa3c48dea85) to view - our - technical report on long context technology in the training of the open-source GLM-4-9B model. -- 🔥 **News**: ``2024/7/16``: The `transformers` version that the GLM-4-9B-Chat model depends on has been upgraded - to `4.42.4`. 
Please update the model configuration file and refer to `basic_demo/requirements.txt` to update the - dependencies. -- 🔥 **News**: ``2024/7/9``: The GLM-4-9B-Chat model has been adapted to [Ollama](https://github.com/ollama/ollama) - and [Llama.cpp](https://github.com/ggerganov/llama.cpp), you can check the specific details - in [PR](https://github.com/ggerganov/llama.cpp/pull/8031). -- 🔥 **News**: ``2024/7/1``: We have updated the multimodal fine-tuning of GLM-4V-9B. You need to update the run file and - configuration file of our model repository to support this feature. For more fine-tuning details (such as dataset - format, video memory requirements), please go to [view](finetune_demo). -- 🔥 **News**: ``2024/6/28``: We have worked with the Intel technical team to improve the ITREX and OpenVINO deployment - tutorials for GLM-4-9B-Chat. You can use Intel CPU/GPU devices to efficiently deploy the GLM-4-9B open source model. - Welcome to [view](intel_device_demo). -- 🔥 **News**: ``2024/6/24``: We have updated the running files and configuration files of the model repository to - support Flash Attention 2, Please update the model configuration file and refer to the sample code - in `basic_demo/trans_cli_demo.py`. -- 🔥 **News**: ``2024/6/19``: We updated the running files and configuration files of the model repository and fixed some - model inference issues. Welcome to clone the latest model repository. -- 🔥 **News**: ``2024/6/18``: We released a [technical report](https://arxiv.org/pdf/2406.12793), welcome to check it - out. -- 🔥 **News**: ``2024/6/05``: We released the GLM-4-9B series of open source models +- 🔥 **News**: ```2024/08/15```: We have + open-sourced [longwriter-glm4-9b](https://huggingface.co/THUDM/LongWriter-glm4-9b), a model capable of generating over + 10,000 tokens in single-turn dialogue, along with the + dataset [LongWriter-6k](https://huggingface.co/datasets/THUDM/LongWriter-6k). Experience it online + at [Huggingface Space](https://huggingface.co/spaces/THUDM/LongWriter) or + the [ModelScope Community Space](https://modelscope.cn/studios/ZhipuAI/LongWriter-glm4-9b-demo). +- 🔥 **News**: ```2024/07/24```: We published the latest technical insights on long-text processing. Check out our + technical report on long-context training of the open-source GLM-4-9B + model [here](https://medium.com/@ChatGLM/glm-long-scaling-pre-trained-model-contexts-to-millions-caa3c48dea85). +- 🔥 **News**: ```2024/07/09```: The GLM-4-9B-Chat model is now compatible + with [Ollama](https://github.com/ollama/ollama) and [Llama.cpp](https://github.com/ggerganov/llama.cpp). See detailed + information in this [PR](https://github.com/ggerganov/llama.cpp/pull/8031). +- 🔥 **News**: ```2024/06/18```: We have released a [technical report](https://arxiv.org/pdf/2406.12793); feel free to + check it out. +- 🔥 **News**: ```2024/06/05```: We released the GLM-4-9B series of open-source models. ## Model Introduction @@ -67,15 +56,14 @@ GPT-4-turbo-2024-04-09, Gemini 1.0 Pro, Qwen-VL-Max, and Claude 3 Opus. 
## Model List -| Model | Type | Seq Length | Transformers | vLLM | Download | Online Demo | -|:-------------------:|:----:|:----------:|:------------:|:--------:|:-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------:|:------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------:| -| GLM-4-9B | Base | 8K | <= 4.45 | <= 0.6.2 | [🤗 Huggingface](https://huggingface.co/THUDM/glm-4-9b)
[🤖 ModelScope](https://modelscope.cn/models/ZhipuAI/glm-4-9b)
[🟣 WiseModel](https://wisemodel.cn/models/ZhipuAI/glm-4-9b) | / | -| GLM-4-9B-Chat | Chat | 128K | <= 4.45 | <= 0.6.2 | [🤗 Huggingface](https://huggingface.co/THUDM/glm-4-9b-chat)
[🤖 ModelScope](https://modelscope.cn/models/ZhipuAI/glm-4-9b-chat)
[🟣 WiseModel](https://wisemodel.cn/models/ZhipuAI/GLM-4-9B-Chat) | [🤖 ModelScope CPU](https://modelscope.cn/studios/dash-infer/GLM-4-Chat-DashInfer-Demo/summary)
[🤖 ModelScope vLLM](https://modelscope.cn/studios/ZhipuAI/glm-4-9b-chat-vllm/summary) | -| GLM-4-9B-Chat-HF | Chat | 128K | >= 4.46 | <= 0.6.2 | [🤗 Huggingface](https://huggingface.co/THUDM/glm-4-9b-chat-hf)
[🤖 ModelScope](https://modelscope.cn/models/ZhipuAI/glm-4-9b-chat-hf) | [🤖 ModelScope CPU](https://modelscope.cn/studios/dash-infer/GLM-4-Chat-DashInfer-Demo/summary)
[🤖 ModelScope vLLM](https://modelscope.cn/studios/ZhipuAI/glm-4-9b-chat-vllm/summary) | -| GLM-4-9B-Chat-1M | Chat | 1M | <= 4.45 | <= 0.6.2 | [🤗 Huggingface](https://huggingface.co/THUDM/glm-4-9b-chat-1m)
[🤖 ModelScope](https://modelscope.cn/models/ZhipuAI/glm-4-9b-chat-1m)
[🟣 WiseModel](https://wisemodel.cn/models/ZhipuAI/GLM-4-9B-Chat-1M) | / | -| GLM-4-9B-Chat-1M-HF | Chat | 1M | >= 4.46 | <= 0.6.2 | [🤗 Huggingface](https://huggingface.co/THUDM/glm-4-9b-chat-1m-hf)
[🤖 ModelScope](https://modelscope.cn/models/ZhipuAI/glm-4-9b-chat-1m-hf) | / | -| GLM-4V-9B | Chat | 8K | >= 4.46 | >= 0.6.3 | [🤗 Huggingface](https://huggingface.co/THUDM/glm-4v-9b)
[🤖 ModelScope](https://modelscope.cn/models/ZhipuAI/glm-4v-9b)
[🟣 WiseModel](https://wisemodel.cn/models/ZhipuAI/GLM-4V-9B) | [🤖 ModelScope](https://modelscope.cn/studios/ZhipuAI/glm-4v-9b-Demo/summary) | - +| Model | Type | Seq Length | Transformers | Download | Online Demo | +|:-------------------:|:----:|:----------:|:-------------:|:-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------:|:------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------:| +| GLM-4-9B | Base | 8K | `4.44 - 4.45` | [🤗 Huggingface](https://huggingface.co/THUDM/glm-4-9b)
[🤖 ModelScope](https://modelscope.cn/models/ZhipuAI/glm-4-9b)
[🟣 WiseModel](https://wisemodel.cn/models/ZhipuAI/glm-4-9b) | / | +| GLM-4-9B-Chat | Chat | 128K | `4.44 - 4.45` | [🤗 Huggingface](https://huggingface.co/THUDM/glm-4-9b-chat)
[🤖 ModelScope](https://modelscope.cn/models/ZhipuAI/glm-4-9b-chat)
[🟣 WiseModel](https://wisemodel.cn/models/ZhipuAI/GLM-4-9B-Chat) | [🤖 ModelScope CPU](https://modelscope.cn/studios/dash-infer/GLM-4-Chat-DashInfer-Demo/summary)
[🤖 ModelScope vLLM](https://modelscope.cn/studios/ZhipuAI/glm-4-9b-chat-vllm/summary) | +| GLM-4-9B-Chat-HF | Chat | 128K | `>= 4.46.0` | [🤗 Huggingface](https://huggingface.co/THUDM/glm-4-9b-chat-hf)
[🤖 ModelScope](https://modelscope.cn/models/ZhipuAI/glm-4-9b-chat-hf) | [🤖 ModelScope CPU](https://modelscope.cn/studios/dash-infer/GLM-4-Chat-DashInfer-Demo/summary)
[🤖 ModelScope vLLM](https://modelscope.cn/studios/ZhipuAI/glm-4-9b-chat-vllm/summary) | +| GLM-4-9B-Chat-1M | Chat | 1M | `4.44 - 4.45` | [🤗 Huggingface](https://huggingface.co/THUDM/glm-4-9b-chat-1m)
[🤖 ModelScope](https://modelscope.cn/models/ZhipuAI/glm-4-9b-chat-1m)
[🟣 WiseModel](https://wisemodel.cn/models/ZhipuAI/GLM-4-9B-Chat-1M) | / | +| GLM-4-9B-Chat-1M-HF | Chat | 1M | `>= 4.46.0` | [🤗 Huggingface](https://huggingface.co/THUDM/glm-4-9b-chat-1m-hf)
[🤖 ModelScope](https://modelscope.cn/models/ZhipuAI/glm-4-9b-chat-1m-hf) | / | +| GLM-4V-9B | Chat | 8K | `>= 4.46.0` | [🤗 Huggingface](https://huggingface.co/THUDM/glm-4v-9b)
[🤖 ModelScope](https://modelscope.cn/models/ZhipuAI/glm-4v-9b)
| [🟣 WiseModel](https://wisemodel.cn/models/ZhipuAI/GLM-4V-9B) | [🤖 ModelScope](https://modelscope.cn/studios/ZhipuAI/glm-4v-9b-Demo/summary) | ## BenchMark diff --git a/basic_demo/trans_cli_demo.py b/basic_demo/trans_cli_demo.py index c843258..dfff5bd 100644 --- a/basic_demo/trans_cli_demo.py +++ b/basic_demo/trans_cli_demo.py @@ -11,21 +11,24 @@ ensuring that the CLI interface displays formatted text correctly. If you use flash attention, you should install the flash-attn and add attn_implementation="flash_attention_2" in model loading. -Note: - Using with glm-4-9b-chat-hf will require `transformers>=4.46.0". """ import torch from threading import Thread from transformers import AutoTokenizer, AutoModelForCausalLM, StoppingCriteria, StoppingCriteriaList, TextIteratorStreamer -MODEL_PATH = "THUDM/glm-4-9b-chat-hf" +MODEL_PATH = "THUDM/glm-4-9b-chat" -tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH) +# trust_remote_code=True is required when using `glm-4-9b-chat`. +# It is not needed when using `glm-4-9b-chat-hf`. +# This applies to both the tokenizer and the model. + +tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH, trust_remote_code=True) model = AutoModelForCausalLM.from_pretrained( MODEL_PATH, # attn_implementation="flash_attention_2", # Use Flash Attention torch_dtype=torch.bfloat16, # using flash-attn must use bfloat16 or float16 + trust_remote_code=True, device_map="auto").eval() diff --git a/basic_demo/trans_cli_vision_demo.py b/basic_demo/trans_cli_vision_demo.py index adca35d..671e701 100644 --- a/basic_demo/trans_cli_vision_demo.py +++ b/basic_demo/trans_cli_vision_demo.py @@ -10,19 +10,20 @@ Note: The script includes a modification to handle markdown to plain text conver ensuring that the CLI interface displays formatted text correctly. """ -import os import torch from threading import Thread from transformers import ( AutoTokenizer, StoppingCriteria, StoppingCriteriaList, - TextIteratorStreamer, AutoModel, BitsAndBytesConfig + TextIteratorStreamer, + AutoModel, + BitsAndBytesConfig ) from PIL import Image -MODEL_PATH = os.environ.get('MODEL_PATH', 'THUDM/glm-4v-9b') +MODEL_PATH = "THUDM/glm-4v-9b" tokenizer = AutoTokenizer.from_pretrained( MODEL_PATH,
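
For reference, here is a minimal loading sketch contrasting the two code paths the demo changes above distinguish. This is an illustrative sketch rather than code from the diff; it assumes `transformers>=4.46` is installed for the `-hf` weights, a GPU is available, and the model IDs match those in the table above.

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

# Original weights: the modeling code ships inside the HF repo,
# so trust_remote_code=True is required for both tokenizer and model.
tokenizer = AutoTokenizer.from_pretrained("THUDM/glm-4-9b-chat", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    "THUDM/glm-4-9b-chat",
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
    device_map="auto",
).eval()

# -hf weights: implemented natively in transformers>=4.46 (the Glm model
# classes), so trust_remote_code is not needed.
tokenizer_hf = AutoTokenizer.from_pretrained("THUDM/glm-4-9b-chat-hf")
model_hf = AutoModelForCausalLM.from_pretrained(
    "THUDM/glm-4-9b-chat-hf",
    torch_dtype=torch.bfloat16,
    device_map="auto",
).eval()

# Either variant generates through the standard chat-template API:
inputs = tokenizer_hf.apply_chat_template(
    [{"role": "user", "content": "你好"}],
    add_generation_prompt=True,
    tokenize=True,
    return_tensors="pt",
    return_dict=True,
).to(model_hf.device)
outputs = model_hf.generate(**inputs, max_new_tokens=128)
print(tokenizer_hf.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```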