first commit

xxl 2024-11-21 13:31:42 +08:00
parent 748ae2d524
commit f3a6cf78e1
10 changed files with 111506 additions and 2 deletions

33
MODEL_LICENSE Normal file

@@ -0,0 +1,33 @@
The aiXcoder Model License
1. Definitions
“Licensor” means the aiXcoder Model Team that distributes its Software.
“Software” means the aiXcoder model parameters made available under this license.
2. License Grant
Subject to the terms and conditions of this License, the Licensor hereby grants to you a non-exclusive, worldwide, non-transferable, non-sublicensable, revocable, royalty-free copyright license to use the Software solely for your non-commercial research purposes.
The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
3. Restriction
You will not use, copy, modify, merge, publish, distribute, reproduce, or create derivative works of the Software, in whole or in part, for any commercial, military, or illegal purposes.
You will not use the Software for any act that may undermine China's national security and national unity, harm the public interest of society, or infringe upon the rights and interests of human beings.
4. Disclaimer
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
5. Limitation of Liability
EXCEPT TO THE EXTENT PROHIBITED BY APPLICABLE LAW, IN NO EVENT AND UNDER NO LEGAL THEORY, WHETHER BASED IN TORT, NEGLIGENCE, CONTRACT, LIABILITY, OR OTHERWISE WILL ANY LICENSOR BE LIABLE TO YOU FOR ANY DIRECT, INDIRECT, SPECIAL, INCIDENTAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES, OR ANY OTHER COMMERCIAL LOSSES, EVEN IF THE LICENSOR HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.
6. Dispute Resolution
This License shall be governed by and construed in accordance with the laws of the People's Republic of China. Any dispute arising from or in connection with this License shall be submitted to the Haidian District People's Court in Beijing.
Note that the license is subject to update to a more comprehensive version. For any questions related to the license and copyright, please contact us at license@aixcoder.com.

311
README.md

@@ -1,3 +1,310 @@
# aiXcoder-7B Code Large Language Model
<p align="center">
🏠 <a href="https://www.aixcoder.com/" target="_blank">Official Website</a>|🛠 <a href="https://marketplace.visualstudio.com/items?itemName=aixcoder-plugin.aixcoder" target="_blank">VS Code Plugin</a>|🛠 <a href="https://plugins.jetbrains.com/plugin/13574-aixcoder-code-completer" target="_blank">JetBrains Plugin</a>|🤗 <a href="https://huggingface.co/aiXcoder/aixcoder-7b-base" target="_blank">Model Download</a>|<a href="./assets/wechat_1.jpg" target="_blank">WeChat Group</a>|<a href="./assets/wechat_2.jpg" target="_blank">Official WeChat Account</a>
</p>
Welcome to the official repository of the aiXcoder-7B Code Large Language Model. The model is designed to understand and generate code across multiple programming languages, delivering state-of-the-art performance on code completion, code understanding, code generation, and other programming-language tasks.
## Table of Contents
1. [Model Introduction](#model-introduction)
2. [Quick Start](#quick-start)
    - [Environment Requirements](#environment-requirements)
    - [Model Weights](#model-weights)
    - [Inference Examples](#inference-examples)
3. [License](#license)
4. [Acknowledgments](#acknowledgments)
## Model Introduction
As the capabilities of code LLMs are gradually being unlocked, aiXcoder has kept asking how such models can become more helpful in real-world development scenarios. To that end, we open-sourced aiXcoder 7B Base, a model trained extensively on 1.2T unique tokens, with pre-training tasks and context construction designed specifically for real-world code generation scenarios.
aiXcoder 7B Base delivers the best code-completion performance among models of a comparable parameter scale, and on average across mainstream multilingual NL2Code benchmarks it also outperforms CodeLlama 34B and StarCoder2 15B.
The release of aiXcoder 7B Base marks an important milestone in our ongoing exploration of applying code LLMs. The current version is a base model focused on improving the efficiency and accuracy of code completion and code generation, aiming to give developers strong support in these scenarios. Note that this version has not undergone dedicated instruct fine-tuning, which means it may not yet perform optimally on higher-level tasks such as test case generation and code debugging.
However, further development of the aiXcoder model series is already planned. In the near future we intend to release new versions that are carefully instruct-tuned for a broader range of programming tasks, including but not limited to test case generation and code debugging. With these instruct-tuned models we hope to offer developers more comprehensive and deeper programming support, helping them work with maximum efficiency at every stage of software development.
## Quick Start
### Environment Requirements
#### Option 1: Build an Environment
The main dependencies are:
- Python 3.8 or higher
- PyTorch 2.1.0 or higher
- sentencepiece 0.2.0 or higher
- transformers 4.34.1 or higher (if running inference with the transformers library)
On a CUDA-capable host or inside a container, run the following commands to install the dependencies:
```bash
conda create -n aixcoder-7b python=3.11
conda activate aixcoder-7b
git clone git@github.com:aixcoder-plugin/aiXcoder-7b.git
cd aiXcoder-7b
pip install -r requirements.txt
```
`requirements.txt` lists all dependencies and their version numbers.
To speed up inference, we strongly recommend installing the FlashAttention library (optional). After confirming that your GPU and CUDA version support FlashAttention, you can install it with the following steps:
```bash
git clone git@github.com:Dao-AILab/flash-attention.git
cd flash-attention
MAX_JOBS=8 python setup.py install
```
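Before moving on, you may want to confirm that the FlashAttention build actually installed. A minimal check, assuming the package is importable under the name `flash_attn`:

```python
# Sanity check (assumption: the build installed a package named "flash_attn").
import flash_attn

print("flash-attn version:", flash_attn.__version__)
```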
#### Option 2: Docker
For better isolation of the development environment, we recommend running model inference inside a Docker container. The steps to prepare the Docker environment are as follows:
1. Install Docker: if Docker is not yet installed on your machine, you can follow the official installation instructions.
2. Pull the image: pull the PyTorch image from Docker Hub.
```bash
docker pull pytorch/pytorch:2.1.0-cuda11.8-cudnn8-devel
```
3. Start the container: after pulling the Docker image, start a container and run the model inside it.
```bash
docker run --gpus all -it -v /dev/shm:/dev/shm --name aix_instance pytorch/pytorch:2.1.0-cuda11.8-cudnn8-devel /bin/bash
pip install sentencepiece
git clone git@github.com:aixcoder-plugin/aiXcoder-7b.git
cd aiXcoder-7b
```
To speed up inference, we strongly recommend installing the FlashAttention library (optional). After confirming that your GPU and CUDA version support FlashAttention, you can install it with the following steps:
```bash
git clone git@github.com:Dao-AILab/flash-attention.git
cd flash-attention
MAX_JOBS=8 python setup.py install
```
4. Run inference: inside the container, follow the inference examples below to run predictions.
### Model Weights
You can download the model weights from the following links (a minimal programmatic download sketch follows the list):
- [aiXcoder Base Download](https://www.modelscope.cn/models/aiXcoder/aiXcoder-7b-base)
- aiXcoder Instruct Download (Coming soon...)
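If you prefer to fetch the weights programmatically rather than through the pages above, the sketch below uses the `huggingface_hub` library and a local target directory of `./aixcoder-7b-base`; both are assumptions on our part rather than part of the official instructions, and any method that produces a local weights directory works just as well.

```python
# Minimal sketch (assumption): download aiXcoder-7b-base with huggingface_hub.
# Install it first with `pip install huggingface_hub` if it is not already present.
from huggingface_hub import snapshot_download

local_dir = snapshot_download(
    repo_id="aiXcoder/aixcoder-7b-base",  # repository shown in the links above
    local_dir="./aixcoder-7b-base",       # hypothetical local target directory
)
print("Model weights downloaded to:", local_dir)
```

The resulting directory can then be passed to the inference scripts as the model weights path.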
### Inference Examples
#### Command-Line Execution
For a quick run, you can execute an inference sample with the following command:
```bash
torchrun --nproc_per_node 1 sess_megatron.py --model_dir "path/to/model_weights_dir"
```
Replace "path/to/model_weights_dir" with the local path where you downloaded the model weights.
Alternatively, run an inference test with Hugging Face's transformers library:
```bash
python sess_huggingface.py
```
Or run an inference test with ModelScope's modelscope library:
```bash
python sess_modelscope.py
```
#### Python Script
If you want to embed the model in your own toolchain, or need more flexible usage, you can call it directly with the following code:
```python
from sess_megatron import TestInference
infer = TestInference()
res = infer.run_infer(
# for FIM style input, code_string stands for prefix context
code_string="""# 快速排序算法""",
# for FIM style input, later_code stands for suffix context
later_code="\n",
# file_path should be a path from project to file
file_path="test.py",
# max num for generated tokens
max_new_tokens=256,
)
print(res)
"""output:
def quick_sort(arr):
if len(arr) <= 1:
return arr
pivot = arr[0]
less = [i for i in arr[1:] if i <= pivot]
greater = [i for i in arr[1:] if i > pivot]
return quick_sort(less) + [pivot] + quick_sort(greater)
# 测试
arr = [3, 2, 1, 4, 5]
print(quick_sort(arr)) # [1, 2, 3, 4, 5]
"""
```
To run the same example through Hugging Face's transformers library instead:
```python
import torch
import sys
from hf_mini.utils import input_wrapper
from transformers import AutoModelForCausalLM, AutoTokenizer
device = "cuda" # the device to load the model onto
tokenizer = AutoTokenizer.from_pretrained("aiXcoder/aixcoder-7b-base")
model = AutoModelForCausalLM.from_pretrained("aiXcoder/aixcoder-7b-base", torch_dtype=torch.bfloat16)
text = input_wrapper(
# for FIM style input, code_string stands for prefix context
code_string="# 快速排序算法",
# for FIM style input, later_code stands for suffix context
later_code="\n# 测试\narr = [3, 2, 1, 4, 5]\nprint(quick_sort(arr)) # [1, 2, 3, 4, 5]",
# file_path should be a path from project to file
path="test.py"
)
if len(text) == 0:
sys.exit()
inputs = tokenizer(text, return_tensors="pt", return_token_type_ids=False)
inputs = inputs.to(device)
model.to(device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=False))
"""output:
def quick_sort(arr):
# 如果数组长度小于等于1直接返回
if len(arr) <= 1:
return arr
# 选择数组的第一个元素作为基准
pivot = arr[0]
# 初始化左右指针
left, right = 1, len(arr) - 1
# 循环直到左指针小于右指针
while left < right:
# 从右到左找到第一个小于基准的元素,与左指针元素交换
if arr[right] < pivot:
arr[left], arr[right] = arr[right], arr[left]
left += 1
# 从左到右找到第一个大于等于基准的元素,与右指针元素交换
if arr[left] >= pivot:
right -= 1
# 将基准元素与左指针元素交换
arr[left], arr[0] = arr[0], arr[left]
# 对左半部分进行递归排序
quick_sort(arr[:left])
# 对右半部分进行递归排序
quick_sort(arr[left + 1:])
return arr</s>
"""
```
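The decode call above prints the full sequence, prompt included, ending with the `</s>` token because `skip_special_tokens=False`. If you only need the generated completion as plain text, one option (a small sketch reusing the `inputs`, `outputs`, and `tokenizer` variables from the example above) is to slice off the prompt tokens before decoding:

```python
# Keep only the tokens generated after the prompt, then decode them with
# skip_special_tokens=True so the trailing </s> marker is dropped.
prompt_len = inputs["input_ids"].shape[1]
completion = tokenizer.decode(outputs[0][prompt_len:], skip_special_tokens=True)
print(completion)
```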
To use the model with modelscope, first install the inference library via `pip install modelscope`.
```python
import torch
import sys
from hf_mini.utils import input_wrapper
from modelscope import AutoModelForCausalLM, AutoTokenizer
device = "cuda" # the device to load the model onto
tokenizer = AutoTokenizer.from_pretrained("aiXcoder/aixcoder-7b-base")
model = AutoModelForCausalLM.from_pretrained("aiXcoder/aixcoder-7b-base", torch_dtype=torch.bfloat16)
text = input_wrapper(
# for FIM style input, code_string stands for prefix context
code_string="# 快速排序算法",
# for FIM style input, later_code stands for suffix context
later_code="\n# 测试\narr = [3, 2, 1, 4, 5]\nprint(quick_sort(arr)) # [1, 2, 3, 4, 5]",
# file_path should be a path from project to file
path="test.py"
)
if len(text) == 0:
sys.exit()
inputs = tokenizer(text, return_tensors="pt", return_token_type_ids=False)
inputs = inputs.to(device)
model.to(device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=False))
"""output:
def quick_sort(arr):
# 如果数组长度小于等于1直接返回
if len(arr) <= 1:
return arr
# 选择数组的第一个元素作为基准
pivot = arr[0]
# 初始化左右指针
left, right = 1, len(arr) - 1
# 循环直到左指针小于右指针
while left < right:
# 从右到左找到第一个小于基准的元素,与左指针元素交换
if arr[right] < pivot:
arr[left], arr[right] = arr[right], arr[left]
left += 1
# 从左到右找到第一个大于等于基准的元素,与右指针元素交换
if arr[left] >= pivot:
right -= 1
# 将基准元素与左指针元素交换
arr[left], arr[0] = arr[0], arr[left]
# 对左半部分进行递归排序
quick_sort(arr[:left])
# 对右半部分进行递归排序
quick_sort(arr[left + 1:])
return arr</s>
"""
```
## License
The model weights are licensed under the [Model License](./MODEL_LICENSE) for academic research use; for commercial use, please apply by sending an email to support@aixcoder.com.
## Acknowledgments
We would like to thank all contributors to the open-source projects and datasets that made this work possible.
For any questions or issues, please open an issue on this repository.
Thank you for your interest in our Code Large Language Model. We look forward to your contributions and feedback!

25
config.json Normal file

@@ -0,0 +1,25 @@
{
"architectures": [
"LlamaForCausalLM"
],
"bos_token_id": 1,
"eos_token_id": 2,
"hidden_act": "silu",
"hidden_size": 4096,
"initializer_range": 0.02,
"intermediate_size": 14464,
"max_position_embeddings": 32768,
"model_type": "llama",
"num_attention_heads": 32,
"num_hidden_layers": 32,
"num_key_value_heads": 8,
"pretraining_tp": 1,
"rms_norm_eps": 1e-06,
"rope_theta": 256000.0,
"tie_word_embeddings": false,
"torch_dtype": "bfloat16",
"transformers_version": "4.34.1",
"use_cache": true,
"vocab_size": 49152
}

1
configuration.json Normal file

@@ -0,0 +1 @@
{"framework":"Pytorch","task":"text-generation"}

6
generation_config.json Normal file

@@ -0,0 +1,6 @@
{
"_from_model_config": true,
"bos_token_id": 1,
"eos_token_id": 2,
"transformers_version": "4.34.1"
}

299
pytorch_model.bin.index.json Normal file

@@ -0,0 +1,299 @@
{
"metadata": {
"total_size": 7432572928
},
"weight_map": {
"lm_head.weight": "pytorch_model-00001-of-00001.bin",
"model.embed_tokens.weight": "pytorch_model-00001-of-00001.bin",
"model.layers.0.input_layernorm.weight": "pytorch_model-00001-of-00001.bin",
"model.layers.0.mlp.down_proj.weight": "pytorch_model-00001-of-00001.bin",
"model.layers.0.mlp.gate_proj.weight": "pytorch_model-00001-of-00001.bin",
"model.layers.0.mlp.up_proj.weight": "pytorch_model-00001-of-00001.bin",
"model.layers.0.post_attention_layernorm.weight": "pytorch_model-00001-of-00001.bin",
"model.layers.0.self_attn.k_proj.weight": "pytorch_model-00001-of-00001.bin",
"model.layers.0.self_attn.o_proj.weight": "pytorch_model-00001-of-00001.bin",
"model.layers.0.self_attn.q_proj.weight": "pytorch_model-00001-of-00001.bin",
"model.layers.0.self_attn.v_proj.weight": "pytorch_model-00001-of-00001.bin",
"model.layers.1.input_layernorm.weight": "pytorch_model-00001-of-00001.bin",
"model.layers.1.mlp.down_proj.weight": "pytorch_model-00001-of-00001.bin",
"model.layers.1.mlp.gate_proj.weight": "pytorch_model-00001-of-00001.bin",
"model.layers.1.mlp.up_proj.weight": "pytorch_model-00001-of-00001.bin",
"model.layers.1.post_attention_layernorm.weight": "pytorch_model-00001-of-00001.bin",
"model.layers.1.self_attn.k_proj.weight": "pytorch_model-00001-of-00001.bin",
"model.layers.1.self_attn.o_proj.weight": "pytorch_model-00001-of-00001.bin",
"model.layers.1.self_attn.q_proj.weight": "pytorch_model-00001-of-00001.bin",
"model.layers.1.self_attn.v_proj.weight": "pytorch_model-00001-of-00001.bin",
"model.layers.10.input_layernorm.weight": "pytorch_model-00001-of-00001.bin",
"model.layers.10.mlp.down_proj.weight": "pytorch_model-00001-of-00001.bin",
"model.layers.10.mlp.gate_proj.weight": "pytorch_model-00001-of-00001.bin",
"model.layers.10.mlp.up_proj.weight": "pytorch_model-00001-of-00001.bin",
"model.layers.10.post_attention_layernorm.weight": "pytorch_model-00001-of-00001.bin",
"model.layers.10.self_attn.k_proj.weight": "pytorch_model-00001-of-00001.bin",
"model.layers.10.self_attn.o_proj.weight": "pytorch_model-00001-of-00001.bin",
"model.layers.10.self_attn.q_proj.weight": "pytorch_model-00001-of-00001.bin",
"model.layers.10.self_attn.v_proj.weight": "pytorch_model-00001-of-00001.bin",
"model.layers.11.input_layernorm.weight": "pytorch_model-00001-of-00001.bin",
"model.layers.11.mlp.down_proj.weight": "pytorch_model-00001-of-00001.bin",
"model.layers.11.mlp.gate_proj.weight": "pytorch_model-00001-of-00001.bin",
"model.layers.11.mlp.up_proj.weight": "pytorch_model-00001-of-00001.bin",
"model.layers.11.post_attention_layernorm.weight": "pytorch_model-00001-of-00001.bin",
"model.layers.11.self_attn.k_proj.weight": "pytorch_model-00001-of-00001.bin",
"model.layers.11.self_attn.o_proj.weight": "pytorch_model-00001-of-00001.bin",
"model.layers.11.self_attn.q_proj.weight": "pytorch_model-00001-of-00001.bin",
"model.layers.11.self_attn.v_proj.weight": "pytorch_model-00001-of-00001.bin",
"model.layers.12.input_layernorm.weight": "pytorch_model-00001-of-00001.bin",
"model.layers.12.mlp.down_proj.weight": "pytorch_model-00001-of-00001.bin",
"model.layers.12.mlp.gate_proj.weight": "pytorch_model-00001-of-00001.bin",
"model.layers.12.mlp.up_proj.weight": "pytorch_model-00001-of-00001.bin",
"model.layers.12.post_attention_layernorm.weight": "pytorch_model-00001-of-00001.bin",
"model.layers.12.self_attn.k_proj.weight": "pytorch_model-00001-of-00001.bin",
"model.layers.12.self_attn.o_proj.weight": "pytorch_model-00001-of-00001.bin",
"model.layers.12.self_attn.q_proj.weight": "pytorch_model-00001-of-00001.bin",
"model.layers.12.self_attn.v_proj.weight": "pytorch_model-00001-of-00001.bin",
"model.layers.13.input_layernorm.weight": "pytorch_model-00001-of-00001.bin",
"model.layers.13.mlp.down_proj.weight": "pytorch_model-00001-of-00001.bin",
"model.layers.13.mlp.gate_proj.weight": "pytorch_model-00001-of-00001.bin",
"model.layers.13.mlp.up_proj.weight": "pytorch_model-00001-of-00001.bin",
"model.layers.13.post_attention_layernorm.weight": "pytorch_model-00001-of-00001.bin",
"model.layers.13.self_attn.k_proj.weight": "pytorch_model-00001-of-00001.bin",
"model.layers.13.self_attn.o_proj.weight": "pytorch_model-00001-of-00001.bin",
"model.layers.13.self_attn.q_proj.weight": "pytorch_model-00001-of-00001.bin",
"model.layers.13.self_attn.v_proj.weight": "pytorch_model-00001-of-00001.bin",
"model.layers.14.input_layernorm.weight": "pytorch_model-00001-of-00001.bin",
"model.layers.14.mlp.down_proj.weight": "pytorch_model-00001-of-00001.bin",
"model.layers.14.mlp.gate_proj.weight": "pytorch_model-00001-of-00001.bin",
"model.layers.14.mlp.up_proj.weight": "pytorch_model-00001-of-00001.bin",
"model.layers.14.post_attention_layernorm.weight": "pytorch_model-00001-of-00001.bin",
"model.layers.14.self_attn.k_proj.weight": "pytorch_model-00001-of-00001.bin",
"model.layers.14.self_attn.o_proj.weight": "pytorch_model-00001-of-00001.bin",
"model.layers.14.self_attn.q_proj.weight": "pytorch_model-00001-of-00001.bin",
"model.layers.14.self_attn.v_proj.weight": "pytorch_model-00001-of-00001.bin",
"model.layers.15.input_layernorm.weight": "pytorch_model-00001-of-00001.bin",
"model.layers.15.mlp.down_proj.weight": "pytorch_model-00001-of-00001.bin",
"model.layers.15.mlp.gate_proj.weight": "pytorch_model-00001-of-00001.bin",
"model.layers.15.mlp.up_proj.weight": "pytorch_model-00001-of-00001.bin",
"model.layers.15.post_attention_layernorm.weight": "pytorch_model-00001-of-00001.bin",
"model.layers.15.self_attn.k_proj.weight": "pytorch_model-00001-of-00001.bin",
"model.layers.15.self_attn.o_proj.weight": "pytorch_model-00001-of-00001.bin",
"model.layers.15.self_attn.q_proj.weight": "pytorch_model-00001-of-00001.bin",
"model.layers.15.self_attn.v_proj.weight": "pytorch_model-00001-of-00001.bin",
"model.layers.16.input_layernorm.weight": "pytorch_model-00001-of-00001.bin",
"model.layers.16.mlp.down_proj.weight": "pytorch_model-00001-of-00001.bin",
"model.layers.16.mlp.gate_proj.weight": "pytorch_model-00001-of-00001.bin",
"model.layers.16.mlp.up_proj.weight": "pytorch_model-00001-of-00001.bin",
"model.layers.16.post_attention_layernorm.weight": "pytorch_model-00001-of-00001.bin",
"model.layers.16.self_attn.k_proj.weight": "pytorch_model-00001-of-00001.bin",
"model.layers.16.self_attn.o_proj.weight": "pytorch_model-00001-of-00001.bin",
"model.layers.16.self_attn.q_proj.weight": "pytorch_model-00001-of-00001.bin",
"model.layers.16.self_attn.v_proj.weight": "pytorch_model-00001-of-00001.bin",
"model.layers.17.input_layernorm.weight": "pytorch_model-00001-of-00001.bin",
"model.layers.17.mlp.down_proj.weight": "pytorch_model-00001-of-00001.bin",
"model.layers.17.mlp.gate_proj.weight": "pytorch_model-00001-of-00001.bin",
"model.layers.17.mlp.up_proj.weight": "pytorch_model-00001-of-00001.bin",
"model.layers.17.post_attention_layernorm.weight": "pytorch_model-00001-of-00001.bin",
"model.layers.17.self_attn.k_proj.weight": "pytorch_model-00001-of-00001.bin",
"model.layers.17.self_attn.o_proj.weight": "pytorch_model-00001-of-00001.bin",
"model.layers.17.self_attn.q_proj.weight": "pytorch_model-00001-of-00001.bin",
"model.layers.17.self_attn.v_proj.weight": "pytorch_model-00001-of-00001.bin",
"model.layers.18.input_layernorm.weight": "pytorch_model-00001-of-00001.bin",
"model.layers.18.mlp.down_proj.weight": "pytorch_model-00001-of-00001.bin",
"model.layers.18.mlp.gate_proj.weight": "pytorch_model-00001-of-00001.bin",
"model.layers.18.mlp.up_proj.weight": "pytorch_model-00001-of-00001.bin",
"model.layers.18.post_attention_layernorm.weight": "pytorch_model-00001-of-00001.bin",
"model.layers.18.self_attn.k_proj.weight": "pytorch_model-00001-of-00001.bin",
"model.layers.18.self_attn.o_proj.weight": "pytorch_model-00001-of-00001.bin",
"model.layers.18.self_attn.q_proj.weight": "pytorch_model-00001-of-00001.bin",
"model.layers.18.self_attn.v_proj.weight": "pytorch_model-00001-of-00001.bin",
"model.layers.19.input_layernorm.weight": "pytorch_model-00001-of-00001.bin",
"model.layers.19.mlp.down_proj.weight": "pytorch_model-00001-of-00001.bin",
"model.layers.19.mlp.gate_proj.weight": "pytorch_model-00001-of-00001.bin",
"model.layers.19.mlp.up_proj.weight": "pytorch_model-00001-of-00001.bin",
"model.layers.19.post_attention_layernorm.weight": "pytorch_model-00001-of-00001.bin",
"model.layers.19.self_attn.k_proj.weight": "pytorch_model-00001-of-00001.bin",
"model.layers.19.self_attn.o_proj.weight": "pytorch_model-00001-of-00001.bin",
"model.layers.19.self_attn.q_proj.weight": "pytorch_model-00001-of-00001.bin",
"model.layers.19.self_attn.v_proj.weight": "pytorch_model-00001-of-00001.bin",
"model.layers.2.input_layernorm.weight": "pytorch_model-00001-of-00001.bin",
"model.layers.2.mlp.down_proj.weight": "pytorch_model-00001-of-00001.bin",
"model.layers.2.mlp.gate_proj.weight": "pytorch_model-00001-of-00001.bin",
"model.layers.2.mlp.up_proj.weight": "pytorch_model-00001-of-00001.bin",
"model.layers.2.post_attention_layernorm.weight": "pytorch_model-00001-of-00001.bin",
"model.layers.2.self_attn.k_proj.weight": "pytorch_model-00001-of-00001.bin",
"model.layers.2.self_attn.o_proj.weight": "pytorch_model-00001-of-00001.bin",
"model.layers.2.self_attn.q_proj.weight": "pytorch_model-00001-of-00001.bin",
"model.layers.2.self_attn.v_proj.weight": "pytorch_model-00001-of-00001.bin",
"model.layers.20.input_layernorm.weight": "pytorch_model-00001-of-00001.bin",
"model.layers.20.mlp.down_proj.weight": "pytorch_model-00001-of-00001.bin",
"model.layers.20.mlp.gate_proj.weight": "pytorch_model-00001-of-00001.bin",
"model.layers.20.mlp.up_proj.weight": "pytorch_model-00001-of-00001.bin",
"model.layers.20.post_attention_layernorm.weight": "pytorch_model-00001-of-00001.bin",
"model.layers.20.self_attn.k_proj.weight": "pytorch_model-00001-of-00001.bin",
"model.layers.20.self_attn.o_proj.weight": "pytorch_model-00001-of-00001.bin",
"model.layers.20.self_attn.q_proj.weight": "pytorch_model-00001-of-00001.bin",
"model.layers.20.self_attn.v_proj.weight": "pytorch_model-00001-of-00001.bin",
"model.layers.21.input_layernorm.weight": "pytorch_model-00001-of-00001.bin",
"model.layers.21.mlp.down_proj.weight": "pytorch_model-00001-of-00001.bin",
"model.layers.21.mlp.gate_proj.weight": "pytorch_model-00001-of-00001.bin",
"model.layers.21.mlp.up_proj.weight": "pytorch_model-00001-of-00001.bin",
"model.layers.21.post_attention_layernorm.weight": "pytorch_model-00001-of-00001.bin",
"model.layers.21.self_attn.k_proj.weight": "pytorch_model-00001-of-00001.bin",
"model.layers.21.self_attn.o_proj.weight": "pytorch_model-00001-of-00001.bin",
"model.layers.21.self_attn.q_proj.weight": "pytorch_model-00001-of-00001.bin",
"model.layers.21.self_attn.v_proj.weight": "pytorch_model-00001-of-00001.bin",
"model.layers.22.input_layernorm.weight": "pytorch_model-00001-of-00001.bin",
"model.layers.22.mlp.down_proj.weight": "pytorch_model-00001-of-00001.bin",
"model.layers.22.mlp.gate_proj.weight": "pytorch_model-00001-of-00001.bin",
"model.layers.22.mlp.up_proj.weight": "pytorch_model-00001-of-00001.bin",
"model.layers.22.post_attention_layernorm.weight": "pytorch_model-00001-of-00001.bin",
"model.layers.22.self_attn.k_proj.weight": "pytorch_model-00001-of-00001.bin",
"model.layers.22.self_attn.o_proj.weight": "pytorch_model-00001-of-00001.bin",
"model.layers.22.self_attn.q_proj.weight": "pytorch_model-00001-of-00001.bin",
"model.layers.22.self_attn.v_proj.weight": "pytorch_model-00001-of-00001.bin",
"model.layers.23.input_layernorm.weight": "pytorch_model-00001-of-00001.bin",
"model.layers.23.mlp.down_proj.weight": "pytorch_model-00001-of-00001.bin",
"model.layers.23.mlp.gate_proj.weight": "pytorch_model-00001-of-00001.bin",
"model.layers.23.mlp.up_proj.weight": "pytorch_model-00001-of-00001.bin",
"model.layers.23.post_attention_layernorm.weight": "pytorch_model-00001-of-00001.bin",
"model.layers.23.self_attn.k_proj.weight": "pytorch_model-00001-of-00001.bin",
"model.layers.23.self_attn.o_proj.weight": "pytorch_model-00001-of-00001.bin",
"model.layers.23.self_attn.q_proj.weight": "pytorch_model-00001-of-00001.bin",
"model.layers.23.self_attn.v_proj.weight": "pytorch_model-00001-of-00001.bin",
"model.layers.24.input_layernorm.weight": "pytorch_model-00001-of-00001.bin",
"model.layers.24.mlp.down_proj.weight": "pytorch_model-00001-of-00001.bin",
"model.layers.24.mlp.gate_proj.weight": "pytorch_model-00001-of-00001.bin",
"model.layers.24.mlp.up_proj.weight": "pytorch_model-00001-of-00001.bin",
"model.layers.24.post_attention_layernorm.weight": "pytorch_model-00001-of-00001.bin",
"model.layers.24.self_attn.k_proj.weight": "pytorch_model-00001-of-00001.bin",
"model.layers.24.self_attn.o_proj.weight": "pytorch_model-00001-of-00001.bin",
"model.layers.24.self_attn.q_proj.weight": "pytorch_model-00001-of-00001.bin",
"model.layers.24.self_attn.v_proj.weight": "pytorch_model-00001-of-00001.bin",
"model.layers.25.input_layernorm.weight": "pytorch_model-00001-of-00001.bin",
"model.layers.25.mlp.down_proj.weight": "pytorch_model-00001-of-00001.bin",
"model.layers.25.mlp.gate_proj.weight": "pytorch_model-00001-of-00001.bin",
"model.layers.25.mlp.up_proj.weight": "pytorch_model-00001-of-00001.bin",
"model.layers.25.post_attention_layernorm.weight": "pytorch_model-00001-of-00001.bin",
"model.layers.25.self_attn.k_proj.weight": "pytorch_model-00001-of-00001.bin",
"model.layers.25.self_attn.o_proj.weight": "pytorch_model-00001-of-00001.bin",
"model.layers.25.self_attn.q_proj.weight": "pytorch_model-00001-of-00001.bin",
"model.layers.25.self_attn.v_proj.weight": "pytorch_model-00001-of-00001.bin",
"model.layers.26.input_layernorm.weight": "pytorch_model-00001-of-00001.bin",
"model.layers.26.mlp.down_proj.weight": "pytorch_model-00001-of-00001.bin",
"model.layers.26.mlp.gate_proj.weight": "pytorch_model-00001-of-00001.bin",
"model.layers.26.mlp.up_proj.weight": "pytorch_model-00001-of-00001.bin",
"model.layers.26.post_attention_layernorm.weight": "pytorch_model-00001-of-00001.bin",
"model.layers.26.self_attn.k_proj.weight": "pytorch_model-00001-of-00001.bin",
"model.layers.26.self_attn.o_proj.weight": "pytorch_model-00001-of-00001.bin",
"model.layers.26.self_attn.q_proj.weight": "pytorch_model-00001-of-00001.bin",
"model.layers.26.self_attn.v_proj.weight": "pytorch_model-00001-of-00001.bin",
"model.layers.27.input_layernorm.weight": "pytorch_model-00001-of-00001.bin",
"model.layers.27.mlp.down_proj.weight": "pytorch_model-00001-of-00001.bin",
"model.layers.27.mlp.gate_proj.weight": "pytorch_model-00001-of-00001.bin",
"model.layers.27.mlp.up_proj.weight": "pytorch_model-00001-of-00001.bin",
"model.layers.27.post_attention_layernorm.weight": "pytorch_model-00001-of-00001.bin",
"model.layers.27.self_attn.k_proj.weight": "pytorch_model-00001-of-00001.bin",
"model.layers.27.self_attn.o_proj.weight": "pytorch_model-00001-of-00001.bin",
"model.layers.27.self_attn.q_proj.weight": "pytorch_model-00001-of-00001.bin",
"model.layers.27.self_attn.v_proj.weight": "pytorch_model-00001-of-00001.bin",
"model.layers.28.input_layernorm.weight": "pytorch_model-00001-of-00001.bin",
"model.layers.28.mlp.down_proj.weight": "pytorch_model-00001-of-00001.bin",
"model.layers.28.mlp.gate_proj.weight": "pytorch_model-00001-of-00001.bin",
"model.layers.28.mlp.up_proj.weight": "pytorch_model-00001-of-00001.bin",
"model.layers.28.post_attention_layernorm.weight": "pytorch_model-00001-of-00001.bin",
"model.layers.28.self_attn.k_proj.weight": "pytorch_model-00001-of-00001.bin",
"model.layers.28.self_attn.o_proj.weight": "pytorch_model-00001-of-00001.bin",
"model.layers.28.self_attn.q_proj.weight": "pytorch_model-00001-of-00001.bin",
"model.layers.28.self_attn.v_proj.weight": "pytorch_model-00001-of-00001.bin",
"model.layers.29.input_layernorm.weight": "pytorch_model-00001-of-00001.bin",
"model.layers.29.mlp.down_proj.weight": "pytorch_model-00001-of-00001.bin",
"model.layers.29.mlp.gate_proj.weight": "pytorch_model-00001-of-00001.bin",
"model.layers.29.mlp.up_proj.weight": "pytorch_model-00001-of-00001.bin",
"model.layers.29.post_attention_layernorm.weight": "pytorch_model-00001-of-00001.bin",
"model.layers.29.self_attn.k_proj.weight": "pytorch_model-00001-of-00001.bin",
"model.layers.29.self_attn.o_proj.weight": "pytorch_model-00001-of-00001.bin",
"model.layers.29.self_attn.q_proj.weight": "pytorch_model-00001-of-00001.bin",
"model.layers.29.self_attn.v_proj.weight": "pytorch_model-00001-of-00001.bin",
"model.layers.3.input_layernorm.weight": "pytorch_model-00001-of-00001.bin",
"model.layers.3.mlp.down_proj.weight": "pytorch_model-00001-of-00001.bin",
"model.layers.3.mlp.gate_proj.weight": "pytorch_model-00001-of-00001.bin",
"model.layers.3.mlp.up_proj.weight": "pytorch_model-00001-of-00001.bin",
"model.layers.3.post_attention_layernorm.weight": "pytorch_model-00001-of-00001.bin",
"model.layers.3.self_attn.k_proj.weight": "pytorch_model-00001-of-00001.bin",
"model.layers.3.self_attn.o_proj.weight": "pytorch_model-00001-of-00001.bin",
"model.layers.3.self_attn.q_proj.weight": "pytorch_model-00001-of-00001.bin",
"model.layers.3.self_attn.v_proj.weight": "pytorch_model-00001-of-00001.bin",
"model.layers.30.input_layernorm.weight": "pytorch_model-00001-of-00001.bin",
"model.layers.30.mlp.down_proj.weight": "pytorch_model-00001-of-00001.bin",
"model.layers.30.mlp.gate_proj.weight": "pytorch_model-00001-of-00001.bin",
"model.layers.30.mlp.up_proj.weight": "pytorch_model-00001-of-00001.bin",
"model.layers.30.post_attention_layernorm.weight": "pytorch_model-00001-of-00001.bin",
"model.layers.30.self_attn.k_proj.weight": "pytorch_model-00001-of-00001.bin",
"model.layers.30.self_attn.o_proj.weight": "pytorch_model-00001-of-00001.bin",
"model.layers.30.self_attn.q_proj.weight": "pytorch_model-00001-of-00001.bin",
"model.layers.30.self_attn.v_proj.weight": "pytorch_model-00001-of-00001.bin",
"model.layers.31.input_layernorm.weight": "pytorch_model-00001-of-00001.bin",
"model.layers.31.mlp.down_proj.weight": "pytorch_model-00001-of-00001.bin",
"model.layers.31.mlp.gate_proj.weight": "pytorch_model-00001-of-00001.bin",
"model.layers.31.mlp.up_proj.weight": "pytorch_model-00001-of-00001.bin",
"model.layers.31.post_attention_layernorm.weight": "pytorch_model-00001-of-00001.bin",
"model.layers.31.self_attn.k_proj.weight": "pytorch_model-00001-of-00001.bin",
"model.layers.31.self_attn.o_proj.weight": "pytorch_model-00001-of-00001.bin",
"model.layers.31.self_attn.q_proj.weight": "pytorch_model-00001-of-00001.bin",
"model.layers.31.self_attn.v_proj.weight": "pytorch_model-00001-of-00001.bin",
"model.layers.4.input_layernorm.weight": "pytorch_model-00001-of-00001.bin",
"model.layers.4.mlp.down_proj.weight": "pytorch_model-00001-of-00001.bin",
"model.layers.4.mlp.gate_proj.weight": "pytorch_model-00001-of-00001.bin",
"model.layers.4.mlp.up_proj.weight": "pytorch_model-00001-of-00001.bin",
"model.layers.4.post_attention_layernorm.weight": "pytorch_model-00001-of-00001.bin",
"model.layers.4.self_attn.k_proj.weight": "pytorch_model-00001-of-00001.bin",
"model.layers.4.self_attn.o_proj.weight": "pytorch_model-00001-of-00001.bin",
"model.layers.4.self_attn.q_proj.weight": "pytorch_model-00001-of-00001.bin",
"model.layers.4.self_attn.v_proj.weight": "pytorch_model-00001-of-00001.bin",
"model.layers.5.input_layernorm.weight": "pytorch_model-00001-of-00001.bin",
"model.layers.5.mlp.down_proj.weight": "pytorch_model-00001-of-00001.bin",
"model.layers.5.mlp.gate_proj.weight": "pytorch_model-00001-of-00001.bin",
"model.layers.5.mlp.up_proj.weight": "pytorch_model-00001-of-00001.bin",
"model.layers.5.post_attention_layernorm.weight": "pytorch_model-00001-of-00001.bin",
"model.layers.5.self_attn.k_proj.weight": "pytorch_model-00001-of-00001.bin",
"model.layers.5.self_attn.o_proj.weight": "pytorch_model-00001-of-00001.bin",
"model.layers.5.self_attn.q_proj.weight": "pytorch_model-00001-of-00001.bin",
"model.layers.5.self_attn.v_proj.weight": "pytorch_model-00001-of-00001.bin",
"model.layers.6.input_layernorm.weight": "pytorch_model-00001-of-00001.bin",
"model.layers.6.mlp.down_proj.weight": "pytorch_model-00001-of-00001.bin",
"model.layers.6.mlp.gate_proj.weight": "pytorch_model-00001-of-00001.bin",
"model.layers.6.mlp.up_proj.weight": "pytorch_model-00001-of-00001.bin",
"model.layers.6.post_attention_layernorm.weight": "pytorch_model-00001-of-00001.bin",
"model.layers.6.self_attn.k_proj.weight": "pytorch_model-00001-of-00001.bin",
"model.layers.6.self_attn.o_proj.weight": "pytorch_model-00001-of-00001.bin",
"model.layers.6.self_attn.q_proj.weight": "pytorch_model-00001-of-00001.bin",
"model.layers.6.self_attn.v_proj.weight": "pytorch_model-00001-of-00001.bin",
"model.layers.7.input_layernorm.weight": "pytorch_model-00001-of-00001.bin",
"model.layers.7.mlp.down_proj.weight": "pytorch_model-00001-of-00001.bin",
"model.layers.7.mlp.gate_proj.weight": "pytorch_model-00001-of-00001.bin",
"model.layers.7.mlp.up_proj.weight": "pytorch_model-00001-of-00001.bin",
"model.layers.7.post_attention_layernorm.weight": "pytorch_model-00001-of-00001.bin",
"model.layers.7.self_attn.k_proj.weight": "pytorch_model-00001-of-00001.bin",
"model.layers.7.self_attn.o_proj.weight": "pytorch_model-00001-of-00001.bin",
"model.layers.7.self_attn.q_proj.weight": "pytorch_model-00001-of-00001.bin",
"model.layers.7.self_attn.v_proj.weight": "pytorch_model-00001-of-00001.bin",
"model.layers.8.input_layernorm.weight": "pytorch_model-00001-of-00001.bin",
"model.layers.8.mlp.down_proj.weight": "pytorch_model-00001-of-00001.bin",
"model.layers.8.mlp.gate_proj.weight": "pytorch_model-00001-of-00001.bin",
"model.layers.8.mlp.up_proj.weight": "pytorch_model-00001-of-00001.bin",
"model.layers.8.post_attention_layernorm.weight": "pytorch_model-00001-of-00001.bin",
"model.layers.8.self_attn.k_proj.weight": "pytorch_model-00001-of-00001.bin",
"model.layers.8.self_attn.o_proj.weight": "pytorch_model-00001-of-00001.bin",
"model.layers.8.self_attn.q_proj.weight": "pytorch_model-00001-of-00001.bin",
"model.layers.8.self_attn.v_proj.weight": "pytorch_model-00001-of-00001.bin",
"model.layers.9.input_layernorm.weight": "pytorch_model-00001-of-00001.bin",
"model.layers.9.mlp.down_proj.weight": "pytorch_model-00001-of-00001.bin",
"model.layers.9.mlp.gate_proj.weight": "pytorch_model-00001-of-00001.bin",
"model.layers.9.mlp.up_proj.weight": "pytorch_model-00001-of-00001.bin",
"model.layers.9.post_attention_layernorm.weight": "pytorch_model-00001-of-00001.bin",
"model.layers.9.self_attn.k_proj.weight": "pytorch_model-00001-of-00001.bin",
"model.layers.9.self_attn.o_proj.weight": "pytorch_model-00001-of-00001.bin",
"model.layers.9.self_attn.q_proj.weight": "pytorch_model-00001-of-00001.bin",
"model.layers.9.self_attn.v_proj.weight": "pytorch_model-00001-of-00001.bin",
"model.norm.weight": "pytorch_model-00001-of-00001.bin"
}
}

1
special_tokens_map.json Normal file

@@ -0,0 +1 @@
{}

110789
tokenizer.json Normal file

File diff suppressed because it is too large

BIN
tokenizer.model (Stored with Git LFS) Normal file

Binary file not shown.

40
tokenizer_config.json Normal file

@@ -0,0 +1,40 @@
{
"added_tokens_decoder": {
"0": {
"content": "<unk>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"1": {
"content": "<s>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"2": {
"content": "</s>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
}
},
"additional_special_tokens": [],
"bos_token": "<s>",
"clean_up_tokenization_spaces": false,
"eos_token": "</s>",
"legacy": true,
"model_max_length": 1000000000000000019884624838656,
"pad_token": null,
"sp_model_kwargs": {},
"spaces_between_special_tokens": false,
"tokenizer_class": "PreTrainedTokenizerFast",
"unk_token": "<unk>",
"use_default_system_prompt": false
}