first commit

This commit is contained in:
xxl 2025-03-03 14:38:15 +08:00
parent 07b8382957
commit cb9fab3c2b
15 changed files with 4108 additions and 2 deletions

1_Pooling/config.json

@@ -0,0 +1,10 @@
{
"word_embedding_dimension": 2304,
"pooling_mode_cls_token": false,
"pooling_mode_mean_tokens": false,
"pooling_mode_max_tokens": false,
"pooling_mode_mean_sqrt_len_tokens": false,
"pooling_mode_weightedmean_tokens": false,
"pooling_mode_lasttoken": true,
"include_prompt": true
}
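The config above enables only last-token pooling (`"pooling_mode_lasttoken": true`): the sentence embedding is the hidden state of the final non-padding token, which suits decoder-style models like Gemma. A minimal numpy sketch of that pooling, using toy tensors and assuming right padding:

```python
import numpy as np

def last_token_pool(hidden_states, attention_mask):
    """Return the hidden state of each sequence's final non-padding token.

    hidden_states: (batch, seq_len, dim); attention_mask: (batch, seq_len) of 0/1.
    Assumes right padding; mirrors "pooling_mode_lasttoken": true above.
    """
    last_idx = attention_mask.sum(axis=1) - 1  # position of the last real token
    return hidden_states[np.arange(hidden_states.shape[0]), last_idx]

# toy batch: 2 sequences of 4 positions, dim 2; second sequence has 2 pad slots
h = np.arange(16, dtype=float).reshape(2, 4, 2)
m = np.array([[1, 1, 1, 1], [1, 1, 0, 0]])
print(last_token_pool(h, m))  # rows: [6., 7.] and [10., 11.]
```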

README.md

@@ -1,3 +1,123 @@
---
license: cc-by-nc-4.0
pipeline_tag: feature-extraction
tags:
- transformers
- sentence-transformers
- code
- retrieval
---
<h1 align="center">Salesforce/SFR-Embedding-Code-2B_R</h1>
**SFR-Embedding by Salesforce Research.**
Salesforce/SFR-Embedding-Code is a generalist embedding model family for multilingual and multi-task code and text retrieval. It demonstrates superior performance compared to various open-source code embedding models across multiple code retrieval tasks.
Check out our [paper](https://arxiv.org/abs/2411.12644) for more details!
We also offer a 400M-parameter model: [Salesforce/SFR-Embedding-Code-400M_R](https://huggingface.co/Salesforce/SFR-Embedding-Code-400M_R)
### Ethical Considerations
This release is for research purposes only in support of an academic paper. Our models, datasets, and code are not specifically designed or evaluated for all downstream purposes. We strongly recommend users evaluate and address potential concerns related to accuracy, safety, and fairness before deploying this model. We encourage users to consider the common limitations of AI, comply with applicable laws, and leverage best practices when selecting use cases, particularly for high-risk scenarios where errors or misuse could significantly impact people's lives, rights, or safety. For further guidance on use cases, refer to our [AUP](https://www.salesforce.com/content/dam/web/en_us/www/documents/legal/Agreements/policies/ExternalFacing_Services_Policy.pdf) and [AI AUP](https://www.salesforce.com/content/dam/web/en_us/www/documents/legal/Agreements/policies/ai-acceptable-use-policy.pdf).
### License Statement:
Users need to make their own assessment regarding any obligations or responsibilities under the corresponding licenses or terms and conditions pertaining to the original datasets and data. This release is for research purposes only in support of an academic paper.
This released model is a fine-tuned version of Gemma and Gemma is provided under and subject to the Gemma Terms of Use found at ai.google.dev/gemma/terms. Additionally, the use of this model is restricted as set forth in the Gemma Prohibited Use Policy at ai.google.dev/gemma/prohibited_use_policy ("Prohibited Use Policy"), which is hereby incorporated by reference into this Agreement.
### Performance on CoIR Benchmark
| Model | Model Size | CoIR AVG (NDCG@10) |
|-----------------------|------------|---------------------|
| **SFR-Embedding-Code** | 2B | 67.4 |
| CodeSage-Large-v2 | 1.3B | 64.2 |
| CodeSage-Large | 1.3B | 61.0 |
| **SFR-Embedding-Code** | 400M | 61.9 |
| CodeRankEmbed | 137M | 60.1 |
| CodeSage-Base | 356M | 57.5 |
| Voyage-Code-002 | - | 56.3 |
| CodeSage-Small | 130M | 54.4 |
### SFR-Embedding Team († indicates co-leaders)
* Ye Liu
* Rui Meng
* Shafiq Rayhan Joty
* Silvio Savarese
* Caiming Xiong †
* Yingbo Zhou †
* Semih Yavuz †
## How to run
#### Transformers
```python
import torch.nn.functional as F
from transformers import AutoModel
# Each query needs to be accompanied by a corresponding instruction describing the task.
query_instruction_example = "Given Code or Text, retrieval relevant content"
queries = [
"how to implement quick sort in Python?"
]
# No instruction needed for retrieval passages
passages = [
"def quick_sort(arr):\n if len(arr) <= 1:\n return arr\n pivot = arr[len(arr) // 2]\n left = [x for x in arr if x < pivot]\n middle = [x for x in arr if x == pivot]\n right = [x for x in arr if x > pivot]\n return quick_sort(left) + middle + quick_sort(right)",
"def bubble_sort(arr):\n n = len(arr)\n for i in range(n):\n for j in range(0, n-i-1):\n if arr[j] > arr[j+1]:\n arr[j], arr[j+1] = arr[j+1], arr[j]\n return arr"
]
# load the model; trust_remote_code pulls in the custom encode_queries / encode_corpus methods
model = AutoModel.from_pretrained('Salesforce/SFR-Embedding-Code-2B_R', trust_remote_code=True)
# get the embeddings
max_length = 32768
query_embeddings = model.encode_queries(queries, instruction=query_instruction_example, max_length=max_length)
passage_embeddings = model.encode_corpus(passages, max_length=max_length)
# normalize embeddings
query_embeddings = F.normalize(query_embeddings, p=2, dim=1)
passage_embeddings = F.normalize(passage_embeddings, p=2, dim=1)
scores = (query_embeddings @ passage_embeddings.T) * 100
print(scores.tolist())
# [[69.26929473876953, 58.41606903076172]]
```
#### Sentence Transformers
```python
from sentence_transformers import SentenceTransformer
# Each query needs to be accompanied by a corresponding instruction describing the task.
query_instruction_example = "Instruct: Given Code or Text, retrieval relevant content\nQuery: "
queries = ["how to implement quick sort in Python?"]
# No instruction needed for retrieval passages
passages = [
"def quick_sort(arr):\n if len(arr) <= 1:\n return arr\n pivot = arr[len(arr) // 2]\n left = [x for x in arr if x < pivot]\n middle = [x for x in arr if x == pivot]\n right = [x for x in arr if x > pivot]\n return quick_sort(left) + middle + quick_sort(right)",
"def bubble_sort(arr):\n n = len(arr)\n for i in range(n):\n for j in range(0, n-i-1):\n if arr[j] > arr[j+1]:\n arr[j], arr[j+1] = arr[j+1], arr[j]\n return arr"
]
# Load the Sentence Transformer model, including pooling
model = SentenceTransformer('Salesforce/SFR-Embedding-Code-2B_R', trust_remote_code=True)
# Compute the embeddings for both queries and passages. Use 'prompt' for queries only
query_embeddings = model.encode(queries, prompt=query_instruction_example)
passage_embeddings = model.encode(passages)
# Compute the similarities between the queries and passages
similarities = model.similarity(query_embeddings, passage_embeddings)
print(similarities)
# tensor([[0.6927, 0.5842]])
```
### Citation
```bibtex
@article{liu2024codexembed,
title={CodeXEmbed: A Generalist Embedding Model Family for Multilingual and Multi-task Code Retrieval},
author={Liu, Ye and Meng, Rui and Joty, Shafiq Rayhan and Savarese, Silvio and Xiong, Caiming and Zhou, Yingbo and Yavuz, Semih},
journal={arXiv preprint arXiv:2411.12644},
year={2024}
}
```

config.json

@@ -0,0 +1,40 @@
{
"_name_or_path": "Salesforce/SFR-Embedding-Code-2B_R",
"architectures": [
"CodeXEmbedModel2B"
],
"auto_map": {
"AutoConfig": "configuration_gemma2.CodeXEmbedConfig",
"AutoModel": "modeling_gemma2.CodeXEmbedModel2B"
},
"attention_bias": false,
"attention_dropout": 0.0,
"attn_logit_softcapping": 50.0,
"bos_token_id": 2,
"cache_implementation": "hybrid",
"eos_token_id": [
1,
107
],
"final_logit_softcapping": 30.0,
"head_dim": 256,
"hidden_act": "gelu_pytorch_tanh",
"hidden_activation": "gelu_pytorch_tanh",
"hidden_size": 2304,
"initializer_range": 0.02,
"intermediate_size": 9216,
"max_position_embeddings": 8192,
"model_type": "codexembed2b",
"num_attention_heads": 8,
"num_hidden_layers": 26,
"num_key_value_heads": 4,
"pad_token_id": 0,
"query_pre_attn_scalar": 256,
"rms_norm_eps": 1e-06,
"rope_theta": 10000.0,
"sliding_window": 4096,
"torch_dtype": "bfloat16",
"transformers_version": "4.45.1",
"use_cache": true,
"vocab_size": 256000
}
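The attention fields in this config fix the model's head geometry. A quick plain-Python sanity check, with the values copied verbatim from the config above, shows the grouped-query layout:

```python
# Attention geometry implied by the config above (values copied verbatim).
hidden_size = 2304
num_attention_heads = 8
num_key_value_heads = 4
head_dim = 256

q_width = num_attention_heads * head_dim        # width of the q_proj output
kv_width = num_key_value_heads * head_dim       # width of k_proj / v_proj outputs
group_size = num_attention_heads // num_key_value_heads  # query heads per KV head (GQA)

print(q_width, kv_width, group_size)  # 2048 1024 2
# Note q_width (2048) != hidden_size (2304): Gemma 2 fixes head_dim at 256
# rather than hidden_size / num_heads, so o_proj maps 2048 back to 2304.
```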

@@ -0,0 +1,10 @@
{
"__version__": {
"sentence_transformers": "3.0.1",
"transformers": "4.41.2",
"pytorch": "2.3.0+cu121"
},
"prompts": {},
"default_prompt_name": null,
"similarity_fn_name": "cosine"
}
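Here `"similarity_fn_name": "cosine"` means query/passage scores are cosine similarities, i.e. dot products of L2-normalized vectors. A minimal numpy sketch with toy vectors (not real model embeddings):

```python
import numpy as np

def cosine_similarity(a, b):
    # "similarity_fn_name": "cosine" above: dot product of L2-normalized rows
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a @ b.T

# toy vectors: q is parallel to p[0] and orthogonal to p[1]
q = np.array([[3.0, 4.0]])
p = np.array([[3.0, 4.0], [-4.0, 3.0]])
print(cosine_similarity(q, p))  # cosine 1.0 with the parallel row, 0.0 with the orthogonal one
```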

configuration_gemma2.py

@@ -0,0 +1,156 @@
# 🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨
# This file was automatically generated from <path_to_diff_file.py>.
# Do NOT edit this file manually as any edits will be overwritten by the generation of
# the file from the diff. If any change should be done, please apply the change to the
# diff.py file directly.
# 🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨
# coding=utf-8
# Copyright 2024 Google Inc. HuggingFace Inc. team. All rights reserved.
#
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from transformers import PretrainedConfig
class CodeXEmbedConfig(PretrainedConfig):
r"""
This is the configuration class to store the configuration of a [`Gemma2Model`]. It is used to instantiate a Gemma2
model according to the specified arguments, defining the model architecture. Instantiating a configuration with the
defaults will yield a configuration similar to that of the Gemma2-7B.
e.g. [google/gemma2-7b](https://huggingface.co/google/gemma2-7b)
Configuration objects inherit from [`PretrainedConfig`] and can be used to control the model outputs. Read the
documentation from [`PretrainedConfig`] for more information.
Args:
vocab_size (`int`, *optional*, defaults to 256000):
Vocabulary size of the Gemma2 model. Defines the number of different tokens that can be represented by the
`inputs_ids` passed when calling [`Gemma2Model`]
hidden_size (`int`, *optional*, defaults to 3072):
Dimension of the hidden representations.
intermediate_size (`int`, *optional*, defaults to 24576):
Dimension of the MLP representations.
num_hidden_layers (`int`, *optional*, defaults to 28):
Number of hidden layers in the Transformer decoder.
num_attention_heads (`int`, *optional*, defaults to 16):
Number of attention heads for each attention layer in the Transformer decoder.
num_key_value_heads (`int`, *optional*, defaults to 16):
This is the number of key_value heads that should be used to implement Grouped Query Attention. If
`num_key_value_heads=num_attention_heads`, the model will use Multi Head Attention (MHA), if
`num_key_value_heads=1` the model will use Multi Query Attention (MQA) otherwise GQA is used. When
converting a multi-head checkpoint to a GQA checkpoint, each group key and value head should be constructed
by mean-pooling all the original heads within that group. For more details, check out [this
paper](https://arxiv.org/pdf/2305.13245.pdf). If it is not specified, will default to
`num_attention_heads`.
head_dim (`int`, *optional*, defaults to 256):
The attention head dimension.
hidden_activation (`str` or `function`, *optional*, defaults to `"gelu_pytorch_tanh"`):
The non-linear activation function (function or string) in the decoder.
max_position_embeddings (`int`, *optional*, defaults to 8192):
The maximum sequence length that this model might ever be used with.
initializer_range (`float`, *optional*, defaults to 0.02):
The standard deviation of the truncated_normal_initializer for initializing all weight matrices.
rms_norm_eps (`float`, *optional*, defaults to 1e-06):
The epsilon used by the rms normalization layers.
use_cache (`bool`, *optional*, defaults to `True`):
Whether or not the model should return the last key/values attentions (not used by all models). Only
relevant if `config.is_decoder=True`.
pad_token_id (`int`, *optional*, defaults to 0):
Padding token id.
eos_token_id (`int`, *optional*, defaults to 1):
End of stream token id.
bos_token_id (`int`, *optional*, defaults to 2):
Beginning of stream token id.
tie_word_embeddings (`bool`, *optional*, defaults to `True`):
Whether to tie weight embeddings
rope_theta (`float`, *optional*, defaults to 10000.0):
The base period of the RoPE embeddings.
attention_bias (`bool`, *optional*, defaults to `False`):
Whether to use a bias in the query, key, value and output projection layers during self-attention.
attention_dropout (`float`, *optional*, defaults to 0.0):
The dropout ratio for the attention probabilities.
final_logit_softcapping (`float`, *optional*, defaults to 30.0): scaling factor when applying tanh softcapping on the logits.
attn_logit_softcapping (`float`, *optional*, defaults to 50.0): scaling factor when applying tanh softcapping on the attention scores.
query_pre_attn_scalar (`float`, *optional*, defaults to 224): scaling factor used on the attention scores
sliding_window (`int`, *optional*, defaults to 4096): in Gemma2, every other layer uses sliding window attention. This is the
size of the sliding window.
```python
>>> from transformers import Gemma2Model, CodeXEmbedConfig
>>> # Initializing a Gemma2 gemma2-9b style configuration
>>> configuration = CodeXEmbedConfig()
>>> # Initializing a model from the gemma2-9b style configuration
>>> model = Gemma2Model(configuration)
>>> # Accessing the model configuration
>>> configuration = model.config
```"""
model_type = "codexembed2b"
keys_to_ignore_at_inference = ["past_key_values"]
def __init__(
self,
vocab_size=256000,
hidden_size=3072,
intermediate_size=24576,
num_hidden_layers=28,
num_attention_heads=16,
num_key_value_heads=16,
head_dim=256,
hidden_activation="gelu_pytorch_tanh",
max_position_embeddings=8192,
initializer_range=0.02,
rms_norm_eps=1e-6,
use_cache=True,
pad_token_id=0,
eos_token_id=1,
bos_token_id=2,
tie_word_embeddings=True,
rope_theta=10000.0,
attention_bias=False,
attention_dropout=0.0,
final_logit_softcapping=30.0,
attn_logit_softcapping=50.0,
query_pre_attn_scalar=224,
sliding_window=4096,
**kwargs,
):
self.vocab_size = vocab_size
self.max_position_embeddings = max_position_embeddings
self.hidden_size = hidden_size
self.intermediate_size = intermediate_size
self.num_hidden_layers = num_hidden_layers
self.num_attention_heads = num_attention_heads
self.head_dim = head_dim
self.num_key_value_heads = num_key_value_heads
self.hidden_activation = hidden_activation
self.initializer_range = initializer_range
self.rms_norm_eps = rms_norm_eps
self.use_cache = use_cache
self.rope_theta = rope_theta
self.attention_bias = attention_bias
self.attention_dropout = attention_dropout
self.attn_logit_softcapping = attn_logit_softcapping
super().__init__(
pad_token_id=pad_token_id,
bos_token_id=bos_token_id,
eos_token_id=eos_token_id,
tie_word_embeddings=tie_word_embeddings,
**kwargs,
)
self.final_logit_softcapping = final_logit_softcapping
self.query_pre_attn_scalar = query_pre_attn_scalar
self.sliding_window = sliding_window
self.cache_implementation = "hybrid"
MODEL_TYPE = "codexembed2b"
from transformers import AutoConfig
AutoConfig.register(MODEL_TYPE, CodeXEmbedConfig)

model-00001-of-00002.safetensors (Stored with Git LFS)

Binary file not shown.

model-00002-of-00002.safetensors (Stored with Git LFS)

Binary file not shown.

@@ -0,0 +1,295 @@
{
"metadata": {
"total_size": 5228683776
},
"weight_map": {
"embed_tokens.weight": "model-00001-of-00002.safetensors",
"layers.0.input_layernorm.weight": "model-00001-of-00002.safetensors",
"layers.0.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
"layers.0.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
"layers.0.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
"layers.0.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
"layers.0.post_feedforward_layernorm.weight": "model-00001-of-00002.safetensors",
"layers.0.pre_feedforward_layernorm.weight": "model-00001-of-00002.safetensors",
"layers.0.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
"layers.0.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
"layers.0.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
"layers.0.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
"layers.1.input_layernorm.weight": "model-00001-of-00002.safetensors",
"layers.1.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
"layers.1.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
"layers.1.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
"layers.1.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
"layers.1.post_feedforward_layernorm.weight": "model-00001-of-00002.safetensors",
"layers.1.pre_feedforward_layernorm.weight": "model-00001-of-00002.safetensors",
"layers.1.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
"layers.1.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
"layers.1.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
"layers.1.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
"layers.10.input_layernorm.weight": "model-00001-of-00002.safetensors",
"layers.10.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
"layers.10.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
"layers.10.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
"layers.10.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
"layers.10.post_feedforward_layernorm.weight": "model-00001-of-00002.safetensors",
"layers.10.pre_feedforward_layernorm.weight": "model-00001-of-00002.safetensors",
"layers.10.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
"layers.10.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
"layers.10.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
"layers.10.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
"layers.11.input_layernorm.weight": "model-00001-of-00002.safetensors",
"layers.11.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
"layers.11.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
"layers.11.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
"layers.11.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
"layers.11.post_feedforward_layernorm.weight": "model-00001-of-00002.safetensors",
"layers.11.pre_feedforward_layernorm.weight": "model-00001-of-00002.safetensors",
"layers.11.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
"layers.11.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
"layers.11.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
"layers.11.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
"layers.12.input_layernorm.weight": "model-00001-of-00002.safetensors",
"layers.12.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
"layers.12.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
"layers.12.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
"layers.12.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
"layers.12.post_feedforward_layernorm.weight": "model-00001-of-00002.safetensors",
"layers.12.pre_feedforward_layernorm.weight": "model-00001-of-00002.safetensors",
"layers.12.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
"layers.12.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
"layers.12.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
"layers.12.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
"layers.13.input_layernorm.weight": "model-00001-of-00002.safetensors",
"layers.13.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
"layers.13.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
"layers.13.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
"layers.13.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
"layers.13.post_feedforward_layernorm.weight": "model-00001-of-00002.safetensors",
"layers.13.pre_feedforward_layernorm.weight": "model-00001-of-00002.safetensors",
"layers.13.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
"layers.13.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
"layers.13.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
"layers.13.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
"layers.14.input_layernorm.weight": "model-00001-of-00002.safetensors",
"layers.14.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
"layers.14.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
"layers.14.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
"layers.14.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
"layers.14.post_feedforward_layernorm.weight": "model-00001-of-00002.safetensors",
"layers.14.pre_feedforward_layernorm.weight": "model-00001-of-00002.safetensors",
"layers.14.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
"layers.14.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
"layers.14.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
"layers.14.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
"layers.15.input_layernorm.weight": "model-00001-of-00002.safetensors",
"layers.15.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
"layers.15.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
"layers.15.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
"layers.15.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
"layers.15.post_feedforward_layernorm.weight": "model-00001-of-00002.safetensors",
"layers.15.pre_feedforward_layernorm.weight": "model-00001-of-00002.safetensors",
"layers.15.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
"layers.15.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
"layers.15.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
"layers.15.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
"layers.16.input_layernorm.weight": "model-00001-of-00002.safetensors",
"layers.16.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
"layers.16.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
"layers.16.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
"layers.16.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
"layers.16.post_feedforward_layernorm.weight": "model-00001-of-00002.safetensors",
"layers.16.pre_feedforward_layernorm.weight": "model-00001-of-00002.safetensors",
"layers.16.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
"layers.16.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
"layers.16.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
"layers.16.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
"layers.17.input_layernorm.weight": "model-00001-of-00002.safetensors",
"layers.17.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
"layers.17.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
"layers.17.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
"layers.17.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
"layers.17.post_feedforward_layernorm.weight": "model-00001-of-00002.safetensors",
"layers.17.pre_feedforward_layernorm.weight": "model-00001-of-00002.safetensors",
"layers.17.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
"layers.17.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
"layers.17.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
"layers.17.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
"layers.18.input_layernorm.weight": "model-00001-of-00002.safetensors",
"layers.18.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
"layers.18.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
"layers.18.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
"layers.18.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
"layers.18.post_feedforward_layernorm.weight": "model-00001-of-00002.safetensors",
"layers.18.pre_feedforward_layernorm.weight": "model-00001-of-00002.safetensors",
"layers.18.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
"layers.18.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
"layers.18.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
"layers.18.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
"layers.19.input_layernorm.weight": "model-00001-of-00002.safetensors",
"layers.19.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
"layers.19.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
"layers.19.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
"layers.19.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
"layers.19.post_feedforward_layernorm.weight": "model-00001-of-00002.safetensors",
"layers.19.pre_feedforward_layernorm.weight": "model-00001-of-00002.safetensors",
"layers.19.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
"layers.19.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
"layers.19.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
"layers.19.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
"layers.2.input_layernorm.weight": "model-00001-of-00002.safetensors",
"layers.2.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
"layers.2.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
"layers.2.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
"layers.2.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
"layers.2.post_feedforward_layernorm.weight": "model-00001-of-00002.safetensors",
"layers.2.pre_feedforward_layernorm.weight": "model-00001-of-00002.safetensors",
"layers.2.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
"layers.2.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
"layers.2.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
"layers.2.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
"layers.20.input_layernorm.weight": "model-00001-of-00002.safetensors",
"layers.20.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
"layers.20.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
"layers.20.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
"layers.20.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
"layers.20.post_feedforward_layernorm.weight": "model-00001-of-00002.safetensors",
"layers.20.pre_feedforward_layernorm.weight": "model-00001-of-00002.safetensors",
"layers.20.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
"layers.20.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
"layers.20.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
"layers.20.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
"layers.21.input_layernorm.weight": "model-00001-of-00002.safetensors",
"layers.21.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
"layers.21.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
"layers.21.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
"layers.21.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
"layers.21.post_feedforward_layernorm.weight": "model-00001-of-00002.safetensors",
"layers.21.pre_feedforward_layernorm.weight": "model-00001-of-00002.safetensors",
"layers.21.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
"layers.21.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
"layers.21.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
"layers.21.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
"layers.22.input_layernorm.weight": "model-00001-of-00002.safetensors",
"layers.22.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
"layers.22.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
"layers.22.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
"layers.22.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
"layers.22.post_feedforward_layernorm.weight": "model-00001-of-00002.safetensors",
"layers.22.pre_feedforward_layernorm.weight": "model-00001-of-00002.safetensors",
"layers.22.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
"layers.22.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
"layers.22.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
"layers.22.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
"layers.23.input_layernorm.weight": "model-00001-of-00002.safetensors",
"layers.23.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
"layers.23.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
"layers.23.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
"layers.23.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
"layers.23.post_feedforward_layernorm.weight": "model-00001-of-00002.safetensors",
"layers.23.pre_feedforward_layernorm.weight": "model-00001-of-00002.safetensors",
"layers.23.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
"layers.23.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
"layers.23.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
"layers.23.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
"layers.24.input_layernorm.weight": "model-00002-of-00002.safetensors",
"layers.24.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
"layers.24.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
"layers.24.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
"layers.24.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
"layers.24.post_feedforward_layernorm.weight": "model-00002-of-00002.safetensors",
"layers.24.pre_feedforward_layernorm.weight": "model-00002-of-00002.safetensors",
"layers.24.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
"layers.24.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
"layers.24.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
"layers.24.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
"layers.25.input_layernorm.weight": "model-00002-of-00002.safetensors",
"layers.25.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
"layers.25.mlp.gate_proj.weight": "model-00002-of-00002.safetensors",
"layers.25.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
"layers.25.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
"layers.25.post_feedforward_layernorm.weight": "model-00002-of-00002.safetensors",
"layers.25.pre_feedforward_layernorm.weight": "model-00002-of-00002.safetensors",
"layers.25.self_attn.k_proj.weight": "model-00002-of-00002.safetensors",
"layers.25.self_attn.o_proj.weight": "model-00002-of-00002.safetensors",
"layers.25.self_attn.q_proj.weight": "model-00002-of-00002.safetensors",
"layers.25.self_attn.v_proj.weight": "model-00002-of-00002.safetensors",
"layers.3.input_layernorm.weight": "model-00001-of-00002.safetensors",
"layers.3.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
"layers.3.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
"layers.3.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
"layers.3.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
"layers.3.post_feedforward_layernorm.weight": "model-00001-of-00002.safetensors",
"layers.3.pre_feedforward_layernorm.weight": "model-00001-of-00002.safetensors",
"layers.3.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
"layers.3.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
"layers.3.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
"layers.3.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
"layers.4.input_layernorm.weight": "model-00001-of-00002.safetensors",
"layers.4.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
"layers.4.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
"layers.4.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
"layers.4.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
"layers.4.post_feedforward_layernorm.weight": "model-00001-of-00002.safetensors",
"layers.4.pre_feedforward_layernorm.weight": "model-00001-of-00002.safetensors",
"layers.4.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
"layers.4.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
"layers.4.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
"layers.4.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
"layers.5.input_layernorm.weight": "model-00001-of-00002.safetensors",
"layers.5.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
"layers.5.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
"layers.5.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
"layers.5.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
"layers.5.post_feedforward_layernorm.weight": "model-00001-of-00002.safetensors",
"layers.5.pre_feedforward_layernorm.weight": "model-00001-of-00002.safetensors",
"layers.5.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
"layers.5.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
"layers.5.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
"layers.5.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
"layers.6.input_layernorm.weight": "model-00001-of-00002.safetensors",
"layers.6.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
"layers.6.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
"layers.6.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
"layers.6.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
"layers.6.post_feedforward_layernorm.weight": "model-00001-of-00002.safetensors",
"layers.6.pre_feedforward_layernorm.weight": "model-00001-of-00002.safetensors",
"layers.6.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
"layers.6.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
"layers.6.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
"layers.6.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
"layers.7.input_layernorm.weight": "model-00001-of-00002.safetensors",
"layers.7.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
"layers.7.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
"layers.7.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
"layers.7.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
"layers.7.post_feedforward_layernorm.weight": "model-00001-of-00002.safetensors",
"layers.7.pre_feedforward_layernorm.weight": "model-00001-of-00002.safetensors",
"layers.7.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
"layers.7.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
"layers.7.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
"layers.7.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
"layers.8.input_layernorm.weight": "model-00001-of-00002.safetensors",
"layers.8.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
"layers.8.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
"layers.8.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
"layers.8.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
"layers.8.post_feedforward_layernorm.weight": "model-00001-of-00002.safetensors",
"layers.8.pre_feedforward_layernorm.weight": "model-00001-of-00002.safetensors",
"layers.8.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
"layers.8.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
"layers.8.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
"layers.8.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
"layers.9.input_layernorm.weight": "model-00001-of-00002.safetensors",
"layers.9.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
"layers.9.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
"layers.9.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
"layers.9.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
"layers.9.post_feedforward_layernorm.weight": "model-00001-of-00002.safetensors",
"layers.9.pre_feedforward_layernorm.weight": "model-00001-of-00002.safetensors",
"layers.9.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
"layers.9.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
"layers.9.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
"layers.9.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
"norm.weight": "model-00002-of-00002.safetensors"
}
}
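The `weight_map` above is the routing table the safetensors index uses to locate each tensor across the two shard files. A minimal sketch of how a loader might group tensor names per shard so each shard is opened only once (the shard names and the few entries below are excerpted from the listing above; the real index maps every tensor):

```python
import json
from collections import defaultdict

# A small excerpt of the model.safetensors.index.json shown above.
index = json.loads("""
{
  "weight_map": {
    "layers.9.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
    "layers.25.self_attn.q_proj.weight": "model-00002-of-00002.safetensors",
    "norm.weight": "model-00002-of-00002.safetensors"
  }
}
""")

# Group tensor names by the shard file that stores them, so a loader
# can open each shard once and read all of its tensors together.
shards = defaultdict(list)
for tensor_name, shard_file in index["weight_map"].items():
    shards[shard_file].append(tensor_name)

for shard_file, tensors in sorted(shards.items()):
    print(shard_file, len(tensors))
```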

1398
modeling_gemma2.py Normal file

File diff suppressed because it is too large

14
modules.json Normal file

@@ -0,0 +1,14 @@
[
{
"idx": 0,
"name": "0",
"path": "",
"type": "sentence_transformers.models.Transformer"
},
{
"idx": 1,
"name": "1",
"path": "1_Pooling",
"type": "sentence_transformers.models.Pooling"
}
]
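The two modules above run in `idx` order: the Transformer (idx 0) produces per-token hidden states, and the Pooling module (idx 1, configured in `1_Pooling/config.json` with `pooling_mode_lasttoken: true`) reduces them to a single sentence embedding by taking the hidden state of the last real token. A minimal numpy sketch of last-token pooling, assuming right-padded batches (the actual sentence-transformers Pooling module also handles left padding):

```python
import numpy as np

def last_token_pool(hidden_states: np.ndarray, attention_mask: np.ndarray) -> np.ndarray:
    """hidden_states: (batch, seq_len, dim); attention_mask: (batch, seq_len) of 0/1.

    Returns the hidden state at the last non-padding position of each sequence.
    """
    # Index of the last real (non-padding) token in each right-padded sequence.
    last_idx = attention_mask.sum(axis=1) - 1
    return hidden_states[np.arange(hidden_states.shape[0]), last_idx]

# Toy batch: 2 sequences, seq_len 3, embedding dim 2 (the real model uses dim 2304).
hidden = np.arange(12, dtype=float).reshape(2, 3, 2)
mask = np.array([[1, 1, 1], [1, 1, 0]])  # second sequence ends in one pad token
pooled = last_token_pool(hidden, mask)   # shape (2, 2)
```

This is why the pooling config enables only `pooling_mode_lasttoken`: for a decoder-style backbone like Gemma2, the final token's state has attended to the whole sequence.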

4
sentence_bert_config.json Normal file

@@ -0,0 +1,4 @@
{
"max_seq_length": 4096,
"do_lower_case": false
}

34
special_tokens_map.json Normal file

@@ -0,0 +1,34 @@
{
"additional_special_tokens": [
"<start_of_turn>",
"<end_of_turn>"
],
"bos_token": {
"content": "<bos>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false
},
"eos_token": {
"content": "<eos>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false
},
"pad_token": {
"content": "<pad>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false
},
"unk_token": {
"content": "<unk>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false
}
}
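The map above tells the tokenizer which marker strings play each role; the pad token in particular is the one last-token pooling must skip past in padded batches. A toy, string-level illustration of reading the map and building an attention mask (the real tokenizer operates on token ids, not strings):

```python
import json

# Excerpt of special_tokens_map.json above, reduced to the content strings.
special_tokens = json.loads("""
{
  "bos_token": {"content": "<bos>"},
  "eos_token": {"content": "<eos>"},
  "pad_token": {"content": "<pad>"},
  "unk_token": {"content": "<unk>"}
}
""")
contents = {name: spec["content"] for name, spec in special_tokens.items()}

# A right-padded toy batch built with these markers.
batch = [
    ["<bos>", "def", "foo", "<eos>"],
    ["<bos>", "x", "<eos>", "<pad>"],
]
# Attention mask: 1 for real tokens, 0 for padding.
mask = [[0 if tok == contents["pad_token"] else 1 for tok in seq]
        for seq in batch]
```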

BIN
tokenizer.json (Stored with Git LFS) Normal file

Binary file not shown.

BIN
tokenizer.model (Stored with Git LFS) Normal file

Binary file not shown.

2013
tokenizer_config.json Normal file

File diff suppressed because it is too large