first commit
This commit is contained in:
parent
a66465f4e6
commit
c9834d923b
628
README.md
|
@ -1,3 +1,627 @@
|
||||||
# paligemma2-3b-pt-224_a13975912753655808226059
|
---
library_name: transformers
license: gemma
pipeline_tag: image-text-to-text
extra_gated_heading: Access PaliGemma on Hugging Face
extra_gated_prompt: To access PaliGemma on Hugging Face, you’re required to review
  and agree to Google’s usage license. To do this, please ensure you’re logged-in
  to Hugging Face and click below. Requests are processed immediately.
extra_gated_button_content: Acknowledge license
---
|
||||||
|
# PaliGemma 2 model card
|
||||||
|
|
||||||
paligemma2-3b-pt-224
|
**Model page:** [PaliGemma](https://ai.google.dev/gemma/docs/paligemma)
|
||||||
|
|
||||||
|
Transformers PaliGemma 2 3B weights, pre-trained with 224*224 input images and 128-token input/output text sequences.
The model is available in the `bfloat16` format for fine-tuning.
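
As a rough sanity check on the `bfloat16` footprint, the sketch below derives an approximate parameter count from the `total_size` value reported in this repository's safetensors index (`6064484832` bytes). It assumes every stored tensor uses `bfloat16` (2 bytes per value); the function name is illustrative only.

```python
# total_size in bytes, as reported in the safetensors index metadata of this repository.
TOTAL_SIZE_BYTES = 6_064_484_832
BYTES_PER_BF16 = 2  # assumption: all weights are stored as bfloat16 (16 bits each)


def estimate_param_count(total_bytes: int, bytes_per_param: int = BYTES_PER_BF16) -> int:
    """Approximate number of parameters from the checkpoint size."""
    return total_bytes // bytes_per_param


params = estimate_param_count(TOTAL_SIZE_BYTES)
weights_gib = TOTAL_SIZE_BYTES / 2**30
print(f"~{params / 1e9:.2f}B parameters, ~{weights_gib:.2f} GiB of weights")
```

The result is close to 3 billion parameters, in line with the "3B" in the model name. Activations and any optimizer state during fine-tuning come on top of the weight memory.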
|
||||||
|
|
||||||
|
**Resources and technical documentation:**
|
||||||
|
|
||||||
|
* [PaliGemma 2 on Kaggle](https://www.kaggle.com/models/google/paligemma-2)
|
||||||
|
* [Responsible Generative AI Toolkit](https://ai.google.dev/responsible)
|
||||||
|
|
||||||
|
**Terms of Use:** [Terms](https://ai.google.dev/gemma/terms)
|
||||||
|
|
||||||
|
**Authors:** Google
|
||||||
|
|
||||||
|
## Model information
|
||||||
|
|
||||||
|
### Model summary
|
||||||
|
|
||||||
|
PaliGemma 2 is an update of the [PaliGemma](https://arxiv.org/abs/2407.07726)
vision-language model (VLM) which incorporates the capabilities of the
[Gemma 2](https://arxiv.org/abs/2408.00118) models. The PaliGemma family of
models is inspired by [PaLI-3](https://arxiv.org/abs/2310.09199) and based on
open components such as the [SigLIP](https://arxiv.org/abs/2303.15343) vision
model and [Gemma 2](https://arxiv.org/abs/2408.00118) language models. It takes
both image and text as input and generates text as output, supporting multiple
languages. It is designed for class-leading fine-tuning performance on a wide
range of vision-language tasks such as image and short video captioning, visual
question answering, text reading, object detection and object segmentation.
|
||||||
|
|
||||||
|
#### Model architecture
|
||||||
|
|
||||||
|
PaliGemma 2 is the composition of a
|
||||||
|
[Transformer decoder](https://arxiv.org/abs/1706.03762) and a
|
||||||
|
[Vision Transformer image encoder](https://arxiv.org/abs/2010.11929).
|
||||||
|
The text decoder is initialized from
|
||||||
|
[Gemma 2](https://ai.google.dev/gemma/docs/base) in the 2B, 9B, and 27B
|
||||||
|
parameter sizes. The image encoder is initialized from
|
||||||
|
[SigLIP-So400m/14](https://colab.research.google.com/github/google-research/big_vision/blob/main/big_vision/configs/proj/image_text/SigLIP_demo.ipynb).
|
||||||
|
Similar to the original PaliGemma model, PaliGemma 2 is trained following the
|
||||||
|
[PaLI-3](https://arxiv.org/abs/2310.09199) recipes.
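
To make the link between input resolution and sequence length concrete, here is a minimal sketch. It assumes the encoder splits the image into non-overlapping square patches with no extra class token; the patch size of 14 comes from the "So400m/14" name and from `patch_size` in this repository's `config.json`. The helper name is hypothetical.

```python
def num_image_tokens(image_size: int, patch_size: int) -> int:
    """Number of patch tokens for a square image split into square patches."""
    if image_size % patch_size != 0:
        raise ValueError("image_size must be a multiple of patch_size")
    patches_per_side = image_size // patch_size
    return patches_per_side * patches_per_side


tokens_224 = num_image_tokens(224, 14)
tokens_448 = num_image_tokens(448, 14)
print(tokens_224, tokens_448)
```

For the 224 checkpoint this gives 256 tokens, which agrees with `num_image_tokens` in `config.json`; the 448 resolution yields four times as many image tokens.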
|
||||||
|
|
||||||
|
#### Inputs and outputs
|
||||||
|
|
||||||
|
* **Input:** Image and text string, such as a prompt to caption the image, or
|
||||||
|
a question.
|
||||||
|
* **Output:** Generated text in response to the input, such as a caption of
|
||||||
|
the image, an answer to a question, a list of object bounding box
|
||||||
|
coordinates, or segmentation codewords.
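
The sketch below shows one way bounding box output could be turned into pixel coordinates. The token format is an assumption not stated in this card: it follows the convention documented for PaliGemma detection output, where each box is four `<locNNNN>` tokens in the order `y_min, x_min, y_max, x_max`, binned into 1024 steps, followed by a label. The function name and example string are hypothetical.

```python
import re

LOC_BINS = 1024  # assumption: coordinates are quantized into 1024 bins
_BOX_PATTERN = re.compile(
    r"<loc(\d{4})><loc(\d{4})><loc(\d{4})><loc(\d{4})>\s*([^<;]*)"
)


def parse_boxes(text: str, width: int, height: int) -> list[dict]:
    """Convert '<loc....>' groups into pixel boxes as (x_min, y_min, x_max, y_max)."""
    boxes = []
    for match in _BOX_PATTERN.finditer(text):
        # Assumed token order: y_min, x_min, y_max, x_max (normalized to [0, 1)).
        y0, x0, y1, x1 = (int(match.group(i)) / LOC_BINS for i in range(1, 5))
        boxes.append(
            {
                "label": match.group(5).strip(),
                "box": (x0 * width, y0 * height, x1 * width, y1 * height),
            }
        )
    return boxes


# Hypothetical model output for a 200x100 image.
example_output = "<loc0256><loc0128><loc0768><loc0896> car"
parsed = parse_boxes(example_output, width=200, height=100)
print(parsed)
```

Note that decoding with `skip_special_tokens=True` may drop such tokens, so the decoding settings may need adjusting before a parser like this is useful.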
|
||||||
|
|
||||||
|
#### Citation
|
||||||
|
|
||||||
|
```none
@article{
    title={PaliGemma 2: A Family of Versatile VLMs for Transfer},
    author={Andreas Steiner and André Susano Pinto and Michael Tschannen and Daniel Keysers and Xiao Wang and Yonatan Bitton and Alexey Gritsenko and Matthias Minderer and Anthony Sherbondy and Shangbang Long and Siyang Qin and Reeve Ingle and Emanuele Bugliarello and Sahar Kazemzadeh and Thomas Mesnard and Ibrahim Alabdulmohsin and Lucas Beyer and Xiaohua Zhai},
    year={2024},
    journal={arXiv preprint arXiv:2412.03555}
}
```
|
||||||
|
|
||||||
|
### Model data
|
||||||
|
|
||||||
|
#### Pre-train datasets
|
||||||
|
|
||||||
|
PaliGemma 2 is pre-trained on the following mixture of datasets:
|
||||||
|
|
||||||
|
* **WebLI:** [WebLI (Web Language Image)](https://arxiv.org/abs/2209.06794) is
|
||||||
|
a web-scale multilingual image-text dataset built from the public web. A
|
||||||
|
wide range of WebLI splits are used to acquire versatile model capabilities,
|
||||||
|
such as visual semantic understanding, object localization,
|
||||||
|
visually-situated text understanding, and multilinguality.
|
||||||
|
* **CC3M-35L:** Curated English image-alt_text pairs from webpages
|
||||||
|
([Sharma et al., 2018](https://aclanthology.org/P18-1238/)). We used the
|
||||||
|
[Google Cloud Translation API](https://cloud.google.com/translate) to
|
||||||
|
translate into 34 additional languages.
|
||||||
|
* **VQ²A-CC3M-35L/VQG-CC3M-35L:** A subset of VQ2A-CC3M
|
||||||
|
([Changpinyo et al., 2022a](https://aclanthology.org/2022.naacl-main.142/)),
|
||||||
|
translated into the same additional 34 languages as CC3M-35L, using the
|
||||||
|
[Google Cloud Translation API](https://cloud.google.com/translate).
|
||||||
|
* **OpenImages:** Detection and object-aware questions and answers
|
||||||
|
([Piergiovanni et al. 2022](https://arxiv.org/abs/2209.04372)) generated by
|
||||||
|
handcrafted rules on the [OpenImages dataset].
|
||||||
|
* **WIT:** Images and texts collected from Wikipedia
|
||||||
|
([Srinivasan et al., 2021](https://arxiv.org/abs/2103.01913)).
|
||||||
|
|
||||||
|
[OpenImages dataset]: https://storage.googleapis.com/openimages/web/factsfigures_v7.html
|
||||||
|
PaliGemma 2 is based on Gemma 2, and you can find information on the
|
||||||
|
pre-training datasets for Gemma 2 in the
|
||||||
|
[Gemma 2 model card](https://ai.google.dev/gemma/docs/model_card_2).
|
||||||
|
|
||||||
|
#### Data responsibility filtering
|
||||||
|
|
||||||
|
The following filters are applied to WebLI, with the goal of training PaliGemma
|
||||||
|
2 on safe and responsible data:
|
||||||
|
|
||||||
|
* **Pornographic image filtering:** This filter removes images deemed to be of
|
||||||
|
pornographic nature.
|
||||||
|
* **Text safety filtering:** We identify and filter out images that are paired
|
||||||
|
with unsafe text. Unsafe text is any text deemed to contain or be about
|
||||||
|
child sexual abuse imagery (CSAI), pornography, vulgarities, or is otherwise
|
||||||
|
offensive.
|
||||||
|
* **Text toxicity filtering:** We further use the [Perspective
|
||||||
|
API](https://perspectiveapi.com/) to identify and filter out images that are
|
||||||
|
paired with text deemed insulting, obscene, hateful or otherwise toxic.
|
||||||
|
* **Text personal information filtering:** We filtered certain personal
|
||||||
|
information and other sensitive data using the [Cloud Data Loss Prevention
|
||||||
|
(DLP) API](https://cloud.google.com/security/products/dlp) to protect the
|
||||||
|
privacy of individuals. Identifiers such as social security numbers and
|
||||||
|
[other sensitive information types] were removed.
|
||||||
|
* **Additional methods:** Filtering based on content quality and safety in
|
||||||
|
line with our policies and practices.
|
||||||
|
|
||||||
|
[other sensitive information types]: https://cloud.google.com/sensitive-data-protection/docs/high-sensitivity-infotypes-reference?_gl=1*jg604m*_ga*ODk5MzA3ODQyLjE3MTAzMzQ3NTk.*_ga_WH2QY8WWF5*MTcxMDUxNTkxMS4yLjEuMTcxMDUxNjA2NC4wLjAuMA..&_ga=2.172110058.-899307842.1710334759
|
||||||
|
|
||||||
|
## Use in Transformers
|
||||||
|
|
||||||
|
The following snippet uses the model `google/paligemma2-3b-pt-224` for reference purposes.
It is a base model, and we recommend using it after fine-tuning it on a downstream task.
|
||||||
|
|
||||||
|
Here is a [notebook](https://github.com/merveenoyan/smol-vision/blob/main/Fine_tune_PaliGemma.ipynb)
|
||||||
|
that showcases fine-tuning PaliGemma 2.
|
||||||
|
|
||||||
|
```python
from transformers import (
    PaliGemmaProcessor,
    PaliGemmaForConditionalGeneration,
)
from transformers.image_utils import load_image
import torch

model_id = "google/paligemma2-3b-pt-224"

url = "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/tasks/car.jpg"
image = load_image(url)

model = PaliGemmaForConditionalGeneration.from_pretrained(model_id, torch_dtype=torch.bfloat16, device_map="auto").eval()
processor = PaliGemmaProcessor.from_pretrained(model_id)

# Leaving the prompt blank for pre-trained models
prompt = ""
model_inputs = processor(text=prompt, images=image, return_tensors="pt").to(torch.bfloat16).to(model.device)
input_len = model_inputs["input_ids"].shape[-1]

with torch.inference_mode():
    # Greedy decoding; the returned sequence starts with the input tokens
    generation = model.generate(**model_inputs, max_new_tokens=100, do_sample=False)
    # Keep only the newly generated tokens
    generation = generation[0][input_len:]
    decoded = processor.decode(generation, skip_special_tokens=True)
    print(decoded)
```
|
||||||
|
|
||||||
|
## Implementation information
|
||||||
|
|
||||||
|
### Hardware
|
||||||
|
|
||||||
|
PaliGemma 2 was trained using the latest generation of Tensor Processing Unit
|
||||||
|
(TPU) hardware (TPUv5e).
|
||||||
|
|
||||||
|
### Software
|
||||||
|
|
||||||
|
Training was completed using [JAX](https://github.com/google/jax),
|
||||||
|
[Flax](https://github.com/google/flax),
|
||||||
|
[TFDS](https://github.com/tensorflow/datasets) and
|
||||||
|
[`big_vision`](https://github.com/google-research/big_vision).
|
||||||
|
|
||||||
|
JAX allows researchers to take advantage of the latest generation of hardware,
|
||||||
|
including TPUs, for faster and more efficient training of large models.
|
||||||
|
|
||||||
|
TFDS is used to access datasets and Flax is used for model architecture. The
|
||||||
|
PaliGemma 2 fine-tune code and inference code are released in the `big_vision`
|
||||||
|
GitHub repository.
|
||||||
|
|
||||||
|
## Evaluation information
|
||||||
|
|
||||||
|
### Benchmark results
|
||||||
|
|
||||||
|
In order to verify the transferability of PaliGemma 2 to a wide variety of
|
||||||
|
academic tasks, we fine-tune the pretrained models on each task. We report results on
|
||||||
|
different resolutions to provide an impression of which tasks benefit from
|
||||||
|
increased resolution. Importantly, none of these tasks or datasets are part of
|
||||||
|
the pretraining data mixture, and their images are explicitly removed from the
|
||||||
|
web-scale pre-training data.
|
||||||
|
|
||||||
|
#### PaliGemma 2 results by model resolution and size
|
||||||
|
|
||||||
|
| Benchmark                     | 224-3B | 224-10B | 224-28B | 448-3B | 448-10B | 448-28B |
|-------------------------------|:------:|:-------:|:-------:|:------:|:-------:|:-------:|
| [AI2D][ai2d]                  | 74.7   | 83.1    | 83.2    | 76.0   | 84.4    | 84.6    |
| [AOKVQA-DA][aokvqa-da] (val)  | 64.2   | 68.9    | 70.2    | 67.9   | 70.8    | 71.2    |
| [AOKVQA-MC][aokvqa-mc] (val)  | 79.7   | 83.7    | 84.7    | 82.5   | 85.9    | 87.0    |
| [ActivityNet-CAP][anet-cap]   | 34.2   | 35.9    | -       | -      | -       | -       |
| [ActivityNet-QA][anet-qa]     | 51.3   | 53.2    | -       | -      | -       | -       |
| [COCO-35L][coco-35l] (avg34)  | 113.9  | 115.8   | 116.5   | 115.8  | 117.2   | 117.2   |
| [COCO-35L][coco-35l] (en)     | 138.4  | 140.8   | 142.4   | 140.4  | 142.4   | 142.3   |
| [COCOcap][coco-cap]           | 141.3  | 143.7   | 144.0   | 143.4  | 145.0   | 145.2   |
| [ChartQA][chartqa] (aug)      | 74.4   | 74.2    | 68.9    | 89.2   | 90.1    | 85.1    |
| [ChartQA][chartqa] (human)    | 42.0   | 48.4    | 46.8    | 54.0   | 66.4    | 61.3    |
| [CountBenchQA][countbenchqa]  | 81.0   | 84.0    | 86.4    | 82.0   | 85.3    | 87.4    |
| [DocVQA][docvqa] (val)        | 39.9   | 43.9    | 44.9    | 73.6   | 76.6    | 76.1    |
| [GQA][gqa]                    | 66.2   | 67.2    | 67.3    | 68.1   | 68.3    | 68.3    |
| [InfoVQA][info-vqa] (val)     | 25.2   | 33.6    | 36.4    | 37.5   | 47.8    | 46.7    |
| [MARVL][marvl] (avg5)         | 83.5   | 89.5    | 90.6    | 82.7   | 89.1    | 89.7    |
| [MSRVTT-CAP][msrvtt]          | 68.5   | 72.1    | -       | -      | -       | -       |
| [MSRVTT-QA][msrvtt]           | 50.5   | 51.9    | -       | -      | -       | -       |
| [MSVD-QA][msvd-qa]            | 61.1   | 62.5    | -       | -      | -       | -       |
| [NLVR2][nlvr2]                | 91.4   | 93.9    | 94.2    | 91.6   | 93.7    | 94.1    |
| [NoCaps][nocaps]              | 123.1  | 126.3   | 127.1   | 123.5  | 126.9   | 127.0   |
| [OCR-VQA][ocr-vqa]            | 73.4   | 74.7    | 75.3    | 75.7   | 76.3    | 76.6    |
| [OKVQA][okvqa]                | 64.2   | 68.0    | 71.2    | 64.1   | 68.6    | 70.6    |
| [RSVQA-hr][rsvqa-hr] (test)   | 92.7   | 92.6    | 92.7    | 92.8   | 92.8    | 92.8    |
| [RSVQA-hr][rsvqa-hr] (test2)  | 90.9   | 90.8    | 90.9    | 90.7   | 90.7    | 90.8    |
| [RSVQA-lr][rsvqa-lr]          | 93.0   | 92.8    | 93.5    | 92.7   | 93.1    | 93.7    |
| [RefCOCO][refcoco] (testA)    | 75.7   | 77.2    | 76.8    | 78.6   | 79.7    | 79.3    |
| [RefCOCO][refcoco] (testB)    | 71.0   | 74.2    | 73.9    | 73.5   | 76.2    | 74.8    |
| [RefCOCO][refcoco] (val)      | 73.4   | 75.9    | 75.0    | 76.3   | 78.2    | 77.3    |
| [RefCOCO+][refcoco+] (testA)  | 72.7   | 74.7    | 73.6    | 76.1   | 77.7    | 76.6    |
| [RefCOCO+][refcoco+] (testB)  | 64.2   | 68.4    | 67.1    | 67.0   | 71.1    | 68.6    |
| [RefCOCO+][refcoco+] (val)    | 68.6   | 72.0    | 70.3    | 72.1   | 74.4    | 72.8    |
| [RefCOCOg][refcocog] (test)   | 69.0   | 71.9    | 70.7    | 72.7   | 74.8    | 73.7    |
| [RefCOCOg][refcocog] (val)    | 68.3   | 71.4    | 70.5    | 72.3   | 74.4    | 73.0    |
| [ST-VQA][st-vqa] (val)        | 61.9   | 64.3    | 65.1    | 80.5   | 82.0    | 81.8    |
| [SciCap][scicap]              | 165.1  | 159.5   | 156.9   | 183.3  | 177.2   | 172.7   |
| [ScienceQA][scienceqa]        | 96.1   | 98.2    | 98.2    | 96.2   | 98.5    | 98.6    |
| [Screen2Words][screen2words]  | 113.3  | 117.8   | 122.8   | 114.0  | 119.1   | 123.4   |
| [TallyQA][tallyqa] (complex)  | 70.3   | 73.4    | 74.2    | 73.6   | 76.7    | 76.8    |
| [TallyQA][tallyqa] (simple)   | 81.8   | 83.2    | 83.4    | 85.3   | 86.2    | 85.7    |
| [TextCaps][textcaps]          | 127.5  | 137.9   | 139.9   | 152.1  | 157.7   | 153.6   |
| [TextVQA][textvqa] (val)      | 59.6   | 64.0    | 64.7    | 75.2   | 76.6    | 76.2    |
| [VATEX][vatex]                | 80.8   | 82.7    | -       | -      | -       | -       |
| [VQAv2][vqav2] (minival)      | 83.0   | 84.3    | 84.5    | 84.8   | 85.8    | 85.8    |
| [VizWizVQA][vizwiz-vqa] (val) | 76.4   | 78.1    | 78.7    | 77.5   | 78.6    | 78.9    |
| [WidgetCap][widgetcap]        | 138.1  | 139.8   | 138.8   | 151.4  | 151.9   | 148.9   |
| [XM3600][xm3600] (avg35)      | 42.8   | 44.5    | 45.2    | 43.2   | 44.6    | 45.2    |
| [XM3600][xm3600] (en)         | 79.8   | 80.7    | 81.0    | 80.3   | 81.5    | 81.0    |
| [xGQA][xgqa] (avg7)           | 58.6   | 61.4    | 61.1    | 60.4   | 62.6    | 62.1    |
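
One way to see which tasks benefit from increased resolution is to rank them by the score difference between the 448 and 224 columns. The sketch below does this for a handful of rows, with values copied from the 224-3B and 448-3B columns of the table above; the selection of rows and the helper name are illustrative choices, not part of the original evaluation.

```python
# (224-3B, 448-3B) scores copied from the benchmark table above.
scores_3b = {
    "DocVQA (val)": (39.9, 73.6),
    "ST-VQA (val)": (61.9, 80.5),
    "TextVQA (val)": (59.6, 75.2),
    "GQA": (66.2, 68.1),
    "OKVQA": (64.2, 64.1),
}


def resolution_gains(scores: dict) -> list:
    """Return (benchmark, gain) pairs sorted by gain, largest first."""
    gains = {name: round(high - low, 1) for name, (low, high) in scores.items()}
    return sorted(gains.items(), key=lambda item: item[1], reverse=True)


ranked = resolution_gains(scores_3b)
for name, gain in ranked:
    print(f"{name:15s} {gain:+.1f}")
```

In this small sample, the text-heavy tasks (DocVQA, ST-VQA, TextVQA) gain the most from the higher resolution, while GQA and OKVQA change very little.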
|
||||||
|
|
||||||
|
|
||||||
|
#### Additional Benchmarks
|
||||||
|
|
||||||
|
**[ICDAR 2015 Incidental][icdar2015-inc]**
|
||||||
|
|
||||||
|
| Model           | Precision | Recall | F1    |
|-----------------|-----------|:------:|:-----:|
| PaliGemma 2 3B  | 81.88     | 70.73  | 75.9  |
|
||||||
|
|
||||||
|
**[Total-Text][total-text]**
|
||||||
|
|
||||||
|
| Model           | Precision | Recall | F1    |
|-----------------|-----------|:------:|:-----:|
| PaliGemma 2 3B  | 73.8      | 74.54  | 74.17 |
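
For readers who want to cross-check the text detection tables, the sketch below recomputes F1 as the harmonic mean of precision and recall. That the card uses this standard definition is an assumption, although the recomputed values are consistent with the reported ones. The function name is illustrative.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (both on the same scale)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision/recall pairs copied from the ICDAR 2015 Incidental and Total-Text tables.
reported = {
    "ICDAR 2015 Incidental": (81.88, 70.73),
    "Total-Text": (73.8, 74.54),
}

for name, (precision, recall) in reported.items():
    print(f"{name}: F1 = {f1_score(precision, recall):.2f}")
```

The recomputed values round to 75.9 and 74.17, which agrees with the F1 columns above.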
|
||||||
|
|
||||||
|
**[FinTabNet][fintabnet]**
|
||||||
|
|
||||||
|
| Model           | S-TEDS | TEDS  | GriTS-Top | GriTS-Con |
|-----------------|--------|-------|-----------|-----------|
| PaliGemma 2 3B  | 99.18  | 98.94 | 99.43     | 99.21     |
|
||||||
|
|
||||||
|
**[PubTabNet][pubtabnet]**
|
||||||
|
|
||||||
|
| Model           | S-TEDS | TEDS  | GriTS-Top | GriTS-Con |
|-----------------|--------|-------|-----------|-----------|
| PaliGemma 2 3B  | 97.6   | 97.31 | 97.99     | 97.84     |
|
||||||
|
|
||||||
|
**[GrandStaff][grandstaff]**
|
||||||
|
|
||||||
|
| Model           | CER | LER | SER |
|-----------------|-----|-----|-----|
| PaliGemma 2 3B  | 1.6 | 6.7 | 2.3 |
|
||||||
|
|
||||||
|
**[PubChem][pubchem]**
|
||||||
|
|
||||||
|
* PaliGemma 2 3B, Full Match: 94.8
|
||||||
|
|
||||||
|
**[DOCCI][docci]**
|
||||||
|
|
||||||
|
| Model           | avg#char | avg#sent | NES %   |
|-----------------|----------|----------|---------|
| PaliGemma 2 3B  | 529      | 7.74     | 28.42   |
| PaliGemma 2 10B | 521      | 7.45     | 20.27   |

- *avg#char*: Average number of characters
- *avg#sent*: Average number of sentences
- *NES*: Non-entailment sentences
|
||||||
|
|
||||||
|
**[MIMIC-CXR][mimic-cxr]**
|
||||||
|
|
||||||
|
| Model           | CIDEr | BLEU4 | Rouge-L | RadGraph F1 |
|-----------------|-------|-------|---------|-------------|
| PaliGemma 2 3B  | 19.9% | 14.6% | 31.92%  | 28.8%       |
| PaliGemma 2 10B | 17.4% | 15%   | 32.41%  | 29.5%       |
|
||||||
|
|
||||||
|
**[Visual Spatial Reasoning][vsr]**
|
||||||
|
|
||||||
|
| Model           | VSR zeroshot split (test) | VSR random split (test) |
|-----------------|---------------------------|--------------------------|
| PaliGemma 2 3B  | 0.75                      | 0.82                     |
| PaliGemma 2 10B | 0.80                      | 0.87                     |
|
||||||
|
|
||||||
|
## Ethics and safety
|
||||||
|
|
||||||
|
### Evaluation approach
|
||||||
|
|
||||||
|
Our evaluation methods include structured ethics and safety evaluations across
|
||||||
|
relevant content policies, including:
|
||||||
|
|
||||||
|
* Human evaluation on prompts covering child safety, content safety and
|
||||||
|
representational harms. See the [Gemma model
|
||||||
|
card](https://ai.google.dev/gemma/docs/model_card#evaluation_approach) for
|
||||||
|
more details on evaluation approach, but with image captioning and visual
|
||||||
|
question answering setups.
|
||||||
|
* Image-to-Text benchmark evaluation: Benchmark against relevant academic
|
||||||
|
datasets such as FairFace Dataset ([Karkkainen et al.,
|
||||||
|
2021](https://arxiv.org/abs/1908.04913)).
|
||||||
|
|
||||||
|
### Evaluation results
|
||||||
|
|
||||||
|
* The human evaluation results of ethics and safety evaluations are within
|
||||||
|
acceptable thresholds for meeting [internal
|
||||||
|
policies](https://storage.googleapis.com/gweb-uniblog-publish-prod/documents/2023_Google_AI_Principles_Progress_Update.pdf#page=11)
|
||||||
|
for categories such as child safety, content safety and representational
|
||||||
|
harms.
|
||||||
|
* On top of robust internal evaluations, we also use the Perspective API
|
||||||
|
(threshold of 0.8) to measure toxicity, profanity, and other potential
|
||||||
|
issues in the generated captions for images sourced from the FairFace
|
||||||
|
dataset. We report the maximum and median values observed across subgroups
|
||||||
|
for each of the perceived gender, ethnicity, and age attributes.
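
The aggregation described in the last bullet can be sketched as follows. This is only an illustration under stated assumptions: the per-caption scores are made-up toy values, the subgroup names are placeholders, and treating a caption as flagged when its score is greater than or equal to the threshold is a guess at the exact rule.

```python
from statistics import median

THRESHOLD = 0.8  # Perspective API score threshold stated above


def flagged_rate(scores: list, threshold: float = THRESHOLD) -> float:
    """Percentage of captions whose score meets or exceeds the threshold."""
    return 100.0 * sum(score >= threshold for score in scores) / len(scores)


def summarize(scores_by_subgroup: dict) -> tuple:
    """Return (maximum, median) of the flagged rates across subgroups."""
    rates = [flagged_rate(scores) for scores in scores_by_subgroup.values()]
    return max(rates), median(rates)


# Hypothetical per-caption toxicity scores for three placeholder subgroups.
toy_scores = {
    "subgroup_a": [0.10, 0.90, 0.20, 0.30],
    "subgroup_b": [0.05, 0.10, 0.20, 0.30],
    "subgroup_c": [0.85, 0.95, 0.10, 0.20],
}

max_rate, median_rate = summarize(toy_scores)
print(f"max={max_rate:.2f}%  median={median_rate:.2f}%")
```

Reporting both values shows the worst-affected subgroup alongside the typical one, which is how the table that follows is organized.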
|
||||||
|
|
||||||
|
<table>
  <tr>
    <col>
    <colgroup span="3"></colgroup>
    <colgroup span="3"></colgroup>
    <colgroup span="3"></colgroup>
    <th>Metric</th>
    <th colspan="3" scope="colgroup">Perceived gender</th>
    <th colspan="3" scope="colgroup">Ethnicity</th>
    <th colspan="3" scope="colgroup">Age group</th>
  </tr>
  <tr>
    <th>Model size</th>
    <th scope="col">3B</th>
    <th scope="col">10B</th>
    <th scope="col">28B</th>
    <th scope="col">3B</th>
    <th scope="col">10B</th>
    <th scope="col">28B</th>
    <th scope="col">3B</th>
    <th scope="col">10B</th>
    <th scope="col">28B</th>
  </tr>
  <tr>
    <th></th>
    <th colspan="9" scope="colgroup">Maximum</th>
  </tr>
  <tr>
    <td>Toxicity</td>
    <td>0.14%</td>
    <td>0.15%</td>
    <td>0.19%</td>
    <td>0.29%</td>
    <td>0.39%</td>
    <td>0.39%</td>
    <td>0.26%</td>
    <td>0.18%</td>
    <td>0.32%</td>
  </tr>
  <tr>
    <td>Identity Attack</td>
    <td>0.04%</td>
    <td>0.02%</td>
    <td>0.02%</td>
    <td>0.13%</td>
    <td>0.06%</td>
    <td>0.06%</td>
    <td>0.06%</td>
    <td>0.03%</td>
    <td>0.06%</td>
  </tr>
  <tr>
    <td>Insult</td>
    <td>0.17%</td>
    <td>0.25%</td>
    <td>0.17%</td>
    <td>0.37%</td>
    <td>0.52%</td>
    <td>0.52%</td>
    <td>0.27%</td>
    <td>0.39%</td>
    <td>0.24%</td>
  </tr>
  <tr>
    <td>Threat</td>
    <td>0.55%</td>
    <td>0.43%</td>
    <td>0.57%</td>
    <td>0.83%</td>
    <td>0.48%</td>
    <td>0.48%</td>
    <td>0.64%</td>
    <td>0.43%</td>
    <td>0.64%</td>
  </tr>
  <tr>
    <td>Profanity</td>
    <td>0.00%</td>
    <td>0.00%</td>
    <td>0.00%</td>
    <td>0.00%</td>
    <td>0.00%</td>
    <td>0.00%</td>
    <td>0.00%</td>
    <td>0.00%</td>
    <td>0.00%</td>
  </tr>
  <tr>
    <th></th>
    <th colspan="9" scope="colgroup">Median</th>
  </tr>
  <tr>
    <td>Toxicity</td>
    <td>0.13%</td>
    <td>0.10%</td>
    <td>0.18%</td>
    <td>0.07%</td>
    <td>0.07%</td>
    <td>0.14%</td>
    <td>0.12%</td>
    <td>0.08%</td>
    <td>0.12%</td>
  </tr>
  <tr>
    <td>Identity Attack</td>
    <td>0.02%</td>
    <td>0.01%</td>
    <td>0.02%</td>
    <td>0.00%</td>
    <td>0.00%</td>
    <td>0.00%</td>
    <td>0.00%</td>
    <td>0.00%</td>
    <td>0.00%</td>
  </tr>
  <tr>
    <td>Insult</td>
    <td>0.15%</td>
    <td>0.23%</td>
    <td>0.14%</td>
    <td>0.14%</td>
    <td>0.17%</td>
    <td>0.13%</td>
    <td>0.09%</td>
    <td>0.18%</td>
    <td>0.16%</td>
  </tr>
  <tr>
    <td>Threat</td>
    <td>0.35%</td>
    <td>0.27%</td>
    <td>0.41%</td>
    <td>0.28%</td>
    <td>0.19%</td>
    <td>0.42%</td>
    <td>0.27%</td>
    <td>0.31%</td>
    <td>0.40%</td>
  </tr>
  <tr>
    <td>Profanity</td>
    <td>0.00%</td>
    <td>0.00%</td>
    <td>0.00%</td>
    <td>0.00%</td>
    <td>0.00%</td>
    <td>0.00%</td>
    <td>0.00%</td>
    <td>0.00%</td>
    <td>0.00%</td>
  </tr>
</table>
|
||||||
|
|
||||||
|
## Usage and limitations
|
||||||
|
|
||||||
|
### Intended usage
|
||||||
|
|
||||||
|
Open Vision Language Models (VLMs) have a wide range of applications across
|
||||||
|
various industries and domains. The following list of potential uses is not
|
||||||
|
comprehensive. The purpose of this list is to provide contextual information
|
||||||
|
about the possible use-cases that the model creators considered as part of model
|
||||||
|
training and development. Prohibited uses of Gemma models are outlined in the
|
||||||
|
[Gemma Prohibited Use Policy](https://ai.google.dev/gemma/prohibited_use_policy).
|
||||||
|
|
||||||
|
Fine-tune on specific vision-language task:
|
||||||
|
|
||||||
|
* The pre-trained models can be fine-tuned on a wide range of vision-language
    tasks such as image captioning, short video captioning, visual question
    answering, text reading, object detection and object segmentation.
* The pre-trained models can be fine-tuned for specific domains such as remote
    sensing question answering, visual questions from people who are blind,
    science question answering, and describing UI element functionalities.
* The pre-trained models can be fine-tuned for tasks with non-textual outputs
    such as bounding boxes or segmentation masks.
|
||||||
|
|
||||||
|
Vision-language research:
|
||||||
|
|
||||||
|
* The pre-trained models and fine-tuned models can serve as a foundation for
|
||||||
|
researchers to experiment with VLM techniques, develop algorithms, and
|
||||||
|
contribute to the advancement of the field.
|
||||||
|
|
||||||
|
### Ethical considerations and risks
|
||||||
|
|
||||||
|
The development of vision-language models (VLMs) raises several ethical
|
||||||
|
concerns. In creating an open model, we have carefully considered the following:
|
||||||
|
|
||||||
|
* Bias and Fairness
    * VLMs trained on large-scale, real-world image-text data can reflect
        socio-cultural biases embedded in the training material. These models
        underwent careful scrutiny, with input data pre-processing described
        and posterior evaluations reported in this card.
* Misinformation and Misuse
    * VLMs can be misused to generate text that is false, misleading, or
        harmful.
    * Guidelines are provided for responsible use with the model; see the
        [Responsible Generative AI Toolkit](https://ai.google.dev/responsible).
* Transparency and Accountability
    * This model card summarizes details on the models' architecture,
        capabilities, limitations, and evaluation processes.
    * A responsibly developed open model offers the opportunity to share
        innovation by making VLM technology accessible to developers and
        researchers across the AI ecosystem.
|
||||||
|
|
||||||
|
Risks identified and mitigations:
|
||||||
|
|
||||||
|
* **Perpetuation of biases:** Continuous monitoring (using evaluation metrics
    and human review) and the exploration of de-biasing techniques are
    encouraged during model training, fine-tuning, and other use cases.
* **Generation of harmful content:** Mechanisms and guidelines for content
    safety are essential. Developers are encouraged to exercise caution and
    implement appropriate content safety safeguards based on their specific
    product policies and application use cases.
* **Misuse for malicious purposes:** Technical limitations and developer and
    end-user education can help mitigate against malicious applications of VLMs.
    Educational resources and reporting mechanisms for users to flag misuse are
    provided: see the [Responsible Generative AI Toolkit](https://ai.google.dev/responsible).
    Prohibited uses of Gemma models are outlined in the
    [Gemma Prohibited Use Policy](https://ai.google.dev/gemma/prohibited_use_policy).
* **Privacy violations:** Models were trained on data filtered to remove
    certain personal information and sensitive data. Developers are encouraged
    to adhere to privacy regulations with privacy-preserving techniques.
|
||||||
|
|
||||||
|
### Limitations
|
||||||
|
|
||||||
|
* Most limitations inherited from the underlying Gemma 2 models still apply:
    * VLMs are better at tasks that can be framed with clear prompts and
        instructions. Open-ended or highly complex tasks might be challenging.
    * Natural language is inherently complex. VLMs might struggle to grasp
        subtle nuances, sarcasm, or figurative language.
    * VLMs generate responses based on information they learned from their
        training datasets, but they are not knowledge bases. They may generate
        incorrect or outdated factual statements.
    * VLMs rely on statistical patterns in language and images. They might
        lack the ability to apply common sense reasoning in certain situations.
* PaliGemma 2 was designed first and foremost to serve as a general
    pre-trained model for fine-tuning to specialized tasks. Hence, its "out of
    the box" or "zero-shot" performance might lag behind models designed
    specifically for general purpose use.
* PaliGemma 2 is not a multi-turn chatbot. It is designed for a single round
    of image and text input.
|
||||||
|
|
||||||
|
|
||||||
|
[ai2d]: https://allenai.org/data/diagrams
|
||||||
|
[aokvqa-da]: https://allenai.org/project/a-okvqa/home
|
||||||
|
[aokvqa-mc]: https://allenai.org/project/a-okvqa/home
|
||||||
|
[anet-cap]: https://paperswithcode.com/dataset/activitynet-captions
|
||||||
|
[anet-qa]: https://arxiv.org/abs/1906.02467
|
||||||
|
[chartqa]: https://arxiv.org/abs/2203.10244
|
||||||
|
[coco-35l]: https://arxiv.org/pdf/2205.12522
|
||||||
|
[coco-cap]: https://cocodataset.org/#home
|
||||||
|
[countbenchqa]: https://github.com/google-research/big_vision/blob/main/big_vision/datasets/countbenchqa/
|
||||||
|
[docvqa]: https://www.docvqa.org/
|
||||||
|
[gqa]: https://cs.stanford.edu/people/dorarad/gqa/about.html
|
||||||
|
[info-vqa]: https://arxiv.org/abs/2104.12756
|
||||||
|
[marvl]: https://marvl-challenge.github.io/
|
||||||
|
[msrvtt]: https://paperswithcode.com/dataset/msr-vtt
|
||||||
|
[msvd-qa]: https://paperswithcode.com/dataset/msvd-qa
|
||||||
|
[nlvr2]: https://lil.nlp.cornell.edu/nlvr/
|
||||||
|
[nocaps]: https://nocaps.org/
|
||||||
|
[ocr-vqa]: https://ocr-vqa.github.io/
|
||||||
|
[okvqa]: https://okvqa.allenai.org/
|
||||||
|
[refcoco]: https://arxiv.org/abs/1608.00272
|
||||||
|
[refcoco+]: https://aclanthology.org/D14-1086
|
||||||
|
[refcocog]: https://arxiv.org/abs/1511.02283
|
||||||
|
[rsvqa-hr]: https://zenodo.org/records/6344367
|
||||||
|
[rsvqa-lr]: https://zenodo.org/records/6344334
|
||||||
|
[st-vqa]: https://arxiv.org/abs/1905.13648
|
||||||
|
[scicap]: https://arxiv.org/abs/2110.11624
|
||||||
|
[scienceqa]: https://scienceqa.github.io/
|
||||||
|
[screen2words]: https://arxiv.org/abs/2108.03353
|
||||||
|
[tallyqa]: https://arxiv.org/abs/1810.12440
|
||||||
|
[textcaps]: https://textvqa.org/textcaps/
|
||||||
|
[textvqa]: https://textvqa.org/
|
||||||
|
[vatex]: https://arxiv.org/abs/1904.03493
|
||||||
|
[vizwiz-vqa]: https://vizwiz.org/tasks-and-datasets/vqa/
|
||||||
|
[widgetcap]: https://arxiv.org/abs/2010.04295
|
||||||
|
[vqav2]: https://visualqa.org/index.html
|
||||||
|
[xgqa]: https://aclanthology.org/2022.findings-acl.196/
|
||||||
|
[xm3600]: https://arxiv.org/pdf/2205.12522
|
||||||
|
|
||||||
|
[icdar2015-inc]: https://arxiv.org/abs/1511.09207
|
||||||
|
[total-text]: https://paperswithcode.com/paper/total-text-a-comprehensive-dataset-for-scene
|
||||||
|
[fintabnet]: https://developer.ibm.com/data/fintabnet/
|
||||||
|
[pubtabnet]: https://paperswithcode.com/dataset/pubtabnet
|
||||||
|
[grandstaff]: https://link.springer.com/article/10.1007/s10032-023-00432-z
|
||||||
|
[pubchem]: https://pmc.ncbi.nlm.nih.gov/articles/PMC7352161/
|
||||||
|
[docci]: https://research.google/pubs/docci-descriptions-of-connected-and-contrasting-images/
|
||||||
|
[mimic-cxr]: https://paperswithcode.com/dataset/mimic-cxr
|
||||||
|
[vsr]: https://direct.mit.edu/tacl/article/doi/10.1162/tacl_a_00566/116470/Visual-Spatial-Reasoning
|
||||||
|
|
|
@ -0,0 +1,53 @@
{
  "_vocab_size": 257152,
  "architectures": [
    "PaliGemmaForConditionalGeneration"
  ],
  "bos_token_id": 2,
  "eos_token_id": 1,
  "image_token_index": 257152,
  "model_type": "paligemma",
  "num_hidden_layers": 26,
  "pad_token_id": 0,
  "projection_dim": 2304,
  "text_config": {
    "architectures": [
      "Gemma2ForCausalLM"
    ],
    "attn_logit_softcapping": 50.0,
    "cache_implementation": "hybrid",
    "eos_token_id": [
      1,
      107
    ],
    "final_logit_softcapping": 30.0,
    "hidden_act": "gelu_pytorch_tanh",
    "hidden_activation": "gelu_pytorch_tanh",
    "hidden_size": 2304,
    "intermediate_size": 9216,
    "model_type": "gemma2",
    "num_attention_heads": 8,
    "num_hidden_layers": 26,
    "num_image_tokens": 256,
    "num_key_value_heads": 4,
    "query_pre_attn_scalar": 256,
    "sliding_window": 4096,
    "torch_dtype": "bfloat16",
    "vocab_size": 257216
  },
  "torch_dtype": "bfloat16",
  "transformers_version": "4.47.0.dev0",
  "vision_config": {
    "hidden_size": 1152,
    "intermediate_size": 4304,
    "model_type": "siglip_vision_model",
    "num_attention_heads": 16,
    "num_hidden_layers": 27,
    "num_image_tokens": 256,
    "num_positions": 256,
    "patch_size": 14,
    "projection_dim": 2304,
    "torch_dtype": "bfloat16",
    "vision_use_head": false
  }
}
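As a sanity check on the vision settings in this config, the value 256 for `num_image_tokens` and `num_positions` is consistent with the 224×224 input resolution stated in the model card and a `patch_size` of 14. The sketch below assumes the image is split into non-overlapping square patches with no extra class token; the helper name `patches_per_image` is illustrative and not part of any library.

```python
def patches_per_image(image_size: int, patch_size: int) -> int:
    """Count non-overlapping square patches in a square image.

    Assumption: no class token or other extra positions are added.
    """
    if image_size % patch_size != 0:
        raise ValueError("image_size must be divisible by patch_size")
    per_side = image_size // patch_size  # patches along one edge
    return per_side * per_side


# 224 comes from the model card; 14 is "patch_size" in vision_config.
tokens = patches_per_image(224, 14)
print(tokens)
```

Under these assumptions, a checkpoint trained at a higher resolution would need a correspondingly larger `num_image_tokens`.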
@ -0,0 +1 @@
{"framework": "pytorch", "task": "text-generation", "allow_remote": true}
@ -0,0 +1,8 @@
{
  "_from_model_config": true,
  "bos_token_id": 2,
  "cache_implementation": "hybrid",
  "eos_token_id": 1,
  "pad_token_id": 0,
  "transformers_version": "4.47.0.dev0"
}
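The `_from_model_config` flag appears to indicate that these generation defaults were derived from the model config, so the special-token ids should match the top-level values in `config.json`. A minimal sketch of that check, using only the standard library; the excerpts are copied from the two files in this commit, and the helper name `mismatched_token_ids` is hypothetical.

```python
import json

# Excerpts copied from config.json (top level) and generation_config.json;
# only the special-token fields are kept.
MODEL_CONFIG = '{"bos_token_id": 2, "eos_token_id": 1, "pad_token_id": 0}'
GENERATION_CONFIG = '{"bos_token_id": 2, "eos_token_id": 1, "pad_token_id": 0}'

TOKEN_ID_KEYS = ("bos_token_id", "eos_token_id", "pad_token_id")


def mismatched_token_ids(model_cfg: dict, generation_cfg: dict) -> list:
    """Return the token-id keys whose values differ between the two configs."""
    return [k for k in TOKEN_ID_KEYS if model_cfg.get(k) != generation_cfg.get(k)]


mismatches = mismatched_token_ids(
    json.loads(MODEL_CONFIG), json.loads(GENERATION_CONFIG)
)
print(mismatches or "token ids agree")
```

Note that `text_config` lists two end-of-sequence ids (1 and 107), while the top-level and generation configs list only 1; this check compares top-level values only.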
Binary file not shown.
Binary file not shown.
@ -0,0 +1,734 @@
{
  "metadata": {
    "total_size": 6064484832
  },
  "weight_map": {
    "language_model.model.embed_tokens.weight": "model-00001-of-00002.safetensors",
    "language_model.model.layers.0.input_layernorm.weight": "model-00001-of-00002.safetensors",
    "language_model.model.layers.0.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
    "language_model.model.layers.0.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
    "language_model.model.layers.0.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
    "language_model.model.layers.0.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
    "language_model.model.layers.0.post_feedforward_layernorm.weight": "model-00001-of-00002.safetensors",
    "language_model.model.layers.0.pre_feedforward_layernorm.weight": "model-00001-of-00002.safetensors",
    "language_model.model.layers.0.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
    "language_model.model.layers.0.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
    "language_model.model.layers.0.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
    "language_model.model.layers.0.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
    "language_model.model.layers.1.input_layernorm.weight": "model-00001-of-00002.safetensors",
    "language_model.model.layers.1.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
    "language_model.model.layers.1.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
    "language_model.model.layers.1.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
    "language_model.model.layers.1.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
    "language_model.model.layers.1.post_feedforward_layernorm.weight": "model-00001-of-00002.safetensors",
    "language_model.model.layers.1.pre_feedforward_layernorm.weight": "model-00001-of-00002.safetensors",
    "language_model.model.layers.1.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
    "language_model.model.layers.1.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
    "language_model.model.layers.1.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
    "language_model.model.layers.1.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
    "language_model.model.layers.10.input_layernorm.weight": "model-00001-of-00002.safetensors",
    "language_model.model.layers.10.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
    "language_model.model.layers.10.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
    "language_model.model.layers.10.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
    "language_model.model.layers.10.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
    "language_model.model.layers.10.post_feedforward_layernorm.weight": "model-00001-of-00002.safetensors",
    "language_model.model.layers.10.pre_feedforward_layernorm.weight": "model-00001-of-00002.safetensors",
    "language_model.model.layers.10.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
    "language_model.model.layers.10.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
    "language_model.model.layers.10.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
    "language_model.model.layers.10.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
    "language_model.model.layers.11.input_layernorm.weight": "model-00001-of-00002.safetensors",
    "language_model.model.layers.11.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
    "language_model.model.layers.11.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
    "language_model.model.layers.11.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
    "language_model.model.layers.11.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
    "language_model.model.layers.11.post_feedforward_layernorm.weight": "model-00001-of-00002.safetensors",
    "language_model.model.layers.11.pre_feedforward_layernorm.weight": "model-00001-of-00002.safetensors",
    "language_model.model.layers.11.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
    "language_model.model.layers.11.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
    "language_model.model.layers.11.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
    "language_model.model.layers.11.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
    "language_model.model.layers.12.input_layernorm.weight": "model-00001-of-00002.safetensors",
    "language_model.model.layers.12.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
    "language_model.model.layers.12.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
    "language_model.model.layers.12.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
    "language_model.model.layers.12.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
    "language_model.model.layers.12.post_feedforward_layernorm.weight": "model-00001-of-00002.safetensors",
    "language_model.model.layers.12.pre_feedforward_layernorm.weight": "model-00001-of-00002.safetensors",
    "language_model.model.layers.12.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
    "language_model.model.layers.12.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
    "language_model.model.layers.12.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
    "language_model.model.layers.12.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
    "language_model.model.layers.13.input_layernorm.weight": "model-00001-of-00002.safetensors",
    "language_model.model.layers.13.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
    "language_model.model.layers.13.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
    "language_model.model.layers.13.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
    "language_model.model.layers.13.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
    "language_model.model.layers.13.post_feedforward_layernorm.weight": "model-00001-of-00002.safetensors",
    "language_model.model.layers.13.pre_feedforward_layernorm.weight": "model-00001-of-00002.safetensors",
    "language_model.model.layers.13.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
    "language_model.model.layers.13.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
    "language_model.model.layers.13.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
    "language_model.model.layers.13.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
    "language_model.model.layers.14.input_layernorm.weight": "model-00001-of-00002.safetensors",
    "language_model.model.layers.14.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
    "language_model.model.layers.14.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
    "language_model.model.layers.14.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
    "language_model.model.layers.14.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
    "language_model.model.layers.14.post_feedforward_layernorm.weight": "model-00001-of-00002.safetensors",
    "language_model.model.layers.14.pre_feedforward_layernorm.weight": "model-00001-of-00002.safetensors",
    "language_model.model.layers.14.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
    "language_model.model.layers.14.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
    "language_model.model.layers.14.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
    "language_model.model.layers.14.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
    "language_model.model.layers.15.input_layernorm.weight": "model-00001-of-00002.safetensors",
    "language_model.model.layers.15.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
    "language_model.model.layers.15.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
    "language_model.model.layers.15.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
    "language_model.model.layers.15.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
    "language_model.model.layers.15.post_feedforward_layernorm.weight": "model-00001-of-00002.safetensors",
    "language_model.model.layers.15.pre_feedforward_layernorm.weight": "model-00001-of-00002.safetensors",
    "language_model.model.layers.15.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
    "language_model.model.layers.15.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
    "language_model.model.layers.15.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
    "language_model.model.layers.15.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
    "language_model.model.layers.16.input_layernorm.weight": "model-00001-of-00002.safetensors",
    "language_model.model.layers.16.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
    "language_model.model.layers.16.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
    "language_model.model.layers.16.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
    "language_model.model.layers.16.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
    "language_model.model.layers.16.post_feedforward_layernorm.weight": "model-00001-of-00002.safetensors",
    "language_model.model.layers.16.pre_feedforward_layernorm.weight": "model-00001-of-00002.safetensors",
    "language_model.model.layers.16.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
    "language_model.model.layers.16.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
    "language_model.model.layers.16.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
    "language_model.model.layers.16.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
    "language_model.model.layers.17.input_layernorm.weight": "model-00001-of-00002.safetensors",
    "language_model.model.layers.17.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
    "language_model.model.layers.17.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
    "language_model.model.layers.17.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
    "language_model.model.layers.17.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
    "language_model.model.layers.17.post_feedforward_layernorm.weight": "model-00001-of-00002.safetensors",
    "language_model.model.layers.17.pre_feedforward_layernorm.weight": "model-00001-of-00002.safetensors",
    "language_model.model.layers.17.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
    "language_model.model.layers.17.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
    "language_model.model.layers.17.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
    "language_model.model.layers.17.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
    "language_model.model.layers.18.input_layernorm.weight": "model-00001-of-00002.safetensors",
    "language_model.model.layers.18.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
    "language_model.model.layers.18.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
    "language_model.model.layers.18.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
    "language_model.model.layers.18.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
    "language_model.model.layers.18.post_feedforward_layernorm.weight": "model-00001-of-00002.safetensors",
    "language_model.model.layers.18.pre_feedforward_layernorm.weight": "model-00001-of-00002.safetensors",
    "language_model.model.layers.18.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
    "language_model.model.layers.18.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
    "language_model.model.layers.18.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
    "language_model.model.layers.18.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
    "language_model.model.layers.19.input_layernorm.weight": "model-00002-of-00002.safetensors",
    "language_model.model.layers.19.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
    "language_model.model.layers.19.mlp.gate_proj.weight": "model-00002-of-00002.safetensors",
    "language_model.model.layers.19.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
    "language_model.model.layers.19.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
    "language_model.model.layers.19.post_feedforward_layernorm.weight": "model-00002-of-00002.safetensors",
    "language_model.model.layers.19.pre_feedforward_layernorm.weight": "model-00002-of-00002.safetensors",
    "language_model.model.layers.19.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
    "language_model.model.layers.19.self_attn.o_proj.weight": "model-00002-of-00002.safetensors",
    "language_model.model.layers.19.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
    "language_model.model.layers.19.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
    "language_model.model.layers.2.input_layernorm.weight": "model-00001-of-00002.safetensors",
    "language_model.model.layers.2.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
    "language_model.model.layers.2.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
    "language_model.model.layers.2.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
    "language_model.model.layers.2.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
    "language_model.model.layers.2.post_feedforward_layernorm.weight": "model-00001-of-00002.safetensors",
    "language_model.model.layers.2.pre_feedforward_layernorm.weight": "model-00001-of-00002.safetensors",
    "language_model.model.layers.2.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
    "language_model.model.layers.2.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
    "language_model.model.layers.2.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
    "language_model.model.layers.2.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
    "language_model.model.layers.20.input_layernorm.weight": "model-00002-of-00002.safetensors",
    "language_model.model.layers.20.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
    "language_model.model.layers.20.mlp.gate_proj.weight": "model-00002-of-00002.safetensors",
    "language_model.model.layers.20.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
    "language_model.model.layers.20.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
    "language_model.model.layers.20.post_feedforward_layernorm.weight": "model-00002-of-00002.safetensors",
    "language_model.model.layers.20.pre_feedforward_layernorm.weight": "model-00002-of-00002.safetensors",
    "language_model.model.layers.20.self_attn.k_proj.weight": "model-00002-of-00002.safetensors",
    "language_model.model.layers.20.self_attn.o_proj.weight": "model-00002-of-00002.safetensors",
    "language_model.model.layers.20.self_attn.q_proj.weight": "model-00002-of-00002.safetensors",
    "language_model.model.layers.20.self_attn.v_proj.weight": "model-00002-of-00002.safetensors",
    "language_model.model.layers.21.input_layernorm.weight": "model-00002-of-00002.safetensors",
    "language_model.model.layers.21.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
    "language_model.model.layers.21.mlp.gate_proj.weight": "model-00002-of-00002.safetensors",
    "language_model.model.layers.21.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
    "language_model.model.layers.21.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
    "language_model.model.layers.21.post_feedforward_layernorm.weight": "model-00002-of-00002.safetensors",
    "language_model.model.layers.21.pre_feedforward_layernorm.weight": "model-00002-of-00002.safetensors",
    "language_model.model.layers.21.self_attn.k_proj.weight": "model-00002-of-00002.safetensors",
    "language_model.model.layers.21.self_attn.o_proj.weight": "model-00002-of-00002.safetensors",
    "language_model.model.layers.21.self_attn.q_proj.weight": "model-00002-of-00002.safetensors",
    "language_model.model.layers.21.self_attn.v_proj.weight": "model-00002-of-00002.safetensors",
    "language_model.model.layers.22.input_layernorm.weight": "model-00002-of-00002.safetensors",
    "language_model.model.layers.22.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
    "language_model.model.layers.22.mlp.gate_proj.weight": "model-00002-of-00002.safetensors",
    "language_model.model.layers.22.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
    "language_model.model.layers.22.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
    "language_model.model.layers.22.post_feedforward_layernorm.weight": "model-00002-of-00002.safetensors",
    "language_model.model.layers.22.pre_feedforward_layernorm.weight": "model-00002-of-00002.safetensors",
    "language_model.model.layers.22.self_attn.k_proj.weight": "model-00002-of-00002.safetensors",
    "language_model.model.layers.22.self_attn.o_proj.weight": "model-00002-of-00002.safetensors",
    "language_model.model.layers.22.self_attn.q_proj.weight": "model-00002-of-00002.safetensors",
    "language_model.model.layers.22.self_attn.v_proj.weight": "model-00002-of-00002.safetensors",
    "language_model.model.layers.23.input_layernorm.weight": "model-00002-of-00002.safetensors",
    "language_model.model.layers.23.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
    "language_model.model.layers.23.mlp.gate_proj.weight": "model-00002-of-00002.safetensors",
    "language_model.model.layers.23.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
    "language_model.model.layers.23.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
    "language_model.model.layers.23.post_feedforward_layernorm.weight": "model-00002-of-00002.safetensors",
    "language_model.model.layers.23.pre_feedforward_layernorm.weight": "model-00002-of-00002.safetensors",
    "language_model.model.layers.23.self_attn.k_proj.weight": "model-00002-of-00002.safetensors",
    "language_model.model.layers.23.self_attn.o_proj.weight": "model-00002-of-00002.safetensors",
    "language_model.model.layers.23.self_attn.q_proj.weight": "model-00002-of-00002.safetensors",
    "language_model.model.layers.23.self_attn.v_proj.weight": "model-00002-of-00002.safetensors",
    "language_model.model.layers.24.input_layernorm.weight": "model-00002-of-00002.safetensors",
    "language_model.model.layers.24.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
    "language_model.model.layers.24.mlp.gate_proj.weight": "model-00002-of-00002.safetensors",
    "language_model.model.layers.24.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
    "language_model.model.layers.24.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
    "language_model.model.layers.24.post_feedforward_layernorm.weight": "model-00002-of-00002.safetensors",
    "language_model.model.layers.24.pre_feedforward_layernorm.weight": "model-00002-of-00002.safetensors",
    "language_model.model.layers.24.self_attn.k_proj.weight": "model-00002-of-00002.safetensors",
    "language_model.model.layers.24.self_attn.o_proj.weight": "model-00002-of-00002.safetensors",
    "language_model.model.layers.24.self_attn.q_proj.weight": "model-00002-of-00002.safetensors",
    "language_model.model.layers.24.self_attn.v_proj.weight": "model-00002-of-00002.safetensors",
    "language_model.model.layers.25.input_layernorm.weight": "model-00002-of-00002.safetensors",
    "language_model.model.layers.25.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
    "language_model.model.layers.25.mlp.gate_proj.weight": "model-00002-of-00002.safetensors",
    "language_model.model.layers.25.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
    "language_model.model.layers.25.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
    "language_model.model.layers.25.post_feedforward_layernorm.weight": "model-00002-of-00002.safetensors",
    "language_model.model.layers.25.pre_feedforward_layernorm.weight": "model-00002-of-00002.safetensors",
    "language_model.model.layers.25.self_attn.k_proj.weight": "model-00002-of-00002.safetensors",
    "language_model.model.layers.25.self_attn.o_proj.weight": "model-00002-of-00002.safetensors",
    "language_model.model.layers.25.self_attn.q_proj.weight": "model-00002-of-00002.safetensors",
    "language_model.model.layers.25.self_attn.v_proj.weight": "model-00002-of-00002.safetensors",
    "language_model.model.layers.3.input_layernorm.weight": "model-00001-of-00002.safetensors",
    "language_model.model.layers.3.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
    "language_model.model.layers.3.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
    "language_model.model.layers.3.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
    "language_model.model.layers.3.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
    "language_model.model.layers.3.post_feedforward_layernorm.weight": "model-00001-of-00002.safetensors",
    "language_model.model.layers.3.pre_feedforward_layernorm.weight": "model-00001-of-00002.safetensors",
    "language_model.model.layers.3.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
    "language_model.model.layers.3.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
    "language_model.model.layers.3.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
    "language_model.model.layers.3.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
    "language_model.model.layers.4.input_layernorm.weight": "model-00001-of-00002.safetensors",
    "language_model.model.layers.4.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
    "language_model.model.layers.4.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
    "language_model.model.layers.4.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
    "language_model.model.layers.4.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
    "language_model.model.layers.4.post_feedforward_layernorm.weight": "model-00001-of-00002.safetensors",
    "language_model.model.layers.4.pre_feedforward_layernorm.weight": "model-00001-of-00002.safetensors",
    "language_model.model.layers.4.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
    "language_model.model.layers.4.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
    "language_model.model.layers.4.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
    "language_model.model.layers.4.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
    "language_model.model.layers.5.input_layernorm.weight": "model-00001-of-00002.safetensors",
    "language_model.model.layers.5.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
    "language_model.model.layers.5.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
    "language_model.model.layers.5.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
    "language_model.model.layers.5.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
    "language_model.model.layers.5.post_feedforward_layernorm.weight": "model-00001-of-00002.safetensors",
    "language_model.model.layers.5.pre_feedforward_layernorm.weight": "model-00001-of-00002.safetensors",
    "language_model.model.layers.5.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
    "language_model.model.layers.5.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
    "language_model.model.layers.5.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
    "language_model.model.layers.5.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
    "language_model.model.layers.6.input_layernorm.weight": "model-00001-of-00002.safetensors",
    "language_model.model.layers.6.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
    "language_model.model.layers.6.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
    "language_model.model.layers.6.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
    "language_model.model.layers.6.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
    "language_model.model.layers.6.post_feedforward_layernorm.weight": "model-00001-of-00002.safetensors",
    "language_model.model.layers.6.pre_feedforward_layernorm.weight": "model-00001-of-00002.safetensors",
    "language_model.model.layers.6.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
    "language_model.model.layers.6.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
    "language_model.model.layers.6.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
    "language_model.model.layers.6.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
    "language_model.model.layers.7.input_layernorm.weight": "model-00001-of-00002.safetensors",
    "language_model.model.layers.7.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
    "language_model.model.layers.7.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
    "language_model.model.layers.7.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
    "language_model.model.layers.7.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
    "language_model.model.layers.7.post_feedforward_layernorm.weight": "model-00001-of-00002.safetensors",
    "language_model.model.layers.7.pre_feedforward_layernorm.weight": "model-00001-of-00002.safetensors",
    "language_model.model.layers.7.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
    "language_model.model.layers.7.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
    "language_model.model.layers.7.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
    "language_model.model.layers.7.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
    "language_model.model.layers.8.input_layernorm.weight": "model-00001-of-00002.safetensors",
    "language_model.model.layers.8.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
    "language_model.model.layers.8.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
    "language_model.model.layers.8.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
    "language_model.model.layers.8.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
    "language_model.model.layers.8.post_feedforward_layernorm.weight": "model-00001-of-00002.safetensors",
    "language_model.model.layers.8.pre_feedforward_layernorm.weight": "model-00001-of-00002.safetensors",
    "language_model.model.layers.8.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
    "language_model.model.layers.8.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
    "language_model.model.layers.8.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
    "language_model.model.layers.8.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
    "language_model.model.layers.9.input_layernorm.weight": "model-00001-of-00002.safetensors",
    "language_model.model.layers.9.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
    "language_model.model.layers.9.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
    "language_model.model.layers.9.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
    "language_model.model.layers.9.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
    "language_model.model.layers.9.post_feedforward_layernorm.weight": "model-00001-of-00002.safetensors",
    "language_model.model.layers.9.pre_feedforward_layernorm.weight": "model-00001-of-00002.safetensors",
    "language_model.model.layers.9.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
    "language_model.model.layers.9.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
    "language_model.model.layers.9.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
    "language_model.model.layers.9.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"language_model.model.norm.weight": "model-00002-of-00002.safetensors",
|
||||||
|
"multi_modal_projector.linear.bias": "model-00001-of-00002.safetensors",
|
||||||
|
"multi_modal_projector.linear.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"vision_tower.vision_model.embeddings.patch_embedding.bias": "model-00001-of-00002.safetensors",
|
||||||
|
"vision_tower.vision_model.embeddings.patch_embedding.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"vision_tower.vision_model.embeddings.position_embedding.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"vision_tower.vision_model.encoder.layers.0.layer_norm1.bias": "model-00001-of-00002.safetensors",
|
||||||
|
"vision_tower.vision_model.encoder.layers.0.layer_norm1.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"vision_tower.vision_model.encoder.layers.0.layer_norm2.bias": "model-00001-of-00002.safetensors",
|
||||||
|
"vision_tower.vision_model.encoder.layers.0.layer_norm2.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"vision_tower.vision_model.encoder.layers.0.mlp.fc1.bias": "model-00001-of-00002.safetensors",
|
||||||
|
"vision_tower.vision_model.encoder.layers.0.mlp.fc1.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"vision_tower.vision_model.encoder.layers.0.mlp.fc2.bias": "model-00001-of-00002.safetensors",
|
||||||
|
"vision_tower.vision_model.encoder.layers.0.mlp.fc2.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"vision_tower.vision_model.encoder.layers.0.self_attn.k_proj.bias": "model-00001-of-00002.safetensors",
|
||||||
|
"vision_tower.vision_model.encoder.layers.0.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"vision_tower.vision_model.encoder.layers.0.self_attn.out_proj.bias": "model-00001-of-00002.safetensors",
|
||||||
|
"vision_tower.vision_model.encoder.layers.0.self_attn.out_proj.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"vision_tower.vision_model.encoder.layers.0.self_attn.q_proj.bias": "model-00001-of-00002.safetensors",
|
||||||
|
"vision_tower.vision_model.encoder.layers.0.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"vision_tower.vision_model.encoder.layers.0.self_attn.v_proj.bias": "model-00001-of-00002.safetensors",
|
||||||
|
"vision_tower.vision_model.encoder.layers.0.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"vision_tower.vision_model.encoder.layers.1.layer_norm1.bias": "model-00001-of-00002.safetensors",
|
||||||
|
"vision_tower.vision_model.encoder.layers.1.layer_norm1.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"vision_tower.vision_model.encoder.layers.1.layer_norm2.bias": "model-00001-of-00002.safetensors",
|
||||||
|
"vision_tower.vision_model.encoder.layers.1.layer_norm2.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"vision_tower.vision_model.encoder.layers.1.mlp.fc1.bias": "model-00001-of-00002.safetensors",
|
||||||
|
"vision_tower.vision_model.encoder.layers.1.mlp.fc1.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"vision_tower.vision_model.encoder.layers.1.mlp.fc2.bias": "model-00001-of-00002.safetensors",
|
||||||
|
"vision_tower.vision_model.encoder.layers.1.mlp.fc2.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"vision_tower.vision_model.encoder.layers.1.self_attn.k_proj.bias": "model-00001-of-00002.safetensors",
|
||||||
|
"vision_tower.vision_model.encoder.layers.1.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"vision_tower.vision_model.encoder.layers.1.self_attn.out_proj.bias": "model-00001-of-00002.safetensors",
|
||||||
|
"vision_tower.vision_model.encoder.layers.1.self_attn.out_proj.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"vision_tower.vision_model.encoder.layers.1.self_attn.q_proj.bias": "model-00001-of-00002.safetensors",
|
||||||
|
"vision_tower.vision_model.encoder.layers.1.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"vision_tower.vision_model.encoder.layers.1.self_attn.v_proj.bias": "model-00001-of-00002.safetensors",
|
||||||
|
"vision_tower.vision_model.encoder.layers.1.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"vision_tower.vision_model.encoder.layers.10.layer_norm1.bias": "model-00001-of-00002.safetensors",
|
||||||
|
"vision_tower.vision_model.encoder.layers.10.layer_norm1.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"vision_tower.vision_model.encoder.layers.10.layer_norm2.bias": "model-00001-of-00002.safetensors",
|
||||||
|
"vision_tower.vision_model.encoder.layers.10.layer_norm2.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"vision_tower.vision_model.encoder.layers.10.mlp.fc1.bias": "model-00001-of-00002.safetensors",
|
||||||
|
"vision_tower.vision_model.encoder.layers.10.mlp.fc1.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"vision_tower.vision_model.encoder.layers.10.mlp.fc2.bias": "model-00001-of-00002.safetensors",
|
||||||
|
"vision_tower.vision_model.encoder.layers.10.mlp.fc2.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"vision_tower.vision_model.encoder.layers.10.self_attn.k_proj.bias": "model-00001-of-00002.safetensors",
|
||||||
|
"vision_tower.vision_model.encoder.layers.10.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"vision_tower.vision_model.encoder.layers.10.self_attn.out_proj.bias": "model-00001-of-00002.safetensors",
|
||||||
|
"vision_tower.vision_model.encoder.layers.10.self_attn.out_proj.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"vision_tower.vision_model.encoder.layers.10.self_attn.q_proj.bias": "model-00001-of-00002.safetensors",
|
||||||
|
"vision_tower.vision_model.encoder.layers.10.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"vision_tower.vision_model.encoder.layers.10.self_attn.v_proj.bias": "model-00001-of-00002.safetensors",
|
||||||
|
"vision_tower.vision_model.encoder.layers.10.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"vision_tower.vision_model.encoder.layers.11.layer_norm1.bias": "model-00001-of-00002.safetensors",
|
||||||
|
"vision_tower.vision_model.encoder.layers.11.layer_norm1.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"vision_tower.vision_model.encoder.layers.11.layer_norm2.bias": "model-00001-of-00002.safetensors",
|
||||||
|
"vision_tower.vision_model.encoder.layers.11.layer_norm2.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"vision_tower.vision_model.encoder.layers.11.mlp.fc1.bias": "model-00001-of-00002.safetensors",
|
||||||
|
"vision_tower.vision_model.encoder.layers.11.mlp.fc1.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"vision_tower.vision_model.encoder.layers.11.mlp.fc2.bias": "model-00001-of-00002.safetensors",
|
||||||
|
"vision_tower.vision_model.encoder.layers.11.mlp.fc2.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"vision_tower.vision_model.encoder.layers.11.self_attn.k_proj.bias": "model-00001-of-00002.safetensors",
|
||||||
|
"vision_tower.vision_model.encoder.layers.11.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"vision_tower.vision_model.encoder.layers.11.self_attn.out_proj.bias": "model-00001-of-00002.safetensors",
|
||||||
|
"vision_tower.vision_model.encoder.layers.11.self_attn.out_proj.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"vision_tower.vision_model.encoder.layers.11.self_attn.q_proj.bias": "model-00001-of-00002.safetensors",
|
||||||
|
"vision_tower.vision_model.encoder.layers.11.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"vision_tower.vision_model.encoder.layers.11.self_attn.v_proj.bias": "model-00001-of-00002.safetensors",
|
||||||
|
"vision_tower.vision_model.encoder.layers.11.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"vision_tower.vision_model.encoder.layers.12.layer_norm1.bias": "model-00001-of-00002.safetensors",
|
||||||
|
"vision_tower.vision_model.encoder.layers.12.layer_norm1.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"vision_tower.vision_model.encoder.layers.12.layer_norm2.bias": "model-00001-of-00002.safetensors",
|
||||||
|
"vision_tower.vision_model.encoder.layers.12.layer_norm2.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"vision_tower.vision_model.encoder.layers.12.mlp.fc1.bias": "model-00001-of-00002.safetensors",
|
||||||
|
"vision_tower.vision_model.encoder.layers.12.mlp.fc1.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"vision_tower.vision_model.encoder.layers.12.mlp.fc2.bias": "model-00001-of-00002.safetensors",
|
||||||
|
"vision_tower.vision_model.encoder.layers.12.mlp.fc2.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"vision_tower.vision_model.encoder.layers.12.self_attn.k_proj.bias": "model-00001-of-00002.safetensors",
|
||||||
|
"vision_tower.vision_model.encoder.layers.12.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"vision_tower.vision_model.encoder.layers.12.self_attn.out_proj.bias": "model-00001-of-00002.safetensors",
|
||||||
|
"vision_tower.vision_model.encoder.layers.12.self_attn.out_proj.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"vision_tower.vision_model.encoder.layers.12.self_attn.q_proj.bias": "model-00001-of-00002.safetensors",
|
||||||
|
"vision_tower.vision_model.encoder.layers.12.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"vision_tower.vision_model.encoder.layers.12.self_attn.v_proj.bias": "model-00001-of-00002.safetensors",
|
||||||
|
"vision_tower.vision_model.encoder.layers.12.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"vision_tower.vision_model.encoder.layers.13.layer_norm1.bias": "model-00001-of-00002.safetensors",
|
||||||
|
"vision_tower.vision_model.encoder.layers.13.layer_norm1.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"vision_tower.vision_model.encoder.layers.13.layer_norm2.bias": "model-00001-of-00002.safetensors",
|
||||||
|
"vision_tower.vision_model.encoder.layers.13.layer_norm2.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"vision_tower.vision_model.encoder.layers.13.mlp.fc1.bias": "model-00001-of-00002.safetensors",
|
||||||
|
"vision_tower.vision_model.encoder.layers.13.mlp.fc1.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"vision_tower.vision_model.encoder.layers.13.mlp.fc2.bias": "model-00001-of-00002.safetensors",
|
||||||
|
"vision_tower.vision_model.encoder.layers.13.mlp.fc2.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"vision_tower.vision_model.encoder.layers.13.self_attn.k_proj.bias": "model-00001-of-00002.safetensors",
|
||||||
|
"vision_tower.vision_model.encoder.layers.13.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"vision_tower.vision_model.encoder.layers.13.self_attn.out_proj.bias": "model-00001-of-00002.safetensors",
|
||||||
|
"vision_tower.vision_model.encoder.layers.13.self_attn.out_proj.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"vision_tower.vision_model.encoder.layers.13.self_attn.q_proj.bias": "model-00001-of-00002.safetensors",
|
||||||
|
"vision_tower.vision_model.encoder.layers.13.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"vision_tower.vision_model.encoder.layers.13.self_attn.v_proj.bias": "model-00001-of-00002.safetensors",
|
||||||
|
"vision_tower.vision_model.encoder.layers.13.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"vision_tower.vision_model.encoder.layers.14.layer_norm1.bias": "model-00001-of-00002.safetensors",
|
||||||
|
"vision_tower.vision_model.encoder.layers.14.layer_norm1.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"vision_tower.vision_model.encoder.layers.14.layer_norm2.bias": "model-00001-of-00002.safetensors",
|
||||||
|
"vision_tower.vision_model.encoder.layers.14.layer_norm2.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"vision_tower.vision_model.encoder.layers.14.mlp.fc1.bias": "model-00001-of-00002.safetensors",
|
||||||
|
"vision_tower.vision_model.encoder.layers.14.mlp.fc1.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"vision_tower.vision_model.encoder.layers.14.mlp.fc2.bias": "model-00001-of-00002.safetensors",
|
||||||
|
"vision_tower.vision_model.encoder.layers.14.mlp.fc2.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"vision_tower.vision_model.encoder.layers.14.self_attn.k_proj.bias": "model-00001-of-00002.safetensors",
|
||||||
|
"vision_tower.vision_model.encoder.layers.14.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"vision_tower.vision_model.encoder.layers.14.self_attn.out_proj.bias": "model-00001-of-00002.safetensors",
|
||||||
|
"vision_tower.vision_model.encoder.layers.14.self_attn.out_proj.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"vision_tower.vision_model.encoder.layers.14.self_attn.q_proj.bias": "model-00001-of-00002.safetensors",
|
||||||
|
"vision_tower.vision_model.encoder.layers.14.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"vision_tower.vision_model.encoder.layers.14.self_attn.v_proj.bias": "model-00001-of-00002.safetensors",
|
||||||
|
"vision_tower.vision_model.encoder.layers.14.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"vision_tower.vision_model.encoder.layers.15.layer_norm1.bias": "model-00001-of-00002.safetensors",
|
||||||
|
"vision_tower.vision_model.encoder.layers.15.layer_norm1.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"vision_tower.vision_model.encoder.layers.15.layer_norm2.bias": "model-00001-of-00002.safetensors",
|
||||||
|
"vision_tower.vision_model.encoder.layers.15.layer_norm2.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"vision_tower.vision_model.encoder.layers.15.mlp.fc1.bias": "model-00001-of-00002.safetensors",
|
||||||
|
"vision_tower.vision_model.encoder.layers.15.mlp.fc1.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"vision_tower.vision_model.encoder.layers.15.mlp.fc2.bias": "model-00001-of-00002.safetensors",
|
||||||
|
"vision_tower.vision_model.encoder.layers.15.mlp.fc2.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"vision_tower.vision_model.encoder.layers.15.self_attn.k_proj.bias": "model-00001-of-00002.safetensors",
|
||||||
|
"vision_tower.vision_model.encoder.layers.15.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"vision_tower.vision_model.encoder.layers.15.self_attn.out_proj.bias": "model-00001-of-00002.safetensors",
|
||||||
|
"vision_tower.vision_model.encoder.layers.15.self_attn.out_proj.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"vision_tower.vision_model.encoder.layers.15.self_attn.q_proj.bias": "model-00001-of-00002.safetensors",
|
||||||
|
"vision_tower.vision_model.encoder.layers.15.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"vision_tower.vision_model.encoder.layers.15.self_attn.v_proj.bias": "model-00001-of-00002.safetensors",
|
||||||
|
"vision_tower.vision_model.encoder.layers.15.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"vision_tower.vision_model.encoder.layers.16.layer_norm1.bias": "model-00001-of-00002.safetensors",
|
||||||
|
"vision_tower.vision_model.encoder.layers.16.layer_norm1.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"vision_tower.vision_model.encoder.layers.16.layer_norm2.bias": "model-00001-of-00002.safetensors",
|
||||||
|
"vision_tower.vision_model.encoder.layers.16.layer_norm2.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"vision_tower.vision_model.encoder.layers.16.mlp.fc1.bias": "model-00001-of-00002.safetensors",
|
||||||
|
"vision_tower.vision_model.encoder.layers.16.mlp.fc1.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"vision_tower.vision_model.encoder.layers.16.mlp.fc2.bias": "model-00001-of-00002.safetensors",
|
||||||
|
"vision_tower.vision_model.encoder.layers.16.mlp.fc2.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"vision_tower.vision_model.encoder.layers.16.self_attn.k_proj.bias": "model-00001-of-00002.safetensors",
|
||||||
|
"vision_tower.vision_model.encoder.layers.16.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"vision_tower.vision_model.encoder.layers.16.self_attn.out_proj.bias": "model-00001-of-00002.safetensors",
|
||||||
|
"vision_tower.vision_model.encoder.layers.16.self_attn.out_proj.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"vision_tower.vision_model.encoder.layers.16.self_attn.q_proj.bias": "model-00001-of-00002.safetensors",
|
||||||
|
"vision_tower.vision_model.encoder.layers.16.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"vision_tower.vision_model.encoder.layers.16.self_attn.v_proj.bias": "model-00001-of-00002.safetensors",
|
||||||
|
"vision_tower.vision_model.encoder.layers.16.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"vision_tower.vision_model.encoder.layers.17.layer_norm1.bias": "model-00001-of-00002.safetensors",
|
||||||
|
"vision_tower.vision_model.encoder.layers.17.layer_norm1.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"vision_tower.vision_model.encoder.layers.17.layer_norm2.bias": "model-00001-of-00002.safetensors",
|
||||||
|
"vision_tower.vision_model.encoder.layers.17.layer_norm2.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"vision_tower.vision_model.encoder.layers.17.mlp.fc1.bias": "model-00001-of-00002.safetensors",
|
||||||
|
"vision_tower.vision_model.encoder.layers.17.mlp.fc1.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"vision_tower.vision_model.encoder.layers.17.mlp.fc2.bias": "model-00001-of-00002.safetensors",
|
||||||
|
"vision_tower.vision_model.encoder.layers.17.mlp.fc2.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"vision_tower.vision_model.encoder.layers.17.self_attn.k_proj.bias": "model-00001-of-00002.safetensors",
|
||||||
|
"vision_tower.vision_model.encoder.layers.17.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"vision_tower.vision_model.encoder.layers.17.self_attn.out_proj.bias": "model-00001-of-00002.safetensors",
|
||||||
|
"vision_tower.vision_model.encoder.layers.17.self_attn.out_proj.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"vision_tower.vision_model.encoder.layers.17.self_attn.q_proj.bias": "model-00001-of-00002.safetensors",
|
||||||
|
"vision_tower.vision_model.encoder.layers.17.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"vision_tower.vision_model.encoder.layers.17.self_attn.v_proj.bias": "model-00001-of-00002.safetensors",
|
||||||
|
"vision_tower.vision_model.encoder.layers.17.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"vision_tower.vision_model.encoder.layers.18.layer_norm1.bias": "model-00001-of-00002.safetensors",
|
||||||
|
"vision_tower.vision_model.encoder.layers.18.layer_norm1.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"vision_tower.vision_model.encoder.layers.18.layer_norm2.bias": "model-00001-of-00002.safetensors",
|
||||||
|
"vision_tower.vision_model.encoder.layers.18.layer_norm2.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"vision_tower.vision_model.encoder.layers.18.mlp.fc1.bias": "model-00001-of-00002.safetensors",
|
||||||
|
"vision_tower.vision_model.encoder.layers.18.mlp.fc1.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"vision_tower.vision_model.encoder.layers.18.mlp.fc2.bias": "model-00001-of-00002.safetensors",
|
||||||
|
"vision_tower.vision_model.encoder.layers.18.mlp.fc2.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"vision_tower.vision_model.encoder.layers.18.self_attn.k_proj.bias": "model-00001-of-00002.safetensors",
|
||||||
|
"vision_tower.vision_model.encoder.layers.18.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"vision_tower.vision_model.encoder.layers.18.self_attn.out_proj.bias": "model-00001-of-00002.safetensors",
|
||||||
|
"vision_tower.vision_model.encoder.layers.18.self_attn.out_proj.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"vision_tower.vision_model.encoder.layers.18.self_attn.q_proj.bias": "model-00001-of-00002.safetensors",
|
||||||
|
"vision_tower.vision_model.encoder.layers.18.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"vision_tower.vision_model.encoder.layers.18.self_attn.v_proj.bias": "model-00001-of-00002.safetensors",
|
||||||
|
"vision_tower.vision_model.encoder.layers.18.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"vision_tower.vision_model.encoder.layers.19.layer_norm1.bias": "model-00001-of-00002.safetensors",
|
||||||
|
"vision_tower.vision_model.encoder.layers.19.layer_norm1.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"vision_tower.vision_model.encoder.layers.19.layer_norm2.bias": "model-00001-of-00002.safetensors",
|
||||||
|
"vision_tower.vision_model.encoder.layers.19.layer_norm2.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"vision_tower.vision_model.encoder.layers.19.mlp.fc1.bias": "model-00001-of-00002.safetensors",
|
||||||
|
"vision_tower.vision_model.encoder.layers.19.mlp.fc1.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"vision_tower.vision_model.encoder.layers.19.mlp.fc2.bias": "model-00001-of-00002.safetensors",
|
||||||
|
"vision_tower.vision_model.encoder.layers.19.mlp.fc2.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"vision_tower.vision_model.encoder.layers.19.self_attn.k_proj.bias": "model-00001-of-00002.safetensors",
|
||||||
|
"vision_tower.vision_model.encoder.layers.19.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"vision_tower.vision_model.encoder.layers.19.self_attn.out_proj.bias": "model-00001-of-00002.safetensors",
|
||||||
|
"vision_tower.vision_model.encoder.layers.19.self_attn.out_proj.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"vision_tower.vision_model.encoder.layers.19.self_attn.q_proj.bias": "model-00001-of-00002.safetensors",
|
||||||
|
"vision_tower.vision_model.encoder.layers.19.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"vision_tower.vision_model.encoder.layers.19.self_attn.v_proj.bias": "model-00001-of-00002.safetensors",
|
||||||
|
"vision_tower.vision_model.encoder.layers.19.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"vision_tower.vision_model.encoder.layers.2.layer_norm1.bias": "model-00001-of-00002.safetensors",
|
||||||
|
"vision_tower.vision_model.encoder.layers.2.layer_norm1.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"vision_tower.vision_model.encoder.layers.2.layer_norm2.bias": "model-00001-of-00002.safetensors",
|
||||||
|
"vision_tower.vision_model.encoder.layers.2.layer_norm2.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"vision_tower.vision_model.encoder.layers.2.mlp.fc1.bias": "model-00001-of-00002.safetensors",
|
||||||
|
"vision_tower.vision_model.encoder.layers.2.mlp.fc1.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"vision_tower.vision_model.encoder.layers.2.mlp.fc2.bias": "model-00001-of-00002.safetensors",
|
||||||
|
"vision_tower.vision_model.encoder.layers.2.mlp.fc2.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"vision_tower.vision_model.encoder.layers.2.self_attn.k_proj.bias": "model-00001-of-00002.safetensors",
|
||||||
|
"vision_tower.vision_model.encoder.layers.2.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"vision_tower.vision_model.encoder.layers.2.self_attn.out_proj.bias": "model-00001-of-00002.safetensors",
|
||||||
|
"vision_tower.vision_model.encoder.layers.2.self_attn.out_proj.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"vision_tower.vision_model.encoder.layers.2.self_attn.q_proj.bias": "model-00001-of-00002.safetensors",
|
||||||
|
"vision_tower.vision_model.encoder.layers.2.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"vision_tower.vision_model.encoder.layers.2.self_attn.v_proj.bias": "model-00001-of-00002.safetensors",
|
||||||
|
"vision_tower.vision_model.encoder.layers.2.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"vision_tower.vision_model.encoder.layers.20.layer_norm1.bias": "model-00001-of-00002.safetensors",
|
||||||
|
"vision_tower.vision_model.encoder.layers.20.layer_norm1.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"vision_tower.vision_model.encoder.layers.20.layer_norm2.bias": "model-00001-of-00002.safetensors",
|
||||||
|
"vision_tower.vision_model.encoder.layers.20.layer_norm2.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"vision_tower.vision_model.encoder.layers.20.mlp.fc1.bias": "model-00001-of-00002.safetensors",
|
||||||
|
"vision_tower.vision_model.encoder.layers.20.mlp.fc1.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"vision_tower.vision_model.encoder.layers.20.mlp.fc2.bias": "model-00001-of-00002.safetensors",
|
||||||
|
"vision_tower.vision_model.encoder.layers.20.mlp.fc2.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"vision_tower.vision_model.encoder.layers.20.self_attn.k_proj.bias": "model-00001-of-00002.safetensors",
|
||||||
|
"vision_tower.vision_model.encoder.layers.20.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"vision_tower.vision_model.encoder.layers.20.self_attn.out_proj.bias": "model-00001-of-00002.safetensors",
|
||||||
|
"vision_tower.vision_model.encoder.layers.20.self_attn.out_proj.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"vision_tower.vision_model.encoder.layers.20.self_attn.q_proj.bias": "model-00001-of-00002.safetensors",
|
||||||
|
"vision_tower.vision_model.encoder.layers.20.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"vision_tower.vision_model.encoder.layers.20.self_attn.v_proj.bias": "model-00001-of-00002.safetensors",
|
||||||
|
"vision_tower.vision_model.encoder.layers.20.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"vision_tower.vision_model.encoder.layers.21.layer_norm1.bias": "model-00001-of-00002.safetensors",
|
||||||
|
"vision_tower.vision_model.encoder.layers.21.layer_norm1.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"vision_tower.vision_model.encoder.layers.21.layer_norm2.bias": "model-00001-of-00002.safetensors",
|
||||||
|
"vision_tower.vision_model.encoder.layers.21.layer_norm2.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"vision_tower.vision_model.encoder.layers.21.mlp.fc1.bias": "model-00001-of-00002.safetensors",
|
||||||
|
"vision_tower.vision_model.encoder.layers.21.mlp.fc1.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"vision_tower.vision_model.encoder.layers.21.mlp.fc2.bias": "model-00001-of-00002.safetensors",
|
||||||
|
"vision_tower.vision_model.encoder.layers.21.mlp.fc2.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"vision_tower.vision_model.encoder.layers.21.self_attn.k_proj.bias": "model-00001-of-00002.safetensors",
|
||||||
|
"vision_tower.vision_model.encoder.layers.21.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"vision_tower.vision_model.encoder.layers.21.self_attn.out_proj.bias": "model-00001-of-00002.safetensors",
|
||||||
|
"vision_tower.vision_model.encoder.layers.21.self_attn.out_proj.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"vision_tower.vision_model.encoder.layers.21.self_attn.q_proj.bias": "model-00001-of-00002.safetensors",
|
||||||
|
"vision_tower.vision_model.encoder.layers.21.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"vision_tower.vision_model.encoder.layers.21.self_attn.v_proj.bias": "model-00001-of-00002.safetensors",
|
||||||
|
"vision_tower.vision_model.encoder.layers.21.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"vision_tower.vision_model.encoder.layers.22.layer_norm1.bias": "model-00001-of-00002.safetensors",
|
||||||
|
"vision_tower.vision_model.encoder.layers.22.layer_norm1.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"vision_tower.vision_model.encoder.layers.22.layer_norm2.bias": "model-00001-of-00002.safetensors",
|
||||||
|
"vision_tower.vision_model.encoder.layers.22.layer_norm2.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"vision_tower.vision_model.encoder.layers.22.mlp.fc1.bias": "model-00001-of-00002.safetensors",
|
||||||
|
"vision_tower.vision_model.encoder.layers.22.mlp.fc1.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"vision_tower.vision_model.encoder.layers.22.mlp.fc2.bias": "model-00001-of-00002.safetensors",
|
||||||
|
"vision_tower.vision_model.encoder.layers.22.mlp.fc2.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"vision_tower.vision_model.encoder.layers.22.self_attn.k_proj.bias": "model-00001-of-00002.safetensors",
|
||||||
|
"vision_tower.vision_model.encoder.layers.22.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"vision_tower.vision_model.encoder.layers.22.self_attn.out_proj.bias": "model-00001-of-00002.safetensors",
|
||||||
|
"vision_tower.vision_model.encoder.layers.22.self_attn.out_proj.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"vision_tower.vision_model.encoder.layers.22.self_attn.q_proj.bias": "model-00001-of-00002.safetensors",
|
||||||
|
"vision_tower.vision_model.encoder.layers.22.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"vision_tower.vision_model.encoder.layers.22.self_attn.v_proj.bias": "model-00001-of-00002.safetensors",
|
||||||
|
"vision_tower.vision_model.encoder.layers.22.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"vision_tower.vision_model.encoder.layers.23.layer_norm1.bias": "model-00001-of-00002.safetensors",
|
||||||
|
"vision_tower.vision_model.encoder.layers.23.layer_norm1.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"vision_tower.vision_model.encoder.layers.23.layer_norm2.bias": "model-00001-of-00002.safetensors",
|
||||||
|
"vision_tower.vision_model.encoder.layers.23.layer_norm2.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"vision_tower.vision_model.encoder.layers.23.mlp.fc1.bias": "model-00001-of-00002.safetensors",
|
||||||
|
"vision_tower.vision_model.encoder.layers.23.mlp.fc1.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"vision_tower.vision_model.encoder.layers.23.mlp.fc2.bias": "model-00001-of-00002.safetensors",
|
||||||
|
"vision_tower.vision_model.encoder.layers.23.mlp.fc2.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"vision_tower.vision_model.encoder.layers.23.self_attn.k_proj.bias": "model-00001-of-00002.safetensors",
|
||||||
|
"vision_tower.vision_model.encoder.layers.23.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"vision_tower.vision_model.encoder.layers.23.self_attn.out_proj.bias": "model-00001-of-00002.safetensors",
|
||||||
|
"vision_tower.vision_model.encoder.layers.23.self_attn.out_proj.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"vision_tower.vision_model.encoder.layers.23.self_attn.q_proj.bias": "model-00001-of-00002.safetensors",
|
||||||
|
"vision_tower.vision_model.encoder.layers.23.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"vision_tower.vision_model.encoder.layers.23.self_attn.v_proj.bias": "model-00001-of-00002.safetensors",
|
||||||
|
"vision_tower.vision_model.encoder.layers.23.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
"vision_tower.vision_model.encoder.layers.24.layer_norm1.bias": "model-00001-of-00002.safetensors",
"vision_tower.vision_model.encoder.layers.24.layer_norm1.weight": "model-00001-of-00002.safetensors",
"vision_tower.vision_model.encoder.layers.24.layer_norm2.bias": "model-00001-of-00002.safetensors",
"vision_tower.vision_model.encoder.layers.24.layer_norm2.weight": "model-00001-of-00002.safetensors",
"vision_tower.vision_model.encoder.layers.24.mlp.fc1.bias": "model-00001-of-00002.safetensors",
"vision_tower.vision_model.encoder.layers.24.mlp.fc1.weight": "model-00001-of-00002.safetensors",
"vision_tower.vision_model.encoder.layers.24.mlp.fc2.bias": "model-00001-of-00002.safetensors",
"vision_tower.vision_model.encoder.layers.24.mlp.fc2.weight": "model-00001-of-00002.safetensors",
"vision_tower.vision_model.encoder.layers.24.self_attn.k_proj.bias": "model-00001-of-00002.safetensors",
"vision_tower.vision_model.encoder.layers.24.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
"vision_tower.vision_model.encoder.layers.24.self_attn.out_proj.bias": "model-00001-of-00002.safetensors",
"vision_tower.vision_model.encoder.layers.24.self_attn.out_proj.weight": "model-00001-of-00002.safetensors",
"vision_tower.vision_model.encoder.layers.24.self_attn.q_proj.bias": "model-00001-of-00002.safetensors",
"vision_tower.vision_model.encoder.layers.24.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
"vision_tower.vision_model.encoder.layers.24.self_attn.v_proj.bias": "model-00001-of-00002.safetensors",
"vision_tower.vision_model.encoder.layers.24.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
"vision_tower.vision_model.encoder.layers.25.layer_norm1.bias": "model-00001-of-00002.safetensors",
"vision_tower.vision_model.encoder.layers.25.layer_norm1.weight": "model-00001-of-00002.safetensors",
"vision_tower.vision_model.encoder.layers.25.layer_norm2.bias": "model-00001-of-00002.safetensors",
"vision_tower.vision_model.encoder.layers.25.layer_norm2.weight": "model-00001-of-00002.safetensors",
"vision_tower.vision_model.encoder.layers.25.mlp.fc1.bias": "model-00001-of-00002.safetensors",
"vision_tower.vision_model.encoder.layers.25.mlp.fc1.weight": "model-00001-of-00002.safetensors",
"vision_tower.vision_model.encoder.layers.25.mlp.fc2.bias": "model-00001-of-00002.safetensors",
"vision_tower.vision_model.encoder.layers.25.mlp.fc2.weight": "model-00001-of-00002.safetensors",
"vision_tower.vision_model.encoder.layers.25.self_attn.k_proj.bias": "model-00001-of-00002.safetensors",
"vision_tower.vision_model.encoder.layers.25.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
"vision_tower.vision_model.encoder.layers.25.self_attn.out_proj.bias": "model-00001-of-00002.safetensors",
"vision_tower.vision_model.encoder.layers.25.self_attn.out_proj.weight": "model-00001-of-00002.safetensors",
"vision_tower.vision_model.encoder.layers.25.self_attn.q_proj.bias": "model-00001-of-00002.safetensors",
"vision_tower.vision_model.encoder.layers.25.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
"vision_tower.vision_model.encoder.layers.25.self_attn.v_proj.bias": "model-00001-of-00002.safetensors",
"vision_tower.vision_model.encoder.layers.25.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
"vision_tower.vision_model.encoder.layers.26.layer_norm1.bias": "model-00001-of-00002.safetensors",
"vision_tower.vision_model.encoder.layers.26.layer_norm1.weight": "model-00001-of-00002.safetensors",
"vision_tower.vision_model.encoder.layers.26.layer_norm2.bias": "model-00001-of-00002.safetensors",
"vision_tower.vision_model.encoder.layers.26.layer_norm2.weight": "model-00001-of-00002.safetensors",
"vision_tower.vision_model.encoder.layers.26.mlp.fc1.bias": "model-00001-of-00002.safetensors",
"vision_tower.vision_model.encoder.layers.26.mlp.fc1.weight": "model-00001-of-00002.safetensors",
"vision_tower.vision_model.encoder.layers.26.mlp.fc2.bias": "model-00001-of-00002.safetensors",
"vision_tower.vision_model.encoder.layers.26.mlp.fc2.weight": "model-00001-of-00002.safetensors",
"vision_tower.vision_model.encoder.layers.26.self_attn.k_proj.bias": "model-00001-of-00002.safetensors",
"vision_tower.vision_model.encoder.layers.26.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
"vision_tower.vision_model.encoder.layers.26.self_attn.out_proj.bias": "model-00001-of-00002.safetensors",
"vision_tower.vision_model.encoder.layers.26.self_attn.out_proj.weight": "model-00001-of-00002.safetensors",
"vision_tower.vision_model.encoder.layers.26.self_attn.q_proj.bias": "model-00001-of-00002.safetensors",
"vision_tower.vision_model.encoder.layers.26.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
"vision_tower.vision_model.encoder.layers.26.self_attn.v_proj.bias": "model-00001-of-00002.safetensors",
"vision_tower.vision_model.encoder.layers.26.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
"vision_tower.vision_model.encoder.layers.3.layer_norm1.bias": "model-00001-of-00002.safetensors",
"vision_tower.vision_model.encoder.layers.3.layer_norm1.weight": "model-00001-of-00002.safetensors",
"vision_tower.vision_model.encoder.layers.3.layer_norm2.bias": "model-00001-of-00002.safetensors",
"vision_tower.vision_model.encoder.layers.3.layer_norm2.weight": "model-00001-of-00002.safetensors",
"vision_tower.vision_model.encoder.layers.3.mlp.fc1.bias": "model-00001-of-00002.safetensors",
"vision_tower.vision_model.encoder.layers.3.mlp.fc1.weight": "model-00001-of-00002.safetensors",
"vision_tower.vision_model.encoder.layers.3.mlp.fc2.bias": "model-00001-of-00002.safetensors",
"vision_tower.vision_model.encoder.layers.3.mlp.fc2.weight": "model-00001-of-00002.safetensors",
"vision_tower.vision_model.encoder.layers.3.self_attn.k_proj.bias": "model-00001-of-00002.safetensors",
"vision_tower.vision_model.encoder.layers.3.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
"vision_tower.vision_model.encoder.layers.3.self_attn.out_proj.bias": "model-00001-of-00002.safetensors",
"vision_tower.vision_model.encoder.layers.3.self_attn.out_proj.weight": "model-00001-of-00002.safetensors",
"vision_tower.vision_model.encoder.layers.3.self_attn.q_proj.bias": "model-00001-of-00002.safetensors",
"vision_tower.vision_model.encoder.layers.3.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
"vision_tower.vision_model.encoder.layers.3.self_attn.v_proj.bias": "model-00001-of-00002.safetensors",
"vision_tower.vision_model.encoder.layers.3.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
"vision_tower.vision_model.encoder.layers.4.layer_norm1.bias": "model-00001-of-00002.safetensors",
"vision_tower.vision_model.encoder.layers.4.layer_norm1.weight": "model-00001-of-00002.safetensors",
"vision_tower.vision_model.encoder.layers.4.layer_norm2.bias": "model-00001-of-00002.safetensors",
"vision_tower.vision_model.encoder.layers.4.layer_norm2.weight": "model-00001-of-00002.safetensors",
"vision_tower.vision_model.encoder.layers.4.mlp.fc1.bias": "model-00001-of-00002.safetensors",
"vision_tower.vision_model.encoder.layers.4.mlp.fc1.weight": "model-00001-of-00002.safetensors",
"vision_tower.vision_model.encoder.layers.4.mlp.fc2.bias": "model-00001-of-00002.safetensors",
"vision_tower.vision_model.encoder.layers.4.mlp.fc2.weight": "model-00001-of-00002.safetensors",
"vision_tower.vision_model.encoder.layers.4.self_attn.k_proj.bias": "model-00001-of-00002.safetensors",
"vision_tower.vision_model.encoder.layers.4.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
"vision_tower.vision_model.encoder.layers.4.self_attn.out_proj.bias": "model-00001-of-00002.safetensors",
"vision_tower.vision_model.encoder.layers.4.self_attn.out_proj.weight": "model-00001-of-00002.safetensors",
"vision_tower.vision_model.encoder.layers.4.self_attn.q_proj.bias": "model-00001-of-00002.safetensors",
"vision_tower.vision_model.encoder.layers.4.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
"vision_tower.vision_model.encoder.layers.4.self_attn.v_proj.bias": "model-00001-of-00002.safetensors",
"vision_tower.vision_model.encoder.layers.4.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
"vision_tower.vision_model.encoder.layers.5.layer_norm1.bias": "model-00001-of-00002.safetensors",
"vision_tower.vision_model.encoder.layers.5.layer_norm1.weight": "model-00001-of-00002.safetensors",
"vision_tower.vision_model.encoder.layers.5.layer_norm2.bias": "model-00001-of-00002.safetensors",
"vision_tower.vision_model.encoder.layers.5.layer_norm2.weight": "model-00001-of-00002.safetensors",
"vision_tower.vision_model.encoder.layers.5.mlp.fc1.bias": "model-00001-of-00002.safetensors",
"vision_tower.vision_model.encoder.layers.5.mlp.fc1.weight": "model-00001-of-00002.safetensors",
"vision_tower.vision_model.encoder.layers.5.mlp.fc2.bias": "model-00001-of-00002.safetensors",
"vision_tower.vision_model.encoder.layers.5.mlp.fc2.weight": "model-00001-of-00002.safetensors",
"vision_tower.vision_model.encoder.layers.5.self_attn.k_proj.bias": "model-00001-of-00002.safetensors",
"vision_tower.vision_model.encoder.layers.5.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
"vision_tower.vision_model.encoder.layers.5.self_attn.out_proj.bias": "model-00001-of-00002.safetensors",
"vision_tower.vision_model.encoder.layers.5.self_attn.out_proj.weight": "model-00001-of-00002.safetensors",
"vision_tower.vision_model.encoder.layers.5.self_attn.q_proj.bias": "model-00001-of-00002.safetensors",
"vision_tower.vision_model.encoder.layers.5.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
"vision_tower.vision_model.encoder.layers.5.self_attn.v_proj.bias": "model-00001-of-00002.safetensors",
"vision_tower.vision_model.encoder.layers.5.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
"vision_tower.vision_model.encoder.layers.6.layer_norm1.bias": "model-00001-of-00002.safetensors",
"vision_tower.vision_model.encoder.layers.6.layer_norm1.weight": "model-00001-of-00002.safetensors",
"vision_tower.vision_model.encoder.layers.6.layer_norm2.bias": "model-00001-of-00002.safetensors",
"vision_tower.vision_model.encoder.layers.6.layer_norm2.weight": "model-00001-of-00002.safetensors",
"vision_tower.vision_model.encoder.layers.6.mlp.fc1.bias": "model-00001-of-00002.safetensors",
"vision_tower.vision_model.encoder.layers.6.mlp.fc1.weight": "model-00001-of-00002.safetensors",
"vision_tower.vision_model.encoder.layers.6.mlp.fc2.bias": "model-00001-of-00002.safetensors",
"vision_tower.vision_model.encoder.layers.6.mlp.fc2.weight": "model-00001-of-00002.safetensors",
"vision_tower.vision_model.encoder.layers.6.self_attn.k_proj.bias": "model-00001-of-00002.safetensors",
"vision_tower.vision_model.encoder.layers.6.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
"vision_tower.vision_model.encoder.layers.6.self_attn.out_proj.bias": "model-00001-of-00002.safetensors",
"vision_tower.vision_model.encoder.layers.6.self_attn.out_proj.weight": "model-00001-of-00002.safetensors",
"vision_tower.vision_model.encoder.layers.6.self_attn.q_proj.bias": "model-00001-of-00002.safetensors",
"vision_tower.vision_model.encoder.layers.6.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
"vision_tower.vision_model.encoder.layers.6.self_attn.v_proj.bias": "model-00001-of-00002.safetensors",
"vision_tower.vision_model.encoder.layers.6.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
"vision_tower.vision_model.encoder.layers.7.layer_norm1.bias": "model-00001-of-00002.safetensors",
"vision_tower.vision_model.encoder.layers.7.layer_norm1.weight": "model-00001-of-00002.safetensors",
"vision_tower.vision_model.encoder.layers.7.layer_norm2.bias": "model-00001-of-00002.safetensors",
"vision_tower.vision_model.encoder.layers.7.layer_norm2.weight": "model-00001-of-00002.safetensors",
"vision_tower.vision_model.encoder.layers.7.mlp.fc1.bias": "model-00001-of-00002.safetensors",
"vision_tower.vision_model.encoder.layers.7.mlp.fc1.weight": "model-00001-of-00002.safetensors",
"vision_tower.vision_model.encoder.layers.7.mlp.fc2.bias": "model-00001-of-00002.safetensors",
"vision_tower.vision_model.encoder.layers.7.mlp.fc2.weight": "model-00001-of-00002.safetensors",
"vision_tower.vision_model.encoder.layers.7.self_attn.k_proj.bias": "model-00001-of-00002.safetensors",
"vision_tower.vision_model.encoder.layers.7.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
"vision_tower.vision_model.encoder.layers.7.self_attn.out_proj.bias": "model-00001-of-00002.safetensors",
"vision_tower.vision_model.encoder.layers.7.self_attn.out_proj.weight": "model-00001-of-00002.safetensors",
"vision_tower.vision_model.encoder.layers.7.self_attn.q_proj.bias": "model-00001-of-00002.safetensors",
"vision_tower.vision_model.encoder.layers.7.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
"vision_tower.vision_model.encoder.layers.7.self_attn.v_proj.bias": "model-00001-of-00002.safetensors",
"vision_tower.vision_model.encoder.layers.7.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
"vision_tower.vision_model.encoder.layers.8.layer_norm1.bias": "model-00001-of-00002.safetensors",
"vision_tower.vision_model.encoder.layers.8.layer_norm1.weight": "model-00001-of-00002.safetensors",
"vision_tower.vision_model.encoder.layers.8.layer_norm2.bias": "model-00001-of-00002.safetensors",
"vision_tower.vision_model.encoder.layers.8.layer_norm2.weight": "model-00001-of-00002.safetensors",
"vision_tower.vision_model.encoder.layers.8.mlp.fc1.bias": "model-00001-of-00002.safetensors",
"vision_tower.vision_model.encoder.layers.8.mlp.fc1.weight": "model-00001-of-00002.safetensors",
"vision_tower.vision_model.encoder.layers.8.mlp.fc2.bias": "model-00001-of-00002.safetensors",
"vision_tower.vision_model.encoder.layers.8.mlp.fc2.weight": "model-00001-of-00002.safetensors",
"vision_tower.vision_model.encoder.layers.8.self_attn.k_proj.bias": "model-00001-of-00002.safetensors",
"vision_tower.vision_model.encoder.layers.8.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
"vision_tower.vision_model.encoder.layers.8.self_attn.out_proj.bias": "model-00001-of-00002.safetensors",
"vision_tower.vision_model.encoder.layers.8.self_attn.out_proj.weight": "model-00001-of-00002.safetensors",
"vision_tower.vision_model.encoder.layers.8.self_attn.q_proj.bias": "model-00001-of-00002.safetensors",
"vision_tower.vision_model.encoder.layers.8.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
"vision_tower.vision_model.encoder.layers.8.self_attn.v_proj.bias": "model-00001-of-00002.safetensors",
"vision_tower.vision_model.encoder.layers.8.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
"vision_tower.vision_model.encoder.layers.9.layer_norm1.bias": "model-00001-of-00002.safetensors",
"vision_tower.vision_model.encoder.layers.9.layer_norm1.weight": "model-00001-of-00002.safetensors",
"vision_tower.vision_model.encoder.layers.9.layer_norm2.bias": "model-00001-of-00002.safetensors",
"vision_tower.vision_model.encoder.layers.9.layer_norm2.weight": "model-00001-of-00002.safetensors",
"vision_tower.vision_model.encoder.layers.9.mlp.fc1.bias": "model-00001-of-00002.safetensors",
"vision_tower.vision_model.encoder.layers.9.mlp.fc1.weight": "model-00001-of-00002.safetensors",
"vision_tower.vision_model.encoder.layers.9.mlp.fc2.bias": "model-00001-of-00002.safetensors",
"vision_tower.vision_model.encoder.layers.9.mlp.fc2.weight": "model-00001-of-00002.safetensors",
"vision_tower.vision_model.encoder.layers.9.self_attn.k_proj.bias": "model-00001-of-00002.safetensors",
"vision_tower.vision_model.encoder.layers.9.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
"vision_tower.vision_model.encoder.layers.9.self_attn.out_proj.bias": "model-00001-of-00002.safetensors",
"vision_tower.vision_model.encoder.layers.9.self_attn.out_proj.weight": "model-00001-of-00002.safetensors",
"vision_tower.vision_model.encoder.layers.9.self_attn.q_proj.bias": "model-00001-of-00002.safetensors",
"vision_tower.vision_model.encoder.layers.9.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
"vision_tower.vision_model.encoder.layers.9.self_attn.v_proj.bias": "model-00001-of-00002.safetensors",
"vision_tower.vision_model.encoder.layers.9.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
"vision_tower.vision_model.post_layernorm.bias": "model-00001-of-00002.safetensors",
"vision_tower.vision_model.post_layernorm.weight": "model-00001-of-00002.safetensors"
}
}
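Note on the index above: each entry of the weight map pairs a tensor name with the safetensors shard that stores it. The following is a minimal sketch of grouping tensor names by shard using only the Python standard library. The `INDEX_JSON` excerpt, the `weight_map` top-level key (the usual layout of a sharded-checkpoint index, but not visible in this part of the diff), and the helper name `tensors_by_shard` are illustrative assumptions, not part of this commit.

```python
import json
from collections import defaultdict

# Hypothetical two-entry excerpt shaped like the index file in this commit;
# the real file lists every tensor in the checkpoint.
INDEX_JSON = """
{
  "weight_map": {
    "vision_tower.vision_model.post_layernorm.bias": "model-00001-of-00002.safetensors",
    "vision_tower.vision_model.post_layernorm.weight": "model-00001-of-00002.safetensors"
  }
}
"""


def tensors_by_shard(index: dict) -> dict:
    """Group tensor names by the shard file that stores them."""
    shards = defaultdict(list)
    for tensor_name, shard_file in index["weight_map"].items():
        shards[shard_file].append(tensor_name)
    return dict(shards)


index = json.loads(INDEX_JSON)
shards = tensors_by_shard(index)
for shard_file, names in shards.items():
    print(shard_file, len(names))
```

Grouping this way makes it easy to see that every vision tower tensor listed in this part of the index lives in the first of the two shards.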
@ -0,0 +1,25 @@
{
  "do_convert_rgb": null,
  "do_normalize": true,
  "do_rescale": true,
  "do_resize": true,
  "image_mean": [
    0.5,
    0.5,
    0.5
  ],
  "image_processor_type": "SiglipImageProcessor",
  "image_seq_length": 256,
  "image_std": [
    0.5,
    0.5,
    0.5
  ],
  "processor_class": "PaliGemmaProcessor",
  "resample": 3,
  "rescale_factor": 0.00392156862745098,
  "size": {
    "height": 224,
    "width": 224
  }
}
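Note on the preprocessor config above: with `do_rescale` and `do_normalize` both enabled, each 8-bit channel value is first multiplied by `rescale_factor` (1/255) and then normalized with mean 0.5 and standard deviation 0.5, which maps the input range [0, 255] to roughly [-1, 1]. The sketch below works through that arithmetic for single channel values in plain Python. The function name `preprocess_value` is hypothetical, and the resize step (224 by 224, `resample` 3, which is the bicubic filter in Pillow's numbering) is deliberately left out.

```python
# Constants copied from the preprocessor config in this commit.
RESCALE_FACTOR = 0.00392156862745098  # 1 / 255
IMAGE_MEAN = 0.5
IMAGE_STD = 0.5


def preprocess_value(pixel: int) -> float:
    """Apply the rescale and normalize steps to one 8-bit channel value."""
    rescaled = pixel * RESCALE_FACTOR          # [0, 255] -> [0, 1]
    return (rescaled - IMAGE_MEAN) / IMAGE_STD  # [0, 1]   -> [-1, 1]


darkest = preprocess_value(0)
brightest = preprocess_value(255)
print(darkest, brightest)
```

This only illustrates the per-value arithmetic; the actual image processor applies the same steps to whole image arrays after resizing.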
@ -0,0 +1,39 @@
{
  "additional_special_tokens": [
    {
      "content": "<image>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false
    }
  ],
  "bos_token": {
    "content": "<bos>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "eos_token": {
    "content": "<eos>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "pad_token": {
    "content": "<pad>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "unk_token": {
    "content": "<unk>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  }
}
File diff suppressed because it is too large
File diff suppressed because it is too large