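The Vision Transformer (ViT) is a BERT-like transformer encoder model pretrained in a supervised fashion on a large collection of images, namely the ImageNet-21k dataset, at a resolution of 224x224 pixels.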
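A ViT feeds an image to its transformer encoder as a sequence of fixed-size patches, much as BERT takes a sequence of word tokens. The following is a minimal NumPy sketch of that first step, built on a few assumptions. The patch size of 32 is read off the repository name (`vit-base-patch32-384`). A single BERT-style [CLS] token is prepended. The `384` in the name is taken to be the input resolution. `patchify` and `sequence_length` are hypothetical helpers that this repository does not provide. The learned linear projection and position embeddings that follow patchification are omitted.

```python
import numpy as np


def patchify(image: np.ndarray, patch_size: int) -> np.ndarray:
    """Split an (H, W, C) image into non-overlapping, flattened patches.

    Returns an array of shape (num_patches, patch_size * patch_size * C),
    with patches ordered row by row over the patch grid.
    """
    h, w, c = image.shape
    if h % patch_size or w % patch_size:
        raise ValueError("image height and width must be divisible by patch_size")
    gh, gw = h // patch_size, w // patch_size
    # (H, W, C) -> (gh, p, gw, p, C) -> (gh, gw, p, p, C): group pixels by patch
    patches = image.reshape(gh, patch_size, gw, patch_size, c).transpose(0, 2, 1, 3, 4)
    # Flatten each patch into a single vector
    return patches.reshape(gh * gw, patch_size * patch_size * c)


def sequence_length(resolution: int, patch_size: int) -> int:
    """Number of encoder tokens for a square image: one per patch plus one [CLS] token (assumed)."""
    return (resolution // patch_size) ** 2 + 1


if __name__ == "__main__":
    image = np.zeros((224, 224, 3), dtype=np.float32)  # dummy 224x224 RGB image
    print(patchify(image, 32).shape)   # (49, 3072): a 7x7 grid of 32*32*3 values
    print(sequence_length(224, 32))    # 50
    print(sequence_length(384, 32))    # 145: a 12x12 grid plus [CLS]
```

Under the same assumptions, raising the resolution from 224 to 384 at a fixed patch size grows the sequence from 50 to 145 tokens. The attention cost grows roughly with the square of that length.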

pice35408784b54431987c4d13c457b9cd 9db3aa7d45 Add .gitattributes Signed-off-by: pice35408784b54431987c4d13c457b9cd <c457b9cd@leinao.ai>		2024-11-12 16:24:29 +08:00
.gitattributes	Add .gitattributes	2024-11-12 16:24:29 +08:00
README.md	Initial commit	2024-11-12 16:24:29 +08:00

vit-base-patch32-384_a13570863137681408750282