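A Microsoft team proposed a new architecture called CvT (Convolutional vision Transformer), which improves the performance and efficiency of the Vision Transformer (ViT) by introducing convolutions into it. This is achieved through two main changes: a new convolutional token embedding, and a convolutional Transformer block that uses convolutional projections. These changes bring desirable properties of convolutional neural networks (CNNs) to the ViT architecture (shift, scale, and distortion invariance) while keeping the strengths of Transformers (dynamic attention, global context, and better generalization).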
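To make the convolutional token embedding more concrete, the sketch below computes how many tokens each stage would produce when an overlapping, strided convolution replaces ViT's non-overlapping patch split. The stage settings (kernel size, stride, padding, embedding width) are assumptions based on the commonly published CvT-13 configuration and are not stated in this README; check them against `config.json` in this repository before relying on them.

```python
from typing import List, Tuple

# ASSUMPTION: (kernel, stride, padding, embed_dim) for the three CvT-13 stages.
# These values are not given in this README; verify them against config.json.
ASSUMED_STAGES: List[Tuple[int, int, int, int]] = [
    (7, 4, 2, 64),
    (3, 2, 1, 192),
    (3, 2, 1, 384),
]


def conv_output_size(size: int, kernel: int, stride: int, padding: int) -> int:
    """Standard output-length formula for a strided convolution (dilation 1)."""
    return (size + 2 * padding - kernel) // stride + 1


def token_grid_per_stage(
    image_size: int,
    stages: List[Tuple[int, int, int, int]] = ASSUMED_STAGES,
) -> List[Tuple[int, int, int]]:
    """Return (grid_side, num_tokens, embed_dim) for each stage.

    Each stage applies a convolutional token embedding to the previous
    stage's 2D token map, so the grid shrinks while the width grows.
    """
    grids = []
    size = image_size
    for kernel, stride, padding, dim in stages:
        size = conv_output_size(size, kernel, stride, padding)
        grids.append((size, size * size, dim))
    return grids


if __name__ == "__main__":
    for index, (side, tokens, dim) in enumerate(token_grid_per_stage(224), start=1):
        print(f"stage {index}: {side}x{side} grid, {tokens} tokens, dim {dim}")
```

Under these assumed settings, the kernel is larger than the stride at every stage, so neighbouring tokens share input pixels. That overlap is what distinguishes this embedding from ViT's disjoint patches, and the shrinking grid is what gives the model a CNN-like hierarchy.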


cvt-13_a13411654355644416189022
