This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
# vit-large-patch16-384_a13819965258461184668209
Vision Transformer (ViT) 模型在 ImageNet-21k(1400 万张图像,21843 个类别)上以 224x224 的分辨率进行预训练,并在 ImageNet 2012(100 万张图像,1000 个类别)上以 384x384 的分辨率进行微调。