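Google's Pix2Struct is a pretrained image-to-text model for visually-situated language understanding. The model is trained with a novel pretraining objective, parsing screenshots of web pages into simplified HTML, which provides a pretraining data source well suited to a range of downstream tasks.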

YYJ-aaaa 16318e0819 add file		2024-09-25 16:28:31 +08:00
.gitattributes	Add .gitattributes	2024-09-25 15:28:34 +08:00
README.md	Initial commit	2024-09-25 15:28:34 +08:00
config.json	add file	2024-09-25 16:28:31 +08:00
model.safetensors	add file	2024-09-25 16:28:31 +08:00
preprocessor_config.json	add file	2024-09-25 16:28:31 +08:00
pytorch_model.bin	add file	2024-09-25 16:28:31 +08:00
special_tokens_map.json	add file	2024-09-25 16:28:31 +08:00
spiece.model	add file	2024-09-25 16:28:31 +08:00
tokenizer.json	add file	2024-09-25 16:28:31 +08:00
tokenizer_config.json	add file	2024-09-25 16:28:31 +08:00

README.md

pix2struct-large

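The following is a minimal sketch of how a local clone of this repository might be checked and used for image-to-text generation. It assumes the checkpoint loads with the `Pix2StructProcessor` and `Pix2StructForConditionalGeneration` classes from the Hugging Face `transformers` library, and that `transformers`, `torch`, and `Pillow` are installed. The helper names `missing_files` and `generate_text`, the `checkpoint_dir` argument, and the default `max_new_tokens` value are illustrative and not part of this repository.

```python
from pathlib import Path

# Non-weight files shown in this repository's file listing.
EXPECTED_FILES = (
    "config.json",
    "preprocessor_config.json",
    "special_tokens_map.json",
    "spiece.model",
    "tokenizer.json",
    "tokenizer_config.json",
)
# Either weights file is treated as sufficient (an assumption of this sketch).
WEIGHT_FILES = ("model.safetensors", "pytorch_model.bin")


def missing_files(checkpoint_dir):
    """Return the listed files that are absent from checkpoint_dir."""
    root = Path(checkpoint_dir)
    missing = [name for name in EXPECTED_FILES if not (root / name).is_file()]
    if not any((root / name).is_file() for name in WEIGHT_FILES):
        missing.append(" or ".join(WEIGHT_FILES))
    return missing


def generate_text(checkpoint_dir, image_path, max_new_tokens=64):
    """Generate text for one image with a local Pix2Struct checkpoint."""
    # Imported lazily so missing_files() works without these packages.
    from PIL import Image
    from transformers import (
        Pix2StructForConditionalGeneration,
        Pix2StructProcessor,
    )

    processor = Pix2StructProcessor.from_pretrained(checkpoint_dir)
    model = Pix2StructForConditionalGeneration.from_pretrained(checkpoint_dir)

    image = Image.open(image_path).convert("RGB")
    # The processor converts the image into flattened patches for the encoder.
    inputs = processor(images=image, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return processor.decode(output_ids[0], skip_special_tokens=True)
```

Calling `missing_files("path/to/clone")` before `generate_text(...)` gives a quick check that the download is complete. Because this is a pretrained checkpoint rather than a task-specific one, the generated text is not expected to be a polished answer or caption without fine-tuning.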