Repository Details
Library for Intelligent PDF Document Parsing
Free, Apache-2.0
Stars: 9.4k
Chinese: No
Language: Python
Active: Yes
Contributors: 4
Issues: 76
Organization: Yes
Latest: 0.1.58
Forks: 614
License: Apache-2.0
[Image: olmocr]
This project uses Vision-Language Models (VLMs) to parse and linearize complex PDF documents, converting unstructured content (multi-column text, tables, embedded images, mixed font styles, and varied layouts) into continuous, structured text. It supports an end-to-end workflow for distributed, multi-node parsing of millions of PDF documents, which makes it suitable for building high-quality datasets for Large Language Models (LLMs).
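
To make "linearize" concrete, here is a toy Python sketch of reading-order reconstruction for a two-column page. It is not this project's method, which relies on a VLM, and it does not use the project's API. The `TextBlock` type, the `linearize` function, and the coordinates are all hypothetical, chosen only to show what turning a multi-column layout into continuous text means.

```python
from dataclasses import dataclass


@dataclass
class TextBlock:
    """Hypothetical text block located by its top-left corner on the page."""
    x: float  # left edge
    y: float  # top edge (smaller values are higher on the page)
    text: str


def linearize(blocks, page_width, n_columns=2):
    """Order blocks column by column (left to right), top to bottom within a column."""
    column_width = page_width / n_columns

    def reading_order(block):
        # Assign the block to a column by its left edge, clamped to the last column.
        column = min(int(block.x // column_width), n_columns - 1)
        return (column, block.y)

    return "\n".join(b.text for b in sorted(blocks, key=reading_order))


# Blocks as they might come out of a naive extractor: row by row, not in reading order.
blocks = [
    TextBlock(310, 50, "Right column, first paragraph."),
    TextBlock(20, 50, "Left column, first paragraph."),
    TextBlock(20, 200, "Left column, second paragraph."),
    TextBlock(310, 200, "Right column, second paragraph."),
]

linearized = linearize(blocks, page_width=600)
print(linearized)
```

A fixed-column heuristic like this breaks down on tables, figures, and irregular layouts, which is the kind of content the description says the project targets with a model-based approach.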
