Repository Details
HelloGitHub Rating: 10.0 (1 rating)
Free • MIT
Stars: 42.6k
Chinese: No
Language: Python
Active: Yes
Contributors: 142
Issues: 680
Organization: Yes
Latest release: 2.58.0
Forks: 3k
License: MIT

This is a Python tool open-sourced by IBM for converting documents into formats suited to generative AI. It parses popular document formats, including PDF, DOCX, PPTX, images, HTML, and Markdown, and exports them to Markdown or JSON. It supports multiple OCR engines for PDF input, represents every parsed document with a single unified object (DoclingDocument), and integrates easily into retrieval-augmented generation (RAG) and question-answering applications. It suits any scenario in which documents must serve as input to generative AI models.
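In a RAG pipeline, the exported Markdown is usually split into retrievable chunks before indexing. The following is a minimal sketch in plain Python, assuming the input is Markdown text already exported by the tool; `Chunk` and `chunk_markdown` are hypothetical names for illustration, not part of the tool's own API. It splits at ATX headings (`#` to `######`), ignores `#` lines inside fenced code blocks, and keeps the heading path with each chunk so a retriever can report where a passage came from.

```python
import re
from dataclasses import dataclass, field

# ATX heading: 1-6 '#', at least one space, text, optional closing '#' run.
HEADING = re.compile(r"^(#{1,6})\s+(.*?)(?:\s+#+)?\s*$")


@dataclass
class Chunk:
    """One retrieval unit (hypothetical): heading path plus the text under it."""
    headings: list[str] = field(default_factory=list)
    text: str = ""


def chunk_markdown(markdown: str) -> list[Chunk]:
    """Split Markdown into heading-scoped chunks (illustrative helper only)."""
    chunks: list[Chunk] = []
    path: list[str] = []   # current heading hierarchy, outermost first
    body: list[str] = []   # lines collected since the last heading
    in_fence = False       # True while inside a ``` or ~~~ code block

    def flush() -> None:
        # Emit the collected body as a chunk if it has any non-blank text.
        text = "\n".join(body).strip()
        if text:
            chunks.append(Chunk(headings=list(path), text=text))
        body.clear()

    for line in markdown.splitlines():
        if line.lstrip().startswith(("```", "~~~")):
            in_fence = not in_fence  # opening or closing fence line
        match = None if in_fence else HEADING.match(line)
        if match:
            flush()
            level = len(match.group(1))
            # Keep only headings above this level, then push this one.
            del path[level - 1:]
            path.append(match.group(2))
        else:
            body.append(line)
    flush()
    return chunks


if __name__ == "__main__":
    sample = "# Report\nIntro text.\n\n## Methods\nWe used OCR.\n\n## Results\nAccuracy improved.\n"
    for chunk in chunk_markdown(sample):
        print(" > ".join(chunk.headings), "|", chunk.text)
```

Splitting on headings keeps each chunk within one section, so retrieved passages stay topically coherent. A production pipeline would usually also cap chunk size by tokens, which this sketch does not attempt.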