Repository Details
HelloGitHub Rating: 10.0 (1 rating)
Free•MIT
Stars: 12.9k
Chinese: No
Language: Python
Active: Yes
Contributors: 27
Issues: 96
Organization: Yes
Latest: 2.10.0
Forks: 644
License: MIT
This open-source Python tool from IBM converts documents into formats suited to generative AI. It parses popular document formats, including PDF, DOCX, PPTX, images, HTML, and Markdown, and exports them to Markdown and JSON. It supports several OCR engines for PDFs, provides a unified document object model (DoclingDocument), and integrates with Retrieval-Augmented Generation (RAG) and question-answering applications. It suits any scenario where documents serve as input to generative AI models.
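A minimal sketch of the conversion flow described above, assuming the tool is the `docling` package and that its `DocumentConverter` class is available after `pip install docling`. The `split_by_heading` helper is hypothetical and not part of the library; it only illustrates one simple way the exported Markdown might be chunked before indexing it for RAG.

```python
from typing import List


def convert_to_markdown(source: str) -> str:
    """Convert a local file path or URL to Markdown (assumes docling is installed)."""
    # Imported lazily so the helper below still works without docling.
    from docling.document_converter import DocumentConverter

    converter = DocumentConverter()
    result = converter.convert(source)
    # result.document is the unified DoclingDocument mentioned above.
    return result.document.export_to_markdown()


def split_by_heading(markdown: str) -> List[str]:
    """Hypothetical helper: split Markdown into heading-delimited chunks.

    Naive on purpose: any line starting with '#' opens a new chunk, so a
    '#' comment inside a fenced code block would also split the text.
    """
    chunks: List[str] = []
    current: List[str] = []
    for line in markdown.splitlines():
        if line.startswith("#") and current:
            chunks.append("\n".join(current).strip())
            current = []
        current.append(line)
    if current:
        chunks.append("\n".join(current).strip())
    return [chunk for chunk in chunks if chunk]
```

A typical call would be `split_by_heading(convert_to_markdown("report.pdf"))`, where `report.pdf` is a placeholder path; each resulting chunk could then be embedded and stored in a vector index.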
Tags: Python