Data, Free, Open Source
MINERU
Turn complex documents into LLM-ready Markdown and JSON
MinerU Open Source License
ABOUT
Real-world documents are messy: scans, tables, formulas, figures, and multi-column layouts are hard to extract cleanly with basic OCR or brittle custom scripts. MinerU converts complex business and research documents into structured Markdown and JSON using layout-aware parsing, OCR, and table handling, which makes them easier to ingest into RAG systems, agent workflows, and downstream data pipelines.
INSTALL
pip install --upgrade pip
pip install uv
uv pip install -U "mineru[all]"
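After running the commands above, it can help to confirm that the package actually landed in the active environment. The following is a minimal sketch using only the Python standard library. The distribution name `mineru` comes from the install command. The assumption that a command-line entry point is also called `mineru` is not stated on this page, and `check_install` is a hypothetical helper name.

```python
import shutil
from importlib import metadata


def check_install(dist_name: str = "mineru", cli_name: str = "mineru") -> dict:
    """Report the installed version of a distribution and the path of its CLI, if found.

    dist_name is the pip distribution name (taken from the install command).
    cli_name is an ASSUMED executable name; adjust it if your install differs.
    """
    try:
        version = metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        # The distribution is not installed in this environment.
        version = None
    # shutil.which returns None when no such executable is on PATH.
    return {"version": version, "cli_path": shutil.which(cli_name)}


if __name__ == "__main__":
    print(check_install())
```

A `version` of `None` suggests the install went into a different environment than the one running this script.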
INTEGRATION GUIDE
1. Parse research papers and technical PDFs into structured Markdown for retrieval and question answering
2. Extract tables, formulas, and figures from business reports before loading them into analytics or RAG pipelines
3. Convert scanned multilingual documents into machine-readable JSON for agent automation workflows
4. Process office files and PDFs at scale for document understanding, indexing, and knowledge base creation
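For the retrieval use case in item 1, one common next step is to split the generated Markdown into heading-delimited chunks before indexing. The sketch below assumes a Markdown file has already been produced and read into a string. It does not call MinerU itself, and `split_markdown_by_heading` is a hypothetical helper, not part of any MinerU API.

```python
import re
from typing import Dict, List

# Matches ATX headings such as "# Title" or "### Section 2.1".
HEADING = re.compile(r"^(#{1,6})\s+(.*\S)\s*$")


def split_markdown_by_heading(text: str) -> List[Dict[str, str]]:
    """Split Markdown text into chunks, one per heading.

    Text before the first heading becomes a chunk with an empty title.
    Note: this simple version does not treat fenced code blocks specially,
    so a "#" comment line inside a code fence would be read as a heading.
    """
    chunks: List[Dict[str, str]] = []
    title = ""
    body: List[str] = []

    def flush() -> None:
        # Skip empty leading sections (no title and no non-blank text).
        if title or any(line.strip() for line in body):
            chunks.append({"title": title, "text": "\n".join(body).strip()})

    for line in text.splitlines():
        match = HEADING.match(line)
        if match:
            flush()
            title = match.group(2)
            body = []
        else:
            body.append(line)
    flush()
    return chunks


if __name__ == "__main__":
    sample = "# Title\nIntro text\n\n## Methods\nStep one\nStep two\n"
    for chunk in split_markdown_by_heading(sample):
        print(chunk)
```

Chunking on headings keeps each retrieved passage aligned with a section of the source document. Fixed-size or overlapping windows are an alternative when sections are very long.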
TAGS
document-processing, pdf, ocr, markdown, json, rag, agents, table-extraction, multimodal