Data · Free · Open Source

MINERU

Turn complex documents into LLM-ready Markdown and JSON

MinerU Open Source License

ABOUT

Real-world documents are messy: scans, tables, formulas, figures, and multi-column layouts are hard to extract cleanly with basic OCR or brittle custom scripts. MinerU converts complex business and research documents into structured Markdown and JSON with layout awareness, OCR, and table handling, making them much easier to ingest into RAG systems, agent workflows, and downstream data pipelines.

INSTALL
pip install --upgrade pip
pip install uv
uv pip install -U "mineru[all]"

INTEGRATION GUIDE

1. Parse research papers and technical PDFs into structured Markdown for retrieval and question answering
2. Extract tables, formulas, and figures from business reports before loading them into analytics or RAG pipelines
3. Convert scanned multilingual documents into machine-readable JSON for agent automation workflows
4. Process office files and PDFs at scale for document understanding, indexing, and knowledge base creation
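
As a rough illustration of the first use case, the sketch below runs MinerU on a document and splits the resulting Markdown into heading-based chunks for retrieval. It assumes the `mineru` command installed above accepts `-p` (input path) and `-o` (output directory); check `mineru --help` for your version. The helper names (`build_mineru_command`, `parse_document`, `split_markdown_by_heading`) and the heading-based chunking strategy are illustrative choices, not part of MinerU.

```python
import re
import shutil
import subprocess
from pathlib import Path


def build_mineru_command(input_path, output_dir):
    """Build the CLI call. Assumes `mineru -p <input> -o <output>`."""
    return ["mineru", "-p", str(input_path), "-o", str(output_dir)]


def parse_document(input_path, output_dir):
    """Run MinerU and return the Markdown files found under output_dir."""
    if shutil.which("mineru") is None:
        raise RuntimeError("mineru CLI not found on PATH")
    subprocess.run(build_mineru_command(input_path, output_dir), check=True)
    # The output layout is not assumed here; search recursively for .md files.
    return sorted(Path(output_dir).rglob("*.md"))


def split_markdown_by_heading(markdown):
    """Split Markdown into (heading, body) chunks (hypothetical helper).

    Text before the first heading is returned with an empty heading.
    """
    chunks = []
    heading, body = "", []
    for line in markdown.splitlines():
        if re.match(r"#{1,6}\s", line):
            # A new heading closes the previous chunk, if it has any content.
            if heading or any(s.strip() for s in body):
                chunks.append((heading, "\n".join(body).strip()))
            heading, body = line.lstrip("#").strip(), []
        else:
            body.append(line)
    if heading or any(s.strip() for s in body):
        chunks.append((heading, "\n".join(body).strip()))
    return chunks
```

A typical flow would call `parse_document("paper.pdf", "out")`, read each returned file with `Path.read_text()`, and pass the text to `split_markdown_by_heading` before embedding or indexing the chunks. Heading-based splitting is only one option; fixed-size or table-aware chunking may suit some documents better.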

TAGS

document-processing, pdf, ocr, markdown, json, rag, agents, table-extraction, multimodal