An intelligent PDF extraction router that classifies pages and routes them to optimal processing backends (PyMuPDF, Docling, OCR). Includes confidence scoring and auto-reextraction to prevent silent failures in RAG pipelines.
pdfmux is a PDF extraction router that classifies each page of a PDF document (digital, scanned, or table-based) and routes it to the most appropriate processing backend. It draws on PyMuPDF, Docling, OCR, and an optional LLM fallback, aiming for the most accurate extraction per page. Per-page confidence scoring flags low-quality extractions and automatically re-processes them, which prevents silent failures in RAG (Retrieval-Augmented Generation) systems.
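The classify, route, score, and re-extract loop described above can be sketched roughly as follows. This is an illustrative sketch only, not pdfmux's actual code or API: the names `PageInfo`, `Extraction`, `classify`, and `extract_page`, the page features, the thresholds, and the stub backends are all hypothetical, and a real router would call PyMuPDF, Docling, or an OCR engine where the stubs are.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class PageInfo:
    """Hypothetical per-page features a classifier might inspect."""
    text_chars: int        # characters found in the embedded text layer
    image_coverage: float  # fraction of the page area covered by images (0..1)
    ruling_lines: int      # straight vector lines, a rough hint for tables


@dataclass
class Extraction:
    text: str
    confidence: float  # 0..1, higher means more trustworthy
    backend: str


def classify(page: PageInfo) -> str:
    """Label a page 'scanned', 'table', or 'digital' (thresholds are assumptions)."""
    if page.text_chars < 50 and page.image_coverage > 0.5:
        return "scanned"
    if page.ruling_lines >= 10:
        return "table"
    return "digital"


Backend = Callable[[PageInfo], Extraction]


def extract_page(
    page: PageInfo,
    backends: Dict[str, Backend],
    fallback_order: List[str],
    min_confidence: float = 0.8,
) -> Extraction:
    """Try the backend for the page's class first, then fall back while confidence is low."""
    first = classify(page)
    order = [first] + [name for name in fallback_order if name != first]
    best: Optional[Extraction] = None
    for name in order:
        result = backends[name](page)
        if best is None or result.confidence > best.confidence:
            best = result
        if best.confidence >= min_confidence:
            break  # good enough, stop re-extracting
    assert best is not None
    return best


# Stub backends standing in for a text-layer reader, a layout model, and OCR.
def fast_text(page: PageInfo) -> Extraction:
    conf = 0.95 if page.text_chars >= 50 else 0.1
    return Extraction("text layer", conf, "fast_text")


def layout(page: PageInfo) -> Extraction:
    return Extraction("table as markdown", 0.9, "layout")


def ocr(page: PageInfo) -> Extraction:
    return Extraction("ocr text", 0.85, "ocr")


BACKENDS: Dict[str, Backend] = {"digital": fast_text, "table": layout, "scanned": ocr}
FALLBACKS = ["scanned", "table", "digital"]

if __name__ == "__main__":
    sparse = PageInfo(text_chars=10, image_coverage=0.2, ruling_lines=0)
    print(extract_page(sparse, BACKENDS, FALLBACKS).backend)
```

In this sketch a page with almost no text layer is first treated as digital, scores poorly, and is re-extracted with the next backend in the fallback list, so the low-quality result is replaced rather than passed on silently.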
Installation requires no configuration: run pip install pdfmux. The package ships with a built-in MCP (Model Context Protocol) server, so it can be integrated into AI workflows and applications without additional setup.