Top 3 Rust Document Processing Projects
LiteParse is a fast, local document parser for extracting text from clean, well-structured files. It handles PDFs, DOCX, HTML, and more, with minimal setup and no API calls. Everything runs locally, so your documents never leave your environment.
pdf_oxide
The fastest PDF library for Python and Rust. Text extraction, image extraction, markdown conversion, PDF creation & editing. 0.8ms mean, 5× faster than industry leaders, 100% pass rate on 3,830 PDFs. MIT/Apache-2.0.
Project mention: PDF Oxide – Fast PDF library in Rust with Python bindings – 0.8ms,100% pass rate | news.ycombinator.com | 2026-02-24
doc2dataset
3DCF / doc2dataset: token-efficient document layer with NumGuard numeric integrity and multi-framework exports for RAG & fine-tuning.
Project mention: I Built an Open-Source Pipeline to Convert Documents into LLM Training Data | dev.to | 2025-12-07
Index
What are some of the best open-source Document Processing projects in Rust? This list will help you:
| # | Project | Stars |
|---|---|---|
| 1 | liteparse | 9,929 |
| 2 | pdf_oxide | 817 |
| 3 | doc2dataset | 57 |