Top 3 pdf-to-text Open-Source Projects
-
unstructured
Open source libraries and APIs to build custom preprocessing pipelines for labeling, training, or production machine learning pipelines.
-
ragflow
RAGFlow is an open-source RAG (Retrieval-Augmented Generation) engine based on deep document understanding.
-
Docotic.Pdf
The Docotic.Pdf library can create, edit, draw, and print PDF files in .NET Core, ASP.NET, Windows Forms, WPF, Xamarin, Blazor, Unity, and HoloLens applications. The library is a 100% managed assembly without unsafe blocks. The assembly has no external dependencies.
Be careful with unstructured:
https://github.com/Unstructured-IO/unstructured/blob/d11c70c...
from: https://github.com/open-webui/open-webui/issues/687
Project mention: RAGFlow is an open-source RAG engine based on deep document understanding | news.ycombinator.com | 2024-04-01
Just link them to https://github.com/infiniflow/ragflow/blob/main/rag/llm/chat... :)
Index
What are some of the best open-source pdf-to-text projects? This list will help you:
# | Project | Stars |
---|---|---|
1 | unstructured | 6,515 |
2 | ragflow | 6,507 |
3 | Docotic.Pdf | 65 |