Top 5 document-understanding Open-Source Projects
- ragflow: RAGFlow is an open-source RAG (Retrieval-Augmented Generation) engine based on deep document understanding.
- document-ai-samples: Sample applications and demos for Document AI, the end-to-end document processing platform on Google Cloud.
Project mention: RAGFlow is an open-source RAG engine based on deep document understanding | news.ycombinator.com | 2024-04-01
Just link them to https://github.com/infiniflow/ragflow/blob/main/rag/llm/chat... :)
Project mention: Show HN: Beyond text splitting – improved file parsing for LLM's | news.ycombinator.com | 2024-04-07
https://github.com/deepdoctection/deepdoctection
Have you tried this?
Thanks for the example. That sounds like really solid cost savings, and I definitely agree that this trend is here to stay.
For invoice parsing (various formats), are you just using GPT4V? When GPT4V initially came out, I benchmarked it against an out-of-the-box invoice parser from Google Cloud (https://cloud.google.com/document-ai) on 16 documents and it was much better accuracy-wise. For example, I'd get results parsing 10,100 as 101100 (no comma).
Curious whether you saw problems like this in your pipeline, or whether it's gotten much better since?
Week 5: 👓Optical Character Recognition (OCR) & 🔑Keyword Search
document-understanding related posts
- When Will the GenAI Bubble Burst?
- RAGFlow is an open-source RAG engine based on deep document understanding
- Based on latest advancements in document transformers, what strategy would you use to parse utility bills?
- [R] Are there any open-source implementations of Document Understanding pipelines?
- Large-Scale Self-Supervised Pre-Training Across Tasks, Languages, and Modalities
- WavLM: Large-Scale Self-Supervised Pre-Training for Full Stack Speech Processing achieves SOTA performance on the SUPERB benchmark
- Microsoft AI Unveils ‘TrOCR’, An End-To-End Transformer-Based OCR Model For Text Recognition With Pre-Trained Models
Index
What are some of the best open-source document-understanding projects? This list will help you:
# | Project | Stars |
---|---|---|
1 | ragflow | 5,516 |
2 | deepdoctection | 2,172 |
3 | awesome-document-understanding | 1,115 |
4 | document-ai-samples | 182 |
5 | pytesseract-ocr-plugin | 8 |