Top 4 table-structure-recognition Open-Source Projects
- ragflow
  RAGFlow is an open-source RAG (Retrieval-Augmented Generation) engine based on deep document understanding.
- table-transformer
  Table Transformer (TATR) is a deep learning model for extracting tables from unstructured documents (PDFs and images). This is also the official repository for the PubTables-1M dataset and GriTS evaluation metric.
- CascadeTabNet
  This repository contains the code and implementation details of the CascadeTabNet paper, "CascadeTabNet: An approach for end to end table detection and structure recognition from image-based documents".
- Multi-Type-TD-TSR
  Extracting Tables from Document Images using a Multi-stage Pipeline for Table Detection and Table Structure Recognition
Project mention: Better RAG Results with Reciprocal Rank Fusion and Hybrid Search | news.ycombinator.com | 2024-05-30
Within our open-source RAG product RAGFlow (https://github.com/infiniflow/ragflow), Elasticsearch is currently used instead of general-purpose vector databases because it already provides hybrid search. By default, an embedding-based reranker is not required; RRF alone is enough. Even when a reranker is used, keyword-based retrieval must still be combined with embedding-based retrieval, and that is what RAGFlow's latest 0.7 release provides.
On the other hand, let me introduce another database we developed, Infinity (https://github.com/infiniflow/infinity), which can provide the fastest hybrid search. You can see the performance here (https://github.com/infiniflow/infinity/blob/main/docs/refere...); both vector search and full-text search perform much faster than other open-source alternatives.
From the next version (weeks away), Infinity will also provide more comprehensive hybrid search capabilities: the 3-way recall you mentioned (dense vector, sparse vector, keyword search) can be served within a single request.
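The mention above relies on Reciprocal Rank Fusion (RRF) to merge keyword and embedding results without a reranker. A minimal sketch of the commonly used RRF formula is shown below, where each document scores the sum of 1 / (k + rank) over every ranked list it appears in. The function name, the constant k = 60, and the sample document IDs are illustrative assumptions; this is not code from RAGFlow, Infinity, or Elasticsearch.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into a single ranking.

    rankings: list of lists, each ordered best-first.
    k: smoothing constant (60 is a commonly used default; an assumption here).
    """
    scores = defaultdict(float)
    for ranking in rankings:
        # Ranks are 1-based: the top hit contributes 1 / (k + 1).
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by document ID for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical results from a keyword search and a vector search.
keyword_hits = ["doc_a", "doc_b", "doc_c"]
vector_hits = ["doc_c", "doc_a", "doc_d"]

fused = reciprocal_rank_fusion([keyword_hits, vector_hits])
print(fused)
```

Documents that appear near the top of both lists rise to the front, which is why a fusion step like this can stand in for a learned reranker in simple setups.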
Saw this last time but never played with it https://github.com/microsoft/table-transformer
table-structure-recognition related posts
- Integrated Rerankers, implemented RAPTOR, RAGFlow 0.7 released
- Ask HN: RAG and unstructured data from several docs
- DeepSeek-V2 integrated, RAGFlow v0.5.0 is released
- Data extraction from pdf
- [P] OCR + Table Extraction Advice
- [D] Unimpressive improvement in training speed after upgrading from GTX 980 Ti to RTX 4090
- Microsoft TableTransformer
Index
What are some of the best open-source table-structure-recognition projects? This list will help you:
# | Project | Stars |
---|---|---|
1 | ragflow | 9,054 |
2 | table-transformer | 1,936 |
3 | CascadeTabNet | 1,444 |
4 | Multi-Type-TD-TSR | 243 |