open-parse vs deepdoctection
| | open-parse | deepdoctection |
|---|---|---|
| Mentions | 3 | 8 |
| Stars | 1,782 | 2,209 |
| Growth | - | 6.9% |
| Activity | 9.2 | 9.2 |
| Latest commit | 9 days ago | 4 days ago |
| Language | Python | Python |
| License | MIT License | Apache License 2.0 |
Stars: the number of stars a project has on GitHub. Growth: month-over-month growth in stars.
Activity: a relative number indicating how actively a project is being developed. Recent commits carry more weight than older ones.
For example, an activity of 9.0 indicates that a project is among the top 10% of the most actively developed projects being tracked.
open-parse
- Show HN: Beyond text splitting – improved file parsing for LLM's
- Running OCR against PDFs and images directly in the browser
I recently built a similar tool, except it's configured to use deep learning libraries for table extraction. I'm excited to integrate unitable, which has state-of-the-art performance, later this week.
I built this because most basic layout detection libraries perform terribly on anything non-trivial. Deep learning is really the long-term solution here.
https://github.com/Filimoa/open-parse
- Show HN: Open-source, high performance document chunking for LLM's
deepdoctection
- Show HN: Beyond text splitting – improved file parsing for LLM's
https://github.com/deepdoctection/deepdoctection
Have you tried this?
- April 2023
DeepDoctection: Document extraction and analysis using deep learning models (https://github.com/deepdoctection/deepdoctection)
- DeepDoctection: Document extraction and analysis using deep learning models
- DeepDoctection
- [D] Can I use ML/AI to read the back panels of electronic components?
deepdoctection/deepdoctection: A Repo For Document AI
What are some alternatives?
DocumentInformationExtraction - Key Information Extraction From Documents: Evaluation And Generator
Flowise - Drag & drop UI to build your customized LLM flow
CascadeTabNet - This repository contains the code and implementation details of the CascadeTabNet paper "CascadeTabNet: An approach for end to end table detection and structure recognition from image-based documents"
PentestGPT - A GPT-empowered penetration testing tool
donut - Official Implementation of OCR-free Document Understanding Transformer (Donut) and Synthetic Document Generator (SynthDoG), ECCV 2022
bark - 🔊 Text-Prompted Generative Audio Model
Information-extraction-from-document - Graph Key Information Extraction: GKIE
JARVIS - JARVIS, a system to connect LLMs with ML community. Paper: https://arxiv.org/pdf/2303.17580.pdf
unstructured - Open source libraries and APIs to build custom preprocessing pipelines for labeling, training, or production machine learning pipelines.
loopgpt - Modular Auto-GPT Framework
Selefra - The open-source policy-as-code software that provides analysis for multi-cloud and SaaS environments; you can get insights using natural language (powered by OpenAI).
Graph-Key-Information-Extraction-from-Documents