awesome-document-understanding
ripgrep-all
| awesome-document-understanding | ripgrep-all |
---|---|---|
Mentions | 4 | 43 |
Stars | 1,115 | 6,177 |
Growth | - | - |
Activity | 4.5 | 8.0 |
Last commit | 11 months ago | 2 months ago |
Language | - | Rust |
License | - | GNU General Public License v3.0 or later |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
awesome-document-understanding
-
[R] Are there any open-source implementations of Document Understanding pipelines?
I have worked on several Document Understanding (DU) projects for my company during the last year. We've mainly used UiPath and Google's DocumentAI.
-
Pdfsandwich
While trying to find a specific project I recalled, I encountered this list of projects which might be of interest: https://github.com/tstanislawek/awesome-document-understandi...
The project I had in mind was similar to this one but I can't remember the name currently: https://github.com/tabulapdf/tabula
However, if you're looking for an ML-based, invoice-specific project, it looks like the other comment to your reply might be more useful.
-
Extract information from invoices with machine learning
Check out this repository for inspiration: https://github.com/tstanislawek/awesome-document-understanding
-
[P] Curated List of Document Understanding (DU) Papers & Resources.
In the last few years, I have spent a lot of time automating business processes for big companies and have seen rising interest in DU topics (especially in the Key Information Extraction field). Therefore, I created a list, https://github.com/tstanislawek/awesome-document-understanding, of resources to make it easier to track all the papers out there that are relevant to this topic.
ripgrep-all
- Ripgrep-all: rga: ripgrep, but also search PDFs, E-Books, Office documents, zip
-
Ripgrep is faster than {grep, ag, Git grep, ucg, pt, sift}
I searched in portage, and it seems there is another version working also with other documents like PDFs and doc.
https://github.com/phiresky/ripgrep-all
-
Calibre – New in Calibre 7.0
If you want even faster search across different formats, you can try ripgrep-all ( https://github.com/phiresky/ripgrep-all ). It can search across epub, docx, pdf, zip, mp4 etc. If you are handy with the tool, you can write a custom adapter to search across images using OCR with tesseract.
- Rga: Ripgrep, but also search in PDF, ebooks, office documents, zip, tar.gz etc.
-
Show HN: Khoj – Chat Offline with Your Second Brain Using Llama 2
1. If you want better adoption, especially among corporations, GPL-3 won't cut it. Maybe think of some business-friendly licenses (MIT etc.)
2. I understand the excitement about LLMs. But how about making something more accessible? I use ripgrep-all (rga) along with fzf [1], which can search all files, including PDFs, in a specific folder. However, I would like a GUI tool to search across multiple folders, prioritize results across folders, and store and search histories so that I can do a meta-search. This is sufficient for 95% of my local search use cases, and I don't need an LLM. If khoj can enable such search by default without an LLM, that will be a game changer for many people who don't have a heavy compute machine or who don't want to use OpenAI.
[1] https://github.com/phiresky/ripgrep-all/wiki/fzf-Integration
-
How to make file paths clickable?
I use `rga` to search through multiple PDF files for work. The tool returns a list of files and I would like to make those file paths clickable.
- Burgr – Books in Your Terminal
-
Is there a way to search multiple epub and pdf files?
rga, aka ripgrep-all
-
Internet Archive Scholar
I wanted to say 'au contraire' to your 'screenshots are not searchable' and link this[0] but I don't actually see images in the readme.. I swear it was there, maybe it's a buried extra flag..
[0] https://github.com/phiresky/ripgrep-all
- Recoll – Full-text search for your desktop
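One of the posts above asks how to make the file paths returned by `rga` clickable. A minimal sketch of one possible approach follows: collect the matching files and print each path wrapped in an OSC 8 terminal hyperlink that points at a `file://` URI. It assumes that `rga` is on the PATH, that it forwards ripgrep's `--files-with-matches` flag, and that the terminal emulator supports OSC 8 links. The function names are hypothetical and not part of ripgrep-all.

```python
import subprocess
from pathlib import Path


def to_file_uri(path: str) -> str:
    """Turn a (possibly relative) path into an absolute file:// URI."""
    return Path(path).resolve().as_uri()


def osc8_link(uri: str, label: str) -> str:
    """Wrap label in an OSC 8 hyperlink escape sequence pointing at uri."""
    return f"\033]8;;{uri}\033\\{label}\033]8;;\033\\"


def matching_files(pattern: str, folder: str) -> list[str]:
    """Ask rga which files contain pattern.

    Assumption: rga is installed and passes --files-with-matches
    through to ripgrep, printing one path per line.
    """
    result = subprocess.run(
        ["rga", "--files-with-matches", pattern, folder],
        capture_output=True,
        text=True,
        check=False,  # rga exits non-zero when nothing matches
    )
    return [line for line in result.stdout.splitlines() if line]


def print_clickable_matches(pattern: str, folder: str) -> None:
    """Print every matching path as a clickable terminal hyperlink."""
    for path in matching_files(pattern, folder):
        print(osc8_link(to_file_uri(path), path))
```

Calling `print_clickable_matches("invoice", "reports")` (both arguments are placeholders) would list the matching files. Whether a click actually opens the file depends on the terminal and on the desktop's handler for `file://` links.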
What are some alternatives?
InvoiceNet - Deep neural network to extract intelligent information from invoice documents.
pdfgrep - PDFGrep is a GNU/Emacs module providing grep-comparable facilities but for PDF files
unstructured - Open source libraries and APIs to build custom preprocessing pipelines for labeling, training, or production machine learning pipelines.
OCRmyPDF - OCRmyPDF adds an OCR text layer to scanned PDF files, allowing them to be searched
Awesome-pytorch-list - A comprehensive list of pytorch related content on github, such as different models, implementations, helper libraries, tutorials etc.
awesome-ocr
notational-fzf-vim - Notational velocity for vim.
awesome-document-understandi
fd - A simple, fast and user-friendly alternative to 'find'
awesome-huggingface - 🤗 A list of wonderful open-source projects & applications integrated with Hugging Face libraries.
ripgrep - ripgrep recursively searches directories for a regex pattern while respecting your gitignore