tabula vs InvoiceNet

Project	tabula	InvoiceNet
Mentions	11	4
Stars	6,511	2,382
Growth	1.0%	-
Activity	2.8	3.9
Latest Commit	16 days ago	about 2 months ago
Language	CSS	Python
License	MIT License	MIT License

Mentions - the total number of mentions that we've tracked plus the number of user-suggested alternatives.
Stars - the number of stars that a project has on GitHub. Growth - month-over-month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
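
The page does not publish the exact formulas behind Growth and Activity, so the following is only a rough sketch of how such numbers could be computed. The function names, the 30-day half-life, and the percentile mapping are all assumptions made for illustration, not the site's actual method.

```python
from datetime import date
from typing import Sequence


def month_over_month_growth(stars_now: int, stars_last_month: int) -> float:
    """Percentage change in stars relative to last month's count."""
    if stars_last_month <= 0:
        raise ValueError("stars_last_month must be positive")
    return (stars_now - stars_last_month) / stars_last_month * 100.0


def weighted_commit_score(commit_dates: Sequence[date], today: date,
                          half_life_days: float = 30.0) -> float:
    """Sum of per-commit weights, so recent commits count more than old ones.

    The weight of a commit halves every `half_life_days` (a hypothetical choice).
    """
    return sum(0.5 ** ((today - d).days / half_life_days) for d in commit_dates)


def activity_rating(score: float, all_scores: Sequence[float]) -> float:
    """Map a raw score to a 0-10 scale by the share of projects scoring lower.

    Under this assumed mapping, a rating of 9.0 means 90% of tracked projects
    score lower, i.e. the project is in the top 10%.
    """
    lower = sum(1 for s in all_scores if s < score)
    return 10.0 * lower / len(all_scores)
```

Used this way, a project whose raw score beats nine of ten tracked projects gets a rating of 9.0, which matches the "top 10%" reading given above.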

tabula

Posts with mentions or reviews of tabula. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2023-05-16.

Automatically Reading Data from PDFs
2 projects | /r/de_EDV | 16 May 2023
How To: Extract Table From Image In Python (OpenCV & OCR)
1 project | /r/Python | 17 Apr 2023
Ruby
5 projects | /r/ruby | 6 Nov 2022

Another option would be JRuby. I routinely use an application called Tabula, which is built using JRuby and compiles to a Jar file. This, of course, requires Java on the target machine, but you can ship the Jar file and it will work. It's often easier to rely on a working Java environment than it is a working Ruby environment. Especially on Windows.
I am looking to automate a process at work...
2 projects | /r/programmer | 13 Sep 2022
Self Hosted Roundup #19
1 project | /r/selfhosted | 27 Aug 2022

Idk if it has been suggested yet, tabulapdf is a self hosted solution to extract tables from PDF
Alternative to tabula.technology
1 project | /r/data | 4 Aug 2022
Text extraction from pdf, word and PPT
1 project | /r/dataengineering | 1 May 2022

For table extraction from PDFs, have a look at Tabula and Camelot, two open-source projects. They work well with clean tables, and both the Tabula Python binding and Camelot let you export directly to a pandas DataFrame. Otherwise, the AWS Textract API is very efficient at extracting tables from PDFs, regardless of how clean or messy they are.
hello everyone someone can help me to resolve this problem please. i want to extract this unstructured data from pdf file to excel file
1 project | /r/AskProgramming | 21 Feb 2022

No idea if it will work for you, but there is a git project that seems to do what you want https://github.com/tabulapdf/tabula
Why is the point of having so many implementation of Ruby?
1 project | /r/ruby | 13 Feb 2022
Pdfsandwich
6 projects | news.ycombinator.com | 6 Nov 2021

While trying to find a specific project I recalled, I encountered this list of projects which might be of interest: https://github.com/tstanislawek/awesome-document-understandi...
The project I had in mind was similar to this one but I can't remember the name currently: https://github.com/tabulapdf/tabula
However, if you're looking for an ML-based, invoice-specific project, it looks like the other comment replying to you might be more useful.
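
One of the posts above mentions that the Tabula Python binding can export tables directly as pandas DataFrames. A minimal sketch of that workflow follows, assuming the third-party tabula-py package and a Java runtime are installed. The helper name `extract_tables` and its injectable `reader` parameter are hypothetical, added only so the logic can be exercised without a real PDF.

```python
from typing import Any, Callable, List, Optional


def extract_tables(pdf_path: str,
                   reader: Optional[Callable[..., List[Any]]] = None) -> List[Any]:
    """Return the non-empty tables found in a PDF.

    By default this calls tabula-py's read_pdf, which returns a list of
    pandas DataFrames when multiple_tables=True.
    """
    if reader is None:
        # Imported lazily: requires tabula-py and Java on the machine.
        import tabula
        reader = tabula.read_pdf
    tables = reader(pdf_path, pages="all", multiple_tables=True)
    # len() of a DataFrame is its row count, so this drops empty tables.
    return [t for t in tables if len(t) > 0]
```

With the default reader, each returned item is a DataFrame that can be written out with its usual methods such as `to_csv`. Results on scanned or messy PDFs are likely to be poor, as the post itself notes.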

InvoiceNet

Posts with mentions or reviews of InvoiceNet. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2021-11-06.

How would you annotate resumes for object detection?
1 project | /r/computervision | 11 Mar 2022

You can also possibly look at invoice extraction tools such as https://github.com/naiveHobo/InvoiceNet. They solve a similar issue and are researched fairly well, since there is a big market for that.
Pdfsandwich
6 projects | news.ycombinator.com | 6 Nov 2021
Extract informations from invoices with machine learning
2 projects | /r/deeplearning | 7 Apr 2021

Also, I would suggest you to use this codebase: https://github.com/naiveHobo/InvoiceNet
[P] Information Extraction From A Document
1 project | /r/MachineLearning | 28 Sep 2020

You can check out this repository. It contains an implementation of some recent research in deep learning for information extraction on invoices. https://github.com/naiveHobo/InvoiceNet

What are some alternatives?

When comparing tabula and InvoiceNet you can also consider the following projects:

Apache PDFBox - Mirror of Apache PDFBox

GLOM-TensorFlow - An attempt at the implementation of GLOM, Geoffrey Hinton's paper for emergent part-whole hierarchies from data

obsidian-notion-like-tables - Your premiere tool for creating and managing tabular data in Obsidian.md

pytorch2keras - PyTorch to Keras model convertor

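awesome-english-ebooks - Free downloads of English-language magazines such as The Economist (with audio), The New Yorker, The Guardian, Wired, and The Atlantic, available in epub, mobi, and pdf formats, updated weekly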

awesome-document-understanding - A curated list of resources for Document Understanding (DU) topic

ripgrep-all - rga: ripgrep, but also search in PDFs, E-Books, Office documents, zip, tar.gz, etc.

Mask-RCNN-TF2 - Mask R-CNN for object detection and instance segmentation on Keras and TensorFlow 2.0

OCRmyPDF - OCRmyPDF adds an OCR text layer to scanned PDF files, allowing them to be searched

laravel-report-generator - Rapidly Generate Simple Pdf, CSV, & Excel Report Package on Laravel
