ITextSharp vs tabula

ITextSharp

[DEPRECATED] .NET port of the iText library, only security fixes will be added — please use iText for .NET (by itext)

PDF

Source Code

itextpdf.com

tabula

Tabula is a tool for liberating data tables trapped inside PDF files (by tabulapdf)

PDF CSV Excel Tables Scraping

Source Code

tabula.technology

Project	ITextSharp	tabula
Mentions	4	11
Stars	1,333	6,521
Growth	0.0%	0.6%
Activity	1.7	2.8
Latest Commit	about 1 year ago	22 days ago
Language	C#	CSS
License	GNU General Public License v3.0 or later	MIT License

The number of mentions indicates the total number of mentions that we've tracked plus the number of user suggested alternatives.
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.

ITextSharp

Posts with mentions or reviews of ITextSharp. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2023-05-16.

Automatisches Auslesen von PDFs (Automatic reading of PDFs)
2 projects | /r/de_EDV | 16 May 2023
Extract Table of Contents (TOC) from a PDF using PowerShell
2 projects | /r/PowerShell | 22 Dec 2022

It uses iTextSharp
Unsure where to start - Needing to develop a program/app to automate a tedious part of my job
1 project | /r/AskProgramming | 15 Feb 2022

You can use libraries like https://www.nuget.org/packages/iTextSharp/ to read the text out of a PDF. Then just run a regular expression over it to extract your data into XML or something.
Need some help on this:
1 project | /r/PowershellSolutions | 11 Mar 2021

NuGet Gallery | iTextSharp 5.5.13.2
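
The /r/AskProgramming suggestion above (read the text out of a PDF, then run a regular expression over it and emit XML) can be sketched roughly as follows. This is a minimal illustration in Python rather than C#, and it assumes the text has already been extracted by whatever PDF library is in use. The `Name: value` line layout, the `FIELD_PATTERN` regex, and the `text_to_xml` helper are all hypothetical, not part of iTextSharp.

```python
import re
import xml.etree.ElementTree as ET

# Assumed layout: each field of interest sits on its own line as "Name: value".
FIELD_PATTERN = re.compile(
    r"^(?P<name>[A-Za-z ]+):[ \t]*(?P<value>.+)$", re.MULTILINE
)


def text_to_xml(text: str) -> str:
    """Turn "Name: value" lines found in extracted PDF text into an XML string."""
    root = ET.Element("document")
    for match in FIELD_PATTERN.finditer(text):
        # Normalize "Invoice No" to "invoice_no" for use as an attribute value.
        name = match.group("name").strip().lower().replace(" ", "_")
        field = ET.SubElement(root, "field", name=name)
        field.text = match.group("value").strip()
    return ET.tostring(root, encoding="unicode")


# Stand-in for text that a PDF library would return.
sample = "Invoice No: 12345\nCustomer: ACME GmbH\nfree text line\n"
print(text_to_xml(sample))
```

Lines that do not match the assumed layout (such as `free text line`) are skipped, so the pattern would need adjusting to the real documents.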

tabula

Posts with mentions or reviews of tabula. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2023-05-16.

Automatisches Auslesen von PDFs (Automatic reading of PDFs)
2 projects | /r/de_EDV | 16 May 2023
How To: Extract Table From Image In Python (OpenCV & OCR)
1 project | /r/Python | 17 Apr 2023
Ruby
5 projects | /r/ruby | 6 Nov 2022

Another option would be JRuby. I routinely use an application called Tabula, which is built using JRuby and compiles to a Jar file. This, of course, requires Java on the target machine, but you can ship the Jar file and it will work. It's often easier to rely on a working Java environment than it is a working Ruby environment. Especially on Windows.
I am looking to automate a process at work...
2 projects | /r/programmer | 13 Sep 2022
Self Hosted Roundup #19
1 project | /r/selfhosted | 27 Aug 2022

Idk if it has been suggested yet, tabulapdf is a self hosted solution to extract tables from PDF
Alternative to tabula.technology
1 project | /r/data | 4 Aug 2022
Text extraction from pdf, word and PPT
1 project | /r/dataengineering | 1 May 2022

For table extraction from pdfs, have a look at Tabula and Camelot, two open-source projects. They work well with clean tables, and both the Tabula Python binding and Camelot allow you to export directly as a pandas dataframe. Otherwise, the AWS Textract API is very efficient at extracting tables from pdfs, regardless of how clean/messy they are.
hello everyone someone can help me to resolve this problem please. i want to extract this unstructured data from pdf file to excel file
1 project | /r/AskProgramming | 21 Feb 2022

No idea if it will work for you, but there is a git project that seems to do what you want https://github.com/tabulapdf/tabula
Why is the point of having so many implementation of Ruby?
1 project | /r/ruby | 13 Feb 2022
Pdfsandwich
6 projects | news.ycombinator.com | 6 Nov 2021

While trying to find a specific project I recalled, I encountered this list of projects which might be of interest: https://github.com/tstanislawek/awesome-document-understandi...
The project I had in mind was similar to this one but I can't remember the name currently: https://github.com/tabulapdf/tabula
However, if you're looking for a ML-based, invoice-specific project looks like the other comment to your reply might be more useful.
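
The /r/dataengineering comment above mentions that the Tabula Python binding can hand tables back as pandas dataframes. A minimal sketch of that workflow, assuming the `tabula-py` package (which needs a Java runtime) is installed, might look like this. The `extract_tables` helper and its `reader` parameter are hypothetical names added so the logic can be exercised without a real PDF.

```python
from typing import Any, Callable, List, Optional


def extract_tables(
    pdf_path: str, reader: Optional[Callable[..., List[Any]]] = None
) -> List[Any]:
    """Return the non-empty tables found in pdf_path."""
    if reader is None:
        # Imported lazily so this module loads even without tabula-py installed.
        import tabula  # provided by the tabula-py package; needs Java at runtime

        reader = tabula.read_pdf
    # pages="all" scans every page; multiple_tables=True yields a list of DataFrames.
    tables = reader(pdf_path, pages="all", multiple_tables=True)
    # len() of a DataFrame is its row count, so this drops empty tables.
    return [table for table in tables if len(table) > 0]
```

With tabula-py available, something like `extract_tables("report.pdf")[0].to_csv("table.csv", index=False)` would write the first detected table to CSV, where `report.pdf` is a placeholder file name and the list may be empty if no tables are detected.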

What are some alternatives?

When comparing ITextSharp and tabula you can also consider the following projects:

iTextSharp (LGPL / MPL) 4.1.6 for .NET Core - Unofficial .NET Core port of iTextSharp 4.1.6. Last version to be released under the Mozilla Public License and the LGPL.

Apache PDFBox - Mirror of Apache PDFBox

PdfPig - Read and extract text and other content from PDFs in C# (port of PDFBox)

obsidian-notion-like-tables - Your premiere tool for creating and managing tabular data in Obsidian.md

library - QuestPDF is an open-source, modern and battle-tested library that can help you generate PDF documents by offering a friendly, discoverable and predictable C# fluent API.

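awesome-english-ebooks - Free downloads of English-language magazines such as The Economist (with audio), The New Yorker, The Guardian, Wired, and The Atlantic, in epub, mobi, and pdf formats, updated weekly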

Pdfium.Net SDK

ripgrep-all - rga: ripgrep, but also search in PDFs, E-Books, Office documents, zip, tar.gz, etc.

WkhtmlToPdf - C# wrapper around excellent wkhtmltopdf console utility.

OCRmyPDF - OCRmyPDF adds an OCR text layer to scanned PDF files, allowing them to be searched

PdfiumViewer

laravel-report-generator - Rapidly Generate Simple Pdf, CSV, & Excel Report Package on Laravel
