camelot vs excalibur

camelot

Camelot: PDF Table Extraction for Humans (by atlanhq)

camelot-py.readthedocs.io

excalibur

A web interface to extract tabular data from PDFs (by camelot-dev)

Tags: pdf, table, extract, for-humans

excalibur-py.readthedocs.io

Project	camelot	excalibur
Mentions	1	3
Stars	3,553	1,478
Growth	1.4%	1.8%
Activity	0.0	0.0
Latest Commit	over 1 year ago	10 months ago
Language	Python	HTML
License	GNU General Public License v3.0 or later	MIT License

Mentions is the total number of mentions we have tracked plus the number of user-suggested alternatives.
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
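As a rough illustration of the Growth column, here is a minimal sketch of month-over-month star growth. It assumes Growth is the plain percentage change in stars between two monthly snapshots; the function name and the star counts in the usage line are hypothetical and are not taken from either project's real history.

```python
def month_over_month_growth(stars_now: int, stars_month_ago: int) -> float:
    """Percentage change in GitHub stars over one month, rounded to 0.1%.

    Assumes growth = (now - month_ago) / month_ago * 100; the exact
    formula used for the table above is not published here.
    """
    if stars_month_ago <= 0:
        # A percentage change is undefined without a positive baseline.
        raise ValueError("previous star count must be positive")
    change = stars_now - stars_month_ago
    return round(100 * change / stars_month_ago, 1)


# Hypothetical counts: a repository going from 1,000 to 1,014 stars in a month.
print(f"{month_over_month_growth(1014, 1000)}%")  # prints "1.4%"
```

Because the change is divided by the previous month's count, a small project gaining a handful of stars can show higher growth than a large project gaining many. That is one reason excalibur can out-grow camelot while having fewer stars.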

camelot

Posts with mentions or reviews of camelot. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2023-06-23.

How do you parse tables in PDF with langchain? Especially, the context which is few lines above and below the table.
4 projects | /r/LangChain | 23 Jun 2023

excalibur

Posts with mentions or reviews of excalibur. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2023-06-10.

Ask HN: What's a good library/command line tool to extract tables from PDFs?
2 projects | news.ycombinator.com | 10 Jun 2023

have not tried it, but this has been in my bookmarks a while: https://github.com/camelot-dev/excalibur
Is there OCR software where I can draw an outline of the columns and rows myself to extract PDF table repeatedly.
1 project | /r/techsupport | 1 Mar 2023

Not sure it lets you draw the columns but you could give Excalibur a look maybe? https://github.com/camelot-dev/excalibur
Is it possible to write a script that copies data from a pdf file to an Excel?
1 project | /r/learnpython | 12 Apr 2021

I'm guessing since this is mostly a non-commercial effort there's this library you could try, https://github.com/camelot-dev/excalibur and see if it helps.

What are some alternatives?

When comparing camelot and excalibur you can also consider the following projects:

table-transformer - Table Transformer (TATR) is a deep learning model for extracting tables from unstructured documents (PDFs and images). This is also the official repository for the PubTables-1M dataset and GriTS evaluation metric.

url-to-pdf-api - Web page PDF/PNG rendering done right. Self-hosted service for rendering receipts, invoices, or any content.

pdf2doi - A python library/command-line tool to extract the DOI or other identifiers of a scientific paper from a pdf file.

p2. - 💖 DocumentSpark - Simple secure document viewing server. Converts a document to a picture of its pages. Content disarm and reconstruction. CDR. Formerly p2. [Moved to: https://github.com/dosyago/documentspark]

pix2struct

tabulapdf - Bindings for Tabula PDF Table Extractor Library

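video-subtitle-extractor - Extracts hard-coded subtitles from videos and generates SRT files. No third-party API is required, because text recognition runs locally. A deep-learning-based video subtitle extraction framework with subtitle region detection and subtitle content extraction. A GUI tool for extracting hard-coded subtitles (hardsubs) from videos and generating SRT files.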

org-special-block-extras - A number of new custom blocks and link types for Emacs' Org-mode ^_^

PaddleOCR - Awesome multilingual OCR toolkits based on PaddlePaddle (practical ultra lightweight OCR system, support 80+ languages recognition, provide data annotation and synthesis tools, support training and deployment among server, mobile, embedded and IoT devices)

pdftoolbox - An opensource solution for easy and intuitive PDF manipulation.

BucketStore - A simple library for interacting with Amazon S3.