excalibur
camelot
Metric | excalibur | camelot
---|---|---
Mentions | 3 | 1
Stars | 1,474 | 3,553
Growth | 1.8% | 1.4%
Activity | 0.0 | 0.0
Last commit | 10 months ago | over 1 year ago
Language | HTML | Python
License | MIT License | GNU General Public License v3.0 or later
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
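The page does not spell out how growth is calculated. As a rough sketch, assuming it is simply the percentage change in star count from one month to the next, the arithmetic would look like this. The function name and the star counts are hypothetical and are not taken from either project.

```python
def month_over_month_growth(previous_stars: int, current_stars: int) -> float:
    """Return the percentage change in stars between two consecutive months.

    Assumption: growth is (current - previous) / previous, as a percentage.
    The tracking site may compute it differently.
    """
    if previous_stars <= 0:
        raise ValueError("previous_stars must be positive")
    return (current_stars - previous_stars) * 100 / previous_stars


# Hypothetical star counts, used only to illustrate the calculation.
growth = month_over_month_growth(previous_stars=1000, current_stars=1018)
print(f"{growth:.1f}%")
```

Under this reading, a project that gains a steady number of stars each month shows a slowly falling growth percentage, because the base keeps getting larger.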
excalibur
- Ask HN: What's a good library/command line tool to extract tables from PDFs?
  have not tried it, but this has been in my bookmarks a while: https://github.com/camelot-dev/excalibur
- Is there OCR software where I can draw an outline of the columns and rows myself to extract PDF table repeatedly.
  Not sure it lets you draw the columns but you could give Excalibur a look maybe? https://github.com/camelot-dev/excalibur
- Is it possible to write a script that copies data from a pdf file to an Excel?
  I'm guessing since this is mostly a non-commercial effort there's this library you could try, https://github.com/camelot-dev/excalibur and see if it helps.
camelot
What are some alternatives?
url-to-pdf-api - Web page PDF/PNG rendering done right. Self-hosted service for rendering receipts, invoices, or any content.
table-transformer - Table Transformer (TATR) is a deep learning model for extracting tables from unstructured documents (PDFs and images). This is also the official repository for the PubTables-1M dataset and GriTS evaluation metric.
p2. - 💖 DocumentSpark - Simple secure document viewing server. Converts a document to a picture of its pages. Content disarm and reconstruction. CDR. Formerly p2. [Moved to: https://github.com/dosyago/documentspark]
pdf2doi - A python library/command-line tool to extract the DOI or other identifiers of a scientific paper from a pdf file.
tabulapdf - Bindings for Tabula PDF Table Extractor Library
pix2struct
org-special-block-extras - A number of new custom blocks and link types for Emacs' Org-mode ^_^
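video-subtitle-extractor - Extracts hard-coded video subtitles and generates srt files. No third-party API is required; text recognition runs locally. A deep-learning-based video subtitle extraction framework that includes subtitle region detection and subtitle content extraction. A GUI tool for extracting hard-coded subtitle (hardsub) from videos and generating srt files.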
pdftoolbox - An opensource solution for easy and intuitive PDF manipulation.
PaddleOCR - Awesome multilingual OCR toolkits based on PaddlePaddle (practical ultra lightweight OCR system, support 80+ languages recognition, provide data annotation and synthesis tools, support training and deployment among server, mobile, embedded and IoT devices)
BucketStore - A simple library for interacting with Amazon S3.