camelot vs excalibur
Metric | camelot | excalibur
---|---|---
Mentions | 1 | 3
Stars | 3,553 | 1,478
Growth | 1.4% | 1.8%
Activity | 0.0 | 0.0
Latest commit | over 1 year ago | 10 months ago
Language | Python | HTML
License | GNU General Public License v3.0 or later | MIT License
Stars - the number of stars that a project has on GitHub. Growth - month-over-month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
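The metrics above can be illustrated with a short Python sketch. It is hypothetical: the page does not publish its formulas, so treating growth as a percent change in stars, weighting commits with a 30-day half-life decay, and mapping a project's percentile rank onto a 0-10 scale are all assumptions, as are the function names `mom_growth`, `raw_activity`, and `activity_score`.

```python
"""Hypothetical sketch of the Growth and Activity metrics described above.

Assumptions (not the site's published formulas):
- growth is the month-over-month percent change in star count;
- raw activity sums commits, each weighted by an exponential decay on its age;
- the displayed activity is 10 x the fraction of tracked projects it exceeds.
"""
from __future__ import annotations

from bisect import bisect_left


def mom_growth(stars_now: int, stars_month_ago: int) -> float:
    """Month-over-month star growth, as a percentage (assumed definition)."""
    if stars_month_ago <= 0:
        raise ValueError("need a positive star count from the previous month")
    return 100.0 * (stars_now - stars_month_ago) / stars_month_ago


def raw_activity(commit_ages_days: list[float], half_life_days: float = 30.0) -> float:
    """Weight each commit by 0.5 ** (age / half_life): recent commits count more."""
    return sum(0.5 ** (age / half_life_days) for age in commit_ages_days)


def activity_score(project_raw: float, all_raw: list[float]) -> float:
    """Map a raw activity value to 0-10 by its rank among all tracked projects."""
    if not all_raw:
        raise ValueError("need at least one tracked project")
    ranked = sorted(all_raw)
    below = bisect_left(ranked, project_raw)  # projects strictly less active
    return round(10.0 * below / len(ranked), 1)


if __name__ == "__main__":
    # Illustrative numbers only, not real data for camelot or excalibur.
    print(mom_growth(1018, 1000))
    print(raw_activity([0, 30]))
    print(activity_score(9.0, [float(i) for i in range(10)]))
```

Under this mapping, a project whose weighted commit total exceeds that of 90% of tracked projects scores 9.0, which matches the "top 10%" reading above. Exponential decay is just one simple way to make recent commits outweigh older ones.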
Posts mentioning excalibur
-
Ask HN: What's a good library/command line tool to extract tables from PDFs?
Have not tried it, but this has been in my bookmarks for a while: https://github.com/camelot-dev/excalibur
-
Is there OCR software where I can draw an outline of the columns and rows myself to extract PDF tables repeatedly?
Not sure it lets you draw the columns but you could give Excalibur a look maybe? https://github.com/camelot-dev/excalibur
-
Is it possible to write a script that copies data from a PDF file to an Excel spreadsheet?
Since I'm guessing this is mostly a non-commercial effort, here's a library you could try to see if it helps: https://github.com/camelot-dev/excalibur
What are some alternatives?
table-transformer - Table Transformer (TATR) is a deep learning model for extracting tables from unstructured documents (PDFs and images). This is also the official repository for the PubTables-1M dataset and GriTS evaluation metric.
url-to-pdf-api - Web page PDF/PNG rendering done right. Self-hosted service for rendering receipts, invoices, or any content.
pdf2doi - A python library/command-line tool to extract the DOI or other identifiers of a scientific paper from a pdf file.
p2. - 💖 DocumentSpark - Simple secure document viewing server. Converts a document to a picture of its pages. Content disarm and reconstruction. CDR. Formerly p2. [Moved to: https://github.com/dosyago/documentspark]
pix2struct
tabulapdf - Bindings for Tabula PDF Table Extractor Library
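video-subtitle-extractor - A GUI tool for extracting hard-coded subtitles (hardsubs) from videos and generating SRT files. No third-party API is required; text recognition runs locally. It is a deep-learning-based video subtitle extraction framework that includes subtitle region detection and subtitle content extraction.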
org-special-block-extras - A number of new custom blocks and link types for Emacs' Org-mode ^_^
PaddleOCR - Awesome multilingual OCR toolkits based on PaddlePaddle (practical ultra-lightweight OCR system, supports recognition of 80+ languages, provides data annotation and synthesis tools, and supports training and deployment on server, mobile, embedded, and IoT devices)
pdftoolbox - An opensource solution for easy and intuitive PDF manipulation.
BucketStore - A simple library for interacting with Amazon S3.