Top 21 pdf-document Open-Source Projects
-
scryber.core
Scryber.Core is a .NET HTML-to-PDF engine written entirely in C# for creating beautiful, flexible, flowing documents from HTML templates, including CSS styles, data binding, SVG drawing, and encryption.
-
pdfmake-wrapper
Wrapper based on pdfmake library (http://pdfmake.org) to generate PDF documents in an easy and readable way.
-
BatchPDFSign
Command-line tool for digitally signing PDF files with a PKCS12 certificate. You can find the executable in the releases.
-
parsee-pdf-reader
Parsee's PDF reader, specialized in extracting tables with numeric values and in accurately extracting and preserving text paragraphs. Full support for scans and images.
-
document-barcodes
Docbarcodes extracts 1D and 2D barcodes from scanned PDF documents or images. It can be used to automate the extraction and processing of all kinds of documents.
-
browserless
A Ruby wrapper for the Browserless PDF API with support for modern CSS such as TailwindCSS (by thomasvanholder)
Project mention: Making an archive out of my grandfather's writings. What OCR scanning and doc mgt system to use? | /r/selfhosted | 2023-07-12
Here is a Tesseract-based tool that turns a scan into a text-searchable PDF. It takes a bit of time and can be a bit tedious, but it does the work! https://github.com/manisandro/gImageReader/releases It does not work well on cursive writing, of course. It's a slightly less code-heavy solution. Good luck!
Project mention: What is the best library for processing table data contained within a PDF? | /r/dotnet | 2023-06-23
In R we have this tabulizer library which is great for doing this: https://github.com/ropensci/tabulizer
I'm using https://github.com/GowenGit/docnet in production. I use it for text extraction and to generate thumbnails.
u/WolfenBass1, just spotted your post; feel free to check out Scryber.Core. It sounds like it supports what you need, and it will run client-side in Blazor (as well as server-side). Using HTML-based templates with data binding and expressions, you should be able to do what you need. It is open source and free. It is also on NuGet, and any feedback is gratefully received.
Well, I guess I can recommend you my tool, betterwrite.io. I built it to be able to write from any device (and have a professionally printable PDF without having to pay for it).
Project mention: Parsee.ai – a framework to easily extract complex structured data with LLMs | news.ycombinator.com | 2024-03-31
Yes, another LLM framework. This one specializes in extracting structured data from various document types (mainly PDFs, images and HTML files).
Comes with a new (separate) PDF extraction library that is focused on the extraction of numeric tables (tables with numbers, so especially for the financial domain): https://github.com/parsee-ai/parsee-pdf-reader
Helps to easily set up a dataset to evaluate the performance of various LLMs on data extraction tasks, e.g. extracting revenue figures from financial reports: https://github.com/parsee-ai/parsee-datasets/tree/main/datas...
Project mention: How to crop, split, remove pages from PDFs with Java and PDFBox | dev.to | 2023-05-30
The source code is available here.
pdf-document related posts
- Making an archive out of my grandfather's writings. What OCR scanning and doc mgt system to use?
- What is the best library for processing table data contained within a PDF?
- Is there free software for windows that can read scanned handwriting and turn it into text?
- Where can I download the Sakhr program? I've searched a lot and can't find it. And if it isn't available, does anyone know a good alternative that does Arabic OCR?
- Writer - Tips to remove breaks and hyphenations from PDF to DOC conversion?
- How to get text from a PDF file (lopdf)?
- Help plz! Tool to enhance pdf text quality?
Index
What are some of the best open-source pdf-document projects? This list will help you:
# | Project | Stars |
---|---|---|
1 | ReLaXed | 11,808 |
2 | gImageReader | 1,519 |
3 | lopdf | 1,486 |
4 | PdfPig | 1,455 |
5 | go-wkhtmltopdf | 1,003 |
6 | tabulapdf | 526 |
7 | PDFGen | 463 |
8 | docnet | 425 |
9 | boxable | 323 |
10 | scryber.core | 172 |
11 | markpdf | 153 |
12 | PyPDFForm | 126 |
13 | PDFIO.jl | 122 |
14 | pdfmake-wrapper | 67 |
15 | betterwrite | 56 |
16 | BatchPDFSign | 42 |
17 | parsee-pdf-reader | 18 |
18 | annotated-pdf-spec | 5 |
19 | document-barcodes | 4 |
20 | browserless | 2 |
21 | pdf_utils | 0 |