
Top 21 pdf-document Open-Source Projects

  • ReLaXed

    Create PDF documents using web technologies

  • gImageReader

    A Gtk/Qt front-end to tesseract-ocr.

  • Project mention: Making an archive out of my grandfather's writings. What OCR scanning and doc mgt system to use? | /r/selfhosted | 2023-07-12

    Based on Tesseract, here is software that turns a scan into a text-searchable PDF. It takes a bit of time and can be a bit tedious, but it does the job! https://github.com/manisandro/gImageReader/releases It does not work well on cursive handwriting, of course. It's a less code-heavy solution. Good luck!

  • lopdf

    A Rust library for PDF document manipulation.

  • PdfPig

    Read and extract text and other content from PDFs in C# (port of PDFBox)

  • go-wkhtmltopdf

    Golang command-line wrapper for wkhtmltopdf

  • tabulapdf

    Bindings for Tabula PDF Table Extractor Library

  • Project mention: What is the best library for processing table data contained within a PDF? | /r/dotnet | 2023-06-23

    In R we have this tabulizer library which is great for doing this: https://github.com/ropensci/tabulizer

  • PDFGen

    Simple C PDF Writer/Generation library

  • docnet

    DocNET is a fast PDF editing and reading library for modern .NET applications.

  • Project mention: Completely free libraries to read data from a pdf in C#? | /r/dotnet | 2023-07-12

    I'm using https://github.com/GowenGit/docnet in production. I use it for text extraction and to generate thumbnails.

  • boxable

    Boxable is a library that can be used to easily create tables in pdf documents.

  • scryber.core

    Scryber.Core is a .NET HTML-to-PDF engine, written entirely in C#, for creating beautiful, flexible, flowing documents from HTML templates, including CSS styles, data binding, SVG drawing and encryption.

  • Project mention: (Free) Open-source PDF Generation/Export | /r/dotnet | 2023-06-06

    u/WolfenBass1, I just spotted your post; feel free to check out Scryber.Core. It sounds like it supports what you need, and it will run client-side in Blazor (as well as server-side). Using HTML-based templates with data binding and expressions, you should be able to do what you need. It is open source and free, it is also on NuGet, and any feedback is gratefully received.

  • markpdf

    Watermark PDF files using image or text

  • PyPDFForm

    The Python library for PDF forms.

  • Project mention: Show HN: A Python PDF Form Library | news.ycombinator.com | 2024-02-03
  • PDFIO.jl

    PDF Reader Library for Native Julia.

  • pdfmake-wrapper

    Wrapper based on pdfmake library (http://pdfmake.org) to generate PDF documents in an easy and readable way.

  • betterwrite

    A creative word processor.

  • Project mention: What do you use to write? | /r/writing | 2023-05-15

    Well, I guess I can recommend you my tool, betterwrite.io. I built it to be able to write from any device (and have a professionally printable PDF without having to pay for it).

  • BatchPDFSign

    Command-line tool for digitally signing PDF files with a PKCS#12 certificate. The executable is available in the releases.

  • parsee-pdf-reader

    Parsee's PDF reader, specialized in extracting tables with numeric values and in accurately extracting and preserving text paragraphs. Full support for scans and images.

  • Project mention: Parsee.ai – a framework to easily extract complex structured data with LLMs | news.ycombinator.com | 2024-03-31

    Yes, another LLM framework. This one is specialized in extracting structured data from various document types (mainly PDFs, images and HTML files).

    Comes with a new (separate) PDF extraction library that is focused on the extraction of numeric tables (tables with numbers, so especially for the financial domain): https://github.com/parsee-ai/parsee-pdf-reader

    Helps to easily set up a dataset to evaluate the performance of various LLMs on data extraction tasks, e.g. extracting revenue figures from financial reports: https://github.com/parsee-ai/parsee-datasets/tree/main/datas...

  • annotated-pdf-spec

    Collection of useful hints for implementing a PDF library

  • document-barcodes

    Docbarcodes extracts 1D and 2D barcodes from scanned PDF documents or images. It can be used to automate the extraction and processing of all kinds of documents.

  • browserless

    A Ruby wrapper for the Browserless PDF API with support for modern CSS such as TailwindCSS (by thomasvanholder)

  • pdf_utils

    Crop, split, remove pages from PDFs with Java and PDFBox

  • Project mention: How to crop, split, remove pages from PDFs with Java and PDFBox | dev.to | 2023-05-30

    The source code is available here.

NOTE: The open-source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020).
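
Several entries above (PDFGen, lopdf, and the implementation hints collected in annotated-pdf-spec) work at the level of PDF's raw file structure: numbered objects, a cross-reference (xref) table of byte offsets, and a trailer that points at the document catalog. As a rough illustration of that structure only, here is a minimal Python sketch that writes a one-page PDF using nothing but the standard library. It is not the API of any project listed here; the function name `build_minimal_pdf`, the US Letter page, the built-in Helvetica font, and the assumption that the text is Latin-1 are all illustrative choices.

```python
def build_minimal_pdf(text: str) -> bytes:
    """Return the bytes of a one-page PDF that shows `text` (assumed Latin-1)."""
    # Backslash and parentheses must be escaped inside a PDF literal string.
    safe = text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")
    # Page content stream: begin text, select font F1 at 24 pt, move to (72, 720), show string.
    content = f"BT /F1 24 Tf 72 720 Td ({safe}) Tj ET".encode("latin-1")

    # Object bodies, numbered 1..5 by their position in this list.
    objects = [
        b"<< /Type /Catalog /Pages 2 0 R >>",
        b"<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
        b"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 612 792] "  # US Letter, in points
        b"/Resources << /Font << /F1 4 0 R >> >> /Contents 5 0 R >>",
        b"<< /Type /Font /Subtype /Type1 /BaseFont /Helvetica >>",  # a standard 14 font
        b"<< /Length %d >>\nstream\n" % len(content) + content + b"\nendstream",
    ]

    out = bytearray(b"%PDF-1.4\n")
    offsets = []
    for num, body in enumerate(objects, start=1):
        offsets.append(len(out))  # byte offset of "num 0 obj", recorded for the xref table
        out += b"%d 0 obj\n" % num + body + b"\nendobj\n"

    # Cross-reference table: entry 0 heads the free list; every entry is exactly 20 bytes.
    xref_start = len(out)
    out += b"xref\n0 %d\n" % (len(objects) + 1)
    out += b"0000000000 65535 f \n"
    for off in offsets:
        out += b"%010d 00000 n \n" % off

    # The trailer names the catalog; startxref gives the byte offset of the xref table.
    out += b"trailer\n<< /Size %d /Root 1 0 R >>\n" % (len(objects) + 1)
    out += b"startxref\n%d\n%%%%EOF\n" % xref_start
    return bytes(out)
```

The main bookkeeping a writer has to get right is recording each object's byte offset as the file is assembled, because readers normally jump to the xref table via `startxref` at the end of the file instead of scanning everything. Writing `build_minimal_pdf("Hello, PDF")` to a file opened in binary mode should give a document that ordinary viewers can open.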


What are some of the best open-source pdf-document projects? This list will help you:

 #  Project             Stars
 1  ReLaXed            11,808
 2  gImageReader        1,519
 3  lopdf               1,486
 4  PdfPig              1,455
 5  go-wkhtmltopdf      1,003
 6  tabulapdf             526
 7  PDFGen                463
 8  docnet                425
 9  boxable               323
10  scryber.core          172
11  markpdf               153
12  PyPDFForm             126
13  PDFIO.jl              122
14  pdfmake-wrapper        67
15  betterwrite            56
16  BatchPDFSign           42
17  parsee-pdf-reader      18
18  annotated-pdf-spec      5
19  document-barcodes       4
20  browserless             2
21  pdf_utils               0
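
Text extraction in readers such as PdfPig or lopdf comes down to decoding page content streams and interpreting text-showing operators such as `Tj`. The Python sketch below is a deliberately naive, standard-library-only illustration of that idea, not how any listed library is implemented: it inflates FlateDecode streams with `zlib` and collects the literal strings passed to `Tj`. The function names are hypothetical, and the sketch scans every stream in the file instead of resolving page content through the page tree. It also ignores `TJ` arrays, hexadecimal strings, octal escapes, font encodings and ToUnicode maps, and text positioning, all of which real extractors must handle.

```python
import re
import zlib

# Escapes recognised inside PDF literal strings (octal escapes are not handled here).
ESCAPES = {ord("n"): b"\n", ord("r"): b"\r", ord("t"): b"\t", ord("b"): b"\b", ord("f"): b"\f"}

# Stream data between the "stream" line and "endstream"; \b skips the "stream" inside "endstream".
STREAM_RE = re.compile(rb"\bstream\r?\n(.*?)\r?\nendstream", re.S)


def read_literal(data: bytes, i: int) -> tuple[bytes, int]:
    """Parse the literal string whose "(" is at data[i]; return (value, index after ")")."""
    depth, out = 1, bytearray()
    i += 1
    while i < len(data):
        c = data[i]
        if c == 0x5C and i + 1 < len(data):  # backslash escape
            nxt = data[i + 1]
            out += ESCAPES.get(nxt, bytes([nxt]))  # other escapes, e.g. \( \) \\, yield the char
            i += 2
            continue
        if c == 0x28:  # an unescaped "(" opens a nested level
            depth += 1
        elif c == 0x29:  # ")" closes one level
            depth -= 1
            if depth == 0:
                return bytes(out), i + 1
        out.append(c)
        i += 1
    return bytes(out), i  # unterminated string: return what was read


def tj_strings(content: bytes) -> list[str]:
    """Collect literal strings that are operands of the Tj operator in one content stream."""
    found, i = [], 0
    while (i := content.find(b"(", i)) != -1:
        value, i = read_literal(content, i)
        if content[i:].lstrip().startswith(b"Tj"):
            found.append(value.decode("latin-1"))  # crude: assumes a single-byte encoding
    return found


def extract_text(pdf_bytes: bytes) -> list[str]:
    """Naively return Tj strings from every stream in the file, inflating zlib data first."""
    strings = []
    for match in STREAM_RE.finditer(pdf_bytes):
        data = match.group(1)
        try:
            data = zlib.decompress(data)  # FlateDecode streams are zlib-wrapped
        except zlib.error:
            pass  # not zlib data: treat the stream as uncompressed
        strings += tj_strings(data)
    return strings
```

Trying `zlib.decompress` and falling back to the raw bytes avoids parsing each stream dictionary's `/Filter` entry, which keeps the sketch short but would be too loose for real-world files.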
