PDF

Top 23 PDF Open-Source Projects

  • quivr

    Your GenAI Second Brain 🧠 A personal productivity assistant (RAG) ⚡️🤖 Chat with your docs (PDF, CSV, ...) & apps using Langchain, GPT 3.5 / 4 turbo, Private, Anthropic, VertexAI, Ollama, LLMs, Groq that you can share with users ! Local & Private alternative to OpenAI GPTs & ChatGPT powered by retrieval-augmented generation.

  • Project mention: privateGPT VS quivr - a user suggested alternative | libhunt.com/r/privateGPT | 2024-01-12
  • Stirling-PDF

    #1 Locally hosted web application that allows you to perform various operations on PDF files

  • Project mention: Stirling PDF: Self-hosted, web-based PDF manipulation tool | news.ycombinator.com | 2024-05-02

    Well it was developed initially by ChatGPT. First file I open I see repeated comments.

    https://github.com/Stirling-Tools/Stirling-PDF/blob/7f577a60...

  • Awesome-CV

    Awesome CV is a LaTeX template for your outstanding job application

  • Project mention: How can I turn awesome-cv coverletter.tex and cv.tex into a single PDF? | /r/LaTeX | 2023-10-02

    I am in the process of rewriting my CV using the [awesome-cv](https://github.com/posquit0/Awesome-CV) template and am pretty happy with how things are turning out.

  • paperless-ngx

    A community-supported supercharged version of paperless: scan, index and archive all your physical documents

  • Project mention: I accidentally built a meme search engine | news.ycombinator.com | 2024-04-13

    I steered a friend towards Paperless (and away from an LLM solution) as a way of searching/accessing GBs of architectural PDFs recently - so far, it’s apparently working well for them.

    https://github.com/paperless-ngx/paperless-ngx

  • awesome-english-ebooks

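    Free downloads of English-language magazines including The Economist (with audio), The New Yorker, The Guardian, Wired, and The Atlantic, in epub, mobi, and pdf formats, updated weekly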

  • best-resume-ever

    Build multiple beautiful resumes quickly and easily, and create your best CV ever! Made with Vue and LESS.

  • Etherpad

    Etherpad: A modern really-real-time collaborative document editor.

  • Project mention: Edit This Blog Post | news.ycombinator.com | 2024-02-06
  • koodo-reader

    A modern ebook manager and reader with sync and backup capacities for Windows, macOS, Linux and Web

  • koreader

    An ebook reader application supporting PDF, DjVu, EPUB, FB2 and many more formats, running on Cervantes, Kindle, Kobo, PocketBook and Android devices

  • Project mention: Ask HN: Best Open E-Reader? | news.ycombinator.com | 2024-04-20

    Kobos[1] and Pocketbooks[2] are a lot more open than Kindles. AFAIK you can transfer .epub files into both devices and these epubs are perfectly readable via the stock OS. If for some reason you find the stock proprietary OS lacking, you can install an open source one like KOreader [3] or Plato[4]

    Of course you want a good way of organizing epubs pdfs mobi, and like has already been mentioned Calibre[5] is a great option.

    [1]https://www.kobo.com/

    [2]https://pocketbookstore.com/en-ca

    [3]https://github.com/koreader/koreader

    [4]https://github.com/baskerville/plato

    [5]https://calibre-ebook.com/

  • gpt4-pdf-chatbot-langchain

    GPT4 & LangChain Chatbot for large PDF docs

  • Project mention: Back and forth conversations before a vector search? | /r/LangChain | 2023-08-30

    I am playing around with this GitHub project, which takes a user question as input and immediately runs a vector search on it to find relevant stored information before delivering an answer.

  • react-pdf

    📄 Create PDF files using React

  • Project mention: How we improved our client-side PDF generation by 5x | dev.to | 2024-03-17

    Using react-pdf, we crafted a solution that allowed users to manipulate their reports with an impressive degree of flexibility. But, as data grew (imagine trying to cram an entire financial year's worth of invoices, up to 22,000 rows, into one PDF), our solution began to falter, especially on older PCs with limited resources.

  • sumatrapdf

    SumatraPDF reader

  • Project mention: MuPDF WASM Viewer Demo | news.ycombinator.com | 2024-04-20

    I’m curious, have you tried SumatraPDF (uses muPDF under the hood)?

    https://github.com/sumatrapdfreader/sumatrapdf

  • mit-deep-learning-book-pdf

    MIT Deep Learning Book in PDF format (complete and parts) by Ian Goodfellow, Yoshua Bengio and Aaron Courville

  • Project mention: Deep Learning Course | news.ycombinator.com | 2023-11-19
  • OCRmyPDF

    OCRmyPDF adds an OCR text layer to scanned PDF files, allowing them to be searched

  • Project mention: TextSnatcher: Copy text from images, for the Linux Desktop | news.ycombinator.com | 2024-03-14

    Try https://github.com/ocrmypdf/OCRmyPDF - it uses Tesseract behind the scenes and it is absolutely brilliant.

  • milewski-ctfp-pdf

    Bartosz Milewski's 'Category Theory for Programmers' unofficial PDF and LaTeX source

  • Project mention: reflect-cpp - Now with compile time extraction of field names from structs and enums using C++-20. | /r/cpp | 2023-12-09

    Category Theory for Programmers by Bartosz Milewski (https://github.com/hmemcpy/milewski-ctfp-pdf/releases)

  • QuestPDF

    QuestPDF is a modern open-source .NET library for PDF document generation. Offering comprehensive layout engine powered by concise and discoverable C# Fluent API. Easily generate PDF reports, invoices, exports, etc.

  • Project mention: PDF Generation using QuestPDF in ASP.NET Core — Part 1 | dev.to | 2024-05-04

    What is QuestPDF? QuestPDF is an open-source .NET library for PDF document generation. It uses a fluent API approach to compose together many simple elements to create complex documents.

  • h2ogpt

    Private chat with local GPT with document, images, video, etc. 100% private, Apache 2.0. Supports oLLaMa, Mixtral, llama.cpp, and more. Demo: https://gpt.h2o.ai/ https://codellama.h2o.ai/

  • Project mention: Ask HN: How do I train a custom LLM/ChatGPT on my own documents in Dec 2023? | news.ycombinator.com | 2023-12-24

    As others have said you want RAG.

    The most feature complete implementation I've seen is h2ogpt[0] (not affiliated).

    The code is kind of a mess (most of the logic is in an ~8000 line python file) but it supports ingestion of everything from YouTube videos to docx, pdf, etc - either offline or from the web interface. It uses langchain and a ton of additional open source libraries under the hood. It can run directly on Linux, via docker, or with one-click installers for Mac and Windows.

    It has various model hosting implementations built in - transformers, exllama, llama.cpp as well as support for model serving frameworks like vLLM, HF TGI, etc or just OpenAI.

    You can also define your preferred embedding model along with various other parameters but I've found the out of box defaults to be pretty sane and usable.

    [0] - https://github.com/h2oai/h2ogpt

  • Dompdf

    HTML to PDF converter for PHP

  • Project mention: Intro to DOMPDF - lightest and simplest PHP library to generate PDF documents | dev.to | 2024-04-05

    Generating PDF documents out of your app's HTML output is a very common requirement and there are several open source libraries to accomplish this. I came across this need for my project recently and I evaluated many popular ones such as TCPDF, mpdf, FPDF, etc. But the one that truly stood up to my evaluation in terms of efficiency (minimal footprint) and ease of implementation was DOMPDF.

  • xournalpp

    Xournal++ is a handwriting notetaking software with PDF annotation support. Written in C++ with GTK3, supporting Linux (e.g. Ubuntu, Debian, Arch, SUSE), macOS and Windows 10. Supports pen input from devices such as Wacom Tablets.

  • Project mention: Rnote – An open-source vector-based drawing app | news.ycombinator.com | 2024-03-11

    I highly recommend Rnote to anyone on Linux that misses the "hodgepodge" notetaking of apps like OneNote. It works like a dream on touchscreens and drawing tablets, with a surprising amount of configuration under the hood.

    Also worth noting is Xournal, an older but similar project: https://xournalpp.github.io/

  • Zettlr

    Your One-Stop Publication Workbench

  • Project mention: Obsidian 1.5 Desktop (Public) | news.ycombinator.com | 2023-12-26
  • libvips

    A fast image processing library with low memory needs.

  • Project mention: Ask HN: How to handle user file uploads? | news.ycombinator.com | 2024-05-03

    Read through the comments and was surprised no one mentioned libvips - https://github.com/libvips/libvips. At my current small company we were trying to allow image uploads and started with imagemagick but certain images took too long to process and we were looking for faster alternatives. It's a great tool with minimum overhead. For video thumbnails, we use ffmpeg which is really heavy. We off-load video thumbnail generation to a queue. We've had great luck with these tools.

  • react-pdf

    Display PDFs in your React app as easily as if they were images. (by wojtekmaj)

  • Project mention: 33 React Libraries Every React Developer Should Have In Their Arsenal | dev.to | 2024-01-07

    23. react-pdf

  • PyPDF2

    A pure-python PDF library capable of splitting, merging, cropping, and transforming the pages of PDF files

  • Project mention: Yara scanning PDF files | /r/computerforensics | 2023-06-01
NOTE: The open-source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020).

PDF related posts

  • Stirling PDF: Self-hosted, web-based PDF manipulation tool

    4 projects | news.ycombinator.com | 2 May 2024
  • PDF Generation using QuestPDF in ASP.NET Core — Part 1

    2 projects | dev.to | 4 May 2024
  • A small lathe built in a Japanese prison camp

    1 project | news.ycombinator.com | 29 Apr 2024
  • DEMO - Voice to PDF - Complete PDF documents with voice commands using the Claude 3 Opus API

    4 projects | dev.to | 27 Apr 2024
  • Ask HN: Best Open E-Reader?

    2 projects | news.ycombinator.com | 20 Apr 2024
  • MuPDF WASM Viewer Demo

    9 projects | news.ycombinator.com | 20 Apr 2024
  • Security review of this Java library

    1 project | news.ycombinator.com | 19 Apr 2024

Index

What are some of the best open-source PDF projects? This list will help you:

 #  Project                      Stars
 1  quivr                        32,917
 2  Stirling-PDF                 24,441
 3  Awesome-CV                   21,842
 4  paperless-ngx                16,882
 5  awesome-english-ebooks       16,697
 6  best-resume-ever             16,230
 7  Etherpad                     15,854
 8  koodo-reader                 15,642
 9  koreader                     15,254
10  gpt4-pdf-chatbot-langchain   14,573
11  react-pdf                    14,166
12  sumatrapdf                   12,642
13  mit-deep-learning-book-pdf   12,342
14  OCRmyPDF                     12,067
15  milewski-ctfp-pdf            10,758
16  QuestPDF                     10,590
17  h2ogpt                       10,506
18  Dompdf                       10,278
19  xournalpp                    10,270
20  Zettlr                        9,640
21  libvips                       9,029
22  react-pdf                     8,587
23  PyPDF2                        7,422
