PDF to Text /selfhosted

This page summarizes the projects mentioned and recommended in the original post on /r/selfhosted.

  • tesseract-ocr

    Tesseract Open Source OCR Engine (main repository)

  • Finally, if you have scanned documents in a PDF, then you will want some kind of OCR software. Tesseract is great for converting scanned documents to plain text, but it does not preserve layout. On apt-based systems it is available in the tesseract-ocr package, with language support in tesseract-ocr-ABC packages, where ABC is the 3-letter language code. Try apt search tesseract-ocr to see the list of available packages.

  • pandoc

    Universal markup converter

  • If you want to convert a PDF to a Word document or the OpenDocument format, then it's probably Pandoc that you're after. Bear in mind, though, that PDFs are absolutely not designed to be converted that way, so the result of any program that does this is going to be less than perfect.

  • calibre

    The official source code repository for the calibre ebook manager

  • Ah, sure! Thanks for clarifying. The other piece of software that can convert like that is Calibre. It has a ton of options, but it is more suited to books than to regular documents such as forms.
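
As a rough illustration of the Tesseract and Calibre suggestions above, the sketch below builds the command lines involved. It is a minimal sketch under stated assumptions, not something taken from the thread: the helper names (apt_packages, tesseract_command, calibre_command, run_if_installed) and the file names are hypothetical, and it assumes the tesseract and ebook-convert executables are on PATH. Tesseract reads image files rather than PDFs, so each page of a scanned PDF would need to be exported as an image first; that step is not shown here.

```python
import shutil
import subprocess


def apt_packages(languages):
    """Return the apt package names for Tesseract plus the given 3-letter language codes."""
    return ["tesseract-ocr"] + [f"tesseract-ocr-{code}" for code in languages]


def tesseract_command(image, output_base, language="eng"):
    """Build a Tesseract call. Tesseract appends .txt to output_base itself."""
    return ["tesseract", str(image), str(output_base), "-l", language]


def calibre_command(source, target):
    """Build a Calibre ebook-convert call. The output format follows the target's extension."""
    return ["ebook-convert", str(source), str(target)]


def run_if_installed(command):
    """Run the command only when its executable is on PATH. Return True if it ran."""
    if shutil.which(command[0]) is None:
        return False
    subprocess.run(command, check=True)
    return True


# Hypothetical file names, printed rather than executed.
print(apt_packages(["deu", "fra"]))
print(tesseract_command("page-1.png", "page-1", "deu"))
print(calibre_command("book.pdf", "book.txt"))
```

Passing one of the built lists to run_if_installed would actually execute it, for example run_if_installed(tesseract_command("page-1.png", "page-1")). As the comments above note, expect the converted text to need some cleanup either way.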

NOTE: The number of mentions on this list indicates mentions on common posts plus user suggested alternatives. Hence, a higher number means a more popular project.

Related posts

  • What version of RTF (Rich Text Format) does LibreOffice adhere to?

    2 projects | /r/libreoffice | 7 Jul 2023
  • How to get Neovim docs in epub?

    2 projects | /r/neovim | 13 Sep 2022
  • i'm looking for an app for printing in epub

    2 projects | /r/opensource | 17 Feb 2022
  • Un-Remarkable2. I returned my Remarkable tablet

    2 projects | news.ycombinator.com | 28 Nov 2021
  • EPUB to MOBI from scratch

    2 projects | /r/csharp | 30 Mar 2021