Completely crazy tables when transforming table from PDF file to CSV

This page summarizes the projects mentioned and recommended in the original post on /r/Python

  • tabula-py

    Simple wrapper of tabula-java: extract table from PDF into pandas DataFrame

  • pdfminer.six

    Community maintained fork of pdfminer - we fathom PDF

  • Having said that, I've had pretty decent luck with pdfminer.six for various extractions. Sometimes a PDF converts to reasonably well-structured HTML, so you might try dumping the PDF to HTML and then parsing it with an HTML library such as lxml, or with html.parser from Python's standard library. This assumes the resulting HTML is actually usable; you can check by dumping the PDF to an .html file and opening it in a web browser.

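
The HTML route suggested in the comment above can be sketched with nothing but the standard library. This is a minimal sketch, not the commenter's code: it assumes the PDF has already been dumped to HTML by some tool, and that the dump contains real table markup (`<tr>`, `<td>`, `<th>`), which many PDF-to-HTML dumps do not. The names `TableExtractor` and `html_tables_to_csv` are hypothetical and chosen only for illustration.

```python
import csv
import io
from html.parser import HTMLParser


class TableExtractor(HTMLParser):
    """Collect the cell text of every <tr> in the document as a list of rows."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row being read, None outside a <tr>
        self._cell = None  # text fragments of the current cell, None outside a cell

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Collapse runs of whitespace so stray line breaks do not leak into the CSV.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None  # drop any cell left open by malformed markup


def html_tables_to_csv(html_text):
    """Return the rows found in html_text as CSV text (hypothetical helper)."""
    parser = TableExtractor()
    parser.feed(html_text)
    parser.close()
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(parser.rows)
    return out.getvalue()
```

Calling `html_tables_to_csv(open("dump.html", encoding="utf-8").read())` on a hypothetical `dump.html` would return the CSV text. If the dump turns out to contain only positioned text fragments instead of table tags, this approach yields nothing useful and a table-aware tool such as tabula-py is probably the better bet.
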
NOTE: The number of mentions on this list counts mentions on common posts plus user-suggested alternatives. Hence, a higher number means a more popular project.

Related posts

  • What is the best way to extract tables from scanned pdf's?

    1 project | /r/learnpython | 10 Nov 2022
  • software to convert pdf tables to Excel

    1 project | /r/BusinessIntelligence | 26 Apr 2022
  • Ensure Java is installed and PATH is set for `java` in Amazon SageMaker Jupyter Notebook

    1 project | /r/aws | 14 Oct 2021
  • Show HN: PDFSyntax, a Python library to inspect and transform PDF files

    1 project | news.ycombinator.com | 22 Jan 2024
  • Code to extract text from pdf to excel

    2 projects | /r/Python | 2 Jun 2023