Show HN: I made a tool to convert images of tables to CSV

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com.

  • Camelot

    A Python library to extract tabular data from PDFs

  • I've had success using camelot-py (https://camelot-py.readthedocs.io) to extract tabular data from PDFs (for images, I use ImageMagick to convert them to PDF first). If your table has borders, the default method (lattice) works quite well. For tables without borders there is the 'stream' option, but it usually requires a bit more preprocessing to get usable results.

  • image2csv

    Convert tables stored as images to a usable .csv file

  • image-table-ocr

    Turn images of tables into CSV data. Detect tables from images and run OCR on the cells.

  • https://github.com/eihli/image-table-ocr seems to automatically find tables within larger images; IDK if it works without borders, though.
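A minimal sketch of the Camelot workflow described in the camelot-py comment, assuming camelot-py and its dependencies are installed. `camelot.read_pdf` and its `lattice`/`stream` flavors are Camelot's own API; the wrapper names `camelot_flavor` and `pdf_tables_to_csv` and the `-table-N.csv` output naming are hypothetical. Camelot's documentation states that it works only with text-based PDFs, so a PDF produced from an image (for example with ImageMagick's `magick table.png table.pdf`, or `convert` on ImageMagick 6) may need an OCR text layer before any tables are found.

```python
from __future__ import annotations

from pathlib import Path


def camelot_flavor(bordered: bool) -> str:
    """Map 'does the table have ruling lines?' onto a Camelot flavor:
    'lattice' (Camelot's default) relies on lines between cells, while
    'stream' infers columns from the whitespace between words."""
    return "lattice" if bordered else "stream"


def pdf_tables_to_csv(pdf_path: str, bordered: bool = True, pages: str = "1") -> list[str]:
    """Extract every table Camelot detects on `pages` of a text-based PDF and
    write each one to its own CSV next to the PDF. Returns the paths written."""
    # Imported lazily so this module still loads where camelot-py is not installed.
    import camelot

    tables = camelot.read_pdf(pdf_path, pages=pages, flavor=camelot_flavor(bordered))
    stem = Path(pdf_path).with_suffix("")
    written = []
    for i, table in enumerate(tables):
        out_path = f"{stem}-table-{i}.csv"  # hypothetical naming scheme
        # table.df is a pandas DataFrame of the raw cell grid; drop the
        # DataFrame's own integer index and column labels when saving.
        table.df.to_csv(out_path, index=False, header=False)
        written.append(out_path)
    return written
```

Calling `pdf_tables_to_csv("table.pdf", bordered=False)` would retry a borderless table with `stream`, whose output typically needs more cleanup than `lattice` output.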
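The question about borders comes down to how cells are located. image-table-ocr's actual detection code is not reproduced here; this is a hypothetical, dependency-free sketch of one common technique for bordered tables, projection profiles: a pixel row (or column) that is almost entirely dark is treated as a ruling line, and cells are the gaps between consecutive lines. The input format (a binarized image as a list of 0/1 rows, 1 = dark) and the `min_fill=0.9` threshold are assumptions; a real pipeline would binarize and deskew the image first, then crop each cell and pass it to an OCR engine.

```python
from __future__ import annotations


def _line_runs(profile: list[int], length: int, min_fill: float) -> list[tuple[int, int]]:
    """Return (first, last) index pairs of consecutive positions whose dark-pixel
    count is at least min_fill * length, i.e. candidate ruling lines."""
    runs: list[tuple[int, int]] = []
    start = None
    for i, count in enumerate(profile):
        is_line = count >= min_fill * length
        if is_line and start is None:
            start = i
        elif not is_line and start is not None:
            runs.append((start, i - 1))
            start = None
    if start is not None:
        runs.append((start, len(profile) - 1))
    return runs


def find_cells(img: list[list[int]], min_fill: float = 0.9) -> list[list[tuple[int, int, int, int]]]:
    """Locate the cells of a bordered table in a binarized image (1 = dark pixel).

    Returns rows of half-open (top, bottom, left, right) boxes covering each
    cell's interior. Returns [] when fewer than two horizontal or two vertical
    ruling lines are found, e.g. for a borderless table.
    """
    height, width = len(img), len(img[0])
    row_profile = [sum(row) for row in img]  # dark pixels per pixel row
    col_profile = [sum(row[x] for row in img) for x in range(width)]  # per pixel column
    h_lines = _line_runs(row_profile, width, min_fill)
    v_lines = _line_runs(col_profile, height, min_fill)
    if len(h_lines) < 2 or len(v_lines) < 2:
        return []
    cells = []
    # A cell's interior starts just after one line ends and stops where the next begins.
    for (_, top_end), (bottom_start, _) in zip(h_lines, h_lines[1:]):
        cells.append([
            (top_end + 1, bottom_start, left_end + 1, right_start)
            for (_, left_end), (right_start, _) in zip(v_lines, v_lines[1:])
        ])
    return cells
```

Because it relies only on ruling lines, this particular sketch finds no cells in a borderless table; that is a limitation of the sketch, not a claim about how image-table-ocr behaves.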
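For borderless tables, the alternative is to rebuild the grid from text positions alone, roughly the idea behind Camelot's `stream` flavor. Neither image2csv's nor image-table-ocr's implementation is shown here: this hypothetical sketch assumes an OCR engine has already returned words with pixel bounding boxes (the `Word` tuple is an invented format), groups words into rows by their top edge, merges nearby words into cells, and clusters cell left edges into columns before writing CSV with Python's `csv` module. The pixel tolerances are arbitrary, and the column step assumes left-aligned columns.

```python
from __future__ import annotations

import csv
import io
from typing import NamedTuple


class Word(NamedTuple):
    """One OCR'd word and its bounding box in pixels (hypothetical input format)."""
    text: str
    left: int
    top: int
    width: int


def _cluster_starts(values, tol: int) -> list[int]:
    """Single-linkage 1-D clustering: sort the values and start a new cluster
    whenever the gap to the previous value exceeds tol. Returns each cluster's
    smallest value."""
    starts: list[int] = []
    prev = None
    for v in sorted(values):
        if prev is None or v - prev > tol:
            starts.append(v)
        prev = v
    return starts


def words_to_csv(words: list[Word], row_tol: int = 10, word_gap: int = 15, col_tol: int = 20) -> str:
    # 1. Rows: a word joins the current row if its top edge is within row_tol
    #    of that row's first word; otherwise it starts a new row.
    rows: list[list[Word]] = []
    for w in sorted(words, key=lambda w: (w.top, w.left)):
        if rows and w.top - rows[-1][0].top <= row_tol:
            rows[-1].append(w)
        else:
            rows.append([w])

    # 2. Cells: scanning left to right, merge a word into the previous cell
    #    when the horizontal gap between them is smaller than word_gap.
    cell_rows = []
    for row in rows:
        cells = []  # each cell is [left, right, text]
        for w in sorted(row, key=lambda w: w.left):
            if cells and w.left - cells[-1][1] < word_gap:
                cells[-1][1] = max(cells[-1][1], w.left + w.width)
                cells[-1][2] += " " + w.text
            else:
                cells.append([w.left, w.left + w.width, w.text])
        cell_rows.append(cells)

    # 3. Columns: cluster cell left edges across all rows, then put each cell
    #    in the last column that starts at or before its left edge.
    col_starts = _cluster_starts((c[0] for cells in cell_rows for c in cells), col_tol)
    table = []
    for cells in cell_rows:
        out = [""] * len(col_starts)
        for left, _, text in cells:
            col = max(i for i, s in enumerate(col_starts) if s <= left)
            out[col] = f"{out[col]} {text}".strip()
        table.append(out)

    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerows(table)
    return buf.getvalue()
```

Right-aligned numeric columns or cells spanning several columns would need a smarter column step, which is one reason whitespace-based parsing tends to need more preprocessing than line-based parsing.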
