image-table-ocr
Turn images of tables into CSV data. Detect tables from images and run OCR on the cells.
I've had success using camelot-py (https://camelot-py.readthedocs.io) to extract tabular data from PDFs (for images, I first convert them to PDF with ImageMagick). If your table has borders, the default method (lattice) works quite well. For non-bordered tables there is the 'stream' option, but it usually requires a bit more preprocessing to get usable results.
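For anyone who wants to try this, here is a rough sketch of that workflow in Python. The helper names (`image_to_pdf_command`, `pick_flavor`, `extract_tables`) and the file names are made up for illustration; only `camelot.read_pdf`, its `pages` and `flavor` arguments, and the `.df` attribute on each table come from camelot itself. It assumes ImageMagick 6, whose command is `convert` (ImageMagick 7 uses `magick`). One caveat: the camelot docs say it works on text-based PDFs, so an image-only PDF may need an OCR text layer added before anything comes out.

```python
import subprocess


def image_to_pdf_command(image_path, pdf_path):
    """Build the ImageMagick command that wraps an image in a PDF (hypothetical helper)."""
    # Assumption: ImageMagick 6 is installed and exposes the "convert" binary.
    return ["convert", str(image_path), str(pdf_path)]


def pick_flavor(bordered):
    """Map 'does the table have ruling lines?' to a camelot flavor name."""
    # lattice relies on drawn cell borders; stream guesses columns from whitespace.
    return "lattice" if bordered else "stream"


def extract_tables(pdf_path, bordered=True, pages="1"):
    """Return one pandas DataFrame per table camelot finds on the given pages."""
    import camelot  # imported lazily so the helpers above work without it installed

    tables = camelot.read_pdf(str(pdf_path), pages=pages, flavor=pick_flavor(bordered))
    return [table.df for table in tables]


# Example usage (file names are placeholders):
# subprocess.run(image_to_pdf_command("scan.png", "scan.pdf"), check=True)
# for i, df in enumerate(extract_tables("scan.pdf", bordered=True)):
#     df.to_csv(f"table_{i}.csv", index=False, header=False)
```

If stream gives messy columns, that is usually where the extra preprocessing mentioned above comes in; cropping the page to just the table region before extraction is one thing to try.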
https://github.com/eihli/image-table-ocr seems to automatically find tables within larger images; IDK if it works without borders, though.