EasyOCR
Ready-to-use OCR with 80+ supported languages and all popular writing scripts, including Latin, Chinese, Arabic, Devanagari, Cyrillic, etc.
There are a few good models out there that take this a few steps further and take care of a lot of the work for you. Check out LayoutLM: https://github.com/microsoft/unilm
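Part of what LayoutLM-style models add over plain OCR is layout: each word is fed in together with its bounding box, scaled to a 0-1000 grid relative to the page size (this is how the Hugging Face LayoutLM documentation describes its `bbox` input). The sketch below only illustrates that preprocessing step; `normalize_box`, `prepare_words`, and the sample words, boxes, and page size are hypothetical, and it assumes boxes arrive as `(x0, y0, x1, y1)` pixel coordinates from whatever OCR engine you use.

```python
from typing import List, Sequence, Tuple

# Hypothetical box format: (x0, y0, x1, y1) in pixels, top-left origin.
Box = Tuple[float, float, float, float]


def normalize_box(box: Box, page_width: float, page_height: float) -> List[int]:
    """Scale a pixel-space box onto the 0-1000 grid used by LayoutLM-style inputs."""
    x0, y0, x1, y1 = box
    return [
        int(1000 * x0 / page_width),
        int(1000 * y0 / page_height),
        int(1000 * x1 / page_width),
        int(1000 * y1 / page_height),
    ]


def prepare_words(words: Sequence[str], boxes: Sequence[Box],
                  page_width: float, page_height: float) -> List[Tuple[str, List[int]]]:
    """Pair each OCR'd word with its normalized box (one box per word)."""
    if len(words) != len(boxes):
        raise ValueError("each word needs exactly one bounding box")
    return [(w, normalize_box(b, page_width, page_height)) for w, b in zip(words, boxes)]


# Hypothetical OCR output for a 1700x2200 pixel page scan.
pairs = prepare_words(
    ["Invoice", "Total:"],
    [(170, 110, 510, 165), (170, 1980, 340, 2035)],
    1700,
    2200,
)
print(pairs)  # [('Invoice', [100, 50, 300, 75]), ('Total:', [100, 900, 200, 925])]
```

Normalizing to a fixed grid keeps the position values independent of scan resolution; the model's tokenizer and the words-to-tokens alignment are separate steps not shown here.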
If all you need to do is OCR, check out https://github.com/JaidedAI/EasyOCR ; it's a similar architecture to the cloud services, without the cost. You'll end up with the extracted text and a bounding box for each piece of it.
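A minimal sketch of that workflow, assuming the `easyocr` package is installed for the OCR call itself. `easyocr.Reader([...])` and `reader.readtext(path)` are EasyOCR's API; with the default detail level, `readtext` returns one `(corner points, text, confidence)` entry per detected region. The `to_rows` helper, the 0.5 confidence cutoff, and the sample detections are hypothetical, added only to show how the text and bounding boxes can be used; the import is done lazily so the helper runs without EasyOCR.

```python
from typing import List, Sequence, Tuple

Point = Sequence[float]
# One EasyOCR detection: (four corner points, recognized text, confidence).
Detection = Tuple[Sequence[Point], str, float]
# Hypothetical output row: (text, (x0, y0, x1, y1), confidence).
Row = Tuple[str, Tuple[float, float, float, float], float]


def run_easyocr(image_path: str, languages: Sequence[str] = ("en",)) -> List[Detection]:
    """Run EasyOCR on one image (model weights are downloaded on first use)."""
    import easyocr  # imported here so the helper below works without EasyOCR installed

    reader = easyocr.Reader(list(languages))
    return reader.readtext(image_path)


def to_rows(detections: Sequence[Detection], min_conf: float = 0.5) -> List[Row]:
    """Drop low-confidence hits, turn corner points into axis-aligned boxes,
    and sort naively top-to-bottom, then left-to-right."""
    rows: List[Row] = []
    for corners, text, conf in detections:
        if conf < min_conf:
            continue
        xs = [p[0] for p in corners]
        ys = [p[1] for p in corners]
        rows.append((text, (min(xs), min(ys), max(xs), max(ys)), conf))
    rows.sort(key=lambda r: (r[1][1], r[1][0]))
    return rows


# Hypothetical detections shaped like readtext() output.
sample = [
    ([[10, 40], [120, 40], [120, 60], [10, 60]], "world", 0.91),
    ([[10, 5], [100, 5], [100, 25], [10, 25]], "Hello", 0.98),
    ([[200, 5], [260, 5], [260, 25], [200, 25]], "??", 0.12),
]
rows = to_rows(sample)
for text, box, conf in rows:
    print(text, box, conf)
```

On a real document you would call `to_rows(run_easyocr("page.png"))` (the file name is a placeholder). Sorting by the top-left corner is a crude reading order; multi-column layouts need something smarter.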
If you want to extract structured data (e.g. tables) from PDFs, there's also a piece of work called TableNet that may be worth checking out: https://github.com/tomassosorio/OCR_tablenet