-
DeepSpeech
DeepSpeech is an open source embedded (offline, on-device) speech-to-text engine which can run in real time on devices ranging from a Raspberry Pi 4 to high power GPU servers.
-
silero-models
Silero Models: pre-trained speech-to-text, text-to-speech and text-enhancement models made embarrassingly simple
I looked into OCR a while ago for some hundreds of thousands of pages of PDF. All hosted offerings would end up costing quite a bit.
After looking at the options and running a few tests, I figured I'd use https://github.com/jbarlow83/OCRmyPDF
You should be able to use ffmpeg [0] to extract a single frame per second or per keyframe (it's doubtful that processing every single frame is worth it) and then pass each frame to tesseract.
For speech-to-text, if the audio is in English, try Mozilla's DeepSpeech? https://github.com/mozilla/DeepSpeech
Might be fun to try.
[0] https://stackoverflow.com/questions/27568254/how-to-extract-...
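The ffmpeg-then-tesseract idea above could look roughly like the sketch below. It assumes the `ffmpeg` and `tesseract` command-line tools are installed and on the PATH. The function names, the `frame_%06d.png` naming pattern, and the default of one frame per second are illustrative choices, not something taken from either tool's documentation.

```python
import subprocess
from pathlib import Path


def ffmpeg_frame_cmd(video: str, out_dir: str, fps: float = 1.0) -> list:
    """Build an ffmpeg command that writes `fps` frames per second as PNG files."""
    pattern = str(Path(out_dir) / "frame_%06d.png")  # hypothetical naming scheme
    return ["ffmpeg", "-i", video, "-vf", f"fps={fps}", pattern]


def tesseract_cmd(image: str, lang: str = "eng") -> list:
    """Build a tesseract command that prints the recognized text to stdout."""
    return ["tesseract", image, "stdout", "-l", lang]


def ocr_video(video: str, out_dir: str, fps: float = 1.0) -> dict:
    """Extract frames with ffmpeg, OCR each one, and map frame name to text."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    subprocess.run(ffmpeg_frame_cmd(video, out_dir, fps), check=True)

    texts = {}
    for frame in sorted(Path(out_dir).glob("frame_*.png")):
        result = subprocess.run(
            tesseract_cmd(str(frame)),
            check=True,
            capture_output=True,
            text=True,
        )
        texts[frame.name] = result.stdout.strip()
    return texts
```

Calling `ocr_video("talk.mp4", "frames")` (both names are placeholders) would write the frames to `frames/` and return the text found in each one. Consecutive frames will often contain the same text, so some deduplication would probably be needed afterwards.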
For speech-to-text extraction you can try Silero [1].
It's free software (AGPL-3.0 license), fast, highly accurate, and extremely simple to deploy. (I have no affiliation with them.)
[1] https://github.com/snakers4/silero-models
It's not really working. I tried two English PDF invoices in a normal format. One came back empty; the other only had the amount right.
I'm assuming they only trained on some specific documents (passport of country X, etc.) and that all others don't work.
If you process the same kind of document all the time, my open-source invoice2data project may work better. It's based on regular expressions rather than machine learning: https://github.com/invoice-x/invoice2data
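To illustrate the general regex-per-field idea (this is a minimal sketch, not invoice2data's actual API or template format), here is how a fixed invoice layout could be parsed once its text has been extracted. The field names, patterns, and sample text are all hypothetical.

```python
import re

# Hypothetical template for one issuer's invoice layout. Each field maps to a
# regex with a single capture group holding the value.
TEMPLATE = {
    "invoice_number": re.compile(r"Invoice\s*(?:No\.?|#)\s*:?\s*(\w[\w-]*)", re.I),
    "date": re.compile(r"Date\s*:?\s*(\d{4}-\d{2}-\d{2})", re.I),
    "amount": re.compile(r"Total\s*:?\s*[$€£]?\s*(\d[\d,]*\.\d{2})", re.I),
}


def extract_fields(text, template=TEMPLATE):
    """Return the first match for each field, or None if the field is absent."""
    result = {}
    for name, pattern in template.items():
        match = pattern.search(text)
        result[name] = match.group(1) if match else None
    return result


sample = "ACME Corp\nInvoice No: INV-2041\nDate: 2020-11-03\nTotal: $1,234.50\n"
fields = extract_fields(sample)
# fields == {"invoice_number": "INV-2041", "date": "2020-11-03", "amount": "1,234.50"}
```

The trade-off is the one described above: a template like this only works for the layout it was written for, but for that layout it is predictable and easy to debug.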