Base64.ai – Extract text, data, photos and more from all types of docs

OCRmyPDF

77 12,067 9.5 Python

OCRmyPDF adds an OCR text layer to scanned PDF files, allowing them to be searched

I looked into OCR a while ago for some hundreds of thousands of pages of PDF. All hosted offerings would end up costing quite a bit.
After looking at the options and running a few tests, I figured I'd use https://github.com/jbarlow83/OCRmyPDF
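
A minimal sketch of how a large batch like this might be driven from Python, assuming the `ocrmypdf` command-line tool is installed and on the PATH. The helper names `ocrmypdf_cmds` and `run_batch` and the directory layout are hypothetical; `--skip-text` is the OCRmyPDF option that leaves pages that already contain text untouched.

```python
import subprocess
from pathlib import Path


def ocrmypdf_cmds(in_dir, out_dir):
    """Return one ocrmypdf command per PDF found in in_dir, sorted by name."""
    out = Path(out_dir)
    return [
        # --skip-text: do not OCR pages that already have a text layer
        ["ocrmypdf", "--skip-text", str(pdf), str(out / pdf.name)]
        for pdf in sorted(Path(in_dir).glob("*.pdf"))
    ]


def run_batch(in_dir, out_dir):
    """Run OCR on every PDF and return the input paths that failed."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    failed = []
    for cmd in ocrmypdf_cmds(in_dir, out_dir):
        # A non-zero exit code means ocrmypdf could not process the file.
        if subprocess.run(cmd).returncode != 0:
            failed.append(cmd[-2])
    return failed
```

Collecting failures instead of stopping at the first error is a deliberate choice here: with hundreds of thousands of pages, a few unreadable files should not abort the whole run.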

DeepSpeech

68 24,324 0.0 C++

DeepSpeech is an open source embedded (offline, on-device) speech-to-text engine which can run in real time on devices ranging from a Raspberry Pi 4 to high power GPU servers.

You should be able to use ffmpeg[0] to extract a single frame each second or at each keyframe (it's doubtful that processing every single frame is worth it) and then pass each frame to tesseract.
For speech-to-text, if the audio is in English, try Mozilla's DeepSpeech: https://github.com/mozilla/DeepSpeech
Might be fun to try.
[0] https://stackoverflow.com/questions/27568254/how-to-extract-...
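
The frame-extraction idea above can be sketched roughly as follows, assuming the `ffmpeg` and `tesseract` command-line tools are installed. The function names, the output file pattern, and the one-frame-per-second default are illustrative assumptions, not something the comment specifies.

```python
import subprocess
from pathlib import Path


def ffmpeg_frame_cmd(video, out_dir, fps=1):
    """Build an ffmpeg command that writes `fps` frames per second as PNGs."""
    pattern = str(Path(out_dir) / "frame_%05d.png")
    # -vf fps=N resamples the video to N frames per second before writing.
    return ["ffmpeg", "-i", str(video), "-vf", f"fps={fps}", pattern]


def tesseract_cmd(image):
    """Build a tesseract command that prints the recognized text to stdout."""
    return ["tesseract", str(image), "stdout"]


def ocr_video(video, out_dir):
    """Extract frames from the video, then OCR each frame in order."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    subprocess.run(ffmpeg_frame_cmd(video, out_dir), check=True)
    texts = []
    for frame in sorted(Path(out_dir).glob("frame_*.png")):
        result = subprocess.run(
            tesseract_cmd(frame), check=True, capture_output=True, text=True
        )
        texts.append(result.stdout)
    return texts
```

Consecutive frames often show the same text, so a real pipeline would probably deduplicate the results before storing them.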

silero-models

32 4,569 4.7 Jupyter Notebook

Silero Models: pre-trained speech-to-text, text-to-speech and text-enhancement models made embarrassingly simple

For speech-to-text extraction you can try Silero [1].
Free software (AGPL-3.0 License), fast, highly accurate and extremely simple to deploy (I have no affiliation with them).
[1] https://github.com/snakers4/silero-models

invoice2data

2 1,699 6.7 Python

Extract structured data from PDF invoices

It's not really working. Tried 2 English PDF invoices. Normal format. One came back empty, the other only had the amount right.
I'm assuming they only trained on some specific documents (passport of country X, etc) and all others don't work.
If you process the same kind of document all the time, then my invoice2data project may work better, and it is open source. It's based on regex rather than machine learning: https://github.com/invoice-x/invoice2data
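
To illustrate the regex-based approach in general terms, here is a small sketch using only Python's standard `re` module. It is not invoice2data's API or template format; the `TEMPLATE` patterns, the field names, and the sample invoice text are all made up for the example.

```python
import re

# Hypothetical template for one known invoice layout: field name -> regex
# with a single capture group for the value.
TEMPLATE = {
    "invoice_number": r"Invoice\s+No\.?\s*:?\s*(\w+)",
    "date": r"Date\s*:\s*(\d{4}-\d{2}-\d{2})",
    "amount": r"Total\s*:\s*\$?([\d,]+\.\d{2})",
}

# Made-up text, as it might look after PDF-to-text conversion.
SAMPLE = """ACME Corp
Invoice No: A1234
Date: 2021-02-10
Total: $1,250.00
"""


def extract_fields(text, template=TEMPLATE):
    """Apply each pattern to the text; fields that do not match become None."""
    result = {}
    for field, pattern in template.items():
        match = re.search(pattern, text, flags=re.IGNORECASE)
        result[field] = match.group(1) if match else None
    return result


print(extract_fields(SAMPLE))
```

This kind of template only works when the layout is stable, which matches the caveat above: it suits someone who processes the same document type repeatedly.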

NOTE: The number of mentions on this list indicates mentions on common posts plus user suggested alternatives. Hence, a higher number means a more popular project.

Related posts

Ask HN: Offline, Embeddable Speech Recognition?
4 projects | news.ycombinator.com | 19 Aug 2022

Show HN: State-of-the-Art German Speech Recognition in 284 lines of C++
5 projects | news.ycombinator.com | 10 Aug 2022

Transcribe Speech to Text with Python for Free
1 project | /r/programming | 30 Mar 2022

Voice to Text Options that respect privacy
1 project | /r/privacy | 3 Jan 2022

Mozilla Common Voice Adds 16 New Languages and 4,600 New Hours of Speech
12 projects | news.ycombinator.com | 5 Aug 2021

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com
Deep Learning Python speech-recognition speech-to-text Machine Learning
Post date: 10 Feb 2021
