It seems like OpenAI are finally living up to their name for once with this release? Anything I'm missing?
From what I can gather:
1. Includes model weights. I can't find the URL, but they reference them enough and have a CLI tool, so I presume I just haven't found them yet.
2. Includes code: https://github.com/openai/whisper
3. Released under MIT License: https://github.com/openai/whisper/blob/main/LICENSE
What "script" are you using for doing txt2img? The watermark function is automatically called when you use the CLI in two places, https://github.com/CompVis/stable-diffusion/blob/69ae4b35e0a... and https://github.com/CompVis/stable-diffusion/blob/69ae4b35e0a...
Trivial to remove, I'll give you that. But AFAIK, the original repository and its forks apply the watermark automatically unless you've removed it yourself.
The authors do explicitly state that they're trying to do a lot of fancy new stuff here, like being multilingual, rather than just chasing accuracy.
[1] https://github.com/syhw/wer_are_we
I'm giving up for the night, but https://github.com/Smaug123/whisper/pull/1/files at least contains the setup instructions that may help others get to this point.
As far as TTS goes, Mycroft.ai[0] has released a decent offline one.
[0] https://mycroft.ai/
I tried running it in realtime with live audio input (kind of).
You can find the python script in this repo: https://github.com/tobiashuttinger/openai-whisper-realtime
One thing they don't touch much on is the STT, as they use models from third parties. You could definitely do something that utilizes this model and then feeds the tokens to some of their parsing code. I've been working on something similar to this, but burned out around adding the STT portion [0].
[0]: https://github.com/Sheepybloke2-0/trashbot - It was called trashbot because the final implementation was going to look like Oscar the Grouch in a trash can, displaying the reminders.
Haven’t tried it yet but love the concept!
Have you thought of using VAD (voice activity detection) for breaks? Back in my day (a long time ago) the webrtc VAD stuff was considered decent:
https://github.com/wiseman/py-webrtcvad
Model isn’t optimized for this use but I like where you’re headed!
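For the breaks idea, even a crude energy gate shows the shape of it. A pure-stdlib sketch (this is NOT py-webrtcvad's actual API; it's a simple RMS threshold over 10 ms frames of 16-bit PCM, and the `500.0` threshold is an arbitrary illustration value):

```python
import math
import struct

FRAME_SAMPLES = 160  # 10 ms at 16 kHz, a typical VAD frame size

def frame_rms(frame: bytes) -> float:
    """Root-mean-square energy of a frame of little-endian 16-bit PCM."""
    samples = struct.unpack("<%dh" % (len(frame) // 2), frame)
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def is_speech(frame: bytes, threshold: float = 500.0) -> bool:
    """Treat any frame whose RMS exceeds the threshold as speech."""
    return frame_rms(frame) > threshold

# Synthetic demo: 10 ms of silence vs. 10 ms of a loud 440 Hz tone.
silence = struct.pack("<%dh" % FRAME_SAMPLES, *([0] * FRAME_SAMPLES))
tone = struct.pack(
    "<%dh" % FRAME_SAMPLES,
    *(int(10000 * math.sin(2 * math.pi * 440 * i / 16000))
      for i in range(FRAME_SAMPLES)),
)
print(is_speech(silence), is_speech(tone))  # → False True
```

A real VAD like webrtcvad is far more robust than an energy gate (it won't fire on keyboard noise), but segmenting the mic stream on silence like this is the usual way to decide where one utterance ends and the next begins.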
It understands my Swedish attempts at English really well with the medium.en model. (Although, it gives me a funny warning: `UserWarning: medium.en is an English-only model but received 'English'; using English instead.` I guess it doesn't want to be told to use English when that's all it can do.)
However, it runs very slowly. It uses the CPU on my MacBook, presumably because it doesn't have an Nvidia card.
Googling around, I found [plaidML](https://github.com/plaidml/plaidml), a project promising to run ML on many different GPU architectures. Does anyone know whether it's possible to plug the two together somehow? I'm not an ML researcher and don't really understand the technical details of the domain, but I can read and write Python in domains I do understand, so I could do some glue work if required.
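Whisper runs on PyTorch, so a first sanity check is just asking PyTorch what accelerators it can see. A small sketch (whether this particular model actually runs well on a given backend, e.g. Apple's MPS, is a separate question I can't vouch for):

```python
import torch

# Report which accelerator backends this PyTorch install can use.
print("CUDA available:", torch.cuda.is_available())
print(
    "MPS (Apple GPU) available:",
    getattr(torch.backends, "mps", None) is not None
    and torch.backends.mps.is_available(),  # mps backend exists in torch >= 1.12
)
```

If both come back False, everything falls back to the CPU, which would match the slowness you're seeing.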
Perhaps this could be adapted? https://github.com/mozilla/DeepSpeech-examples/blob/r0.9/mic...