Talk-Llama

whisper.cpp

Mentions: 187 | Stars: 31,174 | Activity: 9.8 | Language: C

Port of OpenAI's Whisper model in C/C++

https://github.com/ggerganov/whisper.cpp/issues/352#issuecom...
I'm not sure what changed, but basically I purged ffmpeg and libsdl2-dev and then ran `make` in the root of the repo. Then I installed libsdl2 and ffmpeg and ran `make talk-llama`.
It's quite slow on a 4-core i7-8550U with 16 GB of RAM.
Basically, in the root of the repo:
$ sudo apt purge ffmpeg

whisper

Mentions: 343 | Stars: 60,303 | Activity: 6.4 | Language: Python

Robust Speech Recognition via Large-Scale Weak Supervision

For SRT, here are some front-ends: https://www.reddit.com/r/OpenAI/comments/163hzhe/recommended...
Also I saw this thing called WhisperScript that looks pretty slick: https://github.com/openai/whisper/discussions/1028
That being said, WhisperX isn't that hard to set up. My step-by-step notes from a couple of months ago: https://llm-tracker.info/books/logbook/page/transcription-te...
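The SRT files these front-ends produce are a simple format: numbered blocks, each with a `HH:MM:SS,mmm --> HH:MM:SS,mmm` time range and the subtitle text, separated by blank lines. As a minimal sketch, assuming segments shaped like the `segments` list in openai-whisper's `transcribe()` result (each with `start` and `end` in seconds and a `text` string), the conversion looks like this. The helper names `format_timestamp` and `segments_to_srt` are hypothetical and are not part of any of the linked tools.

```python
def format_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp: HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments: list[dict]) -> str:
    """Render Whisper-style segments (assumed keys: 'start', 'end', 'text') as SRT text."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        start = format_timestamp(seg["start"])
        end = format_timestamp(seg["end"])
        # Whisper segment text often starts with a space, so strip it.
        blocks.append(f"{index}\n{start} --> {end}\n{seg['text'].strip()}\n")
    # A blank line separates consecutive SRT blocks.
    return "\n".join(blocks)


if __name__ == "__main__":
    demo = [
        {"start": 0.0, "end": 2.5, "text": " Hello there."},
        {"start": 2.5, "end": 5.0, "text": " Goodbye."},
    ]
    print(segments_to_srt(demo))
```

Dedicated front-ends add things like word-level alignment and line-length limits on top of this; the sketch only covers the file format itself.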

ollama

Mentions: 192 | Stars: 58,943 | Activity: 9.9 | Language: Go

Get up and running with Llama 3, Mistral, Gemma, and other large language models.
distil-whisper

Mentions: 9 | Stars: 3,125 | Activity: 8.5 | Language: Python

Distilled variant of Whisper for speech recognition. 6x faster, 50% smaller, within 1% word error rate.

Is https://github.com/huggingface/distil-whisper on its way to whisper.cpp?
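The "within 1% word error rate" claim is stated in terms of WER: the word-level edit distance (substitutions + deletions + insertions) between a reference transcript and the model output, divided by the number of words in the reference. A minimal sketch of that metric follows; `word_error_rate` is a hypothetical helper, and it splits on whitespace only, whereas published evaluations typically normalize casing and punctuation before scoring.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] = edit distance between the first i-1 reference words and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from output
                curr[j - 1] + 1,     # insertion: extra word in output
                prev[j - 1] + cost,  # match (cost 0) or substitution (cost 1)
            )
        prev = curr
    return prev[len(hyp)] / len(ref)


if __name__ == "__main__":
    # One deleted word out of six reference words.
    print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))
```

Because WER is normalized by reference length, it can exceed 1.0 when the output contains many inserted words.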

cog

Mentions: 20 | Stars: 7,133 | Activity: 9.4 | Language: Python

Containers for machine learning

I'm in the same situation. I found this cog project for dockerising ML models (https://github.com/replicate/cog): you write just one Python class and a YAML file, and it takes care of the "CUDA hell" and the dependencies. It even creates a Flask app in front of your model.
That helps keep your system clean, but someone with big $s, please rewrite PyTorch in Go or Rust, or even Node.js/TypeScript.

NOTE: The number of mentions on this list indicates mentions on common posts plus user suggested alternatives. Hence, a higher number means a more popular project.

MacWhisper: Transcribe audio files on your Mac
8 projects | news.ycombinator.com | 23 Aug 2023
Distil-Whisper: a distilled variant of Whisper that is 6x faster
1 project | /r/AudioAI | 17 Nov 2023
AI — weekly megathread!
3 projects | /r/artificial | 5 Nov 2023
Distil-Whisper: distilled version of Whisper that is 6 times faster, 49% smaller
1 project | /r/hackernews | 3 Nov 2023
Distil-Whisper is up to 6x faster than Whisper while performing within 1% Word-Error-Rate on out-of-distribution eval sets
1 project | /r/speechtech | 2 Nov 2023

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com
speech-recognition Whisper Containers openai Audio
Post date: 2 Nov 2023