pydub vs whisper.cpp

pydub

Manipulate audio with a simple and easy high-level interface (by jiaaro)

Audio

pydub.com

whisper.cpp

Port of OpenAI's Whisper model in C/C++ (by ggerganov)

openai speech-to-text Transformer Whisper Inference speech-recognition

Project	pydub	whisper.cpp
Mentions	25	187
Stars	8,355	31,426
Growth	-	-
Activity	0.0	9.8
Latest Commit	about 1 month ago	2 days ago
Language	Python	C
License	MIT License	MIT License

The number of mentions indicates the total number of mentions that we've tracked plus the number of user-suggested alternatives.
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.

pydub

Posts with mentions or reviews of pydub. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2023-05-12.

Looking for help with a Winamp project please.
3 projects | /r/learnpython | 12 May 2023
Best language(s) for creating/manipulating sounds
1 project | /r/learnprogramming | 30 Apr 2023

Honestly, while C++ is used for professional audio software, you can get a lot done with Python and a library like pydub, or you can even learn to manipulate audio files without any libraries in any language. So if you are not particularly interested in C++ at the moment, you can start with Python, which is easier to learn. You can check out other Python audio manipulation libraries here
ChatGPT and Whisper APIs
15 projects | news.ycombinator.com | 1 Mar 2023

I doubt it will matter if you're breaking up mid-sentence, as long as you pass in the previous chunk's transcript as a prompt and split on words. This is how Whisper does it internally.
It's not absolutely perfect, but splitting on the word boundary is one line of code with the same package in their docs: https://github.com/jiaaro/pydub/blob/master/API.markdown#sil...
25 MB is also a lot. That's 30 minutes to an hour of MP3 at reasonable compression. A 2-hour movie would have three splits.
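The size estimate above can be checked with a little arithmetic. This sketch assumes the 25 MB limit means 25,000,000 bytes and constant-bitrate MP3; the 64 and 128 kbps bitrates are illustrative choices, not figures from the comment.

```python
import math

LIMIT_BYTES = 25 * 1_000_000  # assumed: 25 MB read as decimal megabytes


def minutes_per_chunk(bitrate_kbps: int) -> float:
    """Minutes of constant-bitrate audio that fit under the size limit."""
    bytes_per_second = bitrate_kbps * 1000 / 8
    return LIMIT_BYTES / bytes_per_second / 60


def chunks_needed(duration_min: float, bitrate_kbps: int) -> int:
    """Number of pieces a recording must be split into to stay under the limit."""
    return math.ceil(duration_min / minutes_per_chunk(bitrate_kbps))


for kbps in (64, 128):
    print(f"{kbps} kbps: {minutes_per_chunk(kbps):.1f} min per chunk, "
          f"{chunks_needed(120, kbps)} chunks for a 2-hour movie")
```

This prints `64 kbps: 52.1 min per chunk, 3 chunks for a 2-hour movie` and `128 kbps: 26.0 min per chunk, 5 chunks for a 2-hour movie`, so the comment's "three splits" matches the lower bitrate.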
FFmpeg 6.0
10 projects | news.ycombinator.com | 27 Feb 2023

Even given an option, it can be difficult to find the corresponding documentation, if only because of the many different submodules, encoders, decoders, and filters that have oh-so-slightly different options. That said, I've just switched from pydub to ffmpeg-python (due to memory issues with the former[1]), and judging from the Jupyter notebook[2] it seems a much more intuitive way of constructing ffmpeg pipelines.
[1] https://github.com/jiaaro/pydub/issues/135
[2] https://github.com/kkroening/ffmpeg-python/tree/master/examp...
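ffmpeg-python builds an ffmpeg command line from Python calls. A dependency-free sketch of the same idea assembles the argument list directly and hands it to `subprocess`. The options used (`-y`, `-ss`, `-i`, `-t`, `-ar`, `-ac`, `-c:a pcm_s16le`) are standard ffmpeg flags; the function names and file names are hypothetical, and the 16 kHz mono 16-bit WAV defaults are chosen because that is the input format whisper.cpp's examples expect.

```python
import shutil
import subprocess
from typing import Optional


def ffmpeg_args(src: str, dst: str, start_s: Optional[float] = None,
                duration_s: Optional[float] = None,
                sample_rate: int = 16000, channels: int = 1) -> list[str]:
    """Build an ffmpeg command that trims and resamples src into a WAV dst."""
    args = ["ffmpeg", "-y"]  # -y: overwrite dst without prompting
    if start_s is not None:
        args += ["-ss", str(start_s)]  # placed before -i, so it seeks the input
    args += ["-i", src]
    if duration_s is not None:
        args += ["-t", str(duration_s)]  # limit the output duration
    args += ["-ar", str(sample_rate), "-ac", str(channels),
             "-c:a", "pcm_s16le", dst]  # 16-bit PCM WAV output
    return args


def run_ffmpeg(args: list[str]) -> None:
    """Run the command, raising if ffmpeg is missing or exits non-zero."""
    if shutil.which(args[0]) is None:
        raise FileNotFoundError("ffmpeg not found on PATH")
    subprocess.run(args, check=True)
```

For example, `ffmpeg_args("talk.mp3", "talk.wav", start_s=30, duration_s=60)` returns `['ffmpeg', '-y', '-ss', '30', '-i', 'talk.mp3', '-t', '60', '-ar', '16000', '-ac', '1', '-c:a', 'pcm_s16le', 'talk.wav']`, which `run_ffmpeg` then executes. Because ffmpeg decodes in its own process, the audio never has to be held in Python memory the way a pydub `AudioSegment` holds its raw samples.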
Download & Trim MP3 from Youtube with Python
3 projects | dev.to | 21 Dec 2022

With the file downloaded, we're now going to arbitrarily slice it locally (you might wonder whether it is possible to simply download a clip from YouTube; all reliable methods I've found essentially boil down to downloading the whole video and then editing it locally). For that we'll use the pydub library.
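In pydub, the slicing step is millisecond indexing on an `AudioSegment`, e.g. `AudioSegment.from_mp3(path)[start_ms:end_ms].export(dst, format="mp3")`, which needs ffmpeg for MP3. Simple edits also work without any library, so the sketch below does the same millisecond trim on an uncompressed WAV file using only Python's standard `wave` module. The function name and the WAV-only restriction are assumptions of this example, not the article's code.

```python
import wave


def trim_wav(src: str, dst: str, start_ms: int, end_ms: int) -> None:
    """Copy the [start_ms, end_ms) window of a PCM WAV file into dst."""
    with wave.open(src, "rb") as reader:
        params = reader.getparams()
        rate = reader.getframerate()
        start = rate * start_ms // 1000
        # Clamp to the file length so an over-long end time is harmless.
        end = min(rate * end_ms // 1000, reader.getnframes())
        reader.setpos(min(start, end))
        frames = reader.readframes(max(end - start, 0))
    with wave.open(dst, "wb") as writer:
        writer.setparams(params)  # the frame count in the header is patched on write/close
        writer.writeframes(frames)
```

Frame positions are computed from the sample rate, so `trim_wav("song.wav", "clip.wav", 30_000, 45_000)` keeps 15 seconds regardless of channel count or sample width.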
Playing multiple .wav and/or mp3 files in Python
1 project | /r/learnpython | 17 Oct 2022

I guess it's possible in theory; a quick search suggests the pydub library. But you may find something better if you do a little research.
I made a cross-platform command-line app called maestro to play music!
4 projects | /r/Python | 20 Jul 2022

Uses https://github.com/cheofusi/just_playback to play sound. It's actually surprising how hard it was to find a cross-platform Python module to play sound that doesn't require an external dependency like ffmpeg. Even then, modules like https://github.com/jiaaro/pydub don't support features like seeking/scrubbing, which was a must-have for my project.
Batch conversion FLAC to WAV
1 project | /r/audiophile | 16 May 2022

Once Python is installed, you will also need to install the "pydub" package for this script to work. If you're on a Windows computer, you can do this from the command line (run the "cmd" program). If you're on a Mac, you can do this from the terminal. Basically, the way you do this is with "pip", a "helper" program that comes with Python. Once you launch the command line, just run the command python -m pip install pydub --upgrade and you should see a message showing that it installed successfully. If you're struggling with this step, just google how to "pip install python packages" and you can find a lot of beginner guides.
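The conversion that comment describes can be sketched as a short pydub script. This is a hypothetical example, not the poster's script: it assumes pydub and an ffmpeg binary are installed (pydub relies on ffmpeg to decode FLAC), that input files use a lowercase `.flac` extension, and the function names are made up for illustration. `AudioSegment.from_file(..., format="flac")` and `.export(..., format="wav")` are part of pydub's documented API.

```python
from pathlib import Path


def plan_conversions(src_dir: str, dst_dir: str) -> list[tuple[Path, Path]]:
    """Pair every .flac file in src_dir with a .wav output path in dst_dir."""
    out = Path(dst_dir)
    return [(flac, out / flac.with_suffix(".wav").name)
            for flac in sorted(Path(src_dir).glob("*.flac"))]


def convert_all(src_dir: str, dst_dir: str) -> int:
    """Convert each FLAC file to WAV with pydub; returns the number converted."""
    # Imported lazily so plan_conversions() works without pydub/ffmpeg installed.
    from pydub import AudioSegment

    pairs = plan_conversions(src_dir, dst_dir)
    Path(dst_dir).mkdir(parents=True, exist_ok=True)
    for flac, wav in pairs:
        AudioSegment.from_file(str(flac), format="flac").export(str(wav), format="wav")
    return len(pairs)
```

For example, `convert_all("music/flac", "music/wav")` would write `music/wav/track.wav` for each `music/flac/track.flac`. Keeping the planning step separate makes it easy to print or check the file list before anything is decoded.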
How can I modify the pitch of an audio file and save it to disk?
1 project | /r/FlutterDev | 20 Feb 2022

That is kinda what serverless functions are built for. Looks like python has some good libraries for this: https://github.com/jiaaro/pydub.
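One simple way to change pitch, whichever library you use, is to keep the samples unchanged and relabel the sample rate: shifting by n semitones multiplies every frequency by 2^(n/12), and playback speed changes by the same factor. The sketch below applies that idea to a PCM WAV file with Python's standard `wave` module rather than pydub; `relabel_pitch` is a hypothetical name, and a pitch shift that keeps the original duration would need a time-stretching algorithm instead.

```python
import wave


def relabel_pitch(src: str, dst: str, semitones: float) -> int:
    """Shift pitch by rewriting the WAV sample rate; returns the new rate.

    Playback speed changes by the same factor, so the duration changes too.
    """
    factor = 2 ** (semitones / 12)  # equal-temperament frequency ratio
    with wave.open(src, "rb") as reader:
        params = reader.getparams()
        frames = reader.readframes(reader.getnframes())
    new_rate = round(params.framerate * factor)
    with wave.open(dst, "wb") as writer:
        # Same samples, new header rate: players read them faster or slower.
        writer.setparams(params._replace(framerate=new_rate))
        writer.writeframes(frames)
    return new_rate
```

For instance, shifting a 44,100 Hz file up 12 semitones writes it with an 88,200 Hz header, so it plays an octave higher in half the time. Players that cannot handle unusual rates may need the result resampled afterwards.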
Playing large audio files?
1 project | /r/learnpython | 7 Jan 2022

The files are big, so it's not feasible to load one in all at once. They have to be streamed/chunked somehow. (sadly, pydub doesn't support this...)
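Python's standard `wave` module can read an uncompressed WAV file incrementally, so only one chunk is in memory at a time. This is a stand-in technique rather than a pydub feature, and `iter_chunks` is a hypothetical helper; compressed formats such as MP3 would need a streaming decoder (for example, an ffmpeg subprocess writing to a pipe) instead.

```python
import wave
from typing import Iterator


def iter_chunks(path: str, chunk_ms: int = 1000) -> Iterator[bytes]:
    """Yield raw PCM frames from a WAV file, about chunk_ms at a time."""
    with wave.open(path, "rb") as reader:
        frames_per_chunk = max(1, reader.getframerate() * chunk_ms // 1000)
        while True:
            data = reader.readframes(frames_per_chunk)
            if not data:  # readframes returns b"" at end of file
                break
            yield data
```

Each yielded `bytes` object can be handed to an audio output or an analysis step before the next one is read, e.g. `for block in iter_chunks("long_recording.wav", chunk_ms=500): ...`.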

whisper.cpp

Posts with mentions or reviews of whisper.cpp. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2024-03-31.

Show HN: I created automatic subtitling app to boost short videos
1 project | news.ycombinator.com | 9 Apr 2024

whisper.cpp [1] has a karaoke example that uses ffmpeg's drawtext filter to display rudimentary karaoke-like captions. It also supports diarisation. Perhaps it could be a starting point to create a better script that does what you need.
--
1: https://github.com/ggerganov/whisper.cpp/blob/master/README....
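For the subtitling use case, whisper.cpp's command-line example can write SubRip (`.srt`) files directly. The wrapper below is a sketch that assumes the CLI binary is built as `./main` (newer builds name it `whisper-cli`), a ggml model file already downloaded, and a 16 kHz mono WAV input; the flags `-m`, `-f`, `-osrt`, `-of`, and `-l` come from the CLI's help output, so check them against your build. The function names are hypothetical.

```python
import subprocess
from pathlib import Path
from typing import Optional


def whisper_srt_args(wav: str, model: str, binary: str = "./main",
                     language: Optional[str] = None) -> list[str]:
    """Build a whisper.cpp CLI call that writes <wav stem>.srt next to the input."""
    out_prefix = str(Path(wav).with_suffix(""))  # -of takes a path without extension
    args = [binary, "-m", model, "-f", wav, "-osrt", "-of", out_prefix]
    if language is not None:
        args += ["-l", language]  # spoken language code, e.g. "en"
    return args


def transcribe_to_srt(wav: str, model: str, binary: str = "./main") -> Path:
    """Run whisper.cpp and return the path of the generated subtitle file."""
    subprocess.run(whisper_srt_args(wav, model, binary), check=True)
    return Path(wav).with_suffix(".srt")
```

For example, `transcribe_to_srt("clip.wav", "models/ggml-base.en.bin")` would run the CLI and return `clip.srt`. The result is a plain subtitle file that a player can load alongside the video or that ffmpeg's `subtitles` filter can burn in.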
LLaMA Now Goes Faster on CPUs
16 projects | news.ycombinator.com | 31 Mar 2024
LLMs on your local Computer (Part 1)
7 projects | dev.to | 11 Mar 2024

The ggml library is one of the first libraries for local LLM inference. It's a pure C library that converts models to run on several devices, including desktops, laptops, and even mobile devices, and therefore it can also be considered a tinkering tool for trying new optimizations that are then incorporated into other downstream projects. This tool is at the heart of several other projects, powering LLM inference on desktops or even mobile phones. Subprojects for running specific LLMs or LLM families exist, such as whisper.cpp.
Voxos.ai – An Open-Source Desktop Voice Assistant
7 projects | news.ycombinator.com | 19 Jan 2024

I'm not sure if it is _fully_ OpenAI-compatible, but whisper.cpp has a bundled server that says it is "OAI-like": https://github.com/ggerganov/whisper.cpp/tree/master/example...
I don't have any direct experience with it... I've only played around with whisper locally, using scripts.
Jarvis: A Voice Virtual Assistant in Python (OpenAI, ElevenLabs, Deepgram)
7 projects | news.ycombinator.com | 18 Dec 2023

Unless I'm misunderstanding, `whisper.cpp` seems to support streaming, and the repository includes a native example[0] and a WASM example[1] with a demo site[2].
[0]: https://github.com/ggerganov/whisper.cpp/tree/master/example...
Wchess
1 project | news.ycombinator.com | 14 Dec 2023
I've open sourced my Flutter plugin to run on-device LLMs on any platform. TestFlight builds available now.
9 projects | /r/FlutterDev | 8 Dec 2023

Usage 1: Good for transcribing audio. An example use case could be summarizing YouTube videos or long courses. Usage 2: You talk to your AI by voice and it responds with text (later with audio too). - https://github.com/ggerganov/whisper.cpp
Scrybble is the ReMarkable highlights to Obsidian exporter I have been looking for
9 projects | /r/RemarkableTablet | 7 Dec 2023

🗣️🎙️ whisper.cpp (offline speech-to-text transcription, models trained by OpenAI, CLI based, browser based)
Whisper.wasm
1 project | news.ycombinator.com | 13 Nov 2023
Whisper C++ not working for me. Anyone else?
1 project | /r/Xcode | 11 Nov 2023

Has anyone played around with Whisper C++ for Swift? I'm hitting a snag even on the demo. I've downloaded the GitHub repo and everything matches up with this video [ https://youtu.be/b10OHCDHDQ4 ], but when he hits the transcribe button, it actually prints out the captioning. When I do it, it skips that part and just says "Done...". It does everything else (plays the audio, says it's transcribing), it just doesn't show me the transcription, and it's not in the debug window either. The demo isn't throwing any errors, and I haven't really messed with the code, so this is their example. https://github.com/ggerganov/whisper.cpp

What are some alternatives?

When comparing pydub and whisper.cpp, you can also consider the following projects:

librosa - Python library for audio and music analysis

faster-whisper - Faster Whisper transcription with CTranslate2

SpeechRecognition - Speech recognition module for Python, supporting several engines and APIs, online and offline.

Whisper - High-performance GPGPU inference of OpenAI's Whisper automatic speech recognition (ASR) model

pyAudioAnalysis - Python Audio Analysis Library: Feature Extraction, Classification, Segmentation and Applications

bark - 🔊 Text-Prompted Generative Audio Model

ffmpeg-python - Python bindings for FFmpeg - with complex filtering support

whisper - Robust Speech Recognition via Large-Scale Weak Supervision

mutagen - Python module for handling audio metadata

whisperX - WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)

audioread - cross-library (GStreamer + Core Audio + MAD + FFmpeg) audio decoding for Python

llama.cpp - LLM inference in C/C++
