pydub vs whisper.cpp

| | pydub | whisper.cpp |
|---|---|---|
| Mentions | 25 | 187 |
| Stars | 8,355 | 31,426 |
| Growth | - | - |
| Activity | 0.0 | 9.8 |
| Latest commit | about 1 month ago | 2 days ago |
| Language | Python | C |
| License | MIT License | MIT License |
Stars: the number of stars that a project has on GitHub. Growth: month-over-month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
pydub
- Looking for help with a Winamp project, please.
-
Best language(s) for creating/manipulating sounds
Honestly, while C++ is used for professional audio software, you can get a lot done with Python and a library like pydub, or you can even learn to manipulate audio files without any libraries in any language. So if you are not particularly interested in C++ at the moment, you can start with Python, which is easier to learn. You can check out other Python audio manipulation libraries here
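To illustrate working with audio "without any libraries", here is a sketch that uses only Python's standard library (`wave`, `struct`, `math`). It synthesizes a sine tone and saves it as a 16-bit mono WAV file. The function name and default parameters are illustrative choices and do not come from the original comment.

```python
import math
import struct
import wave


def write_sine_wav(path, freq_hz=440.0, seconds=1.0, rate=44100, amplitude=0.5):
    """Write a mono 16-bit PCM sine tone using only the standard library."""
    n_frames = int(seconds * rate)
    peak = int(amplitude * 32767)  # stay inside the signed 16-bit range
    samples = (
        int(peak * math.sin(2 * math.pi * freq_hz * i / rate))
        for i in range(n_frames)
    )
    # "<h" = little-endian signed 16-bit integer, the sample layout WAV uses
    data = b"".join(struct.pack("<h", s) for s in samples)
    with wave.open(path, "wb") as wf:
        wf.setnchannels(1)   # mono
        wf.setsampwidth(2)   # 2 bytes per sample = 16-bit
        wf.setframerate(rate)
        wf.writeframes(data)
    return n_frames
```

For example, `write_sine_wav("tone.wav")` writes one second of a 440 Hz tone. Per-sample loops like this become slow for long recordings, and that is where libraries such as pydub start to pay off.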
-
ChatGPT and Whisper APIs
I doubt it will matter if you're breaking up mid-sentence, as long as you pass in the previous chunk's transcript as a prompt and split between words. This is how Whisper does it internally.
It's not absolutely perfect, but splitting on a word boundary is one line of code with the same package used in their docs: https://github.com/jiaaro/pydub/blob/master/API.markdown#sil...
25MB is also a lot. That's 30 minutes to an hour of MP3 at reasonable compression, so a 2-hour movie would have three splits.
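The arithmetic behind those estimates can be checked quickly. This sketch assumes constant-bitrate MP3 and reads "25MB" as 25 × 10^6 bytes; both are assumptions. At 128 kbps, 25 MB holds about 26 minutes. At 64 kbps it holds about 52 minutes. So "three splits" for a two-hour movie matches roughly 64 kbps, and at 128 kbps the same movie would need five.

```python
import math

# Assumption: "25MB" interpreted as 25 * 10**6 bytes (not 25 MiB).
LIMIT_BYTES = 25 * 1000 * 1000


def minutes_per_chunk(bitrate_kbps, limit_bytes=LIMIT_BYTES):
    """Minutes of constant-bitrate audio that fit in limit_bytes."""
    bytes_per_second = bitrate_kbps * 1000 / 8
    return limit_bytes / bytes_per_second / 60


def chunks_needed(duration_minutes, bitrate_kbps, limit_bytes=LIMIT_BYTES):
    """How many uploads a recording of the given length needs."""
    total_bytes = duration_minutes * 60 * bitrate_kbps * 1000 / 8
    return math.ceil(total_bytes / limit_bytes)
```

For example, `chunks_needed(120, 64)` gives 3 and `chunks_needed(120, 128)` gives 5.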
-
FFmpeg 6.0
Even given an option, it can be difficult to find the corresponding documentation, if only because of the many different submodules, encoders, decoders, and filters that have oh-so-slightly different options. That said, I've just switched from pydub to ffmpeg-python (due to memory issues with the former[1]), and judging from the Jupyter notebook[2], it seems a much more intuitive way of constructing ffmpeg pipelines.
[1] https://github.com/jiaaro/pydub/issues/135
[2] https://github.com/kkroening/ffmpeg-python/tree/master/examp...
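Here is a rough sense of why memory becomes an issue. A pydub `AudioSegment` keeps the decoded, uncompressed samples in memory, so its footprint follows the PCM size formula rather than the compressed file size. The sketch below is plain arithmetic, not pydub code, and it is not necessarily the cause reported in issue [1]. The function name and defaults (44.1 kHz, stereo, 16-bit) are assumptions.

```python
def pcm_bytes(seconds, sample_rate=44100, channels=2, sample_width=2):
    """Bytes of uncompressed PCM: seconds x rate x channels x bytes per sample."""
    return int(seconds * sample_rate * channels * sample_width)
```

A two-hour stereo 16-bit recording at 44.1 kHz comes to `pcm_bytes(7200)` = 1,270,080,000 bytes, about 1.27 GB, before any intermediate copies made during processing.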
-
Download & Trim MP3 from Youtube with Python
With the file downloaded, we're now going to arbitrarily slice it locally (you might have wondered whether it is possible to simply download a clip from YouTube; all reliable methods I've found essentially boil down to downloading the whole video and then editing it locally). For that we'll use the pydub library:
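Here is a minimal sketch of that trimming step. It is not the original post's code. A pydub `AudioSegment` can be sliced with millisecond indices, so most of the work is converting timestamps to milliseconds. `to_ms` and `trim_mp3` are hypothetical helper names, and decoding MP3 with pydub assumes ffmpeg is installed.

```python
def to_ms(timestamp):
    """Convert 'HH:MM:SS', 'MM:SS' or 'SS' (seconds may be fractional) to ms."""
    seconds = 0.0
    for part in timestamp.split(":"):
        seconds = seconds * 60 + float(part)
    return int(round(seconds * 1000))


def trim_mp3(src, dst, start, end):
    # Imported lazily so to_ms() works even where pydub is not installed;
    # pydub needs ffmpeg (or libav) on PATH to decode and encode MP3.
    from pydub import AudioSegment

    audio = AudioSegment.from_file(src, format="mp3")
    clip = audio[to_ms(start):to_ms(end)]  # AudioSegment slices are in milliseconds
    clip.export(dst, format="mp3")
    return len(clip)  # len() of an AudioSegment is its duration in ms
```

For example, `trim_mp3("video.mp3", "clip.mp3", "1:30", "2:15")` would keep 45 seconds of audio.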
-
Playing multiple .wav and/or mp3 files in Python
I guess it's possible in theory; a quick search suggests the pydub library. But you may find something better if you do a little research.
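Playing several files at once means mixing them into one signal. pydub's `AudioSegment.overlay` does this for you. The standard-library sketch below shows the underlying idea for mono signed 16-bit samples: add sample by sample, then clip to the int16 range. `mix_pcm16` is a hypothetical name. Padding the shorter input with silence is a design choice made here; pydub's `overlay` instead keeps the length of the segment it is called on.

```python
from array import array


def mix_pcm16(a, b):
    """Mix two mono signed 16-bit sample sequences by summing and clipping.

    The shorter input is treated as silence (0) past its end.
    """
    n = max(len(a), len(b))
    out = array("h")
    for i in range(n):
        s = (a[i] if i < len(a) else 0) + (b[i] if i < len(b) else 0)
        out.append(max(-32768, min(32767, s)))  # clip instead of wrapping around
    return out
```

Clipping avoids integer wrap-around, but loud inputs will still distort, so attenuating each source before mixing is usually worthwhile.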
-
I made a cross-platform command-line app called maestro to play music!
Uses https://github.com/cheofusi/just_playback to play sound. It's actually surprising how hard it was to find a cross-platform Python module to play sound that doesn't require an external dependency like ffmpeg. Even then, modules like https://github.com/jiaaro/pydub don't support features like seeking/scrubbing, which was a must-have for my project.
-
Batch conversion FLAC to WAV
Once Python is installed, you will also need to install the "pydub" package for this script to work. If you're on a Windows computer, you can do this from the command line (run the "cmd" program). If you're on a Mac, you can do this from the Terminal. Basically, you do this using "pip", a "helper" program that comes with Python. Once you launch the command line, just run the command `python -m pip install pydub --upgrade` and you should see a message showing that it installed successfully. If you're struggling with this step, just google how to "pip install python packages" and you'll find a lot of beginner guides.
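Here is a minimal sketch of such a batch conversion. It is not the post's own script. It assumes pydub and ffmpeg are installed, since pydub uses ffmpeg to decode FLAC. The function names are hypothetical. The `*.flac` pattern is matched case-sensitively on most Linux file systems.

```python
from pathlib import Path


def plan_conversions(folder):
    """Pair every .flac file in folder with the .wav path it would be written to."""
    return [(p, p.with_suffix(".wav")) for p in sorted(Path(folder).glob("*.flac"))]


def convert_all(folder):
    # Lazy import: pydub (plus ffmpeg on PATH) is only needed to actually convert.
    from pydub import AudioSegment

    for src, dst in plan_conversions(folder):
        AudioSegment.from_file(str(src), format="flac").export(str(dst), format="wav")
```

Keeping the file discovery separate from the conversion makes it easy to check which files would be touched before writing anything.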
-
How can I modify the pitch of an audio file and save it to disk?
That is kinda what serverless functions are built for. Looks like Python has some good libraries for this: https://github.com/jiaaro/pydub.
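One dependency-free way to change pitch, wherever the code runs, is the "tape speed" trick. You keep the samples and relabel the WAV sample rate, so playback becomes faster and higher, or slower and lower. The sketch uses only the standard library `wave` module, and `repitch_wav` is a hypothetical name. Duration changes along with pitch, so a pitch shift at constant speed needs a time-stretching algorithm instead.

```python
import wave


def repitch_wav(src, dst, semitones):
    """Shift pitch (and speed) by rewriting the WAV sample rate; samples are untouched."""
    factor = 2 ** (semitones / 12)  # equal temperament: 12 semitones = 2x frequency
    with wave.open(src, "rb") as r:
        params = r.getparams()
        frames = r.readframes(params.nframes)
    new_rate = int(round(params.framerate * factor))
    with wave.open(dst, "wb") as w:
        w.setnchannels(params.nchannels)
        w.setsampwidth(params.sampwidth)
        w.setframerate(new_rate)
        w.writeframes(frames)
    return new_rate
```

Some tools expect standard rates such as 44.1 or 48 kHz, so you may want to resample the result afterwards (pydub's `set_frame_rate` does this).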
-
Playing large audio files?
The files are big, so it's not feasible to load one in all at once. They have to be streamed/chunked somehow. (sadly, pydub doesn't support this...)
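For uncompressed WAV, chunked reading is simple with the standard library's `wave` module, which reads frames incrementally instead of loading the whole file. The generator below is a sketch; its name and the 1-second default are assumptions. Each chunk could be passed to whatever audio output stream you use. Compressed formats such as MP3 would need a streaming decoder instead.

```python
import wave


def iter_wav_chunks(path, chunk_ms=1000):
    """Yield raw PCM bytes from a WAV file, roughly chunk_ms of audio at a time."""
    with wave.open(path, "rb") as wf:
        frames_per_chunk = max(1, wf.getframerate() * chunk_ms // 1000)
        while True:
            data = wf.readframes(frames_per_chunk)
            if not data:  # empty bytes means end of file
                break
            yield data
```

Only one chunk is held in memory at a time, so memory use stays flat regardless of file length.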
whisper.cpp
-
Show HN: I created automatic subtitling app to boost short videos
whisper.cpp [1] has a karaoke example that uses ffmpeg's drawtext filter to display rudimentary karaoke-like captions. It also supports diarisation. Perhaps it could be a starting point to create a better script that does what you need.
--
1: https://github.com/ggerganov/whisper.cpp/blob/master/README....
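If you write your own captioning script, SRT is a common intermediate format. ffmpeg's `subtitles` filter can burn it into a video when ffmpeg is built with libass. This sketch assumes the transcript is already available as `(start_seconds, end_seconds, text)` tuples. That input shape and the function names are hypothetical and are not whisper.cpp's API.

```python
def srt_timestamp(seconds):
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    ms = int(round(seconds * 1000))
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"


def to_srt(segments):
    """Render (start, end, text) segments as numbered SRT cues separated by blank lines."""
    blocks = []
    for i, (start, end, text) in enumerate(segments, start=1):
        blocks.append(
            f"{i}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text.strip()}\n"
        )
    return "\n".join(blocks)
```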
- LLaMA Now Goes Faster on CPUs
-
LLMs on your local Computer (Part 1)
The ggml library is one of the first libraries for local LLM inference. It's a pure C library that converts models to run on several devices, including desktops, laptops, and even mobile devices, so it can also be considered a tinkering tool for trying new optimizations that are then incorporated into other downstream projects. This tool is at the heart of several other projects, powering LLM inference on desktops and even mobile phones. Subprojects for running specific LLMs or LLM families exist, such as whisper.cpp.
-
Voxos.ai – An Open-Source Desktop Voice Assistant
I'm not sure if it is _fully_ OpenAI-compatible, but whisper.cpp has a bundled server that says it is "OAI-like": https://github.com/ggerganov/whisper.cpp/tree/master/example...
I don't have any direct experience with it... I've only played around with whisper locally, using scripts.
-
Jarvis: A Voice Virtual Assistant in Python (OpenAI, ElevenLabs, Deepgram)
Unless I'm misunderstanding, `whisper.cpp` seems to support streaming, and the repository includes a native example[0] and a WASM example[1] with a demo site[2].
[0]: https://github.com/ggerganov/whisper.cpp/tree/master/example...
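Streaming transcription is commonly done by re-running the model on a sliding window of the most recent audio. The generic sketch below only computes the window boundaries. It is not taken from whisper.cpp's stream example, and `step_ms` and `length_ms` are illustrative parameter names.

```python
def sliding_windows(n_samples, rate, step_ms, length_ms):
    """Yield (start, end) sample indices: every step_ms, take the latest length_ms of audio."""
    step = rate * step_ms // 1000
    length = rate * length_ms // 1000
    for end in range(step, n_samples + 1, step):
        yield max(0, end - length), end  # window never starts before sample 0
```

With 16 kHz audio, a 1 s step and a 2.5 s window, `list(sliding_windows(64000, 16000, 1000, 2500))` gives `[(0, 16000), (0, 32000), (8000, 48000), (24000, 64000)]`. Consecutive windows overlap, so words cut off at one window's edge get a second pass in the next.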
- Wchess
-
I've open sourced my Flutter plugin to run on-device LLMs on any platform. TestFlight builds available now.
Usage 1: Good to transcribe audio. An example use case could be to summarize YouTube videos or long courses. Usage 2: You talk with voice to your AI that responds with text (later with audio too). - https://github.com/ggerganov/whisper.cpp
-
Scrybble is the ReMarkable highlights to Obsidian exporter I have been looking for
🗣️🎙️ whisper.cpp (offline speech-to-text transcription, models trained by OpenAI, CLI based, browser based)
- Whisper.wasm
-
Whisper C++ not working for me. Anyone else?
Has anyone played around with Whisper C++ for Swift? I'm hitting a snag even on the demo. I've downloaded the GitHub repo and everything matches up with this video [ https://youtu.be/b10OHCDHDQ4 ], but when he hits the transcribe button, it actually prints out the captioning. When I do it, it skips that part and just says "Done...". It does everything else (plays the audio, says it's transcribing); it just doesn't show me the transcription, and it's not in the debug window either. The demo isn't throwing any errors, and I haven't really messed with the code, so this is their example. https://github.com/ggerganov/whisper.cpp
What are some alternatives?
librosa - Python library for audio and music analysis
faster-whisper - Faster Whisper transcription with CTranslate2
SpeechRecognition - Speech recognition module for Python, supporting several engines and APIs, online and offline.
Whisper - High-performance GPGPU inference of OpenAI's Whisper automatic speech recognition (ASR) model
pyAudioAnalysis - Python Audio Analysis Library: Feature Extraction, Classification, Segmentation and Applications
bark - 🔊 Text-Prompted Generative Audio Model
ffmpeg-python - Python bindings for FFmpeg - with complex filtering support
whisper - Robust Speech Recognition via Large-Scale Weak Supervision
mutagen - Python module for handling audio metadata
whisperX - WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)
audioread - cross-library (GStreamer + Core Audio + MAD + FFmpeg) audio decoding for Python
llama.cpp - LLM inference in C/C++