pyAudioAnalysis
SpeechRecognition
| | pyAudioAnalysis | SpeechRecognition |
|---|---|---|
| Mentions | 11 | 16 |
| Stars | 5,668 | 8,040 |
| Growth | - | - |
| Activity | 5.0 | 8.7 |
| Latest commit | 26 days ago | 8 days ago |
| Language | Python | Python |
| License | Apache License 2.0 | BSD 3-clause "New" or "Revised" License |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
pyAudioAnalysis
- How would I compare two voice recordings of the same sentence and advise one speaker how to get closer to the second?
I actually came up with an el cheapo version of what I want to accomplish. It isn't perfect, but I can implement it without any research, and it may actually prove useful to language learners. PM me if you're interested in hearing it and critiquing it. I can share here that I'm using this guy's multiple repos, though: https://github.com/tyiannak/pyAudioAnalysis
- How do I run code only when an audio file has bass
- A Python library for audio feature extraction, classification, segmentation and applications
- Phonetic search for audio files
Update: From one researcher to another. I was referred to a Python audio AI project. Once I determine exactly which module to use, it should be smooth sailing. I'll send more updates soon.
- Clustering songs with different lengths
Hey folks, I'm looking into clustering audio files with features extracted by pyAudioAnalysis. However, every feature I'm interested in (MFCCs, spectral centroid and spread, and BPM) is extracted per frame of the song (0.05 s by default), except BPM, which relates to the whole track, so tracks with different lengths produce feature arrays with different shapes.
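One common workaround for the different-shapes problem above is to collapse the per-frame feature matrix into a fixed-length vector with summary statistics (e.g. per-feature mean and standard deviation), so every track yields a vector of the same size regardless of duration. A minimal numpy sketch (the `(n_features, n_frames)` layout is an assumption based on how pyAudioAnalysis returns short-term features; `summarize_features` is a hypothetical helper, not part of the library):

```python
import numpy as np

def summarize_features(frame_features: np.ndarray) -> np.ndarray:
    """Collapse a (n_features, n_frames) matrix of per-frame features
    into a fixed-length vector by taking each feature's mean and
    standard deviation across all frames."""
    means = frame_features.mean(axis=1)
    stds = frame_features.std(axis=1)
    return np.concatenate([means, stds])

# Two "songs" with different numbers of frames but the same feature count:
short_song = np.random.rand(3, 100)   # 3 features, 100 frames
long_song = np.random.rand(3, 400)    # 3 features, 400 frames

# Both summaries have shape (6,), so they can be fed to any clustering
# algorithm (k-means, hierarchical, etc.) side by side.
a = summarize_features(short_song)
b = summarize_features(long_song)
```

Whole-track features like BPM can then simply be appended to this vector, since they are already one number per song.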
- AUDIO ANALYSIS WITH LIBROSA
To learn more about pyAudioAnalysis, here you go.
- Creating Audio Features with PyAudio Analysis
Humans are great at classifying noises. We can hear a chirp and surmise that it belongs to a bird, or hear an abstract noise and classify it as speech with a particular meaning and definition. This relationship between humans and audio classification forms the basis of speech and human communication as a whole. Translating this incredible ability to computers, on the other hand, can be a difficult challenge, to say the least. While we can naturally decompose signals, how do we teach computers to do this, and how do we show which parts of the signal matter and which parts are irrelevant or noisy? This is where pyAudioAnalysis comes in. pyAudioAnalysis is an open-source Python project by Theodoros Giannakopoulos, a Principal Researcher in multimodal machine learning at the Multimedia Analysis Group of the Computational Intelligence Lab (MagCIL). The package aims to simplify the feature extraction and classification process by providing a number of helpful tools that can sift through the signal and create relevant features. These features can then be used to train models for classification tasks.
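The frame-based decomposition described above can be illustrated with a self-contained sketch: slide a window over the signal and compute a simple feature per frame. This uses plain numpy and two toy features (energy and zero-crossing rate) rather than the library's real extractors, which compute dozens of features including MFCCs; `short_term_features` is a hypothetical stand-in, not the pyAudioAnalysis API.

```python
import numpy as np

def short_term_features(signal: np.ndarray, frame_len: int, step: int):
    """Slide a window over the signal and compute two simple per-frame
    features: average energy and zero-crossing rate. This mirrors the
    frame-based approach pyAudioAnalysis takes to feature extraction."""
    energies, zcrs = [], []
    for start in range(0, len(signal) - frame_len + 1, step):
        frame = signal[start:start + frame_len]
        energies.append(np.sum(frame ** 2) / frame_len)
        # Count sign changes between consecutive samples, normalized.
        zcrs.append(np.mean(np.abs(np.diff(np.sign(frame)))) / 2)
    return np.array(energies), np.array(zcrs)

fs = 8000                                  # sample rate in Hz
t = np.arange(fs) / fs                     # one second of audio
tone = np.sin(2 * np.pi * 440 * t)         # a pure 440 Hz "chirp"
energy, zcr = short_term_features(tone, frame_len=400, step=400)
```

With non-overlapping 50 ms frames (400 samples at 8 kHz), one second of audio yields 20 feature values per feature type, which is exactly the kind of per-frame output a classifier would then be trained on.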
- [P] Feature extraction for acoustic signals
This might be relevant, which has a set of feature extraction methods implemented: https://github.com/tyiannak/pyAudioAnalysis/wiki/3.-Feature-Extraction
- Hacker News top posts: Dec 11, 2021
A library for audio feature extraction, regression, classification, segmentation (2 comments)
- Audio feature extraction, classification, segmentation and applications
SpeechRecognition
- Help with script (beginner)
Start and Stop Listening Example
- MacWhisper: Transcribe audio files on your Mac
There is a great library that supports not only OpenAI's Whisper but many other engines that also work offline: https://github.com/Uberi/speech_recognition
- Unpopular Opinion: a lot of Obsidian community make Obsidian sound like something cringey/productivity guru-y
This is the library: https://github.com/Uberi/speech_recognition
- Nvim-VoiceRec: Add Speech-To-Text To Neovim! (useful for gpt)
It is a Python remote plugin that is a thin wrapper around the speech_recognition package.
- Speech-to-text software
- Voice commands in Doom Eternal possible?
I am less familiar with speech recognition myself. I implemented something similar many years ago, back when Google had a REST API that let you upload audio and would respond with the recognized words/sentence. I think they still have the same API available, though. They limited how much you could send, but for voice commands it was pretty solid. However, SpeechRecognition looks like a library worth trying out for this, as it seems able to do offline processing depending on the underlying engine. They also have some examples to look at.
- Build Simple CLI-Based Voice Assistant with PyAudio, Speech Recognition, pyttsx3 and SerpApi
SpeechRecognition
- Need help with speech recognition
- Wiki for the podcast
I found this one here
- How to use my speaker as input and my mic as output?
https://github.com/Uberi/speech_recognition/blob/master/reference/library-reference.rst this might help. I guess your best bet is to rtfm.
What are some alternatives?
librosa - Python library for audio and music analysis
pydub - Manipulate audio with a simple and easy high level interface
allosaurus - Allosaurus is a pretrained universal phone recognizer for more than 2000 languages
pyAcoustics - A collection of python scripts for extracting and analyzing acoustics from audio files.
aeneas - aeneas is a Python/C library and a set of tools to automagically synchronize audio and text (aka forced alignment)
mingus - Mingus is a music package for Python
speech-to-text-websockets-python
Watson Developer Cloud Python SDK - :snake: Client library to use the IBM Watson services in Python and available in pip as watson-developer-cloud
speechpy - :speech_balloon: SpeechPy - A Library for Speech Processing and Recognition: http://speechpy.readthedocs.io/en/latest/