ckip-transformers
speechbrain
Metric | ckip-transformers | speechbrain
---|---|---
Mentions | 1 | 26
Stars | 628 | 7,892
Growth | 1.1% | 2.5%
Activity | 3.3 | 9.8
Last commit | about 1 year ago | 6 days ago
Language | Python | Python
License | GNU General Public License v3.0 only | Apache License 2.0
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
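The growth figure can be made concrete with a little arithmetic. The sketch below assumes that "month over month growth" means the change in stars divided by the previous month's star count; the page does not state the exact formula, so treat that definition, and the helper names `month_over_month_growth` and `implied_previous`, as illustrative assumptions rather than the site's actual method.

```python
def month_over_month_growth(previous: int, current: int) -> float:
    """Growth as a percentage of the previous month's count (assumed definition)."""
    if previous <= 0:
        raise ValueError("previous must be positive")
    return (current - previous) / previous * 100.0


def implied_previous(current: int, growth_percent: float) -> float:
    """Invert the assumed formula to estimate last month's count."""
    return current / (1.0 + growth_percent / 100.0)


if __name__ == "__main__":
    # Star counts and growth rates taken from the comparison table.
    for name, stars, growth in [
        ("ckip-transformers", 628, 1.1),
        ("speechbrain", 7892, 2.5),
    ]:
        estimate = implied_previous(stars, growth)
        print(f"{name}: about {estimate:.0f} stars one month earlier")
```

Under this assumption, the same percentage means very different absolute numbers: 2.5% of speechbrain's star count is a couple of hundred stars, while 1.1% of ckip-transformers' is fewer than ten.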
ckip-transformers
speechbrain
- SpeechBrain 1.0: A free and open-source AI toolkit for all things speech
- FLaNK Stack Weekly 22 January 2024
- [D] Training ASR model using SpeechBrain
  You likely have a very broken sample in one of your batches. It looks like your training actually went through a few batches before it horked the error at you. A quick google shows a similar issue in the github repo: https://github.com/speechbrain/speechbrain/issues/649.
- Whisper.cpp
  https://github.com/ggerganov/whisper.cpp https://speechbrain.github.io/
- [D] What is the best open source text to speech model?
  I don't know if it's the best, but Speechbrain is supposed to be state of the art.
- [D] What's stopping you from working on speech and voice?
  https://github.com/speechbrain/speechbrain
- Specific Voice recognition
- How to get high-quality, low-cost Speech-to-Text transcription?
- [D] Speech Enhancement SOTA
- Speaker diarization
What are some alternatives?
nlu - 1 line for thousands of State of The Art NLP models in hundreds of languages. The fastest and most accurate way to solve text problems.
espnet - End-to-End Speech Processing Toolkit
haystack - LLM orchestration framework to build customizable, production-ready LLM applications. Connect components (models, vector DBs, file converters) to pipelines or agents that can interact with your data. With advanced retrieval methods, it's best suited for building RAG, question answering, semantic search or conversational agent chatbots.
pyannote-audio - Neural building blocks for speaker diarization: speech activity detection, speaker change detection, overlapped speech detection, speaker embedding
trankit - Trankit is a Light-Weight Transformer-based Python Toolkit for Multilingual Natural Language Processing
Resemblyzer - A python package to analyze and compare voices with deep learning
BERTweet - BERTweet: A pre-trained language model for English Tweets (EMNLP-2020)
ukrainian-onnx-model - An ONNX model for speech recognition of the Ukrainian language
sequence-classifier - A PyTorch Library for Sequence Labeling Tasks such as Named-entity Recognition or Part-of-speech Tagging
SincNet - SincNet is a neural architecture for efficiently processing raw audio samples.
NeMo - A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)
speech-to-text-benchmark - speech to text benchmark framework