dograh
j-moshi
| | dograh | j-moshi |
|---|---|---|
| Mentions | 7 | 2 |
| Stars | 4,249 | 310 |
| Growth | 90.4% | 1.3% |
| Activity | 9.8 | 3.8 |
| Last commit | 5 days ago | about 1 year ago |
| Language | Python | JavaScript |
| License | BSD 2-clause "Simplified" License | GNU General Public License v3.0 or later |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
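As a rough illustration of the growth figure defined above, the sketch below assumes that month-over-month growth means the change in stars divided by the previous month's count. That definition is an assumption, not something the page spells out, and the helper names are hypothetical. Under that assumption, the star counts and growth percentages in the table imply approximate star counts one month earlier.

```python
def month_over_month_growth(previous: int, current: int) -> float:
    """Growth as a percentage of the previous month's star count (assumed definition)."""
    if previous <= 0:
        raise ValueError("previous must be positive")
    return (current - previous) / previous * 100.0


def implied_previous(current: int, growth_percent: float) -> float:
    """Invert the growth formula to estimate last month's star count."""
    return current / (1.0 + growth_percent / 100.0)


# Figures taken from the comparison table.
dograh_prev = implied_previous(4249, 90.4)
jmoshi_prev = implied_previous(310, 1.3)

print(f"dograh: about {dograh_prev:.0f} stars a month earlier")
print(f"j-moshi: about {jmoshi_prev:.0f} stars a month earlier")
```

If the assumed definition holds, dograh roughly doubled its star count within the month while j-moshi stayed nearly flat.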
dograh
- OpenAI delivers low-latency voice AI at scale
If you like Pipecat’s focus on speed, you might also try out our open-source project, which comes with all the batteries included (knowledge base, telephony/SIP, variables, BYOK for any LLM, STT, and TTS, speech-to-speech, etc.).
It's fully OSS, like n8n for voice AI, and you can use it with OpenClaw or Claude Code (MCPs recently launched). GitHub: https://github.com/dograh-hq/dograh, YouTube: https://www.youtube.com/watch?v=sxiSp4JXqws&list=PLDqzGuN7B1...
- 4 open-source tools to build production-ready AI voice agents 🎙️🚀
I've built voice agents before, but when it came to shipping them to production, I couldn't find a platform that got me up and running in 2 minutes, until we started building Dograh. It's an open-source voice AI platform with a visual workflow builder, built-in telephony, and post-call analytics out of the box. It's an alternative to Vapi, Retell, and Bland, but self-hostable and BSD-2 licensed. You get a canvas where you connect nodes instead of writing Python, so prompt tweaks don't mean a redeploy. Voicemail detection, call transfer, variable extraction, knowledge base, and CRM connectors all come standard. The feature set is the same whether you self-host or use the managed cloud. It has native support for BYOK (bring your own key) across every layer: Deepgram or Whisper for STT, ElevenLabs or Kokoro for TTS, and any LLM for the brain. Want to run everything locally? Swap in self-hosted models through the UI, no code required. Check it out: https://docs.dograh.com/getting-started YouTube link: https://www.youtube.com/watch?v=sxiSp4JXqws Star the Dograh repo ⭐ → https://github.com/dograh-hq/dograh
- Show HN: Dograh – voice agents that pick Recordings over TTS using LLM
- We analyzed 10,000 voice AI calls. The LLM was rarely the problem.
We built Dograh OSS, an open-source voice AI platform. When we started, we assumed most failures would come from the LLM: bad answers, missed intent, prompt edge cases. So we spent a lot of early effort there.
- Show HN: We open sourced Vapi – UI included
- Show HN: Dograh – an OSS Vapi alternative to quickly build and test voice agents
- Is there open source alternative for VAPI or retellai?
j-moshi
- OpenAI delivers low-latency voice AI at scale
There's a really interesting project in Japanese natural language processing called J-Moshi that takes a novel approach and, in my opinion, gets good results.
They tried to make it mimic the quick acknowledgement sounds that Japanese conversation is full of, and that seems to let it handle pauses and interruptions really well.
https://en.nagoya-u.ac.jp/news/articles/say-hello-to-j-moshi... (English)
https://nu-dialogue.github.io/j-moshi/ (Japanese and English)
- High-Fidelity Simultaneous Speech-to-Speech Translation
Check out this model based on the same architecture: https://github.com/nu-dialogue/j-moshi
What are some alternatives?
pronghorn - Fast, low-latency voice assistant protocol. Wire-level UDP streaming replacement for Wyoming.
hibiki - Hibiki is a model for streaming speech translation (also known as simultaneous translation). Unlike offline translation, where one waits for the end of the source utterance to start translating, Hibiki adapts its flow to accumulate just enough context to produce a correct translation in real time, chunk by chunk.
webrtc-zero-downtime-restart - A playground to make WebRTC easier to deploy, safer and more robust
pipecat - Open Source framework for voice and multimodal conversational AI