Open source, local, and self-hosted Amazon Echo/Google Home competitive Voice Assistant alternative (by toverainc)

Willow Alternatives

Similar projects and alternatives to willow

NOTE: The number of mentions on this list indicates mentions on common posts plus user-suggested alternatives. Hence, a higher number generally indicates a better willow alternative or greater similarity.

willow reviews and mentions

Posts with mentions or reviews of willow. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2023-11-02.
  • Home Assistant 2023.11
    11 projects | news.ycombinator.com | 2 Nov 2023
    Very nice!

    Would you be interested in integrating with my project Willow[0]?

    Willow supports Home Assistant, OpenHAB, and generic REST+MQTT endpoints today. With Home Assistant and OpenHAB we benefit from their specific API support for providing speech to text output and processing through things like the HA Assist Pipelines[1].

    From our standpoint we handle wake word, VAD+AEC+BSS, STT, TTS, user feedback, etc. All we really do is send the speech transcript to the Willow command endpoint (like HA) and speak+display the execution result. Other than all of the wild speech stuff and our obsession with speed and accuracy Willow is really quite "dumb" - think of it as a voice terminal.
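As a rough sketch of that "voice terminal" flow — not Willow's actual code — here's how a finished transcript might be forwarded to Home Assistant's conversation API. The host and token are placeholders; `/api/conversation/process` is part of Home Assistant's public REST API:

```python
import json
import urllib.request

HA_URL = "http://homeassistant.local:8123"  # placeholder host
HA_TOKEN = "YOUR_LONG_LIVED_ACCESS_TOKEN"   # placeholder token

def build_command_request(transcript: str) -> urllib.request.Request:
    """Wrap an STT transcript in a POST to HA's conversation endpoint."""
    body = json.dumps({"text": transcript}).encode()
    return urllib.request.Request(
        f"{HA_URL}/api/conversation/process",
        data=body,
        headers={
            "Authorization": f"Bearer {HA_TOKEN}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

# The device would then speak/display whatever HA returns:
req = build_command_request("turn on the kitchen lights")
# urllib.request.urlopen(req)  # requires a running Home Assistant instance
```

Everything else (wake word, VAD/AEC/BSS, STT, TTS) happens before and after this single request.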

    OpenHAB has something similar but it's significantly more limited.

    [0] - https://heywillow.io

    [1] - https://developers.home-assistant.io/docs/voice/pipelines/

  • Distil-Whisper: distilled version of Whisper that is 6 times faster, 49% smaller
    14 projects | news.ycombinator.com | 31 Oct 2023
    I'm the founder of Willow[0] (we use ctranslate2 as well) and I will be looking at this as soon as these models are released tomorrow. HF claims they're drop-in compatible, but we won't know for sure until someone tries it.

    [0] - https://heywillow.io/

  • What's New in Python 3.12
    5 projects | news.ycombinator.com | 18 Oct 2023
    Shameless self-plug but with my project Willow[0] we have a management server implementation to deal with multiple devices, etc. We have a new feature called "Willow One Wake" that takes the incoming audio amplitude when wake word is detected and uses our Willow Application Server (python) to only activate wake on the device closest to the person speaking. Old and tired compared to the commercial stuff but a first in the open source space.
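The "One Wake" arbitration described above can be sketched as follows — a simplified, hypothetical version (the real Willow Application Server is networked and more involved): collect the wake-word audio amplitude reported by each device within a short window and activate only the loudest, i.e. closest, one.

```python
import asyncio

WAKE_WINDOW_S = 0.2  # assumed arbitration window after the first wake report

async def arbitrate(reports: "asyncio.Queue[tuple[str, float]]") -> str:
    """Collect (device_id, amplitude) wake reports for a short window,
    then return the id of the device closest to the speaker (loudest)."""
    best = await reports.get()  # first wake report opens the window
    loop = asyncio.get_running_loop()
    deadline = loop.time() + WAKE_WINDOW_S
    while (remaining := deadline - loop.time()) > 0:
        try:
            report = await asyncio.wait_for(reports.get(), remaining)
        except asyncio.TimeoutError:
            break
        if report[1] > best[1]:
            best = report
    return best[0]

async def demo() -> str:
    q: asyncio.Queue = asyncio.Queue()
    for report in [("kitchen", 0.31), ("office", 0.72), ("hall", 0.15)]:
        q.put_nowait(report)
    return await arbitrate(q)

winner = asyncio.run(demo())  # "office" - the loudest report wins
```

Only the winning device proceeds with streaming audio; the rest silently cancel their wake.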

    The asyncio improvements in Python 3.12 especially (plus perf generally) have been instrumental in enabling real world use of this. With Python 3.12 asyncio, uvloop, and FastAPI it works remarkably well[1]. As the demo video shows not only does it not delay responsiveness, it has granularity down to inches.

    [0] - https://heywillow.io/

    [1] - https://youtu.be/qlhSEeWJ4gs

  • A Raspberry Pi 5 is better than two Pi 4s
    3 projects | news.ycombinator.com | 8 Oct 2023
    For most people with self-hosting tasks amd64 is back as the way to go.

    As you say, there are a ton of "minipcs" on the market that directly compete with the Raspberry Pi on cost and power usage. They're typically slightly larger, but the expansion options (bring your own RAM/storage) plus real I/O (real PCIe, disk, etc.) significantly outweigh this, in my opinion. They're also typically more performant, and while aarch64 platform support is increasing dramatically there are still occasions where a project, Docker container, etc. doesn't support it.

    Taking it a step further, there are a TON of decommissioned/recycled corporate/enterprise SFF desktops on the market. They don't compete in terms of size (13" x 15" or so) but they can actually get close in power usage. Many of them have multiple SATA ports, real NVMe, multiple real half-height PCIe slots, significantly better USB and PCIe bandwidth, etc.

    With my project Willow and the Willow Inference Server[0] we're trying to drive this approach in the self-hosting community, with an initial emphasis on Home Assistant. Those users are generally sick of Raspberry Pi supply shortages, very limited performance, poor I/O, flaky SD cards, etc. The Raspberry Pi is still pretty popular for "my first Home Assistant", but once people get bitten by the self-hosting bug their setups start to look like a homelab very quickly.

    For Willow particularly we emphasize use of GPUs because a voice assistant can't be waiting > 10 seconds to do speech recognition and speech synthesis. There are approaches out there trying to kind of get something working using Whisper tiny but in our ample internal testing and community feedback we feel that Whisper small is the bare minimum for voice assistant tasks, with many users going all out and using Whisper large-v2 at beam size 5. With GPU it's still so fast it doesn't really matter.

    The Raspberry Pi is especially poorly suited for this use case (and even amd64 CPUs struggle). We have some benchmarks here[1]. TL;DR: a roughly seven-year-old Tesla P4 (single slot, slot power only, half-height, $70 used) does speech recognition 87x faster, with the multiplier increasing for more complex models and longer speech segments. A 3.8 second voice command takes 586ms on the Tesla P4 and 51 seconds on the Raspberry Pi 4. Even with the Pi 5 being twice as fast that's still roughly 25 seconds, which is completely unusable. It's not fair to compare a GPU to a Raspberry Pi, but consider the economics and practicality...
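The speedup figures quoted above follow directly from the timings — a quick check of the arithmetic:

```python
# Timings quoted above for a 3.8 second voice command (Whisper STT)
p4_s = 0.586   # Tesla P4
pi4_s = 51.0   # Raspberry Pi 4

speedup = pi4_s / p4_s
print(f"Tesla P4 is ~{speedup:.0f}x faster than a Pi 4")  # ~87x

# Even a Pi 5 at twice the Pi 4's speed is nowhere near usable:
pi5_s = pi4_s / 2
print(f"Estimated Pi 5 time: {pi5_s:.1f} s")  # 25.5 s
```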

    You can get an SFF desktop and Tesla P4 from eBay for $200 shipped to your door. It will idle (with GPU and models loaded) at ~30 watts. The CPU, RAM, disk (NVMe), I/O, etc. will walk all over any Raspberry Pi. Add the GPU and it's obviously not even close - you end up with a machine that can easily do 10x-100x what a Raspberry Pi can for 2x the cost and power usage. You can even throw a 2.5GbE card in another slot for $20.

    Even factoring in power usage (10-15W vs. 30W, a 2-3x difference) the cost difference comes down to nearly nothing, and for many users this configuration is essentially future-proof for anything they may want to do for years (my system with everything running maxes out around 50% of one core). Many users have also gradually grown their self-hosted setups over the years, ending up with three or more Raspberry Pis for different tasks (Pi-hole, Home Assistant, Plex, etc.). At that point the SFF configuration starts to pull far ahead in every way, including power usage.

    Users were initially very skeptical of GPU use, likely projecting their experience of the desktop market and assuming things like "300 watt power usage with a huge > $500 card". Now they love having a GPU around for Willow and miscellaneous other CUDA tasks like encoding/decoding/transcoding with Plex/Jellyfin, accelerated Frigate, and all kinds of other applications. Willow Inference Server (depending on configuration) uses somewhere between 1-4GB of VRAM, so an 8GB card leaves plenty of headroom for additional tasks. We even have users who started with the Tesla P4, then got the LLM bug and figured out how to get an RTX 3090 working with their setup.

    [0] - https://heywillow.io/

    [1] - https://heywillow.io/components/willow-inference-server/#ben...

  • ChatGPT can now see, hear, and speak – openai.com
    6 projects | news.ycombinator.com | 25 Sep 2023
  • Ask HN: Is there any open source/open hardware Echo Dot alike?
    2 projects | news.ycombinator.com | 11 Aug 2023
    I created Willow, which uses the Espressif ESP BOX:


  • Show HN: Project S.A.T.U.R.D.A.Y – open-source, self hosted, J.A.R.V.I.S
    7 projects | news.ycombinator.com | 2 Jul 2023
    Nice! I'm the creator of Willow[0] (which has been mentioned here).

    First of all, we love seeing efforts like this and we'd love to work together with other open source voice user interface projects! There's plenty of work to do in the space...

    I have roughly two decades of experience with voice and one thing to keep in mind is how latency sensitive voice tasks are. Generally speaking when it comes to conversational audio people have very high expectations regarding interactivity. For example, in the VoIP world we know that conversation between people starts getting annoying at around 300ms of latency. Higher latencies for voice assistant tasks are more-or-less "tolerated" but latency still needs to be extremely low. Alexa/Echo (with all of its problems) is at least a decent benchmark for what people expect for interactivity and all things considered it does pretty well.

    I know you're early (we are too!) but in your demo I counted roughly six seconds of latency between the initial hello and response (and nearly 10 for "tell me a joke"). In terms of conversational voice this feels like an eternity. Again, no shade at all (believe me I understand more than most) but just something I thought I'd add from my decades of experience with humans and voice. This is why we have such heavy emphasis on reducing latency as much as possible.

    For an idea of just how much we emphasize this you can try our WebRTC demo[1] which can do end-to-end (from click stop record in browser to ASR response) in a few hundred milliseconds (with Whisper large-v2 and beam size 5 - medium/1 is a fraction of that) including internet latency (it's hosted in Chicago, FYI).

    Running locally with WIS and Willow we see less than 500ms from end of speech (on device VAD) to command execution completion and TTS response with platforms like Home Assistant. Granted this is with GPU so you could call it cheating but a $100 six year old Nvidia Pascal series GPU runs circles around the fastest CPUs for these tasks (STT and TTS - see benchmarks here[2]). Again, kind of cheating but my RTX 3090 at home drops this down to around 200ms - roughly half of that time is Home Assistant. It's my (somewhat controversial) personal opinion that GPUs are more-or-less a requirement (today) for Alexa/Echo competitive responsiveness.

    Speaking of latency, I've been noticing a trend with Willow users regarding LLMs - they are very neat, cool, and interesting (our inference server[3] supports LLaMA-based LLMs) but they really aren't the right tool for these kinds of tasks. They have very high memory requirements (relatively speaking), require a lot of compute, and are very slow (again, relatively speaking). They also don't natively support the kind of API call/response you need for most voice tasks. There are efforts out there to support this with LLMs, but frankly I find the overall approach very strange. It seems that LLMs have sucked a lot of oxygen out of the room and people have forgotten (or never heard of) "good old fashioned" NLU/NLP approaches.

    Have you considered an NLU/NLP engine like Rasa[4]? This is the approach we will be taking to implement this kind of functionality in a flexible and assistant platform/integration agnostic way. By the time you stack up VAD, STT, understanding user intent (while allowing flexible grammar), calling an API, execution, and TTS response latency starts to add up very, very quickly.

    As one example, for "tell me a joke" Alexa does this in a few hundred milliseconds and I guarantee they're not using an LLM for this task - you can have a couple of hundred jokes to randomly select from with pre-generated TTS responses cached (as one path). Again, this is the approach we are taking to "catch up" with Alexa for all kinds of things from jokes to creating calendar entries, etc. Of course you can still have a catch-all to hand off to LLM for "conversation" but I'm not sure users actually want this for voice.
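The "couple of hundred cached jokes" path described above is essentially a lookup table keyed by intent. Here's a hypothetical sketch of that dispatch (the intent names, joke ids, and cache paths are invented for illustration; in practice the intent would come from an NLU engine like Rasa):

```python
import random

# Pre-generated TTS audio would be cached on disk, keyed by response id;
# here we just track the response text and a hypothetical cache path.
JOKES = [
    ("joke_001", "Why do programmers prefer dark mode? Because light attracts bugs."),
    ("joke_002", "There are 10 kinds of people: those who know binary and those who don't."),
]

def handle_tell_joke() -> dict:
    """Pick a random joke; its TTS response is already rendered and cached."""
    joke_id, text = random.choice(JOKES)
    return {"text": text, "tts_cache": f"/cache/tts/{joke_id}.flac"}

# Intent -> handler table (intents produced by the NLU layer)
INTENT_HANDLERS = {
    "tell_joke": handle_tell_joke,
}

def dispatch(intent: str) -> dict:
    handler = INTENT_HANDLERS.get(intent)
    if handler is None:
        # catch-all: this is where a hand-off to an LLM backend could go
        return {"text": "Sorry, I can't do that yet.", "tts_cache": None}
    return handler()

result = dispatch("tell_joke")
```

Because both the response text and its synthesized audio are precomputed, the round trip is a dictionary lookup plus playback — no STT-to-LLM-to-TTS pipeline in the hot path.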

    I may be misunderstanding your goals but just a few things I thought I would mention.

    [0] - https://github.com/toverainc/willow

    [1] - https://wisng.tovera.io/rtc/

    [2] - https://github.com/toverainc/willow-inference-server/tree/wi...

    [3] - https://github.com/toverainc/willow-inference-server

    [4] - https://rasa.com/

  • Cooklang – Recipe Markup Language
    4 projects | news.ycombinator.com | 27 Jun 2023
    I would suspect teaching Willow (https://github.com/toverainc/willow#readme) to do that would be much more reasonable in a kitchen setup, for the exact reason guhidalg mentioned: who wants to use oil soaked hands to touch something in the kitchen?
  • Is anyone doing always-on voice to text with a local llama at home?
    5 projects | /r/LocalLLaMA | 25 Jun 2023
    I have something similar running locally using https://github.com/toverainc/willow and a python script.
  • VLLM: 24x faster LLM serving than HuggingFace Transformers
    3 projects | news.ycombinator.com | 20 Jun 2023
    We run into this constantly with Willow[0] and the Willow Inference Server[1]. There seems to be a large gap in understanding among many users. They find it difficult to accept a fundamental reality: GPUs are so physically different, and so much better suited to many/most ML tasks, that all the CPU tricks in the world cannot bring CPUs even close to GPU performance (while maintaining quality/functionality). I find this interesting because everyone seems to take it as obvious that integrated graphics aren't even close to discrete graphics for gaming. The same applies to these tasks.

    With Willow Inference Server I'm constantly telling people: a six year old $100 Tesla P4/GTX 1070 walks all over even the best CPUs in the world for our primary task of speech to text/ASR - at dramatically lower cost and power usage. Seriously - a GTX 1070 is at least 5x faster than a Threadripper 5955WX. Our goal is to provide an open-source commercial voice assistant equivalent user experience and that is and will be fundamentally impossible for the foreseeable future on CPU.

    Slight tangent, but there are users in the space who seem to be under the impression that they can use their Raspberry Pi for voice assistant/speech recognition tasks. It's not even close to a fair fight, but with the same implementation and settings a GTX 1070 is roughly 90x (nearly two orders of magnitude) faster[2] than a Raspberry Pi... Yes, all-in a machine with a GTX 1070 uses an order of magnitude more power than a Raspberry Pi (~30W vs ~3W), but even in the countries with the most expensive electricity in the world that works out to a $2-$3/mo cost difference - which I feel, at least, is a reasonable trade-off considering the dramatic difference in usability (the Raspberry Pi is essentially useless here - waiting 10-30 seconds for a response makes pulling out your phone faster).
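The monthly cost figure is easy to sanity-check — a back-of-the-envelope calculation, assuming a $0.15/kWh residential rate (rates vary widely by country):

```python
delta_w = 30 - 3          # extra draw of the GTX 1070 machine vs a Pi, watts
hours = 24 * 30           # one month of always-on operation
kwh = delta_w * hours / 1000
cost = kwh * 0.15         # assumed $0.15/kWh
print(f"{kwh:.1f} kWh/mo -> ${cost:.2f}/mo extra")  # ~19.4 kWh, ~$2.92
```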

    [0] - https://github.com/toverainc/willow

    [1] - https://github.com/toverainc/willow-inference-server

    [2] - https://github.com/toverainc/willow-inference-server/tree/wi...



Basic willow repo stats

toverainc/willow is an open source project licensed under Apache License 2.0 which is an OSI approved license.

The primary programming language of willow is C.
