-
I wonder how this compares to Flash Attention (https://github.com/HazyResearch/flash-attention), which is the other memory-aware attention project I know of.
I guess Flash Attention is more about using GPU SRAM efficiently, whereas this is more about making better use of OS/CPU memory?
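To make the contrast concrete, here is a minimal NumPy sketch (mine, not code from either project) of the tiling idea FlashAttention builds on: scores are computed one key/value block at a time with an online softmax, so the live working set is a single tile instead of the full N×N score matrix, which is what makes it small enough to keep in GPU SRAM. Function names and the block size are illustrative only.

```python
import numpy as np

def naive_attention(Q, K, V):
    # Materializes the full (N, N) score matrix -- memory grows quadratically with N.
    S = Q @ K.T / np.sqrt(Q.shape[-1])
    P = np.exp(S - S.max(axis=-1, keepdims=True))
    P /= P.sum(axis=-1, keepdims=True)
    return P @ V

def tiled_attention(Q, K, V, block=64):
    # Processes K/V in blocks with an online (streaming) softmax, so only one
    # (N, block) tile of scores is live at a time -- the kind of working set
    # FlashAttention keeps resident in fast on-chip SRAM.
    N, d = Q.shape
    out = np.zeros((N, d))
    row_max = np.full((N, 1), -np.inf)   # running row-wise max of scores
    row_sum = np.zeros((N, 1))           # running softmax denominator
    for start in range(0, K.shape[0], block):
        Kb, Vb = K[start:start + block], V[start:start + block]
        S = Q @ Kb.T / np.sqrt(d)                          # (N, block) tile
        new_max = np.maximum(row_max, S.max(axis=-1, keepdims=True))
        P = np.exp(S - new_max)
        scale = np.exp(row_max - new_max)                  # rescale earlier partial results
        row_sum = row_sum * scale + P.sum(axis=-1, keepdims=True)
        out = out * scale + P @ Vb
        row_max = new_max
    return out / row_sum

rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((256, 32)) for _ in range(3))
assert np.allclose(naive_attention(Q, K, V), tiled_attention(Q, K, V), atol=1e-6)
```

The same blocking trick applies regardless of which tier of the memory hierarchy you are optimizing for; FlashAttention targets the SRAM/HBM boundary, while a CPU-offloading approach applies it to the GPU/host-RAM boundary.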
-
willow
Open source, local, and self-hosted voice assistant alternative to Amazon Echo/Google Home
-
willow-inference-server
Open source, local, and self-hosted highly optimized language inference server supporting ASR/STT, TTS, and LLM across WebRTC, REST, and WS
NOTE:
The number of mentions on this list indicates mentions on common posts plus user-suggested alternatives.
Hence, a higher number means a more popular project.
Related posts
-
Show HN: Willow Inference Server: Optimized ASR/TTS/LLM for Willow/WebRTC/REST
-
Distil-Whisper: distilled version of Whisper that is 6 times faster, 49% smaller
-
Whisper.api: An open source, self-hosted speech-to-text with fast transcription
-
[D] What is the most efficient version of OpenAI Whisper?
-
Show HN: Project S.A.T.U.R.D.A.Y – open-source, self hosted, J.A.R.V.I.S