Hi all, I've been working on this application for quite some time. It started out as a Stable Diffusion art app and is now turning into a full-featured AI assistant of sorts.
The video features me talking to the computer using my headphones. It records my speech, transcribes the audio to text, passes that text to the LLM, which generates a reply, and then uses another model for text-to-speech.
The video also shows me asking for an image, at which point the LLM generates a prompt. Stable Diffusion is then loaded and the prompt is passed to it to generate the image.
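For anyone curious how the pieces fit together, here is a rough sketch of the flow described above. It is not the actual AI Runner code: the names `Reply`, `wants_image` and `handle_utterance` are made up for illustration, the keyword check for image requests is an assumption (the real app may decide this differently), and each model is passed in as a plain callable so the sketch runs without loading anything.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Reply:
    text: str                      # LLM output (a chat answer or an SD prompt)
    audio: Optional[bytes] = None  # synthesized speech, if a spoken answer was produced
    image: Optional[bytes] = None  # generated image, if one was requested


def wants_image(text: str) -> bool:
    # Hypothetical heuristic, used only to keep the sketch self-contained.
    lowered = text.lower()
    return any(keyword in lowered for keyword in ("image", "picture", "draw"))


def handle_utterance(
    audio: bytes,
    stt: Callable[[bytes], str],   # speech-to-text, e.g. a Whisper wrapper
    llm: Callable[[str], str],     # text generation, e.g. a Mistral wrapper
    tts: Callable[[str], bytes],   # text-to-speech, e.g. a SpeechT5 wrapper
    sd: Callable[[str], bytes],    # image generation, e.g. a Stable Diffusion wrapper
) -> Reply:
    user_text = stt(audio)
    if wants_image(user_text):
        # Let the LLM write the prompt, then hand it to Stable Diffusion.
        prompt = llm(f"Write a Stable Diffusion prompt for: {user_text}")
        return Reply(text=prompt, image=sd(prompt))
    # Ordinary chat turn: generate an answer and speak it.
    answer = llm(user_text)
    return Reply(text=answer, audio=tts(answer))
```

Passing the models in as callables also means the image model only has to be loaded when the `sd` callable is first invoked, which matches the "Stable Diffusion is loaded" step in the video.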
The models I'm using:
- TTS: SpeechT5
- LLM: Mistral 7b
- Stable Diffusion: Turbo
- STT: whisper-tiny
- Vision: various, still in development
As I mentioned there at the end, vision is still in development. I have a working prototype in which an image is captured every second, converted to text, and then passed to my chat prompt. It works OK but is often wrong.
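The vision prototype boils down to a loop like the one below. Again, this is only a sketch under assumptions, not the project's code: `watch` and `build_prompt` are hypothetical names, the prompt wording is invented, and the camera and captioning model are stand-in callables.

```python
import time
from collections import deque
from typing import Callable, Deque


def watch(
    capture: Callable[[], bytes],      # grabs one frame from the camera
    caption: Callable[[bytes], str],   # image-to-text model
    context: Deque[str],               # rolling list of recent captions
    frames: int,
    interval: float = 1.0,             # one image per second, as in the prototype
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    for _ in range(frames):
        context.append(caption(capture()))
        sleep(interval)


def build_prompt(user_text: str, context: Deque[str]) -> str:
    # Prepend the recent captions so the LLM can refer to what the camera saw.
    seen = "\n".join(f"- {c}" for c in context)
    return f"What the camera saw recently:\n{seen}\n\nUser: {user_text}"
```

Using a `deque` with a `maxlen` keeps only the latest few captions, so a wrong caption drops out of the chat prompt after a few seconds instead of lingering.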
The project is open source under GPL-3 and written in Python using PyQt6. You can find it here:
https://github.com/Capsize-Games/airunner
The compiled stable version is available for download on itch. It only includes image generation; everything else is in the unreleased 3.0.0 version.
https://capsizegames.itch.io/ai-runner