Show HN: Predictive Text Using Only 13KB of JavaScript. No LLM

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com.

  • autowrite

    Context-aware autocomplete and autocorrect powered by word surprisals.

  • Another day, another story on HN that will have me down a rabbit hole (yesterday's was a three-hour tangent into ternary bit compression after the Microsoft paper) :D

    Your project is delightful -- thank you for sharing. I have explored this realm a bit before [0] [1], but in Python. The tool I made was for personal use, but streaming every keystroke through a network connection added a lot of unnecessary latency.

    I used word surprisals (entropy) to calculate the most likely candidates, and gave a boost to words from my own writing (thus, the predictive engine was "fine-tuned" on my writing). The result is a dictionary of words with their probabilities of use. This can be applied to bigrams, too. Your project has me thinking: how could that be pruned, massively, to create the smallest possible structure? Your engine feels like the answer. (A rough sketch of the surprisal ranking follows the links below.)

    My use case is technical writing: you know what you want to say, including long words you have to type over and over, but you want a quicker way of typing.

    [0]: https://jamesg.blog/2023/12/15/auto-write/

    [1]: https://github.com/capjamesg/autowrite
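
    A minimal sketch of that surprisal ranking, in JavaScript to match the Show HN project. Everything here -- the counts, the PERSONAL_BOOST weight, the function names -- is an illustrative assumption, not code from autowrite or the linked posts:

    ```js
    // Rank completions by word surprisal, -log2(p(word)), boosting words
    // seen in the user's own writing. All counts and weights are made up.
    const corpusCounts = { the: 5000, theory: 120, theorem: 80, thermal: 40 };
    const personalCounts = { theorem: 25 }; // words mined from my own writing
    const PERSONAL_BOOST = 4; // hypothetical "fine-tuning" weight

    const total = Object.values(corpusCounts).reduce((a, b) => a + b, 0);

    function surprisal(word) {
      const count =
        (corpusCounts[word] || 0) + PERSONAL_BOOST * (personalCounts[word] || 0);
      return -Math.log2(count / total); // crude estimate; a real engine would smooth
    }

    // Lowest surprisal = most predictable = best suggestion.
    function complete(prefix, k = 3) {
      return Object.keys(corpusCounts)
        .filter((w) => w.startsWith(prefix) && w !== prefix)
        .sort((a, b) => surprisal(a) - surprisal(b))
        .slice(0, k);
    }

    console.log(complete("the")); // ["theorem", "theory", "thermal"] -- the boost promotes "theorem"
    ```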

  • obsidian-completr

    Auto-completion plugin for the obsidian editor.

  • nanoGPT

    The simplest, fastest repository for training/finetuning medium-sized GPTs.

    Nice work! I built something similar years ago, and I compiled the probabilities from a corpus of text (public domain books) in an attempt to produce writing in the style of various authors. The results were actually quite similar to the output of nanoGPT[0]. It was very unoptimized and everything was kept in memory. I also knew nothing about embeddings at the time, and only a little about the NLP techniques that would certainly have helped. A graph database would probably have been better than the data structure I came up with at the time. You should look into stuff like Datalog, Tries[1], and N-Triples[2] for more inspiration; a small trie sketch follows the links below.

    Your idea of splitting the probabilities based on whether you're starting the sentence or finishing it is interesting, but you might benefit from an approach that creates a "window" of text you can use for lookup; an LCS[3] algorithm could do that (see the second sketch after the links below). There's probably a lot of optimization you could do based on the probabilities of different sequences; I think that was the fundamental thing I was exploring in my project.

    Seeing this has inspired me further to consider working on that project again at some point.

    [0] https://github.com/karpathy/nanoGPT

    [1] https://en.wikipedia.org/wiki/Trie

    [2] https://en.wikipedia.org/wiki/N-Triples

    [3] https://en.wikipedia.org/wiki/Longest_common_subsequence
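
    A toy frequency-annotated trie, one way to realize the Tries[1] suggestion. The node layout and API are assumptions for illustration, not anything from the original project:

    ```js
    // Prefix completion over a trie whose terminal nodes carry word counts.
    class TrieNode {
      constructor() {
        this.children = new Map();
        this.count = 0; // number of times a word ends at this node
      }
    }

    class Trie {
      constructor() { this.root = new TrieNode(); }

      insert(word) {
        let node = this.root;
        for (const ch of word) {
          if (!node.children.has(ch)) node.children.set(ch, new TrieNode());
          node = node.children.get(ch);
        }
        node.count += 1;
      }

      // Collect all completions under `prefix`, most frequent first.
      complete(prefix) {
        let node = this.root;
        for (const ch of prefix) {
          node = node.children.get(ch);
          if (!node) return [];
        }
        const results = [];
        const walk = (n, suffix) => {
          if (n.count > 0) results.push({ word: prefix + suffix, count: n.count });
          for (const [ch, child] of n.children) walk(child, suffix + ch);
        };
        walk(node, "");
        return results.sort((a, b) => b.count - a.count).map((r) => r.word);
      }
    }

    const t = new Trie();
    ["theory", "theory", "theorem", "the"].forEach((w) => t.insert(w));
    console.log(t.complete("the")); // ["theory", "the", "theorem"]
    ```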
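
    And a sketch of the LCS[3] window idea: score stored contexts by their longest common subsequence with the last few typed words, and suggest whatever followed the best match. The contexts and the `next` field are hypothetical:

    ```js
    // Classic O(n*m) dynamic program for LCS length over two token arrays.
    function lcsLength(a, b) {
      const dp = Array.from({ length: a.length + 1 }, () =>
        new Array(b.length + 1).fill(0)
      );
      for (let i = 1; i <= a.length; i++) {
        for (let j = 1; j <= b.length; j++) {
          dp[i][j] =
            a[i - 1] === b[j - 1]
              ? dp[i - 1][j - 1] + 1
              : Math.max(dp[i - 1][j], dp[i][j - 1]);
        }
      }
      return dp[a.length][b.length];
    }

    // Pick the stored context that best matches the current window of
    // typed words, then suggest the word that followed it in training text.
    function lookup(window, contexts) {
      let best = null;
      let bestScore = -1;
      for (const { context, next } of contexts) {
        const score = lcsLength(window, context);
        if (score > bestScore) { bestScore = score; best = next; }
      }
      return best;
    }

    const contexts = [
      { context: ["the", "quick", "brown"], next: "fox" },
      { context: ["a", "lazy", "brown"], next: "dog" },
    ];
    console.log(lookup(["quick", "brown"], contexts)); // "fox"
    ```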
