simonwillisonblog vs awesome-ml

Project	simonwillisonblog	awesome-ml
Mentions	28	27
Stars	163	1,402
Growth	-	-
Activity	8.1	8.8
Latest Commit	about 23 hours ago	9 days ago
Language	JavaScript	-
License	Apache License 2.0	MIT License

Mentions - the total number of mentions that we've tracked plus the number of user-suggested alternatives.
Stars - the number of stars that a project has on GitHub. Growth - month-over-month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
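The exact formula behind the activity number is not given here, so the following is only a rough sketch of how a recency-weighted score and its ranking against other projects could be computed. The exponential half-life, the function names, and the mapping onto a 0 to 10 scale are all assumptions made for illustration.

```python
from datetime import date

def weighted_commit_score(commit_dates, today, half_life_days=30.0):
    """Sum of commit weights; a commit loses half its weight every half_life_days (assumed decay)."""
    return sum(0.5 ** ((today - d).days / half_life_days) for d in commit_dates)

def activity_rating(score, all_scores):
    """Ten times the fraction of tracked projects that score strictly lower (assumed scale)."""
    below = sum(1 for s in all_scores if s < score)
    return round(10.0 * below / len(all_scores), 1)
```

Under these assumptions, a rating of 9.0 means 90% of tracked projects score lower, which matches the "top 10%" reading in the text.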

simonwillisonblog

Posts with mentions or reviews of simonwillisonblog. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2024-01-09.

Sandboxing Python with Win32 App Isolation
1 project | news.ycombinator.com | 14 Mar 2024
AI for Web Devs: Addressing Bugs, Security, & Reliability
1 project | dev.to | 31 Jan 2024

Simon Willison has pointed out several examples of prompt injection attacks and why it may never be a solved problem:
Where Have All the Websites Gone?
3 projects | news.ycombinator.com | 9 Jan 2024

I want more people to have link blogs.
I have one in the sidebar of https://simonwillison.net/ which I've been running since November 2003. You can search through all 6,836 links here: https://simonwillison.net/search/?type=blogmark
I can post things to it with a bookmarklet. It has an Atom feed.
It's such a low-friction way of publishing. A lot of https://daringfireball.net works like this too. I also like https://waxy.org/ and https://kottke.org/ for this.
I'd love to see more of these.
Ask HN: Is it feasible to train my own LLM?
3 projects | news.ycombinator.com | 2 Jan 2024
Moving Away from Substack
1 project | news.ycombinator.com | 16 Nov 2023

My approach is to publish to my own blog at https://simonwillison.net and then copy and paste content from that into a Substack newsletter at https://simonw.substack.com a few times a month.
It's been working really well.
Substack don't have an API, but they do support copy and paste - so I built myself a tool that assembles my blog content into rich text I can copy and paste straight into the Substack editor.
I wrote about how that works here: https://simonwillison.net/2023/Apr/4/substack-observable/
Building a Blog in Django
12 projects | news.ycombinator.com | 12 Sep 2023

Hah, yeah securing something like WordPress can be a challenge, especially if you're running a bunch of plugins.
My blog is a pretty straightforward Django setup without many other dependencies, so it's a lot less of an attack surface: https://github.com/simonw/simonwillisonblog
Show HN: Superfunctions – AI prompt templates as an API
1 project | news.ycombinator.com | 20 Aug 2023

That specific prompt is just an example and it's pretty bad; it was the shortest and simplest prompt I could come up with that would be easily understood.
You can set response content-types (text, html, json, etc...). If you use json it will get pretty good results because I have some logic to attempt to pick out json or json5 objects from the text output. I don't yet have logic to support json arrays, but I'm hoping to add that soon.
But client-side validation is still needed for applications with untrusted input. I don't attempt to solve prompt injection. I saw a lot of interesting posts on this topic on this blog: https://simonwillison.net/. I need to find some time to read more about it.
Try this one instead, it should be better
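The comment above mentions picking JSON objects out of free-form model output. One way that could work is sketched below using only Python's standard `json` module. The function name `extract_first_json_object` is hypothetical, this is not the Superfunctions implementation, and it handles plain JSON objects only (no JSON5, no top-level arrays).

```python
import json

def extract_first_json_object(text):
    """Return the first parseable JSON object embedded in text, or None."""
    decoder = json.JSONDecoder()
    start = text.find("{")
    while start != -1:
        try:
            # raw_decode parses one JSON value starting at index `start`
            # and ignores whatever text follows it.
            value, _end = decoder.raw_decode(text, start)
        except json.JSONDecodeError:
            value = None
        if isinstance(value, dict):
            return value
        # Not valid JSON from this brace; try the next opening brace.
        start = text.find("{", start + 1)
    return None
```

Scanning from each opening brace keeps the sketch short; validating the extracted object against an expected schema would still be the caller's job.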
Stopping at 90%
2 projects | news.ycombinator.com | 2 Aug 2023

I've started to consider "commit to writing about it" as the price I have to pay for giving in to the lure of another project. It's one of the main reasons I publish so much content on https://simonwillison.net/ and https://til.simonwillison.net
A project with a published write-up unlocks so much more value than one which you complete without giving others a chance of understanding what you built.
I've maintained internal blogs (sometimes just a Slack channel or Confluence area) at previous employers for this purpose too.
Stanford A.I. Courses
7 projects | news.ycombinator.com | 2 Jul 2023

I think you are asking specifically about practical LLM engineering and not the underlying science.
Honestly this is all moving so fast you can do well by reading the news, following a few reddits/substacks, and skimming the prompt engineering papers as they come out every week (!).
https://www.latent.space/p/ai-engineer provides an early manifesto for this nascent layer of the stack.
Zvi writes a good roundup (though he is concerned mostly with alignment so skip if you don’t like that angle): https://thezvi.substack.com/p/ai-18-the-great-debate-debates
Simon W has some good writeups too: https://simonwillison.net/
I strongly recommend playing with the OpenAI APIs and working with langchain in a Colab notebook to get a feel for how these all fit together. Also, the tools here are incredibly simple and easy to understand (very new), so it's worth looking at, say, https://github.com/minimaxir/simpleaichat/tree/main/simpleai... or https://github.com/smol-ai/developer and digging into the prompts, what goes in system vs assistant roles, how you guide the LLM, etc.
Seeking Your Top Recommendations for Resources on ChatGPT and Generative AI
3 projects | /r/ChatGPTPro | 28 Jun 2023

Simon Willison's Weblog

awesome-ml

Posts with mentions or reviews of awesome-ml. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2024-02-26.

AI Infrastructure Landscape
6 projects | news.ycombinator.com | 26 Feb 2024

I do something like that for open source:
https://github.com/underlines/awesome-ml
But it lost a bit of traction lately.
It needs re-work for the categories, or better, a tagging system, because these products and libraries can sit in more than one space.
Plus it either needs massive collaboration, or some form of automation (with an LLM and indexer), as I can't keep up with it.
OpenVoice: Versatile Instant Voice Cloning
10 projects | news.ycombinator.com | 1 Jan 2024

This area is hardly new. Look at how old some of the projects are:
https://github.com/underlines/awesome-ml/blob/master/audio-a...
What has changed is how hard it is to run. I was training my wife's voice and my voice for fun, and it needed 15 minutes of audio and 40 minutes of training on my 3080.
Now it's 2 minutes.
Show HN: Floneum, a graph editor for local AI workflows
3 projects | news.ycombinator.com | 12 Jul 2023

Thanks for your clarifications. I added it to my awesome list:
https://github.com/underlines/awesome-marketing-datascience/...
AI for AWS Documentation
6 projects | news.ycombinator.com | 6 Jul 2023

RAG is very difficult to do right. I am experimenting with various RAG projects from [1]. The main problems are:
- Chunking can interfere with context boundaries
- Content vectors can differ vastly from question vectors; for this you have to use hypothetical embeddings (they generate artificial questions and store them)
- Instead of saving just one embedding per text chunk you should store several (text chunk, hypothetical embedding questions, metadata)
- RAG will fail miserably with requests like "summarize the whole document"
- To my knowledge, OpenAI embeddings aren't performing well; use an embedding that is optimized for question answering or information retrieval and supports multiple languages. Also look into instructor embeddings: https://github.com/embeddings-benchmark/mteb
[1] https://github.com/underlines/awesome-marketing-datascience/...
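The idea of storing several vectors per chunk (the chunk text plus hypothetical questions) and resolving any hit back to the same chunk can be sketched as follows. The `embed` function is a toy bag-of-words stand-in so the example runs without dependencies, and the hypothetical questions are passed in by hand; a real system would presumably use a learned embedding model and generate the questions with an LLM. All class and function names are illustrative.

```python
import math
import re
from collections import Counter

def embed(text):
    """Toy embedding: lowercase word counts (stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

class MultiVectorIndex:
    """Stores several vectors per chunk; every vector points back to its chunk id."""

    def __init__(self):
        self.chunks = {}   # chunk_id -> chunk text
        self.entries = []  # (vector, chunk_id)

    def add(self, chunk_id, text, hypothetical_questions=()):
        self.chunks[chunk_id] = text
        # One vector for the chunk itself, plus one per hypothetical question.
        for source in (text, *hypothetical_questions):
            self.entries.append((embed(source), chunk_id))

    def search(self, query):
        """Return (chunk_id, score) for the chunk whose best vector matches the query."""
        q = embed(query)
        best = {}
        for vec, chunk_id in self.entries:
            score = cosine(q, vec)
            if score > best.get(chunk_id, -1.0):
                best[chunk_id] = score
        return max(best.items(), key=lambda kv: kv[1])
```

A question-shaped query tends to land on the stored question vector rather than on the chunk text, which is the gap between content vectors and question vectors that the list describes.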
Explore and compare the parameters of top-performing LLMs
2 projects | /r/LocalLLaMA | 19 Jun 2023

I do the same, and with 700+ GitHub stars currently, people seem to like it, but it's still curated manually, because the HF search API is so limited and I don't have the time to create a scraper.
Vicuna v1.3 13B and 7B released, trained with twice the amount of ShareGPT data
2 projects | /r/LocalLLaMA | 18 Jun 2023

Added to the list
Useful Links and Info
4 projects | /r/LocalLLaMA | 13 Jun 2023

I keep mine fairly up to date as well, almost daily: https://github.com/underlines/awesome-marketing-datascience/blob/master/README.md
How to keep track of all the LLMs out there?
2 projects | /r/LocalLLaMA | 12 Jun 2023
Run and create custom ChatGPT-like bots with OpenChat
15 projects | news.ycombinator.com | 7 Jun 2023

Disclaimer: I am curating LLM tools on GitHub [1]
A few thoughts:
* Allow for custom endpoint URLs; this way people can use open source LLMs with a fake OpenAI API backend like basaran [2] or llama-api-server [3]
* Look into better embedding methods for information retrieval, like InstructorEmbeddings or Document Summary Index
* Don't use a single embedding per content item; use multiple to increase retrieval quality
[1] https://github.com/underlines/awesome-marketing-datascience/...
[2] https://github.com/hyperonym/basaran
[3] https://github.com/iaalm/llama-api-server
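The first suggestion, a configurable endpoint, amounts to not hard-coding the API host. A minimal sketch using only the Python standard library follows. It assumes the local server exposes an OpenAI-style `/completions` route under its base URL; the default address, the `LLM_BASE_URL` environment variable, the model name, and the helper name `build_completion_request` are placeholders, not anything taken from OpenChat, basaran, or llama-api-server. The request is only constructed, not sent.

```python
import json
import os
import urllib.request

def build_completion_request(prompt, base_url=None, model="local-model", api_key="not-needed"):
    """Build (but do not send) a POST request to an OpenAI-style completions route."""
    # Explicit argument first, then a (hypothetical) environment variable,
    # then a local placeholder address.
    base_url = base_url or os.environ.get("LLM_BASE_URL", "http://localhost:8000/v1")
    body = json.dumps({"model": model, "prompt": prompt, "max_tokens": 64}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/completions",
        data=body,
        headers={"Content-Type": "application/json", "Authorization": f"Bearer {api_key}"},
        method="POST",
    )
```

Passing the result to `urllib.request.urlopen` would send it; pointing `base_url` at a different server is then the only change needed to switch backends.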
Seeking clarification about LLM's, Tools, etc.. for developers.
2 projects | /r/LocalLLaMA | 19 May 2023

Oobabooga isn't a wrapper for llama.cpp, but it can act as such. A usual Oobabooga installation on Windows will use a GPTQ wheel (binary) compiled for CUDA/Windows, or alternatively use llama.cpp's API and act as a GUI. On Linux you had the choice of the Triton or CUDA branch for GPTQ, but I don't know if that is still the case. You can also run virtualized, hardware-accelerated WSL2 Ubuntu on Windows and use everything much as you would on Linux. See my guide

What are some alternatives?

When comparing simonwillisonblog and awesome-ml you can also consider the following projects:

pg_cjk_parser - Postgres CJK Parser pg_cjk_parser is a fts (full text search) parser derived from the default parser in PostgreSQL 11. When a postgres database uses utf-8 encoding, this parser supports all the features of the default parser while splitting CJK (Chinese, Japanese, Korean) characters into 2-gram tokens. If the database's encoding is not utf-8, the parser behaves just like the default parser.

anything-llm - The all-in-one Desktop & Docker AI application with full RAG and AI Agent capabilities.

pgvector - Open-source vector similarity search for Postgres

OpenChat - LLMs custom-chatbots console ⚡

awesome-personal-blogs - A delightful list of personal tech blogs

AGiXT - AGiXT is a dynamic AI Agent Automation Platform that seamlessly orchestrates instruction management and complex task execution across diverse AI providers. Combining adaptive memory, smart features, and a versatile plugin system, AGiXT delivers efficient and comprehensive AI solutions.

tsv-utils - eBay's TSV Utilities: Command line tools for large, tabular data files. Filtering, statistics, sampling, joins and more.

llama-mps - Experimental fork of Facebooks LLaMa model which runs it with GPU acceleration on Apple Silicon M1/M2

knowledge - Everything I know

mnotify - A matrix cli client

zsv - zsv+lib: tabular data swiss-army knife CLI + world's fastest (simd) CSV parser

mteb - MTEB: Massive Text Embedding Benchmark
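
The pg_cjk_parser entry above describes splitting CJK characters into 2-gram tokens. To illustrate what that means, here is a small Python sketch. It is not the parser's actual implementation, and both the character ranges it treats as CJK and the handling of a lone character are simplifying assumptions.

```python
import re

# Simplified CJK ranges (an assumption): Hiragana, Katakana, common Han, Hangul syllables.
CJK_RUN = re.compile(r"[\u3040-\u30ff\u4e00-\u9fff\uac00-\ud7a3]+")

def cjk_bigrams(text):
    """Return overlapping 2-character tokens for each run of CJK characters."""
    tokens = []
    for run in CJK_RUN.findall(text):
        if len(run) == 1:
            tokens.append(run)  # assumed behavior: a lone character is its own token
        else:
            tokens.extend(run[i:i + 2] for i in range(len(run) - 1))
    return tokens
```

Overlapping pairs let a substring query match without a dictionary-based word segmenter, at the cost of a larger index.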