simonwillisonblog vs pgvector

Project	simonwillisonblog	pgvector
Mentions	28	78
Stars	163	9,211
Growth	-	5.6%
Activity	8.1	9.9
Latest Commit	about 18 hours ago	6 days ago
Language	JavaScript	C
License	Apache License 2.0	GNU General Public License v3.0 or later

The number of mentions indicates the total number of mentions that we've tracked plus the number of user-suggested alternatives.
Stars - the number of stars that a project has on GitHub. Growth - month-over-month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
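The Growth figure is a simple ratio. LibHunt does not spell out its exact formula, so this is a minimal sketch assuming growth means the percentage change in star count compared with one month earlier; the function names are illustrative. It also inverts the formula to show what the 5.6% figure for pgvector would imply.

```python
def month_over_month_growth(stars_now: int, stars_month_ago: int) -> float:
    """Percentage change in GitHub stars versus one month earlier (assumed definition)."""
    if stars_month_ago <= 0:
        raise ValueError("a positive star count from a month ago is required")
    return (stars_now - stars_month_ago) / stars_month_ago * 100.0


def implied_stars_month_ago(stars_now: int, growth_percent: float) -> float:
    """Invert the assumed formula: the star count a month earlier implied by a growth figure."""
    return stars_now / (1.0 + growth_percent / 100.0)


# pgvector's row in the comparison table: 9,211 stars with 5.6% growth.
baseline = implied_stars_month_ago(9211, 5.6)
print(f"about {baseline:,.0f} stars a month earlier")
print(f"{month_over_month_growth(9211, round(baseline)):.1f}% growth")
```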

simonwillisonblog

Posts with mentions or reviews of simonwillisonblog. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2024-01-09.

Sandboxing Python with Win32 App Isolation
1 project | news.ycombinator.com | 14 Mar 2024
AI for Web Devs: Addressing Bugs, Security, & Reliability
1 project | dev.to | 31 Jan 2024

Simon Willison has pointed out several examples of prompt injection attacks and explained why prompt injection may never be a solved problem.
Where Have All the Websites Gone?
3 projects | news.ycombinator.com | 9 Jan 2024

I want more people to have link blogs.
I have one in the sidebar of https://simonwillison.net/ which I've been running since November 2003. You can search through all 6,836 links here: https://simonwillison.net/search/?type=blogmark
I can post things to it with a bookmarklet. It has an Atom feed.
It's such a low-friction way of publishing. A lot of https://daringfireball.net works like this too. I also like https://waxy.org/ and https://kottke.org/ for this.
I'd love to see more of these.
Ask HN: Is it feasible to train my own LLM?
3 projects | news.ycombinator.com | 2 Jan 2024
Moving Away from Substack
1 project | news.ycombinator.com | 16 Nov 2023

My approach is to publish to my own blog at https://simonwillison.net and then copy and paste content from that into a Substack newsletter at https://simonw.substack.com a few times a month.
It's been working really well.
Substack don't have an API, but they do support copy and paste - so I built myself a tool that assembles my blog content into rich text I can copy and paste straight into the Substack editor.
I wrote about how that works here: https://simonwillison.net/2023/Apr/4/substack-observable/
Building a Blog in Django
12 projects | news.ycombinator.com | 12 Sep 2023

Hah, yeah, securing something like WordPress can be a challenge, especially if you're running a bunch of plugins.
My blog is a pretty straightforward Django setup without many other dependencies, so it's a lot less of an attack surface: https://github.com/simonw/simonwillisonblog
Show HN: Superfunctions – AI prompt templates as an API
1 project | news.ycombinator.com | 20 Aug 2023

That specific prompt is just an example, and it's pretty bad; it was the shortest and simplest prompt I could come up with that would be easily understood.
You can set response content types (text, HTML, JSON, etc.). If you use JSON you'll get pretty good results, because I have some logic that attempts to pick out JSON or JSON5 objects from the text output. I don't yet have logic to support JSON arrays, but I'm hoping to add that soon.
But client-side validation is still needed for applications with untrusted input. I don't attempt to solve prompt injection. I saw a lot of interesting posts on this topic on this blog: https://simonwillison.net/. I need to find some time to read more about it.
Try this one instead, it should be better
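The "pick out JSON objects from the text output" step can be approximated with Python's standard json module. This is a minimal sketch, not the Superfunctions implementation: it handles JSON objects only (no arrays, and no JSON5, which would need a third-party parser), and the function name is hypothetical. It tries each opening brace in turn and lets JSONDecoder.raw_decode parse one value while ignoring any trailing prose.

```python
import json
from typing import Any, Optional


def extract_first_json_object(text: str) -> Optional[dict[str, Any]]:
    """Return the first parseable JSON object embedded in free-form LLM output.

    Only JSON objects are handled (no arrays, no JSON5).
    """
    decoder = json.JSONDecoder()
    start = text.find("{")
    while start != -1:
        try:
            # raw_decode parses one JSON value beginning at index `start`
            # and ignores whatever text follows it.
            value, _end = decoder.raw_decode(text, start)
            return value
        except json.JSONDecodeError:
            # Not valid JSON from this brace; move on to the next "{".
            start = text.find("{", start + 1)
    return None


reply = 'Sure! Here is the result:\n{"sentiment": "positive", "score": 0.9}\nAnything else?'
print(extract_first_json_object(reply))
```

Trying every brace is quadratic in the worst case, which is acceptable for short model replies; the client-side validation the commenter mentions would still be needed on the returned dict.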
Stopping at 90%
2 projects | news.ycombinator.com | 2 Aug 2023

I've started to consider "commit to writing about it" as the price I have to pay for giving in to the lure of another project. It's one of the main reasons I publish so much content on https://simonwillison.net/ and https://til.simonwillison.net
A project with a published write-up unlocks so much more value than one which you complete without giving others a chance of understanding what you built.
I've maintained internal blogs (sometimes just a Slack channel or Confluence area) at previous employers for this purpose too.
Stanford A.I. Courses
7 projects | news.ycombinator.com | 2 Jul 2023

I think you are asking specifically about practical LLM engineering and not the underlying science.
Honestly this is all moving so fast you can do well by reading the news, following a few reddits/substacks, and skimming the prompt engineering papers as they come out every week (!).
https://www.latent.space/p/ai-engineer provides an early manifesto for this nascent layer of the stack.
Zvi writes a good roundup (though he is concerned mostly with alignment so skip if you don’t like that angle): https://thezvi.substack.com/p/ai-18-the-great-debate-debates
Simon W has some good writeups too: https://simonwillison.net/
I strongly recommend playing with the OpenAI APIs and working with LangChain in a Colab notebook to get a feel for how these all fit together. Also, the tools here are very new and incredibly simple and easy to understand, so it's worth looking at, say, https://github.com/minimaxir/simpleaichat/tree/main/simpleai... or https://github.com/smol-ai/developer and digging into the prompts: what goes in the system vs. assistant roles, how you guide the LLM, etc.
Seeking Your Top Recommendations for Resources on ChatGPT and Generative AI
3 projects | /r/ChatGPTPro | 28 Jun 2023

Simon Willison's Weblog

pgvector

Posts with mentions or reviews of pgvector. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2024-04-25.

Integrate txtai with Postgres
2 projects | dev.to | 25 Apr 2024

# Install Postgres and pgvector
!apt-get update && apt install -y postgresql postgresql-server-dev-14
!git clone --branch v0.6.2 https://github.com/pgvector/pgvector.git
!cd pgvector && make && make install

# Start database
!service postgresql start
!sudo -u postgres psql -U postgres -c "ALTER USER postgres PASSWORD 'pass';"
Vector Database solutions on AWS
1 project | dev.to | 28 Mar 2024

When talking about vector databases, the market offers both specialized and multi-model options; most of the major database providers, such as Oracle, PostgreSQL, and MongoDB, to mention a few, have integrated a specific solution for retrieving vector data.
Using pgvector To Locate Similarities In Enterprise Data
2 projects | dev.to | 21 Mar 2024

For this example, I wanted to focus on how pgvector – an open-source vector similarity search for Postgres – can be used to identify data similarities that exist in enterprise data.
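Similarity search in pgvector comes down to ordering rows by a distance operator: <-> (Euclidean/L2 distance), <#> (negative inner product) and <=> (cosine distance). As a rough illustration, the Python below reproduces what those operators compute and runs an exact nearest-neighbor search, which is what a query does when no index is used. The document names and 2-dimensional embeddings are hypothetical; real embeddings have hundreds or thousands of dimensions.

```python
import math
from typing import Mapping, Sequence

Vector = Sequence[float]


def _check_dims(a: Vector, b: Vector) -> None:
    if len(a) != len(b):
        raise ValueError(f"dimension mismatch: {len(a)} vs {len(b)}")


def l2_distance(a: Vector, b: Vector) -> float:
    """Euclidean distance: what pgvector's <-> operator returns."""
    _check_dims(a, b)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def negative_inner_product(a: Vector, b: Vector) -> float:
    """pgvector's <#> returns the inner product negated, so an ascending
    ORDER BY puts the largest inner product first."""
    _check_dims(a, b)
    return -sum(x * y for x, y in zip(a, b))


def cosine_distance(a: Vector, b: Vector) -> float:
    """1 - cosine similarity: what pgvector's <=> operator returns."""
    _check_dims(a, b)
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - sum(x * y for x, y in zip(a, b)) / norm


def nearest(query: Vector, rows: Mapping[str, Vector], k: int, distance=l2_distance) -> list[str]:
    """Exact k-NN, the equivalent of ORDER BY embedding <op> query LIMIT k without an index."""
    return sorted(rows, key=lambda key: distance(rows[key], query))[:k]


# Hypothetical embeddings for three enterprise documents.
documents = {
    "invoice": [1.0, 0.0],
    "receipt": [0.9, 0.1],
    "contract": [0.0, 1.0],
}
print(nearest([1.0, 0.05], documents, k=2))
print(nearest([1.0, 0.05], documents, k=2, distance=cosine_distance))
```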
pgvector vs. pgvecto.rs in 2024: A Comprehensive Comparison for Vector Search in PostgreSQL
1 project | dev.to | 19 Mar 2024

pgvector supports dense vector search well, but it has no plans to support sparse vectors.
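The dense/sparse distinction is about representation. A dense vector stores a value for every dimension, which is what pgvector's vector column type holds; a sparse vector stores only its non-zero entries, which matters for bag-of-words style embeddings with tens of thousands of dimensions. A minimal Python illustration follows; the vocabulary size and values are hypothetical.

```python
from typing import Mapping

# A sparse vector keeps only its non-zero entries: dimension index -> value.
SparseVector = Mapping[int, float]


def sparse_dot(a: SparseVector, b: SparseVector) -> float:
    """Inner product that touches only the non-zero entries."""
    if len(a) > len(b):
        a, b = b, a  # iterate over the shorter vector
    return sum(value * b[i] for i, value in a.items() if i in b)


def to_dense(v: SparseVector, dimensions: int) -> list[float]:
    """Materialize every dimension, as a dense vector column must store it."""
    dense = [0.0] * dimensions
    for i, value in v.items():
        if not 0 <= i < dimensions:
            raise IndexError(f"index {i} is outside 0..{dimensions - 1}")
        dense[i] = value
    return dense


# Hypothetical vectors over a 30,000-term vocabulary.
query = {3: 1.0, 17: 2.0, 29_999: 0.5}
doc = {17: 4.0, 29_999: 2.0}

print(sparse_dot(query, doc))
print(len(to_dense(doc, 30_000)))
```

Both forms give the same inner product; the sparse one stores 2 or 3 numbers per vector instead of 30,000.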
Pg_vectorize: The simplest way to do vector search and RAG on Postgres
6 projects | news.ycombinator.com | 6 Mar 2024

There's an issue in the pgvector repo about someone with several tables of ~10-20 million rows getting acceptable performance with the right hardware and some performance tuning: https://github.com/pgvector/pgvector/issues/455
I'm in the early stages of evaluating pgvector myself, but having used Pinecone, I currently like pgvector better because it is open source. The indexing algorithm is clear; one can understand and modify its parameters. Furthermore, the database is PostgreSQL, not a proprietary document store. When the other data in the problem is stored relationally, it is very convenient to store the vectors the same way. PostgreSQL also has good observability and metrics. When it comes to flexibility for specialized applications, I think pgvector is the clear winner. But I can definitely see Pinecone's appeal if vector search is not a core component of the problem or business, as it is very easy to use and scales very easily.
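The tunable parameters are easiest to see with pgvector's IVFFlat index, which clusters vectors into lists with k-means and, at query time, scans only the lists nearest to the query. In pgvector, lists is set when the index is created (WITH (lists = ...)) and probes with SET ivfflat.probes = ...; pgvector also offers an HNSW index, not shown here. The toy Python below is only a sketch of the IVF idea, not pgvector's C implementation: the class and function names are made up and the data is random. It compares approximate results with an exact search to show recall rising with probes.

```python
import math
import random
from typing import Sequence

Vector = Sequence[float]


def _closest(point: Vector, centroids: Sequence[Vector]) -> int:
    """Index of the centroid nearest to `point` (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda c: math.dist(point, centroids[c]))


def kmeans(vectors: Sequence[Vector], k: int, iterations: int = 10, seed: int = 0) -> list[list[float]]:
    """Plain Lloyd's k-means, seeded with k randomly chosen input vectors."""
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(list(vectors), k)]
    for _ in range(iterations):
        groups: list[list[Vector]] = [[] for _ in range(k)]
        for v in vectors:
            groups[_closest(v, centroids)].append(v)
        for c, members in enumerate(groups):
            if members:  # an empty list keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


class IVFFlatSketch:
    """Toy inverted-file index: one bucket ("list") of row ids per centroid."""

    def __init__(self, vectors: Sequence[Vector], lists: int, seed: int = 0):
        self.vectors = vectors
        self.centroids = kmeans(vectors, lists, seed=seed)
        self.buckets: list[list[int]] = [[] for _ in range(lists)]
        for row_id, v in enumerate(vectors):
            self.buckets[_closest(v, self.centroids)].append(row_id)

    def search(self, query: Vector, k: int, probes: int = 1) -> list[int]:
        # Scan only the `probes` buckets whose centroids are closest to the query.
        ranked = sorted(range(len(self.centroids)), key=lambda c: math.dist(query, self.centroids[c]))
        candidates = [i for c in ranked[:probes] for i in self.buckets[c]]
        candidates.sort(key=lambda i: math.dist(query, self.vectors[i]))
        return candidates[:k]


def exact_search(vectors: Sequence[Vector], query: Vector, k: int) -> list[int]:
    """Brute-force top-k, used as ground truth for recall."""
    return sorted(range(len(vectors)), key=lambda i: math.dist(query, vectors[i]))[:k]


rng = random.Random(42)
data = [[rng.random() for _ in range(8)] for _ in range(500)]
index = IVFFlatSketch(data, lists=10)
query = [rng.random() for _ in range(8)]
truth = set(exact_search(data, query, 10))
for probes in (1, 3, 10):
    found = set(index.search(query, 10, probes=probes))
    print(f"probes={probes}: recall {len(found & truth) / 10:.1f}")
```

With probes equal to lists every bucket is scanned, so the result matches the exact search; smaller values trade recall for speed, which is the knob the commenter refers to.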
FLaNK 04 March 2024
26 projects | dev.to | 4 Mar 2024
Vector Database and Spring IA
2 projects | dev.to | 11 Feb 2024

The Spring AI project aims to streamline the development of applications that incorporate artificial intelligence functionality without unnecessary complexity. In this example we use features such as embeddings, prompts, and ETL, and save all embeddings in pgvector (a Postgres vector database).
Use pgvector for searching images on Azure Cosmos DB for PostgreSQL
2 projects | dev.to | 7 Feb 2024

Official GitHub repository of the pgvector extension
pgvector 0.6.0: 30x faster with parallel index builds
1 project | dev.to | 31 Jan 2024

pgvector 0.6.0 was just released and will be available on Supabase projects soon. Again, a special shout out to Andrew Kane and everyone else who worked on parallel index builds.
Store embeddings in Azure Cosmos DB for PostgreSQL with pgvector
2 projects | dev.to | 29 Jan 2024

The pgvector extension adds vector similarity search capabilities to your PostgreSQL database. To use the extension, you first have to create it in your database. You can install the extension by connecting to your database and running the CREATE EXTENSION command from the psql command prompt.
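On a standard PostgreSQL server with pgvector installed, the extension is named vector, so the statement is CREATE EXTENSION vector. The Python sketch below adapts the quick-start statements from the pgvector README: the items table and 3-dimensional column are illustrative, and run_demo assumes a DB-API connection using the %s paramstyle, such as one from psycopg, supplied by the caller. Vectors are passed in pgvector's text form, for example [1.0,2.0,3.0].

```python
import math
from typing import Sequence

# Adapted from the pgvector README quick start; table and column are illustrative.
SETUP_SQL = [
    "CREATE EXTENSION IF NOT EXISTS vector",
    "CREATE TABLE IF NOT EXISTS items (id bigserial PRIMARY KEY, embedding vector(3))",
]
INSERT_SQL = "INSERT INTO items (embedding) VALUES (%s::vector)"
NEAREST_SQL = "SELECT id FROM items ORDER BY embedding <-> %s::vector LIMIT %s"


def to_vector_literal(values: Sequence[float]) -> str:
    """Format numbers in pgvector's text input form, e.g. '[1.0,2.0,3.0]'."""
    floats = [float(v) for v in values]
    if not all(math.isfinite(f) for f in floats):
        raise ValueError("pgvector does not accept NaN or infinite elements")
    return "[" + ",".join(repr(f) for f in floats) + "]"


def run_demo(conn) -> list:
    """Run the statements on an open DB-API connection with the %s paramstyle
    (an assumption: e.g. a psycopg connection created by the caller)."""
    cur = conn.cursor()
    for statement in SETUP_SQL:
        cur.execute(statement)
    for embedding in ([1, 2, 3], [4, 5, 6]):
        cur.execute(INSERT_SQL, (to_vector_literal(embedding),))
    cur.execute(NEAREST_SQL, (to_vector_literal([3, 1, 2]), 5))
    rows = cur.fetchall()
    conn.commit()
    return rows


print(to_vector_literal([1, 2, 3]))
```

Passing the vector as a bound parameter with an explicit ::vector cast avoids building SQL strings by hand.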

What are some alternatives?

When comparing simonwillisonblog and pgvector you can also consider the following projects:

pg_cjk_parser - Postgres CJK Parser. pg_cjk_parser is an FTS (full-text search) parser derived from the default parser in PostgreSQL 11. When a Postgres database uses UTF-8 encoding, this parser supports all the features of the default parser while splitting CJK (Chinese, Japanese, Korean) characters into 2-gram tokens. If the database's encoding is not UTF-8, the parser behaves just like the default parser.

Milvus - A cloud-native vector database, storage for next generation AI applications

awesome-personal-blogs - A delightful list of personal tech blogs

faiss - A library for efficient similarity search and clustering of dense vectors.

tsv-utils - eBay's TSV Utilities: Command line tools for large, tabular data files. Filtering, statistics, sampling, joins and more.

Weaviate - Weaviate is an open-source vector database that stores both objects and vectors, allowing for the combination of vector search with structured filtering with the fault tolerance and scalability of a cloud-native database.

awesome-ml - Curated list of useful LLM / Analytics / Datascience resources

Elasticsearch - Free and Open, Distributed, RESTful Search Engine

knowledge - Everything I know

qdrant - Qdrant - High-performance, massive-scale Vector Database for the next generation of AI. Also available in the cloud https://cloud.qdrant.io/

zsv - zsv+lib: tabular data swiss-army knife CLI + world's fastest (simd) CSV parser

ann-benchmarks - Benchmarks of approximate nearest neighbor libraries in Python
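The 2-gram tokenization described for pg_cjk_parser can be illustrated in a few lines. This Python is a rough sketch under stated assumptions, not the extension's code: the Unicode ranges treated as CJK are a partial list chosen for the example, a lone CJK character is kept as a single token (an assumption), and non-CJK text is simply lower-cased and split on word characters rather than run through PostgreSQL's default parser.

```python
import re
from itertools import groupby


def is_cjk(ch: str) -> bool:
    """Rough CJK test over a few common Unicode blocks (an assumption;
    pg_cjk_parser's own character classes may differ)."""
    cp = ord(ch)
    return (
        0x4E00 <= cp <= 0x9FFF      # CJK Unified Ideographs
        or 0x3400 <= cp <= 0x4DBF   # CJK Unified Ideographs Extension A
        or 0x3040 <= cp <= 0x30FF   # Hiragana and Katakana
        or 0xAC00 <= cp <= 0xD7AF   # Hangul Syllables
    )


def tokenize(text: str) -> list[str]:
    """Split CJK runs into overlapping 2-grams; lower-case other words."""
    tokens: list[str] = []
    for cjk, group in groupby(text, key=is_cjk):
        chunk = "".join(group)
        if cjk:
            if len(chunk) == 1:
                tokens.append(chunk)  # assumption: a lone character is its own token
            else:
                tokens.extend(chunk[i:i + 2] for i in range(len(chunk) - 1))
        else:
            tokens.extend(word.lower() for word in re.findall(r"\w+", chunk))
    return tokens


print(tokenize("Postgres 全文検索 test"))
```

Overlapping bigrams let a search for any two-character word match without a dictionary-based word segmenter, at the cost of a larger index.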