telekinesis
llm-cluster
| | telekinesis | llm-cluster |
|---|---|---|
| Mentions | 12 | 3 |
| Stars | 16 | 58 |
| Growth | - | - |
| Activity | 5.6 | 4.9 |
| Latest commit | 29 days ago | 2 months ago |
| Language | Python | Python |
| License | MIT License | Apache License 2.0 |
Stars: the number of stars that a project has on GitHub. Growth: month-over-month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
telekinesis
-
Show HN: Sort and Filter Ask HN Who's Hiring by LLM-Embedding Proximity
https://payperrun.com/%3E/search?displayParams={%22q%22:%22S...
(There are quite a few, you might want to filter by date!)
-
Ask HN: Who is hiring? (November 2023)
Hey everyone, I just made this thread easier to search through here:
https://payperrun.com/%3E/search?displayParams={%22q%22:%22D...
It uses LLM embeddings to sort posts by semantic proximity, but you can also filter out posts with comma-separated values like this:
-
Ask HN: What do you regret doing or not doing in your 30s?
https://news.ycombinator.com/item?id=33118584
[Shameless plug: I found all these on my LLM-embedding-based search engine, which I launched today: https://payperrun.com/%3E/search?displayParams={%22q%22:%22A...
It's much better than HN's default search: https://hn.algolia.com/?q=Ask+HN%3A+What+do+you+regret+doing... ]
-
My thoughts on starting an online business as someone who's never done it before
https://payperrun.com/%3E/search?displayParams={%22q%22:%22A...
-
We should promote more personal indexing, rather than algorithmic indexing
There have been a few attempts at a crowdsourced-rank search engine (which is similar to what you're suggesting: people indexing the content), but it seems to be a tough cookie; most of the examples of similar ideas I could find on Product Hunt or Show HN seem dead:
https://payperrun.com/%3E/search?displayParams={%22q%22:%22c...
(BTW, I just launched this LLM-embedding-based search service, which lets you check whether a startup idea has already been tried or has failed.)
I don't know if this idea has a higher death rate than the baseline, but my guess is Google/PageRank is good enough for most use-cases, and then if you want quality sources, you can just follow them on YouTube, Twitter, Instagram, etc. Wait, maybe I shouldn't try to compete with Google?
-
Show HN: An Embedding-Based Search Service over ShowHN, AskHN, GitHub, More
I like the section on how it works: https://payperrun.com/%3E/search?display=How%20this%20servic...
The vector search is using https://lancedb.com/ and OpenAI embeddings.
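At its core, a vector search like the one described ranks stored embeddings by how close they are to the query's embedding. The sketch below is a brute-force version of that idea using cosine similarity; it is not payperrun's or LanceDB's code. A vector database such as LanceDB adds an index so it does not have to score every row, which this sketch skips. The document IDs and the toy 3-dimensional vectors are hypothetical stand-ins for real OpenAI embeddings, which have far more dimensions.

```python
import math

def cosine(a, b):
    """Cosine similarity: dot product divided by the product of the vector lengths."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.hypot(*a) * math.hypot(*b)
    return dot / norm if norm else 0.0  # treat zero vectors as unrelated

def top_k(query_vec, docs, k=2):
    """Brute-force nearest neighbours: score every document, keep the best k."""
    ranked = sorted(docs.items(), key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [doc_id for doc_id, _ in ranked[:k]]

# Hypothetical documents with toy embeddings (real ones come from an embedding model).
docs = {
    "show-hn-sdk": [0.9, 0.1, 0.0],
    "ask-hn-hiring": [0.1, 0.9, 0.2],
    "show-hn-search": [0.7, 0.0, 0.6],
}
print(top_k([1.0, 0.0, 0.0], docs))  # → ['show-hn-sdk', 'show-hn-search']
```

Sorting posts "by semantic proximity" is the same computation with `k` set to the number of posts.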
-
Embeddings: What they are and why they matter
Behaves as I expected now!
I went here looking for more info about payperrun (https://payperrun.com/%3E/welcome), clicked on the "Spotlight" section, and saw 4 popups blocked. I never see popups anywhere these days, and I have to admit that sends me away pretty quickly.
- Show HN: Payperrun.com – A New Way to Monetize Your Code
- telekinesis: Just-in-time SDKs
- Show HN: Just-in-Time SDKs
llm-cluster
-
Embeddings: What they are and why they matter
I'm trying to understand the clustering code but not doing too well.
https://github.com/simonw/llm-cluster/blob/main/llm_cluster....
So does this take each row from the DB, convert it to a NumPy array (?), then use an existing model called MiniBatchKMeans (?) to go over that array and generate a bunch of labels, then add those to a dictionary and print it to the console?
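The pipeline described in that question (rows → vectors → cluster labels → dictionary of label to IDs → print) can be sketched without any libraries. This is a from-scratch mini-batch k-means written for illustration, not llm-cluster's actual code, which hands the clustering to scikit-learn's `MiniBatchKMeans`. Mini-batch k-means updates the cluster centers from small random batches instead of the full dataset on every pass, which is why it scales to large embedding tables. The row IDs and 2-dimensional vectors below are hypothetical.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def minibatch_kmeans(vectors, k, batch_size=4, iters=50, seed=0):
    """Mini-batch k-means: update centers from small random batches, return one label per vector."""
    rng = random.Random(seed)
    centers = [list(v) for v in rng.sample(vectors, k)]  # start from k random points
    counts = [0] * k                                      # points absorbed by each center so far

    def nearest(v):
        return min(range(k), key=lambda c: sq_dist(v, centers[c]))

    for _ in range(iters):
        batch = [rng.choice(vectors) for _ in range(batch_size)]
        assigned = [(v, nearest(v)) for v in batch]  # assign the whole batch first
        for v, c in assigned:
            counts[c] += 1
            eta = 1.0 / counts[c]  # per-center step size shrinks as the center sees more points
            centers[c] = [(1 - eta) * a + eta * b for a, b in zip(centers[c], v)]
    return [nearest(v) for v in vectors]  # final labels against the learned centers

# Hypothetical rows as they might come out of an embeddings table: (id, vector).
rows = [
    ("post-1", [0.0, 0.1]), ("post-2", [0.1, 0.0]), ("post-3", [0.05, 0.05]),
    ("post-4", [5.0, 5.1]), ("post-5", [5.1, 5.0]), ("post-6", [4.95, 5.05]),
]
vectors = [vec for _, vec in rows]       # "convert to an array": keep just the vectors
labels = minibatch_kmeans(vectors, k=2)  # one cluster label per row

clusters = {}                            # label -> list of row IDs
for (row_id, _), label in zip(rows, labels):
    clusters.setdefault(label, []).append(row_id)
print(clusters)
```

With scikit-learn, the clustering step becomes `MiniBatchKMeans(n_clusters=2).fit(X)` followed by reading the fitted model's `labels_` attribute; the grouping into a dictionary stays the same.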
-
LLM now provides tools for working with embeddings
I imagine there are all kinds of improvements that could be made to this kind of thing.
I'd love to understand if there's a good way to automatically pick an interesting number of clusters, as opposed to picking a number at the start.
https://github.com/simonw/llm-cluster/blob/main/llm_cluster....
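One common way to pick the number of clusters automatically is to try several values of k and keep the one with the highest mean silhouette coefficient, s = (b - a) / max(a, b), where a is a point's mean distance to its own cluster and b is its mean distance to the nearest other cluster. The sketch below assumes plain Lloyd's k-means with a deterministic farthest-point start (swapped in for mini-batch k-means to keep the result reproducible); the helper names and toy points are hypothetical, and this is not something llm-cluster does.

```python
import math

def kmeans(points, k, iters=20):
    """Lloyd's k-means with deterministic farthest-point initialisation."""
    centers = [points[0]]
    while len(centers) < k:
        # Next center: the point farthest from every center chosen so far.
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))

    def assign():
        return [min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points]

    for _ in range(iters):
        labels = assign()
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster empties out
                centers[c] = [sum(xs) / len(members) for xs in zip(*members)]
    return assign()

def silhouette(points, labels):
    """Mean silhouette coefficient in [-1, 1]; higher means tighter, better-separated clusters."""
    scores = []
    for i, p in enumerate(points):
        same = [math.dist(p, q) for j, q in enumerate(points) if j != i and labels[j] == labels[i]]
        if not same:  # singleton cluster: silhouette is defined as 0
            scores.append(0.0)
            continue
        a = sum(same) / len(same)  # mean distance within own cluster
        b = min(  # mean distance to the nearest other cluster
            sum(math.dist(p, q) for q, lab in zip(points, labels) if lab == other) / labels.count(other)
            for other in set(labels) - {labels[i]}
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom else 0.0)
    return sum(scores) / len(scores)

def pick_k(points, k_values):
    """Try each k and keep the one with the highest mean silhouette."""
    best_k, best_score = None, -math.inf
    for k in k_values:
        labels = kmeans(points, k)
        if len(set(labels)) < 2:  # silhouette needs at least two clusters
            continue
        score = silhouette(points, labels)
        if score > best_score:
            best_k, best_score = k, score
    return best_k

# Three well-separated toy blobs (hypothetical data standing in for embeddings).
points = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10], [20, 0], [20, 1], [21, 0]]
print(pick_k(points, range(2, 6)))  # → 3
```

Silhouette is only one heuristic; it costs O(n²) distance computations, so on a large collection you would score a sample. DP-means (listed among the alternatives) takes a different route: it starts a new cluster whenever a point is farther than a penalty threshold from every existing center, so you choose the threshold instead of k.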
What are some alternatives?
chasr-server - End-To-End Encrypted GPS Tracking Service
roadmap - This is the public roadmap for Salesforce Heroku services.
terra.py - Python SDK for Terra
DBoW2 - Enhanced hierarchical bag-of-word library for C++
pyxet - Python SDK for XetHub
datasette-faiss - Maintain a FAISS index for specified Datasette tables
DP_means - Dirichlet Process K-means
bert - TensorFlow code and pre-trained models for BERT
marqo - Unified embedding generation and search engine. Also available on cloud - cloud.marqo.ai
vectordb - A minimal Python package for storing and retrieving text using chunking, embeddings, and vector search.