| | hyperDB | quokka |
|---|---|---|
| | 8 | 23 |
| Stars | 1,342 | 1,082 |
| Growth | - | - |
| Activity | 6.5 | 8.3 |
| | 9 months ago | 8 months ago |
| Language | Python | Python |
| License | MIT License | Apache License 2.0 |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
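The arithmetic behind these two metrics can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, the growth formula is the usual month-over-month percentage change, and the linear mapping from activity score to "top N%" is an assumption extrapolated from the single 9.0 example above, not the site's documented method.

```python
def month_over_month_growth(stars_now: int, stars_last_month: int) -> float:
    """Percentage change in stars relative to last month's count."""
    if stars_last_month <= 0:
        raise ValueError("a positive baseline is needed to compute growth")
    return (stars_now - stars_last_month) * 100.0 / stars_last_month


def top_percent_for_activity(activity: float) -> float:
    """ASSUMPTION: activity is a 0-10 score read linearly, so 9.0 means top 10%."""
    if not 0.0 <= activity <= 10.0:
        raise ValueError("activity is expected to lie between 0 and 10")
    return (10.0 - activity) * 10.0


if __name__ == "__main__":
    # Hypothetical star counts, used only to exercise the formula.
    print(month_over_month_growth(stars_now=110, stars_last_month=100))
    # Activity scores taken from the comparison table.
    for name, score in [("hyperDB", 6.5), ("quokka", 8.3)]:
        print(name, round(top_percent_for_activity(score), 1))
```

Under that assumed linear reading, a higher activity score simply places a project in a smaller top slice of the tracked projects.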
hyperDB
- Why your dataframe library needs to understand vector embeddings
Open for rebuttals from vector database vendors, especially this one: https://github.com/jdagdelen/hyperDB
- HyperDB: A hyper-fast local vector database for use with LLM Agents.
- Ask HN: Seeking a Vector Database for ClickHouse Users – Suggestions Appreciated
I've been using https://github.com/jdagdelen/hyperDB and it's been really easy to use. I think ClickHouse support is on the short-term roadmap.
- A hyper-fast local vector database for use with LLM Agents. Now accepting SAFEs at $35M cap.
- A hyper-fast local vector database for use with LLM Agents
https://github.com/jdagdelen/hyperDB/blob/main/hyperdb/galax...
This was a great laugh. Also, after the benchmark, it says:
> Benchmark Credit: Benim Kıçım
"Benim Kıçım" means "my ass" in Turkish.
quokka
- How Query Engines Work
An awesome read!
Something related that I found out about from HN a few months back is another engine called quokka. What's particularly interesting is how quokka schedules distributed queries to outperform Spark: https://github.com/marsupialtail/quokka/blob/master/blog/why...
- Quokka – Distributed Polars on Ray
- Algorithmic Trading with Go
Hi Justin, you might be interested in my blog post advocating a cloud-based approach: https://github.com/marsupialtail/quokka/blob/master/blog/bac...
You don't have to use the system I am building, but it's worth thinking about that design.
- Daft: A High-Performance Distributed Dataframe Library for Multimodal Data
SQL support is very challenging.
I work on Quokka (https://github.com/marsupialtail/quokka). It supports Iceberg reads. We have recently been adding SQL support by parsing the DuckDB logical plan, though that is very challenging as well.
The Python world lacks a standard for a plug and play SQL query optimizer. Apache Calcite is good for the JVM world, but not great if you are trying to cut out the JVM.
- Why your dataframe library needs to understand vector embeddings
- The Inner Workings of Distributed Databases
In case people are interested, I wrote a post about fault tolerance strategies of data systems like Spark and Flink: https://github.com/marsupialtail/quokka/blob/master/blog/fau...
The key difference here is that these systems don't store data, so fault tolerance means recovering within a running query rather than preventing data loss.
- Launch HN: DAGWorks – ML platform for data science teams
Would love to collaborate on an integration with pyquokka (https://github.com/marsupialtail/quokka) once I put out a stable release at the end of this month :-)
- is spark always your go to solution ?
Then you should keep an eye on quokka. This may become the "Spark" for Polars/DuckDB. It seems to be under active development though I'm not sure how stable it is.
- Distributed fault tolerance made simple
- Fault tolerance for distributed data systems is quite simple
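One of the comments above describes fault tolerance in query engines as recovering within a query rather than protecting stored data. A toy, single-process sketch of that idea follows. Every name in it (`run_query`, `FlakyStage`, and so on) is hypothetical, and it is not how Spark, Flink, or quokka actually implement recovery. It only shows a failed stage being replayed from its retained input instead of restarting the whole query.

```python
from typing import Callable, List

Stage = Callable[[List[int]], List[int]]


def run_query(source: List[int], stages: List[Stage], max_retries: int = 3) -> List[int]:
    """Run stages in order; if one fails, replay only that stage from its input."""
    data = source
    for stage in stages:
        for attempt in range(max_retries + 1):
            try:
                # The input of the stage is still held in `data`, so a retry
                # does not need to recompute any earlier stage.
                data = stage(data)
                break
            except RuntimeError:
                if attempt == max_retries:
                    raise  # give up: the whole query fails
    return data


class FlakyStage:
    """Wraps a stage and fails its first `failures` calls to simulate a worker crash."""

    def __init__(self, fn: Stage, failures: int) -> None:
        self.fn = fn
        self.failures = failures
        self.calls = 0

    def __call__(self, rows: List[int]) -> List[int]:
        self.calls += 1
        if self.calls <= self.failures:
            raise RuntimeError("simulated worker failure")
        return self.fn(rows)


def double(rows: List[int]) -> List[int]:
    return [r * 2 for r in rows]


def add_one(rows: List[int]) -> List[int]:
    return [r + 1 for r in rows]


if __name__ == "__main__":
    flaky = FlakyStage(add_one, failures=1)
    print(run_query([1, 2, 3], [double, flaky]))
    print("calls to flaky stage:", flaky.calls)
```

Nothing is persisted here: if the process itself dies, the query is simply rerun, which is the sense in which such systems recover a computation rather than guard stored data.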
What are some alternatives?
qdrant - Qdrant - High-performance, massive-scale Vector Database for the next generation of AI. Also available in the cloud https://cloud.qdrant.io/
opteryx - 🦖 A SQL-on-everything Query Engine you can execute over multiple databases and file formats. Query your data, where it lives.
cempaka - "Write a trading bot which buys low and sells high." Sounds simple enough, right?
awesome-pipeline - A curated list of awesome pipeline toolkits inspired by Awesome Sysadmin
spyql - Query data on the command line with SQL-like SELECTs powered by Python expressions
pg8000 - A Pure-Python PostgreSQL Driver
blog - Some notes on things I find interesting and important.
sqlglot - Python SQL Parser and Transpiler
hamilton - Hamilton helps data scientists and engineers define testable, modular, self-documenting dataflows, that encode lineage and metadata. Runs and scales everywhere python does.
bytewax - Python Stream Processing
go-talib - A pure Go port of TA-Lib (http://ta-lib.org)
absurd-django