Metric | quokka | cempaka
---|---|---
Mentions | 23 | 1
Stars | 1,084 | 3
Growth | - | -
Activity | 8.3 | 2.9
Last commit | 8 months ago | 11 months ago
Language | Python | Java
License | Apache License 2.0 | GNU Affero General Public License v3.0
Stars - the number of stars that a project has on GitHub. Growth - month-over-month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
quokka
- How Query Engines Work
An awesome read!
Something related that I found out about from HN a few months back is another engine called quokka. How quokka schedules distributed queries to outperform Spark is particularly interesting and applicable: https://github.com/marsupialtail/quokka/blob/master/blog/why...
- Quokka – Distributed Polars on Ray
- Algorithmic Trading with Go
Hi Justin, you might be interested in my blog post advocating a cloud-based approach: https://github.com/marsupialtail/quokka/blob/master/blog/bac...
You don't have to use the system I am building, but it's worth thinking about that design.
- Daft: A High-Performance Distributed Dataframe Library for Multimodal Data
SQL support is very challenging.
I work on Quokka (https://github.com/marsupialtail/quokka). I support Iceberg reads. We have recently been adding SQL support by parsing the DuckDB logical plan, though that is also very challenging.
The Python world lacks a standard plug-and-play SQL query optimizer. Apache Calcite is good for the JVM world, but not great if you are trying to cut out the JVM.
- Why your dataframe library needs to understand vector embeddings
- The Inner Workings of Distributed Databases
In case people are interested, I wrote a post about fault tolerance strategies of data systems like Spark and Flink: https://github.com/marsupialtail/quokka/blob/master/blog/fau...
The key difference here is that these systems don't store data, so fault tolerance means recovering within a query instead of not losing data.
- Launch HN: DAGWorks – ML platform for data science teams
Would love to collaborate on an integration with pyquokka (https://github.com/marsupialtail/quokka) once I put out a stable release at the end of this month :-)
- Is Spark always your go-to solution?
Then you should keep an eye on quokka. This may become the "Spark" for Polars/DuckDB. It seems to be under active development though I'm not sure how stable it is.
- Distributed fault tolerance made simple
- Fault tolerance for distributed data systems is quite simple
cempaka
- Algorithmic Trading with Go
While I was laid off and looking for work, I connected with a cryptocurrency market making firm that had access to a private broker feed which is not attached to any matching, so it would occasionally go crossed and offer a pure arb opportunity for the same pairs. I had done some algo trading with Java in KOSPI 200 options in 2011-2012, so I decided to put together a simple bot for them to try to grab the crossed markets when they occur. Even an incredibly simple trade like this requires quite a lot of work to get the risk management in place. I also took it as a chance to catch up on new Java features, since I had been out of that ecosystem for a while.
It did successfully grab the arbs but there wasn't enough juice to justify more work on it and I got a job in the meantime, so I open sourced the whole thing: https://github.com/abissell/cempaka
What are some alternatives?
opteryx - 🦖 A SQL-on-everything Query Engine you can execute over multiple databases and file formats. Query your data, where it lives.
intelligent-trading-bot - Intelligent Trading Bot: Automatically generating signals and trading based on machine learning and feature engineering
awesome-pipeline - A curated list of awesome pipeline toolkits inspired by Awesome Sysadmin
go-talib - A pure Go port of TA-Lib (http://ta-lib.org)
spyql - Query data on the command line with SQL-like SELECTs powered by Python expressions
GoIB - Pure Go interface to Interactive Brokers IB API
pg8000 - A Pure-Python PostgreSQL Driver
blog - Some notes on things I find interesting and important.
sqlglot - Python SQL Parser and Transpiler
hamilton - Hamilton helps data scientists and engineers define testable, modular, self-documenting dataflows, that encode lineage and metadata. Runs and scales everywhere python does.
bytewax - Python Stream Processing