| | hydra | quokka |
|---|---|---|
| Mentions | 27 | 23 |
| Stars | 2,684 | 1,091 |
| Growth | 2.4% | - |
| Activity | 8.5 | 8.3 |
| Latest commit | 26 days ago | 9 months ago |
| Language | C | Python |
| License | Apache License 2.0 | Apache License 2.0 |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
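As a rough illustration of the Growth figure, month-over-month growth is conventionally the relative change between two consecutive monthly star counts. The sketch below assumes that convention; the function name and the previous-month count of 2,621 are hypothetical, chosen only for illustration, and the site's exact calculation may differ.

```python
def month_over_month_growth(current: int, previous: int) -> float:
    """Return the percentage change from `previous` to `current`."""
    if previous <= 0:
        raise ValueError("previous must be positive")
    return (current - previous) / previous * 100.0


# 2,684 is hydra's star count from the table above.
# 2,621 is a made-up previous-month count, used only as an example.
growth = month_over_month_growth(2684, 2621)
print(f"{growth:.1f}%")
```

A project with no growth figure (shown as "-") presumably lacks a usable previous-month count, which is why the sketch rejects a non-positive `previous` instead of dividing by it.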
hydra
- Pg_lakehouse: Query Any Data Lake from Postgres
How does this compare to Hydra? https://www.hydra.so/
- Using ClickHouse to scale an events engine
Don't feel bad, lots of people get bitten by not reading all the way down to the bottom of their readme: https://github.com/hydradatabase/hydra/blob/v1.1.2/README.md... While Hydra may very well license their own code Apache 2, they ship the AGPLv3 columnar which to my very best IANAL understanding taints the whole stack and AGPLv3's everything all the way through https://github.com/hydradatabase/hydra/blob/v1.1.2/columnar/...
- Moving a Billion Postgres Rows on a $100 Budget
Columnar store PostgreSQL extensions exist; here are two, though I think I'm missing at least one more:
https://github.com/citusdata/cstore_fdw
https://github.com/hydradatabase/hydra
You can also connect other stores using foreign data wrappers, such as Parquet files on an object store, DuckDB, or ClickHouse, though the joins aren't optimised, because PostgreSQL would do a full scan of the external table when joining.
- Hydra (YC W22) adds upsert to columnar Postgres
- Hydra
- Is ClickHouse Moving Away from Open Source?
New column store alternative: https://github.com/hydradatabase/hydra
HN: https://news.ycombinator.com/item?id=37571974
- Show HN: Hydra - Open-Source Columnar Postgres
some previous discussions:
https://news.ycombinator.com/item?id=37247945
https://news.ycombinator.com/item?id=36987920
and a relevant observation is that there are actually multiple license files in the repo so the consumer should read their explicit licensing section of the readme <https://github.com/hydradatabase/hydra#license> since the GitHub sidebar is misleading
- CDC from postgres to postgres.
Hydra DB: worked well for aggregate-query use cases but not for queries that build reports. Also, data insertion and updates are abysmal on columnar DBs.
- How Query Engines Work
There's a lot of experience about db operation and how to approach MVCC encoded in PostgreSQL that shouldn't be underestimated.
[0]: https://github.com/hydradatabase/hydra
- Hydra: Column-Oriented Postgres
quokka
- How Query Engines Work
An awesome read!
Something related that I found out about from HN a few months back is another engine called quokka. It's particularly interesting and applicable how quokka schedules distributed queries to outperform Spark https://github.com/marsupialtail/quokka/blob/master/blog/why...
- Quokka – Distributed Polars on Ray
- Algorithmic Trading with Go
Hi Justin, you might be interested in my blog: https://github.com/marsupialtail/quokka/blob/master/blog/bac... advocating a cloud based approach.
You don't have to use the system I am building, but it's worth thinking about that design.
- Daft: A High-Performance Distributed Dataframe Library for Multimodal Data
SQL support is very challenging.
I work on Quokka (https://github.com/marsupialtail/quokka). I support Iceberg reads. Recently we have been adding SQL support by parsing the DuckDB logical plan, though that is very challenging as well.
The Python world lacks a standard for a plug and play SQL query optimizer. Apache Calcite is good for the JVM world, but not great if you are trying to cut out the JVM.
- Why your dataframe library needs to understand vector embeddings
- The Inner Workings of Distributed Databases
In case people are interested, I wrote a post about fault tolerance strategies of data systems like Spark and Flink: https://github.com/marsupialtail/quokka/blob/master/blog/fau...
The key difference here is that these systems don't store data, so fault tolerance means recovering within a query instead of not losing data.
- Launch HN: DAGWorks – ML platform for data science teams
would love to collaborate on an integration with pyquokka (https://github.com/marsupialtail/quokka) once I put out a stable release end of this month :-)
- Is Spark always your go-to solution?
Then you should keep an eye on quokka. This may become the "Spark" for Polars/DuckDB. It seems to be under active development though I'm not sure how stable it is.
- Distributed fault tolerance made simple
- Fault tolerance for distributed data systems is quite simple
What are some alternatives?
duckdb - DuckDB is an analytical in-process SQL database management system
opteryx - 🦖 A SQL-on-everything Query Engine you can execute over multiple databases and file formats. Query your data, where it lives.
citus - Distributed PostgreSQL as an extension
cempaka - "Write a trading bot which buys low and sells high." Sounds simple enough, right?
ClickHouse - ClickHouse® is a real-time analytics DBMS
awesome-pipeline - A curated list of awesome pipeline toolkits inspired by Awesome Sysadmin
postgres - PostgreSQL in Neon
spyql - Query data on the command line with SQL-like SELECTs powered by Python expressions
Udacity-Data-Engineering-Projects - Few projects related to Data Engineering including Data Modeling, Infrastructure setup on cloud, Data Warehousing and Data Lake development.
pg8000 - A Pure-Python PostgreSQL Driver
vasco - vasco: Discover hidden patterns in your Postgres data
blog - Some notes on things I find interesting and important.