paradedb
hydra
| | paradedb | hydra |
|---|---|---|
| | 16 | 26 |
| Stars | 3,962 | 2,647 |
| Growth | 11.0% | 4.6% |
| Activity | 9.8 | 8.5 |
| Last commit | 4 days ago | 10 days ago |
| Language | Rust | C |
| License | GNU Affero General Public License v3.0 | Apache License 2.0 |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
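As a rough illustration of the Growth figure, the sketch below assumes it is the simple fractional change in star count between two consecutive months; the page does not state the exact formula. The function names are hypothetical, and the inputs are the star counts and growth rates from the table above.

```python
def month_over_month_growth(previous: int, current: int) -> float:
    """Fractional change in star count from one month to the next.

    Assumption: growth = (current - previous) / previous.
    """
    if previous <= 0:
        raise ValueError("previous must be positive")
    return (current - previous) / previous


def implied_previous_stars(current: int, growth: float) -> float:
    """Invert the growth formula: estimate the star count one month earlier."""
    return current / (1.0 + growth)


# Star counts and growth rates taken from the comparison table.
paradedb_prev = implied_previous_stars(3962, 0.110)
hydra_prev = implied_previous_stars(2647, 0.046)

print(f"paradedb: roughly {paradedb_prev:.0f} stars a month earlier")
print(f"hydra: roughly {hydra_prev:.0f} stars a month earlier")
```

Under this assumption, a project with fewer stars can show a higher growth percentage simply because its base is smaller, so the Growth and Stars rows are best read together.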
paradedb
- Using ClickHouse to scale an events engine
- Code Search Is Hard
  Elasticsearch is good, and it does scale, but it is much more cumbersome and expensive to scale and operate than Postgres. If you use the managed service, you'll pay for the operational pain in the form of higher pricing.
  The Postgres movement is strong and extensions like ParadeDB https://github.com/paradedb/paradedb are designed specifically to solve this pain point (Disclaimer: I work for ParadeDB)
- Ask HN: Best way to mirror a Postgres database to parquet?
  No timeline yet, but we know it's a high-priority feature and are working hard on it. Best way would be to join our Slack (link here: https://github.com/paradedb/paradedb/blob/dev/README.md) to follow along. It will be in the coming weeks/months, though.
- Transforming Postgres into a Fast OLAP Database
  You're right. We're working on this currently. You can track the issue here: https://github.com/paradedb/paradedb/issues/717
- We built our customer data warehouse all on Postgres
  There are definitely ways to cleanly make Postgres scale for analytics. We didn't discuss them in this blog post, but we will be writing about them in the future. For example, check out what the folks at ParadeDB are doing: https://github.com/paradedb/paradedb . Neon is doing an awesome job separating compute from storage. Supabase-contributed foreign data wrappers make it super easy to read from S3 into Postgres. Lots of great work going on out there :)
- Show HN: Pg_analytics – Speed Up Postgres Analytical Queries by 94x
- Multi-Database Support in DuckDB
  Check out https://github.com/paradedb/paradedb/tree/dev/pg_analytics, we're shipping this week
- ParadeDB – PostgreSQL for Search
- Postgresql index
  Shameless plug, but I'm one of the makers of `pg_bm25` (https://github.com/paradedb/paradedb). We're making a faster tsvector/ts_rank as a Postgres extension. Maybe it can help; our benchmarks show much faster performance, especially as row count increases.
- Building an open source vector database. Looking for advice.
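The `pg_bm25` mention above is named after BM25 ranking. The following is a minimal sketch of the textbook Okapi BM25 formula, not ParadeDB's implementation: the function name, the whitespace tokenizer, and the defaults `k1=1.2` and `b=0.75` are assumptions chosen for illustration.

```python
import math
from collections import Counter


def bm25_scores(query: str, docs: list[str], k1: float = 1.2, b: float = 0.75) -> list[float]:
    """Score each document against the query with textbook Okapi BM25.

    Assumptions: lowercase whitespace tokenization, and the smoothed
    IDF variant log(1 + (N - df + 0.5) / (df + 0.5)), which is never negative.
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    if n_docs == 0:
        return []
    avg_len = sum(len(tokens) for tokens in tokenized) / n_docs

    # Document frequency: in how many documents each term appears.
    doc_freq: Counter = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    query_terms = query.lower().split()
    scores = []
    for tokens in tokenized:
        term_freq = Counter(tokens)
        score = 0.0
        for term in query_terms:
            tf = term_freq.get(term, 0)
            if tf == 0:
                continue
            df = doc_freq[term]
            idf = math.log(1 + (n_docs - df + 0.5) / (df + 0.5))
            # Length normalization: longer-than-average documents are penalized.
            length_norm = 1 - b + b * len(tokens) / avg_len
            score += idf * tf * (k1 + 1) / (tf + k1 * length_norm)
        scores.append(score)
    return scores


docs = [
    "postgres full text search",
    "columnar storage for analytics",
    "full text search in postgres with bm25 ranking",
]
for doc, score in zip(docs, bm25_scores("postgres search", docs)):
    print(f"{score:.3f}  {doc}")
```

Unlike a raw term-frequency count, BM25 saturates the contribution of repeated terms and normalizes by document length, which is why a short document that matches both query terms can outrank a longer one with the same matches.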
hydra
- Using ClickHouse to scale an events engine
  Don't feel bad, lots of people get bitten by not reading all the way down to the bottom of their readme: https://github.com/hydradatabase/hydra/blob/v1.1.2/README.md... While Hydra may very well license their own code Apache 2, they ship the AGPLv3 columnar, which, to my very best IANAL understanding, taints the whole stack and AGPLv3's everything all the way through https://github.com/hydradatabase/hydra/blob/v1.1.2/columnar/...
- Moving a Billion Postgres Rows on a $100 Budget
  Columnar store PostgreSQL extensions exist. Here are two, though I think I'm missing at least one more:
  https://github.com/citusdata/cstore_fdw
  https://github.com/hydradatabase/hydra
  You can also connect other stores using foreign data wrappers, like parquet files stored on an object store, duckdb, clickhouse… though the joins aren't optimised, since PostgreSQL would do a full scan on the external table when joining.
- Hydra (YC W22) adds upsert to columnar Postgres
- Hydra
- Is ClickHouse Moving Away from Open Source?
  New column store alternative: https://github.com/hydradatabase/hydra
  HN: https://news.ycombinator.com/item?id=37571974
- Show HN: Hydra - Open-Source Columnar Postgres
  Some previous discussions:
  https://news.ycombinator.com/item?id=37247945
  https://news.ycombinator.com/item?id=36987920
  A relevant observation is that there are actually multiple license files in the repo, so the consumer should read the explicit licensing section of the readme <https://github.com/hydradatabase/hydra#license>, since the GitHub sidebar is misleading.
- CDC from postgres to postgres.
  Hydra DB (link to GitHub): worked well for aggregate query use cases but not for queries that build reports. Also, data insertion and updates are abysmal on columnar DBs.
- How Query Engines Work
  There's a lot of experience about db operation and how to approach MVCC encoded in PostgreSQL that shouldn't be underestimated.
  [0]: https://github.com/hydradatabase/hydra
- Hydra: Column-Oriented Postgres
  And just like last time, watch out for the misleading GitHub license detector: it's not entirely Apache as the GitHub summary claims. Rather, *some* is Apache, and buried in the interior is some AGPL stuff: https://github.com/hydradatabase/hydra#license
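Several of the comments above contrast columnar storage (good for aggregates, weaker for inserts and updates) with row storage. The toy sketch below illustrates that trade-off with plain Python lists and dicts; it is not how Hydra or PostgreSQL store data, and the `orders` records and all function names are hypothetical.

```python
from typing import Any

Row = dict[str, Any]
ColumnStore = dict[str, list[Any]]


def to_columnar(rows: list[Row]) -> ColumnStore:
    """Pivot row records into one list per column (assumes identical keys per row)."""
    columns: ColumnStore = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns


def total_from_rows(rows: list[Row], column: str) -> float:
    """Aggregate over a row layout: every full record is visited."""
    return sum(row[column] for row in rows)


def total_from_columns(columns: ColumnStore, column: str) -> float:
    """Aggregate over a columnar layout: only one contiguous list is read."""
    return sum(columns[column])


def insert_columnar(columns: ColumnStore, row: Row) -> int:
    """Append one record; returns how many separate column lists were written."""
    for name, value in row.items():
        columns[name].append(value)
    return len(row)


orders = [
    {"id": 1, "region": "eu", "amount": 10.0},
    {"id": 2, "region": "us", "amount": 25.5},
    {"id": 3, "region": "eu", "amount": 4.5},
]
columns = to_columnar(orders)

print(total_from_rows(orders, "amount"), total_from_columns(columns, "amount"))

# A single-record insert touches every column list in the columnar layout.
touched = insert_columnar(columns, {"id": 4, "region": "us", "amount": 7.0})
print(f"column lists written for one insert: {touched}")
```

The aggregate reads a single list in the columnar layout, while one insert has to write to every column. Real engines add compression and batching on top, but the same asymmetry is behind the insert and update complaints quoted above.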
What are some alternatives?
MeiliSearch - A lightning-fast search API that fits effortlessly into your apps, websites, and workflow
duckdb - DuckDB is an in-process SQL OLAP Database Management System
tantivy - Tantivy is a full-text search engine library inspired by Apache Lucene and written in Rust
citus - Distributed PostgreSQL as an extension
prism - Prism is the easiest way to develop, orchestrate, and execute data pipelines in Python.
ClickHouse - ClickHouse® is a free analytics DBMS for big data
retake - PostgreSQL for Search [Moved to: https://github.com/paradedb/paradedb]
postgres - PostgreSQL in Neon
bionicgpt - BionicGPT is an on-premise replacement for ChatGPT, offering the advantages of Generative AI while maintaining strict data confidentiality [Moved to: https://github.com/bionic-gpt/bionic-gpt]
Udacity-Data-Engineering-Projects - Few projects related to Data Engineering including Data Modeling, Infrastructure setup on cloud, Data Warehousing and Data Lake development.
qdrant - Qdrant - High-performance, massive-scale Vector Database for the next generation of AI. Also available in the cloud https://cloud.qdrant.io/
vasco - vasco: MIC & MINE statistics for Postgres