| | hydra | citus |
|---|---|---|
| Mentions | 14 | 61 |
| Stars | 8,229 | 9,840 |
| Growth | 1.6% | 1.2% |
| Activity | 6.3 | 9.4 |
| Last commit | 22 days ago | 10 days ago |
| Language | Python | C |
| License | MIT License | GNU Affero General Public License v3.0 |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
hydra
- Hydra – a Framework for configuring complex applications
- Show HN: Hydra - Open-Source Columnar Postgres
Nice tool, but the name is unfortunate; consider changing it. There is already a very well-known security tool named hydra, https://github.com/vanhauser-thc/thc-hydra, which has been around since 2001. Then Facebook went ahead and named their config tool hydra, https://github.com/facebookresearch/hydra, on top of that. We get it, the hydra is popular mythology, but we could use more original naming for tools.
- Show HN: Hydra 1.0 – open-source column-oriented Postgres
This looks really impressive, and I'm excited to see how it performs on our data!
P.S., I think the name conflicts with Hydra, the configuration management library: https://hydra.cc/
- Best practice for saving logits/activation values of model in PyTorch Lightning
I've been trying to learn PyTorch Lightning and Hydra in order to use/create my own custom deep learning template (e.g. like this) as it would greatly help with my research workflow. A lot of the work I do requires me to analyse metrics based on the logits/activations of the model.
- [D] Alternatives to fb Hydra?
However, Hydra seems to have several limitations that are really annoying and are making me reconsider my choice. Most problematic is the inability to group parameters together in a multirun. Hydra only supports trying all combinations of parameters, as described in https://github.com/facebookresearch/hydra/issues/1258, and fixing this does not seem to be a priority for Hydra. Furthermore, Hydra's Optuna optimizer implementation does not allow early pruning of bad runs, which, while not a deal breaker, is definitely a nice-to-have feature.
- Show HN: Lightweight YAML Config CLI for Deep Learning Projects
Do you hate the fact that they don't let you return the config file: https://github.com/facebookresearch/hydra/issues/407
- Config management for deep learning
I built this out of frustration with Hydra. Hydra is an end-to-end framework: it locks you into a certain DL project format and decides logging, model saving, and a whole host of other things. For example, Hydra can do the same config file overwriting that I allow, but you have to store the config file under the name config.yaml inside a specific folder. On top of that, Hydra doesn't let you return the config file from the main function, so you have to put all the major logic in the main function itself (link); the authors claim this is by design. I can see Hydra being useful for a mature, less experimental project. But in my robotics and ML research, I like being able to write code where I want and integrate it how I want, especially when debugging, which is where I think this package is useful. TL;DR: if you just want the config file functionality, use my package; if you want a complete DL project manager, use Hydra. While Hydra implements this config file functionality, it also adds a lot of restrictions on project structure that you might not like.
- The YAML Document from Hell
For managing configs of ML experiments (where each experiment can override a base config, and "variant" configs can further override the experiment config, etc), Hydra + Yaml + OmegaConf is really nice.
https://hydra.cc/
I admit I don't fully understand all the advanced options in Hydra, but the basic usage is already very useful. A nice guide is here:
https://florianwilhelm.info/2022/01/configuration_via_yaml_a...
- Hydra - installation and basic usage
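
Two ideas recur in the comments above: layered configs, where an experiment config overrides a base config and a variant overrides the experiment, and multirun sweeps, where Hydra is described as trying every combination of parameters rather than letting them vary together. The sketch below illustrates both in plain Python. It is not Hydra's or OmegaConf's API; the functions `merge_layers`, `grid_sweep`, and `grouped_sweep` and all config values are hypothetical names chosen for illustration.

```python
from copy import deepcopy
from itertools import product


def _merge_into(target, source):
    """Recursively copy `source` into `target`, overriding existing keys."""
    for key, value in source.items():
        if isinstance(value, dict) and isinstance(target.get(key), dict):
            _merge_into(target[key], value)
        else:
            target[key] = deepcopy(value)


def merge_layers(*layers):
    """Deep-merge dicts left to right; later layers override earlier ones."""
    merged = {}
    for layer in layers:
        _merge_into(merged, layer)
    return merged


def grid_sweep(**choices):
    """Every combination of the given values (Cartesian product)."""
    keys = list(choices)
    return [dict(zip(keys, combo)) for combo in product(*choices.values())]


def grouped_sweep(**choices):
    """Values paired position by position, so parameters vary together."""
    keys = list(choices)
    return [dict(zip(keys, combo)) for combo in zip(*choices.values())]


# Hypothetical configs, for illustration only.
base = {"model": {"name": "resnet", "depth": 18}, "optimizer": {"lr": 0.1}}
experiment = {"model": {"depth": 50}}
variant = {"optimizer": {"lr": 0.01, "momentum": 0.9}}

# base <- experiment <- variant: the rightmost layer wins on conflicts.
config = merge_layers(base, experiment, variant)

# Four runs (2 x 2) versus two runs where lr and batch_size move in step.
grid = grid_sweep(lr=[0.1, 0.01], batch_size=[32, 64])
grouped = grouped_sweep(lr=[0.1, 0.01], batch_size=[32, 64])
```

The grid version is the "all combinations" behaviour the Reddit comment attributes to Hydra's multirun; the grouped version is roughly what that commenter says is missing.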
citus
- SPQR 1.3.0: a production-ready system for horizontal scaling of PostgreSQL
- Citus: PostgreSQL extension that transforms Postgres into a distributed database
- Figma's Databases team lived to tell the scale
I see they don't mention Citus (https://github.com/citusdata/citus), which is already a fairly mature native Postgres extension. From the details given in the article, it sounds like they just reimplemented it.
I wonder if they were unaware of it or disregarded it for a reason. I am currently in a similar situation to the one described in the blog, trying to shard a massive Postgres DB.
- PostgreSQL Is Enough
It is possible, if you pay for it. You can do Multi-AZ Clustered Instances in RDS, where you get the benefits of Multi-AZ failover with traffic sharing.
If you can run your own infra – at least on an EC2 level – you can do things like Citus [0] for Postgres, which is about as close to "just add database nodes" as you'll get.
[0]: https://www.citusdata.com/
- Vitess 18
So while searching for something like this for Postgres, I came across Citus. Does anyone know how that stacks up?
https://github.com/citusdata/citus
- In-Depth Guide: Citus Technical Readme
- Revolutionizing Database Scaling with CitusDB
References: CitusDB
- Squeeze the hell out of the system you have
- Show HN: Hydra 1.0 – open-source column-oriented Postgres
- Schema-based sharding comes to PostgreSQL with Citus
What are some alternatives?
dynaconf - Configuration Management for Python ⚙
Greenplum - Greenplum Database - Massively Parallel PostgreSQL for Analytics. An open-source massively parallel data platform for analytics, machine learning and AI.
ConfigParser
yugabyte-db - YugabyteDB - the cloud native distributed SQL database for mission-critical applications.
python-dotenv - Reads key-value pairs from a .env file and can set them as environment variables. It helps in developing applications following the 12-factor principles.
vitess - Vitess is a database clustering system for horizontal scaling of MySQL.
python-decouple - Strict separation of config from code.
TimescaleDB - An open-source time-series SQL database optimized for fast ingest and complex queries. Packaged as a PostgreSQL extension.
django-environ - Django-environ allows you to utilize 12factor inspired environment variables to configure your Django application.
dbt-core - dbt enables data analysts and engineers to transform their data using the same practices that software engineers use to build applications.
classyconf - Declarative and extensible library for configuration & code separation
stolon - PostgreSQL cloud native High Availability and more.