incubation-engineering
databooks
incubation-engineering | databooks | |
---|---|---|
18 | 2 | |
- | 103 | |
- | 1.0% | |
- | 6.6 | |
- | 7 months ago | |
Python | ||
- | MIT License |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
incubation-engineering
-
Why Postgres RDS didn't work for us
However if you really want to optimize data currently residing in Postgres for analytical workloads, as the original comment suggests - consider moving to a dedicated OLAP DB like ClickHouse.
See results from Gitlab benchmarking ClickHouse vs TimescaleDB: https://gitlab.com/gitlab-org/incubation-engineering/apm/apm...
Key findings:
-
Automating Your Homelab with Proxmox, Cloud-init, Terraform, and Ansible
ansible: stage: configure image: alpine rules: - if: $ANSIBLE_SETUP_VM != "" && $ANSIBLE_SETUP_HOST != "" variables: ANSIBLE_HOST_KEY_CHECKING: "False" script: - apk add curl bash openssh python3 py3-pip - pip3 install ansible paramiko - ansible-galaxy collection install -r ansible/requirements.yml - curl --silent "https://gitlab.com/gitlab-org/incubation-engineering/mobile-devops/download-secure-files/-/raw/main/installer" | bash - mkdir /root/.ssh && cp .secure_files/ansible.priv /root/.ssh/id_rsa && chmod 600 /root/.ssh/id_rsa - ansible-playbook ansible/main.yml -i ansible/inventory --extra-vars vyos_host=$ANSIBLE_SETUP_VM --limit $ANSIBLE_SETUP_HOST,$ANSIBLE_SETUP_VM ```
-
Float Compression 3: Filters
Interesting to match with the observations from the practice of using ClickHouse[1][2] for time series:
1. Reordering to SOA helps a lot - this is the whole point of column-oriented databases.
2. Specialized codecs like Gorilla[3], DoubleDelta[4], and FPC[5] lose to simply using ZSTD[6] compression in most cases, both in compression ratio and in performance.
3. Specialized time-series DBMS like InfluxDB or TimescaleDB lose to general-purpose relational OLAP DBMS like ClickHouse [7][8][9].
[1] https://clickhouse.com/blog/optimize-clickhouse-codecs-compr...
[2] https://github.com/ClickHouse/ClickHouse
[3] https://clickhouse.com/docs/en/sql-reference/statements/crea...
[4] https://clickhouse.com/docs/en/sql-reference/statements/crea...
[5] https://clickhouse.com/docs/en/sql-reference/statements/crea...
[6] https://github.com/facebook/zstd/
[7] https://arxiv.org/pdf/2204.09795.pdf "SciTS: A Benchmark for Time-Series Databases in Scientific Experiments and Industrial Internet of Things" (2022)
[8] https://gitlab.com/gitlab-org/incubation-engineering/apm/apm... https://gitlab.com/gitlab-org/incubation-engineering/apm/apm...
[9] https://www.sciencedirect.com/science/article/pii/S187705091...
- ClickHouse Cloud is now in Public Beta
-
Dokter 1.4.0 released
Documentation of rules is now available: https://gitlab.com/gitlab-org/incubation-engineering/ai-assist/dokter/-/blob/main/docs/overview.md
- Dokter: the doctor for your Dockerfiles
databooks
-
Show HN: Arguably – The best Python CLI library, arguably
I spent the past few weeks working on `arguably`.
Other CLI libraries like `click` and `typer` are great, but I wanted to make one that “disappears” instead of making you put `@click.option` or `typer.Option` everywhere (as happened [here](https://github.com/datarootsio/databooks/blob/39badd2c9cbdfa...)). For most cases, you decorate a function with `@arguably.command`, and it just does what you'd expect:
* Positional args for the function become positional CLI args
-
MLFlow users, what would you want from an integration with GitLab?
If you're working on diffs for Jupyter Notebooks, it's worth looking into this: https://github.com/datarootsio/databooks
What are some alternatives?
hadolint - Dockerfile linter, validate inline bash, written in Haskell
orchest - Build data pipelines, the easy way 🛠️
ploomber - The fastest ⚡️ way to build data pipelines. Develop iteratively, deploy anywhere. ☁️
erudito - Erudito: Easy API/CLI to ask questions about your documentation
rich-argparse - A rich help formatter for argparse
v4
pyintelowl - Robust Python SDK and Command Line Client for interacting with IntelOwl's API.
ClickBench - ClickBench: a Benchmark For Analytical Databases
meta-spy - 👾 CLI MetaSpy (Facebook, Instagram) scraper and crawler - instagram account, facebook accounts, pages and search
clickhouse-operator - Altinity Kubernetes Operator for ClickHouse creates, configures and manages ClickHouse clusters running on Kubernetes