incubation-engineering
mgbench
incubation-engineering | mgbench | |
---|---|---|
18 | 1 | |
- | 17 | |
- | - | |
- | 10.0 | |
- | almost 2 years ago | |
Python | ||
- | MIT License |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
incubation-engineering
-
Why Postgres RDS didn't work for us
However if you really want to optimize data currently residing in Postgres for analytical workloads, as the original comment suggests - consider moving to a dedicated OLAP DB like ClickHouse.
See results from Gitlab benchmarking ClickHouse vs TimescaleDB: https://gitlab.com/gitlab-org/incubation-engineering/apm/apm...
Key findings:
-
Automating Your Homelab with Proxmox, Cloud-init, Terraform, and Ansible
ansible: stage: configure image: alpine rules: - if: $ANSIBLE_SETUP_VM != "" && $ANSIBLE_SETUP_HOST != "" variables: ANSIBLE_HOST_KEY_CHECKING: "False" script: - apk add curl bash openssh python3 py3-pip - pip3 install ansible paramiko - ansible-galaxy collection install -r ansible/requirements.yml - curl --silent "https://gitlab.com/gitlab-org/incubation-engineering/mobile-devops/download-secure-files/-/raw/main/installer" | bash - mkdir /root/.ssh && cp .secure_files/ansible.priv /root/.ssh/id_rsa && chmod 600 /root/.ssh/id_rsa - ansible-playbook ansible/main.yml -i ansible/inventory --extra-vars vyos_host=$ANSIBLE_SETUP_VM --limit $ANSIBLE_SETUP_HOST,$ANSIBLE_SETUP_VM ```
-
Float Compression 3: Filters
Interesting to match with the observations from the practice of using ClickHouse[1][2] for time series:
1. Reordering to SOA helps a lot - this is the whole point of column-oriented databases.
2. Specialized codecs like Gorilla[3], DoubleDelta[4], and FPC[5] lose to simply using ZSTD[6] compression in most cases, both in compression ratio and in performance.
3. Specialized time-series DBMS like InfluxDB or TimescaleDB lose to general-purpose relational OLAP DBMS like ClickHouse [7][8][9].
[1] https://clickhouse.com/blog/optimize-clickhouse-codecs-compr...
[2] https://github.com/ClickHouse/ClickHouse
[3] https://clickhouse.com/docs/en/sql-reference/statements/crea...
[4] https://clickhouse.com/docs/en/sql-reference/statements/crea...
[5] https://clickhouse.com/docs/en/sql-reference/statements/crea...
[6] https://github.com/facebook/zstd/
[7] https://arxiv.org/pdf/2204.09795.pdf "SciTS: A Benchmark for Time-Series Databases in Scientific Experiments and Industrial Internet of Things" (2022)
[8] https://gitlab.com/gitlab-org/incubation-engineering/apm/apm... https://gitlab.com/gitlab-org/incubation-engineering/apm/apm...
[9] https://www.sciencedirect.com/science/article/pii/S187705091...
- ClickHouse Cloud is now in Public Beta
-
Dokter 1.4.0 released
Documentation of rules is now available: https://gitlab.com/gitlab-org/incubation-engineering/ai-assist/dokter/-/blob/main/docs/overview.md
- Dokter: the doctor for your Dockerfiles
mgbench
What are some alternatives?
hadolint - Dockerfile linter, validate inline bash, written in Haskell
ClickBench - ClickBench: a Benchmark For Analytical Databases
ploomber - The fastest ⚡️ way to build data pipelines. Develop iteratively, deploy anywhere. ☁️
orchest - Build data pipelines, the easy way 🛠️
v4
databooks - A CLI tool to reduce the friction between data scientists by reducing git conflicts removing notebook metadata and gracefully resolving git conflicts.
clickhouse-operator - Altinity Kubernetes Operator for ClickHouse creates, configures and manages ClickHouse clusters running on Kubernetes
hosts - 🔒 Consolidating and extending hosts files from several well-curated sources. Optionally pick extensions for porn, social media, and other categories.
duckdb - DuckDB is an in-process SQL OLAP Database Management System
spyql - Query data on the command line with SQL-like SELECTs powered by Python expressions
clickhouse-sink-connector - Replicate data from MySQL, Postgres and MongoDB to ClickHouse