dbd
sqlmesh
| | dbd | sqlmesh |
|---|---|---|
| Mentions | 4 | 12 |
| Stars | 55 | 1,281 |
| Growth | - | 7.7% |
| Activity | 0.0 | 9.9 |
| Last commit | about 2 years ago | 4 days ago |
| Language | Python | Python |
| License | BSD 3-clause "New" or "Revised" License | Apache License 2.0 |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
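The page does not publish the formulas behind these numbers, so the following is only a rough sketch of how such metrics could be computed. The exponential decay, the 30-day half-life, and all function names are assumptions made for illustration, not the site's actual method.

```python
from datetime import date


def month_over_month_growth(stars_now: int, stars_month_ago: int) -> float:
    """Percentage change in stars relative to one month earlier."""
    if stars_month_ago <= 0:
        raise ValueError("need a positive baseline star count")
    return 100.0 * (stars_now - stars_month_ago) / stars_month_ago


def recency_weighted_commits(commit_dates, today, half_life_days=30.0):
    """Sum of commit weights; a commit's weight halves every half_life_days.

    The half-life is a hypothetical parameter chosen for this sketch.
    """
    return sum(0.5 ** ((today - d).days / half_life_days) for d in commit_dates)


def activity_scores(raw_scores):
    """Map raw scores to a 0-10 scale by percentile rank among tracked projects.

    A score of 9.0 means 90% of the tracked projects have a lower raw score,
    i.e. the project is in the top 10%.
    """
    n = len(raw_scores)
    result = {}
    for name, score in raw_scores.items():
        below = sum(1 for other in raw_scores.values() if other < score)
        result[name] = round(10.0 * below / n, 1)
    return result
```

Under these assumptions, a project with no recent commits gets a raw score close to zero and lands at the bottom of the ranking, which would be consistent with an activity of 0.0 for a repository last touched about two years ago.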
dbd
- Easy loading Kaggle dataset to a database
  I've created two examples of how to use the dbd tool to load Kaggle dataset data files (csv, json, xls, parquet) to your Postgres, MySQL, or SQLite database. Basically, you don't have to create any tables or run any SQL INSERT or COPY statements. Everything is automated. You just reference the datasets and files with a URL and execute a 'dbd run' command. The examples are here. Perhaps you'll find it useful. Let me know what you think!
- Easy loading dataset files to a database
  I've created two examples of how to use the [dbd](https://github.com/zsvoboda/dbd) tool to load Kaggle dataset data files (csv, json, xls, parquet) to your Postgres, MySQL, or SQLite database.
- dbd: create your database from data files on your directory
  I'm working on a new open-source tool called dbd that lets you load data from your local data files into your database and transform it using insert-from-select statements. The tool supports templating (Jinja2). It works with Postgres, MySQL, SQLite, Snowflake, Redshift, and BigQuery.
- New opensource ELT tool
  I was looking for a declarative ELT tool for creating my analytics solutions, and DBT was the closest I found. I liked its concept, but I came across quite a few limitations when I wanted to use it. I couldn't specify and create basic things like data types, indexes, primary/foreign keys, etc. In the end, I decided to implement my own, more straightforward and more flexible. I've published the result, dbd, on GitHub. Perhaps you'll find it helpful. Your feedback is greatly appreciated!
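The workflow these posts describe (load a data file into a table, then transform it with an insert-from-select statement into a table with explicit types and keys) can be illustrated with Python's standard library alone. This is a minimal sketch of the general pattern, not dbd's own interface or configuration format; the table names, columns, and sample data are made up for the example, and Jinja2 templating is left out.

```python
import csv
import io
import sqlite3

# Hypothetical sample data standing in for a local CSV file.
CSV_TEXT = "id,country,amount\n1,CZ,10.5\n2,US,20.0\n3,CZ,4.5\n"


def load_csv(conn, table, csv_text):
    """Create a staging table from the CSV header and bulk-insert the rows."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, data = rows[0], rows[1:]
    columns = ", ".join(f'"{name}" TEXT' for name in header)
    conn.execute(f'CREATE TABLE "{table}" ({columns})')
    placeholders = ", ".join("?" for _ in header)
    conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', data)


def build_summary(conn):
    """Declare the target table with types and a primary key, then fill it
    with an INSERT ... SELECT from the staging table."""
    conn.execute(
        "CREATE TABLE sales_by_country ("
        "country TEXT PRIMARY KEY, total REAL NOT NULL)"
    )
    conn.execute(
        "INSERT INTO sales_by_country (country, total) "
        "SELECT country, SUM(CAST(amount AS REAL)) "
        "FROM staging_sales GROUP BY country"
    )


conn = sqlite3.connect(":memory:")
load_csv(conn, "staging_sales", CSV_TEXT)
build_summary(conn)
result = conn.execute(
    "SELECT country, total FROM sales_by_country ORDER BY country"
).fetchall()
print(result)
```

Keeping the target table's definition separate from the statement that populates it is what makes it possible to declare data types and keys up front, which is the limitation the last post raises.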
sqlmesh
- Launch HN: Serra (YC S23) – Open-source, Python-based dbt alternative
  There is also sqlmesh (https://sqlmesh.com/). Pretty new as well. It introduces some interesting concepts. For smaller dbt projects it could be a drop-in replacement as it allows importing dbt projects.
- DBT lays off 15% of their staff
  I agree with you that they don't have a competitor yet. I think https://sqlmesh.com will be that competitor in the not-too-distant future though.
- SQL Mesh - Auto DAG generation!!
- Data transformation tools other than DBT
  SQLMesh is a new SQL templating framework that addresses some of dbt's biggest gaps (column lineage, unit testing). It's not an enterprise solution, but it's an interesting project. https://github.com/TobikoData/sqlmesh
- Semantic Understanding of SQL
  It's a part of the SQLMesh IDE: https://github.com/TobikoData/sqlmesh
- Virtual Data Environments
- Blog Post on how DoorDash used the metrics layer to scale and standardize Metrics for Experimentation
- A dbt killer is born (SQLMesh)
- SQLMesh: The future of DataOps
  If you don't plan on using Airflow, you can just add a custom connection implementation using one of the existing ones as a reference.
What are some alternatives?
Skytrax-Data-Warehouse - A full data warehouse infrastructure with ETL pipelines running inside docker on Apache Airflow for data orchestration, AWS Redshift for cloud data warehouse and Metabase to serve the needs of data visualizations such as analytical dashboards.
Mage - 🧙 The modern replacement for Airflow. Mage is an open-source data pipeline tool for transforming and integrating data. https://github.com/mage-ai/mage-ai
ethereum-etl - Python scripts for ETL (extract, transform and load) jobs for Ethereum blocks, transactions, ERC20 / ERC721 tokens, transfers, receipts, logs, contracts, internal transactions. Data is available in Google BigQuery https://goo.gl/oY5BCQ
dbt-coves - CLI tool for dbt users to simplify creation of staging models (yml and sql) files
pgsync - Postgres to Elasticsearch/OpenSearch sync
sayn - Data processing and modelling framework for automating tasks (incl. Python & SQL transformations).
data-toolset - Upgrade from avro-tools and parquet-tools jars to a more user-friendly Python package.
astro-sdk - Astro SDK allows rapid and clean development of {Extract, Load, Transform} workflows using Python and SQL, powered by Apache Airflow.
api - Moved to https://github.com/covid19india/data/
versatile-data-kit - One framework to develop, deploy and operate data workflows with Python and SQL.
pydwt - Modeling tool like DBT to use SQL Alchemy core with a DataFrame interface like
astro - Astro SDK allows rapid and clean development of {Extract, Load, Transform} workflows using Python and SQL, powered by Apache Airflow. [Moved to: https://github.com/astronomer/astro-sdk]