| | OpenMetadata | dbt-expectations |
|---|---|---|
| Mentions | 26 | 10 |
| Stars | 4,140 | 947 |
| Growth | 4.9% | 2.4% |
| Activity | 10.0 | 6.6 |
| Latest commit | 6 days ago | 9 days ago |
| Language | TypeScript | Shell |
| License | Apache License 2.0 | Apache License 2.0 |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
OpenMetadata
-
How to Dynamically Adjust the Height of a Textarea in ReactJS
In this blog post, I have demonstrated how I addressed the challenge of dynamically adjusting the height of a textarea element based on its content, preventing the need for vertical scrolling in the title section of the OpenMetadata Knowledge article page.
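The pattern usually used for this (a minimal sketch of the common technique, not the blog post's actual code) is to reset the textarea's height so the browser recomputes `scrollHeight`, then set the height to match the content:

```typescript
// Minimal sketch of the common textarea auto-resize pattern (assumed, not
// taken from the blog post). The structural type stands in for an
// HTMLTextAreaElement so the logic can be exercised outside the DOM.
interface Resizable {
  style: { height: string };
  scrollHeight: number;
}

function autoResize(el: Resizable): void {
  el.style.height = "auto"; // collapse first so scrollHeight reflects content
  el.style.height = `${el.scrollHeight}px`; // grow to fit, no scrollbar needed
}
```

In a React component this would typically be called from an `onInput`/`onChange` handler, or from a `useEffect` keyed on the value, with the element obtained via a ref.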
-
Blog - Project Nessie: A Look in the Depths
How does this compare with https://github.com/open-metadata/OpenMetadata
-
What is your favorite data catalog?
u/cmcau, try https://open-metadata.org. It's much easier to set up. For details, see https://docs.open-metadata.org, and for any support, https://slack.open-metadata.org
-
Data Governance Hands On with Amazon DataZone
Then, a pool of tools appeared on the market with features that cover some of the challenges cited, especially those related to data cataloging. Informatica's tool is perhaps the best known among the licensed options. Among the open-source tools, I highlight DataHub (www.datahubproject.io) developed at LinkedIn, OpenMetadata (https://open-metadata.org/), and Amundsen (https://www.amundsen.io/) powered by Lyft. In addition to cataloging and discovering data artifacts, these tools allow for a view of data lineage, including technical documentation and business terms, and for building relationships between data artifacts. It is also possible to register data owners, the people responsible for the data, in those tools. This greatly facilitates the access request and evaluation process (which today is a major bottleneck).
-
What OSS are you using for data contracts?
Probably, in order to have it integrate with tools like OpenLineage and OpenMetadata and such I will have to make open-source contributions.
-
Thoughts around decube.io (data observability and catalog platform)
We are the team behind OpenMetadata. Our mission is to build a centralized metadata platform that offers data discovery, collaboration, governance, and quality. We believe that having a separate tool for each of these categories not only results in user frustration but also creates metadata silos.
-
Great expectations?
Has anyone ever tried Open Metadata for data QA testing? Curious about that: https://open-metadata.org/
-
Our data catalog is difficult to manage and not built for the wider org - what can we do?
We're looking to PoC https://open-metadata.org/ shortly
-
Looking for an open-source data lineage app, where objects and connections can be manually defined (not just automatically ingested)
Hello everyone, I'm looking for an open-source data lineage app (e.g. tokern, datahubproject, openmetadata).
-
Ask HN: Do you use JSON Schema? Help us shape its future stability guarantees
We at OpenMetadata (https://open-metadata.org) use JSON Schema extensively to define our metadata standards. JSON Schema is one of the reasons we were able to ship quickly and get the project to where it is today. More about it here: https://www.youtube.com/watch?v=ZrVTZwmTR3k
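To illustrate the approach (a hypothetical minimal schema for illustration only, not OpenMetadata's actual entity definitions), a metadata standard expressed as JSON Schema might look like:

```json
{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "title": "Table",
  "description": "Hypothetical minimal table-entity schema, for illustration only.",
  "type": "object",
  "properties": {
    "id": { "type": "string", "format": "uuid" },
    "name": { "type": "string", "minLength": 1, "maxLength": 256 },
    "description": { "type": "string" },
    "columns": {
      "type": "array",
      "items": {
        "type": "object",
        "properties": {
          "name": { "type": "string" },
          "dataType": { "type": "string" }
        },
        "required": ["name", "dataType"]
      }
    }
  },
  "required": ["id", "name"]
}
```

Schemas like this can both validate payloads at runtime and drive code generation of typed models in multiple languages, which is part of what makes the approach productive.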
dbt-expectations
-
Dbt tests vs Soda SQL
Have not used Soda, but dbt indeed is pretty good especially when adding dbt-expectations
-
Data-eng related highlights from the latest Thoughtworks Tech Radar
dbt-expectations
-
Data Quality Dimensions: Assuring Your Data Quality with Great Expectations
I highly, highly recommend the dbt-expectations extension from Calogica for dbt. It's a port of Great Expectations, except you can quickly drop it into your schema.yml files and have it run as part of your dbt test process. Super powerful, and it's prevented us from shipping bad data many times.
-
Managing SQL Tests
I'm used to utilising dbt and defining my tests there (along with dbt-utils or https://github.com/calogica/dbt-expectations): I simply add a list item to a column definition and can already define a great number of tests without having to copy code. I can even extend the pre-defined ones using generic tests. Writing custom tests also integrates nicely. Additionally, it's very convenient to tag tests or define a severity. The learning curve for a business engineer is almost flat as long as they know some SQL.
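As a sketch of what this looks like in practice (the model and column names are made up for illustration), dbt-expectations tests, built-in dbt tests, severity, and tags are all declared alongside each other in `schema.yml`:

```yaml
# Illustrative schema.yml -- model and column names are hypothetical.
version: 2

models:
  - name: orders
    tests:
      - dbt_expectations.expect_table_row_count_to_be_between:
          min_value: 1
    columns:
      - name: order_id
        tests:
          - not_null   # built-in dbt test
          - unique
      - name: order_total
        tests:
          - dbt_expectations.expect_column_values_to_be_between:
              min_value: 0
              config:
                severity: warn        # warn instead of failing the run
                tags: ['finance']     # run selectively via --select tag:finance
```

Everything then runs as part of the normal `dbt test` invocation, which is why no separate tool is needed for these standard checks.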
-
What are some Data Quality check related frameworks for datasets ranging from 100GB to 1TB in size?
Use dbt's testing functionality during your transformations with calogica/dbt-expectations (the Great Expectations framework ported to dbt)
-
Great Expectations is annoyingly cumbersome
Check out dbt-expectations https://github.com/calogica/dbt-expectations
-
CI/CD in data engineering - help a noob
There are certain things I would like to add, such as data quality. I could use something like dbt-expectations, but I am not sure how much more I should force it before getting an Airflow setup.
-
How do you query and quality check data produced in intermediate steps in an analytics pipeline?
-
ETL Pipelines with Airflow: The Good, the Bad and the Ugly
[dbt Labs employee here]
Check out the dbt-expectations package[1]. It's a port of the Great Expectations checks to dbt as tests. The advantage is that you don't need another tool for these pretty standard tests, and they can be easily incorporated into dbt workflows.
[1] https://github.com/calogica/dbt-expectations
-
Unit testing SQL in DBT
Also check out dbt-expectations, a port of Great Expectations that greatly expands the configurable (non-assert) tests.
What are some alternatives?
datahub - The Metadata Platform for your Data Stack
dbt-utils - Utility functions for dbt projects.
marquez - Collect, aggregate, and visualize a data ecosystem's metadata
dbt-oracle - A dbt adapter for oracle db backend
odd-platform - First open-source data discovery and observability platform. We make life easy for data practitioners so you can focus on your business.
materialize - The data warehouse for operational workloads.
Hyperactive - An optimization and data collection toolbox for convenient and fast prototyping of computationally expensive models.
Scio - A Scala API for Apache Beam and Google Cloud Dataflow.
Deeplearning4j - Suite of tools for deploying and training deep learning models using the JVM. Highlights include model import for keras, tensorflow, and onnx/pytorch, a modular and tiny c++ library for running math code and a java based math library on top of the core c++ library. Also includes samediff: a pytorch/tensorflow like library for running deep learning using automatic differentiation.
NVTabular - NVTabular is a feature engineering and preprocessing library for tabular data designed to quickly and easily manipulate terabyte scale datasets used to train deep learning based recommender systems.
Draft.js - A React framework for building text editors.
cuetils - CLI and library for diff, patch, and ETL operations on CUE, JSON, and Yaml