great_expectations
Poetry
| | great_expectations | Poetry |
|---|---|---|
| Mentions | 15 | 377 |
| Stars | 9,466 | 29,483 |
| Growth | 2.0% | 2.6% |
| Activity | 9.9 | 9.7 |
| Last commit | 2 days ago | 4 days ago |
| Language | Python | Python |
| License | Apache License 2.0 | MIT License |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
great_expectations
-
Data Quality at Scale with Great Expectations, Spark, and Airflow on EMR
Great Expectations (GE) is an open-source data validation tool that helps ensure data quality.
- Looking for Unit Testing framework in Database Migration Process
-
Soda Core (OSS) is now GA! So, why should you add checks to your data pipelines?
GE is arguably the most well known OSS alternative to Soda Core. The third option is deequ, originally developed and released in OSS by AWS. Our community has told us that Soda Core is different because it's easy to get going and embed into data pipelines. It also allows some of the check-authoring work to be moved to other members of the data team. I'm sure there are also scenarios where Soda Core is not the best option. For example, when you only use Pandas dataframes or develop in Scala.
- Greatexpectations - Always know what to expect from your data.
- Greatexpectations - Always know what to expect from your data
-
Package for drift detection
great_expectations: https://github.com/great-expectations/great_expectations
-
[D] Do you use data engineering pipelines for real life projects?
For example, I just found "Great Expectations", "Kedro", and "Flyte", and I was wondering at what point in time and project complexity we should choose one of these tools instead of the ancient caveman way?
-
Data pipeline suggestions
Testing: GreatExpectations
-
Where can I find free data engineering ( big data) projects online?
Ingestion / ETL: Airbyte, Singer, Jitsu Transformation: dbt Orchestration: Airflow, Dagster Testing: GreatExpectations Observability: Monosi Reverse ETL: Grouparoo, Castled Visualization: Lightdash, Superset
- [P] Deepchecks: an open-source tool for high standards validations for ML models and data.
Poetry
-
Understanding Dependencies in Programming
You can manage dependencies in Python with the package manager pip, which comes pre-installed with Python. Pip lets you install and uninstall Python packages, and it uses a requirements.txt file to keep track of which packages your project depends on. However, pip does not offer robust dependency resolution, nor does it isolate dependencies between projects; this is where tools like pipenv and Poetry come in. These tools create a virtual environment for each project, separating the project's dependencies from the system-wide Python environment and from other projects.
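As a concrete sketch of the pip workflow described above (Poetry's equivalents are noted in comments; no package names are assumed, so nothing here needs network access):

```shell
# pip's workflow: one virtual environment per project, with the
# installed packages pinned in requirements.txt
python3 -m venv .venv               # create an isolated environment
. .venv/bin/activate                # activate it
pip freeze > requirements.txt       # snapshot the installed packages
deactivate
# To reproduce the environment elsewhere:
#   pip install -r requirements.txt
# Poetry wraps the same ideas: `poetry add <pkg>` records the dependency
# in pyproject.toml and locks exact versions in poetry.lock.
```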
-
Implementing semantic image search with Amazon Titan and Supabase Vector
Poetry provides packaging and dependency management for Python. If you haven't already, install Poetry via pip (`pip install poetry`).
-
From Kotlin Scripting to Python
Poetry
-
How to Enhance Content with Semantify
The Semantify repository provides an example Astro.js project. Ensure you have Poetry installed, then build the project from the root of the repository.
-
Uv: Python Packaging in Rust
Has anyone else been paying attention to how hilariously hard it is to package PyTorch in poetry?
https://github.com/python-poetry/poetry/issues/6409
-
Boring Python: dependency management (2022)
Based on this comment from 5 days ago [0], it's working? I'm not sure, I didn't dig in too far, but based on that comment it seems fair to say it's not entirely Poetry's fault: torch removed hashes (which Poetry needs to be effective) for a while, only recently adding them back in.
Not sure where I would stand if I fully investigated it, though.
[0] https://github.com/python-poetry/poetry/issues/6409#issuecom...
-
Fun with Avatars: Crafting the core engine | Part. 1
We will be running this project in Python 3.10 on Mac/Linux, and we will use Poetry to manage our dependencies. Later, we will bundle our app into a container using Docker for deployment.
-
Python Packaging, One Year Later: A Look Back at 2023 in Python Packaging
Here are the two main packaging issues I run into, specifically when using Poetry:
1) Lack of support for building extension modules (as mentioned by the article). There is a workaround using an undocumented feature [0], which I've tried, but ultimately decided it was not the right approach. I still use Poetry, but build the extension as a separate step in CI, rather than kludging it into Poetry.
2) Lack of support for offline installs [1], e.g. being able to download the dependencies, copy them to another machine, and perform the install from the downloaded dependencies (similar to using "pip --no-index --find-links=."). Again, you can work around this (by using "poetry export --with-credentials" and "pip download" for fetching the dependencies, then firing up pypiserver [2] to run a local PyPI server on the offline machine), but ideally this would all be a first class feature of Poetry, similar to how it is in pip.
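The offline-install workaround described above can be sketched roughly as follows. The directory name is hypothetical, and the simpler `pip install --no-index --find-links` route is shown in place of running a local pypiserver:

```shell
# On a machine with network access: export pinned requirements and
# download the corresponding wheels/sdists into a local directory.
poetry export --with-credentials -f requirements.txt -o requirements.txt
pip download -r requirements.txt -d ./offline-packages

# Copy requirements.txt and ./offline-packages to the offline machine, then:
pip install --no-index --find-links=./offline-packages -r requirements.txt
```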
I don't have the capacity to create Pull Requests for addressing these issues with Poetry, and I'm very grateful for the maintainers and those who do contribute. Instead, on the linked issues I share my notes on the matter, in the hope that it may at least help others and potentially get us closer to a solution.
Regardless, I'm sticking with Poetry for now. Though to be fair, the only other Python packaging tools I've used extensively are Pipenv and pip/setuptools. It's time consuming to thoroughly try out these other packaging tools, and is generally lower priority than developing features/fixing bugs, so it's helpful to read about the author's experience with these other tools, such as PDM and Hatch.
[0] https://github.com/python-poetry/poetry/issues/2740
[1] https://github.com/python-poetry/poetry/issues/2184
[2] https://pypi.org/project/pypiserver/
-
Introducing Flama for Robust Machine Learning APIs
We believe that Poetry is currently the best tool for this purpose, besides being the most popular one at the moment. This is why we will use Poetry to manage the dependencies of our project throughout this series of posts. Poetry lets you declare the libraries your project depends on, and it will manage (install/update) them for you. Poetry also lets you package your project into a distributable format and publish it to a repository, such as PyPI. We strongly recommend learning more about this tool by reading the official documentation.
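The declare-then-manage workflow described above centers on pyproject.toml; a minimal sketch (the project name and dependency are hypothetical):

```toml
[tool.poetry]
name = "flama-demo"            # hypothetical project name
version = "0.1.0"
description = "Example project managed with Poetry"
authors = ["Your Name <you@example.com>"]

[tool.poetry.dependencies]
python = "^3.10"
requests = "^2.28"             # hypothetical runtime dependency

[build-system]
requires = ["poetry-core"]
build-backend = "poetry.core.masonry.api"
```

With this in place, `poetry install` resolves and installs the dependencies, `poetry build` produces an sdist and a wheel in dist/, and `poetry publish` uploads them to a repository such as PyPI.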
-
How do you resolve dependency conflicts?
I started using Poetry. The problem is that Poetry will not install if there is a dependency conflict, and there is no way to ignore it: github
What are some alternatives?
evidently - Evaluate and monitor ML models from validation to production. Join our Discord: https://discord.com/invite/xZjKRaNp8b
Pipenv - Python Development Workflow for Humans.
kedro-great - The easiest way to integrate Kedro and Great Expectations
PDM - A modern Python package and dependency manager supporting the latest PEP standards
deepchecks - Deepchecks: Tests for Continuous Validation of ML Models & Data. Deepchecks is a holistic open-source solution for all of your AI & ML validation needs, enabling you to thoroughly test your data and models from research to production.
hatch - Modern, extensible Python project management
re_data - fix data issues before your users & CEO discover them
pyenv - Simple Python version management
streamlit - Streamlit - A faster way to build and share data apps.
pip-tools - A set of tools to keep your pinned Python dependencies fresh.
seldon-core - An MLOps framework to package, deploy, monitor and manage thousands of production machine learning models
virtualenv - Virtual Python Environment builder