aws-data-wrangler
dagster-example-pipeline
 | aws-data-wrangler | dagster-example-pipeline
---|---|---
Mentions | 1 | 1
Stars | 3,559 | 64
Growth | - | -
Activity | 10.0 | 0.0
Last commit | 9 months ago | about 2 years ago
Language | Python | Python
License | Apache License 2.0 | Apache License 2.0
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
aws-data-wrangler
Interacting with Amazon S3 using AWS Data Wrangler (awswrangler) SDK for Pandas: A Comprehensive Guide
AWS Data Wrangler GitHub Repository: https://github.com/awslabs/aws-data-wrangler
dagster-example-pipeline
Developing in Dagster
The associated code repo can be found here
What are some alternatives?
boto3 - AWS SDK for Python
mlrun - MLRun is an open source MLOps platform for quickly building and managing continuous ML applications across their lifecycle. MLRun integrates into your development and CI/CD environment and automates the delivery of production data, ML pipelines, and online applications.
Trapheus - This tool automates restoration of RDS database instances from snapshots into any dev, staging or production environments. It supports individual RDS Snapshot as well as cluster snapshot restore operations.
Apache Superset - Apache Superset is a Data Visualization and Data Exploration Platform [Moved to: https://github.com/apache/superset]
Pandas - Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical functions, and much more
AWS Data Wrangler - pandas on AWS - Easy integration with Athena, Glue, Redshift, Timestream, Neptune, OpenSearch, QuickSight, Chime, CloudWatchLogs, DynamoDB, EMR, SecretManager, PostgreSQL, MySQL, SQLServer and S3 (Parquet, CSV, JSON and EXCEL).
ray_snowflake - Ray Data Connector for Snowflake
Prefect - The easiest way to build, run, and monitor data pipelines at scale.
demo-code - Bits of code I use during live demos
canarypy - CanaryPy - A light and powerful canary release for Data Pipelines
aws-simple-websocket - Using AWS's API Gateway + Lambda to run a simple websocket application. For learning/testing.
portable-data-stack-dagster - A portable Datamart and Business Intelligence suite built with Docker, Dagster, dbt, DuckDB, PostgreSQL and Superset