AWS Data Wrangler
Trapheus
| | AWS Data Wrangler | Trapheus |
|---|---|---|
| | 9 | 5 |
| Stars | 3,779 | 96 |
| Growth | 1.5% | - |
| Activity | 9.4 | 9.3 |
| | about 21 hours ago | 21 days ago |
| Language | Python | Python |
| License | Apache License 2.0 | MIT License |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
AWS Data Wrangler
- Read files from s3 using Pandas/s3fs or AWS Data Wrangler?
I had no problem with awswrangler (https://github.com/aws/aws-sdk-pandas). It supports reading and writing partitions, which was really helpful, plus a few other optimizations that made it a great tool.
- Go+: Go designed for data science
Yep, agreed. Go is a great language for AWS Lambda-type workflows.
Python isn't as great (Python Lambda Layers built on Macs don't always work). AWS Data Wrangler (https://github.com/awslabs/aws-data-wrangler) provides pre-built layers, which is a workaround, but something as portable as Go would be the best solution.
Trapheus
We haven't tracked posts mentioning Trapheus yet.
Tracking mentions began in Dec 2020.
What are some alternatives?
PyAthena - PyAthena is a Python DB API 2.0 (PEP 249) client for Amazon Athena.
Optimus - Agile Data Preparation Workflows made easy with Pandas, Dask, cuDF, Dask-cuDF, Vaex and PySpark
ga-extractor - Tool for extracting Google Analytics data suitable for migrating to other platforms/databases
python-mysql-replication - Pure Python implementation of the MySQL replication protocol, built on top of PyMySQL
gonum - Gonum is a set of numeric libraries for the Go programming language. It contains libraries for matrices, statistics, optimization, and more
openuds - OpenUDS is an open source multiplatform connection broker, created by the Spanish company Virtualcable S.L.U. and released as open source with the help of several Spanish universities.
Redash - Make Your Company Data Driven. Connect to any data source, easily visualize, dashboard and share your data.
zef - Toolkit for graph-relational data across space and time
getting-started - Getting started with Docker
getting-started - This repository is a getting started guide to Singer.
dagster-example-pipeline - Template Dagster repo using poetry and a single Docker container; works well with CICD
gophernotes - The Go kernel for Jupyter notebooks and nteract.