Top 23 Python S3 Projects
-
awesome-aws
A curated list of awesome Amazon Web Services (AWS) libraries, open source repos, guides, blogs, and other resources. Featuring the Fiery Meter of AWSome.
-
Moto
You can use a library such as moto: https://github.com/getmoto/moto
-
s3cmd
Project mention: Amazon S3 Tools: Command Line S3 Client Software and S3 Backup | news.ycombinator.com | 2022-12-02
-
wal-e
-
smart_open
Project mention: smart_open: Utils for streaming large files (S3, HDFS, gzip, bz2...) | reddit.com/r/coolgithubprojects | 2022-06-30
-
S3Scanner
Project mention: sa7mon/S3Scanner: Scan for open S3 buckets and dump the contents | reddit.com/r/PrivateCyberMiliTec | 2022-11-03
-
DataEngineeringProject
Project mention: What are your favourite GitHub repos that shows how data engineering should be done? | reddit.com/r/dataengineering | 2022-11-18
-
django-s3direct
Project mention: How to manage large files with Heroku and Amazon S3 Buckets in Django Projects | dev.to | 2022-08-16
Now that we have defined the solution flow, let’s talk about the tools. The first one I want to mention is django-s3direct, a library to directly upload files to the Amazon bucket from the admin panel. Also, it provides a model field that corresponds to the URL stored in the database.
-
s3viewer
Storage Explorer - Publicly open storage viewer (Amazon S3 Bucket, Azure Blob, FTP server, HTTP Index Of/)
-
astro-sdk
Astro SDK allows rapid and clean development of {Extract, Load, Transform} workflows using Python and SQL, powered by Apache Airflow.
I think you should take a look at the Astro SDK. It’s an open-source Python package that removes the complexity of writing DAGs, particularly in the context of Extract, Load, Transform (ELT) use cases. Look at the docs, especially aql.transform, aql.run_raw_sql, etc. That will definitely help you.
-
BucketStore
-
glacier_deep_archive_backup
Extremely low cost, off-site backup/restore using AWS S3 Glacier Deep Archive
Project mention: Ask HN: What are your “scratch own itch” projects? | news.ycombinator.com | 2022-11-13
Encrypted backup to AWS Glacier Deep Archive ($1/TB/month)
https://github.com/mrichtarsky/glacier_deep_archive_backup
And for ErgodoxEZ:
Compress your keymap so you can add more features without hitting the limit
https://github.com/mrichtarsky/ergodox-compress-keymap
Generate Heatmap from your keypresses so you can see whether your layout is optimal
-
amazon-s3-find-and-forget
Amazon S3 Find and Forget is a solution to handle data erasure requests from data lakes stored on Amazon S3, for example, pursuant to the European General Data Protection Regulation (GDPR)
Project mention: Deleting particular data from S3 External Tables | reddit.com/r/dataengineering | 2022-10-31
Take a look at this: https://github.com/awslabs/amazon-s3-find-and-forget We use it for GDPR compliance; it will open a file, delete a row and pack it back. It modifies the file, so watch out if you are using Glue job bookmarks. Because you are using external tables, the manifest file will also have to be updated with the proper length for the new, updated file. If you have hundreds of tables and thousands of files and need to do this on a regular basis, this would be the scalable solution, but if you have only a few files, honestly I would do it manually.
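The "open a file, delete a row and pack it back" step mentioned in this entry can be sketched with the Python standard library alone. This is not code from amazon-s3-find-and-forget: it operates on an in-memory CSV rather than on objects in S3, and the function name `erase_rows`, the sample columns, and the `content_length` manifest key are all hypothetical. It only illustrates why a length recorded elsewhere goes stale once the file is rewritten.

```python
import csv
import io
from typing import Tuple


def erase_rows(data: bytes, column: str, value: str) -> Tuple[bytes, int]:
    """Rewrite CSV bytes without the rows where `column` equals `value`.

    Returns the rewritten bytes and the number of rows removed.
    Hypothetical helper for illustration only.
    """
    reader = csv.DictReader(io.StringIO(data.decode("utf-8"), newline=""))
    out = io.StringIO(newline="")
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames, lineterminator="\n")
    writer.writeheader()
    removed = 0
    for row in reader:
        if row[column] == value:
            removed += 1  # drop the matching row
            continue
        writer.writerow(row)
    return out.getvalue().encode("utf-8"), removed


# Made-up sample data standing in for one file of an external table.
original = b"user_id,country\n1,DE\n2,FR\n3,DE\n"
rewritten, removed = erase_rows(original, "user_id", "2")

# The rewritten file is shorter, so any recorded length must be refreshed.
manifest_entry = {"content_length": len(rewritten)}
print(removed, len(original), manifest_entry["content_length"])
```

In a real data lake the same idea applies per object: the rewritten object has a different size, so anything that cached the old size (such as a manifest) has to be updated alongside it.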
-
pathy
Something like Pathy.
-
TileDB-Py
-
aioaws
Project mention: what's the best python client for AWS automation these days? | reddit.com/r/Python | 2023-01-24
- https://github.com/samuelcolvin/aioaws (aiobotocore wrapper)
-
sumologic-aws-lambda
A collection of lambda functions to collect data from Cloudwatch, Kinesis, VPC Flow logs, S3, security-hub and AWS Inspector
Project mention: Serverless Ops 102 - CloudWatch Logs and Centralized Logging with AWS Lambda | dev.to | 2022-05-13
Sumologic log forwarder
-
s3-credentials
-
s3fs
Project mention: Best Linux friendly cloud storage services | reddit.com/r/linuxquestions | 2022-10-15
s3fs with a provider like Backblaze will probably be the absolute cheapest you’ll get.
-
benji
Benji Backup: A block based deduplicating backup software for Ceph RBD images, iSCSI targets, image files and block devices
-
Skytrax-Data-Warehouse
A full data warehouse infrastructure with ETL pipelines running inside docker on Apache Airflow for data orchestration, AWS Redshift for cloud data warehouse and Metabase to serve the needs of data visualizations such as analytical dashboards.
-
synapse-s3-storage-provider
-
inferencedb
🚀 Stream inferences of real-time ML models in production to any data lake (Experimental)
Project mention: [P] InferenceDB - Makes it easy to store predictions of real-time ML models in S3 | reddit.com/r/MachineLearning | 2022-06-11
Python S3 related posts
- s3fs-fuse - allows to mount your s3/minio bucket link to your local directory
- AWS Announces Open Source Mountpoint for Amazon S3
- Amazon S3 Tools: Command Line S3 Client Software and S3 Backup
- sa7mon/S3Scanner: Scan for open S3 buckets and dump the contents
- Deleting particular data from S3 External Tables
- Free tools to analyze internet exposure of S3 buckets?
- Configure Vultr CORS to accept direct uploads from Active Storage
Index
What are some of the best open-source S3 projects in Python? This list will help you:
# | Project | Stars |
---|---|---|
1 | awesome-aws | 11,351 |
2 | Moto | 6,730 |
3 | s3cmd | 4,136 |
4 | wal-e | 3,369 |
5 | smart_open | 2,813 |
6 | S3Scanner | 2,010 |
7 | DataEngineeringProject | 750 |
8 | django-s3direct | 628 |
9 | s3viewer | 391 |
10 | astro-sdk | 238 |
11 | BucketStore | 218 |
12 | glacier_deep_archive_backup | 207 |
13 | amazon-s3-find-and-forget | 204 |
14 | pathy | 161 |
15 | TileDB-Py | 155 |
16 | aioaws | 148 |
17 | sumologic-aws-lambda | 143 |
18 | s3-credentials | 142 |
19 | s3fs | 142 |
20 | benji | 129 |
21 | Skytrax-Data-Warehouse | 116 |
22 | synapse-s3-storage-provider | 85 |
23 | inferencedb | 74 |