Python S3

Open-source Python projects categorized as S3

Top 23 Python S3 Projects

  • awesome-aws

    A curated list of awesome Amazon Web Services (AWS) libraries, open source repos, guides, blogs, and other resources. Featuring the Fiery Meter of AWSome.

  • Moto

    A library that allows you to easily mock out tests based on AWS infrastructure.

    Project mention: Unit testing Athena ETL? | reddit.com/r/aws | 2023-03-16

    You can use a library such as moto https://github.com/getmoto/moto

  • s3cmd

    Official s3cmd repo -- Command line tool for managing Amazon S3 and CloudFront services

    Project mention: Amazon S3 Tools: Command Line S3 Client Software and S3 Backup | news.ycombinator.com | 2022-12-02

  • wal-e

    Continuous Archiving for Postgres

  • smart_open

    Utils for streaming large files (S3, HDFS, gzip, bz2...)

    Project mention: smart_open: Utils for streaming large files (S3, HDFS, gzip, bz2...) | reddit.com/r/coolgithubprojects | 2022-06-30

  • S3Scanner

    Scan for open S3 buckets and dump the contents

    Project mention: sa7mon/S3Scanner: Scan for open S3 buckets and dump the contents | reddit.com/r/PrivateCyberMiliTec | 2022-11-03

  • DataEngineeringProject

    Example end to end data engineering project.

    Project mention: What are your favourite GitHub repos that shows how data engineering should be done? | reddit.com/r/dataengineering | 2022-11-18

  • django-s3direct

    Directly upload files to S3 compatible services with Django.

    Project mention: How to manage large files with Heroku and Amazon S3 Buckets in Django Projects | dev.to | 2022-08-16

    Now that we have defined the solution flow, let’s talk about the tools. The first one I want to mention is django-s3direct, a library to directly upload files to the Amazon bucket from the admin panel. Also, it provides a model field that corresponds to the URL stored in the database.

  • s3viewer

    Storage Explorer - Publicly open storage viewer (Amazon S3 Bucket, Azure Blob, FTP server, HTTP Index Of/)

  • astro-sdk

    Astro SDK allows rapid and clean development of {Extract, Load, Transform} workflows using Python and SQL, powered by Apache Airflow.

    Project mention: Most ideal Airflow task structure? | reddit.com/r/dataengineering | 2023-03-28

    I think you should take a look at the Astro SDK. It’s an open source Python package that removes the complexity of writing DAGs, particularly in the context of Extract, Load, Transform (ELT) use cases. Look at the docs, especially aql.transform, aql.run_raw_sql, etc. That will definitely help you.

  • BucketStore

    A simple library for interacting with Amazon S3.

  • glacier_deep_archive_backup

    Extremely low cost, off-site backup/restore using AWS S3 Glacier Deep Archive

    Project mention: Ask HN: What are your “scratch own itch” projects? | news.ycombinator.com | 2022-11-13

    Encrypted backup to AWS Glacier Deep Archive ($1/TB/month)

    https://github.com/mrichtarsky/glacier_deep_archive_backup

    And for ErgodoxEZ:

    Compress your keymap so you can add more features without hitting the limit

    https://github.com/mrichtarsky/ergodox-compress-keymap

    Generate Heatmap from your keypresses so you can see whether your layout is optimal

    https://github.com/mrichtarsky/ergodox-heatmap

  • amazon-s3-find-and-forget

    Amazon S3 Find and Forget is a solution to handle data erasure requests from data lakes stored on Amazon S3, for example, pursuant to the European General Data Protection Regulation (GDPR)

    Project mention: Deleting particular data from S3 External Tables | reddit.com/r/dataengineering | 2022-10-31

    Take a look at this: https://github.com/awslabs/amazon-s3-find-and-forget We use it for GDPR compliance; it will open a file, delete a row, and pack it back. It modifies the file, so watch out if you are using Glue job bookmarks. Because you are using external tables, the manifest file will also have to be updated with the proper length for the new, updated file. If you have hundreds of tables and thousands of files and need to do this on a regular basis, this would be the scalable solution, but if you have only a few files, honestly, I would do it manually.

  • pathy

    A Python Path interface for file and cloud bucket storage

    Project mention: Pathlib is cool | reddit.com/r/Python | 2022-07-28

    Something like Pathy.

  • TileDB-Py

    Python interface to the TileDB storage engine

  • aioaws

    Asyncio-compatible SDK for AWS services.

    Project mention: what's the best python client for AWS automation these days? | reddit.com/r/Python | 2023-01-24

    - https://github.com/samuelcolvin/aioaws (aiobotocore wrapper)

  • sumologic-aws-lambda

    A collection of Lambda functions to collect data from CloudWatch, Kinesis, VPC Flow Logs, S3, Security Hub and AWS Inspector

    Project mention: Serverless Ops 102 - CloudWatch Logs and Centralized Logging with AWS Lambda | dev.to | 2022-05-13

    Sumologic log forwarder

  • s3-credentials

    A tool for creating credentials for accessing S3 buckets

  • s3fs

    Amazon S3 filesystem for PyFilesystem2 (by PyFilesystem)

    Project mention: Best Linux friendly cloud storage services | reddit.com/r/linuxquestions | 2022-10-15

    s3fs with a provider like Backblaze will probably be the absolute cheapest you’ll get.

  • benji

    Benji Backup: A block based deduplicating backup software for Ceph RBD images, iSCSI targets, image files and block devices

  • Skytrax-Data-Warehouse

    A full data warehouse infrastructure with ETL pipelines running inside Docker on Apache Airflow for data orchestration, AWS Redshift as the cloud data warehouse, and Metabase for data visualizations such as analytical dashboards.

  • synapse-s3-storage-provider

    Synapse storage provider to fetch and store media in Amazon S3

  • inferencedb

    🚀 Stream inferences of real-time ML models in production to any data lake (Experimental)

    Project mention: [P] InferenceDB - Makes it easy to store predictions of real-time ML models in S3 | reddit.com/r/MachineLearning | 2022-06-11

NOTE: The open source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020). The latest post mention was on 2023-03-28.

Index

What are some of the best open-source S3 projects in Python? This list will help you:

 #  Project                       Stars
 1  awesome-aws                   11,351
 2  Moto                           6,730
 3  s3cmd                          4,136
 4  wal-e                          3,369
 5  smart_open                     2,813
 6  S3Scanner                      2,010
 7  DataEngineeringProject           750
 8  django-s3direct                  628
 9  s3viewer                         391
10  astro-sdk                        238
11  BucketStore                      218
12  glacier_deep_archive_backup      207
13  amazon-s3-find-and-forget        204
14  pathy                            161
15  TileDB-Py                        155
16  aioaws                           148
17  sumologic-aws-lambda             143
18  s3-credentials                   142
19  s3fs                             142
20  benji                            129
21  Skytrax-Data-Warehouse           116
22  synapse-s3-storage-provider       85
23  inferencedb                       74