amazon-s3-find-and-forget
DataEngineeringProject
| | amazon-s3-find-and-forget | DataEngineeringProject |
|---|---|---|
| Mentions | 3 | 5 |
| Stars | 232 | 985 |
| Growth | 0.9% | - |
| Activity | 7.3 | 0.0 |
| Latest commit | 8 days ago | over 1 year ago |
| Language | Python | Python |
| License | Apache License 2.0 | MIT License |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
amazon-s3-find-and-forget
- Deleting particular data from S3 External Tables
Take a look at this: https://github.com/awslabs/amazon-s3-find-and-forget. We use it for GDPR compliance; it will open a file, delete a row, and pack it back. Because it modifies the file, watch out if you are using Glue job bookmarks. Since you are using external tables, the manifest file will also have to be updated with the proper length for the new, updated file. If you have hundreds of tables and thousands of files and need to do this on a regular basis, this is the scalable solution; but if you only have a few files, honestly I would do it manually.
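The manifest update the comment describes can be sketched in plain Python. This is not part of the amazon-s3-find-and-forget tool itself; it is a minimal, hypothetical example that assumes a Redshift Spectrum-style manifest (an `entries` list where each entry carries a `meta.content_length`), with the new object sizes already known (e.g. from a `HEAD` request after the rewrite). The function and field names here are illustrative assumptions, not the tool's API:

```python
import json

def update_manifest_lengths(manifest_json, new_sizes):
    """Refresh content_length values in a Spectrum-style manifest.

    manifest_json: manifest as a JSON string with an "entries" list of
                   {"url": ..., "meta": {"content_length": ...}} objects.
    new_sizes:     dict mapping S3 URL -> new object size in bytes.
    Entries whose URL is not in new_sizes are left unchanged.
    """
    manifest = json.loads(manifest_json)
    for entry in manifest["entries"]:
        url = entry["url"]
        if url in new_sizes:
            entry["meta"]["content_length"] = new_sizes[url]
    return json.dumps(manifest, indent=2)

# Example: part-0000 shrank after a row was deleted and the file repacked.
manifest = json.dumps({
    "entries": [
        {"url": "s3://bucket/data/part-0000.parquet",
         "meta": {"content_length": 1048576}},
        {"url": "s3://bucket/data/part-0001.parquet",
         "meta": {"content_length": 2097152}},
    ]
})
updated = update_manifest_lengths(
    manifest, {"s3://bucket/data/part-0000.parquet": 1040000})
```

In practice you would read and write the manifest object in S3 (e.g. via boto3) around this pure transformation; keeping the length-fixing logic side-effect-free makes it easy to test.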
- Update S3 Files
Have a look at S3 Find and Forget
- How to handle GDPR requests for data stored in S3?
S3 Find and Forget is probably worth looking into, even if just to get ideas on how to implement a similar solution for yourself
DataEngineeringProject
- What are your favourite GitHub repos that show how data engineering should be done?
- Is it just me, or are beginner-friendly ETL pipeline guides that explain from the ground up how to incorporate various technologies notoriously difficult to find?
- Starting A Data Engineering Project Series
News RSS Feeds
- 5 Data Sources for Data Engineering Projects
Lastly, the most readily available data source is data scraped from the internet. To be more concrete, I have outlined a project that scrapes online news articles every ten minutes to curate all the latest news in one place. This project uses a wide variety of relevant data engineering tools, which makes it a great example. The author of this project is Damian Kliś, and he outlines its architecture in the article.
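The ingestion step of such a news pipeline starts with parsing RSS feeds. The actual DataEngineeringProject repository layers Kafka, MongoDB, and other components on top (not reproduced here); the following is just a minimal, standard-library sketch of the RSS-parsing step, run against an inline sample document rather than a live feed:

```python
import xml.etree.ElementTree as ET

SAMPLE_RSS = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example News</title>
    <item>
      <title>First headline</title>
      <link>https://example.com/1</link>
    </item>
    <item>
      <title>Second headline</title>
      <link>https://example.com/2</link>
    </item>
  </channel>
</rss>"""

def parse_feed(xml_text):
    """Extract (title, link) pairs from an RSS 2.0 document."""
    root = ET.fromstring(xml_text)
    return [(item.findtext("title"), item.findtext("link"))
            for item in root.iter("item")]

articles = parse_feed(SAMPLE_RSS)
```

A scheduler (cron, Airflow, or a simple loop with a ten-minute sleep) would call `parse_feed` on each downloaded feed and hand the resulting records to the next stage of the pipeline.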
- Can You Recommend Good Data Engineering Projects
Here is my project that got me a few interviews so far: https://github.com/damklis/DataEngineeringProject
What are some alternatives?
isp-data-pollution - ISP Data Pollution to Protect Private Browsing History with Obfuscation
blinkist-scraper - 📚 Python tool to download book summaries and audio from Blinkist.com, and generate some pretty output
awesome-aws - A curated list of awesome Amazon Web Services (AWS) libraries, open source repos, guides, blogs, and other resources. Featuring the Fiery Meter of AWSome.
synapse-s3-storage-provider - Synapse storage provider to fetch and store media in Amazon S3
data-toolset - Upgrade from avro-tools and parquet-tools jars to a more user-friendly Python package.
yaetos - Write data & AI pipelines in (SQL, Spark, Pandas) and deploy to the cloud, simplified
s3-credentials - A tool for creating credentials for accessing S3 buckets
Zillow-Data-Engineering
openwisp-monitoring - Network monitoring system written in Python and Django, designed to be extensible, programmable, scalable and easy to use by end users: once the system is configured, monitoring checks, alerts and metric collection happens automatically.
openverse-catalog - Identifies and collects data on cc-licensed content across web crawl data and public apis.
datajob - Build and deploy a serverless data pipeline on AWS with no effort.
cryptostore - A scalable storage service for cryptocurrency data