data_engineering_on_gcp_book
scraper
Metric | data_engineering_on_gcp_book | scraper
---|---|---
Mentions | 12 | 3
Stars | 116 | 445
Growth | - | -
Activity | 2.6 | 0.0
Latest commit | about 3 years ago | over 2 years ago
Language | Go |
License | - | MIT License
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
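The exact formulas behind these numbers are not given here, so the following is only a minimal sketch of how such metrics could be computed. The function names, the 90-day half-life, and the percentile-to-score mapping are all assumptions made for illustration, not the site's actual method.

```python
from datetime import date


def month_over_month_growth(stars_last_month, stars_now):
    """Relative change in stars between two monthly snapshots."""
    if stars_last_month <= 0:
        raise ValueError("need a positive baseline star count")
    return (stars_now - stars_last_month) / stars_last_month


def weighted_commit_score(commit_dates, today, half_life_days=90.0):
    """Sum of per-commit weights; a commit's weight halves every half_life_days.

    The half-life is an assumed parameter: it only illustrates the idea that
    recent commits count more than older ones.
    """
    return sum(0.5 ** ((today - d).days / half_life_days) for d in commit_dates)


def activity_from_scores(score, all_scores):
    """Map a raw score to a 0-10 scale by percentile rank (assumed mapping).

    A project that outscores 90% of the tracked projects gets 9.0.
    """
    below = sum(1 for s in all_scores if s < score)
    return round(10.0 * below / len(all_scores), 1)
```

Under these assumptions, a project with no recent commits ends up with a raw score near zero and therefore an activity close to 0.0, which is consistent with a repository whose latest commit is years old.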
data_engineering_on_gcp_book
-
How possible is it for a beginner to establish pipelines, data warehouse, and visualization solution as a team of 1?
This book will walk you through setting up a complete data engineering stack on GCP: https://github.com/Nunie123/data_engineering_on_gcp_book
-
Python & SQL knowledge needed for ETL?
As for resources, this book covers a lot of these: https://github.com/Nunie123/data_engineering_on_gcp_book. However, it covers the 'how', not the 'why'. The only way I know to understand the 'why' is experience, whether at work or on personal projects.
-
Learning Python and SQL: What should be my next step?
Here's a good book to follow along to introduce you to common tooling and design patterns: https://github.com/Nunie123/data_engineering_on_gcp_book
-
GitHub Repo with All Data Transformation, Cleaning, Validation
I'm not sure if this is exactly what you're looking for, but here's a book on GitHub that talks about the tools and steps for building data pipelines into a data warehouse: https://github.com/Nunie123/data_engineering_on_gcp_book
-
What is the low hanging fruit for a brand new GCP data engineer to learn?
Check out this book: https://github.com/Nunie123/data_engineering_on_gcp_book
-
Unsure about overall process of data engineering
If you're interested in an example of how to build a complete data engineering infrastructure, you should check out this book: https://github.com/Nunie123/data_engineering_on_gcp_book
-
[HELP] Airflow Reverse proxy + load balancer +docker
If you want to try Airflow without the setup headache, you can try Composer on GCP, which is a hosted version of Airflow. I wrote some info on how to do that here: https://github.com/Nunie123/data_engineering_on_gcp_book/blob/master/ch_2_orchestration.md
-
Transition from a Quality engineer to Data engineer
This book might be a good resource for you: https://github.com/Nunie123/data_engineering_on_gcp_book
-
Accepted a data engineer intern role at a Big N company - how do I learn as much as possible?
If you want a place to start on personal projects you can check out this book, https://github.com/Nunie123/data_engineering_on_gcp_book, which will walk you through the basics of setting up a full data engineering stack.
-
What tools, software, programming languages, etc. does a data engineer need to have in 2021?
If you are interested in tooling, here's a free book on setting up a basic data engineering tech stack on GCP: https://github.com/Nunie123/data_engineering_on_gcp_book
scraper
-
Reset Collection Cartridge Art
I previously used a scraper found here https://github.com/sselph/scraper to scrape from screenscraper.fr, but I haven't used it lately and it looks like the project is dead.
-
Is there a scraper in existence that uses file hashes instead of file names?
Thanks, I'm reading the source code now. It looks like the hash comparison is done against OpenVGDB, but I'm also curious how the images are fetched. Would you happen to know this by chance?
- Scrape games without WiFi
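On the question above about matching by file hash instead of file name, here is a minimal sketch of the general idea. It assumes a SHA-1 digest and uses a plain dictionary (`known_hashes`, hypothetical) as a stand-in for a lookup against a database such as OpenVGDB; the hash algorithm and any preprocessing the scraper project actually applies are not stated in the post.

```python
import hashlib


def sha1_of_file(path, chunk_size=1 << 20):
    """Return the uppercase hex SHA-1 of a file, read in chunks."""
    digest = hashlib.sha1()
    with open(path, "rb") as fh:
        # Read fixed-size chunks so large files are not loaded into memory at once.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest().upper()


def identify(path, known_hashes):
    """Look up a file by content hash; returns None when the hash is unknown.

    known_hashes is a hypothetical mapping from hex digest to title,
    standing in for a real hash database.
    """
    return known_hashes.get(sha1_of_file(path))
```

Because the lookup key is derived from the file contents, a renamed file still resolves to the same entry, which is the advantage over name-based matching.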
What are some alternatives?
shotcaller - A moddable RTS/MOBA game made with bracket-lib and minigene.
playwright-python - Python version of the Playwright testing and automation library.
FactGraph - FactGraph monorepo (backend + frontend + landing page + blog)
pyppeteer - Headless chrome/chromium automation library (unofficial port of puppeteer)
beubo - Beubo is a free, simple, and minimal CMS with unlimited extensibility using plugins
vopono - Run applications through VPN tunnels with temporary network namespaces
distribyted - Torrent client with HTTP, fuse, and WebDAV interfaces. Start exploring your torrent files right away, even zip, rar, or 7zip archive contents!
Scrapy - Scrapy, a fast high-level web crawling & scraping framework for Python.
go-plugin - Golang plugin system over RPC.
Arthur - How to build your own AI art installation from scratch [Moved to: https://github.com/maxvfischer/DIY-ai-art]
dali - Indie assembler/linker for Dalvik VM .dex & .apk files (Work In Progress)
OpenVGDB - OpenVGDB