FactGraph
data_engineering_on_gcp_book
| | FactGraph | data_engineering_on_gcp_book |
|---|---|---|
| Mentions | 1 | 12 |
| Stars | 1 | 116 |
| Growth | - | - |
| Activity | 0.0 | 2.6 |
| Latest commit | almost 5 years ago | about 3 years ago |
| License | GNU Affero General Public License v3.0 | - |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
FactGraph
-
What is your “I don't care if this succeeds” project?
I used to have a project like this. I was going to call it FactGraph: https://github.com/FactGraph/FactGraph/wiki
My idea was to build a big community-maintained database of facts and evidence, all linked into a huge network. Everything would have a weight (sometimes automatically calculated from parent nodes), and the software would calculate probabilities for some big questions. Every user could also build their own personalized graph to explore their own worldview, and maybe even uncover some cognitive dissonance they weren't aware of. Or you could use it to compare and contrast different philosophies and religions. It could even calculate a "coherence score" for each religion and denomination after crunching all of the available evidence.
Then I discovered RootClaim: https://www.rootclaim.com
They're doing something very similar, with a more targeted approach where they focus on some specific questions. e.g. COVID-19: https://www.rootclaim.com/analysis/what-is-the-source-of-cov...
RootClaim really seems to be nailing it so far, and hopefully they can continue to grow and become something like the project I was imagining.
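The weight-propagation idea described above could be sketched roughly like this. This is a minimal illustration only; the `Node` class, the averaging rule, and all names are hypothetical, not FactGraph's (or RootClaim's) actual design:

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    """A claim or piece of evidence in a hypothetical fact graph."""
    name: str
    weight: float = 1.0  # manually assigned credibility in [0, 1]
    parents: list = field(default_factory=list)  # supporting evidence nodes

    def score(self) -> float:
        # A leaf node keeps its own weight; otherwise its weight is
        # scaled by the average score of its supporting parents.
        if not self.parents:
            return self.weight
        support = sum(p.score() for p in self.parents) / len(self.parents)
        return self.weight * support

# A tiny graph: two pieces of evidence supporting one claim.
e1 = Node("peer-reviewed study", weight=0.9)
e2 = Node("anecdotal report", weight=0.3)
claim = Node("big question", weight=1.0, parents=[e1, e2])

print(round(claim.score(), 2))  # 0.6
```

A real system would need cycle handling, user-specific weights, and a proper probabilistic model (e.g. Bayesian updating) rather than simple averaging, but the recursive structure would look similar.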
data_engineering_on_gcp_book
-
How possible is it for a beginner to establish pipelines, a data warehouse, and a visualization solution as a team of 1?
This book will walk you through setting up a complete data engineering stack on GCP: https://github.com/Nunie123/data_engineering_on_gcp_book
-
Python & SQL knowledge needed for ETL?
As for resources, this book covers a lot of these: https://github.com/Nunie123/data_engineering_on_gcp_book. However, it covers the 'how', not the 'why'. The only way I know to understand the 'why' is experience, whether at work or through personal projects.
-
Learning Python and SQL: What should be my next step?
Here's a good book to follow along to introduce you to common tooling and design patterns: https://github.com/Nunie123/data_engineering_on_gcp_book
-
GitHub Repo with All Data Transformation, Cleaning, Validation
I'm not sure if this is exactly what you're looking for, but here's a book on GitHub that talks about the tools and steps for building data pipelines into a data warehouse: https://github.com/Nunie123/data_engineering_on_gcp_book
-
What is the low hanging fruit for a brand new GCP data engineer to learn?
Check out this book: https://github.com/Nunie123/data_engineering_on_gcp_book
-
Unsure about overall process of data engineering
If you're interested in an example of how to build complete data engineering infrastructure, you should check out this book: https://github.com/Nunie123/data_engineering_on_gcp_book
-
[HELP] Airflow Reverse proxy + load balancer + docker
If you want to try Airflow without the setup headache, you can try Composer on GCP, which is a hosted version of Airflow. I wrote some info on how to do that here: https://github.com/Nunie123/data_engineering_on_gcp_book/blob/master/ch_2_orchestration.md
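For context, a workflow deployed to Composer is just a standard Airflow DAG file uploaded to the environment's bucket. A minimal sketch, assuming Airflow 2.x (the `dag_id`, task names, and bash commands are placeholders):

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator  # Airflow 2.x import path

with DAG(
    dag_id="hello_composer",
    start_date=datetime(2021, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    # Two trivial placeholder tasks standing in for real extract/load steps.
    extract = BashOperator(task_id="extract", bash_command="echo extracting")
    load = BashOperator(task_id="load", bash_command="echo loading")

    extract >> load  # run extract, then load
```

On Composer you would copy this file into the environment's `dags/` folder in GCS and the scheduler picks it up; locally the same file works under a self-managed Airflow install.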
-
Transition from a Quality engineer to Data engineer
This book might be a good resource for you: https://github.com/Nunie123/data_engineering_on_gcp_book
-
Accepted a data engineer intern role at a Big N company - how do I learn as much as possible?
If you want a place to start on personal projects you can check out this book, https://github.com/Nunie123/data_engineering_on_gcp_book, which will walk you through the basics of setting up a full data engineering stack.
-
What tools, software, programming languages, etc. does a data engineer need to have in 2021?
If you are interested in tooling, here's a free book on setting up a basic data engineering tech stack on GCP: https://github.com/Nunie123/data_engineering_on_gcp_book
What are some alternatives?
dali - Indie assembler/linker for Dalvik VM .dex & .apk files (Work In Progress)
shotcaller - A moddable RTS/MOBA game made with bracket-lib and minigene.
decent-signal - A decent WebRTC signalling library.
beubo - Beubo is a free, simple, and minimal CMS with unlimited extensibility using plugins
noteworthy - Noteworthy is a collection of experimental meta-protocols for building, deploying and managing distributed overlay networks.
distribyted - Torrent client with HTTP, fuse, and WebDAV interfaces. Start exploring your torrent files right away, even zip, rar, or 7zip archive contents!
go-plugin - Golang plugin system over RPC.
electron-browser-shell - A minimal, tabbed web browser with support for Chrome extensions—built on Electron.
meal-scheduler
scraper - Nodejs web scraper. Contains a command line, docker container, terraform module and ansible roles for distributed cloud scraping. Supported databases: SQLite, MySQL, PostgreSQL. Supported headless clients: Puppeteer, Playwright, Cheerio, JSdom.
vopono - Run applications through VPN tunnels with temporary network namespaces