Here is a list of open source projects that are said to be awesome for beginners.
-
Apache Spark
-
If you’re still new to development in general and not yet comfortable with development tools (an IDE, the terminal, etc.), check out The Missing Semester of Your CS Education. It covers the practical side of coding that isn’t taught in university courses. Work through it as you go.
-
Apache Airflow
-
dbt-core
dbt enables data analysts and engineers to transform their data using the same practices that software engineers use to build applications.
-
Apache Parquet
-
Apache Avro
-
sqlfluff
A modular SQL linter and auto-formatter with support for multiple dialects and templated code.
-
Apache Arrow
Apache Arrow is a multi-language toolbox for accelerated data interchange and in-memory processing
-
Apache Cassandra
-
Apache Hadoop
-
Apache Kafka
-
Delta Lake (delta, by delta-io)
An open-source storage framework that enables building a Lakehouse architecture with compute engines including Spark, PrestoDB, Flink, Trino, and Hive, plus APIs
-
Apache Pinot
-
Apache NiFi
-
Apache Hudi
-
Trino
Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)
Although Trino (formerly Presto) is on the awesome-for-beginners list, it’s also a really good data engineering project, since it is a distributed query engine that connects to most of the projects listed above. Depending on where you work in the project, you can gain depth of knowledge on the query engine, breadth across all the connectors, or a mix of both.
-
As our project has grown, we've seen first-hand how difficult it is for others to contribute to open source: setting up the development environment, understanding the codebase, drafting a PR, and so on. We've learned a lot from helping others contribute successfully to our project, so we share our thoughts in a blog post. Don't hesitate to reach out if you need help; we're happy to help you contribute to any of our projects or any other!