How to create projects for myself to enrich my resume?

This page summarizes the projects mentioned and recommended in the original post on /r/dataengineering

  • sqlfluff

    A modular SQL linter and auto-formatter with support for multiple dialects and templated code.

  • Include bells and whistles to impress the reader: Most projects will cover the common things like ETL scripts (e.g. SQL, Python, Airflow, dbt). To go the extra mile and stand out, also include data quality tests (e.g. dbt tests, Great Expectations, Soda), linting (e.g. sqlfluff, black), and CI pipelines that run linting and unit tests on the ETL code before it can be merged to main (e.g. GitHub Actions). In your README, include instructions on how to run the tests, linting, and CI pipelines, along with screenshots of the success or failure output to give the reader an example.
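As a rough illustration of the data quality tests mentioned above, here is a minimal sketch in plain Python. It is not how dbt tests, Great Expectations, or Soda work internally; it only shows the kind of not-null and uniqueness checks those tools express. The column name `order_id`, the sample rows, and all function names are hypothetical.

```python
from typing import Any, Mapping, Sequence

Row = Mapping[str, Any]


def find_nulls(rows: Sequence[Row], column: str) -> list[int]:
    """Return the indexes of rows where `column` is missing or None."""
    return [i for i, row in enumerate(rows) if row.get(column) is None]


def find_duplicates(rows: Sequence[Row], column: str) -> list[Any]:
    """Return each non-null value of `column` that appears more than once."""
    seen: set[Any] = set()
    duplicates: list[Any] = []
    for row in rows:
        value = row.get(column)
        if value is None:
            continue  # nulls are reported by find_nulls, not here
        if value in seen and value not in duplicates:
            duplicates.append(value)
        seen.add(value)
    return duplicates


def run_checks(rows: Sequence[Row]) -> list[str]:
    """Run all checks and return human-readable failure messages."""
    failures = []
    null_rows = find_nulls(rows, "order_id")
    if null_rows:
        failures.append(f"order_id is null in rows {null_rows}")
    dupes = find_duplicates(rows, "order_id")
    if dupes:
        failures.append(f"order_id has duplicate values {dupes}")
    return failures


# Hypothetical sample data; a real project would read from its warehouse.
SAMPLE_ORDERS = [
    {"order_id": 1, "amount": 20.0},
    {"order_id": 2, "amount": 35.5},
    {"order_id": 2, "amount": 12.0},
    {"order_id": None, "amount": 8.0},
]

if __name__ == "__main__":
    for message in run_checks(SAMPLE_ORDERS):
        print("FAILED:", message)
```

A unit test run in CI could then assert that `run_checks(...)` returns an empty list for the data a pipeline produces, so a failing check blocks the merge.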

  • awesome-readme

    A curated list of awesome READMEs

  • Provide a succinct and comprehensive README: Readers of your personal project will always start with the README to know where to begin. The goal of the README is to give the reader an understanding of the business problem you are trying to solve, how your solution solves it (with a solution architecture diagram), and how to get started and run your code. There are plenty of great README examples here: https://github.com/matiassingers/awesome-readme

  • modern-elt-demo

    A modern ELT demo using airbyte, dbt, snowflake and dagster

  • Break your project down into components and folders: Technical readers of your project will want to see that you have broken it down into logical folders so that the code appears organized. There's nothing worse than clicking on a GitHub link, seeing 40 files at the root of the repository, and asking yourself "where do I start?". Here is an example that I threw together in a day: https://github.com/Data-Engineer-Camp/modern-elt-demo
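One way to start from an organized layout is to scaffold the folders before writing any code. The sketch below is only an illustration: the folder names are hypothetical and are not taken from the linked modern-elt-demo repository, so adapt them to the components your project actually has.

```python
from pathlib import Path
import tempfile

# Hypothetical layout: one folder per pipeline component.
DEFAULT_FOLDERS = ("extract", "transform", "orchestrate", "tests", "docs")


def scaffold_project(root: Path, folders=DEFAULT_FOLDERS) -> list[Path]:
    """Create `root` with one subfolder per component and a placeholder README."""
    root.mkdir(parents=True, exist_ok=True)
    created = []
    for name in folders:
        folder = root / name
        folder.mkdir(exist_ok=True)
        # An empty .gitkeep lets git track the folder before it has code.
        (folder / ".gitkeep").touch()
        created.append(folder)
    readme = root / "README.md"
    if not readme.exists():
        readme.write_text(
            f"# {root.name}\n\n"
            "Describe the problem, the architecture, and how to run the code.\n"
        )
    return created


if __name__ == "__main__":
    # Demonstrate in a throwaway directory so nothing is left behind.
    with tempfile.TemporaryDirectory() as tmp:
        for folder in scaffold_project(Path(tmp) / "my-elt-project"):
            print(folder.name)
```

The point is less the script itself than the result: a repository root that holds a README and a handful of clearly named folders instead of dozens of loose files.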

  • diataxis-documentation-framework

    A systematic approach to creating better documentation.

  • High-quality blog articles: Writing blog articles is a great way to (1) solidify your understanding of a topic and (2) show readers and potential employers that understanding. Solidifying your understanding is really important for your personal development, and it will prove useful when an interviewer quizzes you on hard technical concepts and you are able to impress them with a concise and comprehensive explanation.

    "Ok, you've convinced me, now how do I write a high-quality blog article?" According to the diataxis documentation framework, there are several different kinds of documentation or blog article you can write. The ones I would recommend you focus on are explanation articles and how-to articles. Explanation articles, as the name suggests, explain a particular topic, e.g. "What is Spark?". How-to articles focus on documenting the steps to perform a specific task, e.g. "How to dockerize your ETL project?". See the diataxis framework for more detailed definitions and examples. Once you've written your articles, you can publish them on a blog site like Substack or Medium.

    Both personal projects and blog articles take effort. You may have to invest several weekends to get them to a quality you are happy with. Not everyone who sees your resume or LinkedIn profile will go through your personal projects and blog articles in detail, but a small portion of people will see and recognize the effort you have put in, and those are the people who will give you your first opportunity. I hope this helps, and good luck!

  • black

    The uncompromising Python code formatter