Include bells and whistles to impress the reader: most projects will cover the common things like ETL scripts (e.g. SQL, Python, Airflow, dbt). To go the extra mile and stand out, also include data quality tests (e.g. dbt tests, Great Expectations, Soda), linting (e.g. sqlfluff, black), and CI pipelines (e.g. GitHub Actions) that run linting and unit tests on your ETL code before it can be merged to main. In your README, include instructions on how to run the tests, linting, and CI pipelines, along with screenshots of passing and failing output so the reader can see what to expect.
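As a rough illustration of the data quality tests and unit tests mentioned above, here is a minimal sketch in plain Python. The `transform_orders` step, the `check_quality` helper, and the `order_id` and `amount` columns are all hypothetical names invented for this example; a real project would more likely lean on dbt tests, Great Expectations, or Soda for the checks themselves.

```python
from __future__ import annotations

from typing import Iterable


def transform_orders(rows: Iterable[dict]) -> list[dict]:
    """Hypothetical ETL step: drop rows with no order_id and cast amount to float."""
    cleaned = []
    for row in rows:
        if row.get("order_id") is None:
            continue  # rows without a key cannot be loaded downstream
        cleaned.append({"order_id": row["order_id"], "amount": float(row["amount"])})
    return cleaned


def check_quality(rows: list[dict]) -> list[str]:
    """Return a list of data quality failures; an empty list means all checks passed."""
    failures = []
    ids = [row["order_id"] for row in rows]
    if len(ids) != len(set(ids)):
        failures.append("order_id is not unique")
    if any(row["amount"] < 0 for row in rows):
        failures.append("amount contains negative values")
    return failures


def test_transform_orders_passes_quality_checks():
    """Unit test for the ETL step, written so a test runner such as pytest can collect it."""
    raw = [
        {"order_id": 1, "amount": "10.50"},
        {"order_id": None, "amount": "3.00"},  # dropped by the transform
        {"order_id": 2, "amount": "7"},
    ]
    result = transform_orders(raw)
    assert [row["order_id"] for row in result] == [1, 2]
    assert check_quality(result) == []
```

Saved in a file such as `test_orders.py` (a hypothetical name), this can be run locally with `pytest`, and a CI job can run the same command on every pull request so that a failing check blocks the merge.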
Provide a succinct and comprehensive README: readers of your personal project will always start with the README to know where to begin. The goal of the README is to give the reader an understanding of the business problem you are trying to solve, how your solution solves it (ideally with a solution architecture diagram), and how to get started and run your code. There are plenty of great README examples here: https://github.com/matiassingers/awesome-readme
Break your project down into components and folders: technical readers of your project will want to see that you have broken it down into logical folders so that the code appears organized. There's nothing worse than clicking on a GitHub link, seeing 40 files at the root of the repository, and asking yourself "where do I start?". Here is an example that I threw together in a day: https://github.com/Data-Engineer-Camp/modern-elt-demo
High quality blog articles
Writing blog articles is a great way to (1) solidify your understanding of a topic and (2) show readers and potential employers that understanding. Solidifying your understanding is really important for your personal development, and it will prove useful when an interviewer quizzes you on hard technical concepts and you are able to impress them with a concise and comprehensive explanation.
"Ok, you've convinced me - now how do I write a high quality blog article?" According to the diataxis documentation framework, there are several different kinds of documentation or blog article you can write. The ones I would recommend you focus on are explanation articles and how-to articles. Explanation articles, as the name suggests, explain a particular topic, e.g. "What is Spark?". How-to articles document the steps to perform a specific task, e.g. "How to dockerize your ETL project?". See the diataxis framework for more detailed definitions and examples. Once you've written your articles, you can publish them on a blog site like Substack or Medium.
Both personal projects and blog articles take effort. You may have to invest several weekends to get them to a quality you are happy with. Not everyone who sees your resume or LinkedIn profile will go through your personal projects and blog articles in detail, but a small portion of people will see and recognize the effort you have put in, and those are the people who will give you your first opportunity.
I hope this helps, and good luck!