|4 days ago||5 days ago|
|MIT License||GNU General Public License v3.0 or later|
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
Ask HN: How do you test SQL?
18 projects | news.ycombinator.com | 31 Jan 2023
This linter can really enforce some best practices https://github.com/sqlfluff/sqlfluff
A list of best practices:
What is something you would learn at college but not a bootcamp (hard skills)
2 projects | reddit.com/r/cscareerquestions | 12 Jan 2023
BigQuery SQL and SQLFluff
Is the knowledge on how Compilers work applicable to the role of a Data Engineer?
2 projects | reddit.com/r/dataengineering | 11 Jan 2023
There's a SQL parser/linter called SQLFluff that my team uses for our CI/CD. I've made a few pull requests to fix the parser for the particular SQL dialect we used, and my college compiler classes definitely helped.
sqlfluff VS ANTLR - a user suggested alternative
2 projects | 12 Dec 2022
How to create projects for myself to enrich my resume?
5 projects | reddit.com/r/dataengineering | 29 Oct 2022
Include bells and whistles to impress the reader: Most projects will have the common things like ETL scripts (e.g. SQL, Python, Airflow, dbt, etc) covered. To go the extra mile and stand out, you should also include things like data quality tests (e.g. dbt tests, great expectations, soda), linting scripts (e.g. sqlfluff, black), CI pipelines that check for linting and unit tests for ETL code before code can be merged to main (e.g. github actions). Include instructions on how to run those tests or linting or CI pipelines in your README file and include screenshots of the success or failure output to give the reader an example.
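As a rough illustration of the lint-and-test gate described above, here is a minimal Python sketch of a script a CI job could call before a merge. The directory names `models/` and `etl/`, the `ansi` dialect, and the helper name `run_checks` are assumptions made for the example, not taken from any of the linked projects; swap in whatever tools and paths your own project uses.

```python
import subprocess


def run_checks(commands):
    """Run each (name, argv) check in order and return the names that failed.

    A check fails if its command exits non-zero or the tool is not installed.
    """
    failed = []
    for name, argv in commands:
        try:
            result = subprocess.run(argv)
        except FileNotFoundError:
            # The tool is missing from the environment: count it as a failure.
            failed.append(name)
            continue
        if result.returncode != 0:
            failed.append(name)
    return failed


# Hypothetical project layout: SQL models in models/, Python ETL code in etl/.
CHECKS = [
    ("sqlfluff", ["sqlfluff", "lint", "models/", "--dialect", "ansi"]),
    ("black", ["black", "--check", "etl/"]),
]
```

A CI step (GitHub Actions or otherwise) could then end with something like `sys.exit(1 if run_checks(CHECKS) else 0)`, so that a non-empty failure list blocks the merge. The printed output of that step is also the kind of thing worth screenshotting for the README.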
I failed a coding interview. Can anyone help me solve this?
2 projects | reddit.com/r/SQL | 13 Oct 2022
Capitals I pretty much auto write, although I'll use the code formatter I wrote if someone sends me something messy. Bad and reused aliases, however, require manual fixing before I can get to the code review stage, so a PR using those will be rejected as needs work. sqlfluff is a decent formatter & linter if you need to get into details like that regularly.
Terraform - Pre commit hooks
2 projects | reddit.com/r/Terraform | 4 Oct 2022
How-to-Guide: Contributing to Open Source
19 projects | reddit.com/r/dataengineering | 11 Jun 2022
Ask HN: Preferred SQL Auto-Formatter?
2 projects | news.ycombinator.com | 21 May 2022
Not serving all of our needs but it did its job: https://github.com/sqlfluff/sqlfluff
This Week In Python
5 projects | dev.to | 8 Apr 2022
sqlfluff – A SQL linter and auto-formatter for Humans
New age ETL products every data team needs to know
2 projects | news.ycombinator.com | 23 Mar 2023
2. Reverse ETL:
Is it safe to update docker/docker-compose?
2 projects | reddit.com/r/synology | 9 Feb 2023
Here's the docker-compose file https://github.com/airbytehq/airbyte/blob/master/docker-compose.yaml
I'm trying to install https://airbyte.com/, which is a great self-hosted ELT platform. In simple terms, it's an app that can access all kinds of APIs to scrub the data and put it in a database. I really like the idea of being able to own my data and run all kinds of analyses on it.
Top 10 Best Open Source GitHub repos for Developers 2023
11 projects | dev.to | 7 Feb 2023
AirByte GitHub: https://github.com/airbytehq/airbyte
What are your thoughts on projects using the Elastic License?
2 projects | reddit.com/r/opensource | 26 Jan 2023
Doing a quick GitHub search reveals quite a few projects using the ELv2 license, including Airbyte and InvoiceNinja. Elastic (the company) aside, what are your thoughts on the Elastic License v2? Does your employer allow projects with an ELv2 license? Do you consider it open source? I understand that it's not OSI approved, but wondering where people stand when it comes to commercial open source software.
Airbyte Source Connectors performance bottleneck
2 projects | reddit.com/r/dataengineering | 15 Jan 2023
I have been using Airbyte sources, mainly S3, and they are very slow: I'm getting 1k-3k records per second on a high-end machine with 4 CPUs and 16 GB of RAM. I checked the stats of the Docker container and it is hardly utilising the resources; it only consumes CPU, with no memory usage at all. I read in https://github.com/airbytehq/airbyte/issues/12532 that the connectors are slow because they traverse one record at a time and print it. What can I do? I need a throughput of 20k-30k records per second.
Data Pipeline: From ETL to EL plus T
2 projects | dev.to | 8 Jan 2023
Yes, absolutely, Airbyte, and there are many similar solutions, but Airbyte is open source and relatively easy to use.
Show HN: Data integration platform with 300 open-source connectors
2 projects | news.ycombinator.com | 16 Dec 2022
Airbyte: Data integration platform with 300+ open-source connectors
2 projects | reddit.com/r/opensource | 16 Dec 2022
Just an advisory here, the (majority of the) shared platform itself is not Open Source. They provide an overview here.
What are some alternatives?
vscode-sqlfluff - An extension to use the sqlfluff linter in vscode.
Airflow - Apache Airflow - A platform to programmatically author, schedule, and monitor workflows
dagster - An orchestration platform for the development, production, and observation of data assets.
Prefect - The easiest way to build, run, and monitor data pipelines at scale.
spark-rapids - Spark RAPIDS plugin - accelerate Apache Spark with GPUs
jitsu - Jitsu is an open-source Segment alternative. Fully-scriptable data ingestion engine for modern data teams. Set-up a real-time data pipeline in minutes, not days
dbt-core - dbt enables data analysts and engineers to transform their data using the same practices that software engineers use to build applications.
dbt - dbt enables data analysts and engineers to transform their data using the same practices that software engineers use to build applications. [Moved to: https://github.com/dbt-labs/dbt-core]
dbt-utils - Utility functions for dbt projects.
supabase - The open source Firebase alternative. Follow to stay updated about our public Beta.
superset - Apache Superset is a Data Visualization and Data Exploration Platform