sqlfluff
airbyte
Metric | sqlfluff | airbyte
---|---|---
Mentions | 31 | 121
Stars | 5,771 | 9,991
Growth | 3.8% | 6.3%
Activity | 9.4 | 10.0
Last commit | 4 days ago | 5 days ago
Language | Python | Python
License | MIT License | GNU General Public License v3.0 or later
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
sqlfluff
- Ask HN: How do you test SQL?
This linter can really enforce some best practices https://github.com/sqlfluff/sqlfluff
A list of best practices:
- What is something you would learn at college but not a bootcamp (hard skills)
BigQuery SQL and SQLFluff
- Is the knowledge on how Compilers work applicable to the role of a Data Engineer?
There's a SQL parser/linter called SQLFluff that my team uses for our CI/CD. I've made a few pull requests to fix the parser for the particular SQL dialect we used, and my college compiler classes definitely helped.
- sqlfluff VS ANTLR - a user suggested alternative
2 projects | 12 Dec 2022
- How to create projects for myself to enrich my resume?
Include bells and whistles to impress the reader: Most projects will have the common things like ETL scripts (e.g. SQL, Python, Airflow, dbt, etc) covered. To go the extra mile and stand out, you should also include things like data quality tests (e.g. dbt tests, great expectations, soda), linting scripts (e.g. sqlfluff, black), CI pipelines that check for linting and unit tests for ETL code before code can be merged to main (e.g. github actions). Include instructions on how to run those tests or linting or CI pipelines in your README file and include screenshots of the success or failure output to give the reader an example.
- I failed a coding interview. Can anyone help me solve this?
Capitals I pretty much auto write, although I'll use the code formatter I wrote if someone sends me something messy. Bad and reused aliases, however, require manual fixing before I can get to the code review stage, so a PR using those will be rejected as needs work. sqlfluff is a decent formatter & linter if you need to get into details like that regularly.
- Terraform - Pre commit hooks
- How-to-Guide: Contributing to Open Source
SQLFluff
- Ask HN: Preferred SQL Auto-Formatter?
Not serving all of our needs but it did its job: https://github.com/sqlfluff/sqlfluff
- This Week In Python
sqlfluff – A SQL linter and auto-formatter for Humans
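Several of the mentions above describe running SQLFluff as a lint step in CI or in pre-commit hooks. A minimal sketch of such a step in Python might look like the following. It assumes the `sqlfluff` CLI is installed and on `PATH`; the helper names and the `models/` path are hypothetical and chosen only for illustration.

```python
import subprocess


def build_lint_command(paths, dialect="ansi"):
    """Build the argument list for `sqlfluff lint` (hypothetical helper)."""
    return ["sqlfluff", "lint", "--dialect", dialect, *paths]


def run_lint(paths, dialect="ansi"):
    """Run the linter and return its exit code.

    Assumes the sqlfluff CLI is installed. A non-zero exit code means the
    linter reported violations or failed to run, so a CI job can fail on it.
    """
    completed = subprocess.run(build_lint_command(paths, dialect), check=False)
    return completed.returncode


# Example use in a CI script (path and dialect are illustrative only):
#   import sys
#   sys.exit(run_lint(["models/"], dialect="bigquery"))
```

Keeping the command construction separate from the subprocess call makes the step easy to unit test without having the linter installed.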
airbyte
- New age ETL products every data team needs to know
- https://airbyte.com/
2. Reverse ETL:
- Is it safe to update docker/docker-compose?
Here's the docker-compose file https://github.com/airbytehq/airbyte/blob/master/docker-compose.yaml
I'm trying to install https://airbyte.com/, which is a great self-hosted ELT platform. In plain words, it's an app that can access all kinds of APIs to scrub the data and put it in a database. I really like the idea of being able to own my data and run all kinds of analyses on it.
- Top 10 Best Open Source GitHub repos for Developers 2023
AirByte GitHub: https://github.com/airbytehq/airbyte
- What are your thoughts on projects using the Elastic License?
Doing a quick GitHub search reveals quite a few projects using the ELv2 license, including Airbyte and InvoiceNinja. Elastic (the company) aside, what are your thoughts on the Elastic License v2? Does your employer allow projects with an ELv2 license? Do you consider it open source? I understand that it's not OSI approved, but wondering where people stand when it comes to commercial open source software.
- Airbyte Source Connectors performance bottleneck
I have been using Airbyte sources, mainly S3, and they are very slow: I'm getting 1k-3k records per second on a high-end machine with 4 CPUs and 16 GB RAM. I checked the stats of the Docker container and it is hardly utilising the resources: it consumes only CPU, with no memory usage at all. I read in this issue, https://github.com/airbytehq/airbyte/issues/12532, that the connectors are slow because they traverse one record at a time and print it. What can I do? I need a throughput of 20k-30k records per second.
- Data Pipeline: From ETL to EL plus T
Yes, absolutely, Airbyte, and there are many similar solutions, but Airbyte is open source and relatively easy to use.
- Show HN: Data integration platform with 300 open-source connectors
- Airbyte: Data integration platform with 300+ open-source connectors
Just an advisory here, the (majority of the) shared platform itself is not Open Source. They provide an overview here.
What are some alternatives?
vscode-sqlfluff - An extension to use the sqlfluff linter in vscode.
Airflow - Apache Airflow - A platform to programmatically author, schedule, and monitor workflows
dagster - An orchestration platform for the development, production, and observation of data assets.
Prefect - The easiest way to build, run, and monitor data pipelines at scale.
meltano
spark-rapids - Spark RAPIDS plugin - accelerate Apache Spark with GPUs
jitsu - Jitsu is an open-source Segment alternative. Fully-scriptable data ingestion engine for modern data teams. Set-up a real-time data pipeline in minutes, not days
dbt-core - dbt enables data analysts and engineers to transform their data using the same practices that software engineers use to build applications.
dbt - dbt enables data analysts and engineers to transform their data using the same practices that software engineers use to build applications. [Moved to: https://github.com/dbt-labs/dbt-core]
dbt-utils - Utility functions for dbt projects.
supabase - The open source Firebase alternative. Follow to stay updated about our public Beta.
superset - Apache Superset is a Data Visualization and Data Exploration Platform