hudi
sqlfluff
Our great sponsors
hudi | sqlfluff | |
---|---|---|
20 | 35 | |
5,053 | 7,199 | |
2.0% | 2.1% | |
9.9 | 9.6 | |
7 days ago | 4 days ago | |
Java | Python | |
Apache License 2.0 | MIT License |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
hudi
-
Getting Started with Flink SQL, Apache Iceberg and DynamoDB Catalog
Apache Iceberg is one of the three types of lakehouse, the other two are Apache Hudi and Delta Lake.
-
The "Big Three's" Data Storage Offerings
Structured, Semi-structured and Unstructured can be stored in one single format, a lakehouse storage format like Delta, Iceberg or Hudi (assuming those don't require low-latency SLAs like subsecond).
-
Data-eng related highlights from the latest Thoughtworks Tech Radar
Apache Hudi
- For those of you with Lakehouse Architectures, how do you handle duplicate records?
-
AWS ACID data lakehouse
Try Apache Hudi, it is fully integrated with AWS and offers almost everything that you requested.
-
Data n00b looking for guidance on how to setup data lake/warehouse
the corresponding kafka topics have 30d retention and I intend on having s3 sink connector for long term storage (open to other ideas here too, I noticed theres a hudi connector also)
- apache/hudi: Upserts, Deletes And Incremental Processing on Big Data.
- Big Data file formats
-
How-to-Guide: Contributing to Open Source
Apache Hudi
-
What do you use for Data versioning?
You could have a look at Apache Hudi - especially if you're running your Data Pipelines using Spark or Flink.
sqlfluff
-
🐍🐍 23 issues to grow yourself as an exceptional open-source Python expert 🧑💻 🥇
Repo : https://github.com/sqlfluff/sqlfluff
-
SQL Reserved Words – The Empirical List
I'm surprised sqlfluff hasn't been mentioned yet. Perhaps not a comprehensive list, but it's worked for everything I've thrown at it. There's an ANSI keyword list [0], and then dialect-specific lists for everything from DB2 [1] to Snowflake [2].
[0]: https://github.com/sqlfluff/sqlfluff/blob/main/src/sqlfluff/...
-
Show HN: Postgres Language Server
It has tons of annoying quirks, but I couldn't imagine running a DBT project without it: https://github.com/sqlfluff/sqlfluff
-
Front page news headline scraping data engineering project
Move SQL queries to sql files and read from files (Use sqlfluff to lint the code https://github.com/sqlfluff/sqlfluff)
- Anything like SQLFluff written in Rust?
-
Code autoformatter for SQL in VSCode that plays nicely with dbt
SQLFluff is a good CLI tool for this and includes support for jinja and dbt. I don't think there's a VSCode plugin for it yet.
-
Ask HN: How do you test SQL?
This linter can really enforce some best practices https://github.com/sqlfluff/sqlfluff
A list of best practices:
-
What is something you would learn at college but not a bootcamp (hard skills)
BigQuery SQL and SQLFluff
-
Is the knowledge on how Compilers work applicable to the role of a Data Engineer?
There's a SQL parser/linter called SQLFluff that my team uses for our CI/CD. I've made a few pull requests to fix the parser for the particular SQL dialect we used, and my college compiler classes definitely helped.
-
sqlfluff VS ANTLR - a user suggested alternative
2 projects | 12 Dec 2022
What are some alternatives?
iceberg - Apache Iceberg
vscode-sqlfluff - An extension to use the sqlfluff linter in vscode.
kudu - Mirror of Apache Kudu
sqlparse - A non-validating SQL parser module for Python
Trino - Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)
dbt-utils - Utility functions for dbt projects.
debezium - Change data capture for a variety of databases. Please log issues at https://issues.redhat.com/browse/DBZ.
ale - Check syntax in Vim/Neovim asynchronously and fix files, with Language Server Protocol (LSP) support
pinot - Apache Pinot - A realtime distributed OLAP datastore
soda-sql - Data profiling, testing, and monitoring for SQL accessible data.
delta - An open-source storage framework that enables building a Lakehouse architecture with compute engines including Spark, PrestoDB, Flink, Trino, and Hive and APIs
Metabase - The simplest, fastest way to get business intelligence and analytics to everyone in your company :yum: