corp vs pgsink
| | corp | pgsink |
|---|---|---|
| Mentions | 12 | 5 |
| Stars | 413 | 76 |
| Growth | -0.2% | - |
| Activity | 4.6 | 0.0 |
| Latest commit | 18 days ago | about 1 year ago |
| Language | - | Go |
| License | Apache License 2.0 | MIT License |
Stars - the number of stars that a project has on GitHub. Growth - month-over-month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
corp
- Are there database design Standards out there? As in, formal documents listing exact best practices for OLTP database design?
Here's one that covers some of your points and that I like in general: https://github.com/dbt-labs/corp/blob/main/dbt_style_guide.md Except instead of prefixing my table names with the processing stage, I keep them in schemas by processing stage (source, staging, analytics). So I can tell my analysts to look in the analytics schema for all the final tables, and they won't be bothered by intermediate models. The table names also have a precise structure that corresponds to our specific subject. (A sketch of this schema-per-stage layout appears after this list.)
- Looking to understand why the dbt style guide recommends using *all lower case* for keywords, field names, and function names?
- Best practices for data modeling with SQL and dbt
I find the content more or less ripped from dbt's own style guide.
- SQL Code Style Properties Questions
For anyone wondering, this is the dbt style guide I am referencing.
- A modern data stack for startups
While the tool choice is obvious, how to use dbt is going to be more controversial. There's a load of great resources on dbt best practices, but as you can see from my Slack questions, there's enough ambiguity to tie you up.
- Completed my first Data Engineering project with Kafka, Spark, GCP, Airflow, dbt, Terraform, Docker and more!
Just a slight critique, but I noticed some of the dbt models are a bit hard to read, especially your dim_users SCD2 model, which uses lots of nested subqueries and multiple columns on the same line. You may want to refer to this style guide from dbt Labs. I find CTEs are a lot easier to parse and read (see the CTE sketch after this list).
- What are some good resources for learning to write clean, production-quality code?
I really like this SQL style guide, and if you use dbt, the dbt style guide.
- How do you format your SQL queries?
I like this one from dbt very much.
- Where do you like to do the L of ELT? Python or DBT?
I recommend you write one. You can take inspiration from dbt's or GitLab's.
- Confused about benefits of CTE
I've seen the Fishtown Analytics coding conventions recommended a lot around here, but there are a few things about their recommendations on CTE use that confuse me.
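Since several of the mentions above come back to the style guide's CTE conventions, here is a minimal sketch of what that style looks like in practice: import CTEs up front, one logical step per CTE, a trailing final CTE, and lowercase keywords throughout (the same lowercase convention asked about earlier in this list). The tables and columns (source.users, source.orders, the counts model) are invented for illustration, not taken from the guide.

```sql
with

-- "Import" CTEs first: one per upstream table, selected in full
users as (
    select * from source.users
),

orders as (
    select * from source.orders
),

-- One logical step per CTE, named after what it produces
user_order_counts as (
    select
        users.id as user_id,
        count(orders.id) as order_count
    from users
    left join orders
        on orders.user_id = users.id
    group by 1
),

-- A trailing "final" CTE, so the model always ends with select * from final
final as (
    select
        user_id,
        order_count
    from user_order_counts
)

select * from final
```

Compared with nested subqueries, each step gets a name and can be read (and debugged) top to bottom, which is the readability point made in the project-critique comment above.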
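As for keeping models in schemas by processing stage (from the first comment in this list): in dbt this is typically done with the schema config rather than name prefixes. A hedged sketch, using a hypothetical stg_orders model and an app source that would need declaring in a sources YAML file; note that dbt's default generate_schema_name macro appends this custom schema to the target schema rather than using it verbatim.

```sql
-- models/staging/stg_orders.sql (hypothetical model name)
-- Route the model into a "staging" schema instead of prefixing its name.
{{ config(schema='staging', materialized='view') }}

select
    id as order_id,
    user_id,
    created_at
from {{ source('app', 'orders') }}  -- 'app' source assumed declared in YAML
```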
pgsink
- GitHub - go-jet/jet: Type safe SQL builder with code generation and automatic query result data mapping
This is a really awesome project. I’ve used it on https://github.com/lawrencejones/pgsink to generate type-safe bindings to the Postgres catalog tables, along with a few of the tables the project maintains itself.
- Trade-offs from using ULIDs at incident.io
pgx is really good: it's what I used to write logical decoders in https://github.com/lawrencejones/pgsink
- A modern data stack for startups
It used to be that companies would write their own hacky scripts to perform this extraction; I've had terrible incidents caused by ETL database triggers in the past, and even built a few generic ETL tools myself.
- Sync Postgres to BigQuery, possible? How?
- Ask HN: Show me your Half Baked project
Postgres change-capture device that supports high-throughput and low-latency capture to a variety of sinks (at first release, just Google BigQuery):
https://github.com/lawrencejones/pgsink
I know there's Debezium and Netflix's DBLog, but this project aims to be much simpler.
Forget about Kafka and any other dependency: just point it at Postgres, and your data will be pushed into BigQuery. And for people with highly performance-sensitive databases, the read workload has been designed with Postgres efficiency in mind.
I'm hoping pgsink could be a gateway drug to get small companies up and running with a data warehouse. If your datastore of choice is Postgres, it's a huge help to replicate everything into an analytics datastore. A similar tool has helped my company extract expensive work out of our primary database, which is super useful for scaling.
The project is 90% there, about 10hrs and some testing away from being usable. Once there, I'll be hitting up some start-up friends and seeing if they want to give it a whirl.
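The change capture described above is built on Postgres logical replication. Below is a minimal sketch of the underlying primitives using only stock Postgres functions; this is not pgsink's API, and test_decoding is just the built-in demo output plugin chosen for illustration. It assumes wal_level = logical and a role allowed to manage replication slots.

```sql
-- Create a logical replication slot with the built-in test_decoding plugin.
select * from pg_create_logical_replication_slot('demo_slot', 'test_decoding');

-- Make a change so there is something to decode.
create table widgets (id serial primary key, name text);
insert into widgets (name) values ('anvil');

-- Peek at the pending changes without consuming them...
select * from pg_logical_slot_peek_changes('demo_slot', null, null);

-- ...then consume them, which advances the slot.
select * from pg_logical_slot_get_changes('demo_slot', null, null);

-- Drop the slot when done; abandoned slots hold back WAL cleanup.
select pg_drop_replication_slot('demo_slot');
```

Slots are what keep the read workload cheap for the primary: the consumer streams decoded WAL instead of repeatedly querying tables, which is presumably the Postgres efficiency the author refers to.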
What are some alternatives?
nodejs-bigquery - Node.js client for Google Cloud BigQuery: A fast, economical and fully-managed enterprise data warehouse for large-scale data analytics.
pastty - Copy and paste across devices
sql-style-guide - An opinionated guide for writing clean, maintainable SQL.
dupver - Deduplicating VCS for large binary files in Go
terraform - Terraform enables you to safely and predictably create, change, and improve infrastructure. It is a source-available tool that codifies APIs into declarative configuration files that can be shared amongst team members, treated as code, edited, reviewed, and versioned.
DataflowTemplates - Cloud Dataflow Google-provided templates for solving in-Cloud data tasks
streamify - A data engineering project with Kafka, Spark Streaming, dbt, Docker, Airflow, Terraform, GCP and much more!
debezium-examples - Examples for running Debezium (Configuration, Docker Compose files etc.)
spark-bigquery-connector - BigQuery data source for Apache Spark: Read data from BigQuery into DataFrames, write DataFrames into BigQuery tables.
xact - Model based design for developers
dbt-metabase - dbt + Metabase integration
thgtoa - The Hitchhiker’s Guide to Online Anonymity