DataflowTemplates
pgsink
Metric | DataflowTemplates | pgsink
---|---|---
Mentions | 4 | 5
Stars | 1,089 | 76
Growth | 1.6% | -
Activity | 9.8 | 0.0
Last commit | 5 days ago | about 1 year ago
Language | Java | Go
License | Apache License 2.0 | MIT License
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
DataflowTemplates
- Which Database to use for rest api
Google provides a Dataflow template for copying from BigQuery to Datastore; see this Stack Overflow answer.
- Sync Postgres to BigQuery, possible? How?
- New to GCP - need help designing pipeline from production Heroku Postgres to BigQuery
Ah, looks like the template appends new rows by default. If I want to overwrite the table, I might be able to just change this line in the template code to WRITE_TRUNCATE (see here). Cool!
- Tricky Dataflow ep.1 : Auto create BigQuery tables in pipelines
However, learning to use Apache Beam, which is the open source framework behind Dataflow, is no bed of roses: the official documentation is sparse, GCP-provided templates don't work out-of-the-box, and the Javadoc is, well, a javadoc.
pgsink
- GitHub - go-jet/jet: Type safe SQL builder with code generation and automatic query result data mapping
This is a really awesome project. I’ve used it on https://github.com/lawrencejones/pgsink to generate type safe bindings to the Postgres catalog tables, along with a few of the tables the project maintains itself.
- Trade-offs from using ULIDs at incident.io
pgx is really good: it's what I used to write logical decoders in https://github.com/lawrencejones/pgsink
- A modern data stack for startups
It used to be that companies would write their own hacky scripts to perform this extraction - I've had terrible incidents caused by ETL database triggers in the past, and even built a few generic ETL tools myself.
- Sync Postgres to BigQuery, possible? How?
- Ask HN: Show me your Half Baked project
Postgres change-capture device that supports high-throughput and low-latency capture to a variety of sinks (at first release, just Google BigQuery):
https://github.com/lawrencejones/pgsink
I know there's Debezium and Netflix's DBLog, but this project aims to be much simpler.
Forget about Kafka and any other dependency: just point it at Postgres, and your data will be pushed into BigQuery. And for people with highly performance-sensitive databases, the read workload has been designed with Postgres efficiency in mind.
I'm hoping pgsink could be a gateway drug to get small companies up and running with a data warehouse. If your datastore of choice is Postgres, it's a huge help to replicate everything into an analytics datastore. A similar tool has helped my company extract expensive work out of our primary database, which is super useful for scaling.
The project is 90% there, about 10 hours and some testing away from being usable. Once there, I'll be hitting up some start-up friends and seeing if they want to give it a whirl.
What are some alternatives?
janusgraph - JanusGraph: an open-source, distributed graph database
pastty - Copy and paste across devices