 | flow | open-data |
---|---|---|
Mentions | 10 | 25 |
Stars | 506 | 2,221 |
Growth | 5.3% | 1.2% |
Activity | 9.7 | 0.0 |
Last commit | 5 days ago | 19 days ago |
Language | C++ | |
License | GNU General Public License v3.0 or later | GNU General Public License v3.0 or later |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
flow
-
Unexpected downsides of UUID keys in PostgreSQL
We use a macaddr8 that embeds a wall-clock timestamp (so the keys are in ascending order, achieving data locality) with some additional randomness. It's worked really well for us:
https://github.com/estuary/flow/blob/master/supabase/migrati...
We use macaddr8 instead of bigint because it has a Postgres serialization / JSON encoding which losslessly round-trips with browsers, and it works well with PostgREST. The same CANNOT be said for bigint, which is a huge footgun.
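A minimal sketch of the idea described above, in Python: pack a millisecond wall-clock timestamp into the high bits of a 64-bit value and random bits into the low bits, then print it in the colon-separated form PostgreSQL accepts for macaddr8. The 48/16 bit split and the names `new_id` and `survives_double` are assumptions for illustration, not the layout used in the linked migration. The second helper hints at why a plain bigint can be risky in browsers: JSON numbers are typically parsed as 64-bit floats, which represent integers exactly only up to 2^53.

```python
import secrets
import time

# Assumed layout for illustration: 48-bit millisecond timestamp + 16 random bits.
TIMESTAMP_BITS = 48
RANDOM_BITS = 16


def new_id(now_ms=None):
    """Return a time-ordered 64-bit ID formatted like a macaddr8 literal."""
    if now_ms is None:
        now_ms = time.time_ns() // 1_000_000
    timestamp = now_ms & ((1 << TIMESTAMP_BITS) - 1)
    value = (timestamp << RANDOM_BITS) | secrets.randbits(RANDOM_BITS)
    # Big-endian bytes keep the textual order aligned with the numeric order.
    return ":".join(f"{byte:02x}" for byte in value.to_bytes(8, "big"))


def survives_double(n):
    """True if integer n round-trips through a 64-bit float unchanged."""
    return int(float(n)) == n
```

Because the timestamp occupies the most significant bytes, IDs generated later sort after earlier ones, which is what gives the index locality mentioned in the post.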
-
Need Advice on Real-Time Data Synchronization from PostgreSQL to BigQuery: Airbyte vs. CloudQuery?
I can't claim to know much about CloudQuery, but we are an open-source platform with CDC connectors from PostgreSQL and materializations to BQ and elsewhere. We also have fully-managed connectors if you don't want to deal with hosting.
-
DAG orchestration for streaming data?
This is essentially how we model things in Flow (disclosure: I work there). We call them Derivations, which are data products that are built (derived) from other data products. Each data product (we call them Collections) is backed by a set of append-only logs, so they can be read by many different consumers at different times. IDK if our product can work for you since we don't (yet) support stuff like MQTT, but there's a pretty generous free tier if you'd be able to push the data over HTTP. Either way, I just think it's cool that others have independently arrived at similar ideas about how to model streaming tasks!
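The model described above (data products backed by append-only logs, with derived products reading them at their own pace) can be sketched in a few lines of Python. This is a toy illustration only: the `Collection` and `Derivation` classes and their methods are hypothetical and are not Flow's API.

```python
class Collection:
    """A hypothetical append-only log of documents."""

    def __init__(self):
        self._log = []

    def append(self, doc):
        self._log.append(doc)

    def read_from(self, offset):
        """Return the documents at or after `offset`, plus the next offset."""
        docs = self._log[offset:]
        return docs, offset + len(docs)


class Derivation:
    """Builds one collection from another by applying a transform."""

    def __init__(self, source, transform):
        self.source = source
        self.transform = transform
        self.output = Collection()
        self.offset = 0  # this reader's private position in the source log

    def poll(self):
        """Process whatever the source has appended since the last poll."""
        docs, self.offset = self.source.read_from(self.offset)
        for doc in docs:
            self.output.append(self.transform(doc))


# Chain two derivations to form a small DAG: readings -> doubled -> labelled.
readings = Collection()
doubled = Derivation(readings, lambda d: d * 2)
labelled = Derivation(doubled.output, lambda d: f"value={d}")

readings.append(1)
readings.append(2)
doubled.poll()
labelled.poll()
readings.append(3)  # arrives later; each reader resumes from its own offset
doubled.poll()
labelled.poll()
```

Since the source log is never mutated, each consumer only needs to remember an offset, so readers can join late or run at different speeds without coordinating with each other.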
- quickly replace a small airbyte instance in my stack
-
Advise on incremental process of Kafka data on Snowflake
We (Estuary: Git, Docs) have an open-source connector for Kafka -> Snowflake that could a) flatten the data and b) remove duplicates via exactly-once, end-to-end delivery.
-
Ask HN: Who is hiring? (September 2022)
Estuary Technology | Backend Engineer | Developer Evangelist | Rust, Go | REMOTE OR HYBRID | UTC-7 to UTC+2
Regional offices in NYC & Columbus, OH
Estuary (https://www.estuary.dev/) is the first real-time Data Operations platform for future-proof pipelines, including both historical and real-time data set up in minutes.
Our team is rapidly growing, VC funded and led by two successful, repeat founders.
We primarily develop in Rust and Go and are heavily built on top of gazette which is an internally developed streaming engine.
Flow: https://github.com/estuary/flow
Gazette: https://gazette.readthedocs.io/en/latest/
Backend Engineer: https://www.estuary.dev/about/#backend
Developer Evangelist: https://www.estuary.dev/about/#developerevangelist
^This is an exciting opportunity to make direct impact and shape user perception of a new product that brings a fresh experience to working with real-time data.
As this is a unique role, we are open to a variety of personas (data engineers, backend developers, Solutions Engineers and of course DevRel professionals).
Estuary offers full health benefits, competitive salary, unlimited PTO, 401K, equity, and a culture that values trust, transparency, and a flexible work environment to optimize your work/life balance.
To apply, send your resume and any questions to [email protected]
-
Who's Hiring? - August 2022
Flow, Gazette. We are looking for a backend engineer who is early in their career (around 1-3 years of industry experience) to join our team.
-
Ask HN: Who is hiring? (July 2022)
Estuary Technology | Junior Backend Engineer | Rust, Go | REMOTE OR HYBRID | Regional offices in NYC & Columbus, OH
Estuary (https://www.estuary.dev/) is the first real-time Data Operations platform for future-proof pipelines, including both historical and real-time data set up in minutes.
Our team is rapidly growing, VC funded and led by two successful, repeat founders.
We primarily develop in Rust and Go and are heavily built on top of gazette which is an internally developed streaming engine.
Flow: https://github.com/estuary/flow
Gazette: https://gazette.readthedocs.io/en/latest/
We are looking for a junior backend engineer with 2-3 years of industry experience.
For engineers who have an unquenched curiosity and drive to solve complex distributed systems problems, this is an opportunity to advance your career alongside a team of subject matter experts.
We are focused on expanding our catalog of open-source data connectors and building out our managed service platform.
ESTIMATED COMPENSATION: $110,000 - $150,000.
Estuary offers full health benefits, competitive salary, unlimited PTO, 401K, equity, and a culture that values trust, transparency, and a flexible work environment to optimize your work/life balance.
Email your resume to [email protected] to apply!
-
On 2022-04-05, the default branch will be renamed from “master” to “main”
It does seem like a weird bug that this would cause errors: https://github.com/estuary/flow/runs/5642694619?check_suite_... Seems like it should be some kind of warning instead of an error?
-
Ask HN: Is there a way to subscribe to an SQL query for changes?
where you'd subscribe for live updates.
[1]: https://github.com/estuary/flow
open-data
- How to practice data analytics skills
-
[OptaJoe]2009 - Arsenal have won a Premier League game they were losing at half-time outside of London for the first time since December 2009 (2-1 at Liverpool). Temperament.
You can check the StatsBomb open data, but you will need to preprocess it from JSON to SQL. They have a great course and articles about analyzing the data. Another good read is awesome-football, which provides a list of resources for getting data. But the most comprehensive and recommended resource is eddwebster's guide. He worked for City Football Group, and his repository is updated frequently.
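The JSON-to-SQL preprocessing mentioned above might look roughly like this, using only Python's standard library and an in-memory SQLite database. The nested field names (`type.name`, `team.name`, `player.name`, `location`) reflect my understanding of the StatsBomb event files and should be checked against the repository's documentation; the sample events and the `load_events` helper are made up for illustration.

```python
import json
import sqlite3

# Two made-up events in the shape assumed for StatsBomb event files.
SAMPLE_EVENTS = json.loads("""
[
  {"id": "a1", "minute": 3, "second": 12,
   "type": {"name": "Pass"}, "team": {"name": "Home"},
   "player": {"name": "Player One"}, "location": [60.0, 40.0]},
  {"id": "b2", "minute": 0, "second": 0,
   "type": {"name": "Half Start"}, "team": {"name": "Home"}}
]
""")


def load_events(conn, events):
    """Flatten nested event dicts into one row each and insert them."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS events (
               id TEXT PRIMARY KEY, minute INTEGER, second INTEGER,
               type TEXT, team TEXT, player TEXT, x REAL, y REAL)"""
    )
    rows = []
    for e in events:
        # Not every event has a location or a player, so fall back to NULL.
        x, y = (e.get("location") or [None, None])[:2]
        rows.append((
            e["id"], e.get("minute"), e.get("second"),
            e.get("type", {}).get("name"),
            e.get("team", {}).get("name"),
            e.get("player", {}).get("name"),
            x, y,
        ))
    conn.executemany(
        "INSERT OR REPLACE INTO events VALUES (?, ?, ?, ?, ?, ?, ?, ?)", rows
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
load_events(conn, SAMPLE_EVENTS)
```

For a real match file, `SAMPLE_EVENTS` would be replaced by `json.load()` on one of the downloaded event files, and the connection would point at a file on disk rather than `:memory:`.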
-
Enzo Fernández Progressive Passes - World Cup 2022
I tried visualising Enzo's progressive passes in each of his world cup matches. I used the data available on StatsBomb for this.
-
Football (soccer) player statistics - looking for free databases
https://www.football-data.org/coverage
https://datahub.io/collections/football
https://github.com/statsbomb/open-data
https://www.kaggle.com/datasets/hugomathien/soccer
https://www.kaggle.com/datasets/martj42/international-football-results-from-1872-to-2017
https://www.kaggle.com/datasets/secareanualin/football-events
https://www.kaggle.com/datasets/adityadesai13/european-football-database-20192020
https://www.kaggle.com/datasets/vivovinco/20212022-football-player-stats
https://www.kaggle.com/datasets/antoinekrajnc/soccer-players-statistics
-
Ask HN: Who is hiring? (September 2022)
StatsBomb | Multiple roles | REMOTE, or Bath (UK), or Cairo (Egypt)
StatsBomb is a sports analytics startup, covering football (both the soccer and American varieties) and soon basketball. We sell data products as well as analysis tools to sports, media and gambling organisations, with a tech pipeline that includes computer vision, machine learning, stream processing, and web-based dataviz. We count many of the biggest names in football as customers, and your work will have a direct impact on our ability to deliver insights to those customers, driving success on the field.
We're hiring software engineers of various stripes (data pipeline roles with Python and Clojure, full-stack web dev roles with JavaScript) and more besides. We're fully remote, but have offices in Bath, UK and Cairo, Egypt for those that want them. We organise regular team days and also run our own industry-leading conference each year.
- Apply at: https://statsbomb.com/careers
If you'd like to find out more about football analytics:
- Play with our open data: https://github.com/statsbomb/open-data
- Read our articles: https://statsbomb.com/articles/
- Browse our conference videos: https://www.youtube.com/channel/UCmZ2ArreL9muPvH49Gaw0Bw
-
[OC] Football Wind ⚽️💨 A wind map visualisation of a typical football game. Each particle is following a force field built from the aggregation of 882,536 passes from 890 matches played in various major leagues/cups.
The data source providing all the passes is StatsBomb.
-
🏆 TAA vs the u23 world: progressive passes/90 & xA/90
If you're familiar with GitHub and JSON then https://github.com/statsbomb/open-data looks decent.
-
Looking for football (soccer) granular datasets
The company StatsBomb, which specializes in football analytics, has made a lot of their data available for public use here: https://github.com/statsbomb/open-data I’ve been playing with it recently and I’ve found it to be pretty great.
-
[OC] Lionel Messi's shots and goals with Barcelona during his record-breaking 2011/2012 season, compared to his attempts in the 2014 and 2018 World Cups with Argentina
Messi has routinely been one of the best performers in European soccer, including his record-breaking 2011-2012 season in the Spanish league (“La Liga”) with Barcelona, where he set the record for most goals in a season. Unfortunately, success with the Argentina national team has frequently eluded him, finishing as a “runner-up” in the World Cup once and in the Copa America 3 times, before finally winning the Copa America in 2021. Critics often point to his difficulties with his national team as a fatal flaw. I was interested in how his scoring opportunities during arguably his best performance at Barcelona compared to his chances made with Argentina. The data suggests that he is regularly shooting from further away from goal when playing with Argentina when compared to his best performance with Barcelona, which could be a result of a number of factors (different team tactics, difficulty getting up the field, increasing age, less familiarity with teammates, etc.).
Data: 2011/2012 La Liga season and World Cup 2018 data were collected from the very nice, public datasets provided by StatsBomb at https://github.com/statsbomb/open-data. The World Cup 2014 data was a bit more difficult to find, but was scraped from the Huffington Post. The StatsBomb data has a ton of great stats to dig into, but because the Huffington Post data had less detail, I wasn't able to go into all of it with just this plot.
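A comparison like the one above comes down to measuring how far each shot was taken from goal. The sketch below is not the poster's method; it assumes shot locations are `[x, y]` pairs on a 120 x 80 pitch with the attacked goal centred at (120, 40), which is my understanding of the StatsBomb coordinate system and should be verified against their documentation. The function names are hypothetical.

```python
import math
from statistics import mean

# Assumed centre of the attacked goal on a 120 x 80 pitch.
GOAL_X, GOAL_Y = 120.0, 40.0


def shot_distance(location):
    """Straight-line distance from a shot location to the goal centre."""
    x, y = location[:2]  # shot locations may carry a third (height) value
    return math.hypot(GOAL_X - x, GOAL_Y - y)


def mean_shot_distance(shots):
    """Average distance over an iterable of shot locations."""
    return mean(shot_distance(s) for s in shots)
```

Computing `mean_shot_distance` separately for each competition's shots would give one number per group to compare, in the same units as the pitch coordinates.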
-
xG stats for individual shots.
I think StatsBomb has a free API you can use on GitHub if you request access. https://github.com/statsbomb/open-data
What are some alternatives?
realtime - Broadcast, Presence, and Postgres Changes via WebSockets
opendata - SkillCorner Open Data with 9 matches of broadcast tracking data.
timely-dataflow - A modular implementation of timely dataflow in Rust
geometry-api-java - The Esri Geometry API for Java enables developers to write custom applications for analysis of spatial data. This API is used in the Esri GIS Tools for Hadoop and other 3rd-party data processing solutions.
rethinkdb_rebirth - The open-source database for the realtime web.
sample-data - Metrica Sports sample tracking and event data
pldb - PLDB: a Programming Language Database. A computable encyclopedia about programming languages.
football_analytics - 📊⚽ A collection of football analytics projects, data, and analysis by Edd Webster (@eddwebster), including a curated list of publicly available resources published by the football analytics community.
Hasura - Blazing fast, instant realtime GraphQL APIs on your DB with fine grained access control, also trigger webhooks on database events.
nba-movement-data - SportVU movement tracking data.
github-actions - A GitHub Action for installing and configuring the gcloud CLI.
geomesa - GeoMesa is a suite of tools for working with big geo-spatial data in a distributed fashion.