legion vs open-data

legion

The Legion Parallel Programming System (by StanfordLegion)

legion.stanford.edu

open-data

Free football data from StatsBomb (by statsbomb)

Football football-data Soccer open-data sports-stats sports-data

statsbomb.com

Project         legion               open-data
Mentions        11                   25
Stars           649                  2,221
Growth          1.5%                 1.2%
Activity        9.9                  0.0
Latest Commit   about 1 month ago    18 days ago
Language        C++                  -
License         Apache License 2.0   GNU General Public License v3.0 or later

The number of mentions indicates the total number of mentions that we've tracked plus the number of user suggested alternatives.
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.

legion

Posts with mentions or reviews of legion. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2023-10-20.

Legion 24.03.0 – Control Replication
1 project | news.ycombinator.com | 28 Mar 2024
Antithesis of a One-in-a-Million Bug: Taming Demonic Nondeterminism
1 project | news.ycombinator.com | 22 Mar 2024

I work on a distributed runtime system for heterogeneous supercomputers [1].
As an example of the sort of bug we regularly deal with, I am at this exact moment tracking down a freeze that occurs on 8,192 nodes of a supercomputer [2]. That means I'm using about 64,000 GPUs and about half a million CPU cores. The smallest node count at which I've seen my issue is 2,048 nodes, and at that scale it only happens about 10% of the time.
We've been debating internally whether Antithesis could help us or not. On the one hand, the fuzzing to explore the state space, and deterministic reproduction, are exactly what we want. On the other hand, we believe our state space is much larger than what you see in a typical distributed database. (And not just because of the sheer scale of things, but even on a single node we have state machines with on the order of hundreds to thousands of states in them.) Based on the post here and the "scenario" count explored in CouchDB, I'm not convinced you'd be able to handle us. :-)
I'd be curious what you think. Happy to discuss here, or contact info in profile.
[1]: https://legion.stanford.edu/
[2]: https://www.olcf.ornl.gov/frontier/
Progress on No-GIL CPython
5 projects | news.ycombinator.com | 20 Oct 2023

Parallelism in CS is a bit like security in CS. People know it matters in the abstract sense but you really only get into it if you look for the training specifically. We're getting better at both over time: just as more languages/libraries/etc. are secure by default, more now are parallel by default. There's a ways to go, but I'm glad we didn't do this prematurely, because the technology has improved a lot in the last decade. Look for example at what we can do (safely!) with Rayon in Rust vs (unsafely!) with OpenMP in C++.
And there are things even further afield like what I work on [1][2][3].
[1]: https://legion.stanford.edu/
[2]: https://regent-lang.org/
[3]: https://github.com/nv-legate/cunumeric
Mojo is now available on Mac
13 projects | news.ycombinator.com | 19 Oct 2023

Chapel has at least several full-time developers at Cray/HPE and (I think) the US national labs, and has had some for almost two decades. That's much more than $100k.
Chapel is also just one of many other projects broadly interested in developing new programming languages for "high performance" programming. Out of that large field, Chapel is not especially related to the specific ideas or design goals of Mojo. Much more related are things like Codon (https://exaloop.io), and the metaprogramming models in Terra (https://terralang.org), Nim (https://nim-lang.org), and Zig (https://ziglang.org).
But Chapel is great! It has a lot of good ideas, especially for distributed-memory programming, which is its historical focus. It is more related to Legion (https://legion.stanford.edu, https://regent-lang.org), parallel & distributed Fortran, ZPL, etc.
Announcing Chapel 1.32
6 projects | news.ycombinator.com | 9 Oct 2023

I should also note that there is Pygion if you want to use Python. Not a lot of great reference material right now, but there's the paper:
https://legion.stanford.edu/pdfs/pygion2019.pdf
And code samples:
https://github.com/StanfordLegion/legion/tree/stable/binding...
Is anyone using PyPy for real work?
13 projects | news.ycombinator.com | 31 Jul 2023

We use PyPy for performing verification of our software stack [1], and also for profiling tools [2]. The verification tool is basically a complete reimplementation of our main product, and therefore encodes a massive amount of business logic (and is therefore difficult, if not impossible, to rewrite in another language). As with other users, we found the switch to PyPy was seamless and provided us with something like a 2.5x speedup out of the box, with (I think) higher speedups in some specific cases.
We eventually rewrote the profiler tool in Rust for additional speedups, but as mentioned for the verification engine, it's probably too complicated to ever do that so we really appreciate drop-in tools like PyPy that can speed up our code.
[1]: https://github.com/StanfordLegion/legion/blob/master/tools/l...
[2]: https://github.com/StanfordLegion/legion/blob/master/tools/l...
Make your programs run faster by better using the data cache (2020)
1 project | news.ycombinator.com | 23 Jun 2023

Legion is also doing something like that: https://legion.stanford.edu/
Is Parallel Programming Hard, and, If So, What Can You Do About It? [pdf]
4 projects | news.ycombinator.com | 19 Feb 2023

If you really want to dig into it you can read up on the tutorials and/or papers from the Legion project: https://legion.stanford.edu/
But briefly, these task-based programs preserve sequential semantics. That means (whatever the system actually does when running your program), as long as you follow the rules, the parallelism should be invisible to the execution of the program.
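A minimal sketch of what "preserving sequential semantics" can mean in practice. This is a toy model in Python, not Legion's API: the `Task` class, the `conflicts` rule, and the wave-based scheduler are all hypothetical names invented for illustration. Each task declares which keys of a shared store it reads and writes, the scheduler only runs tasks concurrently when they do not conflict, and the result is compared against a plain in-order run.

```python
from concurrent.futures import ThreadPoolExecutor


class Task:
    """A unit of work that declares which keys of a shared store it touches."""

    def __init__(self, name, reads, writes, fn):
        self.name = name
        self.reads = set(reads)
        self.writes = set(writes)
        self.fn = fn  # "the rules": fn(store) reads only `reads`, writes only `writes`


def conflicts(a, b):
    # Two tasks conflict if either one writes a key the other reads or writes.
    return bool(a.writes & (b.reads | b.writes)) or bool(b.writes & a.reads)


def run_sequential(tasks, store):
    # Reference behaviour: run the tasks one after another in program order.
    for task in tasks:
        task.fn(store)
    return store


def run_parallel(tasks, store, workers=4):
    # deps[j] holds the earlier tasks that task j conflicts with and must wait for.
    deps = [
        {i for i in range(j) if conflicts(tasks[i], tasks[j])}
        for j in range(len(tasks))
    ]
    done = set()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        while len(done) < len(tasks):
            # Every task whose dependencies have finished can run in this wave.
            # Tasks in the same wave never conflict with each other.
            ready = [j for j in range(len(tasks)) if j not in done and deps[j] <= done]
            list(pool.map(lambda j: tasks[j].fn(store), ready))
            done.update(ready)
    return store


def make_tasks():
    def assign(key, compute):
        return lambda s: s.__setitem__(key, compute(s))

    return [
        Task("t1", ["a"], ["c"], assign("c", lambda s: s["a"] + 10)),
        Task("t2", ["b"], ["d"], assign("d", lambda s: s["b"] * 2)),
        Task("t3", ["c", "d"], ["a"], assign("a", lambda s: s["c"] + s["d"])),
        Task("t4", ["a"], ["b"], assign("b", lambda s: s["a"] - 1)),
    ]


initial = {"a": 1, "b": 2, "c": 0, "d": 0}
sequential_result = run_sequential(make_tasks(), dict(initial))
parallel_result = run_parallel(make_tasks(), dict(initial))
print(sequential_result == parallel_result)
```

Here `t1` and `t2` touch disjoint keys and can run together, while `t3` and `t4` wait for the tasks they depend on. As long as each task honours its declared reads and writes, the parallel run ends in the same state as the sequential one.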
Ask HN: Who is hiring? (September 2022)
20 projects | news.ycombinator.com | 1 Sep 2022

Computer Science Research Dept., SLAC National Accelerator Laboratory | Research Scientist / Engineer | Menlo Park, CA or REMOTE, VISA | Full Time
We're a research group within SLAC, headed by Alex Aiken (https://theory.stanford.edu/~aiken/). We focus on fundamental CS research that has the potential to impact science, mainly in the areas of high-performance and distributed computing, programming languages, compilers, networks, operating systems, etc. One of our major projects is Legion, a forward-looking programming system for distributed computing (https://legion.stanford.edu/). Legion has been used to create new programming languages (https://regent-lang.org/), seamless distributed NumPy (https://developer.nvidia.com/cunumeric), and a drop-in replacement for Keras and PyTorch (https://flexflow.ai/), among many other things.
We are looking for strong scientists and engineers to join our group. For clarity (because these terms vary by industry/company), scientists mainly focus on producing research results (e.g., papers and research software) while engineers mainly focus on software development and deliverables (e.g., system or application implementation). For scientist positions please expect to provide a CV with relevant publications.
The official application links are below, but please feel free to contact me directly if you have questions. (My HN username @slac.stanford.edu)
Scientist (Computer Science):
https://erp-hprdext.erp.slac.stanford.edu/psp/hprdext/EMPLOY...
Engineer (Computer Science):
https://erp-hprdext.erp.slac.stanford.edu/psp/hprdext/EMPLOY...
We've had some reports that the application site doesn't work well in Google Chrome. You might want to apply in Firefox.
The Underwhelming Impact of Software Engineering Research (April 2022)
4 projects | news.ycombinator.com | 9 Apr 2022

There are some points in the middle, but it's rare. I worked on one of these [1]. We've been building the system for just over ten years, and are starting to see some truly killer apps being built on top of it [2, 3].
While it has some great benefits once you arrive, the upfront costs are enormous. You basically need to find a funding source (or sources) that will pay for this product while you're building it. Also, in order for the research payoff to be worth it, you need both the product itself, and subsequent innovations it enables, to be research-worthy. Not all areas of research can support this. On top of it all, even when you do this, you'll still spend years of effort in activities that are essentially not research. You're basically responsible for all of your own customer support, sales, marketing, etc.---like a startup, but without the financial upside if you succeed. Yes there is recognition and so on, but the payoffs aren't as dramatic. Most people aren't ready to commit to this path.
Keep in mind that you can't build this in 5 years either. So a single generation of PhD students can't get it done. The only reason we were successful is because the key staff on the project stuck around for 5+ years after their PhDs because we all believed in doing the work.
Given all that, I don't hold it against people at all who just want to build prototypes and then move on to the next thing. It's way less risky and higher reward relative to the costs.
[1]: https://legion.stanford.edu/
[2]: https://flexflow.ai/
[3]: https://developer.nvidia.com/cunumeric

open-data

Posts with mentions or reviews of open-data. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2023-12-25.

How to practice data analytics skills
3 projects | news.ycombinator.com | 25 Dec 2023
[OptaJoe] 2009 - Arsenal have won a Premier League game they were losing at half-time outside of London for the first time since December 2009 (2-1 at Liverpool). Temperament.
3 projects | /r/soccer | 18 Feb 2023

You can check the StatsBomb open data, but you will need to preprocess it from JSON to SQL. They have a great course and articles about analyzing the data. Another good read is awesome-football, which provides a list of resources for getting data. But the most comprehensive and recommended resource is eddwebster's guide. He worked for City Football Group, and his repository is updated frequently.
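The JSON-to-SQL preprocessing mentioned above could look roughly like the following sketch, which flattens nested event records into a SQLite table using only the Python standard library. The sample records and their field names (`id`, `minute`, `type.name`, `player.name`, `location`) are illustrative assumptions, not a statement of the actual StatsBomb schema; check the repository's documentation before relying on them.

```python
import json
import sqlite3

# Illustrative sample only: the field names are assumptions made for this sketch.
SAMPLE_EVENTS = json.loads("""
[
  {"id": "e1", "minute": 3, "type": {"name": "Pass"},
   "player": {"name": "Player A"}, "location": [35.0, 40.0]},
  {"id": "e2", "minute": 3, "type": {"name": "Shot"},
   "player": {"name": "Player B"}, "location": [108.0, 38.5]},
  {"id": "e3", "minute": 7, "type": {"name": "Pass"},
   "player": {"name": "Player A"}}
]
""")


def flatten(event):
    """Turn one nested event dict into a flat row; missing fields become NULL."""
    location = event.get("location") or [None, None]
    return (
        event["id"],
        event.get("minute"),
        event.get("type", {}).get("name"),
        event.get("player", {}).get("name"),
        location[0],
        location[1],
    )


def load_events(conn, events):
    """Create the events table if needed and insert one row per event."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS events ("
        "id TEXT PRIMARY KEY, minute INTEGER, type TEXT, player TEXT, x REAL, y REAL)"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO events VALUES (?, ?, ?, ?, ?, ?)",
        (flatten(event) for event in events),
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
load_events(conn, SAMPLE_EVENTS)
passes_by_player = conn.execute(
    "SELECT player, COUNT(*) FROM events WHERE type = 'Pass' GROUP BY player"
).fetchall()
print(passes_by_player)
```

For real files, the inline sample would be replaced by `json.load` on each downloaded file, and the in-memory database by a file path.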
Enzo Fernández Progressive Passes - World Cup 2022
1 project | /r/chelseafc | 1 Feb 2023

I tried visualising Enzo's progressive passes in each of his world cup matches. I used the data available on StatsBomb for this.
Football (soccer) player statistics - looking for free databases
1 project | /r/datasets | 21 Nov 2022

https://www.football-data.org/coverage https://datahub.io/collections/football https://github.com/statsbomb/open-data https://www.kaggle.com/datasets/hugomathien/soccer https://www.kaggle.com/datasets/martj42/international-football-results-from-1872-to-2017 https://www.kaggle.com/datasets/secareanualin/football-events https://www.kaggle.com/datasets/adityadesai13/european-football-database-20192020 https://www.kaggle.com/datasets/vivovinco/20212022-football-player-stats https://www.kaggle.com/datasets/antoinekrajnc/soccer-players-statistics
Ask HN: Who is hiring? (September 2022)
20 projects | news.ycombinator.com | 1 Sep 2022

StatsBomb | Multiple roles | REMOTE, or Bath (UK), or Cairo (Egypt)
StatsBomb is a sports analytics startup, covering football (both the soccer and American varieties) and soon basketball. We sell data products as well as analysis tools to sports, media and gambling organisations, with a tech pipeline that includes computer vision, machine learning, stream processing, and web-based dataviz. We count many of the biggest names in football as customers, and your work will have a direct impact on our ability to deliver insights to those customers, driving success on the field.
We're hiring software engineers of various stripes (data pipeline roles with Python and Clojure, full-stack web dev roles with JavaScript) and more besides. We're fully remote, but have offices in Bath, UK and Cairo, Egypt for those that want them. We organise regular team days and also run our own industry-leading conference each year.
- Apply at: https://statsbomb.com/careers
If you'd like to find out more about football analytics:
- Play with our open data: https://github.com/statsbomb/open-data
- Read our articles: https://statsbomb.com/articles/
- Browse our conference videos: https://www.youtube.com/channel/UCmZ2ArreL9muPvH49Gaw0Bw
[OC] Football Wind ⚽️💨 A wind map visualisation of a typical football game. Each particle is following a force field built from the aggregation of 882,536 passes from 890 matches played in various major leagues/cups.
1 project | /r/dataisbeautiful | 24 Jun 2022

All of the pass data comes from StatsBomb.
🏆 TAA vs the u23 world: progressive passes/90 & xA/90
1 project | /r/FantasyPL | 19 Jan 2022

If you're familiar with GitHub and JSON then https://github.com/statsbomb/open-data looks decent.
Looking for football (soccer) granular datasets
1 project | /r/datasets | 17 Jan 2022

The company StatsBomb, which specializes in football analytics, has made a lot of their data available for public use here: https://github.com/statsbomb/open-data I’ve been playing with it recently and I’ve found it to be pretty great.
[OC] Lionel Messi's shots and goals with Barcelona during his record-breaking 2011/2012 season, compared to his attempts in the 2014 and 2018 World Cups with Argentina
2 projects | /r/dataisbeautiful | 21 Dec 2021

Messi has routinely been one of the best performers in European soccer, including his record-breaking 2011-2012 season in the Spanish league (“La Liga”) with Barcelona, where he set the record for most goals in a season. Unfortunately, success with the Argentina national team has frequently eluded him, finishing as a “runner-up” in the World Cup once and in the Copa America 3 times, before finally winning the Copa America in 2021. Critics often point to his difficulties with his national team as a fatal flaw. I was interested in how his scoring opportunities during arguably his best performance at Barcelona compared to his chances made with Argentina. The data suggests that he is regularly shooting from further away from goal when playing with Argentina when compared to his best performance with Barcelona, which could be a result of a number of factors (different team tactics, difficulty getting up the field, increasing age, less familiarity with teammates, etc.).
Data: 2011/2012 La Liga season and World Cup 2018 data were collected from the very nice, public datasets provided by StatsBomb at https://github.com/statsbomb/open-data. The World Cup 2014 data was a bit more difficult to find, but was scraped from the Huffington Post. The StatsBomb data has a ton of great stats to dig into, but because the Huffington Post data had less detail, I wasn't able to go into all of it with just this plot.
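A comparison of shooting distance like the one described above comes down to measuring each shot's distance from the goal and averaging per dataset. The sketch below shows that arithmetic. It assumes pitch coordinates on a 120 x 80 grid with the attacked goal centered at (120, 40); the shot locations are made-up placeholder values, not data from the post or from StatsBomb.

```python
import math

# Assumption for this sketch: 120 x 80 pitch units, goal centre at (120, 40).
GOAL_CENTER = (120.0, 40.0)


def shot_distance(location, goal=GOAL_CENTER):
    """Straight-line distance from a shot location (x, y) to the goal centre."""
    return math.hypot(goal[0] - location[0], goal[1] - location[1])


def mean_shot_distance(locations):
    """Average distance to goal over a list of (x, y) shot locations."""
    if not locations:
        raise ValueError("need at least one shot location")
    return sum(shot_distance(loc) for loc in locations) / len(locations)


# Hypothetical placeholder shots, chosen only to make the arithmetic easy to follow.
club_shots = [(108.0, 40.0), (108.0, 31.0)]
national_shots = [(90.0, 40.0), (96.0, 22.0)]

club_mean = mean_shot_distance(club_shots)
national_mean = mean_shot_distance(national_shots)
print(f"club: {club_mean:.1f}, national team: {national_mean:.1f}")
```

A larger mean for one dataset would correspond to shooting from further out on average, in the same pitch units as the coordinates.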
xG stats for individual shots.
1 project | /r/SoccerBetting | 24 Jul 2021

I think StatsBomb has a free API you can use on GitHub if you request access. https://github.com/statsbomb/open-data

What are some alternatives?

When comparing legion and open-data you can also consider the following projects:

pldb - PLDB: a Programming Language Database. A computable encyclopedia about programming languages.

opendata - SkillCorner Open Data with 9 matches of broadcast tracking data.

preshed - 💥 Cython hash tables that assume keys are pre-hashed

geometry-api-java - The Esri Geometry API for Java enables developers to write custom applications for analysis of spatial data. This API is used in the Esri GIS Tools for Hadoop and other 3rd-party data processing solutions.

arkouda - Arkouda (αρκούδα): Interactive Data Analytics at Supercomputing Scale 🐻

sample-data - Metrica Sports sample tracking and event data

legate.sparse

football_analytics - 📊⚽ A collection of football analytics projects, data, and analysis by Edd Webster (@eddwebster), including a curated list of publicly available resources published by the football analytics community.

HTR-solver - Hypersonic Task-based Research (HTR) solver for the Navier-Stokes equations at hypersonic Mach numbers including finite-rate chemistry for dissociating air and multicomponent transport.

nba-movement-data - SportVU movement tracking data.

soleil-x - Soleil-X is a turbulence/particle/radiation solver written in the Regent language for execution with the Legion runtime.

geomesa - GeoMesa is a suite of tools for working with big geo-spatial data in a distributed fashion.