plpgsql_check
plpgsql_check is a linter (a static source code analyzer) for PL/pgSQL, the native language for PostgreSQL stored procedures.
geometry-api-java
The Esri Geometry API for Java enables developers to write custom applications for analysis of spatial data. This API is used in the Esri GIS Tools for Hadoop and other 3rd-party data processing solutions.
We (StatsBomb) make a bunch of stuff available at:
https://github.com/statsbomb/open-data
Interesting stuff from Metrica too:
https://github.com/metrica-sports/sample-data
There are other less legit sources that you may be able to work out for yourself. :)
It's good software (I've used it for more than a decade), but I found GEOS to be a sticking point. On very large polygons, e.g. 10k to 1 million vertices, memory leaks are not uncommon and performance drops off considerably. Debugging SQL -> C -> C++ is not fun, and hacking C++ geometry code when it's not part of your normal work is nigh on impossible. I've found the Esri Geometry API for Java to be by far the best geometry API out there. It's harder to use initially and obviously JVM-specific, but faster and more reliable. It's a very good fit for Hadoop / Spark or other JVM applications. Ignore the brand name: I'm not affiliated, and it's FOSS under an Apache license.
https://github.com/Esri/geometry-api-java
If you really need to scale beyond what Postgres/PostGIS can handle, then you might want to check out GeoMesa[1], which is (very loosely) "PostGIS for HBase, Cassandra, or Google Bigtable".
That being said, you may not need it, because Postgres/PostGIS can scale vertically to handle larger datasets than most people realize. I recommend loading your intended data (or your best simulation of it) into a Postgres instance running on one of the extremely large VMs available on your cloud provider, and running a load test with a distribution of the queries you'd expect. Assuming the deliberately over-provisioned instance is able to handle the queries, you can then run some experiments to "right-size" the instance to find the right balance of compute, memory, SSD, etc. If it can handle the queries but not at the QPS you need, then read replicas may also be a good solution.
[1] https://github.com/locationtech/geomesa
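The load test described above ("a distribution of the queries you'd expect") could be sketched roughly like this in Python. This is only a minimal illustration under some assumptions: `run_load_test`, `query_mix`, and the query names are made up for the example, and `run_query` is a placeholder callable that in a real test would execute the matching SQL against your Postgres/PostGIS instance through whatever driver you use. It runs queries one at a time, so it measures latency per query type rather than throughput under concurrent load.

```python
import random
import statistics
import time
from typing import Callable, Dict, List


def run_load_test(
    run_query: Callable[[str], None],
    query_mix: Dict[str, float],
    total_queries: int = 1000,
    seed: int = 0,
) -> Dict[str, Dict[str, float]]:
    """Issue queries sampled from a weighted mix and summarize latency per query name.

    run_query is a hypothetical hook: it receives a query name and is expected
    to execute the corresponding SQL against the database under test.
    """
    rng = random.Random(seed)
    names = list(query_mix)
    weights = [query_mix[name] for name in names]
    latencies: Dict[str, List[float]] = {name: [] for name in names}

    for _ in range(total_queries):
        # Pick the next query according to the expected production mix.
        name = rng.choices(names, weights=weights, k=1)[0]
        start = time.perf_counter()
        run_query(name)
        latencies[name].append(time.perf_counter() - start)

    summary: Dict[str, Dict[str, float]] = {}
    for name, samples in latencies.items():
        if not samples:
            continue  # this query type was never sampled
        ordered = sorted(samples)
        # Simple index-based estimate of the 95th percentile.
        p95_index = min(len(ordered) - 1, int(0.95 * len(ordered)))
        summary[name] = {
            "count": len(ordered),
            "median_s": statistics.median(ordered),
            "p95_s": ordered[p95_index],
        }
    return summary


if __name__ == "__main__":
    # Stand-in for a real database call; replace with code that runs your SQL.
    def fake_query(name: str) -> None:
        time.sleep(0.001)

    mix = {"bbox_lookup": 0.8, "nearest_neighbor": 0.2}  # illustrative weights
    for query_name, stats in run_load_test(fake_query, mix, total_queries=50).items():
        print(query_name, stats)
```

Running the same mix against the over-provisioned instance and then against progressively smaller ones gives you comparable median and p95 numbers for the "right-sizing" step. To estimate QPS you would need to run several of these loops concurrently.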