Perform computation over 500 million vectors

This page summarizes the projects mentioned and recommended in the original post on reddit.com/r/bigdata

  • Apache Spark

    Apache Spark - A unified analytics engine for large-scale data processing

    I would guess that Apache Spark would be an okay choice, with the data stored locally in Avro or Parquet files. Just processing the data in Python would also work, IMO.
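    As a rough illustration of the "just process it in Python" option, here is a minimal sketch that walks a large 2-D array in fixed-size chunks so that only one chunk is in memory at a time. The original post does not say what the computation is, so the per-vector L2 norm, the function names `chunked_norms` and `mean_norm`, and the chunk size are all assumptions made for the example. NumPy is assumed to be installed.

```python
import numpy as np


def chunked_norms(vectors, chunk_size=1_000_000):
    """Yield the L2 norm of each row of `vectors`, one chunk at a time.

    `vectors` is any 2-D array-like that supports slicing (hypothetical input).
    """
    for start in range(0, vectors.shape[0], chunk_size):
        # Only this slice is materialized in memory.
        chunk = np.asarray(vectors[start:start + chunk_size], dtype=np.float64)
        yield np.linalg.norm(chunk, axis=1)


def mean_norm(vectors, chunk_size=1_000_000):
    """Average row norm, accumulated across chunks (stand-in computation)."""
    total = 0.0
    count = 0
    for norms in chunked_norms(vectors, chunk_size):
        total += float(norms.sum())
        count += norms.size
    return total / count if count else 0.0


# Tiny stand-in for the real data set: three 2-D vectors.
demo = np.array([[3.0, 4.0], [6.0, 8.0], [0.0, 0.0]])
demo_mean = mean_norm(demo, chunk_size=2)
```

    For data that lives on disk as a raw binary file, `np.memmap` could supply the `vectors` argument without loading the whole file. Whether a single machine is fast enough depends on the vector dimension and on the actual computation, neither of which is given in the post.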

NOTE: The number of mentions on this list counts mentions in common posts plus user-suggested alternatives. Hence, a higher number means a more popular project.
