Ask HN: Where to run embarrassingly parallel, Integer, no SIMD workloads?

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com.

  • exact

  • >> The workload is memory bound, not compute bound.

    > I don't think so? How many gigabytes per second per core are you processing?

    That's what the Intel VTune profiler tells me: on a 4-core Haswell Xeon it reports 39.2% Memory Bound, which includes 21.7% of clock ticks L1 Bound (execution stalled on data that was in L1) and 12.4% L3 Bound.

    > If for some reason you can talk about this problem to random SIMD programmers online privately but you cannot post about this problem publicly

    I can talk about it publicly; I just did not want to get distracted from the actual hardware question. I recently started contributing to this open-source project: https://gitlab.com/JoD/exact. The algorithm tries to find a valid assignment for a set of constraints of the form 4x1 - 3x57 + 1*not(x1232) <= 4 (some special cases are already accelerated). We guess an assignment for a variable and check all constraints; sometimes the constraints imply assignments to other variables (if x1 is true and x1232 is false, x57 has to be true), and those get propagated too. One technique, called watch propagation, can be used for the SAT family of constraints (clauses). This technique is incompatible with branching along assignments. I find SIMD over clauses dubious, since clauses are sparse, have different lengths, and are mostly accessed randomly.

    The embarrassing parallelism comes from being able to work on different parts of the parameter space and exchange clauses learned from conflicts. We are not doing that yet, but we plan to do something HordeSAT-like over MPI (there is a slightly cleverer tree-exchange variant using MPI all-to-all, but I do not have that reference handy).

    We have some horrible sins (such as virtual method table lookups in loops, no -march=native compiler flag on the main branch, ...) that the main developer introduced and we have not cleaned up yet. If I could nerd-snipe you into running some experiments with that codebase and contributing some SIMD loops, that would be great (with -march=native -mtune=native, only 4 functions are currently vectorized, and none of them matter for performance). For all the divisibility checking, I currently plan this: https://www.reddit.com/r/exact/comments/wokfhl/resource_on_f... (we spend 3% of compute time in the standard library's modulo).
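
The propagation step described in the comment can be illustrated with a minimal Python sketch. It is not exact's implementation. It assumes a constraint is stored as a list of (coefficient, literal) pairs with each variable appearing at most once, and it recomputes everything from scratch instead of updating counters incrementally as a real solver would. Negative coefficients are first moved onto the negated literal (for c < 0, c*x = c + |c|*not(x)), so 4x1 - 3x57 + 1*not(x1232) <= 4 becomes 4x1 + 3*not(x57) + 1*not(x1232) <= 7. Any unassigned literal whose coefficient exceeds the remaining slack must then be false. The names normalize, lit_value and propagate_leq are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# A literal is (variable, negated): (57, True) stands for not(x57).
Literal = Tuple[int, bool]
Term = Tuple[int, Literal]  # (coefficient, literal)


def normalize(terms: List[Term], rhs: int) -> Tuple[List[Term], int]:
    """Rewrite sum(c * lit) <= rhs so that every coefficient is positive.

    For c < 0, c * lit == c + |c| * not(lit), so the term becomes
    |c| * not(lit) and |c| is added to the right-hand side.
    Zero coefficients are dropped.
    """
    out: List[Term] = []
    for c, (var, neg) in terms:
        if c < 0:
            out.append((-c, (var, not neg)))
            rhs -= c  # c is negative, so this adds |c|
        elif c > 0:
            out.append((c, (var, neg)))
    return out, rhs


def lit_value(lit: Literal, assignment: Dict[int, bool]) -> Optional[bool]:
    """Truth value of a literal, or None if its variable is unassigned."""
    var, neg = lit
    if var not in assignment:
        return None
    return assignment[var] != neg


def propagate_leq(terms: List[Term], rhs: int,
                  assignment: Dict[int, bool]) -> Tuple[bool, Dict[int, bool]]:
    """Check sum(c * lit) <= rhs (all c > 0) under a partial assignment.

    Returns (conflict, implied); implied maps each forced variable
    to the value it must take.
    """
    true_sum = sum(c for c, lit in terms if lit_value(lit, assignment) is True)
    slack = rhs - true_sum
    if slack < 0:
        return True, {}  # the true literals alone already exceed rhs
    implied: Dict[int, bool] = {}
    for c, (var, neg) in terms:
        if var not in assignment and c > slack:
            # Making this literal true would exceed rhs, so it must be false;
            # the literal (var, neg) is false exactly when var == neg.
            implied[var] = neg
    return False, implied


terms, rhs = normalize([(4, (1, False)), (-3, (57, False)), (1, (1232, True))], 4)
print(propagate_leq(terms, rhs, {1: True, 1232: False}))  # (False, {57: True})
```

With x1 true and x1232 false, the true literals contribute 4 + 1 = 5 against a bound of 7, leaving a slack of 2; not(x57) has coefficient 3 > 2, so it is forced false, i.e. x57 is forced true, matching the example in the comment.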
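
The watch propagation mentioned for clauses presumably refers to the two-watched-literal scheme used by CDCL SAT solvers. A minimal Python sketch of that general scheme (not exact's code) follows. It assumes DIMACS-style literals (v for x_v, -v for not(x_v)), clauses with at least two distinct literals, and it only reports conflicts, with no backtracking or clause learning. Each clause watches the literals in its first two slots, and when a literal becomes false only the clauses watching it are visited. The class and function names are hypothetical.

```python
from collections import defaultdict, deque
from typing import Dict, List, Optional


def value(lit: int, assign: Dict[int, bool]) -> Optional[bool]:
    """Truth value of a DIMACS-style literal, or None if unassigned."""
    v = assign.get(abs(lit))
    if v is None:
        return None
    return v if lit > 0 else not v


class WatchedClauses:
    def __init__(self, clauses: List[List[int]]):
        # Invariant: clause[0] and clause[1] are the two watched literals.
        self.clauses = [list(c) for c in clauses]
        self.watches: Dict[int, List[int]] = defaultdict(list)
        for i, c in enumerate(self.clauses):
            self.watches[c[0]].append(i)
            self.watches[c[1]].append(i)

    def assign(self, lit: int, assign: Dict[int, bool]) -> bool:
        """Make lit true and propagate unit clauses; return False on conflict."""
        queue = deque([lit])
        while queue:
            cur_lit = queue.popleft()
            cur = value(cur_lit, assign)
            if cur is False:
                return False  # cur_lit was already forced the other way
            if cur is True:
                continue      # already set by an earlier implication
            assign[abs(cur_lit)] = cur_lit > 0
            false_lit = -cur_lit  # only clauses watching -cur_lit need a visit
            old = self.watches[false_lit]
            self.watches[false_lit] = []
            for pos, ci in enumerate(old):
                c = self.clauses[ci]
                if c[0] == false_lit:  # keep the falsified watch in slot 1
                    c[0], c[1] = c[1], c[0]
                if value(c[0], assign) is True:
                    self.watches[false_lit].append(ci)  # clause already satisfied
                    continue
                for k in range(2, len(c)):
                    if value(c[k], assign) is not False:
                        c[1], c[k] = c[k], c[1]  # watch a non-false literal instead
                        self.watches[c[1]].append(ci)
                        break
                else:
                    # No replacement found: clause is unit on c[0] or conflicting.
                    self.watches[false_lit].append(ci)
                    if value(c[0], assign) is False:
                        self.watches[false_lit].extend(old[pos + 1:])
                        return False
                    queue.append(c[0])  # c[0] is implied true
        return True


solver = WatchedClauses([[1, 2], [-1, 3], [-3, -2, 4]])
a: Dict[int, bool] = {}
print(solver.assign(1, a), a)  # True {1: True, 3: True}
```

Each assignment touches only the clauses that watch the falsified literal, and those clauses are scattered through memory, which is in line with the comment's point that SIMD over clauses is awkward.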
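
The reddit link above is truncated, so it is unclear which method it describes. As one illustration of testing divisibility without a hardware division, here is the "direct computation" check from Lemire, Kaser and Kurz (2019) for 32-bit unsigned operands: precompute M = floor((2^64 - 1) / d) + 1 once per divisor; then n is divisible by d exactly when the low 64 bits of n * M are at most M - 1. The sketch is in Python and masks to 64 bits to mimic unsigned wraparound. It assumes 2 <= d < 2^32 and 0 <= n < 2^32, and the names divisibility_magic and is_divisible are hypothetical. Whether this pays off in exact depends on how often the same divisor is reused, which the comment does not say.

```python
MASK64 = (1 << 64) - 1  # emulate uint64_t wraparound


def divisibility_magic(d: int) -> int:
    """Precompute M = floor((2**64 - 1) / d) + 1 for a 32-bit divisor d >= 2."""
    if not 2 <= d < (1 << 32):
        raise ValueError("divisor must be in [2, 2**32)")
    return MASK64 // d + 1


def is_divisible(n: int, m: int) -> bool:
    """True iff n % d == 0 for 32-bit n: one 64-bit multiply and one compare."""
    return ((n * m) & MASK64) <= m - 1


m3 = divisibility_magic(3)
print([n for n in range(10) if is_divisible(n, m3)])  # [0, 3, 6, 9]
```

In C or C++ the same test is (uint64_t)n * M <= M - 1 using ordinary unsigned arithmetic, with M computed once when the divisor is known.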
