NEC’s Forgotten FPUs

frovedis

1 64 4.0 C++

Framework of vectorized and distributed data analytics

All good questions.
1) It is a custom instruction set; you can read the ISA guide over at https://www.hpc.nec/documentation
2) The main difference, in simple terms, is that AVX instructions have a fixed vector length (4, 8, 16 elements, etc.). With the SX the vector length is flexible, so it can be 10, 4, or anything up to max_vlen (up to 256 on the latest ones). Essentially, the idea is that a single instruction can replace a whole for loop. Without a good compiler, though, that means you have to rewrite your nested loops yourself.
3) There are currently two options when it comes to the compiler: you can use the proprietary NCC or the open source LLVM fork NEC maintains. NCC is less compatible than GCC/Clang (modern C++17 in particular is problematic) but has a lot of advanced algorithms for taking your loops, rewriting them, and vectorizing them automatically. The LLVM fork currently supports assembly instruction intrinsics, but they are still working on contributing better loop auto-vectorization to LLVM.
4) Porting software is not terribly difficult to get working, but it is quite a bit harder to get it performing very well, depending on the type of workload. Since the scalar core is pretty standard, you can almost always take regular CPU code and get it running (unlike GPU code in general). If you don't leverage the vector processor, though, the performance you get will be nothing special, especially at 1.6 GHz. Most of the software made for it starts off as CPU code and is then modified with pragmas or some refactoring to get good performance on the VE. In almost all cases the resulting code still runs on a CPU just fine. One example of a project that supports both in a single code-base is the Frovedis framework [1].
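To make point 2 concrete, here is a minimal sketch in plain C++ of how a loop maps onto a flexible vector length. It is a scalar emulation, not VE intrinsics or NEC's API: the function name `saxpy_strip_mined` and the constant `kMaxVlen` are made up for illustration, and the value 256 is simply the "up to 256" figure quoted above. The inner loop stands in for what would be one vector instruction.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Assumption: maximum vector length, taken from the "up to 256" figure above.
constexpr std::size_t kMaxVlen = 256;

// Computes y[i] += a * x[i] in strips of at most max_vlen elements.
// Returns the number of strips, i.e. how many "vector instructions"
// this emulation would have issued.
std::size_t saxpy_strip_mined(float a,
                              const std::vector<float>& x,
                              std::vector<float>& y,
                              std::size_t max_vlen = kMaxVlen) {
    assert(max_vlen > 0);
    const std::size_t n = std::min(x.size(), y.size());
    std::size_t strips = 0;
    for (std::size_t base = 0; base < n; base += max_vlen) {
        // The vector length is chosen per strip, so the last strip is
        // simply shorter; no separate scalar tail loop is needed.
        const std::size_t vl = std::min(max_vlen, n - base);
        // Stand-in for a single vector instruction operating on vl elements.
        for (std::size_t i = 0; i < vl; ++i) {
            y[base + i] += a * x[base + i];
        }
        ++strips;
    }
    return strips;
}
```

With a fixed-width instruction set, the leftover elements of the final strip typically need a masked operation or a scalar remainder loop; here the last strip just runs with a smaller `vl`.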
I think the chip deserves a little more interest than it gets. It's one of the few accelerators that 1) you can buy today, right now, 2) has open source drivers [2], and 3) can run TensorFlow [3]. The lack of fp16 support really hurt it for deep learning, but it's like having a 1080 with 48 GB of RAM; there are still lots of interesting things you can do with that.
[1]: https://github.com/frovedis/frovedis
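
The "CPU code modified with pragmas" approach from point 4 might look roughly like the sketch below. This is not taken from Frovedis. It assumes, as I recall from NEC's compiler documentation, that the VE compilers predefine `__ve__` and accept `#pragma _NEC ivdep`; check both against the NCC manual before relying on them. The function name `multiply_add` is hypothetical.

```cpp
#include <cstddef>

// Element-wise out[i] = a[i] * b[i] + c[i].
// The same source is meant to build for a regular CPU and for the VE.
void multiply_add(const double* a, const double* b, const double* c,
                  double* out, std::size_t n) {
#ifdef __ve__  // Assumption: predefined when targeting the Vector Engine.
// Assumption: NCC directive telling the vectorizer to ignore assumed
// loop-carried dependencies. The caller must guarantee that out does
// not overlap a, b, or c.
#pragma _NEC ivdep
#endif
    for (std::size_t i = 0; i < n; ++i) {
        out[i] = a[i] * b[i] + c[i];
    }
}
```

On other compilers the guarded pragma disappears and the function is an ordinary loop, which is how one code-base can keep running on a CPU unchanged.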

ve_drv-kmod

1 5 1.3 C

SX-Aurora TSUBASA Vector Engine device driver kernel module

tensorflow

2 9 0.0 C++

TensorFlow for SX-Aurora TSUBASA forked from https://github.com/tensorflow/tensorflow (by sx-aurora-dev)

NOTE: The number of mentions on this list indicates mentions on common posts plus user suggested alternatives. Hence, a higher number means a more popular project.

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com
sx-aurora-tsubasa Machine Learning Spark scikit-learn Distributed Computing
Post date: 3 Sep 2021
