It really depends on the type of data you are using, and there may (or may not) be some trade-offs and sacrifices. There are frameworks that translate a neural network described in high-level Python code into equivalent HLS code optimized for low-latency inference on FPGAs. Two worth exploring are hls4ml and FINN, both of which can achieve low-latency neural network inference on FPGAs using Xilinx Vitis HLS. I found these about a year ago when I ran a similar experiment, though with a much lower latency target (a few hundred ns) and a very simple MLP with a 1D signal as input. I'm not sure whether better alternatives are available as of 2023. Conceptually, they all work on the same principle: a supporting framework or methodology first quantizes the network and limits the data precision to fixed point. The framework then applies dataflow techniques when generating the HLS code, so that the resulting RTL aims for the lowest overall latency.
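To make the "quantize first, limit precision to fixed point" step concrete, here is a minimal pure-Python sketch of what rounding values to a signed fixed-point grid looks like for one small dense layer. It is not the API of hls4ml or FINN; the function names `quantize_fixed` and `dense_relu`, the 16-bit/6-integer-bit default, and the choice of round-to-nearest with saturation are all assumptions made for illustration.

```python
import math


def quantize_fixed(x, total_bits=16, int_bits=6):
    """Round x to a signed fixed-point grid, saturating at the range limits.

    total_bits is the full word width and int_bits counts the integer bits
    including the sign bit. Both defaults are illustrative assumptions.
    """
    frac_bits = total_bits - int_bits
    scale = 1 << frac_bits                    # number of steps per unit
    lowest = -(1 << (total_bits - 1))         # most negative integer code
    highest = (1 << (total_bits - 1)) - 1     # most positive integer code
    code = math.floor(x * scale + 0.5)        # round to nearest, ties upward
    code = max(lowest, min(highest, code))    # saturate instead of wrapping
    return code / scale


def dense_relu(inputs, weights, biases, total_bits=16, int_bits=6):
    """One dense layer followed by ReLU, with quantized operands and outputs.

    Hypothetical helper: weights is a list of rows, one row per output neuron.
    """
    def q(value):
        return quantize_fixed(value, total_bits, int_bits)

    outputs = []
    for row, bias in zip(weights, biases):
        acc = sum(q(w) * q(x) for w, x in zip(row, inputs)) + q(bias)
        outputs.append(q(max(0.0, acc)))      # ReLU, then re-quantize
    return outputs


if __name__ == "__main__":
    print(quantize_fixed(0.3, total_bits=8, int_bits=2))
    print(dense_relu([0.5, -0.25], [[1.0, 2.0], [1.5, -1.0]], [0.25, 0.0]))
```

Shrinking `total_bits` shows the trade-off mentioned above: fewer bits mean cheaper arithmetic in hardware but a coarser grid and an earlier saturation point, which is why the acceptable precision depends on your data.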