|  | DALI | DREAMPlace |
|---|---|---|
| Mentions | 5 | 2 |
| Stars | 4,917 | 622 |
| Stars growth (monthly) | 1.0% | - |
| Activity | 9.6 | 7.4 |
| Last commit | 2 days ago | 11 days ago |
| Language | C++ | C++ |
| License | Apache License 2.0 | BSD 3-clause "New" or "Revised" License |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
DALI
- [D] Will data augmentations work faster on TPUs?
Another option is DALI: https://github.com/NVIDIA/DALI. For my project training EfficientNetV2, it was a game changer. But it is way harder to implement in code than TorchVision or Kornia.
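For a sense of what that extra implementation effort looks like, here is a minimal sketch of a DALI training pipeline on the PyTorch side. The operators and iterator are DALI's documented Python API, but the data path, batch size, and augmentation choices are illustrative assumptions:

```python
# Minimal DALI pipeline sketch for PyTorch training.
# Assumes images live in class-labelled subdirectories under file_root.
from nvidia.dali import pipeline_def
import nvidia.dali.fn as fn
import nvidia.dali.types as types
from nvidia.dali.plugin.pytorch import DALIGenericIterator

@pipeline_def(batch_size=64, num_threads=4, device_id=0)
def train_pipeline(file_root):
    jpegs, labels = fn.readers.file(file_root=file_root,
                                    random_shuffle=True, name="Reader")
    images = fn.decoders.image(jpegs, device="mixed")  # JPEG decode on the GPU
    images = fn.resize(images, resize_x=224, resize_y=224)
    images = fn.crop_mirror_normalize(
        images,
        dtype=types.FLOAT,
        output_layout="CHW",
        mean=[0.485 * 255, 0.456 * 255, 0.406 * 255],
        std=[0.229 * 255, 0.224 * 255, 0.225 * 255],
        mirror=fn.random.coin_flip(),  # random horizontal flip
    )
    return images, labels

pipe = train_pipeline("/path/to/train")  # hypothetical dataset path
pipe.build()
loader = DALIGenericIterator(pipe, ["images", "labels"], reader_name="Reader")
for batch in loader:
    images, labels = batch[0]["images"], batch[0]["labels"]
    # ...forward/backward pass...
```

The win over TorchVision-style loading comes from doing decode and augmentation on the GPU inside the pipeline rather than in Python worker processes.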
- DirectStorage - Loading data to GPU *directly* from the SSD drive, almost without using CPU
Check out https://github.com/nvidia/DALI
- mmap_ninja: Speedup your training dramatically by using memory-mapped files for your dataset
Small question if you are using a GPU: how does this compare to GPUDirect Storage from Nvidia? Can you get even more speedup by using both? I've never toyed with it, but the DALI project from Nvidia seems to tackle the same data-loading problem.
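For contrast, the memory-mapped approach can be sketched with plain NumPy and PyTorch. This is not mmap_ninja's own API, and `samples.npy` is a hypothetical pre-serialized array of samples:

```python
# Memory-mapped dataset sketch: the OS pages data in on demand,
# so startup is near-instant and resident RAM stays bounded.
import numpy as np
import torch
from torch.utils.data import Dataset

class MmapDataset(Dataset):
    def __init__(self, path):
        # mmap_mode="r" maps the file read-only instead of loading it;
        # nothing is read from disk until a sample is indexed.
        self.data = np.load(path, mmap_mode="r")

    def __len__(self):
        return len(self.data)

    def __getitem__(self, idx):
        # Copy the slice so the returned tensor owns its memory.
        return torch.from_numpy(np.array(self.data[idx]))

ds = MmapDataset("samples.npy")  # hypothetical pre-serialized dataset
```

GPUDirect Storage addresses a different hop: it moves data from NVMe into GPU memory without a CPU bounce buffer, so in principle the two techniques are complementary rather than competing.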
- [D] Efficiently loading videos in PyTorch without extracting frames
DREAMPlace
- A Simulated Annealing FPGA Placer in Rust
Yes, see "DREAMPlace: Deep Learning Toolkit-Enabled GPU Acceleration for Modern VLSI Placement".[1] The technique reformulates VLSI placement as a nonlinear optimization problem, which is how ML frameworks (broadly) work: optimizing approximations to high-dimensional nonlinear functions. So it's not like shoving the netlist into an LLM or an existing network or anything.
Note that DREAMPlace is a global placer; it also comes with a detailed placer, but global placement is what it targets. I don't know of a comparable research project for the routing phase that follows placement, but maybe someone else does.
[1] https://github.com/limbo018/DREAMPlace
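To make the reformulation concrete, here is a toy sketch (not DREAMPlace's actual code): cell coordinates are the trainable parameters, and a smooth log-sum-exp proxy for half-perimeter wirelength is the loss. The net list, sizes, and optimizer settings are made up, and a real placer adds a density term so cells don't collapse onto each other:

```python
# Toy "placement as nonlinear optimization" sketch in PyTorch.
import torch

n_cells, n_nets = 100, 40
pos = torch.randn(n_cells, 2, requires_grad=True)  # (x, y) per cell
nets = [torch.randint(0, n_cells, (4,)) for _ in range(n_nets)]  # random 4-pin nets
gamma = 0.5  # smoothing parameter for the log-sum-exp approximation

def wirelength(pos):
    # Half-perimeter wirelength per net is (max-min) over x and y, which
    # isn't differentiable; gamma*logsumexp(v/gamma) is a smooth max proxy.
    total = 0.0
    for net in nets:
        xy = pos[net]  # pin coordinates of this net, shape (pins, 2)
        total = total + gamma * (
            torch.logsumexp(xy / gamma, dim=0)    # ~ max over pins
            + torch.logsumexp(-xy / gamma, dim=0) # ~ -min over pins
        ).sum()
    return total

opt = torch.optim.Adam([pos], lr=0.05)
for step in range(200):
    opt.zero_grad()
    loss = wirelength(pos)  # a real placer adds a density penalty here
    loss.backward()         # gradients w.r.t. cell positions, for free
    opt.step()
```

Everything the ML framework provides (autodiff, GPU kernels, first-order optimizers) transfers directly, which is the point of the paper's framing.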
- Nvidia: GPUs can do better chip design in a few days than 10 man-years
Huge part of why OpenROAD (and, as this article indicates, Nvidia) are so focused on machine learning! The nitty-gritty of chip design has abundant gnarly problems requiring deep, deep expertise. Deploying software engineers at them is hard. But building ML is kind of our bag!
There's another nice upstart open-source project with even fancier ML placement systems that spawned recently out of the OpenROAD world, DREAMPlace: https://github.com/limbo018/DREAMPlace
This is just gonna tilt more & more away from a couple of super smart engineers whom we've deeply entrusted to divine the inner workings of chips, & become increasingly a set of better-modelled problems that we can optimize with machine learning.
What are some alternatives?
Pytorch - Tensors and Dynamic neural networks in Python with strong GPU acceleration
tensorRT_Pro - C++ library based on tensorrt integration
Blurry - Blurry is an easy blur library for Android
tiny-cuda-nn - Lightning fast C++/CUDA neural network framework
vision - Datasets, Transforms and Models specific to Computer Vision
deepdetect - Deep Learning API and Server in C++14, with support for Caffe, PyTorch, TensorRT, Dlib, NCNN, TensorFlow, XGBoost and t-SNE
executorch - On-device AI across mobile, embedded and edge for PyTorch
onnxruntime - ONNX Runtime: cross-platform, high performance ML inferencing and training accelerator
MegEngine - MegEngine is a fast, scalable, easy-to-use deep learning framework with automatic differentiation
Cores-VeeR-EH1 - VeeR EH1 core
ocaml-torch - OCaml bindings for PyTorch
ncnn - ncnn is a high-performance neural network inference framework optimized for the mobile platform