heavydb
bolt
 | heavydb | bolt
---|---|---
Mentions | 1 | 6
Stars | 2,902 | 2,463
Growth | 0.3% | -
Activity | 8.4 | 0.0
Latest commit | about 1 month ago | over 1 year ago
Language | C++ | C++
License | Apache License 2.0 | Mozilla Public License 2.0
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
- Show HN: Want something better than k-means? Try BanditPAM
> frown on that sort of dataset
That example was definitely contrived and designed to strongly illustrate the point. I'll counter slightly that non-peaky topologies aren't uncommon, but they're unlikely to look like anything that would push KMedoids to a pathological state rather than just a slightly worse state ("worse" assuming that KMeans is the right choice for a given problem).
> worth pointing out .. data reference
Totally agreed. I hope my answer didn't come across as too negative. It's good work, and everyone else was talking about the positives, so I didn't want to spend too much time echoing them while getting the other points across.
> bolt reference
https://github.com/dblalock/bolt
They say as much in their paper, but they aren't the first vector quantization library by any stretch. Their contributions are, roughly:
1. If you're careful selecting the right binning strategy then you can cancel out a meaningful amount of discretization error.
2. If you do that, you can afford to choose parameters that fit everything nicely into AVX2 machine words, turning 100s of branching instructions into 1-4 instructions.
3. Doing some real-world tests to show that (1-2) matter.
Last I checked their code wasn't very effective for the places I wanted to apply it, but the paper is pretty solid. I'd replace it with a faster KMeans approximation less likely to crash on big data (maybe even initializing with KMedoids :) ), and if the thing you're quantizing is trainable with some sort of gradient update step then you should do a few optimization passes in the discretized form as well.
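To make the lookup-table idea behind points 1-2 concrete, here is a minimal sketch of generic product quantization in plain Python. It is not Bolt's code or API: every function name below is hypothetical, the codebooks come from an ordinary Lloyd's k-means, and nothing here uses AVX2 or Bolt's binning strategy. It only shows how a dot product against a compressed vector reduces to a handful of table lookups.

```python
import random

def split(vec, n_sub):
    """Split vec into n_sub equal-length contiguous subvectors."""
    step = len(vec) // n_sub
    return [vec[i * step:(i + 1) * step] for i in range(n_sub)]

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def kmeans(points, k, iters, rng):
    """Plain Lloyd's k-means; returns k centroids."""
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        buckets = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: sq_dist(p, centroids[c]))
            buckets[nearest].append(p)
        for c, bucket in enumerate(buckets):
            if bucket:  # keep the old centroid if a cluster went empty
                centroids[c] = [sum(col) / len(bucket) for col in zip(*bucket)]
    return centroids

def train_codebooks(data, n_sub, k, iters=10, seed=0):
    """Learn one codebook of k centroids per subvector position."""
    rng = random.Random(seed)
    subs = [split(v, n_sub) for v in data]
    return [kmeans([s[j] for s in subs], k, iters, rng) for j in range(n_sub)]

def encode(vec, codebooks):
    """Replace each subvector with the index of its nearest centroid."""
    return [min(range(len(cb)), key=lambda c: sq_dist(sub, cb[c]))
            for sub, cb in zip(split(vec, len(codebooks)), codebooks)]

def decode(codes, codebooks):
    """Rebuild the approximate vector by concatenating the chosen centroids."""
    return [x for c, cb in zip(codes, codebooks) for x in cb[c]]

def build_lut(query, codebooks):
    """lut[j][c] = dot(query subvector j, centroid c of codebook j)."""
    return [[dot(q_sub, cent) for cent in cb]
            for q_sub, cb in zip(split(query, len(codebooks)), codebooks)]

def approx_dot(codes, lut):
    """Approximate dot(query, original vector) with one lookup per subvector."""
    return sum(lut[j][c] for j, c in enumerate(codes))

# Demo on random data: 8-dimensional vectors, 4 subvectors, 16 centroids each.
rng = random.Random(1)
data = [[rng.gauss(0, 1) for _ in range(8)] for _ in range(200)]
codebooks = train_codebooks(data, n_sub=4, k=16)
codes = [encode(v, codebooks) for v in data]

query = [rng.gauss(0, 1) for _ in range(8)]
lut = build_lut(query, codebooks)
mean_abs_err = sum(abs(approx_dot(c, lut) - dot(query, v))
                   for c, v in zip(codes, data)) / len(data)
print(f"mean absolute error of approximate dot products: {mean_abs_err:.3f}")
```

The lookup table is built once per query, so each compressed vector costs `n_sub` lookups and additions instead of a full-length multiply-accumulate. With 16 centroids per codebook each code fits in 4 bits, which is the kind of parameter choice that lets a SIMD implementation keep the tables in registers.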
- Bolt: Faster matrix and vector operations that run on compressed data
- 10x faster matrix and vector operations
- [R] Multiplying Matrices Without Multiplying
Code: https://github.com/dblalock/bolt
What are some alternatives?
llvm8 - Statically recompiling CHIP8 to Windows and macOS using LLVM
composer - Supercharge Your Model Training
vis_avs - MinGW GCC port of Advanced Visualization Studio for Winamp
halutmatmul - Hashed Lookup Table based Matrix Multiplication (halutmatmul) - Stella Nera accelerator
SVF - Static Value-Flow Analysis Framework for Source Code
draco - Draco is a library for compressing and decompressing 3D geometric meshes and point clouds. It is intended to improve the storage and transmission of 3D graphics.
mlir-aie - An MLIR-based toolchain for AMD AI Engine-enabled devices.
PGM-index - 🏅State-of-the-art learned data structure that enables fast lookup, predecessor, range searches and updates in arrays of billions of items using orders of magnitude less space than traditional indexes
llvm-string-obfuscator - LLVM String Obfuscator
LightGBM - A fast, distributed, high performance gradient boosting (GBT, GBDT, GBRT, GBM or MART) framework based on decision tree algorithms, used for ranking, classification and many other machine learning tasks.
llvm-bindings - LLVM bindings for Node.js/JavaScript/TypeScript
Snappy - A fast compressor/decompressor