Metric | voyager | incubator-gluten
---|---|---
Mentions | 4 | 3
Stars | 1,178 | 1,014
Growth | 3.7% | 5.5%
Activity | 7.9 | 9.9
Last commit | about 1 month ago | 4 days ago
Language | C++ | Scala
License | Apache License 2.0 | Apache License 2.0
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
voyager
- FLaNK Stack for 04 December 2023
- Voyager: An approximate nearest-neighbor search library for Python and Java
- Approximate Nearest Neighbors Oh Yeah
  Annoy came out of Spotify, and they just announced their successor library Voyager [1] last week [2].
  [1]: https://github.com/spotify/voyager
- Voyager: A Library for Approximate Nearest-Neighbor Search by Spotify
incubator-gluten
- A glimpse into the future of data processing infrastructure.
  When I first learned about the Gluten project from Intel, I thought Databricks was going to be in trouble.
- FLaNK Stack for 04 December 2023
- Blaze: Fast query execution engine for Apache Spark
  Interesting, looks like it is just the DataFusion engine for Spark. There is a similar project: https://github.com/oap-project/gluten - it brings ClickHouse as an engine to Spark.
What are some alternatives?
marker - Convert PDF to markdown quickly with high accuracy
LearningSparkV2 - This is the github repo for Learning Spark: Lightning-Fast Data Analytics [2nd Edition]
tensorflow - An Open Source Machine Learning Framework for Everyone
opaque-sql - An encrypted data analytics platform
mlpack - mlpack: a fast, header-only C++ machine learning library
blaze - Blazing-fast query execution engine speaks Apache Spark language and has Arrow-DataFusion at its core.
MITIE - MITIE: library and tools for information extraction
blaze - NumPy and Pandas interface to Big Data
LightGBM - A fast, distributed, high performance gradient boosting (GBT, GBDT, GBRT, GBM or MART) framework based on decision tree algorithms, used for ranking, classification and many other machine learning tasks.
Jupyter Scala - A Scala kernel for Jupyter
onnx-models - A copy of ONNX models, datasets, and code all in one GitHub repository. Follow the README to learn more.
kyuubi - Apache Kyuubi is a distributed and multi-tenant gateway to provide serverless SQL on data warehouses and lakehouses.