tensorstore
n5
Metric | tensorstore | n5
---|---|---
Mentions | 8 | 2
Stars | 1,279 | 151
Growth | 1.6% | 0.7%
Activity | 9.5 | 8.5
Last commit | about 14 hours ago | 17 days ago
Language | C++ | Java
License | GNU General Public License v3.0 or later | BSD 2-clause "Simplified" License
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
tensorstore
- My high-performance multidimensional array library
-
Move over vector search, tensor search is here.
shhh...don't tell them about Tensorflow, or tf.Tensor, or theano.tensor, or paddle.to_tensor or torch.Tensor or mindspore.Tensor or tensorstore or ... But in all seriousness, your pitchforks might be a little late to the party here. Rightly or wrongly, (large) segments of the machine learning community have adopted terminology to describe things (for your sake, I hope you have not seen what they did with the term 'deconvolution'). This is not a new phenomenon either: it has been happening for at least a decade and almost certainly longer. However, other libraries use the array terminology (np.array, mx.nd.array, chainer, jax.numpy.array), so it is definitely not unanimous. Whether you think this is an egregious misrepresentation of the nomenclature or not is irrelevant now, as it has a pretty established use within the community. Language is an evolving untamed beast - don't get angry, get building!
-
Storing word / document vectors in RDBMS
There are tons of other ways to store vector data; one was just recently released: https://github.com/google/tensorstore
-
Google AI Introduces ‘TensorStore,’ An Open-Source C++ And Python Library Designed For Reading And Writing Large Multi-Dimensional Arrays
- tensorstore: Library for reading and writing large multi-dimensional arrays
- TensorStore: One-stop shop for high-performance array storage
-
[N] Google releases TensorStore for High-Performance, Scalable Array Storage
Today we are introducing TensorStore, an open-source C++ and Python software library designed for storage and manipulation of n-dimensional data that:
n5
-
[N] Google releases TensorStore for High-Performance, Scalable Array Storage
Provides a uniform API for reading and writing multiple array formats, including zarr and N5.
-
[Project] package Hub: store, stream, and access large datasets in seconds
For readers' context: zarr is a self-describing n-dimensional array hierarchy format specification which can sit over more or less any key-value store. If you've ever used HDF5, it's basically that, but array chunks are exploded over the file system or cloud store, and all the metadata is JSON. It's gaining traction in the biological imaging and geo/meteorological data communities, among other places. Work on the v3 specification is in progress, which aims to abstract away a generic protocol, as well as fold in the community behind N5, an almost-identical format used by a small but vocal number of bio-imaging labs.
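To make the "chunks exploded over the file system, metadata as JSON" description above concrete, here is a minimal hand-rolled sketch using only the Python standard library. It assumes the zarr v2 layout (a `.zarray` JSON file next to chunk files named by their grid index, such as `1.0`) with no compression; the helper names `write_zarr_v2_array` and `read_chunk` are hypothetical and are not part of zarr, N5, or tensorstore. A real project would use one of those libraries rather than writing files by hand.

```python
import json
import struct
import tempfile
from pathlib import Path


def write_zarr_v2_array(root, rows, chunk_shape):
    """Write a 2-D int32 array as an uncompressed zarr-v2-style directory.

    Hypothetical helper for illustration only.
    """
    n_rows, n_cols = len(rows), len(rows[0])
    c_rows, c_cols = chunk_shape
    # Simplification: only shapes that divide evenly into chunks are handled.
    assert n_rows % c_rows == 0 and n_cols % c_cols == 0

    root = Path(root)
    root.mkdir(parents=True, exist_ok=True)

    # All array metadata lives in a single JSON document.
    metadata = {
        "zarr_format": 2,
        "shape": [n_rows, n_cols],
        "chunks": [c_rows, c_cols],
        "dtype": "<i4",        # little-endian 32-bit signed integers
        "compressor": None,    # assumption: raw, uncompressed chunks
        "fill_value": 0,
        "order": "C",          # row-major layout inside each chunk
        "filters": None,
    }
    (root / ".zarray").write_text(json.dumps(metadata, indent=2))

    # Each chunk becomes its own file (or object, in a cloud store),
    # keyed by its position in the chunk grid.
    for ci in range(n_rows // c_rows):
        for cj in range(n_cols // c_cols):
            values = [
                rows[ci * c_rows + i][cj * c_cols + j]
                for i in range(c_rows)
                for j in range(c_cols)
            ]
            payload = struct.pack(f"<{len(values)}i", *values)
            (root / f"{ci}.{cj}").write_bytes(payload)
    return root


def read_chunk(root, ci, cj):
    """Read one chunk back using only the JSON metadata and the chunk key."""
    root = Path(root)
    meta = json.loads((root / ".zarray").read_text())
    c_rows, c_cols = meta["chunks"]
    raw = (root / f"{ci}.{cj}").read_bytes()
    flat = struct.unpack(f"<{c_rows * c_cols}i", raw)
    return [list(flat[i * c_cols:(i + 1) * c_cols]) for i in range(c_rows)]


# A 4x4 array split into four 2x2 chunks.
data = [[r * 4 + c for c in range(4)] for r in range(4)]
store = write_zarr_v2_array(Path(tempfile.mkdtemp()) / "demo.zarr", data, (2, 2))
print(sorted(p.name for p in store.iterdir()))
print(read_chunk(store, 1, 0))
```

Because every chunk is an independent key, a reader can fetch only the chunks it needs, which is what lets the same layout sit on a local directory or an object store.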
What are some alternatives?
deeplake - Database for AI. Store Vectors, Images, Texts, Videos, etc. Use with LLMs/LangChain. Store, query, version, & visualize any AI data. Stream data in real-time to PyTorch/TensorFlow. https://activeloop.ai
Activeloop Hub - Data Lake for Deep Learning. Build, manage, query, version, & visualize datasets. Stream data real-time to PyTorch/TensorFlow. https://activeloop.ai [Moved to: https://github.com/activeloopai/deeplake]
postgres-word2vec - utils to use word embedding models like word2vec vectors in a PostgreSQL database
librapid - A highly optimised C++ library for mathematical applications and neural networks.
txtai - 💡 All-in-one open-source embeddings database for semantic search, LLM orchestration and language model workflows
qdrant - Qdrant - High-performance, massive-scale Vector Database for the next generation of AI. Also available in the cloud https://cloud.qdrant.io/