tensorstore
postgres-word2vec
| tensorstore | postgres-word2vec |
---|---|---|
Mentions | 8 | 2 |
Stars | 1,279 | 140 |
Growth | 1.6% | - |
Activity | 9.5 | 2.6 |
Last commit | about 17 hours ago | over 2 years ago |
Language | C++ | C |
License | GNU General Public License v3.0 or later | MIT License |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
tensorstore
- My high-performance multidimensional array library
-
Move over vector search, tensor search is here.
Shhh... don't tell them about Tensorflow, or tf.Tensor, or theano.tensor, or paddle.to_tensor, or torch.Tensor, or mindspore.Tensor, or tensorstore, or ... But in all seriousness, your pitchforks might be a little late to the party here. Rightly or wrongly, (large) segments of the machine learning community have adopted terminology to describe things (for your sake, I hope you have not seen what they did with the term 'deconvolution'). This is not a new phenomenon either; it has been happening for at least a decade and almost certainly longer. However, other libraries use the array terminology (np.array, mx.nd.array, chainer, jax.numpy.array), so it is definitely not unanimous. Whether or not you think this is an egregious misrepresentation of the nomenclature is irrelevant now, as it has a pretty established use within the community. Language is an evolving, untamed beast - don't get angry, get building!
-
Storing word / document vectors in RDBMS
There are tons of other ways to store vector data, one was just recently released - https://github.com/google/tensorstore
-
Google AI Introduces ‘TensorStore,’ An Open-Source C++ And Python Library Designed For Reading And Writing Large Multi-Dimensional Arrays
- tensorstore: Library for reading and writing large multi-dimensional arrays
- TensorStore: One-stop shop for high-performance array storage
-
[N] Google releases TensorStore for High-Performance, Scalable Array Storage
Today we are introducing TensorStore, an open-source C++ and Python software library designed for storage and manipulation of n-dimensional data that:
postgres-word2vec
-
Storing word / document vectors in RDBMS
I've recently stumbled upon smaller projects, like FREDDY (https://github.com/guenthermi/postgres-word2vec), a Postgres extension that looks interesting. The ability to write ad-hoc similarity queries in SQL seems like it might be valuable in some circumstances. I'm not sure about performance or storage efficacy.
-
Build a fuzzy search with PostgreSQL
Is syntactic similarity not enough for you? Word2vec could be an option for comparing the semantic similarity of words. Luckily, there is already a postgres-word2vec extension for this.
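To make the idea of semantic similarity concrete, here is a minimal sketch in plain Python of comparing word vectors by cosine similarity. It does not use postgres-word2vec or its SQL functions; the three-dimensional vectors and the helper names `cosine_similarity` and `most_similar` are made up purely for illustration, and real word2vec embeddings typically have hundreds of dimensions.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical toy embeddings (assumed values, not from a trained model).
embeddings = {
    "king": [0.9, 0.8, 0.1],
    "queen": [0.85, 0.75, 0.2],
    "banana": [0.1, 0.2, 0.9],
}


def most_similar(word, vectors):
    """Return the other word whose vector is closest to `word` by cosine similarity."""
    query = vectors[word]
    scores = {
        other: cosine_similarity(query, vec)
        for other, vec in vectors.items()
        if other != word
    }
    return max(scores, key=scores.get)


print(most_similar("king", embeddings))
```

With these toy values, "king" comes out closest to "queen", even though the two strings share few characters. A purely syntactic measure would not capture that.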
What are some alternatives?
deeplake - Database for AI. Store Vectors, Images, Texts, Videos, etc. Use with LLMs/LangChain. Store, query, version, & visualize any AI data. Stream data in real-time to PyTorch/TensorFlow. https://activeloop.ai
TimescaleDB - An open-source time-series SQL database optimized for fast ingest and complex queries. Packaged as a PostgreSQL extension.
n5 - Not HDF5
RediSearch - A query and indexing engine for Redis, providing secondary indexing, full-text search, vector similarity search and aggregations.
librapid - A highly optimised C++ library for mathematical applications and neural networks.
psycopg2 - PostgreSQL database adapter for the Python programming language
txtai - 💡 All-in-one open-source embeddings database for semantic search, LLM orchestration and language model workflows
PipelineDB - High-performance time-series aggregation for PostgreSQL
qdrant - High-performance, massive-scale Vector Database for the next generation of AI. Also available in the cloud https://cloud.qdrant.io/
TurboPFor - Fastest Integer Compression
Activeloop Hub - Data Lake for Deep Learning. Build, manage, query, version, & visualize datasets. Stream data real-time to PyTorch/TensorFlow. https://activeloop.ai [Moved to: https://github.com/activeloopai/deeplake]
citus - Distributed PostgreSQL as an extension