postgres-word2vec vs tensorstore
| | postgres-word2vec | tensorstore |
|---|---|---|
| Mentions | 2 | 8 |
| Stars | 140 | 1,280 |
| Growth | - | 0.2% |
| Activity | 2.6 | 9.5 |
| Last commit | over 2 years ago | 3 days ago |
| Language | C | C++ |
| License | MIT License | GNU General Public License v3.0 or later |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
postgres-word2vec
- Storing word / document vectors in RDBMS
I've recently stumbled upon smaller projects, like FREDDY (https://github.com/guenthermi/postgres-word2vec), a Postgres extension that looks interesting. The ability to write ad-hoc similarity queries in SQL seems like it might be valuable in some circumstances. I'm not sure about performance or storage efficiency.
- Build a fuzzy search with PostgreSQL
Is syntactic similarity not enough for you? Word2vec could be an option for comparing the semantic similarity of words. Luckily, there is already a postgres-word2vec extension for this.
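Both mentions above describe ranking words by vector similarity from inside a SQL query. A minimal sketch of that idea follows. It does not use postgres-word2vec or its API: it assumes SQLite (from the Python standard library) as a stand-in database, a hypothetical `embeddings` table, a hypothetical user-defined `cosine_similarity` function, and made-up three-dimensional vectors in place of real word2vec output.

```python
import json
import math
import sqlite3

# Toy 3-dimensional "embeddings" (made-up values, not real word2vec output).
TOY_VECTORS = {
    "king": [0.9, 0.8, 0.1],
    "queen": [0.85, 0.9, 0.15],
    "apple": [0.1, 0.2, 0.95],
}


def cosine_similarity(a_json, b_json):
    """Cosine similarity of two vectors stored as JSON text."""
    a = json.loads(a_json)
    b = json.loads(b_json)
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def most_similar(word, limit=2):
    """Return (word, score) rows for the words closest to `word`."""
    conn = sqlite3.connect(":memory:")
    # Register the Python function so it can be called from SQL.
    conn.create_function("cosine_similarity", 2, cosine_similarity)
    conn.execute("CREATE TABLE embeddings (word TEXT PRIMARY KEY, vec TEXT)")
    conn.executemany(
        "INSERT INTO embeddings VALUES (?, ?)",
        [(w, json.dumps(v)) for w, v in TOY_VECTORS.items()],
    )
    # Ad-hoc similarity query: compare the target word with every other word.
    rows = conn.execute(
        """
        SELECT other.word,
               cosine_similarity(target.vec, other.vec) AS score
        FROM embeddings AS target
        JOIN embeddings AS other ON other.word != target.word
        WHERE target.word = ?
        ORDER BY score DESC
        LIMIT ?
        """,
        (word, limit),
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for neighbour, score in most_similar("king"):
        print(f"{neighbour}: {score:.3f}")
```

This sketch scans every row for each query, so it only illustrates the query shape; it says nothing about how a dedicated extension performs or how it stores vectors.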
tensorstore
- My high-performance multidimensional array library
- Move over vector search, tensor search is here.
shhh...don't tell them about Tensorflow, or tf.Tensor, or theano.tensor, or paddle.to_tensor or torch.Tensor or mindspore.Tensor or tensorstore or ... But in all seriousness, your pitchforks might be a little late to the party here. Rightly or wrongly, (large) segments of the machine learning community have adopted terminology to describe things (for your sake, I hope you have not seen what they did with the term 'deconvolution'). This is not a new phenomenon either; it has been happening for at least a decade and almost certainly longer. However, there are other libraries that use the array terminology (np.array, mx.nd.array, chainer, jax.numpy.array), so it is definitely not unanimous. Whether or not you think this is an egregious misrepresentation of the nomenclature is irrelevant now, as it has a pretty established use within the community. Language is an evolving, untamed beast - don't get angry, get building!
- Storing word / document vectors in RDBMS
There are tons of other ways to store vector data; one was just recently released: https://github.com/google/tensorstore
- Google AI Introduces ‘TensorStore,’ An Open-Source C++ And Python Library Designed For Reading And Writing Large Multi-Dimensional Arrays
- tensorstore: Library for reading and writing large multi-dimensional arrays
- TensorStore: One-stop shop for high-performance array storage
- [N] Google releases TensorStore for High-Performance, Scalable Array Storage
Today we are introducing TensorStore, an open-source C++ and Python software library designed for storage and manipulation of n-dimensional data that:
What are some alternatives?
TimescaleDB - An open-source time-series SQL database optimized for fast ingest and complex queries. Packaged as a PostgreSQL extension.
deeplake - Database for AI. Store Vectors, Images, Texts, Videos, etc. Use with LLMs/LangChain. Store, query, version, & visualize any AI data. Stream data in real-time to PyTorch/TensorFlow. https://activeloop.ai
RediSearch - A query and indexing engine for Redis, providing secondary indexing, full-text search, vector similarity search and aggregations.
n5 - Not HDF5
psycopg2 - PostgreSQL database adapter for the Python programming language
librapid - A highly optimised C++ library for mathematical applications and neural networks.
PipelineDB - High-performance time-series aggregation for PostgreSQL
txtai - 💡 All-in-one open-source embeddings database for semantic search, LLM orchestration and language model workflows
TurboPFor - Fastest Integer Compression
qdrant - Qdrant - High-performance, massive-scale Vector Database for the next generation of AI. Also available in the cloud https://cloud.qdrant.io/
citus - Distributed PostgreSQL as an extension
Activeloop Hub - Data Lake for Deep Learning. Build, manage, query, version, & visualize datasets. Stream data real-time to PyTorch/TensorFlow. https://activeloop.ai [Moved to: https://github.com/activeloopai/deeplake]