| | lance | scratch-pdf-bot |
---|---|---|
| Mentions | 10 | 2 |
| Stars | 3,275 | 35 |
| Growth | 2.2% | - |
| Activity | 9.8 | 6.0 |
| Latest commit | about 4 hours ago | 6 months ago |
| Language | Rust | Python |
| License | Apache License 2.0 | - |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
lance
- The Nimble File Format by Meta
- Supabase Storage: now supports the S3 protocol
you should look at lance (https://lancedb.github.io/lance/)
- Understanding Parquet, Iceberg and Data Lakehouses
Parquet has been the lakehouse file format of choice for nearly half a decade. But we are starting to see other contenders that are optimized more for lower latency, like lance: https://github.com/lancedb/lance
- FLaNK Stack Weekly for 12 June 2023
- FLaNK Stack 5-June-2023
- [Show HN] Lance is a Rust-based alternative to Parquet for ML data
- Show HN: Lance is a Rust-based alternative to Parquet for ML data
Getting a bunch of 404s on the docs, for example https://eto-ai.github.io/lance/format.html (but this works: https://lancedb.github.io/lance/*).
Did you guys just pivot from eto-ai to lancedb?
- Any job processing framework like Spark but in Rust?
For Feature Stores check out: https://github.com/eto-ai/lance
- Show HN: Lance – Deep Learning with DuckDB and Arrow
scratch-pdf-bot
- Show HN: Lance is a Rust-based alternative to Parquet for ML data
I initially built this same "chat with PDFs" prototype with LangChain and qdrant. I then rebuilt it from scratch for the sake of learning and comparison.
Some context: I've been a jack-of-all-trades data scientist / machine learning engineer for the past 15 years (officially titled as an MLE the last four years).
I share that only because I think it plays a role in how I'm typically accustomed to working.
1. I found LangChain to be overkill for this use case. While it might allow some to move more quickly when building, I found it to be cumbersome. My suspicion is this is largely because of my background - I understand how to build much of what's "under the hood" in LangChain. Because of this, I think it felt overly abstracted and I found the docs difficult to navigate and sometimes incomplete.
2. I used Qdrant via their Docker image and it was simple to set up and start using. I didn't try to push the limits with it, so I can't say anything about performance. Because Qdrant runs as an HTTP service, I found that it didn't fit well into my workflow - I'm accustomed to being able to visually inspect my data inside the interpreter, debugging, trying out commands, interacting and experimenting with my results, etc. Again, my suspicion is this is my own bias in how I typically work. Qdrant otherwise seemed very nice.
3. LanceDB felt powerful yet lightweight, and fit well into my workflow. It was far more intuitive for me. It was as if SQLite, the Python data ecosystem, and a vector database had a child and named it LanceDB. Under the hood, it's built on Apache Arrow and integrates nicely with pandas, allowing me to seamlessly go from a LanceDB table on disk, to a pandas DataFrame, and into some analysis or investigation of my LanceDB query results. This line [1] is a great example of why I liked it. This feels nicer to me than the world of API params and HTTP requests.
[1] https://github.com/gjreda/scratch-pdf-bot/blob/main/gpt_pdf_...
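
The in-process workflow the comment describes (query, then inspect the results directly in the interpreter) can be illustrated without any vector database. What follows is a minimal pure-Python sketch, not LanceDB's API and not code from the scratch-pdf-bot repo: the toy 3-dimensional embeddings, the row layout, and the `cosine_similarity` and `search` helpers are all assumptions made up for illustration. It does a brute-force cosine-similarity search and returns ordinary Python dicts, which is the property the commenter contrasts with results fetched over HTTP.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def search(rows, query_vector, limit=2):
    """Hypothetical helper: brute-force nearest-neighbour search.

    Scores every row against the query and returns the `limit` best
    matches as plain dicts with an added "score" key.
    """
    scored = [
        {**row, "score": cosine_similarity(row["vector"], query_vector)}
        for row in rows
    ]
    scored.sort(key=lambda r: r["score"], reverse=True)
    return scored[:limit]


# Toy data standing in for embedded PDF chunks (values are invented).
rows = [
    {"text": "page 1: introduction", "vector": [1.0, 0.0, 0.0]},
    {"text": "page 2: methods", "vector": [0.0, 1.0, 0.0]},
    {"text": "page 3: results", "vector": [0.9, 0.1, 0.0]},
]

results = search(rows, query_vector=[1.0, 0.0, 0.0], limit=2)

# Results are ordinary in-memory objects, so they can be printed,
# filtered, or handed to other tools without a network round trip.
for r in results:
    print(f'{r["score"]:.3f}  {r["text"]}')
```

A real embedded vector store would replace the linear scan with an index and persist the rows on disk, but the interaction pattern (call a function, get back local data) stays the same.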
What are some alternatives?
roop - one-click face swap
chatgpt-comparison-detection - Human ChatGPT Comparison Corpus (HC3), Detectors, and more! 🔥
deeplake - Database for AI. Store Vectors, Images, Texts, Videos, etc. Use with LLMs/LangChain. Store, query, version, & visualize any AI data. Stream data in real-time to PyTorch/TensorFlow. https://activeloop.ai
RasaGPT - 💬 RasaGPT is the first headless LLM chatbot platform built on top of Rasa and Langchain. Built w/ Rasa, FastAPI, Langchain, LlamaIndex, SQLModel, pgvector, ngrok, telegram
Lixur - Lixur is an open-sourced project that seeks to build a scalable, feeless, decentralized, quantum-secure, and easy-to-use blockchain with smart, and intelligent (A.I.) contract functionality.
embedditor - ⚡ GUI for editing LLM vector embeddings. No more blind chunking. Upload content in any file extension, join and split chunks, edit metadata and embedding tokens + remove stop-words and punctuation with one click, add images, and download in .veml to share it with your team.
polars - Dataframes powered by a multithreaded, vectorized query engine, written in Rust
sycamore - 🍁 Sycamore is an LLM-powered search and analytics platform for unstructured data.
Rio - A hardware-accelerated GPU terminal emulator focused on running in desktops and browsers.
chatdocs - Chat with your documents offline using AI.
LMOps - General technology for enabling AI capabilities w/ LLMs and MLLMs