Metric | Geziyor | uform
---|---|---
Mentions | 2 | 8
Stars | 2,480 | 885
Growth | 0.6% | 8.4%
Activity | 0.6 | 9.2
Last commit | 7 months ago | 10 days ago
Language | Go | Python
License | Mozilla Public License 2.0 | Apache License 2.0
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
Geziyor
- Show HN: I scraped 25M Shopify products to build a search engine
As someone who has scraped millions of items myself, I had success using Geziyor (https://github.com/geziyor/geziyor) built in Go. Shopify sites are especially easy to scrape because they tend to share the same product data formatting and don't hide it behind JS rendering.
- Show HN: Flyscrape - A standalone and scriptable web scraper in Go
It's been 8+ years since I started scraping. I even wrote a popular Go web scraping framework previously (https://github.com/geziyor/geziyor).
These days I'm not even using Go for scraping, because constantly changing webpages drive me crazy, so I moved to TypeScript+Playwright. (The Crawlee framework is cool, though not strictly necessary.)
My favorite stack as of 2023: TypeScript+Playwright+Crawlee (optional)
uform
- CatLIP: Clip Vision Accuracy with 2.7x Faster Pre-Training on Web-Scale Data
Question: are there any good image embedding models small enough to run on-device?
I tried https://github.com/unum-cloud/uform, which I do like, especially since it also supports languages other than English. Any recommendations for alternatives?
- Multimodal Embeddings for JavaScript, Swift, and Python
- Show HN: UForm v2 Featuring Multimodal Matryoshka, Multimodal DPO, and ONNX
- UForm v1: Multimodal Chat in 1.5B Parameters
- Show HN: I scraped 25M Shopify products to build a search engine
As you scale, you may benefit from these two projects I maintain, which Big Tech uses :)
https://github.com/unum-cloud/usearch - for faster search
https://github.com/unum-cloud/uform - for cheaper multi-lingual multi-modal embeddings
- Show HN: U)Search Images demo in 200 lines of Python
[2]: https://github.com/unum-cloud/uform
- Show HN: UForm v2 - tiny CLIP-like embeddings in 21 languages and Graphcore API
- Unum: Vector Search engine in a single file
Ouch! That's fat! Which model is that?
We have built a few video-search systems by now, using USearch and UForm for embeddings. The embeddings are only 256 dimensions, and you can concatenate a few from different parts of the video. Any chance it would help?
https://github.com/unum-cloud/uform
What are some alternatives?
colly - Elegant Scraper and Crawler Framework for Golang
CogVLM - a state-of-the-art-level open visual language model | multimodal pre-trained model
Pholcus - Pholcus is a distributed high-concurrency crawler software written in pure golang
usearch - Fast Open-Source Search & Clustering engine × for Vectors & Strings × in C++, C, Python, JavaScript, Rust, Java, Objective-C, Swift, C#, GoLang, and Wolfram
jsonrpconion - Library for building JSON RPC services on Tor network
kuzu - Embeddable property graph database management system built for query speed and scalability. Implements Cypher.
Ferret - Declarative web scraping
LinkBERT - [ACL 2022] LinkBERT: A Knowledgeable Language Model Pretrained with Document Links
google-search-results-golang - Google Search Results GoLang API
neural-file-sorter - A neural network based file sorter. Trains an autoencoder to sort images or audio based on the similarity of their encodings, or uses the OpenAI CLIP model.
gichidan - Gichidan - CLI wrapper for Ichidan deep-web search engine.
ucall - Remote Procedure Calls - 50x lower latency and 70x higher bandwidth than FastAPI, implementing JSON-RPC & REST over io_uring and SIMDJSON