Show HN: I scraped 25M Shopify products to build a search engine

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com

  • Geziyor

    Geziyor, blazing fast web crawling & scraping framework for Go. Supports JS rendering.

  • As someone who has scraped millions of items myself, I had success using Geziyor (https://github.com/geziyor/geziyor), which is built in Go. Shopify sites are especially easy to scrape because they tend to share the same product data format and don't hide it behind JS rendering.

    I'm biased, but I'd recommend exploring Typesense for search.

    It's an open source alternative to Algolia + Pinecone and e-commerce is a very common use-case.

    Here's a live demo with 32M songs: https://songs-search.typesense.org/

    Disclaimer: I work on Typesense.
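
The point above about Shopify stores sharing one product data format can be illustrated with a small sketch. It assumes the layout commonly seen in a storefront's `/products.json` response (a top-level `products` list, each with a `variants` list whose `price` is a string); the sample payload, the field selection, and the helper name `flatten_products` are all hypothetical and not taken from the original post. No network request is made here.

```python
import json

# Hypothetical sample, shaped like the product JSON many Shopify
# storefronts are assumed to serve. Real payloads carry more fields.
SAMPLE = """
{
  "products": [
    {
      "id": 1, "title": "Canvas Tote", "handle": "canvas-tote", "vendor": "Acme",
      "variants": [
        {"id": 11, "title": "Default", "price": "19.99"},
        {"id": 12, "title": "Large", "price": "24.50"}
      ]
    },
    {
      "id": 2, "title": "Mug", "handle": "mug", "vendor": "Acme",
      "variants": [
        {"id": 21, "title": "Default", "price": "8.00"}
      ]
    }
  ]
}
"""


def flatten_products(payload):
    """Turn one products payload into flat rows, one row per variant."""
    rows = []
    for product in payload.get("products", []):
        for variant in product.get("variants", []):
            rows.append({
                "product_id": product["id"],
                "title": product["title"],
                "vendor": product.get("vendor", ""),
                "variant_id": variant["id"],
                # Prices are assumed to arrive as strings, so convert them.
                "price": float(variant["price"]),
            })
    return rows


if __name__ == "__main__":
    for row in flatten_products(json.loads(SAMPLE)):
        print(row)
```

Because every store is assumed to share this shape, the same flattening step could feed rows from many stores into a single search index without per-site parsing rules.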

  • usearch

    Fast Open-Source Search & Clustering engine × for Vectors & 🔜 Strings × in C++, C, Python, JavaScript, Rust, Java, Objective-C, Swift, C#, GoLang, and Wolfram 🔍

  • As you scale, you may benefit from these two projects I maintain, both of which are used by Big Tech :)

    https://github.com/unum-cloud/usearch - for faster search

    https://github.com/unum-cloud/uform - for cheaper multi-lingual multi-modal embeddings

  • uform

    Pocket-Sized Multimodal AI for content understanding and generation across multilingual texts, images, and 🔜 video, up to 5x faster than OpenAI CLIP and LLaVA 🖼️ & 🖋️

NOTE: The number of mentions on this list counts mentions in common posts plus user-suggested alternatives. Hence, a higher number means a more popular project.
