Search 1B pages on AWS S3 for

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com

  • microblob

    Serve millions of JSON documents via HTTP.

  • Combining data-at-rest with a slim index structure and a common access method (like HTTP) was the idea behind a tool I once wrote, a key-value store for JSON: https://github.com/miku/microblob

    I first thought of building a custom index structure, but found that I did not need everything in memory all the time. Using an embedded leveldb works just fine.
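
The comment above pairs a file at rest with a slim index. The following is a minimal sketch of that idea, not microblob's actual code: it assumes newline-delimited JSON, and a plain Python dict stands in for the embedded LevelDB the author mentions. The function names, the `id` field, and the sample documents are all hypothetical.

```python
import json
import os
import tempfile


def build_index(path, key_field):
    """Map each document's key to the (offset, length) of its line in the file."""
    index = {}
    offset = 0
    with open(path, "rb") as f:
        for line in f:
            if line.strip():  # skip blank lines but still count their bytes
                doc = json.loads(line)
                index[str(doc[key_field])] = (offset, len(line))
            offset += len(line)
    return index


def fetch(path, index, key):
    """Read a single document by seeking straight to its byte range."""
    offset, length = index[key]
    with open(path, "rb") as f:
        f.seek(offset)
        return json.loads(f.read(length))


def demo():
    """Write two sample documents, index them, and look one up by key."""
    docs = [{"id": "a1", "title": "first"}, {"id": "b2", "title": "second"}]
    path = os.path.join(tempfile.mkdtemp(), "docs.ndjson")
    with open(path, "w", encoding="utf-8") as f:
        for doc in docs:
            f.write(json.dumps(doc) + "\n")
    index = build_index(path, "id")
    return fetch(path, index, "b2")
```

Only the keys and byte ranges live in the index, so the documents themselves never have to be loaded into memory. Swapping the dict for an on-disk key-value store keeps the index itself out of memory too, which is the trade-off the comment describes.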

  • aistore

    AIStore: scalable storage for AI applications

  • > If you think more about this, it will be like a distributed key-value store that supports both disk and memory access. You can write one using some open-source Raft libraries, or a possible candidate is TiKV from PingCAP

    My whole point was not building it ;)

    There's also https://github.com/NVIDIA/aistore

  • rocksdb-cloud

    A library that provides an embeddable, persistent key-value store for fast storage optimized for AWS

  • How big is each document? If documents are big, keep each of them as a separate file and store the ids in a database. If documents are small, then you want something like https://github.com/rockset/rocksdb-cloud as a building block.

  • tantivy

    Tantivy is a full-text search engine library inspired by Apache Lucene and written in Rust [Discontinued; moved to: https://github.com/quickwit-oss/tantivy] (by quickwit-inc)

  • What we store on S3 is a regular tantivy index and another tiny data structure that we call "turbo index", which makes queries faster on object storage. For this demo, the tantivy indexes are fairly large and we issue HTTP Range requests against them.

    https://github.com/tantivy-search/tantivy
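
The last comment mentions issuing HTTP Range requests against large index files on S3. As a rough illustration of what such a request looks like (a sketch using only the Python standard library, not the commenter's implementation), the helpers below build a `Range` header and read one byte range from a URL. The function names are hypothetical.

```python
import urllib.request


def range_header(offset, length):
    """Build a Range header for `length` bytes starting at `offset`.

    HTTP byte ranges are inclusive on both ends, hence the `- 1`.
    """
    if offset < 0 or length <= 0:
        raise ValueError("offset must be >= 0 and length must be > 0")
    return {"Range": f"bytes={offset}-{offset + length - 1}"}


def read_range(url, offset, length, timeout=10):
    """Fetch a single byte range of a remote object over HTTP."""
    request = urllib.request.Request(url, headers=range_header(offset, length))
    with urllib.request.urlopen(request, timeout=timeout) as response:
        # 206 Partial Content means the server honored the range;
        # a plain 200 means it ignored the header and sent the whole object.
        if response.status != 206:
            raise RuntimeError(f"range not honored, got HTTP {response.status}")
        return response.read()
```

A call such as `read_range(url, 4096, 1024)` would transfer only 1 KiB of the object at `url`, which is what makes it practical to query a large index without downloading it in full.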

