Show HN: Deephn.org full-text search 30M Hacker News posts AND linked web pages

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com.

  • LuceneBench

    Discontinued Lucene Benchmark: benchmarking Lucene vs. SeekStorm

  • https://github.com/wolfgarbe/LuceneBench/blob/master/LuceneB...

    So, you may be hitting SimpleFSDirectory instead, which does have issues with many concurrent searches.

    Could you share the reasons MMapDirectory did not work for you?

  • Hacker News API

    Documentation and Samples for the Official HN API

  • >> you were crawling news.ycombinator.com, right?

    No. To retrieve the Hacker News posts we used the public Hacker News API, which returns the posts in JSON format: https://github.com/HackerNews/API

    The crawling speed of 100 to 1,000 pages per second refers to crawling the external pages linked from Hacker News posts. Because they are on different domains, we can achieve a high overall crawling speed while remaining a polite crawler with a low crawling rate per domain.
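The two ideas in the reply above (reading posts from the official API, and staying polite by limiting the rate per domain rather than the total rate) can be illustrated with a minimal sketch. This is not the authors' crawler. The helper names `item_url` and `schedule_polite`, the `per_domain_delay` parameter, and the `example` domains are hypothetical and chosen only for illustration; the base URL is the one documented by the official Hacker News API. The sketch only plans fetch times and performs no network requests.

```python
from urllib.parse import urlparse

# Base URL as documented by the official Hacker News API.
HN_API_BASE = "https://hacker-news.firebaseio.com/v0"


def item_url(item_id: int) -> str:
    """Return the JSON URL of one item (story, comment, ...) in the HN API."""
    return f"{HN_API_BASE}/item/{item_id}.json"


def schedule_polite(urls, per_domain_delay=1.0):
    """Plan fetch times so requests to the same domain are spaced apart.

    Hypothetical helper: each URL gets the earliest time (in seconds from
    the start of the crawl) at which it may be fetched, such that two
    requests to the same domain are at least `per_domain_delay` apart.
    URLs on different domains may share the same time slot, which is what
    allows a high overall throughput.
    """
    next_free = {}  # domain -> earliest time the next request is allowed
    plan = []
    for url in urls:
        domain = urlparse(url).netloc.lower()
        start = next_free.get(domain, 0.0)
        plan.append((start, url))
        next_free[domain] = start + per_domain_delay
    # Stable sort: URLs in the same time slot keep their input order.
    plan.sort(key=lambda entry: entry[0])
    return plan


if __name__ == "__main__":
    links = [
        "https://a.example/1",
        "https://a.example/2",
        "https://b.example/1",
        "https://c.example/1",
    ]
    for start, url in schedule_polite(links, per_domain_delay=2.0):
        print(f"t={start:4.1f}s  {url}")
```

In this sketch, three of the four links can be fetched in the first time slot because they live on three different domains; only the second link on `a.example` has to wait. With many distinct domains, the overall rate scales with the number of domains while each individual site still sees a low request rate.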

NOTE: The number of mentions on this list indicates mentions on common posts plus user-suggested alternatives. Hence, a higher number means a more popular project.
