https://github.com/wolfgarbe/LuceneBench/blob/master/LuceneB...
So you may be hitting SimpleFSDirectory instead, which does have issues with many concurrent searches.
Could you share why MMapDirectory did not work for you?
>> you were crawling news.ycombinator.com, right?
No, to retrieve the Hacker News posts we used the public Hacker News API, which returns the posts in JSON format: https://github.com/HackerNews/API
The crawling speed of 100 to 1000 pages per second refers to crawling the external pages linked from Hacker News posts. Because those pages are spread across many different domains, we can reach a high overall crawling speed while still being a polite crawler with a low crawling rate per domain.
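To illustrate the point about per-domain politeness, here is a minimal Python sketch of the idea, not the crawler's actual code. The function name `schedule_polite`, the `per_domain_delay` parameter, and the example URLs are all hypothetical. It only plans start times and ignores fetch duration, concurrency limits, and robots.txt.

```python
from collections import defaultdict
from urllib.parse import urlparse


def schedule_polite(urls, per_domain_delay=1.0):
    """Plan fetch start times so that requests to the same domain are
    at least `per_domain_delay` seconds apart (hypothetical helper).

    Returns a list of (start_time_seconds, url) sorted by start time.
    """
    next_free = defaultdict(float)  # earliest allowed start time per domain
    plan = []
    for url in urls:
        domain = urlparse(url).netloc
        start = next_free[domain]
        plan.append((start, url))
        # The next request to this domain has to wait for the delay.
        next_free[domain] = start + per_domain_delay
    # Python's sort is stable, so ties keep their input order.
    plan.sort(key=lambda entry: entry[0])
    return plan


# Made-up URLs: three different domains, one of them hit twice.
example_urls = [
    "http://a.example/1",
    "http://a.example/2",
    "http://b.example/1",
    "http://c.example/1",
]
plan = schedule_polite(example_urls, per_domain_delay=1.0)
```

In this plan, three of the four requests can start immediately because they go to different domains; only the second request to `a.example` has to wait. The more domains the link set covers, the more requests can run at once without any single site seeing more than one request per delay interval.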