url-collector
An application that crawls the Common Crawl corpus for URLs with the specified file extensions. (by bottomless-archive-project)
fess
Fess is a very powerful and easily deployable Enterprise Search Server. (by codelibs)
Metric | url-collector | fess
---|---|---
Mentions | 2 | 1
Stars | 0 | 965
Growth | - | 1.1%
Activity | 5.1 | 8.7
Last commit | over 2 years ago | 6 days ago
Language | Java | Java
License | MIT License | Apache License 2.0
The number of mentions is the total number of mentions that we've tracked plus the number of user-suggested alternatives.
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
url-collector
Posts with mentions or reviews of url-collector.
We have used some of these posts to build our list of alternatives
and similar projects. The last one was on 2021-10-02.
- 240 million URLs for PDF and DOC files
  Well, I used Java. The app is still somewhat under construction, but it is available here: https://github.com/bottomless-archive-project/url-collector
fess
Posts with mentions or reviews of fess.
We have used some of these posts to build our list of alternatives
and similar projects.
- All I wanted to do was log in
  Could be related
What are some alternatives?
When comparing url-collector and fess you can also consider the following projects:
fscrawler - Elasticsearch File System Crawler (FS Crawler)
Apache Solr - Apache Lucene and Solr open-source search software
library-of-alexandria - Library of Alexandria (LoA in short) is a project that aims to collect and archive documents from the internet.
solr - Apache Solr open-source search software
SpotifyDiscoveryBot - A Java-based bot that automatically crawls for new releases by your followed artists on Spotify. Never miss a release again!
webSearch - Based on my webCrawler a Search Engine
Elasticsearch - Free and Open, Distributed, RESTful Search Engine
lucene - Apache Lucene open-source search software
LuceneBench - Lucene Benchmark : benchmarking Lucene vs. SeekStorm