Java Web Crawling

Open-source Java projects categorized as Web Crawling

Top 6 Java Web Crawling Projects

  • webmagic

    A scalable web crawler framework for Java.

  • jsoup

    jsoup: the Java HTML parser, built for HTML editing, cleaning, scraping, and XSS safety.

    Project mention: FLaNK Stack Weekly for 20 June 2023 | dev.to | 2023-06-20
  • Crawler4j

    Open Source Web Crawler for Java

  • Apache Nutch

    Apache Nutch is an extensible and scalable web crawler

  • Sparkler

    Spark-Crawler: Apache Nutch-like crawler that runs on Apache Spark.

  • google-search-results-java

    Google Search Results JAVA API via SerpApi

NOTE: The open-source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020).

Index

What are some of the best open-source Web Crawling projects in Java? This list will help you:

Rank  Project                     Stars
1     webmagic                    11,239
2     jsoup                       10,625
3     Crawler4j                    4,469
4     Apache Nutch                 2,809
5     Sparkler                       409
6     google-search-results-java      31
