A simple Java library that extracts text from valid HTML

This page summarizes the projects mentioned and recommended in the original post on /r/javahelp

  • html-extractor

    (Discontinued) A simple library that parses HTML input and extracts plain text from valid tags - made for fun, no regex :)

    Any thoughts? https://github.com/kaarbe/html-extractor

  • java-html-sanitizer

    Takes third-party HTML and produces HTML that is safe to embed in your web application. Fast and easy to configure.

    Is there any reason not to use OWASP Html Sanitizer for this use case?
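To make the "no regex" idea concrete, here is a minimal sketch of tag stripping with a character-by-character scanner, using only the Java standard library. It is not the html-extractor implementation and not the OWASP sanitizer API: the class name `PlainTextSketch` and the method `extractText` are hypothetical names chosen for illustration. It assumes well-formed input, does not decode entities such as `&amp;`, and does not special-case comments, `<script>`/`<style>` content, or a `>` inside an attribute value.

```java
public final class PlainTextSketch {

    private PlainTextSketch() {}

    /**
     * Returns the text found outside of tags, with runs of whitespace
     * collapsed to a single space. Hypothetical helper, for illustration only.
     */
    public static String extractText(String html) {
        StringBuilder out = new StringBuilder();
        boolean insideTag = false;    // true between '<' and the next '>'
        boolean pendingSpace = false; // a separator is owed before the next character

        for (int i = 0; i < html.length(); i++) {
            char c = html.charAt(i);
            if (insideTag) {
                if (c == '>') {
                    insideTag = false;
                    pendingSpace = true; // assumption: a tag boundary acts as a word break
                }
            } else if (c == '<') {
                insideTag = true;
            } else if (Character.isWhitespace(c)) {
                pendingSpace = true;
            } else {
                // Emit the owed separator, but never as a leading space.
                if (pendingSpace && out.length() > 0) {
                    out.append(' ');
                }
                pendingSpace = false;
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(extractText("<h1>Title</h1><p>Hello <b>world</b></p>"));
    }
}
```

Because every tag boundary is treated as a word break, punctuation that directly follows an inline tag ends up separated from the preceding word by a space. Untrusted or malformed HTML needs a real parser or sanitizer rather than a scanner like this one.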


