HTML NLP

Open-source HTML projects categorized as NLP

Top 19 HTML NLP Projects

  1. unstructured

    Convert documents to structured data effortlessly. Unstructured is an open-source ETL solution for transforming complex documents into clean, structured formats for language models. Visit our website to learn more about our enterprise-grade Platform product for production-grade workflows, partitioning, enrichments, chunking, and embedding.

  2. SaaSHub

    SaaSHub - Software Alternatives and Reviews. SaaSHub helps you find the best software and product alternatives

  3. languagemodels

    Explore large language models in 512MB of RAM

  4. Sherlock

    Natural-language event parser for JavaScript (by neilgupta)

  5. Giveme5W1H

    Extraction of the journalistic five W and one H questions (5W1H) from news articles: who did what, when, where, why, and how?

  6. awesome-python

    🐍 Hand-picked awesome Python libraries and frameworks, organised by category (by dylanhogg)

  7. openunivcourses

    FREE ML Courses from Top Universities

  8. Groqqle

    Groqqle is a powerful web search and content summarization tool built with Python, leveraging Groq's LLM API for advanced natural language processing. It offers customizable web and news searches, image analysis, and adaptive content summaries, making it ideal for researchers, developers, and anyone seeking enhanced information retrieval.

  9. rgpt3

    Making requests from R to the GPT models

  10. botfuel-dialog

    Botfuel SDK to build highly conversational chatbots

  11. stripnet

    STriP Net: Semantic Similarity of Scientific Papers (S3P) Network

  12. go-htmldate

    CLI and Go package for extracting the publication date of a web page.

  13. speaking_with_plato

    Exploring Plato's philosophy with AI - A Data Spiral blog article

  14. infinigram

    High-speed corpus-based language model using suffix arrays for variable-length n-gram matching. Instant training, exact matching, O(m log n) queries.

    Project mention: Infinigram: Variable-Length N-grams via Suffix Arrays | dev.to | 2026-06-06

    Infinigram (pip install py-infinigram) is a corpus-based language model that uses suffix arrays for variable-length n-gram pattern matching. Unlike neural language models, there is no training step. The corpus is the model.

  15. Conversations

    A chat-bot that is community-driven and open source – powered by you! (WIP) (by MarketingPipeline)

  16. datalabel

    datalabel is a UI-based data editing tool that makes it easy to create labeled text data in a dataframe. With datalabel, you can quickly and effortlessly edit your data without having to write any code. Its intuitive interface makes it ideal for both experienced data professionals and those new to data editing.

  17. nfl-prospects-nlp

    Sentiment analysis and text generation of NFL prospect scouting reports 2014-2022

  18. Orbit-dependency-visualised

    Orbis converts any GitHub repo into an interactive 3D dependency graph by parsing ASTs, detecting architecture patterns, and rendering modules as a navigable scene. Built-in LLM assistant answers questions about the graph.

    Project mention: Orbis: Turn Any GitHub Repository Into an Interactive 3D Dependency Graph | dev.to | 2026-05-09

    The code is at https://github.com/dakshjain-1616/Orbit-dependency-visualised. You can also build with NEO in your IDE using the VS Code extension or Cursor. You can use NEO MCP with Claude Code: https://heyneo.com/claude-code

  19. hanakotoba

    Exploring 花言葉 in Japanese and other literary corpora
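
The suffix-array idea described in the infinigram entry above can be illustrated with a minimal sketch. This is not the py-infinigram API: the function names (`build_suffix_array`, `next_token_counts`) and the toy corpus are hypothetical, and the construction is deliberately naive. It only shows how a sorted list of suffix positions allows exact, variable-length n-gram lookups with a binary search whose steps each compare at most m tokens.

```python
from collections import Counter


def build_suffix_array(tokens):
    """Return every suffix start position, sorted by the suffix it begins.

    Naive construction kept for clarity; it is not meant to be fast.
    """
    return sorted(range(len(tokens)), key=lambda i: tokens[i:])


def _bound(tokens, suffix_array, pattern, upper):
    """Binary search for the first suffix whose m-token prefix is >= pattern
    (or > pattern when upper is True)."""
    m = len(pattern)
    lo, hi = 0, len(suffix_array)
    while lo < hi:
        mid = (lo + hi) // 2
        start = suffix_array[mid]
        prefix = tokens[start:start + m]
        if prefix < pattern or (upper and prefix == pattern):
            lo = mid + 1
        else:
            hi = mid
    return lo


def next_token_counts(tokens, suffix_array, context):
    """Back off to the longest suffix of `context` that occurs in the corpus
    with a following token, and count the tokens that follow it."""
    context = list(context)
    for start in range(len(context)):  # longest candidate first
        pattern = context[start:]
        lo = _bound(tokens, suffix_array, pattern, upper=False)
        hi = _bound(tokens, suffix_array, pattern, upper=True)
        counts = Counter(
            tokens[pos + len(pattern)]
            for pos in suffix_array[lo:hi]
            if pos + len(pattern) < len(tokens)  # skip a match at the corpus end
        )
        if counts:
            return pattern, counts
    # No part of the context was found: fall back to unigram counts.
    return [], Counter(tokens)


# Toy corpus (an assumption for illustration only).
corpus = "the cat sat on the mat the cat ran".split()
suffix_array = build_suffix_array(corpus)
matched, counts = next_token_counts(corpus, suffix_array, ["a", "dog", "saw", "the", "cat"])
print(matched, dict(counts))
```

There is no training step in this sketch beyond sorting the suffix positions, which mirrors the entry's point that the corpus itself serves as the model.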

NOTE: The open-source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020).

HTML NLP related posts

  • Annotated Code for Predict Next Word Based on Context and Learned Patterns

    1 project | news.ycombinator.com | 16 Jun 2025
  • Let Claude read your Gas Meter with this Amazing new Feature

    1 project | dev.to | 29 Nov 2024
  • LLMs for Report Validation

    1 project | news.ycombinator.com | 16 Aug 2024
  • Unstructured: Open-Source Tool for Custom ML Preprocessing Pipelines

    1 project | news.ycombinator.com | 15 Aug 2024
  • Unstructured: Open-Source Tools for Custom Machine Learning Pipelines

    1 project | news.ycombinator.com | 14 Aug 2024
  • Quick tip: Using R, OpenAI and SingleStore Notebooks

    1 project | dev.to | 1 May 2024
  • Unstructured – OSS libraries and APIs to build custom preprocessing pipelines

    1 project | news.ycombinator.com | 10 Jul 2023

Index

What are some of the best open-source NLP projects in HTML? This list will help you:

# Project Stars
1 unstructured 14,882
2 languagemodels 1,191
3 Sherlock 562
4 Giveme5W1H 530
5 awesome-python 460
6 openunivcourses 254
7 Groqqle 157
8 rgpt3 117
9 botfuel-dialog 100
10 stripnet 86
11 go-htmldate 11
12 speaking_with_plato 9
13 infinigram 3
14 Conversations 3
15 datalabel 3
16 nfl-prospects-nlp 1
17 Orbit-dependency-visualised 1
18 hanakotoba 1
19 quran-semantic-search 0

Did you know that HTML is the 9th most popular programming language based on number of references?