Top 19 HTML NLP Projects
-
unstructured
Convert documents to structured data effortlessly. Unstructured is an open-source ETL solution for transforming complex documents into clean, structured formats for language models. Visit our website to learn more about our enterprise-grade Platform product for production-grade workflows, partitioning, enrichments, chunking, and embedding.
-
Giveme5W1H
Extraction of the journalistic five W and one H questions (5W1H) from news articles: who did what, when, where, why, and how?
-
awesome-python
🐍 Hand-picked awesome Python libraries and frameworks, organised by category (by dylanhogg)
-
Groqqle
Groqqle is a powerful web search and content summarization tool built with Python, leveraging Groq's LLM API for advanced natural language processing. It offers customizable web and news searches, image analysis, and adaptive content summaries, making it ideal for researchers, developers, and anyone seeking enhanced information retrieval.
-
infinigram
High-speed corpus-based language model using suffix arrays for variable-length n-gram matching. Instant training, exact matching, O(m log n) queries.
Infinigram (pip install py-infinigram) is a corpus-based language model that uses suffix arrays for variable-length n-gram pattern matching. Unlike neural language models, there is no training step. The corpus is the model.
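The idea behind suffix-array n-gram matching can be sketched in a few lines. This is a minimal illustration, not the py-infinigram API: the function names (`build_suffix_array`, `find_range`, `next_token_counts`) are hypothetical, the suffix array is built naively, and backing off to the longest matching context suffix is an assumption about how variable-length matching might work. Each binary-search step compares at most m tokens, which is where an O(m log n) query cost comes from.

```python
from collections import Counter


def build_suffix_array(tokens):
    # Naive construction by sorting all suffixes; fine for a small demo
    # corpus, far too slow for a real one.
    return sorted(range(len(tokens)), key=lambda i: tokens[i:])


def find_range(tokens, sa, pattern):
    """Return [start, end) of suffix-array entries whose suffix begins with pattern."""
    m = len(pattern)
    lo, hi = 0, len(sa)
    while lo < hi:  # first suffix whose m-token prefix is >= pattern
        mid = (lo + hi) // 2
        if tokens[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start = lo
    hi = len(sa)
    while lo < hi:  # first suffix whose m-token prefix is > pattern
        mid = (lo + hi) // 2
        if tokens[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid
    return start, lo


def next_token_counts(tokens, sa, context):
    # Try the full context first, then drop tokens from the left until
    # some suffix of the context occurs in the corpus with a continuation.
    for k in range(len(context)):
        pattern = context[k:]
        start, end = find_range(tokens, sa, pattern)
        counts = Counter(
            tokens[sa[i] + len(pattern)]
            for i in range(start, end)
            if sa[i] + len(pattern) < len(tokens)
        )
        if counts:
            return pattern, counts
    # Nothing matched: fall back to unigram counts over the whole corpus.
    return [], Counter(tokens)


corpus = "the cat sat on the mat the cat ran".split()
sa = build_suffix_array(corpus)
matched, counts = next_token_counts(corpus, sa, ["a", "the"])
print(matched, counts.most_common())
```

There is no training step in this sketch either: adding text to the corpus only requires rebuilding (or extending) the suffix array.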
-
Conversations
A chat-bot that is community-driven and open source – powered by you! (WIP) (by MarketingPipeline)
-
datalabel
datalabel is a UI-based data editing tool that makes it easy to create labeled text data in a dataframe. With datalabel, you can quickly and effortlessly edit your data without having to write any code. Its intuitive interface makes it ideal for both experienced data professionals and those new to data editing.
-
Orbit-dependency-visualised
Orbis converts any GitHub repo into an interactive 3D dependency graph by parsing ASTs, detecting architecture patterns, and rendering modules as a navigable scene. Built-in LLM assistant answers questions about the graph.
Project mention: Orbis: Turn Any GitHub Repository Into an Interactive 3D Dependency Graph | dev.to | 2026-05-09
The code is at https://github.com/dakshjain-1616/Orbit-dependency-visualised You can also build with NEO in your IDE using the VS Code extension or Cursor. You can use NEO MCP with Claude Code: https://heyneo.com/claude-code
-
Project mention: Show HN: 5-translation RAG matrix fixing LLM religious hallucinations | news.ycombinator.com | 2026-04-18
HTML NLP related posts
-
Annotated Code for Predict Next Word Based on Context and Learned Patterns
-
Let Claude read your Gas Meter with this Amazing new Feature
-
LLMs for Report Validation
-
Unstructured: Open-Source Tool for Custom ML Preprocessing Pipelines
-
Unstructured: Open-Source Tools for Custom Machine Learning Pipelines
-
Quick tip: Using R, OpenAI and SingleStore Notebooks
-
Unstructured – OSS libraries and APIs to build custom preprocessing pipelines
-
A note from our sponsor - SaaSHub
www.saashub.com | 14 Jun 2026
Index
What are some of the best open-source NLP projects in HTML? This list will help you:
| # | Project | Stars |
|---|---|---|
| 1 | unstructured | 14,882 |
| 2 | languagemodels | 1,191 |
| 3 | Sherlock | 562 |
| 4 | Giveme5W1H | 530 |
| 5 | awesome-python | 460 |
| 6 | openunivcourses | 254 |
| 7 | Groqqle | 157 |
| 8 | rgpt3 | 117 |
| 9 | botfuel-dialog | 100 |
| 10 | stripnet | 86 |
| 11 | go-htmldate | 11 |
| 12 | speaking_with_plato | 9 |
| 13 | infinigram | 3 |
| 14 | Conversations | 3 |
| 15 | datalabel | 3 |
| 16 | nfl-prospects-nlp | 1 |
| 17 | Orbit-dependency-visualised | 1 |
| 18 | hanakotoba | 1 |
| 19 | quran-semantic-search | 0 |