HTML NLP

Open-source HTML projects categorized as NLP

Top 15 HTML NLP Projects

  • unstructured

    Open source libraries and APIs to build custom preprocessing pipelines for labeling, training, or production machine learning pipelines.

  • Project mention: LlamaCloud and LlamaParse | news.ycombinator.com | 2024-02-20

    Be careful with unstructured:

    https://github.com/Unstructured-IO/unstructured/blob/d11c70c...

    from: https://github.com/open-webui/open-webui/issues/687

  • bootcamp

    Dealing with all unstructured data, such as reverse image search, audio search, molecular search, video analysis, question and answer systems, NLP, etc. (by milvus-io)

  • Project mention: FLaNK AI - 01 April 2024 | dev.to | 2024-04-01
  • datefinder

    Find dates inside text using Python and get back datetime objects

  • Project mention: Sneller Regex vs Ripgrep | news.ycombinator.com | 2023-05-18

    That's with DFA minimization. Also, '\w' has 311 states while '(?-u)\w' has 5 states.

    I don't have a precise definition of enormous or impractical. Does it matter? I suppose one obvious one is when DFA construction time starts having a significant impact on total search times.

    > Additionally, the results are not the same: the number of matches is not equal to 7882. How could I make `\w` conform to other regex implementations in ripgrep?

    By following UTS#18: https://unicode.org/reports/tr18/#word

    Most regex engines make \w be ASCII-only by default. But most also have a way to opt into Unicode-aware mode. RE2, Go's regexp and ECMAScript are popular regex engines that have no way to change the interpretation of \w.

    > Fair question how regex compilers handle nefarious regexes. Go does not handle NFA with more than 1000 states, and, as you observed, we added some more restrictions when processing the NFA. It can be an interesting academic exercise to find monstrous regexes, but we haven't encountered useful regexes that hit these limits. But I guess you know some...

    It's definitely not academic. People use regexes for lexers. People use big regexes to recognize certain things like email addresses and dates. Here's a real regex used in real software to identify dates in unstructured text for example: https://github.com/akoumjian/datefinder/blob/5376ece0a522c44...

    Otherwise, as I hinted at above, the thing that can make regexes very large very quickly is when you mix Unicode classes with counted repetitions. It doesn't take a lot to make them "big."
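The difference between Unicode-aware and ASCII-only `\w` discussed in the comment above can be seen with a small sketch using Python's standard `re` module. This is not the ripgrep or Sneller setup from the thread. Note that Python 3 is one of the engines where `\w` is Unicode-aware by default for `str` patterns, and `re.ASCII` opts out. The sample text is made up for illustration.

```python
import re

# Hypothetical sample text: accented Latin letters, CJK characters, ASCII.
text = "na\u00efve caf\u00e9 \u6771\u4eac data_1"

# For str patterns, Python 3 matches \w against Unicode word characters.
unicode_words = re.findall(r"\w+", text)

# re.ASCII restricts \w to [a-zA-Z0-9_], so non-ASCII letters split words.
ascii_words = re.findall(r"\w+", text, flags=re.ASCII)

print(unicode_words)
print(ascii_words)
```

Under ASCII-only matching the accented and CJK characters act as word boundaries, so the two modes return different match counts on the same input, which is the kind of mismatch the quoted question describes.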

  • Sherlock

    Natural-language event parser for JavaScript (by neilgupta)

  • Giveme5W1H

    Extraction of the journalistic five W and one H questions (5W1H) from news articles: who did what, when, where, why, and how?

  • awesome-python

    🐍 Hand-picked awesome Python libraries and frameworks, organised by category (by dylanhogg)

  • Project mention: Discover Awesome Python projects | /r/Python | 2023-05-21

    To be transparent, the actual scoring algorithm used on the site can be viewed here, and the data with all source features is also available if you want to play with it.

  • botfuel-dialog

    Botfuel SDK to build highly conversational chatbots

  • rgpt3

    Making requests from R to the GPT models

  • Project mention: How do I get GPT to look through hundreds of pages of PDFs? | /r/OpenAI | 2023-04-24

    It has a community-maintained library.

  • stripnet

    STriP Net: Semantic Similarity of Scientific Papers (S3P) Network

  • speaking_with_plato

    Exploring Plato's philosophy with AI - A Data Spiral blog article

  • go-htmldate

    CLI and Go package for extracting the publication date of a web page.

  • Conversations

    A chat-bot that is community-driven and open source – powered by you! (WIP) (by MarketingPipeline)

  • datalabel

    datalabel is a UI-based data editing tool that makes it easy to create labeled text data in a dataframe. With datalabel, you can quickly and effortlessly edit your data without having to write any code. Its intuitive interface makes it ideal for both experienced data professionals and those new to data editing.

  • hanakotoba

    Exploring 花言葉 in Japanese and other literary corpora

  • Project mention: Evidence – Business Intelligence as Code | news.ycombinator.com | 2023-04-20

    Thanks for sharing, that looks great! That Datapane example is more a hello world web app that runs Python code in the backend (so requires a backend server - Fly.io in this example).

    An example of a standalone report would be something like this, from one of our users: https://cloud.datapane.com/reports/dkjbvwk/literature-in-blo... (code: https://github.com/ryancahildebrandt/hanakotoba) or https://cloud.datapane.com/reports/aAMaqoA/when-fact-is-fals...

  • nfl-prospects-nlp

    Sentiment analysis and text generation of NFL prospect scouting reports 2014-2022

NOTE: The open source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020). The latest post mention was on 2024-04-01.

Index

What are some of the best open-source NLP projects in HTML? This list will help you:

#   Project               Stars
1   unstructured          5,750
2   bootcamp              1,606
3   datefinder              625
4   Sherlock                517
5   Giveme5W1H              500
6   awesome-python          232
7   botfuel-dialog          101
8   rgpt3                    94
9   stripnet                 85
10  speaking_with_plato       8
11  go-htmldate               4
12  Conversations             3
13  datalabel                 2
14  hanakotoba                1
15  nfl-prospects-nlp         1