text-classification

Open-source projects categorized as text-classification

Top 23 text-classification Open-Source Projects

  • HanLP

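    Chinese word segmentation, part-of-speech tagging, named entity recognition, dependency parsing, constituency parsing, semantic dependency parsing, semantic role labeling, coreference resolution, style transfer, semantic similarity, new word discovery, keyword and keyphrase extraction, automatic summarization, text classification and clustering, pinyin and simplified/traditional Chinese conversion, natural language processing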

  • spaCy

    💫 Industrial-strength Natural Language Processing (NLP) in Python

  • Project mention: Step by step guide to create customized chatbot by using spaCy (Python NLP library) | dev.to | 2024-03-23

    Hi Community, in this article I will demonstrate the steps below to create your own chatbot using spaCy (spaCy is an open-source software library for advanced natural language processing, written in the programming languages Python and Cython):

  • Resume-Matcher

    Resume Matcher is an open source, free tool to improve your resume. It works by using language models to compare and rank resumes with job descriptions.

  • Project mention: Hacktoberfest 2023: The Complete Guide | dev.to | 2023-09-22

    GitHub: https://github.com/srbhr/Resume-Matcher
    Website: https://www.resumematcher.fyi/
    Discord: Resume Matcher's Discord
    Tech Stack: Python, NextJS, FastAPI, TypeScript
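The description above says Resume Matcher uses language models to compare and rank resumes against job descriptions. As a rough illustration of the ranking step only, the Python sketch below scores each resume by cosine similarity to the job description. It is not Resume Matcher's implementation: the helper names (`bow_vector`, `cosine`, `rank_resumes`) are hypothetical, and a simple bag-of-words count vector is an assumed stand-in for language-model embeddings.

```python
import math
from collections import Counter


def bow_vector(text: str) -> Counter:
    # Hypothetical stand-in for a language-model embedding: lowercase word counts
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse count vectors; 0.0 if either is empty
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def rank_resumes(job_description: str, resumes: dict[str, str]) -> list[tuple[str, float]]:
    # Score every resume against the job description, highest similarity first
    job_vec = bow_vector(job_description)
    scores = [(name, cosine(job_vec, bow_vector(text))) for name, text in resumes.items()]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)
```

Swapping `bow_vector` for a real embedding model would leave `cosine` and `rank_resumes` unchanged, which is why vectorizing, scoring, and ranking are kept as separate steps.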

  • text-classification-cnn-rnn

    CNN-RNN Chinese text classification, based on TensorFlow

  • simpletransformers

    Transformers for Information Retrieval, Text Classification, NER, QA, Language Modelling, Language Generation, T5, Multi-Modal, and Conversational AI

  • spark-nlp

    State of the Art Natural Language Processing

  • Project mention: Spark NLP 5.1.0: Introducing state-of-the-art OpenAI Whisper speech-to-text, OpenAI Embeddings and Completion transformers, MPNet text embeddings, ONNX support for E5 text embeddings, new multi-lingual BART Zero-Shot text classification, and much more! | /r/Python | 2023-09-06

  • catalyst

    Accelerated deep learning R&D (by catalyst-team)

  • Project mention: Instance segmentation of small objects in grainy drone imagery | /r/computervision | 2023-12-09

  • SaaSHub

    SaaSHub - Software Alternatives and Reviews. SaaSHub helps you find the best software and product alternatives

  • pythoncode-tutorials

    The Python Code Tutorials

  • instructor-embedding

    [ACL 2023] One Embedder, Any Task: Instruction-Finetuned Text Embeddings

  • Project mention: My experience on starting with fine tuning LLMs with custom data | /r/LocalLLaMA | 2023-07-10

    If you like embeddings and vector DBs, you should look into this: https://github.com/HKUNLP/instructor-embedding

  • eda_nlp

    Data augmentation for NLP, presented at EMNLP 2019

  • mteb

    MTEB: Massive Text Embedding Benchmark

  • Project mention: AI for AWS Documentation | news.ycombinator.com | 2023-07-06

    RAG is very difficult to do right. I am experimenting with various RAG projects from [1]. The main problems are:

    - Chunking can interfere with context boundaries

    - Content vectors can differ vastly from question vectors; to bridge this gap you have to use hypothetical embeddings (generate artificial questions for each chunk and store them)

    - Instead of saving just one embedding per text chunk, you should store several (text chunk, hypothetical embedding questions, metadata)

    - RAG will fail miserably with requests like "summarize the whole document"

    - To my knowledge, OpenAI embeddings don't perform well; use an embedding that is optimized for question answering or information retrieval and supports multiple languages. Also look into instructor embeddings: https://github.com/embeddings-benchmark/mteb

    [1] https://github.com/underlines/awesome-marketing-datascience/...
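The chunking and multi-embedding points in the comment above can be sketched in Python. This is a minimal illustration under stated assumptions: chunks are fixed-size word windows with an overlap so that text near a boundary lands in two chunks, and each chunk is stored as one record holding its text, its hypothetical questions, and metadata. The names `chunk_words` and `ChunkRecord` are hypothetical and not part of mteb or any RAG library; generating the hypothetical questions (e.g. with an LLM) is assumed to happen elsewhere.

```python
from dataclasses import dataclass, field


def chunk_words(text: str, size: int = 200, overlap: int = 50) -> list[str]:
    # Split text into windows of `size` words; consecutive windows share `overlap` words
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("require size > 0 and 0 <= overlap < size")
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):  # this window already reaches the end of the text
            break
    return chunks


@dataclass
class ChunkRecord:
    # Hypothetical record: one chunk plus everything that should be embedded for it
    text: str
    hypothetical_questions: list[str] = field(default_factory=list)
    metadata: dict[str, str] = field(default_factory=dict)

    def texts_to_embed(self) -> list[str]:
        # The chunk itself plus each hypothetical question, each embedded separately
        return [self.text, *self.hypothetical_questions]
```

The overlap trades extra storage for fewer passages cut in half at a chunk boundary; embedding every entry from `texts_to_embed()` and pointing each resulting vector back to the same `ChunkRecord` is one way to store several embeddings per chunk.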

  • refinery

    The data scientist's open-source choice to scale, assess and maintain natural language data. Treat training data like a software artifact.

  • text_gcn

    Graph Convolutional Networks for Text Classification. AAAI 2019

  • chatgpt-comparison-detection

    Human ChatGPT Comparison Corpus (HC3), Detectors, and more! 🔥

  • obsei

    Obsei is a low-code, AI-powered automation tool. It can be used in various business flows such as social listening, AI-based alerting, brand image analysis, comparative studies, and more.

  • whatlang-rs

    Natural language detection library for Rust. Try demo online: https://whatlang.org/

  • Project mention: Lingua 1.5.0 - The most accurate natural language detection library for Rust, now with support for detecting multiple languages in mixed-language text | /r/rust | 2023-06-15

    How does it compare to whatlang?

  • spacy-llm

    🦙 Integrating LLMs into structured NLP pipelines

  • Project mention: Integrating LLMs into structured NLP pipelines | news.ycombinator.com | 2023-09-10

  • nlu

    One line of code for thousands of state-of-the-art NLP models in hundreds of languages. The fastest and most accurate way to solve text problems.

  • DataAug4NLP

    Collection of papers and resources for data augmentation for NLP.

  • MeTA

    A Modern C++ Data Sciences Toolkit (by meta-toolkit)

  • Nuclia DB

    NucliaDB, the AI search database for RAG

  • Project mention: Tantivy 0.20 is released: Schemaless column store, Schemaless aggregations, Phrase prefix queries, Percentiles, and more... | /r/rust | 2023-06-20

    There is also NucliaDB, which is built on top of Tantivy and addresses vector search for documents as well as video search.

  • BERTweet

    BERTweet: A pre-trained language model for English Tweets (EMNLP-2020)

  • small-text

    Active Learning for Text Classification in Python

  • Project mention: Small-Text: Looking for Contributors (Active Learning, Text Classification, NLP) | /r/LanguageTechnology | 2023-05-21
NOTE: The open-source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020).

text-classification related posts

  • New AI Spam Detection Deployed to Gmail and Open Sourced by Google

    1 project | news.ycombinator.com | 13 Dec 2023
  • Which UI library for react or next are you using in your project?

    1 project | dev.to | 6 Aug 2023
  • Resume Matcher: Free Open Source Python Based ATS with ML

    1 project | /r/opensource | 30 Jul 2023
  • My personal project Resume Matcher is trending on GitHub with 500+ stars. Thank you 🙏 for this!

    1 project | /r/developersIndia | 29 Jul 2023
  • Resume Matcher – Free Open Source ATS Tool to Match Resumes to Job Descriptions

    1 project | news.ycombinator.com | 28 Jul 2023
  • Show HN: I made an open-source Resume Matcher. A Python based ATS with ML

    1 project | news.ycombinator.com | 24 Jul 2023
  • I've made a customisable SMS personal assistant which has infinite and persistent semantic memory.

    2 projects | /r/LocalLLaMA | 27 May 2023

Index

What are some of the best open-source text-classification projects? This list will help you:

#   Project                       Stars
1   HanLP                         32,388
2   spaCy                         28,751
3   Resume-Matcher                4,522
4   text-classification-cnn-rnn   4,075
5   simpletransformers            3,984
6   spark-nlp                     3,695
7   catalyst                      3,227
8   pythoncode-tutorials          2,002
9   instructor-embedding          1,703
10  eda_nlp                       1,536
11  mteb                          1,395
12  refinery                      1,365
13  text_gcn                      1,326
14  chatgpt-comparison-detection  1,191
15  obsei                         1,079
16  whatlang-rs                   952
17  spacy-llm                     945
18  nlu                           809
19  DataAug4NLP                   809
20  MeTA                          684
21  Nuclia DB                     571
22  BERTweet                      557
23  small-text                    520
