Python text-classification

Open-source Python projects categorized as text-classification

Top 23 Python text-classification Projects

  • HanLP

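    Chinese word segmentation, part-of-speech tagging, named entity recognition, dependency parsing, constituency parsing, semantic dependency parsing, semantic role labeling, coreference resolution, style transfer, semantic similarity, new word discovery, keyword and phrase extraction, automatic summarization, text classification and clustering, pinyin and simplified/traditional Chinese conversion, natural language processing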

  • spaCy

    💫 Industrial-strength Natural Language Processing (NLP) in Python

  • Project mention: Step by step guide to create customized chatbot by using spaCy (Python NLP library) | dev.to | 2024-03-23

    Hi Community, in this article I will demonstrate the steps below to create your own chatbot using spaCy (an open-source software library for advanced natural language processing, written in Python and Cython):

  • Resume-Matcher

    Resume Matcher is an open source, free tool to improve your resume. It works by using language models to compare and rank resumes with job descriptions.

  • Project mention: Hacktoberfest 2023: The Complete Guide | dev.to | 2023-09-22

    GitHub: https://github.com/srbhr/Resume-Matcher Website: https://www.resumematcher.fyi/ Discord: Resume Matcher's Discord Tech Stack: Python, NextJS, FastAPI, TypeScript

  • text-classification-cnn-rnn

    CNN-RNN Chinese text classification, based on TensorFlow

  • simpletransformers

    Transformers for Information Retrieval, Text Classification, NER, QA, Language Modelling, Language Generation, T5, Multi-Modal, and Conversational AI

  • catalyst

    Accelerated deep learning R&D (by catalyst-team)

  • Project mention: Instance segmentation of small objects in grainy drone imagery | /r/computervision | 2023-12-09
  • instructor-embedding

    [ACL 2023] One Embedder, Any Task: Instruction-Finetuned Text Embeddings

  • Project mention: My experience on starting with fine tuning LLMs with custom data | /r/LocalLLaMA | 2023-07-10

    If you like embeddings and vector DBs, you should look into this: https://github.com/HKUNLP/instructor-embedding

  • eda_nlp

    Data augmentation for NLP, presented at EMNLP 2019

  • mteb

    MTEB: Massive Text Embedding Benchmark

  • Project mention: AI for AWS Documentation | news.ycombinator.com | 2023-07-06

    RAG is very difficult to do right. I am experimenting with various RAG projects from [1]. The main problems are:

    - Chunking can interfere with context boundaries

    - Content vectors can differ vastly from question vectors; for this you have to use hypothetical embeddings (they generate artificial questions and store them)

    - Instead of saving just one embedding per text chunk, you should store several (text chunk, hypothetical embedding questions, metadata)

    - RAG will fail miserably with requests like "summarize the whole document"

    - To my knowledge, OpenAI embeddings don't perform well; use an embedding that is optimized for question answering or information retrieval and supports multiple languages. Also look into instructor embeddings: https://github.com/embeddings-benchmark/mteb

    1 https://github.com/underlines/awesome-marketing-datascience/...
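
    The "several embeddings per chunk" idea in the comment above can be sketched roughly as follows. This is a minimal illustration, not code from any of the listed projects: `toy_embed` is a hypothetical bag-of-words stand-in for a real embedding model, and `ChunkRecord`, `search`, and the sample records are made-up names and data used only for this example.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass, field


def toy_embed(text: str) -> Counter:
    """Hypothetical stand-in for a real embedding model: word counts."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


@dataclass
class ChunkRecord:
    text: str                       # the chunk itself
    questions: list[str]            # artificial questions the chunk answers
    metadata: dict = field(default_factory=dict)

    def vectors(self) -> list[Counter]:
        # One vector for the chunk plus one per hypothetical question.
        return [toy_embed(self.text)] + [toy_embed(q) for q in self.questions]


def search(query: str, records: list[ChunkRecord]) -> ChunkRecord:
    """Return the record whose best-matching vector is closest to the query."""
    q = toy_embed(query)
    return max(records, key=lambda r: max(cosine(q, v) for v in r.vectors()))


# Made-up sample data for illustration only.
records = [
    ChunkRecord(
        "Refunds are processed within five business days.",
        ["how long does a refund take"],
        {"source": "billing.md"},
    ),
    ChunkRecord(
        "Passwords can be reset from the account settings page.",
        ["how do i change my password"],
        {"source": "account.md"},
    ),
]

best = search("How do I change my password?", records)
```

    In this toy setup the query shares no words with the chunk text itself ("Passwords" versus "password"), so the match comes from the stored question vector, which is the gap between content vectors and question vectors that the comment describes.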

  • refinery

    The data scientist's open-source choice to scale, assess and maintain natural language data. Treat training data like a software artifact.

  • text_gcn

    Graph Convolutional Networks for Text Classification. AAAI 2019

  • chatgpt-comparison-detection

    Human ChatGPT Comparison Corpus (HC3), Detectors, and more! 🔥

  • obsei

    Obsei is a low-code, AI-powered automation tool. It can be used in various business flows like social listening, AI-based alerting, brand image analysis, comparative study, and more.

  • spacy-llm

    🦙 Integrating LLMs into structured NLP pipelines

  • Project mention: Integrating LLMs into structured NLP pipelines | news.ycombinator.com | 2023-09-10
  • nlu

    1 line for thousands of state-of-the-art NLP models in hundreds of languages. The fastest and most accurate way to solve text problems.

  • BERTweet

    BERTweet: A pre-trained language model for English Tweets (EMNLP-2020)

  • small-text

    Active Learning for Text Classification in Python

  • Project mention: Small-Text: Looking for Contributors (Active Learning, Text Classification, NLP) | /r/LanguageTechnology | 2023-05-21
  • happy-transformer

    Happy Transformer makes it easy to fine-tune and perform inference with NLP Transformer models.

  • TextFooler

    A Model for Natural Language Attack on Text Classification and Inference

  • PIXIU

    This repository introduces PIXIU, an open-source resource featuring the first financial large language models (LLMs), instruction tuning data, and evaluation benchmarks to holistically assess financial LLMs. Our goal is to continually push forward the open-source development of financial artificial intelligence (AI).

  • Project mention: PIXIU: NEW Data - star count:172.0 | /r/algoprojects | 2023-08-15
  • hover

    🚤 Label data at scale. Fun and precision included. (by phurwicz)

  • kiri

    Backprop makes it simple to use, finetune, and deploy state-of-the-art ML models. (by kiri-ai)

  • VDCNN

    Implementation of Very Deep Convolutional Neural Network for Text Classification

NOTE: The open-source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020).

Python text-classification related posts

  • Which UI library for react or next are you using in your project?

    1 project | dev.to | 6 Aug 2023
  • Resume Matcher: Free Open Source Python Based ATS with ML

    1 project | /r/opensource | 30 Jul 2023
  • My personal project Resume Matcher is trending on GitHub with 500+ stars. Thank you 🙏 for this!

    1 project | /r/developersIndia | 29 Jul 2023
  • Resume Matcher – Free Open Source ATS Tool to Match Resumes to Job Descriptions

    1 project | news.ycombinator.com | 28 Jul 2023
  • Show HN: I made an open-source Resume Matcher. A Python based ATS with ML

    1 project | news.ycombinator.com | 24 Jul 2023
  • I've made a customisable SMS personal assistant which has infinite and persistent semantic memory.

    2 projects | /r/LocalLLaMA | 27 May 2023
  • Small-Text: Looking for Contributors (Active Learning, Text Classification, NLP)

    1 project | /r/LanguageTechnology | 21 May 2023

Index

What are some of the best open-source text-classification projects in Python? This list will help you:

#   Project                        Stars
1   HanLP                         32,388
2   spaCy                         28,751
3   Resume-Matcher                 4,534
4   text-classification-cnn-rnn    4,078
5   simpletransformers             3,984
6   catalyst                       3,227
7   instructor-embedding           1,703
8   eda_nlp                        1,536
9   mteb                           1,395
10  refinery                       1,365
11  text_gcn                       1,326
12  chatgpt-comparison-detection   1,191
13  obsei                          1,079
14  spacy-llm                        945
15  nlu                              809
16  BERTweet                         558
17  small-text                       520
18  happy-transformer                500
19  TextFooler                       465
20  PIXIU                            399
21  hover                            314
22  kiri                             240
23  VDCNN                            171
