
Top 23 NLP Open-Source Projects

  • transformers

    🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.

  • Project mention: Maxtext: A simple, performant and scalable Jax LLM | news.ycombinator.com | 2024-04-23

    Is t5x an encoder/decoder architecture?

    Some more general options.

    The Flax ecosystem

    https://github.com/google/flax?tab=readme-ov-file

    or dm-haiku

    https://github.com/google-deepmind/dm-haiku

    were some of the best-developed communities in the JAX AI field.

    Perhaps the “trax” repo? https://github.com/google/trax

    Some HF examples https://github.com/huggingface/transformers/tree/main/exampl...

    Sadly it seems much of the work is proprietary these days, but one example could be Grok-1, if you customize the details. https://github.com/xai-org/grok-1/blob/main/run.py

  • bert

    TensorFlow code and pre-trained models for BERT

  • Project mention: OpenAI – Application for US trademark "GPT" has failed | news.ycombinator.com | 2024-02-15

    task-specific parameters, and is trained on the downstream tasks by simply fine-tuning all pre-trained parameters.

    [0] https://arxiv.org/abs/1810.04805

  • HanLP

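    Chinese word segmentation, part-of-speech tagging, named entity recognition, dependency parsing, constituency parsing, semantic dependency parsing, semantic role labeling, coreference resolution, style transfer, semantic similarity, new word discovery, keyword and keyphrase extraction, automatic summarization, text classification and clustering, pinyin and simplified/traditional Chinese conversion, natural language processing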

  • AI-For-Beginners

    12 Weeks, 24 Lessons, AI for All!

  • Project mention: FREE AI Course By Microsoft: ZERO to HERO! 🔥 | dev.to | 2024-03-18

    🔗 https://github.com/microsoft/AI-For-Beginners 🔗 https://microsoft.github.io/AI-For-Beginners/

  • spaCy

    💫 Industrial-strength Natural Language Processing (NLP) in Python

  • Project mention: Step by step guide to create customized chatbot by using spaCy (Python NLP library) | dev.to | 2024-03-23

    Hi Community, in this article I will demonstrate the following steps to create your own chatbot using spaCy (an open-source library for advanced natural language processing, written in Python and Cython):

  • datasets

    🤗 The largest hub of ready-to-use datasets for ML models with fast, easy-to-use and efficient data manipulation tools

  • Project mention: 🐍🐍 23 issues to grow yourself as an exceptional open-source Python expert 🧑‍💻 🥇 | dev.to | 2023-10-19
  • unilm

    Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities

  • Project mention: The Era of 1-Bit LLMs: Training Tips, Code and FAQ [pdf] | news.ycombinator.com | 2024-03-21
  • rasa

    💬 Open source machine learning framework to automate text- and voice-based conversations: NLU, dialogue management, connect to Slack, Facebook, and more - Create chatbots and voice assistants

  • Project mention: 🔥🚀 Top 10 Open-Source Must-Have Tools for Crafting Your Own Chatbot 🤖💬 | dev.to | 2023-11-06

  • Chinese-LLaMA-Alpaca

    Chinese LLaMA & Alpaca large language models, plus local CPU/GPU training and deployment (Chinese LLaMA & Alpaca LLMs)

  • Project mention: Chinese-Alpaca-Plus-13B-GPTQ | /r/LocalLLaMA | 2023-05-30

    I'd like to share with you today the Chinese-Alpaca-Plus-13B-GPTQ model, a 4-bit GPTQ-format quantisation of Yiming Cui's Chinese-LLaMA-Alpaca 13B for GPU inference.

  • 500-AI-Machine-learning-Deep-learning-Computer-vision-NLP-Projects-with-code

    500 AI Machine learning Deep learning Computer vision NLP Projects with code

  • awesome-nlp

    📖 A curated list of resources dedicated to Natural Language Processing (NLP)

  • best-of-ml-python

    🏆 A ranked list of awesome machine learning Python libraries. Updated weekly.

  • gensim

    Topic Modelling for Humans

  • Project mention: Aggregating news from different sources | /r/learnprogramming | 2023-07-08
  • Awesome-pytorch-list

    A comprehensive list of PyTorch-related content on GitHub, such as different models, implementations, helper libraries, tutorials, etc.

  • ML-YouTube-Courses

    📺 Discover the latest machine learning / AI courses on YouTube.

  • Project mention: A Curated List of Free ML/DL YouTube Courses | news.ycombinator.com | 2024-01-28
  • nlp-tutorial

    Natural Language Processing Tutorial for Deep Learning Researchers

  • haystack

    🔍 LLM orchestration framework to build customizable, production-ready LLM applications. Connect components (models, vector DBs, file converters) to pipelines or agents that can interact with your data. With advanced retrieval methods, it's best suited for building RAG, question answering, semantic search or conversational agent chatbots.

  • Project mention: Release Radar • March 2024 Edition | dev.to | 2024-04-07

  • flair

    A very simple framework for state-of-the-art Natural Language Processing (NLP)

  • NLTK

    NLTK Source

  • Project mention: Building a local AI smart Home Assistant | news.ycombinator.com | 2024-01-13

    Alternatively, could we not simply split on common characters such as newlines and periods to split the text into sentences? It would be fragile, though, with special handling required for numbers with decimal points and probably various other edge cases.

    There are also Python libraries meant for natural language parsing[0] that could do that task for us. I even see examples on Stack Overflow[1] that simply split text into sentences.

    [0]: https://www.nltk.org/

  • DeepLearningExamples

    State-of-the-Art Deep Learning scripts organized by models - easy to train and deploy with reproducible accuracy and performance on enterprise-grade infrastructure.

  • PaddleHub

    Awesome pre-trained models toolkit based on PaddlePaddle. (400+ models including Image, Text, Audio, Video and Cross-Modal with Easy Inference & Serving)

  • botpress

    The open-source hub to build & deploy GPT/LLM Agents ⚡️

  • Project mention: Botpress | news.ycombinator.com | 2023-10-27
  • FinGPT

    FinGPT: Open-Source Financial Large Language Models! Revolutionize 🔥 We release the trained model on Hugging Face.

  • Project mention: GPT-4, without specialized training, beat a GPT-3.5 class model that cost $10B | news.ycombinator.com | 2024-03-24

    There is also the open-source FinGPT, which is claimed to beat GPT-4 on some benchmarks at a fine-tuning cost of $17.25.

    https://github.com/AI4Finance-Foundation/FinGPT

NOTE: The open-source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020).
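The Chinese-Alpaca-Plus-13B-GPTQ mention refers to 4-bit weight quantization. The sketch below is not GPTQ itself: GPTQ additionally uses approximate second-order information to compensate for rounding error as it quantizes the weights. It shows only the simpler round-to-nearest (RTN) baseline, assuming symmetric signed 4-bit codes (-8..7) with one floating-point scale per group of weights. The names `quantize_4bit` and `dequantize_4bit` and the group size of 4 are hypothetical, illustrative choices, not part of any library.

```python
def quantize_4bit(weights, group_size=4):
    """Round-to-nearest symmetric 4-bit quantization (illustrative, not GPTQ).

    Each group of `group_size` weights shares one float scale; every weight
    is stored as a signed 4-bit integer code in the range -8..7.
    """
    codes, scales = [], []
    for start in range(0, len(weights), group_size):
        group = weights[start:start + group_size]
        max_abs = max(abs(w) for w in group)
        # Map the largest magnitude in the group to code 7 (the top of -8..7).
        scale = max_abs / 7 if max_abs > 0 else 1.0
        scales.append(scale)
        # Round each weight to the nearest code and clamp to the 4-bit range.
        codes.extend(max(-8, min(7, round(w / scale))) for w in group)
    return codes, scales


def dequantize_4bit(codes, scales, group_size=4):
    """Recover approximate float weights: code times its group's scale."""
    return [c * scales[i // group_size] for i, c in enumerate(codes)]


if __name__ == "__main__":
    weights = [0.12, -0.5, 0.33, 0.07, 1.2, -0.9, 0.0, 0.45]
    codes, scales = quantize_4bit(weights)
    print(codes)  # [2, -7, 5, 1, 7, -5, 0, 3]
    print(dequantize_4bit(codes, scales))
```

Per-group scales keep one large weight from crushing the precision of every other weight in the tensor. Real 4-bit formats also pack two codes into each byte; that packing is omitted here for clarity.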

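The NLTK mention contrasts naively splitting text on newlines and periods with reaching for a library. The minimal sketch below, using only Python's standard `re` module, shows the trade-off: splitting on every period breaks decimals such as `21.5`, while splitting only after `.`, `!` or `?` followed by whitespace keeps them intact. The function names are hypothetical, and the regex version still mishandles abbreviations such as "Dr. Smith"; NLTK's `sent_tokenize`, backed by the pre-trained Punkt model, is the kind of library the comment points to for those edge cases.

```python
import re


def naive_split(text):
    # Split on every period and newline, as first suggested in the comment.
    return [s.strip() for s in re.split(r"[.\n]", text) if s.strip()]


def regex_split(text):
    # Split after ".", "!" or "?" only when whitespace follows, or on newlines,
    # so a decimal point inside a number such as "21.5" is not a boundary.
    parts = re.split(r"(?<=[.!?])\s+|\n+", text)
    return [p.strip() for p in parts if p.strip()]


if __name__ == "__main__":
    text = "Set the thermostat to 21.5 degrees. Turn off the lights!\nIs the door locked?"
    print(naive_split(text))
    # ['Set the thermostat to 21', '5 degrees', 'Turn off the lights!', 'Is the door locked?']
    print(regex_split(text))
    # ['Set the thermostat to 21.5 degrees.', 'Turn off the lights!', 'Is the door locked?']
```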

Index

What are some of the best open-source NLP projects? This list will help you:

Rank Project GitHub stars
1 transformers 125,021
2 bert 36,992
3 HanLP 32,304
4 AI-For-Beginners 30,927
5 spaCy 28,704
6 datasets 18,376
7 unilm 18,319
8 rasa 17,951
9 Chinese-LLaMA-Alpaca 17,251
10 500-AI-Machine-learning-Deep-learning-Computer-vision-NLP-Projects-with-code 17,099
11 awesome-nlp 15,977
12 best-of-ml-python 15,302
13 gensim 15,236
14 Awesome-pytorch-list 14,932
15 ML-YouTube-Courses 14,310
16 nlp-tutorial 13,691
17 haystack 13,633
18 flair 13,558
19 NLTK 13,015
20 DeepLearningExamples 12,607
21 PaddleHub 12,501
22 botpress 11,954
23 FinGPT 11,419