Top 23 Python NLP Projects
🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.
Project mention: A look at Apple’s new Transformer-powered predictive text model | news.ycombinator.com | 2023-09-16
To summarize how they work: you keep some number of previously generated tokens, and once you get logits that you want to sample a new token from, you find the logits for existing tokens and multiply them by a penalty, thus lowering the probability of the corresponding tokens.
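The penalty scheme described in the comment above (often called a repetition penalty) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the code of any particular library: the function name `apply_repetition_penalty`, the window size of 64, and the default penalty of 1.2 are all hypothetical. The comment speaks of multiplying logits by a penalty; the sketch assumes a sign-aware variant (divide positive logits, multiply negative ones, with a penalty greater than 1) so that the adjusted score always moves down.

```python
import math
from collections import deque


def apply_repetition_penalty(logits, recent_tokens, penalty=1.2):
    """Return a copy of `logits` with recently generated token ids penalized.

    Hypothetical helper for illustration; assumes penalty > 1.
    """
    penalized = list(logits)
    for token_id in set(recent_tokens):
        score = penalized[token_id]
        # Dividing a positive logit, or multiplying a negative one, by a
        # penalty > 1 lowers the score, so the token becomes less likely.
        penalized[token_id] = score / penalty if score > 0 else score * penalty
    return penalized


def softmax(logits):
    """Convert logits to probabilities (shifted by the max for stability)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Sliding window of recently generated token ids (size 64 is an assumption).
history = deque([2, 0], maxlen=64)
logits = [2.0, 1.0, -1.0, 0.5]

before = softmax(logits)
after = softmax(apply_repetition_penalty(logits, history))
```

Comparing `before` and `after` shows the probability of token 0 dropping while the probability of the unpenalized token 1 rises; a new token would then be sampled from `after` and appended to `history`.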
TensorFlow code and pre-trained models for BERT
Project mention: Ernie, China's ChatGPT, Cracks Under Pressure | news.ycombinator.com | 2023-09-07
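Chinese word segmentation, part-of-speech tagging, named entity recognition, dependency parsing, constituency parsing, semantic dependency parsing, semantic role labeling, coreference resolution, style transfer, semantic similarity, new word discovery, keyword and phrase extraction, automatic summarization, text classification and clustering, pinyin and simplified/traditional Chinese conversion, natural language processing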
💫 Industrial-strength Natural Language Processing (NLP) in Python
Project mention: Retrieval Augmented Generation (RAG): How To Get AI Models Learn Your Data & Give You Answers | dev.to | 2023-09-18
🤗 The largest hub of ready-to-use datasets for ML models with fast, easy-to-use and efficient data manipulation tools
💬 Open source machine learning framework to automate text- and voice-based conversations: NLU, dialogue management, connect to Slack, Facebook, and more - Create chatbots and voice assistants
Project mention: RasaGPT: First headless LLM chatbot built on top of Rasa, Langchain and FastAPI | news.ycombinator.com | 2023-05-08
It is not itself a GPT. It is a framework of a framework, a project built on top of Rasa (https://github.com/RasaHQ/rasa) and Langchain, which by default uses gpt3.5-turbo (change it in the .env file) or any foundation model you wish.
Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities
Project mention: Microsoft Publishes LongNet: Scaling Transformers to 1,000,000,000 Tokens | /r/ArtificialInteligence | 2023-07-08
The repository is available here.
Chinese LLaMA & Alpaca large language models, with local CPU/GPU training and deployment (Chinese LLaMA & Alpaca LLMs)
Project mention: Chinese-Alpaca-Plus-13B-GPTQ | /r/LocalLLaMA | 2023-05-30
I'd like to share with you today the Chinese-Alpaca-Plus-13B-GPTQ model, which is the GPTQ-format 4-bit quantised version of Yiming Cui's Chinese-LLaMA-Alpaca 13B for GPU reference.
Topic Modelling for Humans
Project mention: Aggregating news from different sources | /r/learnprogramming | 2023-07-08
🏆 A ranked list of awesome machine learning Python libraries. Updated weekly.
Project mention: Ask HN: How to get back into AI? | news.ycombinator.com | 2022-12-10
For Python, here's a nice compilation: https://github.com/ml-tooling/best-of-ml-python/blob/main/RE...
A very simple framework for state-of-the-art Natural Language Processing (NLP)
NLTK Source
Project mention: Best Portfolio Projects for Data Science | dev.to | 2023-09-19
Awesome pre-trained models toolkit based on PaddlePaddle. (400+ models including Image, Text, Audio, Video and Cross-Modal with Easy Inference & Serving)
Project mention: Where are all the multi-modal models? | /r/singularity | 2023-02-10
China: All of the ERNIE 260B cross-modal stuff.
🔍 LLM orchestration framework to build customizable, production-ready LLM applications. Connect components (models, vector DBs, file converters) to pipelines or agents that can interact with your data. With advanced retrieval methods, it's best suited for building RAG, question answering, semantic search or conversational agent chatbots.
Project mention: Llama2 and Haystack on Colab | news.ycombinator.com | 2023-07-21
I recently conducted some experiments with Llama2 and Haystack (https://github.com/deepset-ai/haystack), the NLP/LLM framework.
The notebook can be helpful for those trying to load Llama2 on Colab.
1) Installed Transformers from the main branch (and other libraries)
👑 Easy-to-use and powerful NLP and LLM library with 🤗 Awesome model zoo, supporting a wide range of NLP tasks from research to industrial applications, including 🗂 Text Classification, 🔍 Neural Search, ❓ Question Answering, ℹ️ Information Extraction, 📄 Document Intelligence, 💌 Sentiment Analysis, etc.
Project mention: Is ChatGPT actually open source? | /r/China_irl | 2023-03-25
Simple, Pythonic, text processing--Sentiment analysis, part-of-speech tagging, noun phrase extraction, translation, and more.
NeMo: a toolkit for conversational AI
Project mention: [P] Making a TTS voice, HK-47 from Kotor using Tortoise (Ideally WaveRNN) | /r/MachineLearning | 2023-07-06
I haven't tested WaveRNN, but of the open-source ones I know, the best is FastPitch. It's easy to use; here is the tutorial for voice cloning.
A PyTorch implementation of the Transformer model in "Attention is All You Need".
Project mention: Question: LLMs | /r/learnmachinelearning | 2023-07-06
I did implement an "LLM" proof of concept from scratch in a course for my master's, pretty much a small implementation of a transformer from the "Attention is All You Need" paper (plus other resources). It was useless, but it was a great experience for understanding how it works. There are a few implementations like this out there, like this one: https://github.com/jadore801120/attention-is-all-you-need-pytorch (first Google result). I think it is a fun exercise (the amount of fun depends on how much of a masochist you are :) ).
Chinese version of GPT2 training code, using BERT tokenizer.
Stanford NLP Python library for tokenization, sentence segmentation, NER, and parsing of many human languages
Project mention: Down and Out in the Magic Kingdom | news.ycombinator.com | 2023-07-23
Mycroft Core, the Mycroft Artificial Intelligence platform.
Project mention: Ask HN: Is there any open source/open hardware Echo Dot alike? | news.ycombinator.com | 2023-08-11
Official implementations for various pre-training models of ERNIE-family, covering topics of Language Understanding & Generation, Multimodal Understanding & Generation, and beyond.
Project mention: [N] Baidu to Unveil Conversational AI ERNIE Bot on March 16 (Live) | /r/MachineLearning | 2023-03-14
Found relevant code at https://github.com/PaddlePaddle/ERNIE + all code implementations here
Google AI 2018 BERT pytorch implementation
Python NLP related posts
best way to serve llama V2 (llama.cpp VS triton VS HF text generation inference)
3 projects | /r/LocalLLaMA | 25 Sep 2023
A look at Apple’s new Transformer-powered predictive text model
4 projects | news.ycombinator.com | 16 Sep 2023
Deploying Llama2 with vLLM vs TGI. Need advice
3 projects | /r/LocalLLaMA | 14 Sep 2023
[P][R] Finetune LLMs via the Finetuning Hub
1 project | /r/MachineLearning | 9 Sep 2023
Show HN: Leverage Falcon 7B blog post
1 project | news.ycombinator.com | 9 Sep 2023
Show HN: New AI Dataset Based on LibGen and Sci-Hub
2 projects | news.ycombinator.com | 8 Sep 2023
Ernie, China's ChatGPT, Cracks Under Pressure
1 project | news.ycombinator.com | 7 Sep 2023
What are some of the best open-source NLP projects in Python? This list will help you: