pytext vs Jieba
| | pytext | Jieba |
|---|---|---|
| Mentions | - | 6 |
| Stars | 6,354 | 32,375 |
| Growth | - | - |
| Activity | 7.8 | 0.0 |
| Latest commit | over 1 year ago | about 1 month ago |
| Language | Python | Python |
| License | BSD 3-clause "New" or "Revised" License | MIT License |
Stars: the number of stars that a project has on GitHub. Growth: month-over-month growth in stars.
Activity is a relative number indicating how actively a project is being developed; recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is among the top 10% of the most actively developed projects that we are tracking.
pytext
We haven't tracked posts mentioning pytext yet.
Tracking mentions began in Dec 2020.
Jieba
- [OC] How Many Chinese Characters You Need to Learn to Read Chinese!
  "jieba to do Chinese word segmentation"
- Sentence parser for Mandarin?
  "Jieba: Chinese text segmenter"
- How many in here use Google Sheets to keep track of their Chinese vocabulary? (2 pics) - More info in the comments
  "If you know some Python you can use a popular library called Jieba 结巴 to automatically get pinyin for every word. (Jieba has actually been ported to many languages.) You can also use it to break a Chinese text into a set of unique words for easy addition to your spreadsheet."
- Where can I download a database of Chinese word classifications (noun, verb, etc.)?
- Learn vocabulary effortlessly while browsing the web [FR, EN, DE, PT, ES]
  "Since you're saying the main issue is segmentation, there are libraries to help out with that issue. jieba is fantastic if you have a Python backend, nodejieba (50k downloads/week) if it's more JS-side."
- I'm looking for a specific vocab list
  "https://github.com/fxsjy/jieba/ (has some good word frequency data)"
What are some alternatives?
stanfordnlp - [Deprecated] This library has been renamed to "Stanza". Latest development at: https://github.com/stanfordnlp/stanza
spaCy - 💫 Industrial-strength Natural Language Processing (NLP) in Python
Stanza - Stanford NLP Python library for tokenization, sentence segmentation, NER, and parsing of many human languages
SnowNLP - Python library for processing Chinese text
PyTorch-NLP - Basic Utilities for PyTorch Natural Language Processing (NLP)
NLTK - NLTK Source
pkuseg-python - The pkuseg toolkit for multi-domain Chinese word segmentation
TextBlob - Simple, Pythonic text processing: sentiment analysis, part-of-speech tagging, noun phrase extraction, translation, and more.
Pattern - Web mining module for Python, with tools for scraping, natural language processing, machine learning, network analysis and visualization.