I'm looking for a reliable Japanese word segmentation algorithm

This page summarizes the projects mentioned and recommended in the original post on /r/LearnJapanese

  • ichiran

    Linguistic tools for texts in Japanese language

  • Check out ichi.moe. The word detection and splitting are quite good, and the backend is available on GitHub as ichiran. Unfortunately for most sane developers, the backend is written in Lisp.

  • rakutenma-python

    Rakuten MA (Python version)

  • There are some projects on GitHub that seem promising (e.g. https://github.com/ikegami-yukino/rakutenma-python), but I just have to re-emphasize that even if the computer is getting 95%+ of the sentences right, a learner is going to be looking for help with that remaining 5%, and the computer will never have it.

  • ginza

    A Japanese NLP Library using spaCy as framework based on Universal Dependencies
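
To make the "remaining 5%" point above concrete, here is a minimal sketch of greedy forward longest-match segmentation, one of the simplest dictionary-based approaches. It is not how ichiran, Rakuten MA, or GiNZA work; the function name and both toy dictionaries are hypothetical and exist only for illustration. The second example is a well-known ambiguous string where the greedy choice picks a plausible but unintended split.

```python
def longest_match_segment(text, dictionary):
    """Split text greedily, always taking the longest dictionary word
    that starts at the current position (hypothetical helper)."""
    max_len = max((len(word) for word in dictionary), default=1)
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first and shrink until one matches.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            # A single character is always accepted as a fallback.
            if size == 1 or candidate in dictionary:
                tokens.append(candidate)
                i += size
                break
    return tokens


# Toy dictionaries, made up for this example only.
EASY_DICTIONARY = {"私", "は", "日本語", "日本", "語", "を", "勉強", "します"}
AMBIGUOUS_DICTIONARY = {"この", "先", "先生", "生きのこる", "きのこ", "には"}

easy = longest_match_segment("私は日本語を勉強します", EASY_DICTIONARY)
ambiguous = longest_match_segment("この先生きのこるには", AMBIGUOUS_DICTIONARY)

print(easy)
print(ambiguous)
```

The first sentence splits as expected. The second is usually meant as この / 先 / 生きのこる / には ("to survive from here on"), but the greedy pass commits to 先生 ("teacher") early and ends up with きのこ ("mushroom") and a stray る. Real segmenters use statistical or lattice-based scoring to reduce such errors, yet the same kind of ambiguity is what leaves a residue of wrong splits.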

