[Computer Stuff] What's the best way to split a Japanese sentence into "words"?

This page summarizes the projects mentioned and recommended in the original post on /r/linguistics.

  • janome

    Japanese morphological analysis engine written in pure Python

  • I've done a bit of this kind of programming in Korean and Japanese. In short, the tools/libraries you want are called "tokenizers", so search for "Japanese tokenizer"; the results will also tell you that MeCab is one of them. There is no good or easy way to split Japanese into words with simple algorithms, so these libraries, which are based on statistics or AI, are your only real choice. A good example sentence that shows how futile this would be without them is "すもももももももものうち" (sumomo mo momo mo momo no uchi, "plums and peaches are both kinds of peach"). From here.
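To illustrate why simple algorithms fail on that sentence, here is a minimal Python sketch. It assumes a tiny hypothetical dictionary (five words tagged noun or particle) and made-up costs; the names `Entry`, `DICTIONARY`, `CONNECTION`, `greedy_segment` and `lattice_segment` are illustrative only and are not part of janome, MeCab, or any other library. A naive greedy longest-match segmenter produces the wrong split, while a minimum-cost search over the word lattice (a Viterbi-style search, roughly the approach MeCab-style analyzers take with word costs and connection costs) that penalizes noun-noun and particle-particle sequences recovers すもも/も/もも/も/もも/の/うち.

```python
from typing import NamedTuple


class Entry(NamedTuple):
    surface: str
    pos: str   # part of speech: "noun" or "particle"
    cost: int  # word cost (hypothetical; real dictionaries learn these)


# Hypothetical toy dictionary: just enough words for the example sentence.
DICTIONARY = [
    Entry("すもも", "noun", 1),
    Entry("もも", "noun", 1),
    Entry("うち", "noun", 1),
    Entry("も", "particle", 1),
    Entry("の", "particle", 1),
]

# Hypothetical connection costs between adjacent parts of speech.
# Two nouns or two particles in a row are penalized.
CONNECTION = {
    ("BOS", "noun"): 0, ("BOS", "particle"): 10,
    ("noun", "noun"): 10, ("noun", "particle"): 0,
    ("particle", "noun"): 0, ("particle", "particle"): 10,
    ("noun", "EOS"): 0, ("particle", "EOS"): 5,
}


def greedy_segment(text: str) -> list[str]:
    """Naive approach: always take the longest dictionary word that matches."""
    words = sorted({e.surface for e in DICTIONARY}, key=len, reverse=True)
    out, i = [], 0
    while i < len(text):
        for w in words:
            if text.startswith(w, i):
                out.append(w)
                i += len(w)
                break
        else:
            raise ValueError(f"no dictionary word at position {i}")
    return out


def lattice_segment(text: str) -> list[str]:
    """Minimum-cost path through the word lattice (Viterbi-style search)."""
    if not text:
        return []
    n = len(text)
    # best[j][pos] = (total cost, backpointer) for the cheapest path
    # whose last word ends at index j with part of speech `pos`.
    best: list[dict] = [{} for _ in range(n + 1)]
    best[0]["BOS"] = (0, None)
    for i in range(n):
        for prev_pos, (prev_cost, _) in best[i].items():
            for e in DICTIONARY:
                if text.startswith(e.surface, i):
                    j = i + len(e.surface)
                    cost = prev_cost + CONNECTION[(prev_pos, e.pos)] + e.cost
                    if e.pos not in best[j] or cost < best[j][e.pos][0]:
                        best[j][e.pos] = (cost, (i, prev_pos, e.surface))
    if not best[n]:
        raise ValueError("text cannot be segmented with this dictionary")
    # Close the path with the end-of-sentence connection cost.
    pos = min(best[n], key=lambda p: best[n][p][0] + CONNECTION[(p, "EOS")])
    # Follow backpointers from the end of the text to the start.
    words, j = [], n
    while pos != "BOS":
        _, (i, prev_pos, surface) = best[j][pos]
        words.append(surface)
        j, pos = i, prev_pos
    return words[::-1]


if __name__ == "__main__":
    sentence = "すもももももももものうち"
    print(greedy_segment(sentence))   # ['すもも', 'もも', 'もも', 'もも', 'の', 'うち']
    print(lattice_segment(sentence))  # ['すもも', 'も', 'もも', 'も', 'もも', 'の', 'うち']
```

Keying each lattice state by the last part of speech is enough here because the toy connection cost depends only on adjacent parts of speech. Real analyzers use much finer-grained categories and large dictionaries whose costs are typically trained on annotated corpora, which is why reaching for an existing tokenizer such as janome or MeCab is the practical choice.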
