-
InfluxDB
Power Real-Time Data Analytics at Scale. Get real-time insights from all types of time series data with InfluxDB. Ingest, query, and analyze billions of data points in real-time with unbounded cardinality.
I think many LMs nowadays use unicode tokenizers, that are not tied to specific languages. E.g. sentencepiece is the most popular one: https://github.com/google/sentencepiece
NOTE:
The number of mentions on this list indicates mentions on common posts plus user suggested alternatives.
Hence, a higher number means a more popular project.
Related posts
-
sentencepiece
-
[P] TokenMonster Ungreedy ~ 35% faster inference and 35% increased context-length for large language models (compared to tiktoken). Benchmarks included.
-
LLaMA tokenizer: is a JavaScript implementation available anywhere?
-
[P] New tokenization method improves LLM performance & context-length by 25%+
-
Code runs without definition of function (automatically calls a different function instead)