sentence-splitter vs word-piece-tokenizer
Metric | sentence-splitter | word-piece-tokenizer
---|---|---
Mentions | 1 | 1
Stars | 216 | 5
Growth | 6.0% | -
Activity | 0.0 | 3.8
Last commit | over 1 year ago | over 1 year ago
Language | Python | Python
License | GNU General Public License v3.0 or later | Creative Commons Zero v1.0 Universal
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
sentence-splitter
Text translation question: Helsinki-NLP skips end sentences. Any good open sourced pre-trained models for large text translation?
There are plenty of sentence splitters available, such as https://github.com/mediacloud/sentence-splitter, but sometimes you'll have to use language-specific ones.
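A minimal sketch of the idea behind that advice: split a long text into sentences and regroup them into short chunks, so that each piece passed to a translation model stays small. It uses only the Python standard library and a naive punctuation rule; it is not the algorithm used by mediacloud/sentence-splitter, and the names `split_sentences`, `chunk_sentences` and the `max_chars` limit are hypothetical choices for illustration.

```python
import re

# Naive rule (an assumption for this sketch): a sentence ends at '.', '!' or '?'
# when followed by whitespace and then an uppercase letter, a digit or a quote.
_SENTENCE_BOUNDARY = re.compile(r"(?<=[.!?])\s+(?=[A-Z0-9\"'])")


def split_sentences(text):
    """Split text into sentences using the naive boundary rule above."""
    return [s.strip() for s in _SENTENCE_BOUNDARY.split(text) if s.strip()]


def chunk_sentences(sentences, max_chars):
    """Greedily pack whole sentences into chunks of at most max_chars characters.

    A single sentence longer than max_chars is kept as its own chunk
    rather than being cut in the middle.
    """
    chunks = []
    current = ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


if __name__ == "__main__":
    text = "Hello world. How are you? Fine!"
    sentences = split_sentences(text)
    print(sentences)                       # ['Hello world.', 'How are you?', 'Fine!']
    print(chunk_sentences(sentences, 20))  # ['Hello world.', 'How are you? Fine!']
```

The regex will mis-split abbreviations such as "Dr. Smith", which is one reason a dedicated or language-specific splitter is usually the better choice.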
word-piece-tokenizer
How we created an in-browser BERT attention visualiser without a server - TrAVis: Transformer Attention Visualiser
Secondly, we implemented the HuggingFace BERT tokeniser in pure Python, since pure Python is easier to run in the browser. We have also optimised the tokenisation algorithm so that it runs faster than the original HuggingFace implementation.
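For context, BERT-style tokenisers split each word into sub-word pieces with a greedy longest-match-first lookup against a fixed vocabulary. The sketch below illustrates that general WordPiece scheme in pure Python. It is not the TrAVis or HuggingFace code, it omits text normalisation and punctuation splitting, and the toy vocabulary and the function name `wordpiece_tokenize` are assumptions made for the example.

```python
def wordpiece_tokenize(word, vocab, unk_token="[UNK]", prefix="##"):
    """Split one word into WordPiece tokens by greedy longest-match-first lookup.

    Pieces that do not start the word carry the continuation prefix ('##').
    If any position cannot be matched, the whole word maps to unk_token.
    """
    tokens = []
    start = 0
    while start < len(word):
        end = len(word)
        match = None
        # Try the longest remaining substring first, then shrink from the right.
        while start < end:
            piece = word[start:end]
            if start > 0:
                piece = prefix + piece
            if piece in vocab:
                match = piece
                break
            end -= 1
        if match is None:
            return [unk_token]
        tokens.append(match)
        start = end
    return tokens


if __name__ == "__main__":
    # Toy vocabulary, invented for this example.
    toy_vocab = {"un", "##aff", "##able", "token", "##is", "##er"}
    print(wordpiece_tokenize("unaffable", toy_vocab))  # ['un', '##aff', '##able']
    print(wordpiece_tokenize("tokeniser", toy_vocab))  # ['token', '##is', '##er']
    print(wordpiece_tokenize("xyz", toy_vocab))        # ['[UNK]']
```

This straightforward version re-slices the word at every step, so its worst case is quadratic in the word length; optimised implementations typically avoid that repeated scanning, for example with a trie over the vocabulary.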
What are some alternatives?
Hebrew-Tokenizer - A very simple python tokenizer for Hebrew text.
bart-base-jax - JAX implementation of the bart-base model
bitextor - Bitextor generates translation memories from multilingual websites
TrAVis - TrAVis: Visualise BERT attention in your browser
spacy-experimental - 🧪 Cutting-edge experimental spaCy components and features
d3 - Bring data to life with SVG, Canvas and HTML. 📊📈🎉
xontrib-output-search - Get identifiers, paths, URLs and words from the previous command output and use them for the next command in xonsh shell.
jax - Composable transformations of Python+NumPy programs: differentiate, vectorize, JIT to GPU/TPU, and more
bert - TensorFlow code and pre-trained models for BERT