sentence-splitter vs bitextor
 | sentence-splitter | bitextor |
---|---|---|
Mentions | 1 | 2 |
Stars | 216 | 282 |
Growth | 6.0% | 2.1% |
Activity | 0.0 | 5.9 |
Latest commit | over 1 year ago | 8 months ago |
Language | Python | Python |
License | GNU General Public License v3.0 or later | GNU General Public License v3.0 only |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
sentence-splitter
Text translation question: Helsinki-NLP skips end sentences. Any good open sourced pre-trained models for large text translation?
There are plenty of sentence splitters available, such as https://github.com/mediacloud/sentence-splitter, but sometimes you'll have to use language-specific ones.
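To illustrate the idea in the comment above, here is a minimal sketch of splitting a long text into sentences and regrouping them into smaller chunks before passing each chunk to a translation model. It uses only the standard library. The boundary rule, the function names `split_sentences` and `batch_sentences`, and the `max_chars` budget are assumptions made for illustration; they are not the API of sentence-splitter or of any translation model. A dedicated splitter would handle abbreviations such as "Dr." and language-specific punctuation, which this naive rule does not.

```python
import re

# Naive boundary rule (an assumption for illustration): split on whitespace
# that follows '.', '!' or '?' and precedes an uppercase letter or a digit.
_BOUNDARY = re.compile(r"(?<=[.!?])\s+(?=[A-Z0-9])")


def split_sentences(text):
    """Split text into sentences using the naive boundary rule above."""
    return [s.strip() for s in _BOUNDARY.split(text) if s.strip()]


def batch_sentences(sentences, max_chars=200):
    """Group consecutive sentences into chunks of at most max_chars characters.

    A single sentence longer than max_chars becomes its own chunk.
    """
    chunks, current = [], ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            # Adding this sentence would exceed the budget: close the chunk.
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


text = "First sentence. Second one! Is this the third? Yes."
sentences = split_sentences(text)
chunks = batch_sentences(sentences, max_chars=30)
print(sentences)
print(chunks)
```

Each chunk can then be translated separately and the results joined in order, so that no input exceeds the length the model handles reliably.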
bitextor
What are some alternatives?
word-piece-tokenizer - A Lightweight Word Piece Tokenizer
ArchiveBox - Open source self-hosted web archiving. Takes URLs/browser history/bookmarks/Pocket/Pinboard/etc., saves HTML, JS, PDFs, media, and more...
Hebrew-Tokenizer - A very simple python tokenizer for Hebrew text.
trankit - Trankit is a Light-Weight Transformer-based Python Toolkit for Multilingual Natural Language Processing
spacy-experimental - 🧪 Cutting-edge experimental spaCy components and features
xontrib-output-search - Get identifiers, paths, URLs and words from the previous command output and use them for the next command in xonsh shell.
grab-site - The archivist's web crawler: WARC output, dashboard for all crawls, dynamic ignore patterns
nematus - Open-Source Neural Machine Translation in Tensorflow
OpenNMT-py - Open Source Neural Machine Translation and (Large) Language Models in PyTorch
packaging - Debian, Fedora, Windows, macOS packaging scripts for Apertium, HFST, CG-3, and related techs.
machinetranslate.org - Open information and community for machine translation