🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.
I have been using the pretrained tokenizers available from the huggingface/transformers library, and they have been working well for my use case.
Unsupervised text tokenizer for neural-network-based text generation.
For the relevant papers, see the references at https://github.com/google/sentencepiece
What LSTM Baseline To Use?
1 project | reddit.com/r/LanguageTechnology | 17 Jan 2022
[D] SentencePiece, WordPiece, BPE... Which tokenizer is the best one?
3 projects | reddit.com/r/MachineLearning | 27 Dec 2021
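The tokenizer-comparison thread above contrasts algorithms such as BPE, WordPiece, and SentencePiece. As a point of reference, the core of BPE is learning merge rules from corpus frequencies: repeatedly find the most frequent adjacent symbol pair and fuse it into a new symbol. Below is a minimal, self-contained sketch of that merge-learning loop; it is a toy illustration, not the SentencePiece or huggingface implementation, and the function name and tiny corpus are invented for the example.

```python
from collections import Counter

def learn_bpe_merges(word_freqs, num_merges):
    """Toy BPE: learn merge rules from a {word: frequency} dict.

    Each word is split into characters plus an end-of-word marker;
    at every step the most frequent adjacent pair is merged.
    """
    vocab = {tuple(word) + ("</w>",): freq for word, freq in word_freqs.items()}
    merges = []
    for _ in range(num_merges):
        # Count every adjacent symbol pair, weighted by word frequency.
        pairs = Counter()
        for symbols, freq in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        best = max(pairs, key=pairs.get)
        merges.append(best)
        # Rewrite the vocabulary with the chosen pair fused into one symbol.
        merged = best[0] + best[1]
        new_vocab = {}
        for symbols, freq in vocab.items():
            out, i = [], 0
            while i < len(symbols):
                if i < len(symbols) - 1 and (symbols[i], symbols[i + 1]) == best:
                    out.append(merged)
                    i += 2
                else:
                    out.append(symbols[i])
                    i += 1
            new_vocab[tuple(out)] = freq
        vocab = new_vocab
    return merges

# Classic toy corpus (hypothetical frequencies for illustration).
corpus = {"low": 5, "lower": 2, "newest": 6, "widest": 3}
merges = learn_bpe_merges(corpus, 4)
# First learned merge fuses "e" and "s" (the most frequent pair here).
```

WordPiece and SentencePiece differ mainly in the pair-scoring criterion and in operating on raw text (including whitespace) rather than pre-split words, which is part of what the linked discussion weighs.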
[P] OSLO: Open Source framework for Large-scale transformer Optimization
2 projects | reddit.com/r/MachineLearning | 20 Dec 2021
NLP - How to get correlated words?
1 project | reddit.com/r/tensorflow | 16 Dec 2021
[P] CodeParrot 🦜: Train your own CoPilot from scratch!
2 projects | reddit.com/r/MachineLearning | 10 Dec 2021