machine-learning-articles vs returnn-experiments
| | machine-learning-articles | returnn-experiments |
|---|---|---|
| Mentions | 5 | 2 |
| Stars | 3,143 | 152 |
| Growth | - | 1.3% |
| Activity | 4.1 | 6.4 |
| Last commit | 2 months ago | 7 months ago |
| Language | Python | - |
| License | - | - |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
machine-learning-articles
- CNN binary classification validation accuracy reached 77%, yet performs poorly on the test set?
- Guide to creating a VAE?
- Minimal PyTorch re-implementation of GPT
For anyone else who was new to the phrase "isotropic model":
https://github.com/christianversloot/machine-learning-articl...
- CNN Multiclass classification - What happens if in the last layer you use more units than classes?
Of course! I read the documentation and the usage examples, and it seems it should be used with softmax, which yields the class prediction.
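To make the question concrete, here is a minimal PyTorch sketch (not from the thread; layer sizes are illustrative) of a last layer with more units than classes. Softmax still normalizes over every unit, and the surplus units simply become outputs that no label ever selects:

```python
import torch
import torch.nn as nn

NUM_CLASSES = 3   # classes that actually occur in the labels
NUM_UNITS = 5     # hypothetical: last layer deliberately has more units

# Minimal head; the convolutional backbone is irrelevant to the point.
model = nn.Sequential(
    nn.Flatten(),
    nn.Linear(28 * 28, 64),
    nn.ReLU(),
    nn.Linear(64, NUM_UNITS),  # 5 logits, but labels only take values 0..2
)

x = torch.randn(8, 1, 28, 28)             # dummy batch
y = torch.randint(0, NUM_CLASSES, (8,))   # units 3 and 4 are never a target

logits = model(x)
loss = nn.CrossEntropyLoss()(logits, y)   # applies log-softmax over all 5 units
loss.backward()

# Softmax normalizes over all 5 units, so the extra units receive some
# probability mass; training only ever pushes their logits down, since
# they are never the target, leaving them as dead outputs.
probs = torch.softmax(logits, dim=-1)
print(probs.sum(dim=-1))  # each row still sums to 1 across all 5 units
```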
- How is the vanishing/exploding gradient problem solved by proper weight initialization?
Xavier initialization (tanh): https://cs230.stanford.edu/section/4/
He initialization (ReLU): https://github.com/christianversloot/machine-learning-articles/blob/main/he-xavier-initialization-activation-functions-choose-wisely.md
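As a companion to those links, here is a short PyTorch sketch of both schemes using the framework's built-in initializers (layer sizes are illustrative):

```python
import math
import torch
import torch.nn as nn

tanh_layer = nn.Linear(256, 128)
relu_layer = nn.Linear(256, 128)

# Xavier/Glorot: weight variance ~ 2 / (fan_in + fan_out); keeps activations
# roughly unit-variance through tanh, so gradients neither vanish nor explode.
nn.init.xavier_uniform_(tanh_layer.weight, gain=nn.init.calculate_gain("tanh"))

# He/Kaiming: weight variance ~ 2 / fan_in; the extra factor of 2 compensates
# for ReLU zeroing out half of its inputs on average.
nn.init.kaiming_normal_(relu_layer.weight, nonlinearity="relu")

nn.init.zeros_(tanh_layer.bias)
nn.init.zeros_(relu_layer.bias)

# Equivalent manual He init, to show what kaiming_normal_ computes:
with torch.no_grad():
    fan_in = relu_layer.weight.size(1)
    relu_layer.weight.normal_(0.0, math.sqrt(2.0 / fan_in))
```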
returnn-experiments
- Show HN: WhisperFusion – Ultra-low latency conversations with an AI chatbot
The code is all released already; you can find it here: https://github.com/rwth-i6/returnn-experiments/tree/master/2...
This is TensorFlow-based, but I also have another PyTorch-based implementation that is already public (inside our other repo, i6_experiments). It's currently not so easy to set up, but I'm working on a simpler pipeline in PyTorch.
We don't have the models online yet, but we can upload them later. I'm not sure how useful they are outside of research, though, as they are specific to those research tasks (Librispeech, Tedlium) and probably don't perform too well on other data.
- Minimal PyTorch re-implementation of GPT
This works for an architecture that has been well tuned and studied before, like LSTM or Transformer.
Once you do research on the model, testing out things, it often tends to become such a kwargs monster in many frameworks.
Having everything relevant in one file (even in the config file itself, together with the hyperparameters) lets you copy the file for every experiment and modify it in place. This avoids the kwargs mess, but then the config files are very complex and can become messy in other ways (especially for research projects). Example: https://github.com/rwth-i6/returnn-experiments/blob/master/2...
This approach is much more flexible and does not mess with the baseline code. As you say, it's more like an evolutionary, DNA-like approach, where you then tend to do crossovers with other evolved, well-performing configs, etc.
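A toy sketch of the copy-and-modify config pattern described above (hypothetical file and names, not RETURNN's actual config format):

```python
# exp_lstm_baseline.py -- hypothetical self-contained experiment config.
# The pattern: copy this file per experiment and edit it in place,
# instead of threading dozens of kwargs through shared builder code.

import torch
import torch.nn as nn

# Hyperparameters live in the file itself; a copied variant just edits these.
vocab_size = 10_000
hidden_dim = 512
num_layers = 2
dropout = 0.1
learning_rate = 1e-3

class Model(nn.Module):
    # The model definition is also local to the config, so a variant can
    # rewrite any part of it without touching the baseline code.
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden_dim)
        self.lstm = nn.LSTM(hidden_dim, hidden_dim, num_layers,
                            dropout=dropout, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, tokens):
        x, _ = self.lstm(self.embed(tokens))
        return self.out(x)

def build_optimizer(model: nn.Module) -> torch.optim.Optimizer:
    return torch.optim.Adam(model.parameters(), lr=learning_rate)
```

A training script would load such a file as a module; experiment variants are plain copies that diverge freely, which is what makes the crossover workflow described above cheap.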
What are some alternatives?
iris - Transformers are Sample-Efficient World Models. ICLR 2023, notable top 5%.
minGPT - A minimal PyTorch re-implementation of the OpenAI GPT (Generative Pretrained Transformer) training
embedding-encoder - Scikit-Learn compatible transformer that turns categorical variables into dense entity embeddings.
WhisperFusion - WhisperFusion builds upon the capabilities of WhisperLive and WhisperSpeech to provide seamless conversations with an AI.
awesome-adaptive-computation - A curated reading list of research in Adaptive Computation, Dynamic Compute & Mixture of Experts (MoE).