| | alpa | datasets |
|---|---|---|
| Mentions | 4 | 15 |
| Stars | 2,986 | 18,443 |
| Growth | 0.8% | 1.0% |
| Activity | 5.1 | 9.5 |
| Last commit | 5 months ago | 2 days ago |
| Language | Python | Python |
| License | Apache License 2.0 | Apache License 2.0 |
Stars - the number of stars that a project has on GitHub. Growth - month-over-month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
alpa
- How to Train Large Models on Many GPUs?
  Alpa does training and serving with 175B parameter models: https://github.com/alpa-projects/alpa
- How much does it actually cost in terms of computer power for OpenAI to respond?
  alpa.ai states: "You will need at least 350GB GPU memory on your entire cluster to serve the OPT-175B model. For example, you can use 4 x AWS p3.16xlarge instances, which provide 4 (instance) x 8 (GPU/instance) x 16 (GB/GPU) = 512 GB memory."
- Alpa: Auto-parallelizing large model training and inference (by UC Berkeley)
- Alpa: Automated Model-Parallel Deep Learning
  GitHub code: https://github.com/alpa-projects/alpa
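
The memory arithmetic in the alpa.ai quote above can be checked with a few lines of Python. This is only a sketch: the instance, GPU, and per-GPU memory figures are the ones quoted in that excerpt, while the helper names and the idea that the 350 GB floor corresponds to 2 bytes (fp16) per parameter are assumptions added here, not something the quote states.

```python
# Figures quoted in the alpa.ai excerpt above.
REQUIRED_GB = 350        # minimum cluster GPU memory to serve OPT-175B
INSTANCES = 4            # AWS p3.16xlarge instances
GPUS_PER_INSTANCE = 8
GB_PER_GPU = 16


def cluster_memory_gb(instances, gpus_per_instance, gb_per_gpu):
    """Total GPU memory across the cluster, in GB."""
    return instances * gpus_per_instance * gb_per_gpu


def weights_gb(num_params, bytes_per_param=2):
    """Rough size of the model weights in GB.

    Assumption (not from the source): fp16 weights at 2 bytes per
    parameter, with 1 GB taken as 1e9 bytes.
    """
    return num_params * bytes_per_param / 1e9


total_gb = cluster_memory_gb(INSTANCES, GPUS_PER_INSTANCE, GB_PER_GPU)
print(total_gb)                      # 512
print(total_gb >= REQUIRED_GB)       # True
print(weights_gb(175_000_000_000))   # 350.0
```

Under these assumptions the quoted cluster leaves 162 GB above the stated 350 GB minimum.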
datasets
- 🐍🐍 23 issues to grow yourself as an exceptional open-source Python expert 🧑💻 🥇
- Mastering ROUGE Matrix: Your Guide to Large Language Model Evaluation for Summarization with Examples
- How to Train Large Models on Many GPUs?
  https://github.com/huggingface/datasets
  https://github.com/huggingface/transformers
- [D] Can we use Ray for distributed training on Vertex AI? Can someone provide me examples for the same? Also, which dataframe libraries have you used for training machine learning models on huge datasets (100 GB+), since pandas can't handle huge data?
  https://huggingface.co/docs/datasets (backed by an Arrow file or buffer)
- Need help with a data science project
- Is there a text evaluation metric that does not need reference text?
  I'm looking for an automatic evaluation metric that can score the first text higher (since it's more grammatically correct/better for other reasons). All the metrics for NLG I found require some reference text to match the generated text with, which I don't have.
- FauxPilot – an open-source GitHub Copilot server
  And then pass that my_code.json as the dataset name.
  [1] https://github.com/huggingface/datasets
- Hugging Face Introduces ‘Datasets’: A Lightweight Community Library For Natural Language Processing (NLP)
  Code for https://arxiv.org/abs/2109.02846 found: https://github.com/huggingface/datasets
- Datasets: A Community Library for Natural Language Processing
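
The FauxPilot excerpt above mentions passing a local `my_code.json` file as the dataset. As a rough sketch of what that could look like with the `datasets` library, the snippet below writes a small JSON Lines file and defines a helper that loads it through `load_dataset("json", data_files=...)`. The sample records, the `text` field name, and the helper names are hypothetical and chosen only for illustration; this is not the commenter's actual workflow.

```python
import json
import tempfile
from pathlib import Path


def write_json_lines(records, path):
    """Write one JSON object per line (the JSON Lines layout)."""
    with open(path, "w", encoding="utf-8") as handle:
        for record in records:
            handle.write(json.dumps(record) + "\n")
    return Path(path)


def load_with_datasets(path):
    """Load the file with Hugging Face `datasets` (needs `pip install datasets`)."""
    # Imported lazily so the rest of the sketch runs without the package.
    from datasets import load_dataset
    return load_dataset("json", data_files=str(path), split="train")


# Hypothetical sample records; the "text" field name is an assumption.
records = [
    {"text": "def add(a, b):\n    return a + b\n"},
    {"text": "print('hello')\n"},
]
json_path = write_json_lines(records, Path(tempfile.mkdtemp()) / "my_code.json")
```

With the package installed, calling `load_with_datasets(json_path)` should give back a dataset with a single `text` column, stored in the Arrow-backed format that the documentation link in the list above refers to.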
What are some alternatives?
transformers - 🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.
sentence-transformers - Multilingual Sentence & Image Embeddings with BERT
hivemind - Decentralized deep learning in PyTorch. Built to train models on thousands of volunteers across the world.
datumaro - Dataset Management Framework, a Python library and a CLI tool to build, analyze and manage Computer Vision datasets.
determined - Determined is an open-source machine learning platform that simplifies distributed training, hyperparameter tuning, experiment tracking, and resource management. Works with PyTorch and TensorFlow.
cypress-realworld-app - A payment application to demonstrate real-world usage of Cypress testing methods, patterns, and workflows.
FedML - FEDML - The unified and scalable ML library for large-scale distributed training, model serving, and federated learning. FEDML Launch, a cross-cloud scheduler, further enables running any AI jobs on any GPU cloud or on-premise cluster. Built on this library, FEDML Nexus AI (https://fedml.ai) is your generative AI platform at scale.
edex-ui - A cross-platform, customizable science fiction terminal emulator with advanced monitoring & touchscreen support.
awesome-tensor-compilers - A list of awesome compiler projects and papers for tensor computation and deep learning.
first-contributions - 🚀✨ Help beginners to contribute to open source projects
adaptdl - Resource-adaptive cluster scheduler for deep learning training.
frankmocap - A Strong and Easy-to-use Single View 3D Hand+Body Pose Estimator