Metric | dcai-lab | dcai-course |
---|---|---|
Mentions | 10 | 3 |
Stars | 400 | 87 |
Growth | 3.0% | - |
Activity | 5.4 | 7.6 |
Last commit | 5 months ago | about 1 month ago |
Language | Jupyter Notebook | CSS |
License | GNU Affero General Public License v3.0 | GNU General Public License v3.0 or later |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
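As a rough illustration of the growth metric described above, the sketch below computes month-over-month growth from two star counts. The function name and the sample counts are hypothetical and are not taken from the tracker; the site's exact rounding and sampling window are not stated here.

```python
def month_over_month_growth(stars_now: int, stars_month_ago: int) -> float:
    """Return the percentage change in stars relative to one month earlier."""
    if stars_month_ago <= 0:
        # Growth is undefined without a positive baseline.
        raise ValueError("stars_month_ago must be positive")
    return (stars_now - stars_month_ago) / stars_month_ago * 100.0


# Hypothetical counts, for illustration only.
growth = month_over_month_growth(stars_now=103, stars_month_ago=100)
print(f"{growth:.1f}%")
```

A project with no earlier baseline has no defined growth, which this sketch reports as an error rather than as a number.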
dcai-lab
-
Resources to learn practical/industry-focused ML (preferably using TensorFlow)?
Data-Centric AI. Honestly, if you've been working on ML pipelines, this might be familiar to you.
-
Andrew Ng, GitHub courses
Another great resource inspired by the Andrew Ng data-centric AI movement is the Introduction to Data-Centric AI course taught this past semester at MIT by PhDs.
-
Good Beginner Courses for ML?
Data-centric AI course. Brand new, taught for the first time a few months ago by MIT PhD grads. It covers how to ensure good data quality for your models. More data-science heavy.
-
[P] We are building a curated list of open source tooling for data-centric AI workflows, looking for contributions.
Thanks for the kind words! Make sure to check out the current open MIT course if you are just starting out: https://dcai.csail.mit.edu/
-
The Missing Semester of Your CS Education
Introduction to Data-Centric AI https://dcai.csail.mit.edu
-
MIT Introduction to Data-Centric AI
Course homepage | Lecture videos on YouTube | Lab Assignments
dcai-course
-
MIT Introduction to Data-Centric AI
Announcing the first-ever course on Data-Centric AI. Learn how to train better ML models by improving the data.
Hi HN! I’m back with another “what they don’t teach you in school” style course that I’d love to share with the community. (A couple years ago, I was part of the team that taught Missing Semester, an IAP class that taught programmer tools that weren’t covered in any CS courses at MIT: https://news.ycombinator.com/item?id=22226380.)
MIT, like most universities, has many courses on machine learning (6.036, 6.867, and many others). Those classes teach techniques to produce effective models for a given dataset, and the classes focus heavily on the mathematical details of models rather than practical applications. However, in real-world applications of ML, the dataset is not fixed, and focusing on improving the data often gives better results than improving the model. We’ve personally seen this time and time again in our applied ML work as well as our research.
Data-Centric AI (DCAI) is an emerging science that studies techniques to improve datasets in a systematic/algorithmic way — given that this topic wasn’t covered in the standard curriculum, we (a group of PhD candidates and grads) thought that we should put together a new class! We taught this intensive 2-week course in January over MIT’s IAP term, and we’ve just published all the course material, including lecture videos, lecture notes, hands-on lab assignments, and lab solutions, in hopes that people outside the MIT community would find these resources useful.
We’d be happy to answer any questions related to the class or DCAI in general, and we’d love to hear any feedback on how we can improve the course material. Introduction to Data-Centric AI is open-source opencourseware, so feel free to make improvements directly: https://github.com/dcai-course/dcai-course.
What are some alternatives?
snorkel - A system for quickly generating training data with weak supervision
Gpt4All-webui - A web user interface for GPT4All
cleanlab - The standard data-centric AI package for data quality and machine learning with messy, real-world data and labels.
refinery - The data scientist's open-source choice to scale, assess and maintain natural language data. Treat training data like a software artifact.
BotLibre - An open platform for artificial intelligence, chat bots, virtual agents, social media automation, and live chat automation.
llm-course - Course to get into Large Language Models (LLMs) with roadmaps and Colab notebooks.
deodel - A mixed attributes predictive algorithm implemented in Python.
chordviz - A convolutional neural network trained using PyTorch to predict the next chord (as tablature) on a guitar based on image data. Includes labeling software for the image data as well as an iOS app for hosting and running the model.
UBB-INFO - All projects from university.
nodevectors - Fastest network node embeddings in the west
dgl - Python package built to ease deep learning on graph, on top of existing DL frameworks.