dataqa
imodels
Metric | dataqa | imodels
---|---|---
Mentions | 7 | 7
Stars | 245 | 1,288
Growth | - | -
Activity | 6.2 | 8.6
Last commit | almost 2 years ago | 17 days ago
Language | JavaScript | Jupyter Notebook
License | GNU General Public License v3.0 only | MIT License
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
dataqa
-
[D] Looking for open source projects to contribute
Hey, I am the creator (and, as of today, the only contributor) of the open-source project https://github.com/dataqa/dataqa, a Python library to explore and annotate documents. It uses weak supervision, is based on spaCy, and has a lot of opportunities to add more deep learning and ML functionality. I can guide you through it :-). This would be a great opportunity to be the first and lead contributor of an open-source library (other than the creator).
-
[P]: Extract and label data from Wikipedia with DataQA
I recently added a new feature to DataQA (https://github.com/dataqa/dataqa) to extract entities from Wikipedia. All you need to do is upload a file with Wikipedia URLs.
-
Show HN: DataQA – now possible to link entities to large ontologies
The open-source project is here: https://github.com/dataqa/dataqa. I have just released a feature which I have been working on for a while to solve a problem which I've seen a lot in industry: how to map entities found in text to large knowledge base ontologies.
-
[P] Using rules to speed up labelling by 2x
The tool I developed and used for this problem: https://github.com/dataqa/dataqa
-
The First Rule of Machine Learning: Start Without Machine Learning
I have seen firsthand at small and large companies how problems have been tackled with ML without trying a simple rule or heuristic first. Then, further down the line, the system is compared to a few business rules put together, only to find that the difference in performance did not justify deploying an ML system in the first place.
It's true that if your rules grow in complexity, they can become harder to maintain, but the good thing about rules is that they tend to be fully explainable, and they can be encoded by domain experts. So the maintenance of such a system no longer needs to be done exclusively by an ML engineer.
Here is where I insert my plug: I have developed a tool to create rules to solve NLP problems: https://github.com/dataqa/dataqa
- Show HN: Rules-based labelling tool for NLP
-
DataQA: the new Python app to do rules-based text annotation
After working in ML for more than a decade, I became frustrated over time with the lack of tools to create baselines using simple rules and heuristics. It is well known that most business problems out there can achieve decent baselines using only heuristics. This is why I have developed DataQA (https://github.com/dataqa/dataqa), which uses NLP rules to do common NLP annotation tasks, such as multiclass classification or named entity recognition.
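The rules-first approach described in the posts above can be sketched in a few lines of plain Python. This is a minimal illustration of rule-based multiclass labelling, not DataQA's actual API: the `RULES` table, the label names, and the `label` helper are all hypothetical. Each rule is a regular expression that votes for one class, and a document with no matching rule is left unlabelled.

```python
import re
from collections import Counter

# Hypothetical rules for illustration: each regex votes for one class label.
RULES = [
    (re.compile(r"\b(refund|money back|chargeback)\b", re.I), "billing"),
    (re.compile(r"\b(invoice|charged)\b", re.I), "billing"),
    (re.compile(r"\b(crash|error|bug)\b", re.I), "technical"),
    (re.compile(r"\b(password|log ?in)\b", re.I), "account"),
]

ABSTAIN = None  # returned when no rule fires


def label(text):
    """Return the label with the most rule votes, or ABSTAIN if no rule matches."""
    votes = Counter(lbl for pattern, lbl in RULES if pattern.search(text))
    if not votes:
        return ABSTAIN
    return votes.most_common(1)[0][0]


docs = [
    "I was charged twice, please send a refund",
    "The app shows an error and then crashes",
    "I forgot my password",
    "What are your opening hours?",
]
labels = [label(d) for d in docs]

# Coverage: the share of documents that at least one rule could label.
coverage = sum(l is not ABSTAIN for l in labels) / len(docs)
```

Because every prediction traces back to a named pattern, a domain expert can read and edit the rules directly, and the abstentions show where the rules (or an ML model) still have work to do.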
imodels
-
[D] Have researchers given up on traditional machine learning methods?
- all domains requiring high interpretability ignore deep learning entirely and put all their research into traditional ML; see e.g. counterfactual examples, important interpretability methods in finance, or rule-based learning, important in medical or legal applications
-
What would be my best approach given the data I have?
Next, this variable will be your target and you can use various supervised learning models to answer your question. Since interpretation is key, you can use something from here: https://github.com/csinva/imodels or fit a black-box model and use SHAP to understand which features contributed most.
-
Random Forest Estimation Question
Option 2) fit a model from https://github.com/csinva/imodels on the predicted values of the RF
-
UC Berkeley Researchers Introduce ‘imodels’: A Python Package For Fitting Interpretable Machine Learning Models
Despite recent breakthroughs in the formulation and fitting of interpretable models, implementations are frequently hard to locate, use, and compare. imodels fills this void by offering a single interface and implementation for a wide range of state-of-the-art interpretable modeling techniques, especially rule-based methods. imodels is a Python package for predictive modeling that is simple, transparent, and accurate. It gives users a straightforward way to fit and use state-of-the-art interpretable models, all of which are compatible with scikit-learn (Pedregosa et al., 2011). These models can frequently replace black-box models, improving interpretability and computational efficiency without compromising predictive accuracy.
-
[D] Looking for open source projects to contribute
Our package imodels is expanding our sklearn-compatible set of interpretable models and always looking for new contributors!
- imodels: a package extending sklearn with state-of-the-art models for interpretable data science (e.g. Bayesian Rule Lists, RuleFit)
- imodels: a package extending sklearn with state-of-the-art interpretable models (e.g. Bayesian Rule Lists, RuleFit) from BAIR [P]
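One of the posts above suggests fitting an interpretable model on the predicted values of a random forest. The sketch below illustrates that surrogate idea using only the standard library. It makes two loud assumptions: `black_box` is a hypothetical stand-in for the trained forest, and the one-split rule fitted by `fit_stump` is a stand-in for an imodels estimator. Neither is imodels' actual API.

```python
import random


def black_box(row):
    """Hypothetical stand-in for an opaque model such as a random forest."""
    return 10.0 if row[0] > 0.5 else 2.0


def fit_stump(X, y):
    """Fit a single if/else rule (one split) by minimising squared error."""
    best = None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            left = [yi for row, yi in zip(X, y) if row[j] <= t]
            right = [yi for row, yi in zip(X, y) if row[j] > t]
            if not left or not right:
                continue
            ml, mr = sum(left) / len(left), sum(right) / len(right)
            sse = sum((v - ml) ** 2 for v in left) + sum((v - mr) ** 2 for v in right)
            if best is None or sse < best[0]:
                best = (sse, j, t, ml, mr)
    _, j, t, ml, mr = best
    return {"feature": j, "threshold": t, "if_le": ml, "if_gt": mr}


def predict(rule, row):
    """Apply the fitted rule to one row."""
    return rule["if_le"] if row[rule["feature"]] <= rule["threshold"] else rule["if_gt"]


rng = random.Random(0)
X = [[rng.random(), rng.random()] for _ in range(200)]

# The surrogate is trained on the black box's predictions, not on true labels.
y_hat = [black_box(row) for row in X]
rule = fit_stump(X, y_hat)

# Fidelity: how often the readable rule reproduces the black box's output.
fidelity = sum(predict(rule, r) == p for r, p in zip(X, y_hat)) / len(X)
```

Fidelity measures agreement with the black box rather than accuracy on the real target, so a high value only says the simple rule is a faithful summary of the model it imitates.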
What are some alternatives?
diffgram - The AI Datastore for Schemas, BLOBs, and Predictions. Use with your apps or integrate built-in Human Supervision, Data Workflow, and UI Catalog to get the most value out of your AI Data.
pycaret - An open-source, low-code machine learning library in Python
argilla - Argilla is a collaboration platform for AI engineers and domain experts that require high-quality outputs, full data ownership, and overall efficiency.
interpret - Fit interpretable models. Explain blackbox machine learning.
general
shap - A game theoretic approach to explain the output of any machine learning model.
docarray - Represent, send, store and search multimodal data
linear-tree - A python library to build Model Trees with Linear Models at the leaves.
poutyne - A simplified framework and utilities for PyTorch
vosk-api - Offline speech recognition API for Android, iOS, Raspberry Pi and servers with Python, Java, C# and Node
Mathematics-for-Machine-Learning-and-Data-Science-Specialization-Coursera - Mathematics for Machine Learning and Data Science Specialization - Coursera - deeplearning.ai - solutions and notes