imbalanced-learn
sweetviz
Metric | imbalanced-learn | sweetviz
---|---|---
Mentions | 1 | 1
Stars | 6,703 | 2,841
Growth | 0.5% | -
Activity | 7.5 | 6.7
Last commit | about 1 month ago | 5 months ago
Language | Python | Python
License | MIT License | MIT License
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
imbalanced-learn
What’s your approach to highly imbalanced data sets?
There's a plethora of undersampling and oversampling methods you can try out. To avoid removing information from the dataset, you can focus on oversampling techniques. You can try imbalanced-learn or smote-variants. Given enough data, using fully synthetic data is also an option; you can check ydata-synthetic for that. Let us know how it turned out!
sweetviz
Automated Data Profiling and Attribute Clustering using unsupervised ML techniques
Take a look at this package, which computes associations between variables, produces other visualizations, and can infer some column types: https://github.com/fbdesignpro/sweetviz
What are some alternatives?
ydata-synthetic - Synthetic data generators for tabular and time-series data
dataprep - Open-source low code data preparation library in Python. Collect, clean and visualize your data in Python with a few lines of code.
general_class_balancer - Data matching algorithm for categorical and continuous variables
ydata-profiling - 1 Line of code data quality profiling & exploratory data analysis for Pandas and Spark DataFrames.
deodel - A mixed attributes predictive algorithm implemented in Python.
Optimus - Agile Data Preparation Workflows made easy with Pandas, Dask, cuDF, Dask-cuDF, Vaex and PySpark
scikit-learn - scikit-learn: machine learning in Python
popmon - Monitor the stability of a Pandas or Spark dataframe ⚙︎
dtale-desktop - Build a data visualization dashboard with simple snippets of python code
mlgauge - A simple library to benchmark the performance of machine learning methods across different datasets.
lux - 👾 Fast and simple video download library and CLI tool written in Go
pandera - A light-weight, flexible, and expressive statistical data testing library