What’s your approach to highly imbalanced data sets?

ydata-synthetic

Mentions: 60 | Stars: 1,286 | Activity: 7.6 | Language: Jupyter Notebook

Synthetic data generators for tabular and time-series data

There's a plethora of undersampling and oversampling techniques you can try out. To avoid removing information from the dataset, you can focus on oversampling. You can try imbalanced-learn or smote-variants. Given enough data, using fully synthetic data is also an option; you can check ydata-synthetic for it. Let us know how it turned out!
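
To make the oversampling idea concrete, here is a minimal sketch of random oversampling in plain Python. It is not the imbalanced-learn or smote-variants implementation; the function name `random_oversample` and the toy data are assumptions made up for illustration. It keeps every original row and duplicates randomly chosen minority rows until all classes are the same size, so no information is removed.

```python
import random
from collections import Counter, defaultdict


def random_oversample(rows, labels, seed=0):
    """Duplicate minority-class rows until every class matches the largest one.

    Hypothetical helper for illustration only, not a library API.
    """
    rng = random.Random(seed)

    # Group the rows by class label.
    by_class = defaultdict(list)
    for row, label in zip(rows, labels):
        by_class[label].append(row)

    # Every class is grown to the size of the largest class.
    target = max(len(members) for members in by_class.values())

    # Start from the full original data so nothing is discarded.
    out_rows, out_labels = list(rows), list(labels)
    for label, members in by_class.items():
        for _ in range(target - len(members)):
            out_rows.append(rng.choice(members))
            out_labels.append(label)
    return out_rows, out_labels


# Toy data (made up): four rows of class 0, one row of class 1.
rows = [[0.1, 1.0], [0.2, 0.9], [0.3, 1.1], [0.4, 0.8], [5.0, 5.1]]
labels = [0, 0, 0, 0, 1]

new_rows, new_labels = random_oversample(rows, labels)
print(Counter(new_labels))
```

If you go this route, resample only the training split, so duplicated rows do not leak into the validation or test data.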

general_class_balancer

Mentions: 1 | Stars: 3 | Activity: 10.0 | Language: Python

Data matching algorithm for categorical and continuous variables

Multivariate data matching. I wrote a function to do this in grad school: https://github.com/mleming/general_class_balancer

deodel

Mentions: 13 | Stars: 5 | Activity: 6.3 | Language: Python

A mixed attributes predictive algorithm implemented in Python.

Just to mention that there is also a new algorithm that is immune to data imbalance. A Python implementation is available at: https://github.com/c4pub/deodel

imbalanced-learn

Mentions: 1 | Stars: 6,697 | Activity: 7.4 | Language: Python

A Python Package to Tackle the Curse of Imbalanced Datasets in Machine Learning

There's a plethora of undersampling and oversampling techniques you can try out. To avoid removing information from the dataset, you can focus on oversampling. You can try imbalanced-learn or smote-variants. Given enough data, using fully synthetic data is also an option; you can check ydata-synthetic for it. Let us know how it turned out!
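
For a rough feel of what SMOTE-style oversampling does, here is a simplified sketch in plain Python. It is an assumption-laden illustration, not the code used by imbalanced-learn or smote-variants: the function name `smote_like`, its parameters, and the toy points are all hypothetical. Each new point is placed at a random position on the line segment between a minority sample and one of its nearest minority neighbours.

```python
import math
import random


def smote_like(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points by interpolating between minority samples.

    Simplified illustration only; needs at least two minority samples.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)

        # The k nearest minority neighbours of the base point (excluding itself).
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)

        # Random point on the segment between base and the chosen neighbour.
        gap = rng.random()
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Toy minority-class points (made up for the example).
minority = [[1.0, 1.0], [1.2, 0.9], [0.8, 1.1], [1.1, 1.3]]
new_points = smote_like(minority, n_new=6)
print(len(new_points))
```

Because the new points are interpolated rather than copied, they stay inside the region already covered by the minority class, which is also why this approach only handles numeric features as written.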

NOTE: The number of mentions on this list indicates mentions on common posts plus user-suggested alternatives. Hence, a higher number means a more popular project.
