Kaggle

Top 23 Kaggle Open-Source Projects

  • data-science-ipython-notebooks

    Data science Python notebooks: Deep learning (TensorFlow, Theano, Caffe, Keras), scikit-learn, Kaggle, big data (Spark, Hadoop MapReduce, HDFS), matplotlib, pandas, NumPy, SciPy, Python essentials, AWS, and various command lines.

  • d2l-en

    Interactive deep learning book with multi-framework code, math, and discussions. Adopted at 500 universities from 70 countries including Stanford, MIT, Harvard, and Cambridge.

  • LightGBM

    A fast, distributed, high performance gradient boosting (GBT, GBDT, GBRT, GBM or MART) framework based on decision tree algorithms, used for ranking, classification and many other machine learning tasks.

    Project mention: SIRUS.jl: Interpretable Machine Learning via Rule Extraction | /r/Julia | 2023-06-29

    SIRUS.jl is a pure Julia implementation of the SIRUS algorithm by Bénard et al. (2021). The algorithm is a rule-based machine learning model, meaning that it is fully interpretable. It does this by first fitting a random forest and then converting the forest to rules. Furthermore, the algorithm is stable and achieves predictive performance comparable to LightGBM, a state-of-the-art gradient boosting model created by Microsoft. Interpretability, stability, and predictive performance are described in more detail below.

  • Pytorch-UNet

    PyTorch implementation of the U-Net for image semantic segmentation with high quality images

  • catboost

    A fast, scalable, high performance Gradient Boosting on Decision Trees library, used for ranking, classification, regression and other machine learning tasks for Python, R, Java, C++. Supports computation on CPU and GPU.

    Project mention: CatBoost: Open-source gradient boosting library | news.ycombinator.com | 2024-03-05

  • kaggle-solutions

    🏅 Collection of Kaggle Solutions and Ideas 🏅

  • Artificial-Intelligence-Deep-Learning-Machine-Learning-Tutorials

    A comprehensive list of Deep Learning / Artificial Intelligence and Machine Learning tutorials - rapidly expanding into areas of AI/Deep Learning / Machine Vision / NLP and industry specific areas such as Climate / Energy, Automotives, Retail, Pharma, Medicine, Healthcare, Policy, Ethics and more.

  • pytorch-toolbelt

    PyTorch extensions for fast R&D prototyping and Kaggle farming

  • MLBox

    MLBox is a powerful Automated Machine Learning python library.

  • fastdup

    fastdup is a powerful free tool designed to rapidly extract valuable insights from your image & video datasets. It helps you improve the quality of your dataset's images and labels and reduce your data operations costs at scale.

    Project mention: Visualize your dataset using DINOv2 embedding | news.ycombinator.com | 2023-05-02

    Visualizing your dataset (especially large ones) in a low-dimensional embedding space can tell you a lot about the patterns and clusters in your dataset.

    We recently released a notebook showing how you can visualize your dataset using DINOv2 models by running it on your CPU.

    Yes! No GPUs needed.

    We used it to find clusters of similar images, duplicates, and outliers in a subset of the LAION dataset.

    Try it on your own dataset:

    Colab notebook - https://colab.research.google.com/github/visual-layer/fastdup/blob/main/examples/dinov2_notebook.ipynb

    GitHub repo - https://github.com/visual-layer/fastdup
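
To illustrate the idea of finding duplicates in an embedding space, here is a minimal sketch in plain Python. It is not fastdup's API and does not run DINOv2: the embeddings, file names, the `find_near_duplicates` helper, and the 0.95 threshold are all hypothetical, chosen only to show how near-duplicate pairs could be flagged by cosine similarity once each image has an embedding vector.

```python
import math
from itertools import combinations


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def find_near_duplicates(embeddings, threshold=0.95):
    """Return (name_a, name_b, similarity) for every pair whose cosine
    similarity is at or above the threshold, most similar first.

    Hypothetical helper: compares all pairs, so it only suits small sets.
    """
    pairs = []
    for (name_a, vec_a), (name_b, vec_b) in combinations(embeddings.items(), 2):
        similarity = cosine_similarity(vec_a, vec_b)
        if similarity >= threshold:
            pairs.append((name_a, name_b, similarity))
    return sorted(pairs, key=lambda pair: pair[2], reverse=True)


# Made-up 3-dimensional embeddings; real image embeddings have far more dimensions.
embeddings = {
    "cat_1.jpg": [0.90, 0.10, 0.00],
    "cat_1_copy.jpg": [0.89, 0.11, 0.01],
    "dog_1.jpg": [0.10, 0.90, 0.20],
}

duplicates = find_near_duplicates(embeddings)
for name_a, name_b, similarity in duplicates:
    print(f"{name_a} ~ {name_b} (cosine similarity {similarity:.3f})")
```

The all-pairs comparison grows quadratically with the number of images, so a dataset at the scale mentioned above would need an approximate nearest-neighbour index instead.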

  • dfdc_deepfake_challenge

    A prize-winning solution for the DFDC challenge

    Project mention: How are deepfakes different from beauty face filters? | /r/computervision | 2023-05-27

    For example, I used a scanner built on this model: https://github.com/selimsef/dfdc_deepfake_challenge/blob/master/README.md

  • upgini

    Data search & enrichment library for Machine Learning → Easily find and add relevant features to your ML & AI pipeline from hundreds of public and premium external data sources, including open & commercial LLMs

    Project mention: The fastest way to improve quality of ML model on tabular data | /r/learnmachinelearning | 2023-06-18

    web: https://upgini.com

  • benchmarks

    Comparison tools (by catboost)

  • crypto

    Cryptocurrency Historical Market Data R Package (by JesseVent)

  • xgboost_ray

    Distributed XGBoost on Ray

  • Hello-Kaggle

    For someone who is new to Kaggle

  • deepfake-detection

    DeepFake Detection: Detect whether a video is fake or not using InceptionResNetV2. (by xinyooo)

  • kaggle-courses

    Courses on Kaggle

  • Paper-Recommendation-System

    Web interface to search ArXiv papers using NLP Sentence-Transformers, Faiss and Streamlit

  • apple-appstore-apps

    Apple App Store apps dataset: 1.2 million apps with 21 attributes.

  • kaggle-look-alike

    Kaggle Data Explorer look-alike.

  • YouTubers-saying-things

    Dataset containing video subtitles from popular YouTubers' channels

  • YouTube-thumbnail-dataset

    Most versatile dataset of YouTube thumbnails.

NOTE: The open-source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020). The latest post mention was on 2024-03-05.

Index

What are some of the best open-source Kaggle projects? This list will help you:

Rank Project Stars
1 data-science-ipython-notebooks 26,438
2 d2l-en 21,564
3 LightGBM 16,025
4 Pytorch-UNet 8,265
5 catboost 7,716
6 kaggle-solutions 3,737
7 Artificial-Intelligence-Deep-Learning-Machine-Learning-Tutorials 3,634
8 pytorch-toolbelt 1,474
9 MLBox 1,472
10 fastdup 1,395
11 dfdc_deepfake_challenge 670
12 upgini 288
13 benchmarks 163
14 crypto 141
15 xgboost_ray 131
16 Hello-Kaggle 78
17 deepfake-detection 75
18 kaggle-courses 46
19 Paper-Recommendation-System 19
20 apple-appstore-apps 12
21 kaggle-look-alike 9
22 YouTubers-saying-things 7
23 YouTube-thumbnail-dataset 4