Jupyter Notebook Big Data

Open-source Jupyter Notebook projects categorized as Big Data

Top 3 Jupyter Notebook Big Data Projects

  • H2O

    H2O is an Open Source, Distributed, Fast & Scalable Machine Learning Platform: Deep Learning, Gradient Boosting (GBM) & XGBoost, Random Forest, Generalized Linear Modeling (GLM with Elastic Net), K-Means, PCA, Generalized Additive Models (GAM), RuleFit, Support Vector Machine (SVM), Stacked Ensembles, Automatic Machine Learning (AutoML), etc.

    Project mention: [PAID] Looking for Phaser.js game developer | reddit.com/r/INAT | 2021-12-09

    Built and founded various web3 projects over the last 2 years, such as OpenArt and 8RealmDojo, while also being a high-performing student at CTU in Prague and SeoulTech. Was offered internships at Amazon and H2O.ai. Created robot assistants using robots from SoftBank.

  • cortx

    CORTX Community Object Storage is 100% open source object storage uniquely optimized for mass capacity storage devices.

    Project mention: The Wiretrustee SATA Pi Board Is a True SATA NAS | news.ycombinator.com | 2021-06-10

    I keep hoping some day the drives will have their own networking built in. Kioxia, a Toshiba spin-off, announced a network-attached NVMe-oF drive last September[1], and I seem to recall one of the major drive players had similar intents a bit back... ah yes, the Seagate Kinetic drives with dual 1Gbit[2] & an object storage OS built into the drive. These days Seagate seems to be pushing a software platform, CORTX[3], which I hope some day perhaps has hardware products too (but right now seems to be for classic Linux-based network appliances).

    Ideally we start using 5 or 10Gbit Ethernet for these cases. We could continue to treat these drives like they are direct attached, even though they are network attached, and either have one computer running RAID, or have Ceph and a bunch of computers running its distributed system to tap the drives.

    Ideally though, we need new clustered file-systems, where any computer can read the drives. That is, I'd guess, a long way off. Legacy devices (home media players) would need to go through some kind of legacy gateway.

    [1] https://business.kioxia.com/en-us/news/2020/ssd-20200922-2.h...

    [2] https://www.snia.org/sites/default/files/MayurShelty_Seagate...

    [3] https://github.com/Seagate/cortx

  • 100DaysofMLCode

    My journey to learn and grow in the domain of Machine Learning and Artificial Intelligence by performing the #100DaysofMLCode Challenge. Now supported by bright developers adding their learnings :+1:

NOTE: The open source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020). The latest post mention was on 2021-12-09.

Jupyter Notebook Big Data related posts

Index

What are some of the best open-source Big Data projects in Jupyter Notebook? This list will help you:

#  Project          Stars
1  H2O              5,691
2  cortx              537
3  100DaysofMLCode    203