Top 13 Jupyter Notebook feature-engineering Projects
-
tpot
A Python Automated Machine Learning tool that optimizes machine learning pipelines using genetic programming.
Project mention: Evolve Your Machine Learning: Automate the Process of Model Selection through TPOT. | dev.to | 2024-07-06
Resources: TPOT Documentation, Genetic Programming
-
hamilton
Hamilton helps data scientists and engineers define testable, modular, self-documenting dataflows that encode lineage/tracing and metadata. Runs and scales everywhere Python does.
Project mention: Show HN: I built an open-source data pipeline tool in Go | news.ycombinator.com | 2024-12-17
I always thought Hamilton [1] does a good job of giving enough visual hooks that draw you in.
I also noticed this pattern where library authors sometimes do a bit extra in terms of discussing and even promoting their competitors, and it makes me trust them more. A “here’s why ours is better and everyone else sucks …” section always comes across like the infomercial character who is having such a hard time peeling an apple that you wonder if this is the first time they’ve used their hands.
One thing I wish for is a tool that’s essentially just Celery but doesn’t require a message broker (and can just use a database), and which is supported on Windows. There’s always a handful of edge cases where we’re pulling data from an old 32-bit system on Windows. And basically every system has some not-quite-ergonomic workaround that’s as much work as if you’d just built it yourself.
It seems like it’s just sending a JSON message over a queue or HTTP API and the worker receives it and runs the task. Maybe it’s way harder than I’m envisioning (but I don’t think so because I’ve already written most of it).
I guess that’s one thing I’m not clear on with Bruin: can I run workers in different physical locations and have them carry out the tasks in the right order? Or is this more of a centralized thing (meaning even if it’s K8s or Dask or Ray, those are all run in a cluster which happens to be distributed, but they’re all machines sitting in the same subnet, which isn’t the definition of a “distributed task” I’m going for)?
[1] https://github.com/DAGWorks-Inc/hamilton
-
SGX-Full-OrderBook-Tick-Data-Trading-Strategy
Providing the solutions for high-frequency trading (HFT) strategies using data science approaches (Machine Learning) on Full Orderbook Tick Data.
-
Deep_Learning_Machine_Learning_Stock
Deep Learning and Machine Learning stocks represent promising opportunities for both long-term and short-term investors and traders.
-
serverless-ml-course
Serverless Machine Learning Course for building AI-enabled Prediction Services from models and features
-
deltapy
-
feature-engineering-tutorials
-
getml-community
Fast, high-quality forecasts on relational and multivariate time-series data powered by new feature learning algorithms and automated ML.
-
anovos
-
Spotify_Song_Recommender
This project leverages Spotify's API and user-provided playlists to create and tune a neural network model that generates song recommendations based on the song data in those playlists.
-
StravaKudos
🏃 🎯 Predicting Strava Kudos on my own activities using the given activity's attributes.
-
lockdowndates
Retrieve the dates of the restrictions imposed by governments in countries around the world during the covid-19 pandemic.
-
CSGO-Pro-Gear-Performance-and-EDA
Modeling professional CS:GO gamers' accuracy performance based on gear and settings, with exploratory data analysis.
Index
What are some of the best open-source feature-engineering projects in Jupyter Notebook? This list will help you:
# | Project | Stars |
---|---|---|
1 | tpot | 9,898 |
2 | hamilton | 2,128 |
3 | SGX-Full-OrderBook-Tick-Data-Trading-Strategy | 2,000 |
4 | Deep_Learning_Machine_Learning_Stock | 1,308 |
5 | serverless-ml-course | 588 |
6 | deltapy | 543 |
7 | feature-engineering-tutorials | 285 |
8 | getml-community | 115 |
9 | anovos | 76 |
10 | Spotify_Song_Recommender | 30 |
11 | StravaKudos | 12 |
12 | lockdowndates | 6 |
13 | CSGO-Pro-Gear-Performance-and-EDA | 1 |