Top 23 Python tabular-data Projects
-
vaex
Out-of-Core hybrid Apache Arrow/NumPy DataFrame for Python, ML, visualization and exploration of big tabular data at a billion rows per second 🚀
-
[4] "Is it possible to "flatten" structured data (like JSON?)": https://github.com/saulpw/visidata/discussions/1605
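To make the question in that discussion title concrete, here is a minimal plain-Python sketch of what "flattening" nested JSON-like data into a single row can mean. It is not VisiData's own mechanism; the `flatten` helper, the dotted key paths, and the sample record are all assumptions made for illustration.

```python
from typing import Any


def flatten(value: Any, prefix: str = "", sep: str = ".") -> dict[str, Any]:
    """Flatten nested dicts/lists into one level, keyed by a dotted path.

    Hypothetical helper for illustration; not part of any library.
    """
    if isinstance(value, dict):
        children = value.items()
    elif isinstance(value, list):
        children = enumerate(value)
    else:
        # A scalar passed in directly is stored under the given prefix.
        return {prefix: value}

    items: dict[str, Any] = {}
    for key, child in children:
        path = f"{prefix}{sep}{key}" if prefix else str(key)
        if isinstance(child, (dict, list)) and child:
            # Recurse into non-empty containers.
            items.update(flatten(child, path, sep))
        else:
            # Scalars and empty containers become leaf values.
            items[path] = child
    return items


# Made-up record: one nested object, one list, one empty object.
record = {"id": 7, "user": {"name": "Ada", "tags": ["a", "b"]}, "meta": {}}
flat = flatten(record)
print(flat)
```

Each key in `flat` can then serve as a column name in a tabular view, with list positions appearing as numeric path segments such as `user.tags.0`.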
-
Project mention: pip install remyxai - easiest way to create custom vision models | /r/computervision | 2023-04-25
This seems not very convincing. There are other popular frameworks that provide AutoML with existing datasets (e.g., https://github.com/autogluon/autogluon)
-
Project mention: [Project] AMLTK: A framework for building your own AutoML (AutoSklearn authors) | /r/MachineLearning | 2023-12-09
We took some of the lessons learned while building AutoSklearn and AutoPytorch (the good, the bad, and the ugly) and made a library to enable the next generation of open-source AutoML tools, allowing them to be research-able but also efficient and scalable. We have some future plans and ongoing work with this, and we'd like to gather any feedback the community might have!
-
We've made a lot of data tooling things based on LLMs, and are in the process of rebranding and launching our main product.
1. sketch (in-notebook AI for pandas) https://github.com/approximatelabs/sketch
2. datadm (open source "chat with data", with support for open-source LLMs) https://github.com/approximatelabs/datadm
3. Our main product: julyp. https://julyp.com/ (currently under very active rebrand and cleanup) -- a "chat with data" style app with a lot of specialized features. I'm also streaming myself using it (and sometimes building it) every weekday on Twitch to solve misc data problems (https://www.twitch.tv/bluecoconut)
For your next question, about the stack and deploy:
-
Project mention: Exploring Open-Source Alternatives to Landing AI for Robust MLOps | dev.to | 2023-12-13
Numerous tools exist for detecting anomalies in time series data, but Alibi Detect stood out to me, particularly for its capabilities and its compatibility with both TensorFlow and PyTorch backends.
-
Project mention: LongRoPE: Extending LLM Context Window Beyond 2M Tokens | news.ycombinator.com | 2024-02-22
It's been possible to skip tokenization for a long time. My team and I did it here: https://github.com/capitalone/DataProfiler
For what it's worth, we were working with LSTMs with nearly a billion params back around 2016-2017. Transformers made it far more effective to train and execute, but ultimately LSTMs are able to achieve similar results, though they are slow and require more training data.
-
pytorch-widedeep
A flexible package for multimodal deep learning to combine tabular data with text and images using Wide and Deep models in PyTorch
-
Project mention: Ctgan: Generating synthetic data in Python using GANs | news.ycombinator.com | 2024-02-05
-
Transformers4Rec
Transformers4Rec is a flexible and efficient library for sequential and session-based recommendation and works with PyTorch.
-
tab-transformer-pytorch
Implementation of TabTransformer, an attention network for tabular data, in PyTorch
-
Multimodal-Toolkit
Multimodal model for text and tabular data with HuggingFace transformers as building block for text data
-
saint
The official PyTorch implementation of the recent paper - SAINT: Improved Neural Networks for Tabular Data via Row Attention and Contrastive Pre-Training
-
synthcity
A library for generating and evaluating synthetic tabular data for privacy, fairness and data augmentation.
-
TabFormer
Code & Data for "Tabular Transformers for Modeling Multivariate Time Series" (ICASSP, 2021)
Project mention: Time-based splitting performing significantly worse than random splitting | /r/learnmachinelearning | 2023-05-20
Hi, I am currently working on a basic binary classifier for a transaction dataset, to predict which transaction is fraudulent (Dataset: https://github.com/IBM/TabFormer). The following is a quick summary of the dataset:
-
tabular-dl-tabr
The implementation of "TabR: Unlocking the Power of Retrieval-Augmented Tabular Deep Learning"
Project mention: [R] New Tabular DL model: "TabR: Unlocking the Power of Retrieval-Augmented Tabular Deep Learning" | /r/MachineLearning | 2023-07-28
- Paper: link
Python tabular-data related posts
- Ctgan: Generating synthetic data in Python using GANs
- [Project] AMLTK: A framework for building your own AutoML (AutoSklearn authors)
- Time series data into a CNN
- Looking to switch from full-time open source to an early stage startup
- Benchmark synthetic tabular data generators using Synthcity
- Tired of synthetic corgi? Check out Synthcity, a tool for synthetic tabular data
- [P] Stable Diffusion in Tensorflow / Keras
Index
What are some of the best open-source tabular-data projects in Python? This list will help you:
# | Project | Stars |
---|---|---|
1 | vaex | 8,171 |
2 | visidata | 7,388 |
3 | autogluon | 7,050 |
4 | tabnet | 2,474 |
5 | Auto-PyTorch | 2,269 |
6 | sketch | 2,188 |
7 | alibi-detect | 2,074 |
8 | DataProfiler | 1,357 |
9 | pytorch-widedeep | 1,232 |
10 | CTGAN | 1,130 |
11 | Transformers4Rec | 1,021 |
12 | rows | 860 |
13 | tab-transformer-pytorch | 696 |
14 | Multimodal-Toolkit | 553 |
15 | Copulas | 501 |
16 | carefree-learn | 401 |
17 | saint | 366 |
18 | synthcity | 350 |
19 | TabFormer | 295 |
20 | tableQA | 282 |
21 | SDGym | 241 |
22 | ExtractTable-py | 236 |
23 | tabular-dl-tabr | 210 |