Top 23 Python tabular-data Projects

vaex

7 8,171 6.0 Python

Out-of-Core hybrid Apache Arrow/NumPy DataFrame for Python, ML, visualization and exploration of big tabular data at a billion rows per second 🚀
visidata

36 7,388 9.8 Python

A terminal spreadsheet multitool for discovering and arranging data

Project mention: Fx – Terminal JSON Viewer | news.ycombinator.com | 2023-09-19

[4] "Is it possible to "flatten" structured data (like JSON?)": https://github.com/saulpw/visidata/discussions/1605
autogluon

8 7,050 9.6 Python

AutoGluon: Fast and Accurate ML in 3 Lines of Code

Project mention: pip install remyxai - easiest way to create custom vision models | /r/computervision | 2023-04-25

This seems not very convincing. There are other popular frameworks that provide AutoML with existing datasets (eg https://github.com/autogluon/autogluon)
tabnet

8 2,474 4.8 Python

PyTorch implementation of the TabNet paper: https://arxiv.org/pdf/1908.07442.pdf
Auto-PyTorch

4 2,269 0.0 Python

Automatic architecture search and hyperparameter optimization for PyTorch

Project mention: [Project] AMLTK: A framework for building your own AutoML (AutoSklearn authors) | /r/MachineLearning | 2023-12-09

We took some of the lessons learned while building AutoSklearn and AutoPytorch (the good, the bad and the ugly) and made a library to enable the next generation of open-source AutoML tools, allowing them to be researchable but also efficient and scalable. We have some future plans and ongoing work with this, and we'd like to gather any feedback the community might have!
sketch

20 2,188 4.4 Python

AI code-writing assistant that understands data content

Project mention: Ask HN: What have you built with LLMs? | news.ycombinator.com | 2024-02-05

We've made a lot of data tooling things based on LLMs, and are in the process of rebranding and launching our main product.
1. sketch (in notebook, ai for pandas) https://github.com/approximatelabs/sketch
2. datadm (open source, "chat with data", with support for the open source LLMs) (https://github.com/approximatelabs/datadm)
3. Our main product: julyp. https://julyp.com/ (currently under very active rebrand and cleanup) -- but a "chat with data" style app, with a lot of specialized features. I'm also streaming me using it (and sometimes building it) every weekday on twitch to solve misc data problems (https://www.twitch.tv/bluecoconut)
For your next question, about the stack and deploy:
alibi-detect

9 2,074 7.6 Python

Algorithms for outlier, adversarial and drift detection

Project mention: Exploring Open-Source Alternatives to Landing AI for Robust MLOps | dev.to | 2023-12-13

Numerous tools exist for detecting anomalies in time series data, but Alibi Detect stood out to me, particularly for its capabilities and its compatibility with both TensorFlow and PyTorch backends.
DataProfiler

61 1,357 6.3 Python

What's in your data? Extract schema, statistics and entities from datasets

Project mention: LongRoPE: Extending LLM Context Window Beyond 2M Tokens | news.ycombinator.com | 2024-02-22

It's been possible to skip tokenization for a long time; my team and I did it here - https://github.com/capitalone/DataProfiler
For what it's worth, we were actually working with LSTMs with nearly a billion params back around 2016-2017. Transformers made it far more effective to train and execute, but ultimately LSTMs are able to achieve similar results, though they are slow and require more training data.
pytorch-widedeep

7 1,232 8.5 Python

A flexible package for multimodal-deep-learning to combine tabular data with text and images using Wide and Deep models in PyTorch
CTGAN

2 1,130 7.9 Python

Conditional GAN for generating synthetic tabular data.

Project mention: Ctgan: Generating synthetic data in Python using GANs | news.ycombinator.com | 2024-02-05
Transformers4Rec

4 1,021 7.2 Python

Transformers4Rec is a flexible and efficient library for sequential and session-based recommendation and works with PyTorch.
rows

1 860 5.1 Python

A common, beautiful interface to tabular data, no matter the format
tab-transformer-pytorch

1 696 4.5 Python

Implementation of TabTransformer, an attention network for tabular data, in PyTorch
Multimodal-Toolkit

2 553 7.6 Python

Multimodal model for text and tabular data, with HuggingFace transformers as the building block for text data
Copulas

1 501 7.3 Python

A library to model multivariate data using copulas.
carefree-learn

1 401 9.8 Python

Deep Learning ❤️ PyTorch
saint

1 366 1.8 Python

The official PyTorch implementation of the recent paper "SAINT: Improved Neural Networks for Tabular Data via Row Attention and Contrastive Pre-Training"
synthcity

4 350 7.3 Python

A library for generating and evaluating synthetic tabular data for privacy, fairness and data augmentation.
TabFormer

10 295 0.0 Python

Code & Data for "Tabular Transformers for Modeling Multivariate Time Series" (ICASSP, 2021)

Project mention: Time-based splitting performing significantly worse than random splitting | /r/learnmachinelearning | 2023-05-20

Hi, I am currently working on a basic binary classifier for a transaction dataset, to predict which transaction is fraudulent (Dataset: https://github.com/IBM/TabFormer). The following is a quick summary of the dataset:
tableQA

6 282 2.6 Python

AI Tool for querying natural language on tabular data.
SDGym

1 241 7.5 Python

Benchmarking synthetic data generation methods.
ExtractTable-py

1 236 0.0 Python

Python library to extract tabular data from images and scanned PDFs
tabular-dl-tabr

3 210 4.2 Python

The implementation of "TabR: Unlocking the Power of Retrieval-Augmented Tabular Deep Learning"

Project mention: [R] New Tabular DL model: "TabR: Unlocking the Power of Retrieval-Augmented Tabular Deep Learning" | /r/MachineLearning | 2023-07-28

- Paper: link

NOTE: The open source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020). The latest post mention was on 2024-02-22.

Python tabular-data related posts

Ctgan: Generating synthetic data in Python using GANs
1 project | news.ycombinator.com | 5 Feb 2024
[Project] AMLTK: A framework for building your own AutoML (AutoSklearn authors)
2 projects | /r/MachineLearning | 9 Dec 2023
Time series data into a CNN
1 project | /r/learnmachinelearning | 1 Mar 2023
Looking to switch from full-time open source to an early stage startup
3 projects | /r/datascience | 25 Jan 2023
Benchmark synthetic tabular data generators using Synthcity
1 project | /r/learnmachinelearning | 23 Jan 2023
Tired of synthetic corgi? Check out Synthcity, a tool for synthetic tabular data
2 projects | news.ycombinator.com | 19 Jan 2023
[P] Stable Diffusion in Tensorflow / Keras
1 project | /r/MachineLearning | 19 Sep 2022

Index

What are some of the best open-source tabular-data projects in Python? This list will help you:

#	Project	Stars
1	vaex	8,171
2	visidata	7,388
3	autogluon	7,050
4	tabnet	2,474
5	Auto-PyTorch	2,269
6	sketch	2,188
7	alibi-detect	2,074
8	DataProfiler	1,357
9	pytorch-widedeep	1,232
10	CTGAN	1,130
11	Transformers4Rec	1,021
12	rows	860
13	tab-transformer-pytorch	696
14	Multimodal-Toolkit	553
15	Copulas	501
16	carefree-learn	401
17	saint	366
18	synthcity	350
19	TabFormer	295
20	tableQA	282
21	SDGym	241
22	ExtractTable-py	236
23	tabular-dl-tabr	210