Python tabular-data

Open-source Python projects categorized as tabular-data

Top 23 Python tabular-data Projects

  • vaex

    Out-of-Core hybrid Apache Arrow/NumPy DataFrame for Python, ML, visualization and exploration of big tabular data at a billion rows per second 🚀

  • visidata

    A terminal spreadsheet multitool for discovering and arranging data

    Project mention: Fx – Terminal JSON Viewer | news.ycombinator.com | 2023-09-19

    [4] "Is it possible to "flatten" structured data (like JSON?)": https://github.com/saulpw/visidata/discussions/1605

  • autogluon

    AutoGluon: Fast and Accurate ML in 3 Lines of Code

    Project mention: pip install remyxai - easiest way to create custom vision models | /r/computervision | 2023-04-25

    This seems not very convincing. There are other popular frameworks that provide AutoML with existing datasets (eg https://github.com/autogluon/autogluon)

  • tabnet

    PyTorch implementation of the TabNet paper: https://arxiv.org/pdf/1908.07442.pdf

  • Auto-PyTorch

    Automatic architecture search and hyperparameter optimization for PyTorch

    Project mention: [Project] AMLTK: A framework for building your own AutoML (AutoSklearn authors) | /r/MachineLearning | 2023-12-09

    We took some of the lessons learned while building AutoSklearn and AutoPytorch (the good, the bad and the ugly) and made a library to enable the next generation of open-source AutoML tools, allowing them to be research-able but also efficient and scalable. We have some future plans and ongoing work with this, and we'd like to gather any feedback the community might have!

  • sketch

    AI code-writing assistant that understands data content

    Project mention: Ask HN: What have you built with LLMs? | news.ycombinator.com | 2024-02-05

    We've made a lot of data tooling things based on LLMs, and are in the process of rebranding and launching our main product.

    1. sketch (in notebook, ai for pandas) https://github.com/approximatelabs/sketch

    2. datadm (open source, "chat with data", with support for open source LLMs) https://github.com/approximatelabs/datadm

    3. Our main product: julyp. https://julyp.com/ (currently under very active rebrand and cleanup), a "chat with data" style app with a lot of specialized features. I'm also streaming myself using it (and sometimes building it) every weekday on Twitch to solve misc data problems (https://www.twitch.tv/bluecoconut)

    For your next question, about the stack and deploy:

  • alibi-detect

    Algorithms for outlier, adversarial and drift detection

    Project mention: Exploring Open-Source Alternatives to Landing AI for Robust MLOps | dev.to | 2023-12-13

    Numerous tools exist for detecting anomalies in time series data, but Alibi Detect stood out to me, particularly for its capabilities and its compatibility with both TensorFlow and PyTorch backends.

  • DataProfiler

    What's in your data? Extract schema, statistics and entities from datasets

    Project mention: LongRoPE: Extending LLM Context Window Beyond 2M Tokens | news.ycombinator.com | 2024-02-22

    It's been possible to skip tokenization for a long time, my team and I did it here - https://github.com/capitalone/DataProfiler

    For what it's worth, we were actually working with LSTMs with nearly a billion params back around 2016-2017. Transformers made it far more effective to train and execute, but ultimately LSTMs are able to achieve similar results, though they are slower and require more training data.

  • pytorch-widedeep

    A flexible package for multimodal-deep-learning to combine tabular data with text and images using Wide and Deep models in Pytorch

  • CTGAN

    Conditional GAN for generating synthetic tabular data.

    Project mention: Ctgan: Generating synthetic data in Python using GANs | news.ycombinator.com | 2024-02-05

  • Transformers4Rec

    Transformers4Rec is a flexible and efficient library for sequential and session-based recommendation and works with PyTorch.

  • rows

    A common, beautiful interface to tabular data, no matter the format

  • tab-transformer-pytorch

    Implementation of TabTransformer, attention network for tabular data, in Pytorch

  • Multimodal-Toolkit

    Multimodal model for text and tabular data with HuggingFace transformers as building block for text data

  • Copulas

    A library to model multivariate data using copulas.

  • carefree-learn

    Deep Learning ❤️ PyTorch

  • saint

    The official PyTorch implementation of recent paper - SAINT: Improved Neural Networks for Tabular Data via Row Attention and Contrastive Pre-Training

  • synthcity

    A library for generating and evaluating synthetic tabular data for privacy, fairness and data augmentation.

  • TabFormer

    Code & Data for "Tabular Transformers for Modeling Multivariate Time Series" (ICASSP, 2021)

    Project mention: Time-based splitting performing significantly worse than random splitting | /r/learnmachinelearning | 2023-05-20

    Hi, I am currently working on a basic binary classifier for a transaction dataset, to predict which transaction is fraudulent (Dataset: https://github.com/IBM/TabFormer). The following is a quick summary of the dataset:

  • tableQA

    AI Tool for querying natural language on tabular data.

  • SDGym

    Benchmarking synthetic data generation methods.

  • ExtractTable-py

    Python library to extract tabular data from images and scanned PDFs

  • tabular-dl-tabr

    The implementation of "TabR: Unlocking the Power of Retrieval-Augmented Tabular Deep Learning"

    Project mention: [R] New Tabular DL model: "TabR: Unlocking the Power of Retrieval-Augmented Tabular Deep Learning" | /r/MachineLearning | 2023-07-28

NOTE: The open source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020). The latest post mention was on 2024-02-22.

Index

What are some of the best open-source tabular-data projects in Python? This list will help you:

 #  Project                   Stars
 1  vaex                      8,171
 2  visidata                  7,388
 3  autogluon                 7,050
 4  tabnet                    2,474
 5  Auto-PyTorch              2,269
 6  sketch                    2,188
 7  alibi-detect              2,074
 8  DataProfiler              1,357
 9  pytorch-widedeep          1,232
10  CTGAN                     1,130
11  Transformers4Rec          1,021
12  rows                        860
13  tab-transformer-pytorch     696
14  Multimodal-Toolkit          553
15  Copulas                     501
16  carefree-learn              401
17  saint                       366
18  synthcity                   350
19  TabFormer                   295
20  tableQA                     282
21  SDGym                       241
22  ExtractTable-py             236
23  tabular-dl-tabr             210