Open-source projects categorized as tabular-data

Top 12 tabular-data Open-Source Projects

  • GitHub repo react-virtualized

    React components for efficiently rendering large lists and tabular data

    Project mention: Data driven Web Frontends....looking at React and beyond for CRUD | reddit.com/r/datascience | 2021-04-21

    > GraphQL and React seem to be really popular

    The Apollo + React combo works as long as you have fewer than 10K data points per request. Beyond that, you have to find ways to a) reduce bandwidth and b) optimize performance in the browser. React already has several ways to deal with in-browser performance, e.g. https://github.com/bvaughn/react-virtualized. As for Apollo, I have had some epic trouble with it when there are many JOINs or a big payload, and ended up using WebSockets in some cases and REST in others.

  • GitHub repo vaex

    Out-of-Core hybrid Apache Arrow/NumPy DataFrame for Python, ML, visualize and explore big tabular data at a billion rows per second 🚀

    Project mention: I wrote one of the fastest DataFrame libraries | news.ycombinator.com | 2021-03-13

  • GitHub repo visidata

    A terminal spreadsheet multitool for discovering and arranging data

    Project mention: `uq` is a simple, user-friendly alternative to `sort | uniq`. | reddit.com/r/commandline | 2021-04-15

    Run vd (VisiData) on the file, press Shift+F, and you instantly get the unique lines sorted by number of uses. Like sort | uniq -c | sort -n in one go.

  • GitHub repo miller

    Miller is like awk, sed, cut, join, and sort for name-indexed data such as CSV, TSV, and tabular JSON

    Project mention: Querying an XML, JSON, CSV, or RDF database | reddit.com/r/ItalyInformatica | 2021-03-31

  • GitHub repo tsv-utils

    eBay's TSV Utilities: Command line tools for large, tabular data files. Filtering, statistics, sampling, joins and more.

    Project mention: Return 1 to N results from a large (19MM line) CSV | reddit.com/r/commandline | 2021-04-17

    May well be overkill for your needs, but I'm a fan of tsv-utils. It's fast and enormously flexible, and seems to me a "best of breed" toolset for data mining CSV files (that is what it was written for). https://github.com/eBay/tsv-utils

  • GitHub repo DataFrames.jl

    In-memory tabular data in Julia

    Project mention: Polars (Rust DataFrame library) join algorithm fastest in db-benchmark | reddit.com/r/rust | 2021-03-12

    Looks like it's single threaded according to this open issue: https://github.com/JuliaData/DataFrames.jl/issues/2626

  • GitHub repo tabnet

    PyTorch implementation of the TabNet paper: https://arxiv.org/pdf/1908.07442.pdf

    Project mention: [D] Why Neural Networks for tabular data are bad? | reddit.com/r/MachineLearning | 2021-03-07

  • GitHub repo ktrain

    ktrain is a Python library that makes deep learning and AI more accessible and easier to apply

    Project mention: Increase accuracy of NLP sentiment analysis? | reddit.com/r/datascience | 2020-12-29

    Could be the last resort: pre-trained transformers through ktrain (ktrain has very detailed tutorials and examples)

  • GitHub repo CTGAN

    Conditional GAN for generating synthetic tabular data.

    Project mention: Weekly Entering & Transitioning Thread | 28 Mar 2021 - 04 Apr 2021 | reddit.com/r/datascience | 2021-03-29

  • GitHub repo SwiftyTextTable

    A lightweight library for generating text tables.

  • GitHub repo tableQA

    AI Tool for querying natural language on tabular data.

    Project mention: TableQA -Query your tabular data with natural language | dev.to | 2020-11-28

  • GitHub repo Multimodal-Toolkit

    Multimodal model for text and tabular data with HuggingFace transformers as building block for text data

    Project mention: Classification problem with text and numerical features | reddit.com/r/LanguageTechnology | 2021-04-15

    A quick and dirty way of trying this is using this framework: https://github.com/georgian-io/Multimodal-Toolkit
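The visidata mention above describes a frequency count of unique lines, comparable to sort | uniq -c | sort -n. For readers without VisiData at hand, the same idea can be sketched in plain Python. This is only an illustrative sketch, not how VisiData implements it; the function name count_unique_lines and the sample lines are hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def count_unique_lines(lines: Iterable[str]) -> List[Tuple[int, str]]:
    """Return (count, line) pairs sorted by ascending count.

    Roughly comparable to `sort | uniq -c | sort -n`; the name is
    hypothetical and chosen for this illustration only.
    """
    # Strip trailing newlines so "a\n" and "a" count as the same line.
    counts = Counter(line.rstrip("\n") for line in lines)
    # Sort by count first, then by line text so ties come out in a stable order.
    return sorted((n, text) for text, n in counts.items())


if __name__ == "__main__":
    # Made-up sample input; in practice this could be an open file object.
    sample = ["apple\n", "pear\n", "apple\n", "fig\n", "apple\n", "pear\n"]
    for n, text in count_unique_lines(sample):
        print(f"{n:7d} {text}")
```

Because the function accepts any iterable of strings, an open file can be passed directly, so the input is read line by line rather than loaded as one string.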

NOTE: The open-source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020). The latest post mention was on 2021-04-21.

What are some of the best open-source tabular-data projects? This list will help you:

 #  Project              Stars
 1  react-virtualized   21,540
 2  vaex                 6,045
 3  visidata             3,739
 4  miller               2,710
 5  tsv-utils            1,224
 6  DataFrames.jl          976
 7  tabnet                 943
 8  ktrain                 802
 9  CTGAN                  321
10  SwiftyTextTable        258
11  tableQA                110
12  Multimodal-Toolkit      95