tabular-data

Top 23 tabular-data Open-Source Projects

  • react-virtualized

    React components for efficiently rendering large lists and tabular data

  • Project mention: The Secret Weapon of Top Developers: 7 React JS Libraries You Can't Afford to Ignore | dev.to | 2024-02-21

    You may increase the rendering efficiency of tabular and huge list data by using the React Virtualized module. React apps perform better overall when the quantity of requests and DOM elements is limited. React Virtualized is comparable to many other tools; however, what sets it apart from the competition is the sheer volume of features and excellent upkeep.

  • miller

    Miller is like awk, sed, cut, join, and sort for name-indexed data such as CSV, TSV, and tabular JSON

  • Project mention: Qsv: Efficient CSV CLI Toolkit | news.ycombinator.com | 2023-12-22
  • vaex

    Out-of-Core hybrid Apache Arrow/NumPy DataFrame for Python, ML, visualization and exploration of big tabular data at a billion rows per second 🚀

  • visidata

    A terminal spreadsheet multitool for discovering and arranging data

  • Project mention: Fx – Terminal JSON Viewer | news.ycombinator.com | 2023-09-19

    [4] "Is it possible to "flatten" structured data (like JSON?)": https://github.com/saulpw/visidata/discussions/1605

  • autogluon

    AutoGluon: Fast and Accurate ML in 3 Lines of Code

  • Project mention: pip install remyxai - easiest way to create custom vision models | /r/computervision | 2023-04-25

    This seems not very convincing. There are other popular frameworks that provide AutoML with existing datasets (eg https://github.com/autogluon/autogluon)

  • FLAML

    A fast library for AutoML and tuning. Join our Discord: https://discord.gg/Cppx2vSPVP.

  • Project mention: AutoGen: Enabling Next-Gen GPT-X Applications | news.ycombinator.com | 2023-08-22

    I really like the simplicity of this framework, and they hit on a lot of common problems found in other agent-based frameworks. Most intrigued by the RAG improvements.

    Seems like Microsoft was frustrated with the pace of movement in this space and the shitty results of agents (which admittedly kept my interest turned away from agents for the last few months). I'm interested again because it makes practical sense, and from looking at the example notebooks, seems fairly easy to integrate into existing applications.

    Maybe this is the 'low code' approach that might actually work, and bridge together engineering and non-engineering resources.

    This example was what caught my eye: https://github.com/microsoft/FLAML/blob/main/notebook/autoge...

  • tad

    A desktop application for viewing and analyzing tabular data

  • Project mention: Show HN: Open-source, browser-local data exploration using DuckDB-WASM and PRQL | news.ycombinator.com | 2024-03-15

    Very impressive project and vision! Love the demo!

    I am also ex-GS and worked on what I am fairly sure is the table display tool you're describing. I tried to carry the essential aspects of that work (multi-level pivots, with drill-down to the leaf level, and all interactive events and analytics supported by db queries) to Tad (https://www.tadviewer.com/, https://github.com/antonycourtney/tad), another open source project powered by DuckDb.

    An embeddable version of Tad, powered by DuckDb WASM, is used as the results viewer in the MotherDuck Web UI (https://app.motherduck.com/).

    If you're interested in embedding Tad in Pretzel, or leveraging pieces of it in your work, or collaborating on other aspects of DuckDb WASM powered UIs, please get in touch!

  • tabnet

    PyTorch implementation of TabNet paper : https://arxiv.org/pdf/1908.07442.pdf

  • Alpaca-CoT

    We unified the interfaces of instruction-tuning data (e.g., CoT data), multiple LLMs and parameter-efficient methods (e.g., LoRA, P-tuning) together for easy use. We welcome open-source enthusiasts to initiate any meaningful PR on this repo and integrate as many LLM related technologies as possible. We have built a fine-tuning platform that makes it easy for researchers to get started with and use large models, and we welcome open-source enthusiasts to submit any meaningful PR!

  • Auto-PyTorch

    Automatic architecture search and hyperparameter optimization for PyTorch

  • Project mention: [Project] AMLTK: A framework for building your own AutoML (AutoSklearn authors) | /r/MachineLearning | 2023-12-09

    We took some of the lessons learned while building AutoSklearn and AutoPytorch (the good, the bad and the ugly) and made a library to enable the next generation of open-source AutoML tools, allowing them to be research-able but also efficient and scalable. We have some future plans and ongoing work with this, and we'd like to gather any feedback the community might have!

  • sketch

    AI code-writing assistant that understands data content

  • Project mention: Ask HN: What have you built with LLMs? | news.ycombinator.com | 2024-02-05

    We've made a lot of data tooling things based on LLMs, and are in the process of rebranding and launching our main product.

    1. sketch (in notebook, ai for pandas) https://github.com/approximatelabs/sketch

    2. datadm (open source, "chat with data", with support for the open source LLMs) https://github.com/approximatelabs/datadm

    3. Our main product: julyp. https://julyp.com/ (currently under very active rebrand and cleanup) -- but a "chat with data" style app, with a lot of specialized features. I'm also streaming me using it (and sometimes building it) every weekday on twitch to solve misc data problems (https://www.twitch.tv/bluecoconut)

    For your next question, about the stack and deploy:

  • alibi-detect

    Algorithms for outlier, adversarial and drift detection

  • Project mention: Exploring Open-Source Alternatives to Landing AI for Robust MLOps | dev.to | 2023-12-13

    Numerous tools exist for detecting anomalies in time series data, but Alibi Detect stood out to me, particularly for its capabilities and its compatibility with both TensorFlow and PyTorch backends.

  • tidy-viewer

    📺(tv) Tidy Viewer is a cross-platform CLI csv pretty printer that uses column styling to maximize viewer enjoyment.

  • Project mention: Csvlens: Command line CSV file viewer. Like less but made for CSV | news.ycombinator.com | 2024-01-06
  • DataFrames.jl

    In-memory tabular data in Julia

  • tsv-utils

    eBay's TSV Utilities: Command line tools for large, tabular data files. Filtering, statistics, sampling, joins and more.

  • Project mention: Frawk: An efficient Awk-like programming language. (2021) | news.ycombinator.com | 2024-04-21

    If you need just csv/tsv parsing, you can also take a look at https://github.com/eBay/tsv-utils

  • DataProfiler

    What's in your data? Extract schema, statistics and entities from datasets

  • Project mention: LongRoPE: Extending LLM Context Window Beyond 2M Tokens | news.ycombinator.com | 2024-02-22

    It's been possible to skip tokenization for a long time, my team and I did it here - https://github.com/capitalone/DataProfiler

    For what it's worth, we actually were working with LSTMs with nearly a billion params back in 2016-2017 area. Transformers made it far more effective to train and execute, but ultimately LSTMs are able to achieve similar results, though slow & require more training data.

  • pytorch-widedeep

    A flexible package for multimodal-deep-learning to combine tabular data with text and images using Wide and Deep models in Pytorch

  • ktrain

    ktrain is a Python library that makes deep learning and AI more accessible and easier to apply

  • CTGAN

    Conditional GAN for generating synthetic tabular data.

  • Project mention: Ctgan: Generating synthetic data in Python using GANs | news.ycombinator.com | 2024-02-05
  • Transformers4Rec

    Transformers4Rec is a flexible and efficient library for sequential and session-based recommendation and works with PyTorch.

  • virtua

    A zero-config, fast and small (~3kB) virtual list (and grid) component for React, Vue and Solid.

  • Project mention: Show HN: Virtua – zero-config virtualization components for React | news.ycombinator.com | 2023-07-17
  • rows

    A common, beautiful interface to tabular data, no matter the format

  • tab-transformer-pytorch

    Implementation of TabTransformer, attention network for tabular data, in Pytorch

NOTE: The open source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020).

Index

What are some of the best open-source tabular-data projects? This list will help you:

Rank  Project                   Stars
1     react-virtualized        25,936
2     miller                    8,553
3     vaex                      8,173
4     visidata                  7,409
5     autogluon                 7,091
6     FLAML                     3,671
7     tad                       3,013
8     tabnet                    2,476
9     Alpaca-CoT                2,463
10    Auto-PyTorch              2,274
11    sketch                    2,194
12    alibi-detect              2,082
13    tidy-viewer               2,020
14    DataFrames.jl             1,690
15    tsv-utils                 1,396
16    DataProfiler              1,357
17    pytorch-widedeep          1,234
18    ktrain                    1,210
19    CTGAN                     1,136
20    Transformers4Rec          1,025
21    virtua                      933
22    rows                        860
23    tab-transformer-pytorch     698
