Top 23 tabular-data Open-Source Projects
-
miller
Miller is like awk, sed, cut, join, and sort for name-indexed data such as CSV, TSV, and tabular JSON
-
vaex
Out-of-Core hybrid Apache Arrow/NumPy DataFrame for Python, ML, visualization and exploration of big tabular data at a billion rows per second 🚀
-
Alpaca-CoT
We unified the interfaces of instruction-tuning data (e.g., CoT data), multiple LLMs, and parameter-efficient methods (e.g., LoRA, P-Tuning) for ease of use. We have built a fine-tuning platform for large models that makes it easy for researchers to get started, and we welcome open-source enthusiasts to open any meaningful PR on this repo and integrate as many LLM-related technologies as possible!
-
tidy-viewer
📺 (tv) Tidy Viewer is a cross-platform CLI CSV pretty printer that uses column styling to maximize viewer enjoyment.
-
tsv-utils
eBay's TSV Utilities: Command line tools for large, tabular data files. Filtering, statistics, sampling, joins and more.
-
pytorch-widedeep
A flexible package for multimodal deep learning that combines tabular data with text and images using Wide and Deep models in PyTorch
-
ktrain
ktrain is a Python library that makes deep learning and AI more accessible and easier to apply
-
Transformers4Rec
Transformers4Rec is a flexible and efficient library for sequential and session-based recommendation and works with PyTorch.
-
virtua
A zero-config, fast and small (~3kB) virtual list (and grid) component for React, Vue and Solid.
-
tab-transformer-pytorch
Implementation of TabTransformer, an attention network for tabular data, in PyTorch
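Miller's description ("like awk, sed, cut, join, and sort for name-indexed data") means that records are addressed by column name rather than by position. The Python sketch below illustrates that idea for two of those operations, sort and cut, using only the standard `csv` module. `cut_and_sort` is a hypothetical helper written for illustration. It is not Miller's syntax or implementation.

```python
import csv
import io


def cut_and_sort(csv_text, fields, sort_field, numeric=False, reverse=False):
    """Sort CSV records by a named field, then keep only the named `fields`.

    Hypothetical helper: fields are looked up by header name, so the
    column order in the file does not matter.
    """
    rows = list(csv.DictReader(io.StringIO(csv_text)))

    def key(row):
        value = row[sort_field]
        # Numeric sort (e.g. ages) or plain lexical sort.
        return float(value) if numeric else value

    rows.sort(key=key, reverse=reverse)
    # "cut": project each record onto the requested field names only.
    return [{f: row[f] for f in fields} for row in rows]


data = "name,age,city\nalice,34,Paris\nbob,27,Oslo\ncarol,41,Lima\n"
for record in cut_and_sort(data, ["name", "age"], "age", numeric=True, reverse=True):
    print(record)
# {'name': 'carol', 'age': '41'}
# {'name': 'alice', 'age': '34'}
# {'name': 'bob', 'age': '27'}
```

Sorting the full rows before cutting lets you sort by a field that you then drop from the output.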
Project mention: The Secret Weapon of Top Developers: 7 React JS Libraries You Can't Afford to Ignore | dev.to | 2024-02-21
You can improve rendering efficiency for tabular data and very long lists by using the React Virtualized module. React apps perform better overall when the number of requests and DOM elements is kept small. React Virtualized is comparable to many other tools; what sets it apart from the competition is its large feature set and excellent maintenance.
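Keeping the number of DOM elements small is the core idea behind list virtualization: only the rows that intersect the viewport are rendered. That choice comes down to a small index calculation. The Python sketch below assumes fixed-height rows. `visible_range` and `overscan` are illustrative names, not React Virtualized's (or virtua's) API, and real libraries also handle variable row heights.

```python
import math


def visible_range(scroll_top, viewport_height, row_height, row_count, overscan=2):
    """Return the half-open range [first, last) of row indices to render.

    Assumes every row has the same fixed height in pixels. `overscan` adds a
    few extra rows above and below the viewport to reduce blank flashes
    while scrolling.
    """
    if row_count <= 0:
        return (0, 0)
    first = max(0, math.floor(scroll_top / row_height) - overscan)
    last = min(row_count, math.ceil((scroll_top + viewport_height) / row_height) + overscan)
    first = min(first, last)  # scrolled past the end: render nothing extra
    return (first, last)


ROW_HEIGHT = 30
# 10,000 rows of 30 px in a 600 px viewport, scrolled down 3,000 px.
first, last = visible_range(3000, 600, ROW_HEIGHT, 10_000)
# Only rows first..last-1 become DOM nodes; a top offset of first * ROW_HEIGHT
# px (e.g. a spacer or transform) keeps them at the right scroll position.
print(first, last, last - first, first * ROW_HEIGHT)  # 98 122 24 2940
```

In this example, 24 rows are rendered instead of 10,000.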
[4] "Is it possible to "flatten" structured data (like JSON?)": https://github.com/saulpw/visidata/discussions/1605
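The linked VisiData discussion asks about flattening structured data such as JSON into columns. One common approach joins nested keys with a separator and keys list elements by their index. The Python sketch below shows that approach. `flatten` is a hypothetical helper, and the dotted-key naming is an assumption rather than VisiData's own behavior.

```python
import json


def flatten(obj, prefix="", sep="."):
    """Flatten nested dicts/lists into one level: {"a": {"b": 1}} -> {"a.b": 1}.

    List elements are keyed by their index. Empty dicts/lists produce no columns.
    """
    if isinstance(obj, dict):
        items = obj.items()
    elif isinstance(obj, list):
        items = enumerate(obj)
    else:
        return {prefix: obj}  # scalar leaf: store under the accumulated key
    flat = {}
    for key, value in items:
        name = f"{prefix}{sep}{key}" if prefix else str(key)
        flat.update(flatten(value, name, sep))
    return flat


lines = [
    '{"id": 1, "user": {"name": "ann", "tags": ["a", "b"]}}',
    '{"id": 2, "user": {"name": "bo"}}',
]
rows = [flatten(json.loads(line)) for line in lines]
# The union of all flattened keys becomes the column set; missing cells stay blank.
columns = sorted({col for row in rows for col in row})
print(columns)  # ['id', 'user.name', 'user.tags.0', 'user.tags.1']
for row in rows:
    print([row.get(col, "") for col in columns])
```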
Project mention: pip install remyxai - easiest way to create custom vision models | /r/computervision | 2023-04-25
This isn't very convincing. There are other popular frameworks that provide AutoML on existing datasets (e.g., https://github.com/autogluon/autogluon).
I really like the simplicity of this framework, and they hit on a lot of common problems found in other agent-based frameworks. Most intrigued by the RAG improvements.
Seems like Microsoft was frustrated with the pace of movement in this space and the shitty results of agents (which admittedly kept my interest turned away from agents for the last few months). I'm interested again because it makes practical sense, and from looking at the example notebooks, seems fairly easy to integrate into existing applications.
Maybe this is the 'low code' approach that might actually work, and bridge together engineering and non-engineering resources.
This example was what caught my eye: https://github.com/microsoft/FLAML/blob/main/notebook/autoge...
Project mention: Show HN: Open-source, browser-local data exploration using DuckDB-WASM and PRQL | news.ycombinator.com | 2024-03-15
Very impressive project and vision! Love the demo!
I am also ex-GS and worked on what I am fairly sure is the table display tool you're describing. I tried to carry the essential aspects of that work (multi-level pivots, with drill-down to the leaf level, and all interactive events and analytics supported by db queries) to Tad (https://www.tadviewer.com/, https://github.com/antonycourtney/tad), another open source project powered by DuckDb.
An embeddable version of Tad, powered by DuckDb WASM, is used as the results viewer in the MotherDuck Web UI (https://app.motherduck.com/).
If you're interested in embedding Tad in Pretzel, or leveraging pieces of it in your work, or collaborating on other aspects of DuckDb WASM powered UIs, please get in touch!
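Multi-level pivots with drill-down, as described in the Tad comment above, need the same measure aggregated at every level of a grouping hierarchy. In SQL this is what `GROUP BY ROLLUP` produces. The Python sketch below is an in-memory illustration only. It is not Tad's implementation, which per the comment issues DuckDB queries. `pivot_tree` and `children` are hypothetical helpers, and this sketch drills down only to the deepest grouping level, not to individual raw rows.

```python
from collections import defaultdict


def pivot_tree(rows, levels, measure):
    """Sum `measure` for every prefix of the `levels` grouping columns.

    Returns a dict mapping a path tuple, e.g. (), ("EU",), ("EU", "FR"),
    to its total. The empty tuple () holds the grand total.
    """
    totals = defaultdict(float)
    for row in rows:
        path = ()
        totals[path] += row[measure]
        for col in levels:
            path = path + (row[col],)
            totals[path] += row[measure]
    return dict(totals)


def children(totals, path):
    """Drill down one level: direct children of `path`, sorted by key."""
    depth = len(path) + 1
    return sorted(
        (p, v) for p, v in totals.items() if len(p) == depth and p[: len(path)] == path
    )


sales = [
    {"region": "EU", "country": "FR", "amount": 10},
    {"region": "EU", "country": "DE", "amount": 5},
    {"region": "EU", "country": "FR", "amount": 2},
    {"region": "US", "country": "US", "amount": 7},
]
totals = pivot_tree(sales, ["region", "country"], "amount")
print(totals[()])                  # 24.0 (grand total)
print(children(totals, ()))        # [(('EU',), 17.0), (('US',), 7.0)]
print(children(totals, ("EU",)))   # [(('EU', 'DE'), 5.0), (('EU', 'FR'), 12.0)]
```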
Project mention: [Project] AMLTK: A framework for building your own AutoML (AutoSklearn authors) | /r/MachineLearning | 2023-12-09
We took the lessons learned while building AutoSklearn and AutoPytorch (the good, the bad, and the ugly) and made a library to enable the next generation of open-source AutoML tools, keeping them suitable for research while also being efficient and scalable. We have some future plans and ongoing work on this, and we'd like to gather any feedback the community might have!
We've built a lot of LLM-based data tooling, and we are in the process of rebranding and launching our main product.
1. sketch (in notebook, ai for pandas) https://github.com/approximatelabs/sketch
2. datadm (open source, "chat with data", with support for open-source LLMs) https://github.com/approximatelabs/datadm
3. Our main product: julyp, https://julyp.com/ (currently undergoing a very active rebrand and cleanup), a "chat with data" style app with a lot of specialized features. I also stream myself using it (and sometimes building it) every weekday on Twitch to solve miscellaneous data problems (https://www.twitch.tv/bluecoconut).
For your next question, about the stack and deploy:
Project mention: Exploring Open-Source Alternatives to Landing AI for Robust MLOps | dev.to | 2023-12-13
Numerous tools exist for detecting anomalies in time series data, but Alibi Detect stood out to me, particularly for its capabilities and its compatibility with both TensorFlow and PyTorch backends.
Project mention: Csvlens: Command line CSV file viewer. Like less but made for CSV | news.ycombinator.com | 2024-01-06
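CSV viewers such as Csvlens and Tidy Viewer build on one basic step: measuring each column so that cells line up. The Python sketch below illustrates that step, assuming the CSV is small enough to read into memory. `pretty_print_csv` is a hypothetical helper. Right-aligning numbers and truncating long cells are illustrative choices here, and the sketch omits the colors, paging, and interactive features those tools provide.

```python
import csv
import io


def _is_number(text):
    try:
        float(text)
        return True
    except ValueError:
        return False


def pretty_print_csv(csv_text, max_width=20):
    """Return CSV text as aligned columns: numbers right-aligned, text left-aligned."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    if not rows:
        return ""
    # Pad ragged rows so every row has the same number of columns.
    ncols = max(len(r) for r in rows)
    rows = [r + [""] * (ncols - len(r)) for r in rows]
    # Truncate very long cells so one wide value doesn't dominate the layout.
    rows = [[c if len(c) <= max_width else c[: max_width - 1] + "…" for c in r] for r in rows]
    widths = [max(len(r[i]) for r in rows) for i in range(ncols)]
    lines = []
    for r in rows:
        cells = [c.rjust(w) if _is_number(c) else c.ljust(w) for c, w in zip(r, widths)]
        lines.append("  ".join(cells).rstrip())
    return "\n".join(lines)


print(pretty_print_csv("name,qty\napple,3\nbanana,12\n"))
# name    qty
# apple     3
# banana   12
```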
Project mention: Frawk: An efficient Awk-like programming language. (2021) | news.ycombinator.com | 2024-04-21
If you need just CSV/TSV parsing, you can also take a look at https://github.com/eBay/tsv-utils
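tsv-utils targets filtering and computing statistics over large files without loading them into memory. The Python sketch below shows that single-pass streaming pattern. It assumes a header line and plain TSV with no quoting or embedded tabs. `filter_and_summarize` is a hypothetical helper, not the interface of any tsv-utils tool.

```python
import io


def filter_and_summarize(lines, filter_field, filter_value, stat_field):
    """Stream TSV lines once: keep rows where filter_field == filter_value,
    and return the count, sum, and mean of stat_field over the kept rows."""
    it = iter(lines)
    header = next(it).rstrip("\r\n").split("\t")
    fi = header.index(filter_field)  # raises ValueError if the column is missing
    si = header.index(stat_field)
    count, total = 0, 0.0
    for line in it:
        if not line.strip():
            continue  # skip blank lines
        fields = line.rstrip("\r\n").split("\t")
        if fields[fi] == filter_value:
            count += 1
            total += float(fields[si])
    mean = total / count if count else None
    return {"count": count, "sum": total, "mean": mean}


tsv = "color\tsize\nred\t3\nblue\t10\nred\t5\n"
print(filter_and_summarize(io.StringIO(tsv), "color", "red", "size"))
# {'count': 2, 'sum': 8.0, 'mean': 4.0}
```

With a real file, you would pass the open file object (e.g. `with open("data.tsv") as f: ...`). Python then reads it line by line, so memory use stays flat regardless of file size.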
Project mention: LongRoPE: Extending LLM Context Window Beyond 2M Tokens | news.ycombinator.com | 2024-02-22
It's been possible to skip tokenization for a long time; my team and I did it here: https://github.com/capitalone/DataProfiler
For what it's worth, we were actually working with LSTMs with nearly a billion parameters back around 2016-2017. Transformers made models far more efficient to train and run, but ultimately LSTMs can achieve similar results, though they are slower and require more training data.
Project mention: Ctgan: Generating synthetic data in Python using GANs | news.ycombinator.com | 2024-02-05
Project mention: Show HN: Virtua – zero-config virtualization components for React | news.ycombinator.com | 2023-07-17
tabular-data related posts
- Frawk: An efficient Awk-like programming language. (2021)
- Ctgan: Generating synthetic data in Python using GANs
- [Project] AMLTK: A framework for building your own AutoML (AutoSklearn authors)
- What is the best library for processing table data contained within a PDF?
- Ask HN: What's a good library/command line tool to extract tables from PDFs?
- Building a database to search Excel files
- Julia's latency: Past, present and future
A note from our sponsor - SaaSHub
www.saashub.com | 24 Apr 2024
Index
What are some of the best open-source tabular-data projects? This list will help you:
Rank | Project | Stars |
---|---|---|
1 | react-virtualized | 25,936 |
2 | miller | 8,553 |
3 | vaex | 8,173 |
4 | visidata | 7,409 |
5 | autogluon | 7,091 |
6 | FLAML | 3,671 |
7 | tad | 3,013 |
8 | tabnet | 2,476 |
9 | Alpaca-CoT | 2,463 |
10 | Auto-PyTorch | 2,274 |
11 | sketch | 2,194 |
12 | alibi-detect | 2,082 |
13 | tidy-viewer | 2,020 |
14 | DataFrames.jl | 1,690 |
15 | tsv-utils | 1,396 |
16 | DataProfiler | 1,357 |
17 | pytorch-widedeep | 1,234 |
18 | ktrain | 1,210 |
19 | CTGAN | 1,136 |
20 | Transformers4Rec | 1,025 |
21 | virtua | 933 |
22 | rows | 860 |
23 | tab-transformer-pytorch | 698 |