Dataframes

Top 21 Dataframe Open-Source Projects

  • polars

    Dataframes powered by a multithreaded, vectorized query engine, written in Rust

  • Project mention: Why Python's Integer Division Floors (2010) | news.ycombinator.com | 2024-02-28

    This is because 0.1 is actually the floating-point value 0.1000000000000000055511151231257827021181583404541015625, and thus 1 divided by it is ever so slightly smaller than 10. Nevertheless, fpround(1 / fpround(1 / 10)) = 10 exactly.

    I found out about this recently because in Polars I defined a // b for floats to be (a / b).floor(), which does return 10 for this computation. Since Python's correctly-rounded division is rather expensive, I chose to stick to this (more context: https://github.com/pola-rs/polars/issues/14596#issuecomment-...).
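The difference described in this mention can be reproduced in plain Python. The following is a minimal sketch using only the standard library, not Polars code; the variable names are illustrative. It contrasts Python's `//` on floats with flooring the already-rounded result of `/`, which is the `(a / b).floor()` definition quoted above.

```python
import math
from decimal import Decimal

# The double nearest to 0.1 is slightly larger than one tenth.
exact_tenth = Decimal(0.1)

# True division rounds the exact quotient (just under 10) to the nearest
# double, which is 10.0.
true_quotient = 1 / 0.1

# Python's // floors the exact quotient of the two doubles, giving 9.0.
python_floor_div = 1 // 0.1

# Flooring the already-rounded quotient, as in (a / b).floor(), gives 10.
floor_of_division = math.floor(1 / 0.1)

print(exact_tenth)
print(true_quotient, python_floor_div, floor_of_division)
```

The two definitions disagree only when the exact quotient sits just below an integer and rounding pushes it up to that integer.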

  • pandera

    A light-weight, flexible, and expressive statistical data testing library

  • TileDB

    The Universal Storage Engine

  • Project mention: Ask HN: Who is hiring? (May 2024) | news.ycombinator.com | 2024-05-01

    TileDB, Inc. | Full-Time | REMOTE | USA, Greece/EU | [https://tiledb.com](https://tiledb.com/)

    TileDB has recently announced a $34 million Series B fund-raise and is actively hiring for engineers across a range of roles (SRE, backend/distributed systems, database internals, and more). You will have the opportunity to work on innovative technology that creates impact for challenging problems in genomics, geospatial, machine learning, distributed systems, and many other areas.

    TileDB Cloud is the modern database, allowing developers and scientists to capture, analyze, and share any data with any tool. We build on a broad foundation of open source, maintaining the TileDB storage engine, libraries for genomics (single-cell and population), geospatial (raster, point clouds, and more), a TileDB visualization engine extending Babylon.js, and much more ([github.com/TileDB-Inc/TileDB](http://github.com/TileDB-Inc/TileDB))

    With TileDB, all data — tables, genomics, images, videos, location, time-series — is captured as multi-dimensional arrays. To supercharge this data, TileDB Cloud implements a serverless infrastructure delivering query execution, access control, data and code sharing, and distributed computing at global scale — eliminating cluster management, minimizing TCO, and promoting scientific collaboration and reproducibility.

    Website: [https://tiledb.com](https://tiledb.com/) | GitHub: https://github.com/TileDB-Inc/TileDB | Blog: https://tiledb.com/blog

    We are actively hiring for several roles including:

    - Site Reliability Engineer (k8s, Terraform, automation, Prometheus, CloudWatch, GitOps; Golang, Python)

  • DataFrames.jl

    In-memory tabular data in Julia

  • dataframe-go

    DataFrames for Go: For statistics, machine-learning, and data manipulation/exploration

  • Project mention: packages similar to Pandas | /r/golang | 2023-05-10

    NumPy functionality is largely covered by https://www.gonum.org/, but for pandas I'm not sure there is an equivalent that is as widely accepted. You might try https://github.com/rocketlaunchr/dataframe-go; I have not used it, but it looks like it covers some of what you're looking for.

  • explorer

    Series (one-dimensional) and dataframes (two-dimensional) for fast and elegant data exploration in Elixir

  • Project mention: Polars | news.ycombinator.com | 2024-01-08

    The Explorer library [0] in Elixir uses Polars underneath it.

    [0] https://github.com/elixir-explorer/explorer

  • pdpipe

    Easy pipelines for pandas DataFrames.

  • eland

    Python Client and Toolkit for DataFrames, Big Data, Machine Learning and ETL in Elasticsearch

  • DataFramesMeta.jl

    Metaprogramming tools for DataFrames

  • Project mention: Pandas vs. Julia – cheat sheet and comparison | news.ycombinator.com | 2023-05-17

  • datacompy

    Pandas and Spark DataFrame comparison for humans and more!

  • Project mention: How to Check 2 SQL Tables Are the Same | news.ycombinator.com | 2023-07-26

  • riptable

    64bit multithreaded python data analytics tools for numpy arrays and datasets

  • rumble

    ⛈️ RumbleDB 1.21.0 "Hawthorn blossom" 🌳 for Apache Spark | Run queries on your large-scale, messy JSON-like data (JSON, text, CSV, Parquet, ROOT, AVRO, SVM...) | No install required (just a jar to download) | Declarative Machine Learning and more (by RumbleDB)

  • dataframe_sql

    A Python package that parses SQL and interprets it as methods that act upon existing pandas (or other types of) DataFrames that have been declared and registered

  • dataframe-api

    RFC document, tooling and other content related to the dataframe API standard

  • Project mention: Introducing seaborn-polars, a package allowing to use Polars DataFrames and LazyFrames with Seaborn | /r/Python | 2023-05-15

    Yes, with the upcoming dataframe api protocol the implementation and API will be separated for libraries that adopt that protocol.

  • red_amber

    A dataframe library for Rubyists.

  • sql_to_ibis

    A Python package that parses sql and converts it to ibis expressions

  • DLMReader.jl

    High-performance delimited-file reader and writer for Julia

  • Project mention: Best alternative for python | /r/deeplearning | 2023-12-06

    Julia is great. https://github.com/JuliaData or https://github.com/sl-solution/DLMReader.jl might be a good starting point.

  • heidi

    heidi : tidy data in Haskell

  • TableIO.jl

    A glue package for reading and writing tabular data. It aims to provide a uniform api for reading and writing tabular data from and to multiple sources.

  • mainframe

    mainframe - a lightweight dataframe library for C++

  • FloridaPropertyData

    A Python-based tool for retrieving and processing property data for specific counties in Florida using Parcel ID numbers. Simplifies data retrieval and offers customization options for real estate agents, investors, and government officials.

NOTE: The open-source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020).

Index

What are some of the best open-source Dataframe projects? This list will help you:

 #  Project              Stars
 1  polars               26,378
 2  pandera               3,012
 3  TileDB                1,764
 4  DataFrames.jl         1,696
 5  dataframe-go          1,112
 6  explorer                977
 7  pdpipe                  715
 8  eland                   611
 9  DataFramesMeta.jl       472
10  datacompy               386
11  riptable                346
12  rumble                  207
13  dataframe_sql            96
14  dataframe-api            95
15  red_amber                61
16  sql_to_ibis              50
17  DLMReader.jl             26
18  heidi                    25
19  TableIO.jl               13
20  mainframe                 3
21  FloridaPropertyData       2
