datatable
vinum
| | datatable | vinum |
|---|---|---|
| Mentions | 9 | 5 |
| Stars | 1,790 | 65 |
| Growth | 0.8% | - |
| Activity | 6.1 | 0.0 |
| Last commit | 5 months ago | almost 3 years ago |
| Language | C++ | C++ |
| License | Mozilla Public License 2.0 | BSD 3-clause "New" or "Revised" License |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
datatable
- Cheat Sheets for data.table to Python's pandas syntax?
Aside from that, there is a Python translation of data.table (see documentation here), which might be worth looking into. However, it hasn't had any major updates in a while: the last release 2 years ago ...
- Any advice on using Pandas as a data analyst?
- Alternative to Pandas
There's datatable. I haven't used it much, but the R version (data.table) is phenomenal.
- Need advice on whether to store data set for regression model in SQL database or by using Python modules like Pickle or Parquet
just use HDF5 or Parquet, or CSV + https://github.com/h2oai/datatable to speed up the file reading.
- Massive R analysis of Data Science Language and Job Trends 2022
- Scikit-Learn Version 1.0
> For me I had with pandas the most issues using its multiindex.
Yessss. I loathe indices, and have never been in a situation where I was better off with them than without them.
> Regarding fast you have something like Vaex on python side
I've never used Vaex, but I've used datatable (https://github.com/h2oai/datatable) and polars (https://github.com/pola-rs/polars). Polars is my favorite API, but datatable was faster at reading data (Polars was faster in execution). I'll have to give Vaex a try at some point.
- Show HN: Sheet2dict – simple Python XLSX/CSV reader/to dictionary converter
- Hey Reddit, here's my comprehensive course on Python Pandas, for free.
Yep. I think this is the downside to a package being entirely maintained by volunteers. In any case, Pandas is still the leading data wrangling package for Python. (I'm excited to see how datatable evolves.)
- Ditching Excel for Python in a Legacy Industry (Reinsurance)
h2o's data.table clone is fine
https://github.com/h2oai/datatable
vinum
- Practical SQL for Data Analysis (what you can do without Pandas)
Following similar observations, I was wondering whether one can actually execute SQL queries inside a Python process, with access to native Python functions and Numpy as UDFs. Thanks to Apache Arrow, one can mix C++ and Python operators without copying the data and essentially combine a DataFrame API with SQL, all within the confines of the same Python process.
https://github.com/dmitrykoval/vinum
Vinum allows users to write queries which may invoke any Numpy or Python functions as UDFs available to the interpreter.
- Vinum is a SQL query processor for Python, designed for data analysis workflows
- dmitrykoval/vinum Vinum is a SQL query processor for Python, designed for data analysis workflows and in-memory analytics.
- Vinum – SQL Processor for Python with native Python, Numpy UDF support.
- Show HN: Vinum – SQL Processor for Python with Native Python, Numpy UDF Support
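The idea described in the first mention above, running SQL inside the Python process and calling ordinary Python functions as UDFs, can be illustrated with a minimal sketch. This is not Vinum's API: it swaps in the standard library's `sqlite3` module, whose `create_function` registers a Python callable as a SQL function. It does not show the zero-copy Apache Arrow data sharing the mention describes. The table name `trips`, the column `fare`, the sample rows, and the UDF `log_price` are all hypothetical.

```python
import math
import sqlite3


def log_price(value):
    """Plain Python function exposed to SQL as a scalar UDF (hypothetical)."""
    # Return NULL for missing or non-positive inputs instead of raising.
    return math.log(value) if value and value > 0 else None


# In-memory database living inside the current Python process.
conn = sqlite3.connect(":memory:")

# Register the Python callable under the SQL name "log_price" (1 argument).
conn.create_function("log_price", 1, log_price)

# Hypothetical sample data, for illustration only.
conn.execute("CREATE TABLE trips (id INTEGER, fare REAL)")
conn.executemany(
    "INSERT INTO trips VALUES (?, ?)",
    [(1, 10.0), (2, 25.5), (3, 0.0)],
)

# The SQL engine calls back into Python for every row.
rows = conn.execute(
    "SELECT id, log_price(fare) FROM trips ORDER BY id"
).fetchall()
print(rows)
```

Here the UDF is invoked once per row, which is the simplest form of the pattern; a vectorized engine would instead pass whole columns to the function.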
What are some alternatives?
polars - Dataframes powered by a multithreaded, vectorized query engine, written in Rust
siuba - Python library for using dplyr like syntax with pandas and SQL
DataFrame - C++ DataFrame for statistical, Financial, and ML analysis -- in modern C++ using native types and contiguous memory storage
NumCpp - C++ implementation of the Python Numpy library
db-benchmark - reproducible benchmark of database-like ops
turbodbc - Turbodbc is a Python module to access relational databases via the Open Database Connectivity (ODBC) interface. The module complies with the Python Database API Specification 2.0.
scientific-visualization-book - An open access book on scientific visualization using python and matplotlib
duckdb - DuckDB is an in-process SQL OLAP Database Management System
sktime - A unified framework for machine learning with time series
q - q - Run SQL directly on delimited files and multi-file sqlite databases
faiss - A library for efficient similarity search and clustering of dense vectors.
Pytorch - Tensors and Dynamic neural networks in Python with strong GPU acceleration