| | vinum | datatable |
|---|---|---|
| Mentions | 5 | 9 |
| Stars | 65 | 1,790 |
| Growth | - | 0.5% |
| Activity | 0.0 | 6.1 |
| Last commit | almost 3 years ago | 5 months ago |
| Language | C++ | C++ |
| License | BSD 3-clause "New" or "Revised" License | Mozilla Public License 2.0 |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
vinum
- Practical SQL for Data Analysis (what you can do without Pandas)
  Following similar observations, I was wondering whether one can execute SQL queries inside a Python process with access to native Python functions and NumPy as UDFs. Thanks to Apache Arrow, one can mix C++ and Python operators without copying the data and essentially combine a DataFrame API with SQL, all within the same Python process.
  https://github.com/dmitrykoval/vinum
  Vinum allows users to write queries that can invoke any NumPy or Python function available to the interpreter as a UDF.
- Vinum is a SQL query processor for Python, designed for data analysis workflows
- dmitrykoval/vinum: Vinum is a SQL query processor for Python, designed for data analysis workflows and in-memory analytics.
- Vinum – SQL Processor for Python with native Python, Numpy UDF support.
- Show HN: Vinum – SQL Processor for Python with Native Python, Numpy UDF Support
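
The idea described in the first mention above, calling ordinary Python functions as UDFs from SQL inside the same process, can be illustrated with a minimal sketch. This is not Vinum's API: it swaps in the standard library's `sqlite3` module, whose `create_function` registers a Python callable as a scalar SQL function. The table name `prices`, the function `log_return`, and the sample rows are hypothetical. Unlike the Arrow-based approach the mention describes, SQLite passes values to the UDF row by row, so this shows only the programming model, not the zero-copy performance characteristics.

```python
import math
import sqlite3


def log_return(prev_price, price):
    """Plain Python function that will be exposed to SQL as a scalar UDF."""
    return math.log(price / prev_price)


def run_query():
    # In-memory database: everything stays inside this Python process.
    conn = sqlite3.connect(":memory:")

    # Register the Python callable under a SQL-visible name (2 arguments).
    conn.create_function("log_return", 2, log_return)

    # Hypothetical sample data, used only for illustration.
    conn.execute("CREATE TABLE prices (day INTEGER, prev_price REAL, price REAL)")
    conn.executemany(
        "INSERT INTO prices VALUES (?, ?, ?)",
        [(1, 100.0, 110.0), (2, 110.0, 99.0), (3, 99.0, 99.0)],
    )

    # The SQL engine calls back into Python for every row.
    rows = conn.execute(
        "SELECT day, log_return(prev_price, price) FROM prices ORDER BY day"
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for day, value in run_query():
        print(day, value)
```

The per-row callback is what makes this pattern slow on large tables; a vectorized engine would instead hand whole columns to the function, which is the gap the mention says Apache Arrow helps close.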
datatable
- Cheat Sheets for data.table to Python's pandas syntax?
  Aside from that, there is a Python translation of data.table (see documentation here), which might be worth looking into. However, it hasn't had any major updates in a while: the last release was 2 years ago ...
- Any advice on using Pandas as a data analyst?
- Alternative to Pandas
  There's datatable. I haven't used it much, but the R version (data.table) is phenomenal.
- Need advice on whether to store data set for regression model in SQL database or by using Python modules like Pickle or Parquet
  just use HDF5 or Parquet, or CSV + https://github.com/h2oai/datatable to speed up the file reading.
- Massive R analysis of Data Science Language and Job Trends 2022
- Scikit-Learn Version 1.0
  > For me, I had the most issues with pandas when using its multiindex.
  Yessss. I loathe indices, and have never been in a situation where I was better off with them than without them.
  > Regarding fast you have something like Vaex on the Python side
  I've never used Vaex, but I've used datatable (https://github.com/h2oai/datatable) and polars (https://github.com/pola-rs/polars). Polars is my favorite API, but datatable was faster at reading data (Polars was faster in execution). I'll have to give Vaex a try at some point.
- Show HN: Sheet2dict – simple Python XLSX/CSV reader/to dictionary converter
- Hey Reddit, here's my comprehensive course on Python Pandas, for free.
  Yep. I think this is the downside to a package being entirely maintained by volunteers. In any case, Pandas is still the leading data wrangling package for Python. (I'm excited to see how datatable evolves.)
- Ditching Excel for Python in a Legacy Industry (Reinsurance)
  h2o's data.table clone is fine
  https://github.com/h2oai/datatable
What are some alternatives?
siuba - Python library for using dplyr-like syntax with pandas and SQL
polars - Dataframes powered by a multithreaded, vectorized query engine, written in Rust
NumCpp - C++ implementation of the Python Numpy library
DataFrame - C++ DataFrame for statistical, Financial, and ML analysis -- in modern C++ using native types and contiguous memory storage
turbodbc - Turbodbc is a Python module to access relational databases via the Open Database Connectivity (ODBC) interface. The module complies with the Python Database API Specification 2.0.
db-benchmark - reproducible benchmark of database-like ops
duckdb - DuckDB is an in-process SQL OLAP Database Management System
scientific-visualization-book - An open access book on scientific visualization using python and matplotlib
q - Run SQL directly on delimited files and multi-file sqlite databases
sktime - A unified framework for machine learning with time series
PyTorch - Tensors and dynamic neural networks in Python with strong GPU acceleration
faiss - A library for efficient similarity search and clustering of dense vectors.