Working with more than 10 GB of CSV data

This page summarizes the projects mentioned and recommended in the original post on /r/datascience

  • sqlitestudio

    A free, open source, multi-platform SQLite database manager.

    SQLiteStudio (https://sqlitestudio.pl) is awesome: super easy to set up and to pull CSVs into. (A scripted equivalent of the CSV import is sketched after this list.)

  • modin

    Modin: Scale your Pandas workflows by changing a single line of code

    Modin should fit: it implements the Pandas API with, e.g., Ray as the backend (see the sketch after this list). https://github.com/modin-project/modin

  • spyql

    Query data on the command line with SQL-like SELECTs powered by Python expressions

    You can import the data into a PostgreSQL/MySQL/SQLite/... database and then query the database. However, even with the right choice of indexes, it might take a while to run queries on a table with hundreds of millions of records. You can easily import your data (here a hypothetical my_data.csv) into these databases with SpyQL:

        $ spyql "SELECT * FROM csv TO sql(table=my_table_name)" < my_data.csv | sqlite3 my.db

    You would need to create the table my_table_name before running the command; see the sketch after this list.
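
SQLiteStudio itself is a GUI, but the CSV-into-SQLite step it handles can also be scripted. A minimal sketch using pandas' chunked CSV reader with Python's built-in sqlite3 module, so a 10+ GB file never has to fit in memory; the file, database, and table names are placeholders:

    import sqlite3

    import pandas as pd

    # Stream the CSV in modest chunks instead of loading it all at once.
    # "data.csv", "my.db", and "my_table" are placeholder names.
    con = sqlite3.connect("my.db")
    for chunk in pd.read_csv("data.csv", chunksize=100_000):
        chunk.to_sql("my_table", con, if_exists="append", index=False)
    con.close()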
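
For Modin, the documented usage really is a single changed line. A minimal sketch, assuming the Ray backend is installed (e.g. pip install "modin[ray]"); the file path is a placeholder:

    # The only change from plain pandas is the import line.
    import modin.pandas as pd

    # Modin partitions the frame and runs the read in parallel on the
    # Ray backend, which it initializes automatically if needed.
    df = pd.read_csv("data.csv")  # "data.csv" is a placeholder path
    print(df.shape)
    print(df.describe())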
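
As the SpyQL quote notes, the target table must exist before the import. A minimal sketch that creates it, plus an index for later queries, with Python's built-in sqlite3 module; the three-column schema is purely hypothetical:

    import sqlite3

    con = sqlite3.connect("my.db")
    # Hypothetical schema: adjust the columns to match your CSV header.
    con.execute("""
        CREATE TABLE IF NOT EXISTS my_table_name (
            id INTEGER,
            category TEXT,
            amount REAL
        )
    """)
    # An index on the column you filter by keeps queries on hundreds of
    # millions of rows from degenerating into full table scans.
    con.execute(
        "CREATE INDEX IF NOT EXISTS idx_category ON my_table_name (category)"
    )
    con.commit()
    con.close()

With the table in place, the quoted spyql command can populate it, and queries that filter on the indexed column avoid full scans.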

