Working with a CSV file larger than 10 GB

This page summarizes the projects mentioned and recommended in the original post on /r/datascience.

  1. sqlitestudio

    A free, open source, multi-platform SQLite database manager.

    https://sqlitestudio.pl is awesome; it is super easy to set up and to pull in CSVs.

  2. modin

    Modin: Scale your Pandas workflows by changing a single line of code

    Modin should fit. It implements the pandas API on top of a parallel execution engine such as Ray. https://github.com/modin-project/modin

  3. spyql

    Query data on the command line with SQL-like SELECTs powered by Python expressions

    You can import the data into a PostgreSQL, MySQL, SQLite, or similar database and then query the database. However, even with the right choice of indexes, queries on a table with hundreds of millions of records may take a while to run. SpyQL makes the import easy: $ spyql "SELECT * FROM csv TO sql(table=my_table_name)" < my_data.csv | sqlite3 my.db (here my_data.csv stands for your input file, and you would need to create the table my_table_name before running the command).
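
The import-then-query approach mentioned above (SQLiteStudio, or SpyQL piping into sqlite3) can also be sketched with only Python's standard library. This is a minimal sketch, not how either tool works internally: `load_csv_into_sqlite`, `batched`, and `quote_ident` are hypothetical helpers, and it assumes the CSV has a header row and that every row has as many fields as the header. Rows are read and inserted in fixed-size batches, so memory use stays bounded regardless of file size.

```python
import csv
import io
import sqlite3
from itertools import islice
from typing import Iterable, Iterator, List, TextIO


def quote_ident(name: str) -> str:
    """Quote an SQL identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def batched(rows: Iterable, size: int) -> Iterator[List]:
    """Yield lists of at most `size` items, so only one batch is in memory at a time."""
    it = iter(rows)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def load_csv_into_sqlite(src: TextIO, conn: sqlite3.Connection, table: str,
                         batch_size: int = 50_000) -> int:
    """Stream a CSV with a header row into `table`; return the number of rows inserted.

    Assumption: every row has as many fields as the header.
    Values are stored as TEXT (the columns have no declared type), so cast in queries.
    """
    reader = csv.reader(src)
    header = next(reader)
    cols = ", ".join(quote_ident(c) for c in header)
    marks = ", ".join("?" for _ in header)
    conn.execute(f"CREATE TABLE IF NOT EXISTS {quote_ident(table)} ({cols})")
    insert_sql = f"INSERT INTO {quote_ident(table)} ({cols}) VALUES ({marks})"
    total = 0
    for batch in batched(reader, batch_size):
        with conn:  # one transaction per batch: committed on success, rolled back on error
            conn.executemany(insert_sql, batch)
        total += len(batch)
    return total


if __name__ == "__main__":
    # Tiny in-memory stand-in for the real file; for a large file you would use
    # open("my_data.csv", newline="") and sqlite3.connect("my.db") instead.
    sample = io.StringIO("region,amount\nnorth,10\nsouth,7\nnorth,5\n")
    conn = sqlite3.connect(":memory:")
    n = load_csv_into_sqlite(sample, conn, "sales", batch_size=2)
    # An index on the column you filter or group by can speed up later queries.
    conn.execute("CREATE INDEX IF NOT EXISTS idx_sales_region ON sales(region)")
    rows = conn.execute(
        "SELECT region, SUM(CAST(amount AS INTEGER)) FROM sales "
        "GROUP BY region ORDER BY region"
    ).fetchall()
    print(n, rows)  # 3 [('north', 15), ('south', 7)]
    conn.close()
```

Committing once per batch rather than once per row keeps the load reasonably fast, and creating the index after the bulk load, rather than before it, usually helps as well, because SQLite then builds the index once instead of updating it on every insert.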
