Top 23 Python CSV Projects
-
pandas-ai
Chat with your database or your datalake (SQL, CSV, parquet). PandasAI makes data analysis conversational using LLMs and RAG.
-
q
Project mention: XAN: A Modern CSV-Centric Data Manipulation Toolkit for the Terminal | news.ycombinator.com | 2025-03-27
I used to use q for this sort of thing. Not sure if there are better choices now, as it has been a few years.
https://harelba.github.io/q/
-
datasette
I've been using LLM-assistance for my larger open source projects - https://github.com/simonw/datasette https://github.com/simonw/llm and https://github.com/simonw/sqlite-utils - for a couple of years now.
Also literally hundreds of smaller plugins and libraries and CLI tools, see https://github.com/simonw?tab=repositories (now at 880 repos) and https://pypi.org/user/simonw/ (340 published packages).
Unlike my tools.simonwillison.net stuff the vast majority of those products are covered by automated tests and usually have comprehensive documentation too.
-
csvkit
A suite of utilities for converting to and working with CSV, the king of tabular file formats.
Project mention: Sqawk: A fusion of SQL and Awk: Applying SQL to text-based data files | news.ycombinator.com | 2025-05-26
I wonder how this compares to csvkit [1].
[1]: https://csvkit.readthedocs.io/
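The idea named in the mention above, applying SQL to text-based data files, can be sketched with the Python standard library alone. This is not how csvkit, q, or Sqawk are implemented; the helper `query_csv`, the table name `data`, and the sample rows are all hypothetical and exist only for illustration.

```python
import csv
import io
import sqlite3


def query_csv(csv_text, sql, table="data"):
    """Load CSV text into an in-memory SQLite table and run one SQL query.

    Hypothetical helper for illustration; every column is stored as text.
    """
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    # Quote identifiers so headers containing spaces still work.
    columns = ", ".join('"{}"'.format(name.replace('"', '""')) for name in header)
    placeholders = ", ".join("?" for _ in header)

    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(f'CREATE TABLE "{table}" ({columns})')
        conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', reader)
        return conn.execute(sql).fetchall()
    finally:
        conn.close()


# Made-up sample data, not taken from any of the listed projects.
SAMPLE = "city,population\nOslo,700000\nBergen,285000\nTromso,77000\n"

rows = query_csv(
    SAMPLE,
    "SELECT city FROM data WHERE CAST(population AS INTEGER) > 100000 ORDER BY city",
)
print(rows)
```

Because the values are loaded as text, numeric comparisons need an explicit `CAST`, which is one of the conveniences a dedicated tool would typically handle for you.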
-
datamodel-code-generator
Pydantic model and dataclasses.dataclass generator for easy conversion of JSON, OpenAPI, JSON Schema, and YAML data sources.
-
django-import-export
Django application and library for importing and exporting data with admin integration.
-
ethereum-etl
Python scripts for ETL (extract, transform and load) jobs for Ethereum blocks, transactions, ERC20 / ERC721 tokens, transfers, receipts, logs, contracts, internal transactions. Data is available in Google BigQuery https://goo.gl/oY5BCQ
-
pygraphistry
PyGraphistry is a Python library to quickly load, shape, embed, and explore big graphs with the GPU-accelerated Graphistry visual graph analyzer
Nice!
It's interesting from the perspective of maintenance too. You can bet most constants like warp sizes will change, so you get into things like having profiles, autotuners, or not sweating the small stuff.
We went more extreme, and nowadays focus on several layers up: By accepting the (high!) constant overheads of tools like RAPIDS cuDF, we get in exchange the ability to easily crank code with good saturation on the newest GPUs and that any data scientist can edit and extend. Likewise, they just need to understand basics like data movement and columnar analytics data reps to make GPU pipelines. We have ~1 CUDA kernel left and many years of higher-level.
As an example, this is one of the core methods of our new graph query language (think cypher on pandas/spark), and it gets Graph500 level performance on cheapo GPUs just by being data parallel with high saturation per step: https://github.com/graphistry/pygraphistry/blob/master/graph... . Despite ping-ponging a ton because cudf doesn't (yet) coalesce GPU kernel calls, it still places well, and is easy to maintain & extend.
-
JobFunnel
Project mention: Show HN: Scraper for job listings directly from company websites | news.ycombinator.com | 2024-12-07
jobfunnel is FOSS and accepting contributions: https://github.com/PaulMcInnis/JobFunnel
Currently supports Indeed; in the past it supported Glassdoor and others.
-
python-benedict
dict subclass with keylist/keypath support, built-in I/O operations (base64, csv, html, ini, json, pickle, plist, query-string, toml, xls, xml, yaml), s3 support and many utilities.
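To make "keypath support" concrete, here is a minimal standard-library sketch of looking up a nested value by a dotted path. It is not python-benedict's API; the function `get_by_keypath`, its parameters, and the sample dictionary are hypothetical and only illustrate the general idea.

```python
def get_by_keypath(data, keypath, separator=".", default=None):
    """Walk a nested dict along a separator-delimited key path.

    Hypothetical helper for illustration; returns `default` as soon as
    any segment of the path is missing.
    """
    current = data
    for key in keypath.split(separator):
        if isinstance(current, dict) and key in current:
            current = current[key]
        else:
            return default
    return current


# Made-up sample data.
profile = {"user": {"name": "Ada", "address": {"city": "London"}}}

city = get_by_keypath(profile, "user.address.city")
missing = get_by_keypath(profile, "user.phone.mobile", default="n/a")
print(city, missing)
```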
-
extract_otp_secrets
Extract one time password (OTP) secrets from QR codes exported by two-factor authentication (2FA) apps such as "Google Authenticator". The exported QR codes from authentication apps can be captured by camera, read from images, or read from text files. The secrets can be exported to JSON or CSV, or printed as QR codes to console.
- that opened a new need for "safe TOTP replication with offline access", and that's how I ended-up running my own vaultwarden instance and using the bitwarden clients across devices.
I'm glad I did, and I can't recommend it more. IIRC, this¹ helped tremendously along the way.
¹: https://github.com/scito/extract_otp_secrets
-
CleverCSV
CleverCSV is a Python package for handling messy CSV files. It provides a drop-in replacement for the builtin CSV module with improved dialect detection, and comes with a handy command line application for working with CSV files.
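For context, the builtin `csv` module that CleverCSV describes itself as replacing already offers basic dialect detection through `csv.Sniffer`. The sketch below shows only that standard-library baseline on a deliberately simple, made-up sample; it does not use or demonstrate CleverCSV itself.

```python
import csv
import io

# Made-up sample that uses semicolons instead of commas.
SAMPLE = "name;score\nana;9\nben;7\ncarl;8\n"

# Ask the stdlib sniffer to choose among a few candidate delimiters.
dialect = csv.Sniffer().sniff(SAMPLE, delimiters=";,\t|")

# Parse the same text with the detected dialect.
rows = list(csv.reader(io.StringIO(SAMPLE), dialect))

print(repr(dialect.delimiter))
print(rows)
```

The stdlib sniffer works on clean input like this; messy files are the case where the entry above claims improved detection.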
-
rainbow_csv
🌈Rainbow CSV - Vim plugin: Highlight columns in CSV and TSV files and run queries in SQL-like language
-
pytablewriter
pytablewriter is a Python library to write a table in various formats: AsciiDoc / CSV / Elasticsearch / HTML / JavaScript / JSON / LaTeX / LDJSON / LTSV / Markdown / MediaWiki / NumPy / Excel / Pandas / Python / reStructuredText / SQLite / TOML / TSV.
Python CSV related posts
-
Sqawk: A fusion of SQL and Awk: Applying SQL to text-based data files
-
A Tool I Built for Synthetic Datasets
-
XAN: A Modern CSV-Centric Data Manipulation Toolkit for the Terminal
-
Show HN: Fuzzy deduplicate any CSV using vector embeddings
-
Developing a CKAN Handler for MindsDB: Bridging Open Data and Machine Learning
-
Export data from Django Admin to CSV
-
Show HN: Django-import-export v4 is out
-
Index
What are some of the best open-source CSV projects in Python? This list will help you:
| # | Project | Stars |
|---|---|---|
| 1 | pandas-ai | 21,924 |
| 2 | q | 10,312 |
| 3 | datasette | 10,296 |
| 4 | visidata | 8,429 |
| 5 | csvkit | 6,242 |
| 6 | datamodel-code-generator | 3,419 |
| 7 | django-import-export | 3,242 |
| 8 | ethereum-etl | 3,069 |
| 9 | pygraphistry | 2,323 |
| 10 | JobFunnel | 2,063 |
| 11 | python-benedict | 1,577 |
| 12 | DataProfiler | 1,511 |
| 13 | extract_otp_secrets | 1,428 |
| 14 | CleverCSV | 1,305 |
| 15 | pyexcel | 1,256 |
| 16 | municipios-brasileiros | 1,141 |
| 17 | finviz | 1,124 |
| 18 | csvs-to-sqlite | 912 |
| 19 | URS | 909 |
| 20 | rows | 880 |
| 21 | rainbow_csv | 672 |
| 22 | pytablewriter | 633 |
| 23 | test-lists | 489 |