Top 23 Python CSV Projects
-
pandas-ai
Chat with your database (SQL, CSV, pandas, polars, mongodb, noSQL, etc). PandasAI makes data analysis conversational using LLMs (GPT 3.5 / 4, Anthropic, VertexAI) and RAG.
In this blog, we will build a powerful IDE agent for PandasAI using Dash Agent. Then later on, we'll understand how using RAG can significantly improve LLM responses.
-
I wrote an async wrapper around SQLite in Python - I'm using a thread pool: https://github.com/simonw/datasette/blob/main/datasette/data...
I have multiple threads for reads and a single dedicated thread for writes, which I send operations to via a queue. That way I avoid ever having two writes against the same connection at the same time.
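The single-writer pattern described above can be sketched roughly as follows. This is a minimal illustration using only the standard library, not Datasette's actual code; the `SingleWriter` class, its `submit` method, and the `demo` helper are hypothetical names chosen for this example.

```python
import os
import queue
import sqlite3
import tempfile
import threading
from concurrent.futures import Future


class SingleWriter:
    """Hypothetical helper: runs every write on one dedicated thread."""

    def __init__(self, path):
        self._path = path
        self._tasks = queue.Queue()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def _run(self):
        # The write connection is created and used only on this thread,
        # so two writes can never run against it at the same time.
        conn = sqlite3.connect(self._path)
        while True:
            item = self._tasks.get()
            if item is None:  # sentinel: stop the writer thread
                break
            fn, future = item
            try:
                with conn:  # commit on success, roll back on error
                    result = fn(conn)
            except Exception as exc:
                future.set_exception(exc)
            else:
                future.set_result(result)
        conn.close()

    def submit(self, fn):
        """Queue fn(conn) for the writer thread and return a Future."""
        future = Future()
        self._tasks.put((fn, future))
        return future

    def close(self):
        self._tasks.put(None)
        self._thread.join()


def demo():
    """Send a few writes through the queue, then read on a separate connection."""
    path = os.path.join(tempfile.mkdtemp(), "demo.db")
    writer = SingleWriter(path)
    writer.submit(lambda c: c.execute("CREATE TABLE t (x INTEGER)").rowcount).result()
    futures = [
        writer.submit(lambda c, i=i: c.execute("INSERT INTO t VALUES (?)", (i,)).rowcount)
        for i in range(5)
    ]
    for future in futures:
        future.result()  # wait for each write to be committed
    writer.close()
    reader = sqlite3.connect(path)  # reads use their own connection
    count = reader.execute("SELECT COUNT(*) FROM t").fetchone()[0]
    reader.close()
    return count
```

Callers get a `Future` back, so an async layer could await the result without ever touching the write connection directly.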
-
Project mention: Data Science at the Command Line, 2nd Edition (2021) | news.ycombinator.com | 2024-05-06
I'd like to call out one of my favorite pieces of software from the past 10 years: VisiData [1] has completely changed the way I do ad-hoc data processing, and is now my go-to for pretty much all use cases that I previously used spreadsheets for, and about half of those I previously used databases for.
It's a TUI application, not strictly CLI, but scriptable, and I figure anyone building pipelines using tools like jq, q, awk, grep, etc. to process tabular data will find it extremely useful.
----
[1]: https://visidata.org
-
csvkit
A suite of utilities for converting to and working with CSV, the king of tabular file formats.
-
django-import-export
Django application and library for importing and exporting data with admin integration.
This is where the django-import-export library comes in handy. It provides an easy way to import and export data in various formats, such as CSV, xlsx and more.
-
ethereum-etl
Python scripts for ETL (extract, transform and load) jobs for Ethereum blocks, transactions, ERC20 / ERC721 tokens, transfers, receipts, logs, contracts, internal transactions. Data is available in Google BigQuery https://goo.gl/oY5BCQ
Project mention: Blockchain transactions decoding: making wallet activity understandable | dev.to | 2023-10-27
An event is a log entity that EVM smart contracts can emit during transaction execution. Events are very good at signalling that some action has taken place on-chain. Applications can subscribe and listen to events to trigger off-chain logic, or they can index, transform, and store events in off-chain storage (see The Graph protocol or Ethereum ETL).
-
datamodel-code-generator
Pydantic model and dataclasses.dataclass generator for easy conversion of JSON, OpenAPI, JSON Schema, and YAML data sources.
Project mention: Datamodel-code-generator: Pydantic model/dataclass from OpenAPI, JSON, YAML | news.ycombinator.com | 2023-11-16
-
pygraphistry
PyGraphistry is a Python library to quickly load, shape, embed, and explore big graphs with the GPU-accelerated Graphistry visual graph analyzer
Extra fun: We find most enterprise/gov graph analytics work only requires 1-2 attributes to go along with the graph index, and those attributes often are already numeric (time, $, ...) or can be dictionary-encoded as discussed here (categorical, ID, ...)... so even 'tough' billion scale graphs are fine on 1 gpu.
Early, but that's been the basic thinking behind our new GFQL system: slice into the columns you want, and then do all the in-GPU traversals you want. In our V1, we keep things dataframe-native, including the in-GPU data representation, and are already working on the first extensions to support switching to more graph-native indexing for steps as needed.
Ex: https://github.com/graphistry/pygraphistry/blob/master/demos...
-
python-benedict
dict subclass with keylist/keypath support, built-in I/O operations (base64, csv, html, ini, json, pickle, plist, query-string, toml, xls, xml, yaml), s3 support and many utilities.
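To illustrate what keypath access means in general, here is a minimal sketch on a plain nested dict. It does not use python-benedict or reproduce its API; `get_by_keypath` and the sample `config` are hypothetical and exist only for this example.

```python
from functools import reduce


def get_by_keypath(data, keypath, separator="."):
    """Follow a dotted path such as 'a.b.c' through nested dicts."""
    return reduce(lambda node, key: node[key], keypath.split(separator), data)


# Hypothetical nested configuration used only for illustration.
config = {"server": {"tls": {"enabled": True, "port": 8443}}}
port = get_by_keypath(config, "server.tls.port")
```

A missing key raises `KeyError`, just as chained indexing on the nested dicts would.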
-
Project mention: LongRoPE: Extending LLM Context Window Beyond 2M Tokens | news.ycombinator.com | 2024-02-22
It's been possible to skip tokenization for a long time, my team and I did it here - https://github.com/capitalone/DataProfiler
For what it's worth, we were actually working with LSTMs with nearly a billion params back around 2016-2017. Transformers made it far more effective to train and execute, but ultimately LSTMs can achieve similar results, though they are slower and require more training data.
-
CleverCSV
CleverCSV is a Python package for handling messy CSV files. It provides a drop-in replacement for the builtin CSV module with improved dialect detection, and comes with a handy command line application for working with CSV files.
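For context, the builtin module's own dialect detection, which CleverCSV aims to improve on, looks roughly like this. The sketch uses only the standard library `csv.Sniffer`, not CleverCSV itself; the `read_rows` helper, the candidate delimiter set, and the sample data are assumptions made for illustration.

```python
import csv
import io


def read_rows(text):
    """Guess the dialect from a sample of the text, then parse with it."""
    # Restricting the candidate delimiters makes the guess more predictable.
    dialect = csv.Sniffer().sniff(text[:1024], delimiters=";,\t|")
    return list(csv.reader(io.StringIO(text), dialect))


# Hypothetical semicolon-separated sample.
sample = "name;city;age\nAda;London;36\nLinus;Helsinki;54\n"
rows = read_rows(sample)
```

On messier files the stdlib sniffer can guess wrong or raise `csv.Error`, which is the gap a package like CleverCSV targets.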
-
You could try and write some simple python using the pyexcel and pandas libraries. I created a tool as a consultant with these packages that parsed spreadsheets with data from factories from all around the world. They did not lock down the Excel files used to submit data and it made it so much harder. If you go this route, I would recommend starting by putting your data into a SQLite database. Once you have your data in a database, you unlock the power of SQL for pulling reports. Also, you can port the data into a proper database if you ever need to. ChatGPT can probably get you a good chunk of the way there.
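The "load into SQLite, then report with SQL" step suggested above might look like the following minimal sketch. It uses only the standard library; the table name, column names, and sample rows are hypothetical stand-ins for data already parsed out of the spreadsheets.

```python
import csv
import io
import sqlite3

# Hypothetical factory data standing in for rows extracted from spreadsheets.
CSV_TEXT = """factory,country,units
Alpha,DE,120
Beta,DE,80
Gamma,JP,200
"""


def load_and_report(csv_text):
    """Load CSV rows into an in-memory SQLite table and total units per country."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE production (factory TEXT, country TEXT, units INTEGER)"
    )
    rows = csv.DictReader(io.StringIO(csv_text))
    # Named placeholders are filled from each row dict yielded by DictReader.
    conn.executemany(
        "INSERT INTO production VALUES (:factory, :country, :units)", rows
    )
    report = conn.execute(
        "SELECT country, SUM(units) FROM production "
        "GROUP BY country ORDER BY country"
    ).fetchall()
    conn.close()
    return report


report = load_and_report(CSV_TEXT)
```

Swapping `":memory:"` for a file path keeps the data on disk, which also makes it easier to port to another database later.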
-
extract_otp_secrets
Extract one time password (OTP) secrets from QR codes exported by two-factor authentication (2FA) apps such as "Google Authenticator". The exported QR codes from authentication apps can be captured by camera, read from images, or read from text files. The secrets can be exported to JSON or CSV, or printed as QR codes to console.
Project mention: Show HN: AuthWin – Authenticator App for Windows | news.ycombinator.com | 2024-03-03
This library uses the GPL v3 license: https://github.com/scito/extract_otp_secrets?tab=GPL-3.0-1-o...
Your options are to either go open-source or remove the library.
-
-
If they don't want you to use their API, just respect their wishes and scrape Reddit (https://github.com/JosephLai241/URS). It's the only moral thing we can do.
-
rainbow_csv
🌈Rainbow CSV - Vim plugin: Highlight columns in CSV and TSV files and run queries in SQL-like language
-
pytablewriter
pytablewriter is a Python library to write a table in various formats: AsciiDoc / CSV / Elasticsearch / HTML / JavaScript / JSON / LaTeX / LDJSON / LTSV / Markdown / MediaWiki / NumPy / Excel / Pandas / Python / reStructuredText / SQLite / TOML / TSV.
-
Python CSV discussion
Python CSV related posts
-
Export data from Django Admin to CSV
-
Show HN: Django-import-export v4 is out
-
Plotille: Plot in the terminal using Braille dots
-
Friends don't let friends export to CSV
-
And I thought amazing fics suddenly being deleted was a myth
-
Advice on ETL and Data Sharing work process
-
CSV2Notion Neo - Upload & Merge CSV Data with Images to Notion Database.
Index
What are some of the best open-source CSV projects in Python? This list will help you:
# | Project | Stars |
---|---|---|
1 | pandas-ai | 12,553 |
2 | q | 10,183 |
3 | datasette | 9,389 |
4 | visidata | 7,790 |
5 | csvkit | 5,956 |
6 | django-import-export | 2,999 |
7 | ethereum-etl | 2,920 |
8 | datamodel-code-generator | 2,616 |
9 | pygraphistry | 2,119 |
10 | JobFunnel | 1,770 |
11 | python-benedict | 1,485 |
12 | DataProfiler | 1,414 |
13 | CleverCSV | 1,249 |
14 | pyexcel | 1,198 |
15 | extract_otp_secrets | 1,113 |
16 | municipios-brasileiros | 1,088 |
17 | finviz | 1,052 |
18 | csvs-to-sqlite | 872 |
19 | rows | 865 |
20 | URS | 780 |
21 | rainbow_csv | 614 |
22 | pytablewriter | 606 |
23 | sterraxcyl | 532 |