Python CSV

Open-source Python projects categorized as CSV

Top 23 Python CSV Projects

  • q

    q - Run SQL directly on delimited files and multi-file sqlite databases (by harelba)

    Project mention: I wrote this iCalendar (.ics) command-line utility to turn common calendar exports into more broadly compatible CSV files. | /r/commandline | 2023-03-24

    CSV utilities (still haven't picked a favorite one...): https://github.com/harelba/q https://github.com/BurntSushi/xsv https://github.com/wireservice/csvkit https://github.com/johnkerl/miller

  • datasette

    An open source multi-tool for exploring and publishing data

    Project mention: Little Data: How do we query personal data? (2013) | news.ycombinator.com | 2024-03-01

    I'm a fan of simonw's datasette/dogsheep ecosystem https://datasette.io/

  • visidata

    A terminal spreadsheet multitool for discovering and arranging data

    Project mention: Fx – Terminal JSON Viewer | news.ycombinator.com | 2023-09-19

    [4] "Is it possible to "flatten" structured data (like JSON?)": https://github.com/saulpw/visidata/discussions/1605

  • csvkit

    A suite of utilities for converting to and working with CSV, the king of tabular file formats.

    Project mention: I wrote this iCalendar (.ics) command-line utility to turn common calendar exports into more broadly compatible CSV files. | /r/commandline | 2023-03-24

    CSV utilities (still haven't picked a favorite one...): https://github.com/harelba/q https://github.com/BurntSushi/xsv https://github.com/wireservice/csvkit https://github.com/johnkerl/miller

  • django-import-export

    Django application and library for importing and exporting data with admin integration.

    Project mention: Import or load a json into a database | /r/django | 2023-04-19

    django-import-export provides a sophisticated framework for importing data. Good if you need to do this on a regular basis and need to do some work on the data before writing to the database.

  • ethereum-etl

    Python scripts for ETL (extract, transform and load) jobs for Ethereum blocks, transactions, ERC20 / ERC721 tokens, transfers, receipts, logs, contracts, internal transactions. Data is available in Google BigQuery https://goo.gl/oY5BCQ

    Project mention: Blockchain transactions decoding: making wallet activity understandable | dev.to | 2023-10-27

    An event is a log entity that EVM smart contracts can emit during transaction execution. Events are very good at signalling that some action has taken place on-chain. Applications can subscribe and listen to events to trigger off-chain logic, or they can index, transform, and store events in off-chain storage (see The Graph protocol or Ethereum ETL).

  • datamodel-code-generator

    Pydantic model and dataclasses.dataclass generator for easy conversion of JSON, OpenAPI, JSON Schema, and YAML data sources.

    Project mention: Datamodel-code-generator: Pydantic model/dataclass from OpenAPI, JSON, YAML | news.ycombinator.com | 2023-11-16

  • pygraphistry

    PyGraphistry is a Python library to quickly load, shape, embed, and explore big graphs with the GPU-accelerated Graphistry visual graph analyzer

    Project mention: The "missing" graph datatype already exists. It was invented in the '70s | news.ycombinator.com | 2024-03-05

    If you enjoy this kind of thinking, we recently released GFQL for dataframe-native graph querying & compute

    Imagine Neo4j Cypher, except no need for a database -- just import it -- and automatically vectorizes for significantly faster CPU+GPU performance. This is fundamentally similar to the kinds of implementations a datalog approach enables. (And indeed one of the alternative interfaces we were considering!)

    We've run it on 100M+ edge graphs on some of the cheapest GPUs you can get, and are getting ready for the next rev with aggregate compute: https://github.com/graphistry/pygraphistry/blob/master/demos...

  • JobFunnel

    Scrape job websites into a single spreadsheet with no duplicates.

  • python-benedict

    dict subclass with keylist/keypath support, built-in I/O operations (base64, csv, html, ini, json, pickle, plist, query-string, toml, xls, xml, yaml), s3 support and many utilities.

  • DataProfiler

    What's in your data? Extract schema, statistics and entities from datasets

    Project mention: LongRoPE: Extending LLM Context Window Beyond 2M Tokens | news.ycombinator.com | 2024-02-22

    It's been possible to skip tokenization for a long time, my team and I did it here - https://github.com/capitalone/DataProfiler

    For what it's worth, we were actually working with LSTMs with nearly a billion params back around 2016-2017. Transformers made it far more effective to train and execute, but ultimately LSTMs are able to achieve similar results, though they are slower and require more training data.

  • CleverCSV

    CleverCSV is a Python package for handling messy CSV files. It provides a drop-in replacement for the builtin CSV module with improved dialect detection, and comes with a handy command line application for working with CSV files.

    Project mention: Parquet: more than just "Turbo CSV" | /r/programming | 2023-04-03

    There are things like this, but I consider the existence of messy, non-standard CSV files (backed by a decade of experience dealing with the problem) a strong reason to never use the format.
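
    Dialect detection, which the CleverCSV description above says it improves on, can be illustrated with the standard library alone. The sketch below uses the built-in `csv.Sniffer`, not CleverCSV itself, and the sample data and candidate delimiters are assumptions made up for illustration.

```python
import csv
import io

# Hypothetical semicolon-delimited input; real-world files are often messier.
MESSY = "name;city;score\nAda;London;9.5\nLinus;Helsinki;8.0\n"

# Ask the built-in sniffer to guess the dialect from a sample of the text.
# The candidate delimiters are an assumption chosen for this example.
dialect = csv.Sniffer().sniff(MESSY, delimiters=";,\t|")

# Parse the same text using the detected dialect.
rows = list(csv.reader(io.StringIO(MESSY), dialect))

print(repr(dialect.delimiter))
for row in rows:
    print(row)
```

    The built-in sniffer is a heuristic and can guess wrong on irregular files, which is the gap a dedicated detection package aims to close.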

  • pyexcel

    Single API for reading, manipulating and writing data in csv, ods, xls, xlsx and xlsm files

    Project mention: Advice on ETL and Data Sharing work process | /r/ETL | 2023-11-07

    You could try writing some simple Python using the pyexcel and pandas libraries. As a consultant, I built a tool with these packages that parsed spreadsheets with data from factories all around the world. They did not lock down the Excel files used to submit data, which made it much harder. If you go this route, I would recommend starting by putting your data into a SQLite database. Once you have your data in a database, you unlock the power of SQL for pulling reports. Also, you can port the data into a proper database if you ever need to. ChatGPT can probably get you a good chunk of the way there.
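
    The advice above (load the data into SQLite first, then pull reports with SQL) can be sketched with only the standard library. This is a minimal illustration, not the commenter's tool: the table name, column names, sample rows, and the `load_csv_into_sqlite` helper are all hypothetical, and it reads CSV text rather than Excel files.

```python
import csv
import io
import sqlite3

# Hypothetical sample data standing in for a spreadsheet exported to CSV.
SAMPLE_CSV = """factory,country,units
Alpha,DE,120
Beta,BR,80
Gamma,DE,45
"""


def load_csv_into_sqlite(conn, table, csv_text):
    """Create `table` from the CSV header row and insert every data row."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    columns = ", ".join(f'"{name}"' for name in header)
    placeholders = ", ".join("?" for _ in header)
    conn.execute(f'CREATE TABLE "{table}" ({columns})')
    conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', reader)
    conn.commit()


# An in-memory database keeps the sketch self-contained;
# pass a file path instead to keep the data on disk.
conn = sqlite3.connect(":memory:")
load_csv_into_sqlite(conn, "production", SAMPLE_CSV)

# Values arrive as text, so cast before summing.
report = conn.execute(
    "SELECT country, SUM(CAST(units AS INTEGER)) AS total "
    "FROM production GROUP BY country ORDER BY country"
).fetchall()
print(report)
```

    Once the rows are in a table, further reports are just additional queries against the same connection.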

  • municipios-brasileiros

    Data related to Brazilian municipalities

  • finviz

    Unofficial API for finviz.com

    Project mention: Scraping Realtime Data from finviz | /r/algotrading | 2023-03-23

    https://github.com/mariostoev/finviz may be helpful to you

  • extract_otp_secrets

    Extract one time password (OTP) secrets from QR codes exported by two-factor authentication (2FA) apps such as "Google Authenticator". The exported QR codes from authentication apps can be captured by camera, read from images, or read from text files. The secrets can be exported to JSON or CSV, or printed as QR codes to console.

    Project mention: Show HN: AuthWin – Authenticator App for Windows | news.ycombinator.com | 2024-03-03

    This library uses the GPL v3 license: https://github.com/scito/extract_otp_secrets?tab=GPL-3.0-1-o...

    Your options are to either go open-source or remove the library.

  • rows

    A common, beautiful interface to tabular data, no matter the format

  • csvs-to-sqlite

    Convert CSV files into a SQLite database

  • URS

    Universal Reddit Scraper - A comprehensive Reddit scraping/archival command-line tool.

    Project mention: Nitter Shutting Down | news.ycombinator.com | 2024-01-27

    If they don't want you to use their API, just respect their wishes and scrape Reddit: https://github.com/JosephLai241/URS. It's the only moral thing we can do.

  • pytablewriter

    pytablewriter is a Python library to write a table in various formats: AsciiDoc / CSV / Elasticsearch / HTML / JavaScript / JSON / LaTeX / LDJSON / LTSV / Markdown / MediaWiki / NumPy / Excel / Pandas / Python / reStructuredText / SQLite / TOML / TSV.

  • rainbow_csv

    🌈Rainbow CSV - Vim plugin: Highlight columns in CSV and TSV files and run queries in SQL-like language

    Project mention: Looking for two plugins for Log Analysis | /r/neovim | 2023-04-26

    Probably not an exact fit, but this plugin came to mind: rainbow_csv

  • sterraxcyl

    Instagram OSINT tool to export and analyse followers | following with their details

    Project mention: Tool to see mutual followers of several Instagram pages? | /r/OSINT | 2023-11-17

  • test-lists

    URL testing lists intended for discovering website censorship

NOTE: The open source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020). The latest post mention was on 2024-03-05.

What are some of the best open-source CSV projects in Python? This list will help you:

#   Project                   Stars
1   q                         10,092
2   datasette                  8,764
3   visidata                   7,328
4   csvkit                     5,776
5   django-import-export       2,835
6   ethereum-etl               2,792
7   datamodel-code-generator   2,196
8   pygraphistry               2,022
9   JobFunnel                  1,709
10  python-benedict            1,385
11  DataProfiler               1,342
12  CleverCSV                  1,197
13  pyexcel                    1,168
14  municipios-brasileiros     1,048
15  finviz                       996
16  extract_otp_secrets          919
17  rows                         859
18  csvs-to-sqlite               854
19  URS                          709
20  pytablewriter                591
21  rainbow_csv                  564
22  sterraxcyl                   467
23  test-lists                   394