Python SQL

Open-source Python projects categorized as SQL

Top 23 Python SQL Projects

  • GitHub repo devops-exercises

    Linux, Jenkins, AWS, SRE, Prometheus, Docker, Python, Ansible, Git, Kubernetes, Terraform, OpenStack, SQL, NoSQL, Azure, GCP, DNS, Elastic, Network, Virtualization. DevOps Interview Questions

    Project mention: Questions you would get asked on an interview? | reddit.com/r/devops | 2021-01-28

    I think the link you're looking for is https://github.com/bregman-arie/devops-exercises

  • GitHub repo q

    q - Run SQL directly on CSV or TSV files (by harelba)

    Project mention: Q – Run SQL Directly on CSV or TSV Files | reddit.com/r/patient_hackernews | 2021-06-07

  • GitHub repo modin

    Modin: Speed up your Pandas workflows by changing a single line of code

    Project mention: How to Speed Up Pandas with 1 Line of Code | reddit.com/r/Python | 2021-03-03

  • GitHub repo datasette

    An open source multi-tool for exploring and publishing data

    Project mention: Help me find a specific HackerNews article | reddit.com/r/node | 2021-06-17

    Not sure about the HN article, but the author is Simon Willison and the tool is Datasette: https://datasette.io/

  • GitHub repo dataset

    Easy-to-use data handling for SQL data stores with support for implicit table creation, bulk loading, and transactions.

  • GitHub repo SQLAlchemy

    The Database Toolkit for Python

    Project mention: How can I update data on my live website using Python? | reddit.com/r/learnpython | 2021-06-10

    See https://www.sqlalchemy.org/

  • GitHub repo Flask-AppBuilder

    Simple and rapid application development framework, built on top of Flask. Includes detailed security, auto CRUD generation for your models, Google Charts and much more. Demo (login with guest/welcome) - http://flaskappbuilder.pythonanywhere.com/

    Project mention: Splitting flask app into multiple files | reddit.com/r/flask | 2021-03-12

    You can also try Flask-AppBuilder: https://github.com/dpgaspar/Flask-AppBuilder. Some large projects, such as Apache Airflow and Apache Superset, are built on top of it.

  • GitHub repo django-sql-explorer

    Easily share data across your company via SQL queries. From Grove Collab.

    Project mention: Show HN: Django SQL Dashboard | news.ycombinator.com | 2021-05-10

    Very cool! I wrote Django SQL Explorer[0], and this looks very similar in spirit, but with an emphasis on visualization that Explorer does not have (to the extent it has a focus, it's more on providing a reasonable way to write complex queries and re-use them).

    These types of tools are extremely handy.

    [0] https://github.com/groveco/django-sql-explorer

  • GitHub repo Cubes

    Light-weight Python OLAP framework for multi-dimensional data analysis

    Project mention: Building data analysis apps | reddit.com/r/Python | 2021-04-16

    I'm looking for materials and tools to learn. I'm reading up on OLAP and cubes. I found the cubes Python package, but it hasn't been updated in years. Could you give me some tips on what to learn in 2021?

  • GitHub repo djongo

    Django and MongoDB database connector

    Project mention: How to properly set djongo timeout | reddit.com/r/django | 2021-06-22

  • GitHub repo PyPika

    PyPika is a Python SQL query builder that exposes the full richness of the SQL language using a syntax that reflects the resulting query. PyPika excels at all sorts of SQL queries but is especially useful for data analysis.

    Project mention: Migrating to SQLAlchemy 2.0 | news.ycombinator.com | 2021-02-18

    There is a middle-ground between writing SQL statement strings in your code, and a full-blown ORM: query builders. At least in my experience with small to medium projects, these have far fewer footguns while keeping the code composable and readable. Here's one for Python: https://github.com/kayak/pypika
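
    To illustrate the query-builder idea described in the comment above, here is a minimal sketch using only the standard library. The `Select` class and its methods are hypothetical and are not PyPika's API; identifiers are interpolated without escaping, so this only shows the general shape: queries are composed from small immutable pieces, and values travel as bound parameters rather than being pasted into the SQL string.

```python
import sqlite3
from dataclasses import dataclass
from typing import Any, Tuple


@dataclass(frozen=True)
class Select:
    """Hypothetical immutable SELECT builder (illustration only, not PyPika)."""

    table: str
    columns: Tuple[str, ...] = ("*",)
    conditions: Tuple[str, ...] = ()
    params: Tuple[Any, ...] = ()

    def select(self, *columns: str) -> "Select":
        # Return a new builder instead of mutating, so partial queries can be reused.
        return Select(self.table, columns, self.conditions, self.params)

    def where(self, column: str, op: str, value: Any) -> "Select":
        # Only a fixed set of operators is accepted; the value becomes a bound parameter.
        if op not in {"=", "<>", "<", "<=", ">", ">="}:
            raise ValueError(f"unsupported operator: {op}")
        return Select(
            self.table,
            self.columns,
            self.conditions + (f"{column} {op} ?",),
            self.params + (value,),
        )

    def build(self) -> Tuple[str, Tuple[Any, ...]]:
        # Assemble the SQL text; table and column names are NOT escaped in this sketch.
        sql = f"SELECT {', '.join(self.columns)} FROM {self.table}"
        if self.conditions:
            sql += " WHERE " + " AND ".join(self.conditions)
        return sql, self.params


# Usage against an in-memory SQLite database (table and data are made up).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER, name TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [(1, "Ada", 36), (2, "Linus", 17)],
)

base = Select("customers").select("id", "name")  # reusable partial query
adults = base.where("age", ">=", 18)             # composed without touching base

sql, params = adults.build()
rows = conn.execute(sql, params).fetchall()
```

    Because each method returns a new object, `base` can be extended in several directions without one caller's filter leaking into another's query.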

  • GitHub repo ethereum-etl

    Python scripts for ETL (extract, transform and load) jobs for Ethereum blocks, transactions, ERC20 / ERC721 tokens, transfers, receipts, logs, contracts, internal transactions. Data is available in Google BigQuery https://goo.gl/oY5BCQ

    Project mention: Trying To Recover Old ETH | reddit.com/r/ethereum | 2021-01-01

    You can use https://github.com/blockchain-etl/ethereum-etl

  • GitHub repo jet-bridge

    Jet Bridge – Admin Panel Framework for your application

  • GitHub repo siuba

    Python library for using dplyr like syntax with pandas and SQL

    Project mention: R / Tidyverse User -> Python | How to Make it Hurt Less | reddit.com/r/rprogramming | 2021-05-21

    Check out siuba

  • GitHub repo finviz

    Unofficial API for finviz.com

    Project mention: A few Github repositories that I'm planning to go through | reddit.com/r/RKSP | 2021-04-18

    Unofficial finviz

  • GitHub repo Preql

    An interpreted relational query language that compiles to SQL.

    Project mention: Preql: A relational language that compiles to SQL | reddit.com/r/SQL | 2021-03-25

    Hi everyone, I'm happy to introduce Preql.

  • GitHub repo rainbow_csv

    🌈Rainbow CSV - Vim plugin: Highlight columns in CSV and TSV files and run queries in SQL-like language

    Project mention: Any recommendations for a cli CSV editor? | reddit.com/r/commandline | 2021-03-24

    Rainbow CSV provides visual color highlighting and has an SQL-like language for running data queries.

  • GitHub repo fastapi-crudrouter

    A dynamic FastAPI router that automatically creates CRUD routes for your models

    Project mention: FastAPI framework, high perf, easy to learn, fast to code, ready for production | news.ycombinator.com | 2021-02-01

    Thanks, that's a really helpful example.

    Where I think this could be taken to the next level of reusability is in modularising the front-end into API-specific components. For example, the login behaviour could depend on FastAPI-Users, with a sibling frontend library containing components that implement the same login flow. Adding user behaviour is then a matter of using the same third-party library on the front and back end.

    This approach could be extended to other components such as an admin panel (perhaps using https://github.com/awtkns/fastapi-crudrouter), or a blogging component.

  • GitHub repo jaydebeapi

    JayDeBeApi module allows you to connect from Python code to databases using Java JDBC. It provides a Python DB-API v2.0 to that database.

    Project mention: Bulk load Pandas DataFrames into SQL databases using Jaydebeapi | dev.to | 2021-05-09

    Loading Pandas DataFrames into SQL databases is a common task for developers who build data pipelines or automate ETL jobs. The built-in method pandas.DataFrame.to_sql does this quickly for SQLite and for all the databases supported by the SQLAlchemy library. For databases that SQLAlchemy does not support well (in my case, IBM DB2), developers have to find a workaround to get the job done. JayDeBeApi presents itself as a good alternative, particularly for developers who come from a Java background and are familiar with using the JDBC API to access a database. Let's start by creating the database connection. To do that, I will create a simple function that takes all the required information as parameters and returns a connection to DB2.
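
    The article's own connection function is not reproduced in this excerpt. As a rough sketch of what such a function might look like, the example below wraps `jaydebeapi.connect` with IBM's JDBC driver class `com.ibm.db2.jcc.DB2Driver`. The function names, the parameter names, and the path to the driver JAR are assumptions made for illustration, and actually opening a connection requires the `jaydebeapi` package, a JVM, and the DB2 JDBC driver JAR.

```python
def build_db2_url(host: str, port: int, database: str) -> str:
    """Return a JDBC URL of the form jdbc:db2://host:port/database."""
    return f"jdbc:db2://{host}:{port}/{database}"


def connect_db2(host, port, database, user, password, driver_jar):
    """Open a DB-API connection to DB2 through JDBC (hypothetical helper).

    driver_jar is the filesystem path to the DB2 JDBC driver JAR, which the
    caller must supply.
    """
    # Imported lazily so that build_db2_url works without jaydebeapi or a JVM.
    import jaydebeapi

    return jaydebeapi.connect(
        "com.ibm.db2.jcc.DB2Driver",          # JDBC driver class name
        build_db2_url(host, port, database),  # JDBC connection URL
        [user, password],                     # credentials passed to the driver
        driver_jar,                           # JAR(s) added to the classpath
    )


# The host and database names below are placeholders.
url = build_db2_url("db.example.com", 50000, "SAMPLE")
```

    The returned object follows the Python DB-API, so the usual `cursor()`, `execute()`, and `executemany()` calls are available on it for loading rows.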

  • GitHub repo django-sql-dashboard

    Django app for building dashboards using raw SQL queries

    Project mention: Why is uncoupled documentation bad? | news.ycombinator.com | 2021-06-06

    I use documentation systems that publish the documentation from the repo to a website. Most of my projects use Sphinx and reStructuredText for this, but I recently tried MyST (Markdown for Sphinx) and I like that a lot.

    Some examples:

    - https://docs.datasette.io serves documentation from https://github.com/simonw/datasette - which has documentation unit tests here: https://github.com/simonw/datasette

    - https://django-sql-dashboard.datasette.io/ serves from markdown in https://github.com/simonw/django-sql-dashboard - I don't have documentation unit tests for that yet

  • GitHub repo splitgraph

    Splitgraph command line client and python library

    Project mention: Cloudera taken private for $5.3b, acquires Datacoral and Cazena | news.ycombinator.com | 2021-06-01

    The data industry continues to hype this idea of “multi-cloud,” but then the “modern data stack” is centralized around a single warehouse and nobody sees any irony in that.

    The big bet we’re making at Splitgraph [0] is that the next wave of data engineering will take a more decentralized, “data mesh” type approach to enterprise architecture. “Data gravity” really does exist: data is expensive to move, in terms of both cost and operational complexity. So instead of bringing the data to the query, why not bring the query to the data? All we need for that is a set of read-only credentials.

    Cloudera mentions they bought DataCoral to help with data integration and connectors. They’ve correctly identified the problem - data sprawl and fragmentation will inevitably grow - but I’m not sure they have the right solution.

    Data integration is important, but it’s a moving target, which is why it calls for a collaborative open source solution. This is why so many new startups, like AirByte most recently, are coalescing around the Singer taps that Stitch left behind after its acquisition by Talend.

    We also support using Singer taps to ingest data into versioned Splitgraph images [1], so we’re excited to see more collaboration on maintenance of taps. For us it’s a useful feature, but it should be just that — a feature. Is there really a need to replicate all of your data before you can even query it? Or would you rather experiment by directly querying its source?

    [0] https://www.splitgraph.com

    [1] unreleased and undocumented atm, but it does work. We’re hiring, especially on the frontend if you want to help build the web UI. See profile.

  • GitHub repo snowflake-connector-python

    Snowflake Connector for Python

    Project mention: Loading a file-like object to Snowflake via Python? | reddit.com/r/snowflake | 2021-03-26

    I actually just found a useful comment on https://github.com/snowflakedb/snowflake-connector-python/issues/317 saying that this isn't supported yet. It's fine, I can just go through S3 in the meantime.

  • GitHub repo django-pgviews

    Fork of django-postgres that focuses on maintaining and improving support for Postgres SQL Views.

    Project mention: How to speed up Django when querying large data? | reddit.com/r/django | 2021-03-25

    I've used https://github.com/mypebble/django-pgviews for the same purpose. I like the idea that it adds the SQL into git directly. It only works for Postgres, though.

NOTE: The open source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020). The latest post mention was on 2021-06-22.

Index

What are some of the best open-source SQL projects in Python? This list will help you:

Rank  Project                      Stars
1     devops-exercises             8,603
2     q                            8,380
3     modin                        6,120
4     datasette                    5,178
5     dataset                      4,043
6     SQLAlchemy                   3,875
7     Flask-AppBuilder             3,368
8     django-sql-explorer          1,863
9     Cubes                        1,408
10    djongo                       1,317
11    PyPika                       1,257
12    ethereum-etl                 1,063
13    jet-bridge                     966
14    siuba                          641
15    finviz                         574
16    Preql                          368
17    rainbow_csv                    353
18    fastapi-crudrouter             281
19    jaydebeapi                     255
20    django-sql-dashboard           253
21    splitgraph                     228
22    snowflake-connector-python     214
23    django-pgviews                 154