Open-source projects categorized as SQL

Top 23 SQL Open-Source Projects

  • GitHub repo Apache Spark

    Apache Spark - A unified analytics engine for large-scale data processing

    Project mention: On explaining technical stuff in a non-technical way — (Py)Spark | dev.to | 2021-04-23

    The homework example illustrates, as I understand it, the simplified basic thinking behind Apache Spark (and many similar frameworks and systems, e.g. horizontal or vertical data “sharding”): given that you know what kind of tasks you have to perform on the data, split the data into reasonable groups (called “partitions” in Spark’s case) and distribute those partitions to an ideally equal number of workers (or as many workers as your system can provide). These workers can be on the same machine or on different ones, e.g. each worker on one machine (node). There must be a coordinator of all this effort, to collect the information needed to perform the task and to redistribute the load in case of failure. A (network) connection between the coordinator and the workers is also necessary to communicate and exchange data and information, and even to re-partition the data, either after a failure or when the computations require it (e.g. we need to calculate something on each row of data independently but then we need to group those rows by a key). There is also the concept of doing things in a “lazy” way and using caching to keep intermediate results, so that not everything has to be calculated from scratch every time.
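
    The idea described in the mention above can be sketched in plain Python. This is a toy illustration under stated assumptions, not Spark's API or implementation: the names `partition`, `map_partition`, `shuffle_by_key` and `LazySum` are hypothetical, threads stand in for workers, and the "coordinator" is simply the code that hands out partitions and gathers results.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def partition(rows, num_partitions):
    """Split rows into num_partitions groups of near-equal size (round-robin)."""
    return [rows[i::num_partitions] for i in range(num_partitions)]


def map_partition(part):
    """Per-row work that needs no other rows: square each value."""
    return [(key, value * value) for key, value in part]


def shuffle_by_key(mapped_parts):
    """Re-partition step: bring together all values that share a key."""
    groups = defaultdict(list)
    for part in mapped_parts:
        for key, value in part:
            groups[key].append(value)
    return groups


class LazySum:
    """Hypothetical job object: nothing is computed until collect() is called."""

    def __init__(self, rows, num_workers):
        self.rows = rows
        self.num_workers = num_workers
        self._cache = None  # holds the result after the first collect()

    def collect(self):
        if self._cache is None:
            parts = partition(self.rows, self.num_workers)
            # The "coordinator" hands one partition to each worker thread.
            with ThreadPoolExecutor(max_workers=self.num_workers) as pool:
                mapped = list(pool.map(map_partition, parts))
            groups = shuffle_by_key(mapped)
            self._cache = {key: sum(values) for key, values in groups.items()}
        return self._cache


rows = [("a", 1), ("b", 2), ("a", 3), ("b", 4), ("c", 5)]
job = LazySum(rows, num_workers=2)  # only records the plan, no work yet
result = job.collect()              # runs the map, the shuffle and the sum
print(result)
```

    Calling `job.collect()` a second time returns the cached dictionary instead of recomputing it, which is the "not from scratch every time" part of the description.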

  • GitHub repo tidb

    TiDB is an open source distributed HTAP database compatible with the MySQL protocol

    Project mention: TiGraph: 8,700x Computing Performance Achieved by Combining Graphs + the RDBMS Syntax | dev.to | 2021-04-05

    The three hackers on the TiGraph team are all top developers in the TiDB community:

  • GitHub repo Sequelize

    An easy-to-use multi SQL dialect ORM tool for Node.js

    Project mention: Debugging Chronicles: Serverless offline + Sequelize | dev.to | 2021-05-02

    We immediately checked the dependency updates, and indeed Sequelize had been updated; we also found an issue on their GitHub and some questions on StackOverflow mentioning a similar error. Nevertheless, reverting to the previous working version had no effect at all. The error was still happening and we had no clue what other dependency could have something to do with an error so deep in the Sequelize codebase.

  • GitHub repo cockroach

    CockroachDB - the open source, cloud-native distributed SQL database.

    Project mention: #30DaysofAppwrite : Appwrite’s building blocks | dev.to | 2021-05-03

    Appwrite uses MariaDB as the default database for project collections, documents, and all other metadata. Appwrite is agnostic to the database you use under the hood and support for more databases like Postgres, CockroachDB, MySQL and MongoDB is currently under active development! 😊

  • GitHub repo OSQuery

    SQL powered operating system instrumentation, monitoring, and analytics.

    Project mention: Is there a way to scan a network for computers running specific software (Java in this case) | reddit.com/r/sysadmin | 2021-04-26

    Many options exist. OSQuery is one, and it's free, and it can be used to grab a bunch of other system information which might be useful at a later date. https://osquery.io/

  • GitHub repo ClickHouse

    ClickHouse® is a free analytics DBMS for big data

    Project mention: Little Analyst in a Big Data Pond | reddit.com/r/datascience | 2021-04-29

    As many have already mentioned, this is more of a data engineering problem than a data science one. Try to build ETL pipelines for storing the data in a data lake or data warehouse (more organised). Make sure the pipelines are reliable and have a fallback mechanism to ensure consistency. Check out open source DBs such as https://clickhouse.tech (an open-source OLAP database management system), or you can get started with Postgres as well. https://airbyte.io is an open source project which provides data integrations/pipelines.

  • GitHub repo MyBatis

    MyBatis SQL mapper framework for Java

  • GitHub repo Knex

    A query builder for PostgreSQL, MySQL and SQLite3, designed to be flexible, portable, and fun to use.

    Project mention: Generate TypeScript definitions from PostgreSQL | dev.to | 2021-04-14

    I've been enjoying using the Knex.js database client for quite some time when implementing GraphQL API backends. One thing that it currently lacks, though, is the ability to generate strongly typed (TypeScript) models from the actual database schema.

  • GitHub repo shardingsphere

    Distributed Database Ecosphere

    Project mention: Weekly Developer Roundup #23 - Sun Nov 22 2020 | dev.to | 2020-11-21

    apache/shardingsphere (Java): Distributed database middleware

  • GitHub repo Dapper

    Dapper - a simple object mapper for .Net (by DapperLib)

    Project mention: Is it possible to convert sql string to LINQ expression? | reddit.com/r/csharp | 2021-04-26

    Are you maybe searching for something like Dapper?

  • GitHub repo Presto

    The official home of the Presto distributed SQL query engine for big data

    Project mention: Inside Presto Optimizer | dev.to | 2021-04-19

    We will use the Presto Foundation fork version 0.245 for this blog post.

  • GitHub repo TimescaleDB

    An open-source time-series SQL database optimized for fast ingest and complex queries. Packaged as a PostgreSQL extension.

    Project mention: TimescaleDB Raises $40M | news.ycombinator.com | 2021-05-05

    Fair point about adaptive chunking. You sound like a long-term user!

    There is always a trade-off between getting features to users quickly to experiment and incrementally improve, versus doing it always very conservatively.

    When we launched adaptive chunking (introduced in 0.11, deprecated in 1.2), we explicitly marked it as beta and default off, to hopefully reflect that. [1]

    The approach we are now taking with Timescale Analytics [2] is to have an explicit distinction between experimental features (which will be part of a distinct "experimental" schema in the database, and must be expressly turned on with appropriate warnings) and stable features. Hopefully this can help find a good balance between stability and velocity, but feedback welcome!

    [1] https://github.com/timescale/timescaledb/releases/tag/0.11.0

    [2] https://github.com/timescale/timescale-analytics/tree/main/e...

  • GitHub repo go-sql-driver/mysql

    Go MySQL Driver is a MySQL driver for Go's (golang) database/sql package (by go-sql-driver)

    Project mention: Web Development in Go: Middleware, Templating, Databases & Beyond | dev.to | 2021-01-27

    For example, here's how to use the MySQL driver package with database/sql:

  • GitHub repo dolt

    Dolt – It's Git for Data

    Project mention: Git as a NoSql Database | news.ycombinator.com | 2021-04-05

    I've been very curious to explore this type of use case with askgit (https://github.com/augmentable-dev/askgit), which was designed for running simple "slice and dice" queries and aggregations on git history (and change stats) for basic analytical purposes. I've been curious about how this could be applied to a small text+git based "db". Say, for regular JSON or CSV dumps.

    This also reminds me of Dolt: https://github.com/dolthub/dolt which I believe has been on HN a couple times

  • GitHub repo sql.js

    A javascript library to run SQLite on the web.

    Project mention: SQLite Compiled to JavaScript | news.ycombinator.com | 2021-05-03

  • GitHub repo devops-exercises

    Linux, Jenkins, AWS, SRE, Prometheus, Docker, Python, Ansible, Git, Kubernetes, Terraform, OpenStack, SQL, NoSQL, Azure, GCP, DNS, Elastic, Network, Virtualization. DevOps Interview Questions

    Project mention: Questions you would get asked on an interview? | reddit.com/r/devops | 2021-01-28

    I think the link you're looking for is https://github.com/bregman-arie/devops-exercises

  • GitHub repo rqlite

    The lightweight, distributed relational database built on SQLite

    Project mention: Is it possible to distribute a Sqlite database across several servers? | reddit.com/r/sqlite | 2021-05-05

  • GitHub repo q

    q - Run SQL directly on CSV or TSV files (by harelba)

    Project mention: Practical SQL for Data Analysis(what you can do without Pandas) | news.ycombinator.com | 2021-05-03

    q "SELECT COUNT(*) FROM ./clicks_file.csv WHERE c3 > 32.3"

    It uses sqlite under the hood.
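
    To make the "SQLite under the hood" remark concrete, here is a minimal sketch of the general approach using only Python's standard library. It is an assumption-laden illustration, not q's actual implementation: the helper `query_csv`, the table name `t` and the sample click data are hypothetical, and the columns are named c1, c2, c3 to mirror the query above.

```python
import csv
import io
import sqlite3


def query_csv(csv_text, sql):
    """Load header-less CSV text into an in-memory SQLite table `t`, then run sql."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    width = len(rows[0])
    # NUMERIC affinity makes SQLite store numeric-looking text as numbers,
    # so a comparison such as c3 > 32.3 is numeric rather than textual.
    columns = ", ".join(f"c{i + 1} NUMERIC" for i in range(width))
    placeholders = ", ".join("?" for _ in range(width))
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(f"CREATE TABLE t ({columns})")
        conn.executemany(f"INSERT INTO t VALUES ({placeholders})", rows)
        return conn.execute(sql).fetchall()
    finally:
        conn.close()


# Made-up sample data standing in for clicks_file.csv.
clicks = "\n".join([
    "home,alice,12.5",
    "cart,bob,40.0",
    "pay,carol,33.1",
])
count = query_csv(clicks, "SELECT COUNT(*) FROM t WHERE c3 > 32.3")
print(count)
```

    Unlike q, which lets the file path appear directly in the FROM clause, this sketch queries a fixed table name to keep the code short.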

  • GitHub repo Bitwarden

    The core infrastructure backend (API, database, Docker, etc). (by bitwarden)

    Project mention: What are some excellent Github projects that really showcase best practices and great architecture and design? | reddit.com/r/csharp | 2021-05-05

    I really enjoy reading https://github.com/bitwarden/server

  • GitHub repo diesel

    A safe, extensible ORM and Query Builder for Rust

    Project mention: diesel.exe - Application Error | reddit.com/r/rust | 2021-04-20

    I managed to install the diesel cli like they showed on the getting started page, but when I try to run the diesel commands from the command prompt I get an error box that pops up saying: "The application was unable to start correctly (0xc000007b). Click OK to close the application." Apparently there was a similar issue previously (https://github.com/diesel-rs/diesel/issues/2034), but they just said it's probably some missing DLLs. How do I know what DLLs are missing? Any ideas on how to fix this issue?

  • GitHub repo migrate

    Database migrations. CLI and Golang library.

    Project mention: 🎉 The Create Go App project has grown to v2, but is still easier, better, faster & stronger | dev.to | 2021-05-06

    postgres: a configured PostgreSQL container that applies migrations (via the golang-migrate/migrate tool) for the backend.

  • GitHub repo azuredatastudio

    Azure Data Studio is a data management tool that enables working with SQL Server, Azure SQL DB and SQL DW from Windows, macOS and Linux. (by microsoft)

    Project mention: Drawbridge: What SQL Server on Linux is built on | news.ycombinator.com | 2021-01-13

    Cool! How do I enable MySQL support?

    This issue led me to believe it's not implemented yet: https://github.com/Microsoft/azuredatastudio/issues/4904

    And searching for MySQL or MariaDB on the extensions marketplace nets zero results.

  • GitHub repo usql

    Universal command-line interface for SQL databases

    Project mention: Reading database metadata (schema) | reddit.com/r/golang | 2021-04-29

    A few months ago I started working on adding \d* commands to usql that would allow listing and describing various database objects, like tables, views, indexes, etc. I started looking for existing solutions in Go and stumbled upon this issue: https://github.com/golang/go/issues/7408

NOTE: The open source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020). The latest post mention was on 2021-05-06.


What are some of the best open-source SQL projects? This list will help you:

#   Project              Stars
1   Apache Spark         29,532
2   tidb                 27,630
3   Sequelize            24,231
4   cockroach            20,456
5   OSQuery              17,887
6   ClickHouse           15,754
7   MyBatis              15,533
8   Knex                 14,097
9   shardingsphere       13,734
10  Dapper               13,664
11  Presto               11,968
12  TimescaleDB          10,826
13  go-sql-driver/mysql  10,806
14  dolt                 8,668
15  sql.js               8,532
16  devops-exercises     8,229
17  rqlite               8,202
18  q                    8,129
19  Bitwarden            7,712
20  diesel               6,765
21  migrate              6,349
22  azuredatastudio      6,321
23  usql                 6,290