SQL

Open-source projects categorized as SQL

Top 23 SQL Open-Source Projects

  • GitHub repo Apache Spark

    Apache Spark - A unified analytics engine for large-scale data processing

    Project mention: Show HN: Box – Data Transformation Pipelines in Rust DataFusion | news.ycombinator.com | 2021-11-30

    A while ago I posted a link to [Arc](https://news.ycombinator.com/item?id=26573930), a declarative method for defining repeatable data pipelines that execute against [Apache Spark](https://spark.apache.org/).

    Today I would like to present a proof-of-concept implementation of the [Arc declarative ETL framework](https://arc.tripl.ai) against [Apache DataFusion](https://arrow.apache.org/datafusion/), which is an ANSI SQL (Postgres) execution engine based upon Apache Arrow and built with Rust.

    The idea of providing a declarative 'configuration' language for defining data pipelines was planned from the beginning of the Arc project to allow changing execution engines without having to rewrite the base business logic (the part that is valuable to your business). Instead, by defining an abstraction layer, we can change the execution engine and run the same logic with different execution characteristics.

    The benefit of DataFusion over Apache Spark is a significant increase in speed and a reduction in execution resource requirements. Even through a Docker-for-Mac inefficiency layer, the same job completes in ~4 seconds with DataFusion vs ~24 seconds with Apache Spark (including JVM startup time). Without the Docker-for-Mac layer, end-to-end execution times of 0.5 seconds for the same example job (TPC-H) are possible. Note that the aim is not to start a benchmarking flamewar but to provide some indicative data.

    The purpose of this post is to gather feedback from the community on whether you would use a tool like this, what features would be required for you to use it (MVP), and whether you would be interested in contributing to the project. I would also like to highlight the excellent work being done by the DataFusion/Arrow (and Apache) community in providing such amazing tools to us all as open source projects.

  • GitHub repo tidb

    TiDB is an open source distributed HTAP database compatible with the MySQL protocol

    Project mention: Comparing Nginx Performance in Bare Metal and Virtual Environments | news.ycombinator.com | 2021-10-29

    I do agree with you in that regard, however, that's also a dangerous line of thinking.

    There are attempts to provide horizontal scalability for RDBMSes in a transparent way, like TiDB https://pingcap.com/ (which is compatible with the MySQL 5.7 drivers), however, the list of functionality that's sacrificed to achieve easily extensible clusters is a long one: https://docs.pingcap.com/tidb/stable/mysql-compatibility

    There are other technologies, like MongoDB, which sometimes are more successful at a clustered configuration, however most of the traditional RDBMSes work best in a leader-follower type of replication scenario, because even those aforementioned systems oftentimes have data consistency issues that may eventually pop up.

    Essentially, my argument is that the lack of good horizontally scalable databases or other data storage solutions is easily explainable by the fact that the problem itself isn't solvable in any easy way, apart from adopting eventual consistency, which is probably going to create more problems than it will solve in case of any pre-existing code that makes assumptions about what ways it'll be able to access data and operate on it: https://en.wikipedia.org/wiki/Fallacies_of_distributed_compu...

    To that end, I'd perhaps like to suggest an alternative: use a single vertically scalable RDBMS instance when possible, with a hot standby if you have the resources for that. Let the architecture around it be horizontally scalable instead, and let it deal with the complexities of balancing the load and dealing with backpressure. Introduce a message queue if you must, maybe even an in-memory one for simplicity's sake, or consider an event-based architecture where "what needs to be done" is encapsulated within a data structure that can be passed around and applied whenever possible. In my eyes, such solutions can in many cases be better than losing the many benefits of having a single source of truth.

    Alternatively, consider sharding, or do some domain-driven design: figure out where to draw boundaries and split your service into multiple ones that together cover the domain you need to work with. Then you have one DB for sales, one for account management, one for reports and so on, all separated by something as simple as REST interfaces with rate limits or similar mechanisms.

    If, however, neither of those two groups of approaches seems suitable for the loads that you're dealing with, then you probably have a team of very smart people and a large amount of resources to figure out what will work best.

    To sum up, if there are no good solutions in the space, perhaps that's because the problems themselves haven't been solved yet. Thus, sooner or later, they'll need to be sidestepped and their impact mitigated in whatever capacity is possible.
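
    The single-writer idea suggested in the comment above can be sketched in a few lines. This is only an illustration under stated assumptions, not something from the original post: it uses Python's standard-library queue and sqlite3 modules in place of a real message queue and RDBMS, and the Command class, the orders table, and the function names are all hypothetical. Many producers push "what needs to be done" as data onto a bounded in-memory queue, and one writer thread applies the commands to the single database.

```python
import queue
import sqlite3
import threading
from dataclasses import dataclass


@dataclass(frozen=True)
class Command:
    """'What needs to be done', captured as plain data (hypothetical shape)."""
    sql: str
    params: tuple


def writer_loop(commands, result):
    # The only thread that touches the database: a single source of truth.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (item TEXT, qty INTEGER)")
    while True:
        cmd = commands.get()
        if cmd is None:  # sentinel: no more work
            break
        conn.execute(cmd.sql, cmd.params)
        conn.commit()
    result.append(conn.execute("SELECT COUNT(*), SUM(qty) FROM orders").fetchone())
    conn.close()


def produce(commands, item, n):
    # Producers never talk to the database; they only enqueue commands.
    for qty in range(1, n + 1):
        commands.put(Command("INSERT INTO orders (item, qty) VALUES (?, ?)", (item, qty)))


def run_demo():
    # A bounded queue makes producers block when the writer falls behind,
    # which is a crude form of backpressure.
    commands = queue.Queue(maxsize=100)
    result = []
    writer = threading.Thread(target=writer_loop, args=(commands, result))
    writer.start()
    producers = [
        threading.Thread(target=produce, args=(commands, f"item-{i}", 5))
        for i in range(3)
    ]
    for p in producers:
        p.start()
    for p in producers:
        p.join()
    commands.put(None)
    writer.join()
    return result[0]
```

    Calling run_demo() starts three producers that enqueue five inserts each and returns the row count and quantity total as seen by the writer. The horizontally scalable part is the set of producers; the database itself stays a single instance.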

  • GitHub repo Sequelize

    An easy-to-use and promise-based multi SQL dialects ORM tool for Node.js

    Project mention: What is the consensus about using ORM in node js applications? | reddit.com/r/node | 2021-11-28

  • GitHub repo dbeaver

    Free universal database tool and SQL client

    Project mention: How to Change Another User's Password? | reddit.com/r/dbeaver | 2021-10-06

  • GitHub repo cockroach

    CockroachDB - the open source, cloud-native distributed SQL database.

    Project mention: Composing generic data structures in go | dev.to | 2021-11-30

    Recently a colleague, Nathan, reflecting on CockroachDB, remarked (paraphrased from memory) that the key data structure is the interval btree. The story of Nathan’s addition of the first interval btree to cockroach and the power of copy-on-write data structures is worthy of its own blog post for another day. It’s Nathan’s hand-specialization of that data structure that provided the basis (and tests) for the generalization I’ll be presenting here. That specialization was motivated in large part by performance: it avoids the excessive allocations, pointer chasing, and type-assertion costs that come with interface boxing.

  • GitHub repo ClickHouse

    ClickHouse® is a free analytics DBMS for big data

    Project mention: Stream Processing Database | reddit.com/r/Database | 2021-11-28

    There's ksqlDB (open source, built with Java) and Materialize (there's a standalone edition); both need Kafka/Redpanda. There's also ClickHouse (open source, using a materialized view with a specific engine, but you need to buffer the inserts using a proxy like KittenHouse or a buffering library like ch-timed-buffer). Is there any other alternative to those three (that similarly doesn't do a full scan to do aggregation)?
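
    The insert buffering mentioned for ClickHouse can be illustrated with a small sketch. It is not how KittenHouse or ch-timed-buffer are implemented; the TimedBuffer class, its parameters, and the flush callback are hypothetical names chosen for this example. The idea is to collect rows and hand them to the database as one batch once either a row limit or a maximum age is reached.

```python
import time


class TimedBuffer:
    """Collect rows and flush them in batches by size or age (illustrative only)."""

    def __init__(self, flush, max_rows=1000, max_age=1.0, clock=time.monotonic):
        self._flush = flush          # callback that receives a list of rows
        self._max_rows = max_rows    # flush when this many rows are buffered
        self._max_age = max_age      # flush when the oldest row is this old (seconds)
        self._clock = clock          # injectable so the behaviour can be tested
        self._rows = []
        self._first_at = None

    def add(self, row):
        if not self._rows:
            self._first_at = self._clock()
        self._rows.append(row)
        too_many = len(self._rows) >= self._max_rows
        too_old = self._clock() - self._first_at >= self._max_age
        if too_many or too_old:
            self.flush()

    def flush(self):
        # Swap the buffer out before calling the callback so a failing
        # callback cannot cause the same rows to be sent twice.
        if self._rows:
            batch, self._rows = self._rows, []
            self._flush(batch)
```

    Age is only checked when a row is added, so a production version would also need a background timer to flush an idle buffer. In use, the flush callback would perform one bulk INSERT per batch instead of one INSERT per row.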

  • GitHub repo devops-exercises

    Linux, Jenkins, AWS, SRE, Prometheus, Docker, Python, Ansible, Git, Kubernetes, Terraform, OpenStack, SQL, NoSQL, Azure, GCP, DNS, Elastic, Network, Virtualization. DevOps Interview Questions

    Project mention: What language should I learn after Java and Python and how should I use my knowledge to learn about their applications in the real world? | reddit.com/r/computerscience | 2021-11-21

    What about the applications in the real world? GitHub is the answer. Some weeks ago I found a repository with a list of ideas for building your own technologies: link. If you are lost, try to guide yourself through roadmaps. Or just search for exercises on GitHub about your profession. Example: DevOps Exercises.

  • GitHub repo OSQuery

    SQL powered operating system instrumentation, monitoring, and analytics.

    Project mention: Open Source Tanium Alternative (Cannot Remember It's Name) | reddit.com/r/sysadmin | 2021-11-22

    You might be thinking of osquery?

  • GitHub repo MyBatis

    MyBatis SQL mapper framework for Java

    Project mention: 20 years of Hibernate | reddit.com/r/java | 2021-05-24

    How about batch insert, updates, and deletes? I had to fix a broken MyBatis project recently and was surprised that this feature doesn't even seem to be implemented, at least according to this GitHub issue.

  • GitHub repo Knex

    A query builder for PostgreSQL, MySQL, CockroachDB, SQL Server, SQLite3 and Oracle, designed to be flexible, portable, and fun to use.

    Project mention: What database should i use with node. | reddit.com/r/node | 2021-11-26

    Knex.js works better than node-postgres at least in the cases where you need to dynamically build a SQL query based on (URL) arguments. For example:
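
    A minimal sketch of the general idea, building a query dynamically from optional (URL-style) arguments, is given here in Python with the standard-library sqlite3 module rather than with Knex itself. The users table, the ALLOWED_FILTERS whitelist, and the build_query helper are hypothetical and chosen only for illustration.

```python
import sqlite3

# Hypothetical whitelist of filterable columns. Column names are never taken
# from user input directly; only values are, and those are bound as parameters.
ALLOWED_FILTERS = ("name", "city", "status")


def build_query(args):
    """Build a parameterized SELECT from whichever filters are present."""
    clauses = []
    params = []
    for column in ALLOWED_FILTERS:
        value = args.get(column)
        if value is not None:
            clauses.append(f"{column} = ?")
            params.append(value)
    sql = "SELECT id, name, city, status FROM users"
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    sql += " ORDER BY id"
    return sql, params


def demo():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, city TEXT, status TEXT)"
    )
    conn.executemany(
        "INSERT INTO users (name, city, status) VALUES (?, ?, ?)",
        [("ann", "Oslo", "active"), ("bob", "Oslo", "inactive"), ("cy", "Rome", "active")],
    )
    # Unknown keys such as "ignored" are dropped by the whitelist.
    sql, params = build_query({"city": "Oslo", "status": "active", "ignored": "x"})
    rows = conn.execute(sql, params).fetchall()
    conn.close()
    return sql, params, rows
```

    A query builder such as Knex takes over the string assembly done by hand in build_query, which is the convenience the comment refers to.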

  • GitHub repo shardingsphere

    Build criterion and ecosystem above multi-model databases

    Project mention: Updates and FAQ — Your 1 Minute Quick Start Guide to ShardingSphere | dev.to | 2021-11-30

    shardingsphere-example is an independent Maven project. It’s kept in the “examples” directory of Apache ShardingSphere. Link: https://github.com/apache/shardingsphere/tree/master/examples

  • GitHub repo Dapper

    Dapper - a simple object mapper for .Net

    Project mention: Dapper & CQRS | dev.to | 2021-08-10

    The "legacy stack" as it came to be known was written in .Net 4.5.3 using Entity Framework and the classic repository / unit of work pattern. As we worked with Entity Framework Core in .Net Core we found that we were not improving our query speeds. They were still slow. Usually in the area of 250ms on up based on the query. Entity Framework wasn't going to cut it for this "new stack" code. We decided to try Dapper. Along with Dapper we decided to adopt a different pattern with how our back end code would be structured and data would be delivered. After reading quite a bit about CQRS (Command Query Responsibility Segregation) and finding some great examples online we settled on this pattern. This article in particular was very useful. Though we didn't follow it exactly, we stole quite a few ideas from it.

  • GitHub repo Presto

    The official home of the Presto distributed SQL query engine for big data

    Project mention: Let's write a compiler, part 5: A code generator | news.ycombinator.com | 2021-08-19

  • GitHub repo TimescaleDB

    An open-source time-series SQL database optimized for fast ingest and complex queries. Packaged as a PostgreSQL extension.

    Project mention: Entity Relationship Diagram for an Equities Database (finance/stocks) | reddit.com/r/Database | 2021-11-10

    you probably want a time series database like https://www.timescale.com/

  • GitHub repo go-sql-driver/mysql

    Go MySQL Driver is a MySQL driver for Go's (golang) database/sql package (by go-sql-driver)

    Project mention: Help a Go Lang neophyte become a veteran | reddit.com/r/golang | 2021-12-01

    For DB connection, you'll need something like https://github.com/Go-SQL-Driver/MySQL/ for mysql

  • GitHub repo dolt

    Dolt – It's Git for Data

    Project mention: Dolt Is Git for Data | news.ycombinator.com | 2021-09-27

  • GitHub repo sql.js

    A javascript library to run SQLite on the web.

    Project mention: DuckDB-WASM: Efficient Analytical SQL in the Browser | news.ycombinator.com | 2021-10-29

  • GitHub repo rqlite

    The lightweight, distributed relational database built on SQLite

    Project mention: Cloudflare Durable Objects Are Now Generally Available | news.ycombinator.com | 2021-11-15

  • GitHub repo q

    q - Run SQL directly on CSV or TSV files (by harelba)

    Project mention: Compile Python applications into stand-alone executables | news.ycombinator.com | 2021-12-03

    How did you go about developing against the Starlark API, any IDE support?

    [0] https://github.com/harelba/q/blob/master/pyoxidizer.bzl

  • GitHub repo Bitwarden

    The core infrastructure backend (API, database, Docker, etc). (by bitwarden)

    Project mention: Account hacked [java] | reddit.com/r/MinecraftHelp | 2021-12-03

    That's really bad security. Every password should be unique. Then, if someone gets one password, they don't have access to everything. If you have trouble remembering passwords, use a password manager like Bitwarden.

  • GitHub repo beekeeper-studio

    Modern and easy to use SQL client for MySQL, Postgres, SQLite, SQL Server, and more. Linux, MacOS, and Windows.

    Project mention: Any developer tools/subscriptions/packages that are worth buying paid/pro plan? | reddit.com/r/webdev | 2021-11-25

    In case anyone is looking for an alternative, Beekeeper Studio is decent (and free). It's not flawless, and it doesn't have as many features yet, but it's faster and less buggy on Windows at least.

  • GitHub repo diesel

    A safe, extensible ORM and Query Builder for Rust

    Project mention: Database | dev.to | 2021-11-26

    A web application won't be complete without one. Using a nifty tool called diesel, we'll be able to create database tables and schemas and make queries in Rust. But first we need a database installed; I'll be using PostgreSQL. Also make sure you install just the postgresql feature of diesel.

  • GitHub repo migrate

    Database migrations. CLI and Golang library.

    Project mention: Golang for backend | reddit.com/r/golang | 2021-12-01

    Migrations: migrate

NOTE: The open source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020). The latest post mention was on 2021-12-03.

Index

What are some of the best open-source SQL projects? This list will help you:

 #  Project               Stars
 1  Apache Spark          31,485
 2  tidb                  29,718
 3  Sequelize             25,332
 4  dbeaver               23,407
 5  cockroach             22,567
 6  ClickHouse            20,779
 7  devops-exercises      19,638
 8  OSQuery               18,424
 9  MyBatis               16,556
10  Knex                  14,990
11  shardingsphere        14,919
12  Dapper                14,268
13  Presto                12,902
14  TimescaleDB           12,140
15  go-sql-driver/mysql   11,690
16  dolt                   9,800
17  sql.js                 9,547
18  rqlite                 9,085
19  q                      8,765
20  Bitwarden              8,716
21  beekeeper-studio       7,860
22  diesel                 7,632
23  migrate                7,585