SQLite Internals: How the Most Used Database Works

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com

  • sqlite-parser

    An ANTLR4 grammar for SQLite statements. (by bkiers)

  • > ...than it would be to learn the exact syntax and quirks and possibly bugs of someone else's implementation...

    Yup. Also, having deep knowledge of the language is required.

    SQLite's grammar is neat. Creating a compatible parser would make a fun project. Here's a pretty good example: https://github.com/bkiers/sqlite-parser (Actual ANTLR 4 grammar: https://github.com/bkiers/sqlite-parser/blob/master/src/main... )

    Postgres, which tries to be compliant with the latest standards, however...

    SQL-2016 is a beast. Not to mention all the dialects.

    I'm updating my personal (soon to be FOSS) grammar from ANTLR 3 LL(k) to ANTLR 4 ALL(*).

    I've long had a working knowledge of SQL-92, with some SQL-1999 (e.g., common table expressions).

    But the new structures and extensions are a bit overwhelming.

    Fortunately, the ANTLR project has about a dozen FOSS SQL grammars to learn from. https://github.com/antlr/grammars-v4/tree/master/sql

    They mostly translate the BNFs mechanically to LL(k), with some ALL(*), meaning few take advantage of ANTLR 4's support for left-recursion. https://github.com/antlr/antlr4/blob/master/doc/left-recursi...

    Honestly, I struggled to understand these grammars. Plus, not being conversant with SQL-2016 was a huge impediment. Just finding a succinct corpus of test cases was a huge hurdle for me.

    Fortunately, the H2 Database project is a great resource. https://github.com/h2database/h2database/tree/master/h2/src/...

    Now for the exciting conclusion...

    My ANTLR grammar, which passes all of H2's tests, looks nothing like any of the official or product-specific BNFs.

    Further, I found discrepancies between the product-specific BNFs and their implementations.

    So a lot of trial & error is required for a "real world" parser. Which would explain why the professional SQL parsing tools charge money.

    I still think creating a parser for SQLite is a great project.
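
The point above about BNFs disagreeing with implementations suggests checking a hand-written parser against the real engine rather than against the published grammar. Below is a minimal sketch of that idea using Python's standard `sqlite3` module as the oracle. The helper name `sqlite_parses` and the error-message check are assumptions made for illustration; they are not taken from the comment or from any of the projects listed here. The message check in particular is a heuristic that relies on the wording SQLite uses for parse failures.

```python
import sqlite3

# Substrings SQLite uses in its messages for parse failures.
# Matching on message text is a heuristic, not an official API contract.
_PARSE_ERROR_MARKERS = ("syntax error", "incomplete input")


def sqlite_parses(sql: str) -> bool:
    """Return True if SQLite's own parser accepts the statement.

    EXPLAIN is prepended so the statement is compiled but not executed.
    Errors that are not parse failures (for example "no such table")
    still count as "parses", because the grammar accepted the text.
    """
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("EXPLAIN " + sql)
        return True
    except sqlite3.Error as exc:
        message = str(exc)
        return not any(marker in message for marker in _PARSE_ERROR_MARKERS)
    finally:
        conn.close()


if __name__ == "__main__":
    # Hypothetical sample statements; a real corpus would be much larger.
    samples = [
        "SELECT 1",
        "SELEC 1",
        "SELECT * FROM missing_table",
        "WITH t(x) AS (SELECT 1) SELECT x FROM t",
    ]
    for statement in samples:
        print(sqlite_parses(statement), statement)
```

Running every statement of a test corpus through both this oracle and the parser under test, then listing the statements where the two disagree, is one way to do the trial and error the comment describes.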

  • grammars-v4

    Grammars written for ANTLR v4; expectation that the grammars are free of actions.

  • ANTLR

    ANTLR (ANother Tool for Language Recognition) is a powerful parser generator for reading, processing, executing, or translating structured text or binary files.

  • H2

    H2 is an embeddable RDBMS written in Java.

NOTE: The number of mentions on this list indicates mentions on common posts plus user suggested alternatives. Hence, a higher number means a more popular project.
