SQLite Internals: How the Most Used Database Works

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com

  • sqlite-parser

    An ANTLR4 grammar for SQLite statements. (by bkiers)

    > ...than it would be to learn the exact syntax and quirks and possibly bugs of someone else's implementation...

    Yup. Also, having deep knowledge of the language is required.

    SQLite's grammar is neat. Creating a compatible parser would make a fun project. Here's a pretty good example: https://github.com/bkiers/sqlite-parser (Actual ANTLR 4 grammar: https://github.com/bkiers/sqlite-parser/blob/master/src/main... )

    > Postgres, which tries to be compliant with the latest standards, however...

    SQL-2016 is a beast. Not to mention all the dialects.

    I'm updating my personal (soon to be FOSS) grammar from ANTLR 3 LL(k) to ANTLR 4 ALL(*).

    I've long had a working knowledge of SQL-92, with some SQL-1999 (e.g., common table expressions).

    But the new structures and extensions are a bit overwhelming.

    Fortunately, the ANTLR project has about a dozen FOSS SQL grammars to learn from: https://github.com/antlr/grammars-v4/tree/master/sql

    They are mostly mechanical translations of BNFs to LL(k), with some ALL(*), meaning few take advantage of ANTLR 4's support for left recursion: https://github.com/antlr/antlr4/blob/master/doc/left-recursi...
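To illustrate the left-recursion point: ANTLR 4 accepts directly left-recursive rules such as `expr : expr '*' expr | expr '+' expr | INT ;` and, per its documentation, rewrites them into a precedence-driven loop. The Python sketch below hand-writes that kind of loop (precedence climbing) for a toy expression grammar. It is not ANTLR output and not taken from any of the grammars mentioned here; the token set, the `BINARY_PRECEDENCE` table, and all function names are hypothetical.

```python
import re

# Hypothetical operator table for a toy grammar: a higher number binds tighter.
BINARY_PRECEDENCE = {"OR": 1, "AND": 2, "=": 3, "+": 4, "-": 4, "*": 5, "/": 5}

# Integers, identifiers/keywords, and single-character operators.
TOKEN_RE = re.compile(r"\s*(\d+|[A-Za-z_]\w*|[=+\-*/()])")


def tokenize(text):
    """Split text into a flat list of token strings."""
    tokens, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character at offset {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def parse_primary(tokens):
    """Parse a literal, an identifier, or a parenthesized expression."""
    if not tokens:
        raise SyntaxError("unexpected end of input")
    token = tokens.pop(0)
    if token == "(":
        inner = parse_expression(tokens)
        if not tokens or tokens.pop(0) != ")":
            raise SyntaxError("expected ')'")
        return inner
    if token == ")" or token.upper() in BINARY_PRECEDENCE:
        raise SyntaxError(f"unexpected token {token!r}")
    return token


def parse_expression(tokens, min_precedence=1):
    """Consume tokens from the front and return a nested-tuple AST."""
    left = parse_primary(tokens)
    # A loop replaces the left-recursive call `expr : expr op expr`.
    while tokens and tokens[0].upper() in BINARY_PRECEDENCE:
        op = tokens[0].upper()
        precedence = BINARY_PRECEDENCE[op]
        if precedence < min_precedence:
            break
        tokens.pop(0)
        # Left-associative: the right operand may only hold tighter operators.
        right = parse_expression(tokens, precedence + 1)
        left = (op, left, right)
    return left


def parse(text):
    """Parse a complete expression, rejecting trailing tokens."""
    tokens = tokenize(text)
    tree = parse_expression(tokens)
    if tokens:
        raise SyntaxError(f"unexpected token {tokens[0]!r}")
    return tree
```

One table of operators stands in for the chain of `orExpr`, `andExpr`, `addExpr`, ... rules that a mechanical BNF-to-LL(k) translation tends to produce, which is roughly the simplification a left-recursive ANTLR 4 rule offers.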

    Honestly, I struggled to understand these grammars. Plus, not being conversant with SQL-2016 was a huge impediment. Just finding a succinct corpus of test cases was a huge hurdle for me.

    Fortunately, the H2 Database project is a great resource. https://github.com/h2database/h2database/tree/master/h2/src/...

    Now for the exciting conclusion...

    My ANTLR grammar, which passes all of H2's tests, looks nothing like any of the official or product-specific BNFs.

    Further, I found discrepancies between the product-specific BNFs and their implementations.

    So a lot of trial and error is required for a "real world" parser, which would explain why the professional SQL parsing tools charge money.

    I still think creating a parser for SQLite is a great project.
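For anyone trying the SQLite parser project, one way to structure the trial and error described above is to treat the real implementation as the reference instead of a published BNF. The Python sketch below is one possible approach, not the commenter's method: it asks the SQLite library bundled with Python's `sqlite3` module whether a single statement compiles, then lists statements where a candidate parser disagrees. The names `sqlite_accepts`, `find_disagreements`, and `PARSE_ERROR_MARKERS` are hypothetical, and classifying errors by message text is an assumption, since those messages are not a stable API.

```python
import sqlite3

# Assumption: these substrings match how the bundled SQLite words its parse
# errors. They are human-readable message text, not a documented contract.
PARSE_ERROR_MARKERS = ("syntax error", "incomplete input", "unrecognized token")


def sqlite_accepts(statement):
    """Return True if SQLite's own parser appears to accept one statement."""
    connection = sqlite3.connect(":memory:")
    try:
        # EXPLAIN compiles the statement and returns its bytecode listing
        # instead of running it.
        connection.execute("EXPLAIN " + statement)
    except sqlite3.OperationalError as error:
        message = str(error)
        # Errors such as "no such table" mean the text parsed but failed
        # later, so only parse-style messages count as a rejection.
        return not any(marker in message for marker in PARSE_ERROR_MARKERS)
    finally:
        connection.close()
    return True


def find_disagreements(candidate_accepts, corpus):
    """List statements where a candidate parser and SQLite disagree."""
    return [sql for sql in corpus if candidate_accepts(sql) != sqlite_accepts(sql)]


if __name__ == "__main__":
    # A deliberately naive stand-in for a real candidate parser.
    def naive_parser(sql):
        return sql.upper().startswith("SELECT")

    corpus = ["SELECT 1", "SELECT 1 +", "CREATE TABLE t (a)"]
    for statement in find_disagreements(naive_parser, corpus):
        print("disagreement:", statement)
```

The classification is approximate: a semantic error can be reported before a later syntax error is reached, and the sketch handles only one statement per call.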

  • grammars-v4

    Grammars written for ANTLR v4; expectation that the grammars are free of actions.

  • ANTLR

    ANTLR (ANother Tool for Language Recognition) is a powerful parser generator for reading, processing, executing, or translating structured text or binary files.

  • H2

    H2 is an embeddable RDBMS written in Java.

NOTE: The number of mentions on this list indicates mentions on common posts plus user suggested alternatives. Hence, a higher number means a more popular project.
