gh-ost

GitHub's Online Schema Migrations for MySQL (by github)

Gh-ost Alternatives

Similar projects and alternatives to gh-ost

NOTE: The number of mentions on this list indicates mentions in common posts plus user-suggested alternatives. Hence, a higher number generally means a more popular gh-ost alternative or higher similarity.


Reviews and mentions

Posts with mentions or reviews of gh-ost. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2022-01-07.
  • In MySQL, use utf8mb4 when you mean to work with utf8 in your programming language.
    1 project | reddit.com/r/programming | 13 Jan 2022
    We use github’s ghost tool for the same reason. Relatively happy with it, but would appreciate your thoughts if you’ve used both.
  • Zero-downtime schema migrations in Postgres using Reshape
    4 projects | news.ycombinator.com | 7 Jan 2022
  • Update MySQL production database schema
    1 project | reddit.com/r/mysql | 19 Dec 2021
    gh-ost is one of the better ones.
  • Leaving MySQL
    7 projects | news.ycombinator.com | 5 Dec 2021
    Online schema change solutions have been around for over a decade and are commonly used to ALTER TABLE with no downtime (or with minimal interruption) on the largest deployments of MySQL today.

    The two most common solutions are pt-online-schema-change and gh-ost, and if you are running MySQL today and still running direct ALTER TABLE statements that cause outages, then you're in for a pleasant change.

    On top of that, most MySQL ALTER TABLE operations on InnoDB tables support non-blocking, lockless operation as well. My main concern with these is that they're still replicated sequentially, leading to replication lag.

    MySQL is also slowly adding "Instant DDL", currently still limited to just a few types of changes.
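
    As a rough illustration of those two built-in paths, requesting an in-place or instant ALTER from a client looks roughly like this. A minimal Go sketch; the driver choice, DSN, and table/column names are assumptions, not taken from this page:

    ```go
    package main

    import (
    	"database/sql"
    	"log"

    	_ "github.com/go-sql-driver/mysql" // assumed MySQL driver; any driver works
    )

    func main() {
    	// Placeholder DSN.
    	db, err := sql.Open("mysql", "user:pass@tcp(127.0.0.1:3306)/app")
    	if err != nil {
    		log.Fatal(err)
    	}
    	defer db.Close()

    	// Online (in-place) DDL: build an index without blocking reads or writes.
    	// MySQL returns an error instead of silently locking if it cannot honor LOCK=NONE.
    	if _, err := db.Exec("ALTER TABLE users ADD INDEX idx_email (email), ALGORITHM=INPLACE, LOCK=NONE"); err != nil {
    		log.Fatal(err)
    	}

    	// Instant DDL (MySQL 8.0+): metadata-only change, limited to a few operation
    	// types such as adding a column.
    	if _, err := db.Exec("ALTER TABLE users ADD COLUMN last_seen DATETIME NULL, ALGORITHM=INSTANT"); err != nil {
    		log.Fatal(err)
    	}
    }
    ```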

    Disclosure: I authored gh-ost (at GitHub) and oak-online-alter-table (the original schema change tool), and am a maintainer for Vitess, where I work on online schema changes.

    Links:

    - https://www.percona.com/doc/percona-toolkit/3.0/pt-online-sc...

    - https://github.com/github/gh-ost

    - Past HN discussion: https://news.ycombinator.com/item?id=16982986

    - https://dev.mysql.com/doc/refman/8.0/en/innodb-online-ddl-op...

    - https://vitess.io/docs/user-guides/schema-changes/

  • GitHub downtime root cause analysis
    4 projects | reddit.com/r/programming | 3 Dec 2021
    No, it's the actual run; see https://github.com/github/gh-ost
    4 projects | reddit.com/r/programming | 3 Dec 2021
    No you didn't. They're doing what is often referred to as an "online schema change" using https://github.com/github/gh-ost (but the concept is the same as Percona's pt-online-schema-change, or https://github.com/soundcloud/lhm).
  • PlanetScale Is Now GA
    4 projects | news.ycombinator.com | 16 Nov 2021
    - gh-ost

    I authored the original schema change tool, oak-online-alter-table https://shlomi-noach.github.io/openarkkit/oak-online-alter-t..., which is no longer supported, but thankfully I did invest some time in documenting how it works. Similarly, I co-designed and was the main author of gh-ost, https://github.com/github/gh-ost, as part of the database infrastructure team at GitHub. We developed gh-ost because the existing schema change tools could not cope with our particular workloads. Read this engineering blog: https://github.blog/2016-08-01-gh-ost-github-s-online-migrat... to get a better sense of what gh-ost is and how it works. In particular, I suggest reading these:

    - https://github.com/github/gh-ost/blob/master/doc/cheatsheet....

    - https://github.com/github/gh-ost/blob/master/doc/cut-over.md

    - https://github.com/github/gh-ost/blob/master/doc/subsecond-l...

    - https://github.com/github/gh-ost/blob/master/doc/throttle.md

    - https://github.com/github/gh-ost/blob/master/doc/why-trigger...

    At PlanetScale I also integrated VReplication into the Online DDL flow. This comment is far too short to explain how VReplication works, but thankfully we again have some docs:

    - https://vitess.io/docs/user-guides/schema-changes/ddl-strate... (and really see entire page, there's comparison between the different tools)

    - https://vitess.io/docs/design-docs/vreplication/

    - or see this self tracking issue: https://github.com/vitessio/vitess/issues/8056#issue-8771509...

    Not to leave you with only a bunch of reading material, I'll answer some questions here:

    > Can you elaborate? How? Do they run on another servers? Or are they waiting on a queue change waiting to be applied? If they run on different servers, what they run there, since AFAIK the migration is only DDL, there's no data?

    The way all schema change tools mentioned above work is by creating a shadow (aka ghost) table on the same primary server where your original table is located. By carefully copying data from the original table while tracking ongoing changes to it (whether by utilizing triggers or by tailing the binary logs), and by using different techniques to mitigate conflicts between the two, the tools populate the shadow table with up-to-date data from your original table.
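
    A minimal Go sketch of that flow, with hypothetical table and column names (illustration only, not gh-ost's actual code):

    ```go
    package onlineddl

    import "database/sql"

    // outlineOnlineAlter sketches the trigger-less ghost-table flow described above.
    // Table and column names are hypothetical; this is not gh-ost's real implementation.
    func outlineOnlineAlter(db *sql.DB) error {
    	setup := []string{
    		// 1. Create an empty "ghost" copy of the table and apply the wanted change to it.
    		"CREATE TABLE _users_gho LIKE users",
    		"ALTER TABLE _users_gho ADD COLUMN last_seen DATETIME NULL",
    	}
    	for _, stmt := range setup {
    		if _, err := db.Exec(stmt); err != nil {
    			return err
    		}
    	}
    	// 2. Copy existing rows in small chunks (see the row-copy sketch further below),
    	//    while a binlog tailer applies ongoing INSERT/UPDATE/DELETE events from
    	//    `users` onto `_users_gho`, so the ghost table converges on up-to-date data.
    	// 3. Cut over: atomically swap `_users_gho` into place (see the cut-over sketch below).
    	return nil
    }
    ```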

    This can take a long time, and requires an extra amount of space to accommodate the shadow table (both time and space are also required by "natural" ALTER TABLE implementations in DBs I'm aware of).

    With non-trigger solutions, such as gh-ost and VReplication, the tooling has almost complete control over the pace. Given load on the primary server or increasing replication lag, it can choose to throttle or completely halt execution, and resume later once load has subsided. We have used this technique at GitHub to run the largest migrations on our busiest tables at any time of the week, including at peak traffic, and this has been shown to pose little to no impact on production. Again, these techniques are universally used today by almost all large scale MySQL players, including Facebook, Shopify, Slack, etc.
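
    A caricature of that pace control in Go; the callbacks are made-up stand-ins, not gh-ost's API:

    ```go
    package onlineddl

    import "time"

    const maxLag = time.Second // throttle once replicas fall more than ~1s behind

    // copyWithThrottling interleaves chunked row copies with lag checks: while the
    // replicas lag, it pauses instead of pushing more writes, and resumes once the
    // lag subsides.
    func copyWithThrottling(replicaLag func() (time.Duration, error), copyNextChunk func() (done bool, err error)) error {
    	for {
    		lag, err := replicaLag()
    		if err != nil {
    			return err
    		}
    		if lag > maxLag {
    			time.Sleep(250 * time.Millisecond) // throttled: wait for load to subside
    			continue
    		}
    		done, err := copyNextChunk()
    		if err != nil || done {
    			return err
    		}
    	}
    }
    ```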

    > who will throttle, the migration? But what is the migration? Let's use my example: a column type change requires a table rewrite. So the table rewrite will throttle, i.e. slow down? But where is this table rewrite running, on the main server (apparently not) or on a shadow server (apparently either since migrations have no data)? Actually you mention "when your production traffic gets too high". What is "high", can you quantify?

    The tool (or Vitess if you will, or PlanetScale in our discussion) will throttle based on continuously collected metrics. The single most important metric is replication lag, and we found that it predicts load better than any other metric, by far. We throttle at 1sec replication lag. A secondary metric is the number of concurrently executing threads on the primary; this is more important for pt-online-schema-change, but for gh-ost and VReplication, given their single-threaded writes, we found that the metric is not very important to throttle on. It is also trickier, since the threshold to throttle at depends on your time of day, expected workload, etc.
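
    One way to picture a sub-second lag measurement is a heartbeat row written on the primary and read back on a replica. A simplified Go sketch; the _heartbeat table is illustrative, and gh-ost's own changelog/heartbeat mechanism differs in detail:

    ```go
    package onlineddl

    import (
    	"database/sql"
    	"time"
    )

    // heartbeatLag approximates replication lag: write a timestamp on the primary,
    // read whatever has replicated so far on a replica, and compare.
    func heartbeatLag(primary, replica *sql.DB) (time.Duration, error) {
    	const layout = "2006-01-02 15:04:05.000000"
    	if _, err := primary.Exec(
    		"REPLACE INTO _heartbeat (id, ts) VALUES (1, ?)",
    		time.Now().UTC().Format(layout),
    	); err != nil {
    		return 0, err
    	}
    	var raw string
    	if err := replica.QueryRow("SELECT ts FROM _heartbeat WHERE id = 1").Scan(&raw); err != nil {
    		return 0, err
    	}
    	lastSeen, err := time.Parse(layout, raw)
    	if err != nil {
    		return 0, err
    	}
    	return time.Now().UTC().Sub(lastSeen), nil
    }
    ```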

    > We run customers that do dozens to thousands of transactions per second. Is this high enough?

    The tooling is known to work well with these transaction rates. VReplication and gh-ost will add one more transaction at a time (well, two really, but the 2nd one is book-keeping and of such low volume that we can neglect it); the transactions are intentionally kept small so as to not overload the transaction log or the MVCC mechanism; the rule of thumb is to only copy 100 rows at a time, so expect possibly millions of such small sequential transactions on a billion-row table.
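
    The "100 rows at a time" copy could look something like the Go sketch below. Table, column and key names are hypothetical; real tooling generates these statements from the table definition:

    ```go
    package onlineddl

    import "database/sql"

    const chunkSize = 100 // small chunks keep each copy transaction (and its undo/redo footprint) tiny

    // copyChunk copies the next batch of rows from the original table into the ghost
    // table and reports the last primary-key value copied so the caller can resume.
    // INSERT IGNORE lets rows already written by the binlog tailer take precedence.
    func copyChunk(db *sql.DB, lastID int64) (newLastID int64, done bool, err error) {
    	var maxID sql.NullInt64
    	err = db.QueryRow(
    		"SELECT MAX(id) FROM (SELECT id FROM users WHERE id > ? ORDER BY id LIMIT ?) AS chunk",
    		lastID, chunkSize,
    	).Scan(&maxID)
    	if err != nil {
    		return 0, false, err
    	}
    	if !maxID.Valid {
    		return lastID, true, nil // nothing left to copy
    	}
    	_, err = db.Exec(
    		"INSERT IGNORE INTO _users_gho (id, email) SELECT id, email FROM users WHERE id > ? AND id <= ?",
    		lastID, maxID.Int64,
    	)
    	return maxID.Int64, false, err
    }
    ```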

    > Will their migrations ever run, or will wait for very long periods of time, maybe forever?

    Sometimes, if the load is very high, migrations will throttle more. At other times, they will push as fast as they can while still keeping below the replication lag threshold. In my experience a gh-ost or VReplication migration is normally good to run even at the busiest times. If a database system is such that it _always_ has substantial replication lag, such that a migration cannot complete in a timely manner, then I'd say the database system is beyond its own capacity anyway, and should be optimized/sharded/whatever.

    > How is this possible? Where the migration is running, then? A shadow table, shadow server... none?

    So I already mentioned the ghost table. And then, SELECTs are non blocking on the original table.

    > What's cut-over?

    Cut-over is what we call the final step of the migration: flipping the tables. Specifically, moving away your original table and renaming the ghost table in its place. This requires a metadata lock, and is the single most critical part of the schema migration, for any tooling involved. This is where something has to give. Tooling such as gh-ost and pt-online-schema-change acquires a metadata lock such that queries are blocked momentarily, until cut-over is complete. With very high load the app will feel it. With extremely high load the database may not be able to (or may not be configured to) accommodate so many blocked queries, and the app will see rejections. For low-volume loads, apps may not even notice.
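
    In its simplest form the swap is a single atomic rename, as in the Go sketch below (simplified on purpose; gh-ost's actual cut-over, described in doc/cut-over.md linked above, adds extra locking to guarantee no writes slip through during the swap):

    ```go
    package onlineddl

    import "database/sql"

    // cutOver atomically swaps the ghost table into place. RENAME TABLE takes a
    // metadata lock, so queries against `users` block briefly until it completes.
    // Table names are hypothetical.
    func cutOver(db *sql.DB) error {
    	_, err := db.Exec("RENAME TABLE users TO _users_del, _users_gho TO users")
    	return err
    }
    ```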

    I hope this helps. Obviously this comment cannot accommodate so much more, but hopefully the documentation links I provided are of help.

  • The real reason behind why I switched
    23 projects | reddit.com/r/ProgrammerHumor | 27 Oct 2021
  • Correct approach for simple like system at scale?
    2 projects | reddit.com/r/PostgreSQL | 21 Aug 2021
    Source: I've had to do the same thing on a table with "only" a billion rows. It took weeks and weeks of planning (we had a document with over 200 steps, including all the people we had to notify and all the possible problems), and was under the gun to be done by the holidays. We ran the actual migration in 'screen' because the command took over 2 weeks to run. During that time, the load on our databases got dangerously high several times, and had to be paused for a few hours. (We were using pt-OSC at the time, but later switched to gh-ost which was much safer. We also had smaller migrations that hit a snag and had to be aborted and restarted.)
  • Altering table without downtime, MYSQL
    1 project | reddit.com/r/SQL | 11 May 2021
    If you want finer control over how the alter works, there are a few 3rd party tools that will help you do it (gh-ost and pt-online-schema-change).
  • GitHub has Degraded Availability
    3 projects | news.ycombinator.com | 12 Mar 2021
    GitHub uses MySQL, not Postgres. They built the best-in-class online schema change tool gh-ost [1], and have a custom declarative schema change execution system built around Skeema [2], which contains a wealth of linters [3].

    Even so, it's always possible for an engineer to submit a schema change which is detrimental to performance. For example, dropping an important index, or changing it such that some necessary column is no longer present. Linters simply cannot catch some classes of these problems. Usually they must be caught in code review, but people make mistakes and could approve a bad change.

    Disclosure: I'm the author of Skeema, but have not worked for or with GitHub in any capacity.

    [1] https://github.com/github/gh-ost

    [2] https://github.blog/2020-02-14-automating-mysql-schema-migra...

    [3] https://www.skeema.io/docs/options/#lint

  • Vimeo Engineering: The great pretender – faster application tests with MySQL simulation
    3 projects | reddit.com/r/PHP | 1 Feb 2021
    As to why no triggers/foreign keys at scale, this is a good comment from GitHub about it. If you have a product that is allowed to have downtime, then that scenario probably doesn't matter to you.
  • Improving how we deploy GitHub
    1 project | news.ycombinator.com | 25 Jan 2021
    Re db migrations: they've built their own DB management tooling (https://github.com/openark/orchestrator) and online migration tooling (https://github.com/github/gh-ost)

Stats

Basic gh-ost repo stats
Mentions: 15
Stars: 9,558
Activity: 6.4
Last Commit: 10 days ago

github/gh-ost is an open source project licensed under the MIT License, which is an OSI approved license.
