gh-ost
Large Hadron Migrator
| | gh-ost | Large Hadron Migrator |
|---|---|---|
| Mentions | 32 | 3 |
| Stars | 11,934 | 1,811 |
| Growth | 1.0% | 0.2% |
| Activity | 7.4 | 0.0 |
| Latest commit | 4 days ago | 7 months ago |
| Language | Go | Ruby |
| License | MIT License | BSD 3-clause "New" or "Revised" License |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
gh-ost
-
How Modern SQL Databases Are Changing Web Development - #3 Better Developer Experience
I’ve been through multiple incidents where everything worked fine in the testing environment but ended up locking the production database for minutes when deployed. A category of open-source tools called OSC (Online Schema Change) exists to mitigate this pain, such as gh-ost used by GitHub and OSC used by Meta. They work by creating a set of "ghost tables" with the migrated schema, copying over old data from the original tables, and catching up with new writes simultaneously. Once all old data is migrated, you trigger a cutover that makes the "ghost tables" the production tables. Check the post below for a great introduction and comparison:
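The ghost-table flow described above can be sketched roughly as follows, using an in-memory SQLite database as a stand-in for MySQL. Table and column names here are invented for illustration, and gh-ost itself replays the catch-up step from the binary log rather than re-reading rows:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO users (name) VALUES (?)", [("ann",), ("bob",)])

# 1. Create the ghost table with the new schema (here, an added column).
conn.execute("CREATE TABLE _users_gho (id INTEGER PRIMARY KEY, name TEXT, email TEXT)")

# 2. Backfill old rows in chunks (a single chunk here for brevity).
conn.execute("INSERT INTO _users_gho (id, name) SELECT id, name FROM users")

# 3. Writes that land on the original table during the copy must be
#    replayed onto the ghost table; gh-ost does this from the binlog.
conn.execute("INSERT INTO users (name) VALUES ('eve')")
conn.execute(
    "INSERT INTO _users_gho (id, name) "
    "SELECT id, name FROM users WHERE id NOT IN (SELECT id FROM _users_gho)"
)

# 4. Cutover: swap the tables so the ghost table becomes production.
conn.execute("ALTER TABLE users RENAME TO _users_old")
conn.execute("ALTER TABLE _users_gho RENAME TO users")

rows = conn.execute("SELECT id, name, email FROM users ORDER BY id").fetchall()
print(rows)  # all three rows survive, with the new nullable column
```

The key property is that the original table stays fully readable and writable throughout; only the final rename needs a brief lock.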
-
We migrated to SQL. Our biggest learning? Don't use Prisma
Sounds like it's basically explained in the gh-ost readme https://github.com/github/gh-ost#how
I think it amounts to "use views to decouple access to the table behind a fixed interface" and "use triggers for migrating data between tables"
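The trigger-plus-view idea in that comment can be sketched like this (the style LHM and pt-online-schema-change use), again on SQLite purely for illustration; all names are made up:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, title TEXT)")
conn.execute("INSERT INTO posts (title) VALUES ('hello')")

# New table with the target schema (an added slug column).
conn.execute("CREATE TABLE posts_new (id INTEGER PRIMARY KEY, title TEXT, slug TEXT)")

# A trigger keeps posts_new in sync while the backfill runs.
conn.execute("""
    CREATE TRIGGER posts_ins AFTER INSERT ON posts
    BEGIN
        INSERT INTO posts_new (id, title) VALUES (NEW.id, NEW.title);
    END
""")

# Backfill existing rows; a concurrent write then arrives via the trigger.
conn.execute("INSERT INTO posts_new (id, title) SELECT id, title FROM posts")
conn.execute("INSERT INTO posts (title) VALUES ('world')")

# A view presents a stable interface regardless of which table backs it.
conn.execute("CREATE VIEW posts_api AS SELECT id, title FROM posts_new")
rows = conn.execute("SELECT id, title FROM posts_api ORDER BY id").fetchall()
print(rows)
```

gh-ost's main departure from this pattern is that it avoids triggers entirely, tailing the replication stream instead, which keeps extra load off the master.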
-
Ask HN: Is PostgreSQL better than MySQL?
Gh-ost is the new hotness. Simple to use and lots of great features: https://github.com/github/gh-ost
-
Changing column from longtext to mediumtext taking over 2 hours
As others have said, it depends on the size of the table on disk and the number of rows; it's not unusual for an ALTER in production to take anywhere from seconds to days. I don't know if you're running the ALTER directly, but take a look at https://docs.percona.com/percona-toolkit/pt-online-schema-change.html or https://github.com/github/gh-ost , they usually simplify alters a lot.
Not sure which version of MySQL you're using, but one approach would be to use a tool like pt-online-schema-change (from Percona) or gh-ost, which will create a duplicate table and then swap it in place of the original table. It's a safer approach when operating in production environments. Here's a good comparison of the tools many people use: https://planetscale.com/docs/learn/online-schema-change-tools-comparison
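For reference, a typical gh-ost invocation for the column change discussed above looks roughly like this; the host, credentials, and database/table names are placeholders:

```shell
# Hypothetical host and credentials; gh-ost connects to a replica by
# default and applies the change back through the topology.
gh-ost \
  --host=replica.example.com \
  --user=ghost \
  --password=secret \
  --database=mydb \
  --table=articles \
  --alter="MODIFY body MEDIUMTEXT" \
  --verbose \
  --execute  # omit --execute first to get a dry run
```

Running once without `--execute` is the usual sanity check before letting it touch production.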
-
Changing Tires at 100mph: A Guide to Zero Downtime Migrations
Actually I never tried it, but I was scared off by the small print: GitHub doesn't use RDS themselves [1], and gh-ost relies on lower-level features that might not be easily available in RDS. I also had the impression you have to set up a normal non-RDS replica attached to your RDS master?
MySQL has some robust tooling in this space. Some of the tools use triggers to copy to a new table. GitHub's gh-ost[1] is probably the state of the art, and uses the binary log stream to replicate the data.
-
How Retool upgraded its 4 TB main application PostgreSQL database
https://github.com/github/gh-ost/issues/331#issuecomment-266...) it does become a little bit of a "you do not have google problems" type discussion.
(Perhaps you do have such problems, I don't know where you work! But 99%+ of companies don't have such problems and never will.)
-
We lost 54k GitHub stars
GitHub doesn't use foreign keys[1], and there's likely many tables related to all the users, notifications, permissions, etc... that would need to be cleaned up. Without foreign keys they likely have some system process that does this instead of a simple `DELETE FROM` which cascades.
1. https://github.com/github/gh-ost/issues/331#issuecomment-266...
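What the comment above means by a cascading `DELETE FROM` can be shown with a small SQLite example (not GitHub's actual schema): with foreign keys declared `ON DELETE CASCADE`, one delete on `users` cleans up dependent rows automatically; without them, an application-level process has to do that work table by table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when asked
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY)")
conn.execute("""
    CREATE TABLE notifications (
        id INTEGER PRIMARY KEY,
        user_id INTEGER REFERENCES users(id) ON DELETE CASCADE
    )
""")
conn.execute("INSERT INTO users (id) VALUES (1)")
conn.execute("INSERT INTO notifications (user_id) VALUES (1)")

conn.execute("DELETE FROM users WHERE id = 1")  # cascades to notifications
remaining = conn.execute("SELECT COUNT(*) FROM notifications").fetchone()[0]
print(remaining)  # 0
```

The trade-off GitHub makes by skipping foreign keys is exactly that this cleanup becomes the application's job instead of the database's.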
Large Hadron Migrator
-
GitHub downtime root cause analysis
No you didn't. They're doing what is often referred to as an "online schema change" using https://github.com/github/gh-ost (the concept is the same as Percona's pt-online-schema-change, or https://github.com/soundcloud/lhm).
-
Database... or Goose?
Is there anything similar for MySQL? There is https://github.com/soundcloud/lhm but it's pretty much outdated nowadays
-
Do you use migrations for data manipulations? What are the pro's and con's ?
I may do it from the console or a task if I wanted to modify a large number of records, e.g. something in my Users table. I think you need a sense of how long the update will take; I'm not sure if there's any issue with migrations timing out or the like. If I modify my Users schema it takes 5 minutes or so, as it has to make a copy of the table and swap it in, and that works fine - https://github.com/soundcloud/lhm
What are some alternatives?
pg-online-schema-change - Easy CLI tool for making zero downtime schema changes and backfills in PostgreSQL [Moved to: https://github.com/shayonj/pg-osc]
doctrine-test-bundle - Symfony bundle to isolate your app's doctrine database tests and improve the test performance
Squasher - Squasher - squash your old migrations in a single command
Lol DBA - lol_dba is a small package of rake tasks that scan your application models and displays a list of columns that probably should be indexed. Also, it can generate .sql migration scripts.
Foreigner - Adds foreign key helpers to migrations and correctly dumps foreign keys to schema.rb
BatchLoader - :zap: Powerful tool for avoiding N+1 DB or HTTP queries
PgHero - A performance dashboard for Postgres
Seedbank - Seedbank gives your seed data a little structure. Create seeds for each environment, share seeds between environments and specify dependencies to load your seeds in order. All nicely integrated with simple rake tasks.
Shiba - Catch bad SQL queries before they cause problems in production
squawk - 🐘 linter for PostgreSQL, focused on migrations
PgDriveBackup - Simple solution to make encrypted with ccrypt PostgreSQL backups and storing on Google Drive API
Polo - Polo travels through your database and creates sample snapshots so you can work with real world data in development.