
1/28/2020 11:38:37 PM
My concern is their apparently unshakable commitment to fixing forward. It's a fine strategy for smaller issues if your SDLC is set up for it, but not for showstoppers. I work in business software, and being down for a whole day is unacceptable, period. Something of this magnitude would mean you roll back BOTH the database and the software, so you're back up and running at a pre-release state ASAP. THEN you fix the bug (on your own time, not your customers') and try the release again later.
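A minimal sketch of that release discipline, assuming a Postgres-style backup tool: snapshot the database and note the running build before pushing anything, so a showstopper means restoring both halves instead of debugging a live outage. The release.sh and smoke_tests.sh scripts below are hypothetical placeholders, not anything Bungie has described.

```python
import subprocess
from datetime import datetime, timezone

def snapshot_db(db_name: str) -> str:
    """Dump the database to a timestamped file before the release touches it."""
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    path = f"/backups/{db_name}-{stamp}.dump"
    subprocess.run(["pg_dump", "--format=custom", "--file", path, db_name], check=True)
    return path

def deploy(db_name: str, new_version: str, old_version: str) -> None:
    backup = snapshot_db(db_name)
    try:
        subprocess.run(["./release.sh", new_version], check=True)  # app + schema changes
        subprocess.run(["./smoke_tests.sh"], check=True)           # catch showstoppers early
    except subprocess.CalledProcessError:
        # Showstopper: restore the pre-release data AND redeploy the old build,
        # then fix the bug offline and retry the release later.
        subprocess.run(["pg_restore", "--clean", "--dbname", db_name, backup], check=True)
        subprocess.run(["./release.sh", old_version], check=True)
        raise
```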

  • When dealing with databases, fixing forward is often the only course of action; it's very difficult to roll back data changes. I think the main issue is that they aren't doing continuous delivery, but rather releasing a big batch of things, so when something goes boom, it brings down the whole thing. Without knowing more about how the software was designed, I'm thinking maybe they could do smaller releases and incremental rollouts, e.g. a few servers/users first, then some more, etc. A bit like how Google rolls out changes.
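
    A minimal sketch of the incremental-rollout idea, assuming ring-style staging; the server names and the deploy_to()/error_rate() helpers are made-up placeholders, not anyone's actual tooling.

    ```python
    import time

    RINGS = [
        ["canary-01"],                                  # a few servers/users first
        ["game-01", "game-02", "game-03"],              # then some more
        ["game-04", "game-05", "game-06", "game-07"],   # then everything else
    ]

    def deploy_to(server: str, version: str) -> None:
        print(f"deploying {version} to {server}")       # placeholder deploy step

    def error_rate(server: str) -> float:
        return 0.0                                      # placeholder metrics lookup

    def staged_rollout(version: str, threshold: float = 0.01, soak_seconds: int = 600) -> bool:
        for ring in RINGS:
            for server in ring:
                deploy_to(server, version)
            time.sleep(soak_seconds)                    # let the build soak before judging it
            if any(error_rate(s) > threshold for s in ring):
                return False                            # halt: only this ring took the bad build
        return True
    ```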


  • It's not hard at all to roll back data changes. You have a backup from when you bring the system down before the release; then you restore the backup. If you're trying to retain changes made after the release when you roll back, that can be massively complicated, but they've already stated they're not doing that. In the age of DevOps, Docker, VMs and swappable images, if they're not designing their release process w/one eye on rapid rollbacks if something goes wrong, then quite frankly they're doing it wrong.
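
    A minimal blue/green sketch of the "swappable images" point, assuming the old stack stays up untouched during the release; set_active_pool() and healthy() are hypothetical stand-ins for whatever the load balancer or orchestrator actually exposes. The data layer still needs its own pre-release backup, since image swapping only covers the application tier.

    ```python
    def set_active_pool(pool: str) -> None:
        print(f"load balancer now routing to {pool}")   # placeholder traffic switch

    def healthy(pool: str) -> bool:
        return True                                     # placeholder post-release health check

    def release(new_pool: str = "green", old_pool: str = "blue") -> str:
        """Cut traffic over to the new image; swap straight back if it misbehaves."""
        set_active_pool(new_pool)
        if not healthy(new_pool):
            set_active_pool(old_pool)   # rapid rollback: no reinstall, no app-tier restore
            return old_pool
        return new_pool
    ```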


  • Edited by Tiny Cabbage: 1/29/2020 12:12:31 AM
    Well, imagine the DB change involves a merge of 2 columns. If you wanted to roll back the change, how would you determine how to separate the data? For example, address line 1 and address line 2 have been merged into "21 dave street, london, london, uk". How would you know what address line 1 contained and what address line 2 contained before the merge? Restoring a DB backup is a massive undertaking: I'm guessing they only have 1 DB, it's massive (probably petabyte scale), they can't afford to keep 2 DBs of that size, and a restore means bringing everything down, whereas fixing a DB change forward can be done on the fly without an outage. In the rollout strategies I've written previously, the rule has been to fix forward if possible, timeboxed to 1 hr; if it can't be done in that window, get the backups, turn the whole thing off and restore.
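
    A toy version of that merge, using sqlite3 from the standard library, just to show why the forward migration is easy while a true reversal isn't: once the original columns are gone, every comma in the merged string is an equally plausible split point.

    ```python
    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, addr1 TEXT, addr2 TEXT)")
    conn.execute("INSERT INTO users VALUES (1, '21 dave street', 'london, london, uk')")

    # Forward migration: build the merged table, copy the data across, drop the original.
    conn.execute("CREATE TABLE users_new (id INTEGER PRIMARY KEY, address TEXT)")
    conn.execute("INSERT INTO users_new SELECT id, addr1 || ', ' || addr2 FROM users")
    conn.execute("DROP TABLE users")
    conn.execute("ALTER TABLE users_new RENAME TO users")

    print(conn.execute("SELECT address FROM users").fetchone())
    # ('21 dave street, london, london, uk',)
    # No query can recover the original addr1/addr2 from this string alone, so the
    # only real "down" path is restoring a backup taken before the merge.
    ```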


  • They already bring the system down for patches, and brought it down again as soon as the severity of the issue was clear. This isn't Guild Wars 2 trying to run dynamic patching w/o making people even log out of the game. And restoring from a pre-release backup explicitly addresses any schema changes that happened w/the release. You go back to the data architecture exactly as it was before you pushed the button. There's nothing wrong w/trying a limited fix forward before committing to a rollback. I think everybody does that to some degree. Nobody wants to redo a release completely if they can avoid it, particularly not over something that could be fixed in 30 minutes. But 9 hours of downtime is WELL beyond any acceptable limit for a 'quick fix.'
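
    A small sketch of that trade-off as a hard timebox, in the spirit of the 1-hour rule mentioned above; attempt_hotfix() and restore_from_backup() are hypothetical placeholders, and the budget is only illustrative.

    ```python
    import time

    def attempt_hotfix() -> bool:
        return False                    # placeholder: deploy and verify a targeted fix

    def restore_from_backup() -> None:
        print("rolling back to the pre-release snapshot")   # placeholder rollback

    def incident_response(fix_forward_budget_s: int = 60 * 60) -> None:
        """Try a limited fix forward, but commit to the rollback once the budget is spent."""
        deadline = time.monotonic() + fix_forward_budget_s
        while time.monotonic() < deadline:
            if attempt_hotfix():
                return                  # fixed forward inside the window
            time.sleep(60)              # pause between attempts (placeholder)
        restore_from_backup()           # budget spent: stop digging and roll back
    ```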


  • Edited by Tiny Cabbage: 1/29/2020 12:25:42 AM
    This is purely a guess. Their release goes something like: turn the servers off, release the software, Liquibase goes and makes the changes to the DB, the software is patched across all servers. All done in, say, 30 minutes; it shouldn't take long. However, there's been a massive mistake/issue with this one, and it required a restore of the DB, which is very unusual. So they carry on trying to fix forward while, in the background, they get a second DB server going and pull the massive DB backup out of Glacier. I'm guessing it's petabytes, so it would take hours just to get the backups onto the cluster. The cluster is spun up, the DB is slowly imported into the new cluster (guessing they use AWS Aurora), then they have to run a bunch of tests against the new DB, and all the software/config is migrated to point at the new DB host, which may itself be a massive change. The game is tested again, and various user accounts are checked to see if glimmer etc. are OK. Don't want to f**k up twice.
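
    Since the post above is explicitly guesswork, here is only a back-of-the-envelope check on its one quantitative claim, that moving a very large backup takes hours by itself; the sizes and link speed below are made up for illustration.

    ```python
    def transfer_hours(size_tb: float, link_gbps: float) -> float:
        """Hours to move size_tb terabytes over an ideal link_gbps link (no overhead)."""
        bits = size_tb * 1e12 * 8               # terabytes -> bits
        return bits / (link_gbps * 1e9) / 3600  # seconds -> hours

    for size_tb in (100, 500, 1000):            # 1000 TB = 1 PB
        print(f"{size_tb:5d} TB over 100 Gbps: {transfer_hours(size_tb, 100):5.1f} h")
    # 100 TB ~= 2.2 h, 500 TB ~= 11.1 h, 1 PB ~= 22.2 h, before any import, indexing
    # or validation time on the new cluster, which usually dominates.
    ```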


  • I agree....kinda scary when you sit back and think over 2 years of your gaming work is nothing but a row of data in this table linked to a row of data in this table....thank the Lord for Backups and Restores! Hey..a new game type...RAID 0...half your team goes one way..the other half goes the other...if they wipe...you wipe.....RAID 1..you are still split, but ..enemies must be killed in the same order (talk about a RAID mechanic!)...you however are still ok if the other side wipes... :).

