Sitecore Release Governance for Better Production Stability
Production issues can range from slightly annoying to downright devastating, but what if you could prevent the majority of them? Well, you can! It requires some release governance and support from the business, but it can be achieved. The following provides a basic runbook you can review, modify, and incorporate to reduce risk. Remember, when it comes to code deployments, it's better to slow down to go faster.
A Word About Release Gating/Gates
As your Sitecore release moves from Development to Production, it should pass through several “gates”. Each gate stays locked until you have the right key, and that “key” is your code/config base passing a clearly defined quality check (QC). In the Sample Release Process/Governance below, you will see several gates where you should stop, consider, and open the gate to the next level only if you are confident.
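The gating idea can be sketched in a few lines of Python. This is only an illustration: the `Gate` class, the `promote` function, and the check functions are hypothetical names, not part of Sitecore or any deployment tool, and in practice each check would wrap your own QC criteria (log review, UAT sign-off, load test results).

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Gate:
    """A hypothetical gate: an environment name plus the QC check that unlocks it."""
    name: str
    check: Callable[[], bool]


def promote(gates: List[Gate]) -> List[str]:
    """Walk the gates in order and stop at the first one whose check fails.

    Returns the names of the gates the release passed.
    """
    passed = []
    for gate in gates:
        if not gate.check():
            break  # the gate stays locked; later environments are never reached
        passed.append(gate.name)
    return passed


if __name__ == "__main__":
    # Placeholder checks standing in for real QC results (assumptions).
    pipeline = [
        Gate("Development", lambda: True),
        Gate("UAT", lambda: True),
        Gate("Staging", lambda: False),  # e.g. the load test surfaced errors
        Gate("Production", lambda: True),
    ]
    print(promote(pipeline))
```

The point of the design is that a failed check stops the release where it is; nothing downstream is evaluated until the earlier gate passes.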
Sample Release Process/Governance
NOTE: The following steps provide several gates and checks to reduce risk. They do not guarantee that Production issues won’t occur, but they can certainly reduce them.
1 – Ensure Architectural Integrity/Environmental Feature Parity. This is critical: if your Prod is XP1, lower environments running XM or XP0 are not appropriate, because your testing and features need to match Production. While XP0 may seem a reasonable approach for a lower environment, it lacks the separate roles of an XP1
2 – Deploy from local development environments to a consolidated Development environment
3 – Evaluate using Sitecore Content Serialization (or another appropriate tool) for merging content items from the disparate local environments: https://doc.sitecore.com/developers/101/developer-tools/en/sitecore-content-serialization.html
4 – Review the logs and resolve any development issues in the development environment
5 – Deploy to the UAT environment
6 – Perform User Acceptance Testing. If issues are discovered, remediate these in the Development environment starting with Step 2 and following all steps/gates in sequence until it passes UAT
7 – Deploy to the Staging environment
8 – Conduct a sustained load test that mirrors peak traffic in Production (e.g., a 1-hour test)
9 – Review Staging for any errors or configuration changes that need to be made based on load test results
10 – If there are any logged errors, resolve them or make the appropriate configuration changes in the Development environment (Step 2), then proceed back through the Dev, UAT, and Staging steps/gates (including load testing) to validate that everything is fixed and that the site can handle stress under load while returning an acceptable response rate
11 – Go/No Go Decision for Production release based on testing results in Staging
12 – Deploy to Production
13 – Monitor the Production release closely for 24-48 hours for any immediate issues, and roll back your release as appropriate if errors are impactful (an APM tool for constant monitoring of Production is highly recommended)
14 – Review any issues discovered in the Production logs and incorporate those into the next release ahead of any newly planned features, as actual “live” load may surface something new since users are the truest test of Production stability
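As one possible way to make the Go/No Go decision in Step 11 less subjective, the Staging load test results from Steps 8 to 10 can be compared against agreed thresholds. The sketch below is a minimal example under stated assumptions: the `LoadTestResult` fields, the `go_no_go` function, and the default thresholds (60 minutes, 1% failed requests, 2000 ms at the 95th percentile) are all hypothetical and should be replaced with values your team and business have agreed on.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class LoadTestResult:
    """Hypothetical summary of a Staging load test run."""
    duration_minutes: float
    total_requests: int
    failed_requests: int
    p95_response_ms: float
    logged_errors: int  # errors found when reviewing the Staging logs


def go_no_go(
    result: LoadTestResult,
    min_duration_minutes: float = 60.0,  # assumed: the "1 hour" sustained test
    max_error_rate: float = 0.01,        # assumed threshold
    max_p95_ms: float = 2000.0,          # assumed threshold
) -> Tuple[bool, List[str]]:
    """Return (go, reasons); go is True only when no reason blocks the release."""
    reasons = []
    if result.duration_minutes < min_duration_minutes:
        reasons.append("load test was shorter than the agreed duration")
    if result.total_requests <= 0:
        reasons.append("no requests were recorded")
    elif result.failed_requests / result.total_requests > max_error_rate:
        reasons.append("failed request rate exceeds the threshold")
    if result.p95_response_ms > max_p95_ms:
        reasons.append("95th percentile response time exceeds the threshold")
    if result.logged_errors > 0:
        reasons.append("errors were logged in Staging during the test")
    return (len(reasons) == 0, reasons)


if __name__ == "__main__":
    staging_run = LoadTestResult(
        duration_minutes=60,
        total_requests=10_000,
        failed_requests=20,
        p95_response_ms=850.0,
        logged_errors=0,
    )
    go, reasons = go_no_go(staging_run)
    print("GO" if go else "NO GO", reasons)
```

Returning the list of reasons alongside the decision gives the team something concrete to take back to the Development environment when the answer is No Go.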