Azure is an incredible “platform of choice” for the Sitecore application. You can scale, use PaaS resources like Redis, Azure SQL, and more. With this power comes cost, though, and running a cost-effective Sitecore environment in Azure takes some work. This article covers several items to consider in order to maximize your power per dollar.
Stop/Start Virtual Machines (VMs) on Schedule or Ad Hoc
Recommended for lower environments or disaster recovery (DR): you can use Azure Automation to stop/start virtual machines on a schedule or ad hoc. This reduces resource costs when the machines are not in use. It does require some governance, in that developers will need an operational window and/or a procedure to activate an environment.
Note that Application Services operate differently than VMs; we cover that in the next section.
Here is a great step-by-step article on how to Start/Stop VMs during off-hours.
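As a rough illustration of the scheduling logic, here is a minimal Python sketch that decides whether a lower-environment VM should be started or stopped at a given moment. The window (weekdays, 07:00 to 19:00) and the function names are assumptions made for this example; the actual start/stop call would be made by your automation tooling and is not shown.

```python
from datetime import datetime, time

# Hypothetical operational window: weekdays, 07:00-19:00 local time.
WORK_DAYS = {0, 1, 2, 3, 4}  # Monday=0 ... Friday=4
WINDOW_START = time(7, 0)
WINDOW_END = time(19, 0)


def should_be_running(now: datetime, override: bool = False) -> bool:
    """Return True if the VM should be up at `now`.

    `override` models an ad hoc request to activate the environment
    outside the normal window.
    """
    if override:
        return True
    in_work_day = now.weekday() in WORK_DAYS
    in_window = WINDOW_START <= now.time() < WINDOW_END
    return in_work_day and in_window


def desired_action(now: datetime, is_running: bool, override: bool = False) -> str:
    """Map the schedule decision to 'start', 'stop', or 'none'."""
    want_running = should_be_running(now, override)
    if want_running and not is_running:
        return "start"
    if not want_running and is_running:
        return "stop"
    return "none"
```

A scheduled job could call desired_action every hour and only act when the result is not "none", which keeps repeated runs harmless.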
Downsize Application Services for Lower Environments/DR and Upscale When Ready
Application Services still incur charges when stopped, but you can switch the Application Service Plan to a lower tier until you are ready to use them. Important: Do not use the free tier, as that can switch your site to 32-bit or remove other functionality. S1 should be the lowest tier to consider when downscaled.
The Set-AzAppServicePlan PowerShell cmdlet may be a good one to consider as part of your automation (Azure Automation is highly recommended, as runbooks can be scheduled). DR downsizing is covered in more detail in the next section.
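The "never go below S1" rule is easy to encode as a guard in whatever script picks the target tier. The Python sketch below is one way to do that. The tier list is an illustrative subset in a rough small-to-large order, not an authoritative ranking, and the function names are made up for this example. The resulting tier name would then be handed to whatever performs the change.

```python
# Illustrative subset of App Service plan tiers, roughly small to large.
# The ordering is an assumption for this sketch, not an official ranking.
TIER_ORDER = ["F1", "B1", "S1", "S2", "S3", "P1v2", "P2v2", "P3v2"]
LOWEST_ALLOWED = "S1"  # the floor recommended above for a downscaled plan


def downscale_target(requested: str) -> str:
    """Clamp a requested idle tier so it never drops below S1."""
    if requested not in TIER_ORDER:
        raise ValueError(f"unknown tier: {requested}")
    floor_index = TIER_ORDER.index(LOWEST_ALLOWED)
    return TIER_ORDER[max(TIER_ORDER.index(requested), floor_index)]


def plan_tier(active_tier: str, idle_tier: str, in_use: bool) -> str:
    """Pick the tier for now: full size when in use, clamped idle tier otherwise."""
    return active_tier if in_use else downscale_target(idle_tier)
```

For example, plan_tier("P2v2", "B1", in_use=False) yields "S1" rather than "B1", so a misconfigured idle tier cannot drop the site below the floor.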
Evaluate DR SLA and Downsizing until Failover
Disaster Recovery can be tricky in that you need to balance the ability to fail over against cost savings. For some, using Traffic Manager to route to a small web page on an Application Service while the main DR environment is being scaled up is an option. To be clear, you should not be failing over to DR because of a bad code deployment (roll back your code and fix it/don’t “Dev in Prod”), but rather because an actual regional or data center type issue has occurred.
When considering downsizing DR, take your SLA into account and test how long it takes to bring your site up to the desired power level (whether VMs, App Services, Azure SQL DBs or a combination of all three).
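To make that SLA check concrete, here is a small Python sketch that compares measured scale-up times against an allowed failover time. The timings are placeholders, not measurements, and the assumption that components scale either fully in parallel or strictly one after another is a simplification.

```python
from typing import Mapping


def failover_time_minutes(scale_up_minutes: Mapping[str, float], parallel: bool) -> float:
    """Estimate total time to bring DR to full power.

    If components scale in parallel, the slowest one dominates;
    if they are scaled one after another, the times add up.
    """
    times = list(scale_up_minutes.values())
    return max(times) if parallel else sum(times)


def meets_sla(scale_up_minutes: Mapping[str, float], sla_minutes: float,
              parallel: bool = True) -> bool:
    """True if the estimated failover time fits inside the SLA window."""
    return failover_time_minutes(scale_up_minutes, parallel) <= sla_minutes


# Placeholder timings; substitute values measured in your own failover tests.
measured = {"vms": 12.0, "app_services": 8.0, "sql_databases": 20.0}
```

With these placeholder values a 30 minute window is met when everything scales in parallel (20 minutes) but missed when the steps run in sequence (40 minutes), which is why the order of operations in a failover runbook matters.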
Properly use SessionState to Take Load Off InProc (If Possible)
InProc session state puts load on your server, so it’s imperative to ensure you are properly utilizing session state for your setup. There are some instances where InProc is required, but if you are running multiple CDs behind a load balancer, InProc is not recommended.
View the section “Session state configuration scenarios” of this document to learn which scenario best suits your environment type: https://doc.sitecore.com/developers/92/platform-administration-and-architecture/en/scaling-and-configuring-session-state.html
Autoscale… But With Caution
Autoscaling Application Service instances up or down can be a powerful feature, but be forewarned that Microsoft Azure does not wait until your application is loaded before routing traffic to a new instance. In short, any users who get routed to a new instance will receive a 500 type error until the Sitecore application is warmed up/loaded.
You can use Application Initialization to help address this by serving a temporary HTML page until Sitecore is fully initialized. More details can be found in the “Autofail: A Big Azure Autoscale Limitation and What To Do” article.
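The underlying idea is "do not treat an instance as ready until it answers successfully". The Python sketch below shows that readiness loop in isolation. It is not Application Initialization itself, just an illustration you might adapt for a warm-up or smoke-test script. The probe callable, timeout, and interval are all assumptions for this example.

```python
import time
from typing import Callable


def wait_until_warm(
    probe: Callable[[], int],
    timeout_s: float = 300.0,
    interval_s: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll `probe` until it returns HTTP 200 or the timeout elapses.

    `probe` is any callable returning a status code, for example a
    function that requests a page on the new instance.
    """
    deadline = clock() + timeout_s
    while True:
        if probe() == 200:
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)
```

Passing sleep and clock as parameters keeps the loop testable without real waiting; in normal use you would leave them at their defaults and supply only the probe.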
Optimize Code and Caching to Reduce Load
A more lightweight site will use fewer resources… to that end, optimize your code and tune your caches to maximize the efficiency of your platform. This takes time and effort, and many organizations upscale infrastructure to mask poor development. Don’t let this happen to you: it creates an endless cycle of spend and opens your platform to more risk. In short, if you have bad code, configs, or caches… fix them, and revisit your site features regularly to make them as lightweight as possible.
Not sure where to start? Use GTMetrix to get a readout/site score and get to work optimizing/fixing/refactoring.
Offload Images to CDN
Get those images out of Sitecore and served by a CDN. Many products like Cloudflare or Azure Front Door offer a combination of features beyond a CDN that makes the price point compelling. Will you pay for a CDN? Yes, but the overall cost reduction in your base infrastructure could outweigh the cost of the CDN (especially if the edge device has more capabilities than just CDN).
Block Unwanted Traffic that Creates Load With a WAF (aka, Bots)
A Web Application Firewall (WAF) is a must-have for any serious production Sitecore environment. Depending on its features, it will block or throttle unwanted traffic. Many don’t even realize that their site load, and subsequent cost, is being driven by bot traffic. Not all bots are bad, but the majority of the ones that can use up your infrastructure are… There is also an implicit cost saving in blocking bad or unwanted actors, in that a site outage costs time and money to recover from.
Many WAFs also include CDNs and other edge capabilities, so consider a combo package like Azure Front Door or Cloudflare to maximize your investment.
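If you are unsure how much of your load is bot-driven, a quick pass over the user agents in your request logs can give a first impression. The Python sketch below uses a deliberately naive substring heuristic; the marker list and sample values are assumptions for illustration, and a real WAF relies on far richer signals than the user agent string.

```python
from collections import Counter
from typing import Iterable

# Naive substrings for illustration only; real WAFs use far richer signals.
BOT_MARKERS = ("bot", "crawler", "spider")


def is_probable_bot(user_agent: str) -> bool:
    """Heuristic: flag user agents that contain a known bot marker."""
    ua = user_agent.lower()
    return any(marker in ua for marker in BOT_MARKERS)


def bot_share(user_agents: Iterable[str]) -> float:
    """Fraction of requests whose user agent looks automated (0.0 to 1.0)."""
    counts = Counter(is_probable_bot(ua) for ua in user_agents)
    total = counts[True] + counts[False]
    return counts[True] / total if total else 0.0
```

Feeding it the user agent column from a day of logs gives a rough lower bound, since bots that spoof a browser user agent will not be counted.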
Use Elastic Pools
Elastic pools are the unsung heroes of platform stability for Sitecore sites running Azure SQL. They balance power and cost-effectiveness: you group databases together at a pooled cost, so individual databases can flex within a set cost tier. A good recommendation is two elastic pools, one for XM databases and another for XP databases, plus two high I/O single databases (such as P1) for your shard databases, as shards require increased input/output.
Learn more about elastic pools here: https://docs.microsoft.com/en-us/azure/azure-sql/database/elastic-pool-overview
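The grouping described above can be sketched as a simple name-based classifier. The Python example below is only an illustration: the database names, the marker list used to recognize XP databases, and the decision to keep the shard map manager in the XP pool are all assumptions, so adjust them to your own naming and topology.

```python
from collections import defaultdict
from typing import Dict, Iterable, List

# Assumed markers for XP-related databases; adjust to your own naming.
XP_MARKERS = ("xdb", "processing", "reporting", "referencedata", "marketingautomation")


def placement(db_name: str) -> str:
    """Return the assumed home for a database: a pool or a standalone tier."""
    name = db_name.lower()
    # Shards go to high I/O single databases; the shard map manager
    # is treated as a regular XP database in this sketch.
    if "shard" in name and "shardmapmanager" not in name:
        return "standalone-high-io"
    if any(marker in name for marker in XP_MARKERS):
        return "xp-pool"
    return "xm-pool"


def group_databases(db_names: Iterable[str]) -> Dict[str, List[str]]:
    """Group database names by their assumed placement."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for db in db_names:
        groups[placement(db)].append(db)
    return dict(groups)
```

Running group_databases over your real database list is a quick way to review the proposed layout before moving anything.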
Regularly Evaluate the Scale of Each VM, App Service, and Azure SQL DB
Finally, you should regularly evaluate the sizing of your VMs, App Services, and Azure SQL DBs. This includes your production environment, as you may need to scale up, scale down, or move databases into different elastic pools. This is not a static “set it once and forget it” situation: changes to your code base will push your power needs up or down, as will an influx or reduction in users (or bots).
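A periodic review can be as simple as comparing observed utilization against thresholds you have agreed on. The Python sketch below shows one such rule. The 30% and 80% thresholds are placeholders rather than vendor guidance, and the metric could be CPU, memory, or DTU depending on the resource.

```python
def sizing_recommendation(avg_util_pct: float, peak_util_pct: float,
                          low: float = 30.0, high: float = 80.0) -> str:
    """Suggest a direction based on observed utilization over a review period.

    Scale up when even the average is high (sustained pressure);
    scale down when even the peak is low (consistently idle).
    """
    if avg_util_pct >= high:
        return "scale up"
    if peak_util_pct <= low:
        return "scale down"
    return "keep"
```

Using the average for the scale-up check and the peak for the scale-down check is a conservative choice: a resource is only shrunk if it never got busy during the review period.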