This repository was archived by the owner on Jan 14, 2025. It is now read-only.
Spike - IaaS-based Compute #593
Open
Description
This spike documents replacing AKS with IaaS as the compute platform for Azure Mission-Critical. Some scenarios might require IaaS VMs instead of PaaS services. Potential reasons include:
- Lack of knowledge and skills (e.g. with containers and Kubernetes)
- Legacy workloads that require OS-level access or specific drivers and configurations
- Performance requirements that cannot be fulfilled in containers or PaaS services
- Lack of container or PaaS support for third-party workloads
Changes required compared to Mission-Critical-Online:
- Removed AKS and replaced it with Virtual Machine Scale Sets (VMSS)
- Requires a replacement for ingress, e.g. Application Gateway (AppGw) or Azure Front Door (AFD)? AppGw might make sense here, potentially with a Private Link Service (PLS) in front to expose it via AFD Premium
- Requires a different rollout process for the workload
- Two VMSS: one for the frontend (exposed via AppGw) and one for the backend processing (not exposed)
- Removed ACR
- Added a Shared Image Gallery (as a global service for now) to store VM images
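Without AKS and ACR, a workload update no longer ships as a new container image but as a new VM image version in the gallery, which then has to be rolled across each scale set. One common pattern is to update instances in batches so the rest keep serving traffic, similar in spirit to the `maxBatchInstancePercent` setting of a VMSS rolling upgrade policy. The following is a minimal Python sketch of the batch planning only; the helper `plan_rolling_upgrade` and the instance names are hypothetical, and no Azure API is called.

```python
import math
from typing import List


def plan_rolling_upgrade(instance_ids: List[str], max_batch_percent: int) -> List[List[str]]:
    """Split scale set instances into upgrade batches (hypothetical helper).

    Each batch holds at most max_batch_percent of the instances, rounded up
    and never fewer than one, so the remaining instances keep serving traffic
    while a batch is reimaged with the new gallery image version.
    """
    if not 0 < max_batch_percent <= 100:
        raise ValueError("max_batch_percent must be in (0, 100]")
    if not instance_ids:
        return []
    batch_size = max(1, math.ceil(len(instance_ids) * max_batch_percent / 100))
    # Slice the instance list into consecutive batches of batch_size.
    return [instance_ids[i:i + batch_size] for i in range(0, len(instance_ids), batch_size)]


if __name__ == "__main__":
    # Hypothetical frontend scale set with five instances, 20% per batch.
    frontend = [f"vmss-frontend_{i}" for i in range(5)]
    for n, batch in enumerate(plan_rolling_upgrade(frontend, 20), start=1):
        print(f"batch {n}: {batch}")
```

Smaller batches lose less capacity during an update at the cost of a longer rollout; in practice a health check between batches (not shown) would gate progression to the next batch.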
Scenarios to address:
- Scalable / stateless workloads -> Virtual Machine Scale Sets
- Static / stateful workloads -> Virtual Machines in an Availability Set
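The scenario mapping above can be expressed as a small decision function. This is a minimal sketch assuming a hypothetical `WorkloadProfile` with a single `stateful` flag; it routes stateful workloads to an Availability Set because whether they can be hosted meaningfully in VMSS is still an open question below.

```python
from dataclasses import dataclass
from enum import Enum


class ComputeOption(Enum):
    VMSS = "Virtual Machine Scale Set"
    AVAILABILITY_SET = "Virtual Machines in an Availability Set"


@dataclass(frozen=True)
class WorkloadProfile:
    """Hypothetical description of a workload component."""
    name: str
    stateful: bool  # True if instances hold state that is not externalized


def choose_compute(profile: WorkloadProfile) -> ComputeOption:
    """Map a workload to the IaaS option listed in this spike."""
    if profile.stateful:
        # Static / stateful: individual VMs in an Availability Set.
        return ComputeOption.AVAILABILITY_SET
    # Scalable / stateless: any instance can serve any request.
    return ComputeOption.VMSS


if __name__ == "__main__":
    for p in (WorkloadProfile("frontend", stateful=False),
              WorkloadProfile("legacy-app", stateful=True)):
        print(p.name, "->", choose_compute(p).value)
```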
Open questions / findings:
- Boot diagnostics storage for VMSS does not support ZRS
- Shared Image Gallery: global service or per stamp?
- Can stateful workloads be hosted in VMSS in a meaningful way?
- What's the recommended (and most reliable) way to roll out software to (Windows) VMs?
- Where to store application/workload components (the counterpart to ACR in a more cloud-native scenario)? Storage accounts?
- How to deal with dependencies like AD DS, WSFC, ...?
- Database backends (on VMs): in or out of scope?
Recommendations:
- Security
  - Disable username/password authentication when using Linux
  - Store VMSS credentials in Azure Key Vault
- Compute
  - The same zone considerations apply: spread across availability zones if possible, OR consolidate in fewer than 3 zones if proximity is required and/or latency is a concern
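Two of these recommendations can be checked mechanically. The following minimal Python sketch assumes hypothetical helpers: `assign_zones` spreads instances round-robin across whichever zones are passed (all three for resilience, fewer to consolidate for latency), and `linux_password_auth_disabled` inspects an ARM-style VMSS definition for `osProfile.linuxConfiguration.disablePasswordAuthentication`. Retrieving credentials from Key Vault is not shown.

```python
from typing import Dict, Sequence


def assign_zones(instance_count: int, zones: Sequence[str]) -> Dict[str, int]:
    """Round-robin instances across the given availability zones (hypothetical).

    Pass all three zones to maximize resilience, or a shorter list (e.g. ["1"])
    to consolidate for proximity/latency, trading away zone redundancy.
    """
    if not zones:
        raise ValueError("at least one zone is required")
    counts = {zone: 0 for zone in zones}
    for i in range(instance_count):
        counts[zones[i % len(zones)]] += 1
    return counts


def linux_password_auth_disabled(vmss_properties: dict) -> bool:
    """Return True if an ARM-style VMSS definition allows SSH-key login only.

    Expects the shape properties.virtualMachineProfile.osProfile.linuxConfiguration;
    the check also fails if an adminPassword is set in the osProfile.
    """
    os_profile = vmss_properties.get("virtualMachineProfile", {}).get("osProfile", {})
    linux = os_profile.get("linuxConfiguration", {})
    return linux.get("disablePasswordAuthentication") is True and "adminPassword" not in os_profile


if __name__ == "__main__":
    print(assign_zones(7, ["1", "2", "3"]))
    print(assign_zones(4, ["1"]))
```

Round-robin keeps per-zone counts within one instance of each other, so losing any single zone removes at most a third (rounded up) of the capacity when all three zones are used.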