Executive Summary
When it comes to crisis and incident management in the cloud and digital era, hope is not a strategy. Yet a majority of enterprises continue their digital business operations as usual, hoping that major disruptive incidents will never happen to them, even while such incidents happen on a regular basis to their competitors, partners, peers, dependent service providers, cloud platforms, tool providers, and even incident management service providers.
Systems based on the Information Technology Infrastructure Library (ITIL) had their day and, until recently, served their purpose of helping IT run smoothly. Now, most enterprise IT teams are struggling to cope with the newer demands of cloud operations (CloudOps): demand-based scaling, cloud-native monitoring and observability, and incident management, to name a few. When dealing with critical incidents, raising a ticket and waiting for it to progress through the support levels (L1/L2/L3) until it reaches the subject matter expert (SME) who can resolve the incident can be a disaster waiting to happen.
Most enterprises today are still not set up to handle all IT-related incidents, or crises, in real time. Classic legacy enterprises are set up to deal with IT incidents in old-fashioned ways, without considering the nuances of cloud and software-as-a-service (SaaS), or the social media venting and customer demands that put pressure on enterprises to fix incidents faster than ever. Newer digital-native companies do not even put much emphasis on digital incident management: They hope to deal with incidents as they occur.
Given the demand for “always-on” services, modern complex architectures offer more opportunities than ever for things to break, and incidents do not wait for a convenient time. Problems can, and often do, happen on weekends, holidays, or weeknights when no one is paying attention. To be properly prepared when an incident happens, an enterprise must be in a position to immediately identify, assess, manage, and resolve it, and to communicate the situation effectively to customers, stakeholders, and (for major incidents) senior management.