IT major incident management process: Real-life examples
What's inside the video
- 4 real life scenarios
- How to turn every service request into an experience to boost your ITSM maturity
- Case study: New hire request - to fill 8000 open positions
- How employee onboarding should work to function smoothly
- KPIs that matter
Take your major incident management (MIM) process up a notch
- Case study: A major availability incident hits a web performance company
- Their incident team debugging the situation
- Major availability incident management framework
- Major availability incident management framework with ServiceDesk Plus
- A structured approach to effectively roll out a major change
- Case study: How do you roll out a change effectively and help your company embrace change?
- The SMB embraces change with ServiceDesk Plus
- Is the change process effective?
- Use case: Build a rock solid ITAM strategy and grow your organization's ITAM maturity
- An educational institution has to upgrade from Windows 8 to the latest version
- Major challenges
- How to easily solve these challenges
- Track metrics that matter
Video transcription
Now, let's move to our second scenario, which is handling a major incident and getting your services back online. So, as organizations, we don't really like major incidents, right? We try to steer clear of them, but it's always better to anticipate these events in advance and have a strategy to deal with them because, if not, they just create chaos and confusion. So, in this real-life scenario, we look at an organization that did not have an incident management strategy, and let's see how well they responded to a major incident.
So this is the case study that we have. We have a web performance and security company that offers CDN, DNS, and DDoS protection to many websites. As a standard operating process, this company's firewall team regularly deployed new rules in their web application firewall (WAF). This is done to respond to new security vulnerabilities on the internet. So, during one such routine update, a minor change made by one of their engineers spiked CPU usage across their servers, bringing down half of the websites around the world. So what customers ended up seeing was a 502 Bad Gateway error. So, as you can realize, this is a major incident of the highest magnitude.
So let's break down the sequence of events in a timeline and see how the organization actually worked on it. So at 13:42, the outage happens, and the services fail. As soon as that happens, they receive alerts from different monitoring tools, and different alerts are created such as service down alerts, financial error alerts, etc. Eight minutes into the incident, the SRE team realizes that something has gone wrong, and by that time, 80% of the traffic is already down.
So they suspect an external attack, and finally, realizing the impact of the incident, they declare a major incident. So, their London engineering team is alerted about the global outage, and throughout this entire time period, their support team is flooded with calls. Phones are ringing off the hook and tickets are pouring in. Thirty-three minutes into the incident, an incident response team (IRT) is formed with members drawn from multiple teams. Yes, let me state it again. At the peak of chaos and confusion, 33 minutes after the major incident began, an incident response team is being constituted. That's a major bottleneck right over there.
So, this IRT was under intense pressure from management, and they had still not identified the root cause. Nearly an hour later, they dismissed the possibility of an external attack and finally figured out that the issue was with the WAF. So a global WAF kill was implemented, and finally, the websites were brought back online. So as you can see, throughout this timeline, there were major roadblocks such as recognizing an incident, putting together a team, communicating with stakeholders, and triaging. So, how do we overcome all these bottlenecks so that your business is not affected?
Here is a best practice workflow that we use in Zoho to combat major incidents. So it starts off with detecting an alert from the monitoring tool and converting it into a ticket in your service desk tool. So, as soon as that happens, you recognize that there is a major incident, and then you communicate with your stakeholders such as your CIOs or your CTOs or managers of IRTs and bring them together to kickstart the process of triaging. So then you assess the impact of the incident, and you choose whether or not to declare a major incident. And by now, your end users would be panicking because they are unable to access critical business services.
So you communicate externally to them, put out an announcement saying that there is an incident and that you're working on it. So by now, you create different tasks, delegate them to appropriate resolver groups who then provide the workaround and ensure that your services are taken back online. And that ends the boundary of incident management.
Now we need to perform a root cause analysis and ensure that this major incident does not recur. For that, we need to create a problem ticket.
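The workflow described above can be sketched as an ordered list of stages. This is only an illustration: the stage names are hypothetical labels for the steps in the transcript, not identifiers from ServiceDesk Plus.

```python
from enum import IntEnum
from typing import Optional


class Stage(IntEnum):
    """Workflow stages in the order the transcript walks through them.

    The names are illustrative labels, not product terminology.
    """
    DETECT_ALERT = 1
    LOG_TICKET = 2
    NOTIFY_STAKEHOLDERS = 3
    ASSESS_IMPACT = 4
    DECLARE_MAJOR_INCIDENT = 5
    ANNOUNCE_TO_END_USERS = 6
    DELEGATE_TASKS = 7
    APPLY_WORKAROUND = 8
    OPEN_PROBLEM_TICKET = 9  # root cause analysis starts here


def next_stage(current: Stage) -> Optional[Stage]:
    """Return the stage that follows `current`, or None after the last one."""
    if current is Stage.OPEN_PROBLEM_TICKET:
        return None
    return Stage(current + 1)


def is_incident_management(stage: Stage) -> bool:
    """Everything up to the workaround falls inside incident management."""
    return stage <= Stage.APPLY_WORKAROUND
```

Keeping the boundary explicit makes it clear that the problem ticket belongs to problem management rather than to the incident itself.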
So this is how you deal with a major incident effectively. Now, let's see how you can leverage ServiceDesk Plus to do the same.
What you're seeing on your screen right now is OpManager, which is the network monitoring software from ManageEngine. So, you can integrate OpManager with ServiceDesk Plus and ensure that whenever a monitoring alert is created, it is automatically converted into a ticket in ServiceDesk Plus. So what you're seeing right now is exactly that implementation. So, as soon as the monitoring alert is created, this is how the ticket is reflected in ServiceDesk Plus. As you can see, a brief description of the incident is provided, and pretty much whatever you see is what we saw before in the service request scenario.
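As a rough sketch of the alert-to-ticket idea, the conversion is essentially a field mapping. The alert fields (`device`, `severity`, `message`) and the `Ticket` shape below are assumptions made for illustration; they are not the OpManager or ServiceDesk Plus data model.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class Ticket:
    """A minimal, hypothetical ticket record."""
    subject: str
    description: str
    source: str
    created_at: datetime
    assets: list[str] = field(default_factory=list)


def alert_to_ticket(alert: dict) -> Ticket:
    """Map a monitoring alert (assumed field names) onto a ticket."""
    return Ticket(
        subject=alert["message"],
        description=(
            f"{alert['severity'].upper()} alert on "
            f"{alert['device']}: {alert['message']}"
        ),
        source="monitoring",
        created_at=datetime.now(timezone.utc),
        assets=[alert["device"]],  # keep the affected asset attached
    )


ticket = alert_to_ticket(
    {"device": "web-server-01", "severity": "critical", "message": "Website down"}
)
```

Carrying the device name into the ticket is what later lets the affected asset be looked up in the CMDB.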
The next step is to communicate with stakeholders and inform them of this major incident. And for that, we'll make use of automation again, but this time it is the business rules. So business rules are condition-based actions, which ensure that there is no time delay in communicating major incidents. So, as soon as a ticket is logged with the subject as edified, not detected, or website down, this set of actions would be performed, such as setting the priority as a major incident and placing it in the appropriate support group so that they could kickstart the process of troubleshooting. You could also send notifications to specific stakeholders, and those notifications could be in the form of an email or an SMS. So, as you can see, that's how simple it is to communicate with stakeholders in real time. So this eliminates a major bottleneck.
So let me go back to the best practice workflow again and see where we are. So, as you can see, we detected the major incident, and we communicated immediately with our stakeholders. So the next step is to assess the damage that has happened and declare a major incident. So we saw how multiple monitoring alerts were created, which translated into multiple tickets. So you could link all these tickets together and ensure that you troubleshoot them as one major incident.
As you can see, on the right side over here, all the affected assets are also associated with this incident ticket. So let me click on Assets, and as soon as I do that, all the asset details are displayed. So, the detailed CI info, including hardware and software information and relationships, is obtained from the CMDB. This helps you ascertain whether major services will be affected or not. As you can see, admin services at an entire geographical location, which is Delhi here, would be affected because this server hosts these two services.
So what we do next is to go ahead and communicate this incident to your end users. So for that, let me go to the homepage of the technician and let's click on Add New Announcement. So, you can create a new announcement over here and ensure that it is displayed within a specific time frame, and you can choose to show it to just the right group of affected users. So, in this case, we saw that there were users at a particular location in a particular department who would be affected. So you can ensure that the rest of the end users can carry on with their duties, and only the affected end users are notified.
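A condition-based rule of this kind can be sketched as a condition paired with a list of actions. Everything here is hypothetical: the keywords, group name, and recipient addresses are placeholders, and the real business rules are configured in the product rather than written as code.

```python
from dataclasses import dataclass
from typing import Callable

Ticket = dict
Condition = Callable[[Ticket], bool]
Action = Callable[[Ticket], None]


@dataclass
class Rule:
    name: str
    condition: Condition    # when the rule applies
    actions: list[Action]   # what happens when it does


def subject_contains(*keywords: str) -> Condition:
    """Match tickets whose subject contains any keyword (case-insensitive)."""
    lowered = [k.lower() for k in keywords]
    return lambda ticket: any(k in ticket["subject"].lower() for k in lowered)


def set_field(name: str, value: str) -> Action:
    def action(ticket: Ticket) -> None:
        ticket[name] = value
    return action


def notify(*recipients: str) -> Action:
    """Record who should be notified; a real system would send email or SMS."""
    def action(ticket: Ticket) -> None:
        ticket.setdefault("notifications", []).extend(recipients)
    return action


def apply_rules(ticket: Ticket, rules: list[Rule]) -> Ticket:
    for rule in rules:
        if rule.condition(ticket):
            for action in rule.actions:
                action(ticket)
    return ticket


RULES = [
    Rule(
        name="Major incident routing",
        condition=subject_contains("website down", "not detected"),
        actions=[
            set_field("priority", "Major Incident"),
            set_field("group", "Incident Response Team"),  # placeholder group
            notify("cio@example.com", "cto@example.com"),  # placeholder addresses
        ],
    )
]

ticket = apply_rules({"subject": "Website down in Delhi"}, RULES)
other = apply_rules({"subject": "Printer out of toner"}, RULES)
```

Because the rule fires at ticket creation, the stakeholders are notified without waiting for a technician to triage the ticket by hand.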
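The impact assessment amounts to walking the CMDB relationships outward from the failed asset. Here is a minimal sketch using a breadth-first traversal; the CI names and the shape of the relationship map are invented for illustration.

```python
from collections import deque

# Hypothetical CMDB relationships: each CI maps to the CIs that depend on it.
DEPENDENTS = {
    "delhi-server-01": ["admin-portal", "payroll-service"],
    "admin-portal": ["admin-services-delhi"],
    "payroll-service": ["admin-services-delhi"],
    "admin-services-delhi": [],
}


def impacted_cis(failed_ci: str, dependents: dict[str, list[str]]) -> list[str]:
    """Return every CI that directly or indirectly depends on `failed_ci`."""
    seen = {failed_ci}
    impacted = []
    queue = deque([failed_ci])
    while queue:
        ci = queue.popleft()
        for child in dependents.get(ci, []):
            if child not in seen:  # avoid revisiting shared dependents
                seen.add(child)
                impacted.append(child)
                queue.append(child)
    return impacted


impacted = impacted_cis("delhi-server-01", DEPENDENTS)
```

In this example the single server failure reaches both hosted services and the business service built on them, which is the kind of finding that justifies declaring a major incident.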
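Targeting an announcement comes down to two checks: is the current time inside the display window, and does the user belong to the affected group. The sketch below assumes a simple user record with a location and a department; these types are illustrative and not the product's schema.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class User:
    name: str
    location: str
    department: str


@dataclass(frozen=True)
class Announcement:
    title: str
    show_from: datetime
    show_until: datetime
    locations: frozenset[str]
    departments: frozenset[str]

    def visible_to(self, user: User, now: datetime) -> bool:
        """Show only inside the time window and only to affected users."""
        in_window = self.show_from <= now <= self.show_until
        affected = (
            user.location in self.locations
            and user.department in self.departments
        )
        return in_window and affected


notice = Announcement(
    title="Admin services unavailable, fix in progress",
    show_from=datetime(2024, 1, 5, 13, 45),
    show_until=datetime(2024, 1, 5, 18, 0),
    locations=frozenset({"Delhi"}),
    departments=frozenset({"Admin"}),
)
users = [
    User("Asha", "Delhi", "Admin"),
    User("Ravi", "Delhi", "Finance"),
    User("Mei", "Chennai", "Admin"),
]
now = datetime(2024, 1, 5, 14, 0)
audience = [u.name for u in users if notice.visible_to(u, now)]
```

Only the Delhi admin user sees the notice, so everyone else can carry on without being distracted by an incident that does not affect them.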
So, let us, again, go back to our best practice workflow and see how far we have arrived. So as you can see, we detected the occurrence of the major incident, we communicated it, we assessed the impact, and we communicated it via announcements. So, all that is left is to delegate different tasks, kickstart the process of troubleshooting, which is providing a workaround. So, for that, we will go back to the incident ticket and click on tasks. So, we discussed a lot about tasks previously in the service request.
So, in the same way, you could create different tasks assigned to different groups and ensure that you configure the right dependencies. And dependencies here matter a lot because triaging is really necessary, and in a major incident, you need to follow a defined procedure. So, once the tasks have been done and a workaround provided, you need to add it to your resolution. So you could go ahead and add your resolution over here. If the incident had not occurred previously, you could add the resolution to your knowledge base. So this will help you to combat future occurrences of the same incident.
Now, we have completed the boundary or the domain of incident management. Now, all that is left is to create a problem ticket and initiate a root cause analysis as to what caused this incident in the first place. For this, you could go to Associations over here and click on Create a New Problem. So what happens here is all the details are carried over, you create a new problem, and your technician or an appropriate technician group performs a root cause analysis. That's how easy it is. Now it looks very easy, right? We have overcome all the bottlenecks that the web performance company faced. We ensured that we understood the best practice and applied it to your ITSM approach with the right capabilities.
Now, as with the previous service request scenario, we need to keep track of some essential metrics. This is very important because only then would you know what gaps are present in your strategy and how well you can bridge them.
So, here you have your ticket volume, your technician productivity, your resolution time, and again, your ticket churn. So I would advise you to create completely separate dashboards for dealing with your major incidents. So, as you can see, I've created a major incident management dashboard over here, which represents in real time the major incidents by category, by technician, and the number of incidents closed by technician. So, that brings us to the end of our second scenario. By now you should be confident of handling any future major incident.
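Task dependencies like these form a directed graph, and a valid execution order is a topological sort of that graph. The sketch below uses Python's standard `graphlib` module (Python 3.9 or later); the task names are made up for illustration.

```python
from graphlib import TopologicalSorter

# Hypothetical tasks: each maps to the set of tasks that must finish first.
TASKS = {
    "Confirm scope of outage": set(),
    "Roll back latest change": {"Confirm scope of outage"},
    "Verify services are online": {"Roll back latest change"},
    "Record workaround in resolution": {"Verify services are online"},
}


def execution_order(tasks: dict[str, set[str]]) -> list[str]:
    """Return the tasks in an order that respects every dependency.

    Raises graphlib.CycleError if the dependencies contradict each other.
    """
    return list(TopologicalSorter(tasks).static_order())


order = execution_order(TASKS)
```

Checking for cycles up front is worthwhile: a circular dependency discovered in the middle of a major incident would stall the resolver groups.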
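The dashboard figures mentioned here can be derived from a plain list of incident records. This is a sketch on invented sample data; the field names and values are assumptions, not an export format from any tool.

```python
from collections import Counter
from datetime import datetime
from statistics import mean

# Invented sample data; `closed` is None while an incident is still open.
INCIDENTS = [
    {"category": "Network", "technician": "asha",
     "opened": datetime(2024, 1, 5, 13, 42), "closed": datetime(2024, 1, 5, 14, 42)},
    {"category": "Network", "technician": "ravi",
     "opened": datetime(2024, 1, 9, 9, 0), "closed": datetime(2024, 1, 9, 11, 0)},
    {"category": "Application", "technician": "asha",
     "opened": datetime(2024, 1, 12, 8, 0), "closed": None},
]


def dashboard(incidents: list[dict]) -> dict:
    """Summarise volume by category, closures by technician, and resolution time."""
    closed = [i for i in incidents if i["closed"] is not None]
    minutes = [(i["closed"] - i["opened"]).total_seconds() / 60 for i in closed]
    return {
        "by_category": Counter(i["category"] for i in incidents),
        "closed_by_technician": Counter(i["technician"] for i in closed),
        "mean_resolution_minutes": mean(minutes) if minutes else None,
    }


summary = dashboard(INCIDENTS)
```

Open incidents are counted in the volume but left out of the resolution time, so one long-running incident does not distort the average until it is actually closed.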
Major Incident Management
This document discusses best practices for managing major incidents and provides a case study example. It addresses the challenges of poor communication during major incidents and common ineffective approaches. The case study highlights how a digital communications company reduced the time to engage stakeholders in incidents from minutes to seconds using a mobile app for targeted messaging and automated escalations. The document provides questions to start a dialogue on improving preparation, response, and reflection on major incident management processes.
- 1. xMatters Major Incident Management Best practices Real world case studies Confidential and Proprietary
- 2. Hi…my name is Anne Deming [email protected] Confidential and Proprietary 1/18/2019
- 3. How many MIMs a month?
- 4. A recent survey on Major Incidents • 32% - once a year or less • 42% - at least several times a year • 13% - at least once a month • 13% - at least once a week
- 5. Must haves – we won’t bury the lead • Stakeholder Alignment • Rapid Engagement • Determine who and when to engage • Identifying technical vs. management contacts • Intelligent Responses • Multi-modal Support • Integrations with Operations
- 6. The challenge you face… (because your systems are your brand) • Poor communication during Major Incidents can seriously impact IT
- 7. Common Major Incident Communication Approaches • Manual notifications • Email and distribution lists • Mass notification • No Precision • No Reliability • No Accountability • “Kitchen Sink” approach results in alert fatigue • Critical insights slip through the cracks • Limited follow-through to ensure action
- 8. Poor communication during a Major Incident has a big impact • Extended downtimes & missed SLA • Loss of revenue • Employee burnout • Tarnished image of IT • Frustrated customers • Resolvers not held accountable
- 9. A case study for a digital communications company • Digital communications company • Over 32 million connections to its network • 120 million calls every day in the network • Connects people, families & businesses (their customers) • Critical connectivity for tracking / payments on a transit system
- 10. HIGHLIGHTS on using xMatters • We transformed the way they manage incidents, with impressive results: • Reduced the number of clicks to engage stakeholders • 2 minutes to send communications, an 88% reduction • 3 minutes to engage stakeholders, an 85% reduction
- 11. HOW? • A mobile app • Targeted messages • Automated escalations • Communications are easy-to-build, easy-to-send • Templated communications
- 12. There is a better way
- 13. Preparation - Creating effective communication plans • Impact of outsourcing within global teams and 3rd party vendors • Maintaining up-to-date contact info and schedules • Developing standard communication • Defining the communication content • Pre-determined templates • Pre-determined engagement timelines for communications • Pre-determined blast radius (all or nothing is a bad idea)
- 14. Action – fire fighting • Activating the communication plan • Management of the recovery: Leadership roles, technical guidance • Inspiring rapid response from tech teams • Communicating effectively with necessary constituencies • Tracking communication plan status
- 15. Reflection – the post mortem • Root Cause Analysis after a Major Incident • What types of information gathered during a major incident can be leveraged to avoid recurrence • Tracking SLAs • Aggregating data for compliance/audit purposes • Remediating contact info errors • Driving improved performance – identifying weaknesses in the communication plan
- 16. A cheat sheet of questions to start the dialogue • Who to contact and how to contact them. • Incident Manager: When you assemble folks, how does this happen? • What about getting folks on the call? • Consider the staffing schedule: skeleton crew, 8-5, or are you 24x7? • What is the timeframe for communication with the business? • What about escalations? How do you handle them? There is MTTE and time to become effective. Some MIMs take 30 hours. • A culture consideration: How to identify the right people to contact? Is it changing behavior or is enforcement an issue? • How do you measure if you are overwhelming one resource vs. another? Are you tracking? • How many steps are in your MIM process? Commonly we hear a 5-step process. (Could yours be adjusted?) • How & when will you engage 3rd parties (hosting providers or technology vendors) during a Major Incident? • When outsourcers escalate back to the customer, how will you handle SLAs on the customer side? • Have you considered communication for upgrades or downgrades, P2 to P1 or P1 to P2? • Are there criteria to go straight to MIM, red alert, and bypass triage?
- 17. Thank you….questions?
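Slide 10 quotes the times after the change and the percentage reductions but not the starting times. As a worked check, the implied baseline follows from baseline = after × 100 / (100 − reduction %). The baselines computed below are derived from the slide's figures, not stated in the deck.

```python
def implied_baseline(after_minutes: float, reduction_percent: float) -> float:
    """Back out the original duration from the reduced one.

    If `after` is what remains following an r% reduction, then
    after = baseline * (100 - r) / 100, so baseline = after * 100 / (100 - r).
    """
    if not 0 <= reduction_percent < 100:
        raise ValueError("reduction_percent must be in [0, 100)")
    return after_minutes * 100 / (100 - reduction_percent)


# Figures from slide 10; the results are implied, not quoted in the deck.
send_before = implied_baseline(2, 88)    # about 16.7 minutes to send communications
engage_before = implied_baseline(3, 85)  # 20 minutes to engage stakeholders
```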
Incident management for high-velocity teams
How to run a major incident management process, managing and resolving high-impact incidents.
Major incident management (often known here at Atlassian simply as incident management) is the process used by DevOps and IT Operations teams to respond to an unplanned event or service interruption and restore the service to its operational state.
What is a major incident?
So, what constitutes a major incident? A major incident is an emergency-level outage or loss of service.
The definition of emergency-level varies across organizations. At Atlassian, we have three severity levels and the top two (SEV 1 and SEV 2) are both considered major incidents.
If a customer-facing service is down for all Atlassian customers, that’s a SEV 1 incident. If the same service is down for a sub-set of customers, that’s SEV 2. Both fall under the heading of major incident and require an immediate response from our incident management teams.
Any issue that does not interfere with essential tasks is considered a SEV 3 and is not a major incident.
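The severity rules above can be written down as a small classifier. This is a deliberate simplification that uses only the two criteria named in the text (whether a customer-facing service is down, and whether all customers are affected); a real severity matrix would weigh more factors.

```python
from enum import IntEnum


class Severity(IntEnum):
    SEV1 = 1
    SEV2 = 2
    SEV3 = 3


def classify(service_down: bool, all_customers_affected: bool) -> Severity:
    """Simplified reading of the rules in the text, not an official matrix."""
    if service_down and all_customers_affected:
        return Severity.SEV1
    if service_down:
        return Severity.SEV2  # down for a subset of customers
    return Severity.SEV3


def is_major(severity: Severity) -> bool:
    """SEV 1 and SEV 2 both count as major incidents."""
    return severity in (Severity.SEV1, Severity.SEV2)
```

For example, `classify(True, False)` yields `Severity.SEV2`, which still counts as a major incident and calls for an immediate response.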
Defining your major incident management process
The incident lifecycle (also sometimes known as the incident management process) is the path we take to identify, resolve, understand, and avoid repeating incidents.
Incident management processes vary from company to company, but the key to success for any team is clearly defining and communicating severity levels, priorities, roles, and processes up front — before a major incident arises.
To gain a shared understanding of priorities, roles, and processes, any team that’s starting or revisiting their major incident management process should start by getting clear on the answers to questions like:
- What constitutes a major incident for our company/product?
- How will we define severity and priority levels of incidents? If more than one major incident happens at one time, how will we know what to tackle first?
- Who is responsible for handling major incidents? What roles will team members have? How will those roles be defined and communicated?
- What process will teams follow in the event of a major incident? Is there more than one process, depending on the type of incident?
- How often will we communicate with stakeholders--both internal and external? What is our communication plan?
- What will our on-call schedule look like for major incidents? Who is responsible for an incident at 2 a.m.? On a weekend? Over the holidays?
- When and how should we alert our on-call incident manager--prioritizing quick resolution for major incidents while also avoiding alert fatigue?
Atlassian’s major incident management process
At Atlassian, our incident management process includes detection, raising a new incident, opening comms, assessing, sending initial comms, escalation, delegation, sending follow-up comms, review, and resolution.
First, an incident is detected either by our technology, customer reports, or personnel. Whoever detects the incident (be it a technician who notices the issue or a customer service rep who gets a call from a frustrated client) is responsible for logging the incident in our system and identifying a severity level.
By the time an incident reaches our teams, it’s already got a SEV 1, 2, or 3 attached. We consider SEV levels 1 and 2 to be major incidents, while a SEV 3 indicates a lower-impact incident.
Raising a new incident
Once an incident ticket is created, a notification goes out to the on-call professional responsible for that service.
The page alert we send out at Atlassian includes information on the severity and priority of the incident, as well as a summary, making it clear—at a glance—whether this is the top priority or can wait if another incident is in progress.
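To illustrate how severity and priority on a page let the on-call responder decide what to handle first, here is a small sketch. The `Page` record and the headline format are hypothetical and do not represent Atlassian's actual alert payload.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Page:
    severity: int  # 1 is the most severe
    priority: int  # 1 is the most urgent
    summary: str

    def headline(self) -> str:
        """One line that shows, at a glance, how urgent the page is."""
        return f"[SEV {self.severity} / P{self.priority}] {self.summary}"


def triage_order(pages: list[Page]) -> list[Page]:
    """Most severe first; priority breaks ties within a severity level."""
    return sorted(pages, key=lambda p: (p.severity, p.priority))


pages = [
    Page(2, 1, "Checkout errors for some customers"),
    Page(1, 2, "Login down for all customers"),
    Page(1, 1, "Site returns 502 for all customers"),
]
queue = triage_order(pages)
```

With several incidents in flight, the first item in the sorted queue is the one to work on, and the rest can wait.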
Opening comms
Once the incident manager gets an alert, their first order of business is to communicate that the incident fix is in progress. They change the status of the incident to fixing and set up the team’s communication channels.
It's imperative to offer flexible communication channels throughout the incident response process that allow teams to stay in touch by their preferred method. Jira Service Management integrates multiple communications channels to minimize downtime, such as embeddable status widget, dedicated statuspage, email, chat tools, social media, and SMS.
The incident manager has been alerted and the communication channels are open. Next step: assessing the incident itself.
For our teams, this process starts with a series of questions the team has to answer:
- What’s the impact on Atlassian’s customers and employees?
- What are customers seeing?
- How many customers are affected? (Some? All?)
- When did the incident start?
- How many support cases have been opened about this incident?
- Are there other factors at play that impact the severity level or priority or change the way we approach the incident? (E.g. security concerns, social media PR crises, etc.)
Once we’ve answered those questions, we can confidently move forward with diagnostics and proposed fixes or change the SEV level and priority level of an incident as needed.
Sending initial comms
Once we’ve confirmed that the incident is real, communication with our customers and employees becomes top priority. As we say in our handbook:
“The goal of initial internal communication is to focus the incident response on one place and reduce confusion. The goal of external communication is to tell customers that you know something’s broken and you’re looking into it as a matter of urgency.”
Speedy, accurate communication helps build and keep customer trust.
We have a strategic incident communication plan and provide regular status updates that follow a simple format. We also send an email to a set list of stakeholders that includes our engineering leadership, major incident managers, and other key internal staff. As previously mentioned, all of these communication methods are customizable within Jira Service Management and can be tailored to any organization's incident response plan.
Sometimes, an incident is resolved quickly by the on-call team. But in cases where that doesn’t happen, the next step is to escalate the issue to another expert or team of experts better suited to resolve this specific incident.
In Jira Service Management, responders can group related tickets and add collaborators to the issue to coordinate alerts. Responders can also automatically record all actions with a rich incident timeline and access automation and knowledge base articles to rapidly investigate and remediate incidents.
Once the issue has been escalated to someone new, the incident manager delegates a role to them. At Atlassian, these roles are pre-set, so team members can quickly understand what’s expected of them.
Sometimes major incidents require a single incident manager and a small team. Other times, a situation may call for multiple tech leads or even multiple incident managers. The original incident manager is tasked with figuring out when that’s the case and bringing on the appropriate people.
Sending follow-up comms
As the incident continues to progress, another round of communication outside the tech team will help keep customers and employees calm, trusting, and in the loop. This is easy when collaborators can manage alerts across different communication platforms to stay on top of incident response.
Unfortunately, when it comes to incident resolution, there’s no one-size-fits-all. Which is why at this stage of the process, we take the time to:
- Observe what’s going on, sharing and confirming observations with the team
- Develop theories about why it’s happening (and how we can fix it)
- Develop and execute experiments that prove or disprove our theories
Throughout this process, the incident manager keeps a close eye on how things are going. Are particular team members overtasked? Does someone need a break? Do we need to bring in a fresh set of eyes? More delegation happens as needed.
Our incident handbook defines resolution as “when the current or imminent business impact has ended.”
At this point, the emergency has passed and the team transitions into clean-ups and postmortems.
Postmortems
Our incident lifecycle ends when the incident is resolved, but that isn’t the end of our process at Atlassian. We also want to do everything in our power to ensure an incident doesn’t repeat. Which is why the next step is a blameless postmortem, designed to identify the cause of an incident and help us mitigate our risk in the future.
Use postmortem templates with Jira Service Management to easily create and export post-mortem reports—along with associated incident timelines—to Confluence so responders can continue to collaborate with cross-functional teams to track follow-up actions and avoid similar incidents in the future.
Roles and responsibilities
Roles and responsibilities will vary based on your organization’s culture, team size, on-call schedules, and more. Some common major incident roles include:
Incident manager: The person responsible for overseeing the resolution of the incident.
Tech lead: A senior-level tech pro tasked with figuring out what’s broken and why, determining the best course of action, and running the tech team.
Communications manager: A communications pro (often from the PR or customer support teams) responsible for communicating with internal and external customers impacted by the incident.
Customer support lead: The person in charge of making sure incoming tickets, phone calls, and tweets about the incident get a timely, appropriate response.
Social media lead: A social media pro in charge of communicating about the incident on social channels.
Other common roles include:
Root cause analyst or problem manager: The person responsible for going beyond the incident’s resolution to identify the root cause and any changes that need to be made to avoid the issue in the future.
Major incident investigation board: A group responsible for investigation and change management.
An incident management solution like Jira Service Management will help in each step of the response process, from organizing your on-call schedule and alerting to unifying teams for better collaboration to running incident postmortems.
Incident Management Case Studies
GREAT CUSTOMERS MAKE GREAT CASE STUDIES
See great incident management case studies using incidentcontrolroom.com®.
Human Vaccine Facility
Case Study – Nationwide Platforms (Lavendon Group plc)
Case Study – Kildare County Fire Service
Dundalk Institute of Technology
University College Cork (UCC)
Case Study: Bausch & Lomb
Company-wide cybersecurity awareness program for all employees, to decrease incidents and support a successful cybersecurity program.
Compliance, training, and knowledge products for essential and important organizations.
Documentation to comply with NIS 2 (cybersecurity), GDPR (privacy), ISO 27001 (cybersecurity), and ISO 22301 (business continuity).
Implementation, training, and knowledge products for manufacturing companies.
Documentation to comply with ISO 9001 (quality), ISO 14001 (environmental), and ISO 45001 (health & safety), and NIS 2 (critical infrastructure cybersecurity).
Accredited courses for individuals and professionals who want the highest-quality training and certification.
Get instant answers to any questions related to ISO 9001 (QMS) and ISO 14001 (EMS) using Advisera’s proprietary AI-powered knowledge base.
Implementation, training, and knowledge products for transportation & distribution companies.
Implementation, training, and knowledge products for schools, universities, and other educational organizations.
Documentation to comply with ISO 27001 (cybersecurity), ISO 9001 (quality), and GDPR (privacy).
Get instant answers to any questions related to ISO 27001 (ISMS) and ISO 9001 (QMS) using Advisera’s proprietary AI-powered knowledge base.
Implementation, maintenance, training, and knowledge products for telecoms.
Implementation, maintenance, training, and knowledge products for banks, insurance companies, and other financial organizations.
Documentation to comply with ISO 27001 (cybersecurity), ISO 22301 (business continuity), GDPR (privacy), and NIS 2 (critical infrastructure cybersecurity).
Implementation, training, and knowledge products for local, regional, and national government entities.
Documentation to comply with ISO 27001 (cybersecurity), ISO 9001 (quality), GDPR (privacy), and NIS 2 (critical infrastructure cybersecurity).
Implementation, training, and knowledge products for hospitals and other health organizations.
Documentation to comply with ISO 27001 (cybersecurity), ISO 9001 (quality), ISO 14001 (environmental), ISO 45001 (health & safety), NIS 2 (critical infrastructure cybersecurity) and GDPR (privacy).
Implementation, training, and knowledge products for the medical device industry.
Documentation to comply with MDR and ISO 13485 (medical device), ISO 27001 (cybersecurity), ISO 9001 (quality), ISO 14001 (environmental), ISO 45001 (health & safety), NIS 2 (critical infrastructure cybersecurity) and GDPR (privacy).
Implementation, training, and knowledge products for the aerospace industry.
Documentation to comply with AS9100 (aerospace), ISO 9001 (quality), ISO 14001 (environmental), and ISO 45001 (health & safety), and NIS 2 (critical infrastructure cybersecurity).
Implementation, training, and knowledge products for the automotive industry.
Documentation to comply with IATF 16949 (automotive), ISO 9001 (quality), ISO 14001 (environmental), and ISO 45001 (health & safety), and NIS 2 (critical infrastructure cybersecurity).
Implementation, training, and knowledge products for laboratories.
Documentation to comply with ISO 17025 (testing and calibration laboratories), ISO 9001 (quality), and NIS 2 (critical infrastructure cybersecurity).
Branimir Valentic
Major Incident Management – when the going gets tough…
Various authors have discussed Incident Management here on several occasions. Because it is one of the most thoroughly elaborated key functions, there are a number of issues we could address in depth. Major incident management is one of them, and due to its significant impact and visibility, it deserves a few more words.
ITIL Incident Management Overview
According to ITIL, any unplanned interruption or degradation of a service is considered an incident. Once an incident happens (and incidents will happen), the primary goal of ITIL Incident Management is to restore service as quickly as possible in order to minimize the business impact. Any event that disrupts, or could disrupt, a service is within the scope of incident management. For example, a single failed disk drive within a mirrored array does not cause an interruption, but the service is degraded in the sense that the risk of data loss has increased, and that is why such an event is also considered an incident.
ITIL Incident Management, as part of ITIL Service Management, is responsible for incident identification, logging and categorization. Reports about incidents may come from the Service Desk (by call, e-mail, web), from event management or directly from technical staff, but all of them have to be recorded and time-stamped, and must contain sufficient data to be properly managed.
In order to manage incidents effectively, we need a means to prioritize them, because they rarely appear one at a time. We prioritize incidents using an Impact vs. Urgency matrix. Impact is the effect an incident has on the business, and Urgency defines how long the business (or customer) is prepared to wait for a resolution. For example, we may have a high-impact incident (high level – 1) affecting the whole finance department, but with low urgency (low level – 3) because the department uses that service only at the end of the fiscal year, which is 6 months away. In such a scenario, the incident is assigned moderate priority – 3. The time frames in which each priority level is expected to be resolved are part of the Service Level Agreement (SLA). Read this blog post for more information: All About Incident Classification.
Incidents are generally the result of errors or malfunctions within IT equipment. In such cases, the root cause is apparent, and resolution is as simple as repairing the faulty part or applying a workaround. But where the incident is very serious, or an avalanche of similar incidents is recorded, the Problem Management process steps in and takes over the search for the root cause. Once the root cause has been identified, the problem is referred to as a known error and is registered in the Known Error Database (KEDB). The Service Desk, as a function of Incident Management, relies on the known error database and the workarounds it provides. If you need a tool that will help you manage incidents, here is the list of Free tools for ITSM that you may try, and use for free.
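The Impact vs. Urgency lookup can be sketched in a few lines of Python. This is only an illustration: the three-level scales come from the example above, but the additive rule (priority = impact + urgency - 1) and the labels other than "moderate" are assumptions chosen to reproduce that example, not something prescribed by ITIL. Each organization agrees on its own matrix.

```python
# Hypothetical priority labels; only "moderate" for level 3 comes from the text.
PRIORITY_LABELS = {1: "critical", 2: "high", 3: "moderate", 4: "low", 5: "planning"}


def priority(impact: int, urgency: int) -> int:
    """Return a priority level from 1 (highest) to 5 (lowest).

    Impact and urgency use 1 = high, 2 = medium, 3 = low.
    The additive rule is an assumed convention, not an ITIL requirement.
    """
    if impact not in (1, 2, 3) or urgency not in (1, 2, 3):
        raise ValueError("impact and urgency must each be 1, 2 or 3")
    return impact + urgency - 1


# The finance department example: high impact (1), low urgency (3).
level = priority(impact=1, urgency=3)
print(level, PRIORITY_LABELS[level])
```

With this rule, only an incident that is both highest impact and highest urgency reaches priority 1, which is where a major incident procedure would normally be considered.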
What is a major incident?
In theory, a major incident is a highest-impact, highest-urgency incident. It affects a large number of users, depriving the business of one or more crucial services. Business and IT have to agree on what constitutes a major incident. It is one of the rare occasions where ITIL is strict in terms of definition: it MUST be agreed on. ISO 20000 requirements on major incident management are short, but demanding: agreement, separate procedure, responsibility and review.
Who should be involved?
When a major incident occurs, roles and the process should be strictly defined. Mind you, we are talking about the roles here, not the actual day-to-day jobs. Roles will differ according to the size of the IT service management organization and the scope of its service management. Smaller organizations will tend to aggregate a few roles into one job definition, while larger organizations will elaborate sub-roles for each major incident type, customer or technical expertise field.
Major incident manager. Accountable for managing the overall procedure, making sure that the resources required for incident resolution are engaged and that the customer is appropriately informed about progress. This person should also have basic technical knowledge about the outage. In smaller service management organizations with a lower frequency of major incidents, this role will be taken by the Service Desk manager, who also acts as the Incident Manager. In larger organizations, the appointment of the major incident manager will depend on the particular area of expertise. It could be the technical account manager best acquainted with the specifics of the business organization concerned, or someone from the Technical management function or the Application management function.
Problem manager. This role will often have to be involved, since major incident resolution usually requires finding the underlying cause (root cause analysis) of the major incident. This role can’t be combined with the incident management role, due to the well-known conflict of interests between the incident management and problem management processes. The major incident team will be struggling to restore the service, and problem management tends to take its time finding the root cause.
Change manager. Involved in case some urgent changes have to be implemented to restore the service.
SLA manager. Must be informed in order to keep a record of the downtime and to inform the customer if the procedure requires this.
Service Desk. Responsible for keeping incident records up to date and for primary customer communication.
Communication
We mentioned the major roles in the process. Guess whose role is the most important, and who is often omitted from the loop? The customer! It is the most common mistake for growing service management organizations – to get so deeply involved in incident resolution that communication with the customer is neglected.
The moment you receive a call from the customer inquiring about resolution progress, you should know that something is wrong. The frequency, form and scope of communication with the customer should be clearly stated in the SLA. The customer should always know what to expect. His vital business process is endangered; he will be on his toes. Short, concise updates every half an hour, or at least every hour, should contain info about:
- Start of downtime
- Short description of the known cause of the downtime
- The impact of the downtime
- Estimated time for restoration
- Next scheduled information
The major incident team should devote its resources to service restoration, so the Service Desk should regularly ping the team to receive a quick update about the progress, which it will then forward formally to the customer.
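The update contents listed above can be captured in a small structure so that no field is forgotten under pressure. The sketch below is a minimal illustration: the class name, field names and the 30-minute default interval are assumptions made for this example, and the actual frequency and form should come from the SLA.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class StatusUpdate:
    """One customer update during a major incident (hypothetical structure)."""
    downtime_start: datetime
    known_cause: str
    impact: str
    estimated_restoration: str
    sent_at: datetime
    interval: timedelta = timedelta(minutes=30)  # assumed default; the SLA decides

    def next_update(self) -> datetime:
        # The next scheduled information is one interval after this update.
        return self.sent_at + self.interval

    def render(self) -> str:
        fmt = "%Y-%m-%d %H:%M"
        return "\n".join([
            f"Start of downtime: {self.downtime_start.strftime(fmt)}",
            f"Known cause: {self.known_cause}",
            f"Impact: {self.impact}",
            f"Estimated restoration: {self.estimated_restoration}",
            f"Next update: {self.next_update().strftime(fmt)}",
        ])


update = StatusUpdate(
    downtime_start=datetime(2024, 1, 1, 9, 40),
    known_cause="Storage controller failure (under investigation)",
    impact="Order entry unavailable for all users",
    estimated_restoration="Not yet known",
    sent_at=datetime(2024, 1, 1, 10, 0),
)
print(update.render())
```

Stating "Not yet known" for the restoration estimate is still better than leaving the field out, because the customer at least learns when the next update will arrive.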
The after party
The incident is resolved, the service is restored, and the customer returns to his day-to-day business. The aftertaste remains. Why did it happen? What is to be expected going forward – have we done anything to prevent these downtimes in the future? How do we deal with these questions?
In short, the best practice is to resolve the incident and to continue working on a related problem ticket. This will produce a so-called problem report, or at least a root cause analysis (RCA) report, delivered to the customer within a brief, SLA-defined period of time. This report should contain at least the following:
- Short description of the incident
- Downtime duration
- Short incident history
- How we resolved the incident
- What is the root cause
- A set of activities scheduled in order to prevent this kind of downtime
This report will soothe the customer’s concerns, and let him know that he is dealing with mature service management which understands his business needs, and is doing its best to protect his core business.
If you were the customer, what more would you expect?
15 Corporate Crisis Management Case Studies [2024]
In an era where corporate operations are under the microscope and the potential for a crisis looms large, effective crisis management is not just preferable; it is imperative. From environmental disasters to cyber attacks, companies across various industries have faced significant challenges that tested their resilience and operational integrity. This article delves into 15 definitive case studies of corporate crisis management, offering insights into the strategies employed by major organizations when stakes were high. These real-world examples illustrate how companies like Volkswagen, Johnson & Johnson, and Sony Pictures navigated complex crises, providing valuable lessons on the importance of rapid response, transparent communication, and a commitment to rectifying errors. By examining these cases, we aim to highlight the critical components of successful crisis management and the long-term impact of these events on corporate reputation and consumer trust.
1. Crisis Management at Johnson & Johnson: The 1982 Tylenol Poisoning
Company Profile
Johnson & Johnson, a renowned global healthcare company, has been a significant player in the pharmaceutical and consumer goods sector since its founding in 1886. With products ranging from baby care to medical devices, Johnson & Johnson has built a reputation for quality and trust, adhering to a philosophy emphasizing first responsibility to the consumers, doctors, and nurses who use their products.
In the autumn of 1982, a severe crisis hit Johnson & Johnson when cyanide-laced Extra-Strength Tylenol resulted in the deaths of seven individuals around Chicago. This sabotage endangered public health and threatened the reputation of one of its most trusted products, Tylenol, which accounted for a significant portion of the company’s profits.
The manner in which Johnson & Johnson managed this situation became a standard-setting example in corporate crisis resolution. The company immediately alerted consumers nationwide not to consume any of its Tylenol products, which was unprecedented. The company undertook a comprehensive recall of Tylenol, withdrawing around 31 million bottles from the market, which led to financial losses exceeding $100 million. Furthermore, the company cooperated fully with law enforcement agencies and the media to manage the situation transparently and keep the public informed. To restore consumer confidence, Johnson & Johnson developed tamper-resistant packaging, which included a triple-sealed package that would make it obvious if tampering had occurred.
Johnson & Johnson’s handling of the crisis resulted in a quick recovery for the Tylenol brand. The company’s swift and consumer-focused actions maintained and even bolstered consumer trust in the brand. Within a year of the crisis, Tylenol’s market share returned close to its pre-crisis level. Through its decisive actions, the company prevented further damage and established innovative safety benchmarks for the industry. The approach taken by Johnson & Johnson during the Tylenol crisis is frequently highlighted as a prime example of successful crisis handling.
2. Crisis Management at Toyota: The 2010 Accelerator Pedal Crisis
Founded in 1937 in Japan, Toyota Motor Corporation is renowned globally for manufacturing durable and premium-quality vehicles. Toyota has built a strong brand reputation on innovation, sustainability, and reliability principles, with a global presence and a commitment to pioneering advancements in automotive technology.
In 2010, Toyota faced a severe crisis when reports of unintended acceleration in several vehicle models surfaced. This issue was linked to several accidents, including fatalities, which raised serious safety concerns. The crisis was exacerbated by allegations of delayed response from Toyota, which damaged the company’s reputation for safety and reliability.
Toyota’s response involved multiple steps to address the crisis effectively. The company recalled over 8 million vehicles worldwide, one of the largest recalls in automotive history, to fix the faulty accelerator pedals and floor mats causing unintended acceleration. Toyota also halted the production and sale of eight models affected by the issue. To regain consumer trust, Toyota extended its warranties, set up a new rapid-response team to deal with safety complaints more quickly, and increased its focus on quality control and customer communication. The company’s president issued a public apology and testified before the U.S. Congress, committing to greater transparency and improved safety standards.
Toyota’s proactive measures and transparent communications gradually restored consumer trust. The company implemented stringent quality controls and revamped its safety technology, which led to the introduction of enhanced safety features in future models. Although Toyota initially faced significant financial losses, including fines from the U.S. government over its handling of the recalls, the company recovered over the following years. Toyota’s commitment to addressing the issues comprehensively helped it regain its position as a leader in the global automotive market, showcasing the importance of prompt and effective crisis management in maintaining brand integrity.
3. Crisis Management at Pepsi: The 1993 Syringe Hoax
PepsiCo, founded in 1898, is one of the world’s leading food and beverage companies. Known for its flagship product, Pepsi, the company offers various popular brands across more than 200 countries. PepsiCo is strongly committed to corporate responsibility and consumer satisfaction, which has helped it maintain a leading position in the competitive beverage industry.
In 1993, Pepsi faced a public relations crisis when claims surfaced about syringes allegedly found in cans of Diet Pepsi. The accusations quickly gained national attention, creating a potential consumer safety scare and threatening the brand’s image and trust.
PepsiCo responded swiftly and strategically to the crisis. The company immediately collaborated with the FDA to investigate the claims and quickly determined that the syringe reports were a hoax. PepsiCo used a transparent approach in its crisis management, utilizing the media to communicate directly with the public. The company produced videos showing the canning process to demonstrate that foreign objects couldn’t be inserted during production. These videos were shared with news outlets and played a crucial role in educating the public and dispelling the rumors.
PepsiCo’s effective use of media and rapid response helped to mitigate the impact of the hoax. Consumer confidence was restored, and the company’s transparent and proactive approach was praised in the media and by regulatory bodies. Sales, which had initially dipped sharply, rebounded within a few weeks. The 1993 syringe hoax case is often cited as a textbook example of effective crisis management, demonstrating how decisive action and clear communication can protect and even enhance a company’s reputation in the face of potential disaster.
4. Crisis Management at British Petroleum (BP): The Deepwater Horizon Oil Spill
British Petroleum, a leading global entity in the oil and gas sector, provides energy and retail services besides fuel for transportation. Founded in 1909, BP has operations in nearly 80 countries worldwide, with a strong commitment to delivering energy in a responsible manner, advancing low-carbon living, and improving every aspect of the energy system.
In April 2010, BP was embroiled in one of the most significant environmental and PR crises to date. An explosion on the BP-operated Deepwater Horizon oil rig led to a catastrophic oil spill in the Gulf of Mexico, one of the gravest environmental disasters on record. Eleven workers lost their lives, and millions of barrels of oil spilled into the Gulf over 87 days, causing extensive environmental damage to marine and wildlife habitats and tarnishing BP’s environmental and safety reputation.
BP’s response involved multiple strategies to manage the unfolding crisis. The company committed $20 billion to a fund for damages and initiated a massive cleanup operation involving thousands of people. BP also created a claims process for businesses and individuals affected by the spill. The company’s public relations tactics involved regular updates and leveraging social media to keep the public informed about its response measures. BP’s then-CEO made several high-profile media appearances to manage public perception, though some were criticized for poor handling.
The cleanup efforts lasted several years, with BP spending over $65 billion in cleanup costs, fines, and settlements. Despite initial heavy criticism and financial losses, BP restored some public trust through its response efforts and commitment to restoring the Gulf. The company overhauled its safety procedures and corporate governance structures to prevent future disasters. The crisis significantly impacted BP’s market value and reputation, but the firm remains a major player in the energy sector, with ongoing efforts aimed at safer energy practices and sustainability.
5. Crisis Management at Samsung Electronics: The Galaxy Note 7 Battery Fires
Established in 1969, Samsung Electronics has emerged as a technological and consumer electronics leader globally. Samsung, a pioneer in innovation, is recognized as a major producer of electronic components, including digital media devices, semiconductors, and integrated systems.
In August 2016, Samsung faced a severe crisis when reports emerged of its newly released Galaxy Note 7 smartphones catching fire due to faulty batteries. The incidents posed serious safety risks to consumers and led to negative media coverage, severely impacting Samsung’s reputation for quality and safety in the highly competitive tech market.
Samsung immediately recalled over 2.5 million Galaxy Note 7 devices just weeks after the product’s launch. The company issued replacements, but some of the new devices also caught fire, leading to a second recall and the eventual discontinuation of the product. Samsung set up investigation teams to find the cause of the battery failures, enhancing their quality assurance processes. The company was transparent in its communications, regularly updating the public and stakeholders about the steps to resolve the issue.
The Galaxy Note 7 crisis cost Samsung an estimated $17 billion and significantly dented the brand’s image. However, Samsung’s comprehensive recall and commitment to addressing all consumer concerns helped salvage customer loyalty. The company’s rapid response and transparency were crucial in managing the crisis. Samsung enhanced its battery safety protocols and quality assurance processes to avert similar future problems. By addressing the technical flaws and revamping their safety protocols, Samsung managed to recover and maintain its position as a leading innovator in the smartphone market.
6. Crisis Management at Chipotle: The E. coli Outbreaks
Chipotle Mexican Grill, founded in 1993 in Denver, Colorado, quickly became a popular fast-casual chain known for its fresh, high-quality ingredients and its commitment to sustainable and ethical food sourcing.
In late 2015, Chipotle faced a major crisis when multiple E. coli outbreaks linked to several restaurants surfaced across the United States. The outbreaks affected customers in over 14 states and led to a significant public health scare, severely tarnishing the brand’s reputation for food safety and quality. This crisis resulted in a sharp decline in customer visits and a significant drop in stock prices, threatening the company’s profitability and brand image.
Chipotle responded to the crisis by closing affected restaurants to conduct deep cleaning and full sanitation. The company cooperated with health officials to trace the source of the E. coli outbreak and overhauled its food safety procedures. Chipotle rolled out an extensive food safety initiative, modifying food handling and preparation techniques across its outlets. Chipotle launched a marketing campaign to regain customer trust and issued public apologies through various media platforms, reassuring the public about the safety measures being taken. The company also offered free food promotions to encourage customers to return.
Chipotle’s proactive measures and transparency in addressing the food safety issues helped slowly rebuild consumer trust. Although the company faced a steep initial decline in sales, it gradually recovered customer loyalty through its enhanced food safety protocols and ongoing customer engagement in its improvements. The crisis also prompted Chipotle to invest more heavily in food safety training and technology to ensure such an incident would not recur, thereby strengthening the brand’s commitment to quality and safety in the long term.
7. Crisis Management at United Airlines: The Passenger Removal Incident
United Airlines, established in 1926, is one of the world’s largest airlines, offering comprehensive flight schedules and serving millions of passengers annually. With a global network, United is known for its significant contributions to the aviation industry, including pioneering developments in customer service and safety.
In April 2017, a significant controversy arose when United Airlines forcibly removed a passenger from an overbooked plane at Chicago O’Hare International Airport. Video of the incident, widely viewed and shared across social platforms, showed the passenger being dragged along the airplane aisle, resulting in significant injuries. The footage sparked international outrage, highlighting issues with United’s customer service and policies on overbooking.
United Airlines initially struggled with its response, with a series of statements seen as insincere or defensive. However, the company soon shifted its approach by issuing a full apology from the CEO, who took personal responsibility for the incident. United announced a thorough review of its policies, especially concerning handling overbooked flights and interactions with passengers. The airline also introduced changes, including increased compensation for bumped passengers, reduced overbooking, and more employee training on customer service. Additionally, United settled a lawsuit with the affected passenger, which helped mitigate some of the negative publicity.
The crisis had an immediate negative impact on United’s reputation and stock value, but the comprehensive policy changes and public relations efforts helped the airline recover over time. United Airlines’ enhanced commitment to customer service and revised policies served to regain public trust and demonstrated the airline’s dedication to improving passenger experiences. The incident led to broader industry changes, prompting other airlines to modify their overbooking and customer service practices.
8. Crisis Management at Volkswagen: The Diesel Emissions Scandal
Volkswagen, founded in 1937 and headquartered in Wolfsburg, Germany, is one of the world’s largest and most recognized automobile manufacturers. Volkswagen has been renowned for its iconic vehicles, such as the Beetle and the Golf, which symbolize the company’s commitment to quality, reliability, and innovative design. Volkswagen stands committed to sustainability and the advancement of clean energy solutions within the auto industry.
In 2015, Volkswagen faced a monumental crisis when it was discovered that the company had installed software in diesel engine vehicles to manipulate emissions tests in the United States. This software made it appear that the vehicles met environmental standards when, in fact, they emitted pollutants at levels up to 40 times higher than what is allowed in the U.S. The scandal, known as “Dieselgate,” affected nearly 11 million vehicles worldwide and severely damaged Volkswagen’s reputation for trustworthiness and environmental stewardship.
Volkswagen took several steps to manage the crisis. The company immediately issued a public apology and admitted wrongdoing. Matthias Müller was appointed as the new CEO to replace Martin Winterkorn, who resigned amid the scandal. Volkswagen committed to recalling millions of affected vehicles and retrofitting them to meet environmental standards properly. The company allocated over €6.5 billion to cover costs related to the scandal, including settlements and fines. Volkswagen also launched a comprehensive internal investigation to hold responsible parties accountable and revamped its compliance and regulatory procedures to prevent future violations.
Volkswagen’s initial reaction to the crisis was condemned for lacking transparency and being slow. However, the company’s subsequent actions helped to stabilize the situation. Financially, Volkswagen suffered substantial losses, with billions in fines and legal costs and a significant drop in stock prices. However, Volkswagen has regained some of its market position by committing to electric vehicle technology and discontinuing much of its diesel model offerings. The company’s strategic pivot to electric vehicles and its investments in clean energy technologies have begun to restore consumer and investor confidence, positioning Volkswagen as a leader in the electric mobility future.
9. Crisis Management at Equifax: The 2017 Data Breach
Equifax Inc., one of the premier credit reporting agencies globally, offers analytical and financial data services to individuals and businesses. Founded in 1899 and based in Atlanta, Georgia, Equifax operates or has investments in 24 countries and is a pivotal component of the global financial infrastructure, tasked with managing and protecting the personal data of millions of people.
In September 2017, Equifax disclosed a severe data breach that compromised the sensitive data of roughly 147 million people, including driver’s license and Social Security numbers. The breach was one of the largest in history to threaten personal identity security. It severely damaged Equifax’s credibility and led to widespread public outrage, especially over the delayed disclosure and the inadequate security measures that failed to prevent the breach.
Equifax responded by waiving credit freeze fees for consumers who needed to protect their credit histories and offering free credit monitoring services. CEO Richard Smith retired, and Equifax appointed a new CEO to lead the crisis response and recovery efforts. The company overhauled its security infrastructure and increased technology and data protection investments. Equifax cooperated fully with various government investigations and committed to enhancing transparency and customer service to rebuild trust.
The data breach had far-reaching consequences for Equifax, including numerous lawsuits, Congressional hearings, and a significant decline in stock value. The company’s efforts to repair its reputation focused on rebuilding trust through better security practices and improved customer relations. Despite these efforts, recovery has been ongoing, with Equifax continuing to face challenges in fully restoring its image. The crisis highlighted the critical need for stringent cybersecurity measures and transparent corporate practices, especially for firms handling sensitive personal data.
10. Crisis Management at Sony Pictures: The 2014 Cyber Attack
Sony Pictures Entertainment, a major division of Sony Corporation, is a globally prominent entertainment firm based in Culver City, California. Sony Pictures, a dominant force in the media sector, significantly influences global culture and entertainment with its extensive range of film and television productions.
In November 2014, Sony Pictures experienced a devastating cyber attack by a group calling themselves the Guardians of Peace. The breach exposed extensive confidential data, including personal employee details, executive communications, and multiple unreleased films. The attackers demanded the cancellation of “The Interview,” a satirical film about a plot to assassinate the North Korean leader. The film allegedly motivated the attack, which led to international tensions and a significant crisis for Sony Pictures.
Sony initially pulled “The Interview” from its release schedule, citing threats to theaters and safety concerns. However, this decision faced widespread criticism for capitulating to the hackers’ demands. Sony then reversed its decision, releasing the film online and in select theaters. The company also worked closely with the FBI and cybersecurity experts to address the vulnerabilities and enhance its digital security infrastructure. Sony Pictures’ executives issued public apologies, particularly for the sensitive content revealed in emails, and took steps to bolster internal and external communications.
Sony Pictures’ handling of the cyber attack drew mixed reactions. While some criticized the initial decision to pull the release of “The Interview,” others praised the eventual release strategy as a stand for creative freedom. The incident led to a reevaluation of security strategies across the entertainment industry. Financially, the cyber attack cost Sony Pictures an estimated $100 million, not including the damage to its reputation. Over time, Sony Pictures managed to recover, implementing stronger cybersecurity measures and continuing to produce successful films and TV shows. The crisis underscored the importance of robust digital security practices and crisis management in the digital age.
11. Crisis Management at Wells Fargo: The Account Fraud Scandal
Wells Fargo, established in 1852, is one of the largest financial services companies in the United States, providing banking, investment, mortgage products, and consumer and commercial finance through thousands of locations and ATMs, with a vast presence online and in mobile apps. Known traditionally for its customer-centric approach, Wells Fargo has played a pivotal role in developing the American West and the financial services sector across the country.
In 2016, Wells Fargo was embroiled in a major scandal after it came to light that employees had, over several years, covertly opened millions of unauthorized accounts in the names of unwitting customers in an effort to meet sales quotas. The scandal led to widespread consumer mistrust and several investigations by regulatory bodies, tarnishing the bank’s reputation and resulting in significant financial penalties.
Wells Fargo took multiple steps to address the issues and reform their corporate practices. The bank fired over 5,300 employees involved in improper sales practices and eliminated sales goals for retail banking team members to prevent future abuses. Wells Fargo’s CEO resigned, and the company launched a nationwide advertising campaign to apologize to the public and its customers. The bank also agreed to pay $185 million in fines and provided millions in refunds to affected customers.
Wells Fargo’s response helped to address the immediate backlash and begin the process of rebuilding trust. The bank undertook extensive efforts to overhaul its corporate culture and governance structures to enhance transparency and accountability. Despite these efforts, Wells Fargo still faces ongoing challenges and scrutiny regarding its business practices, but it remains committed to rectifying past mistakes and restoring customer faith in its services. This incident is frequently highlighted as an essential lesson in the importance of ensuring that corporate actions are in harmony with consumer welfare and ethical norms.
12. Crisis Management at Mattel: The Toy Recalls
Established in 1945, Mattel is one of the giants of the toy industry, known for beloved brands such as Barbie, Hot Wheels, and Fisher-Price. The company has long been celebrated for its innovative toys and its commitment to safety and quality, fostering trust among generations of consumers.
In 2007, Mattel faced a significant crisis when it recalled over 19 million toys globally due to hazards from loose magnets and lead paint. These products were mostly manufactured in China, and the recall included popular items that risked harming children, severely impacting consumer confidence and exposing risks in global manufacturing and quality control processes.
Mattel responded to the crisis by swiftly recalling all affected products to prevent any harm to children. The company took full responsibility for the oversight and worked closely with regulatory agencies to ensure compliance with safety standards. Mattel implemented rigorous quality control systems to monitor and prevent future safety issues. Additionally, the CEO issued a public apology and made several media appearances to reassure the public of Mattel’s commitment to product safety and consumer trust.
Mattel’s proactive recall and transparent communications helped mitigate the damage to its reputation. The company’s decisive action and enhanced focus on product safety standards reinforced consumer trust. Mattel’s experience underscored the importance of rigorous product safety in manufacturing and the need for continuous improvement in quality control processes, especially when outsourcing production internationally. The crisis ultimately strengthened safety regulations in the toy industry, contributing to better practices industry-wide.
13. Crisis Management at Facebook: The Cambridge Analytica Scandal
Launched in 2004 by Mark Zuckerberg, Facebook quickly grew from a college social network into a global titan of social media, profoundly shaping how digital communication and media are consumed. With billions of users worldwide, Facebook has been at the forefront of technological innovation in social networking but has also faced significant scrutiny over its privacy and data management practices.
In 2018, Facebook was thrust into a major scandal when it emerged that Cambridge Analytica, a political consultancy, had unauthorized access to the data of roughly 87 million users. It is alleged that this information was used for manipulating voting behavior in the 2016 US presidential campaign. The scandal raised serious questions about Facebook’s user privacy protections and data sharing policies, leading to a global outcry and demands for stricter regulations.
Facebook responded by apologizing publicly and taking significant steps to restrict third-party developers’ access to user data. The company also overhauled its privacy settings to give users more control over their information and launched a comprehensive review of existing apps accessing large amounts of user data. CEO Mark Zuckerberg testified before the U.S. Congress and the European Parliament to address concerns about Facebook’s data use and privacy policies.
Facebook’s handling of the Cambridge Analytica scandal has led to ongoing challenges, including legal actions and continued scrutiny by regulators worldwide. Despite these difficulties, Facebook has made substantial changes to improve transparency and user data protection. The crisis highlighted the need for greater accountability and regulatory oversight in the tech industry, prompting discussions about data privacy that continue to influence global policy and user expectations.
14. Crisis Management at Boeing: The 737 Max Grounding
Founded in 1916, Boeing ranks among the foremost aerospace companies globally, as a leading maker of commercial jetliners as well as security, space, and defense systems. Based in Chicago at the time of the crisis, Boeing is renowned for its innovative contributions to the aviation industry, including developing some of history’s most popular and influential aircraft.
Boeing faced one of the most significant crises in its history following two fatal crashes involving its 737 Max aircraft, first in Indonesia in October 2018 and then in Ethiopia in March 2019. These catastrophic incidents, which together resulted in 346 fatalities, were attributed to defects in the plane’s Maneuvering Characteristics Augmentation System (MCAS). The revelations about these flaws, along with allegations of oversight lapses during the plane’s certification process, led to a global grounding of all 737 Max aircraft and severely damaged Boeing’s reputation for safety and reliability.
Boeing’s response to the crisis involved multiple steps to address the technical issues and restore trust with the public, regulators, and customers. The company halted deliveries of the 737 Max and focused on fixing the MCAS software to address the system’s vulnerabilities. Boeing worked closely with aviation authorities worldwide to ensure the revised system met safety standards. Additionally, Boeing established a $100 million fund to support the families and communities of the victims of the two crashes. The company also saw a leadership change, with CEO Dennis Muilenburg resigning to pave the way for a renewed corporate focus on safety and quality.
The 737 Max was grounded worldwide for nearly two years while Boeing worked to fix the issues and regain certification. The financial impact on Boeing was profound, with billions in lost revenue and additional costs. The crisis also sparked broader debates about aviation safety and regulatory oversight. While the 737 Max has since returned to service after extensive reviews and modifications, Boeing continues to work on rebuilding trust and demonstrating its commitment to safety. The long-term effects of the crisis on Boeing’s brand and financial health remain significant, underscoring the importance of stringent safety standards and transparent corporate governance.
15. Crisis Management at Target: The 2013 Data Breach
Target Corporation, founded in 1902 and headquartered in Minneapolis, Minnesota, is one of the largest retail chains in the United States. Known for offering various goods from clothing to electronics, Target prides itself on providing high-quality products at affordable prices, appealing to a wide demographic of shoppers with its trendy, upscale, yet budget-friendly product selections.
In December 2013, Target announced that it had been the victim of an extensive data breach, which compromised the personal and payment information of approximately 40 million customers. The breach occurred during the critical holiday shopping season and involved the theft of data from credit and debit cards used at Target’s stores. The breach exposed Target to significant financial losses and damaged its reputation for customer security and trust.
Target responded to the data breach by promptly informing the public and cooperating fully with law enforcement to investigate the security lapse. The company offered affected customers free credit monitoring and identity theft protection to mitigate the damage and prevent future fraud. Target also undertook a major overhaul of its security systems, implementing advanced technology like chip-and-PIN card readers at its registers to enhance security. Additionally, the company made significant changes in its executive leadership, including the resignation of its CEO, to reassure the public and stakeholders of its commitment to addressing the issue comprehensively.
The aftermath of the data breach saw Target grappling with lawsuits and a decline in consumer confidence, which temporarily impacted sales and stock prices. However, the retailer’s transparent handling of the situation and substantial investments in cybersecurity have helped it to slowly regain trust. Target’s extensive security upgrades and its efforts to address customer concerns proactively set a new standard for how retailers handle data security. Despite the initial fallout, Target has maintained its position as a leading retailer, demonstrating the resilience and importance of robust crisis management and recovery strategies.
The journey through these 15 corporate crisis management case studies reveals a common theme: the paramount importance of handling crises with strategic foresight and ethical consideration. These companies faced various repercussions, from financial losses to reputational damage, yet those that emerged stronger did so through comprehensive planning, clear communication, and genuine accountability. This collection not only showcases the trials organizations face during critical times but also highlights how crises can serve as catalysts for organizational recovery and improvement. For businesses worldwide, these narratives offer more than cautionary tales; they provide a blueprint for developing robust mechanisms to weather storms and safeguard both stakeholders’ interests and corporate legacies.
Team DigitalDefynd
When a significant IT-related service disruption occurred, a major Transport Agency realised the need to formalise its major incident management (MIM) process. Kirk Penn, from Service Management Specialist, was engaged to develop a single, standardised process for responding to, restoring service after, and recovering from major IT incidents.
The Transport Agency IT group supported over 30,000 internal staff and hundreds of complex systems. While the existing day-to-day incident management process worked well for lower priority incidents, major disruptions left the agency struggling to cope. The challenge was to establish a cross-agency major incident working group and get everyone to follow a single way of working under pressure, amidst competing operational priorities.
Kirk developed a strawman major incident communications model and approach that clarified inputs and triggers, roles and responsibilities, communication guidelines, and governance for technical and management conference bridges during a major incident. He also drafted a policy and process, creating simplified one-page overviews for each stakeholder to understand their role during each phase of the MIM process. Senior management were briefed, and a significant rollout campaign, including simulations, was undertaken to ensure all stakeholders were clear on their contributions in the event of a major incident.
The major incident management process was successfully implemented and remains a stable and valuable capability within the transport agency IT group. As a result, the Transport for NSW IT leadership gained confidence and endorsed further investment into resources and the centralisation of the MIM function. Additionally, a version of the MIM communication model was adopted for managing P2 incidents, providing increased controls for lower priority incidents and reducing the likelihood of these becoming major incidents.
Incident Management Case Studies Contact Enquiries regarding the content and any use of this document are welcome at: The Australian Institute for Disaster Resilience Level 1, 340 Albert Street, East Melbourne Vic 3002 Telephone +61 (0) 3 9419 2388 Email [email protected] This document complements Incident Management (2023).
Feb 5, 2019 · The case study highlights how a digital communications company reduced the time to engage stakeholders in incidents from minutes to seconds using a mobile app for targeted messaging and automated escalations. The document provides questions to start a dialogue on improving preparation, response, and reflection on major incident management ...
Major incident investigation board: A group responsible for investigation and change management. An incident management solution like Jira Service Management will help in each step of the response process, from organizing your on-call schedule and alerting to unifying teams for better collaboration to running incident postmortems.
This role can’t be combined with the incident management role, due to the well-known conflict of interests between the incident management and problem management processes. The major incident team will be struggling to restore the service, and problem management tends to take its time finding the root cause. Change manager.
Feb 22, 2023 · In 2017, British Airways experienced a major IT outage that affected its check-in, baggage handling, and customer service systems. ... Incident management case studies provide valuable insights ...
The Major Incident Management Room is a room that is assigned to be commandeered by the Major Incident Team in the event of a Major Incident, regardless of it being used at the time or not. The room should be kitted out with the necessary kit and facilities to manage
SMS Case Study 4 | Developing a Single Enterprise Major Incident Management Process for a Transport Agency.