What We Can Learn from the Recent Microsoft Outage
As people who live and breathe SaaS, this incident got us thinking. What can we learn from it? How can we, as startup founders, make sure we’re better prepared when (not if) something similar happens to us?
Hey folks! There’s a good chance you were among those hit by the Microsoft outage, which affected an estimated 8.5 million Windows devices. If not, you’ve at least heard about it or seen the memes and posts from disgruntled flyers at airports, happy employees with an excuse not to work, and panicked bank executives.
It was a big deal. You’re sipping your morning coffee one minute, and the next, your trusty Microsoft services are down. The outage triggered the ‘blue screen of death’ on machines around the world, locking users out of their systems.
As people who live and breathe SaaS, this incident got us thinking.
What can we learn from it? How can we, as startup founders, make sure we’re better prepared when (not if) something similar happens to us?
Let’s dive into it.
First off, it was a wake-up call for all tech companies and users: outages happen. Even to the giants. This one came down to a simple logic error triggered by a new security update from CrowdStrike, which crashed Windows machines worldwide. Microsoft and CrowdStrike are both giants of the tech industry, and with all their resources, they still couldn’t avoid it.
For us in the startup world, it’s a humbling reminder that no matter how robust we think our systems are, there’s always a possibility of things going sideways. So, what should we do about it?
Building Resilience
We hear the word "resilience" tossed around a lot, but what does it really mean for us? It’s not just about having backups; it’s about thinking ahead and being ready for the unexpected. During the Microsoft outage, businesses with solid contingency plans in place managed to keep their operations running smoothly. Others had to go back to the stone age. IndiGo, for instance, had to handwrite boarding passes and check in passengers manually while juggling scheduling issues, which made for a poor flyer experience. If only they’d had a contingency in place. So, ask yourself: what’s your backup plan? Are your systems designed to handle a hiccup without falling apart?
Think about multi-cloud strategies. If one cloud provider goes down, can your services switch over to another seamlessly? It’s worth investing in. And yes, it might feel like you’re prepping for a disaster that may never come, but trust me, when it does, you’ll be glad you did. Most of us had no contingency for a pandemic before Covid hit; now most businesses do.
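To make the switch-over idea concrete, here’s a minimal sketch of what failover can look like in code. It’s an illustration under assumptions, not a production setup: `primary_cloud` and `secondary_cloud` are hypothetical stand-ins for whatever real provider clients you use, and `call_with_failover` is a helper name we made up for this example.

```python
from typing import Callable, List, Sequence, Tuple


class AllProvidersDown(Exception):
    """Raised when every configured provider fails."""


def call_with_failover(
    providers: Sequence[Tuple[str, Callable[[], str]]]
) -> Tuple[str, str]:
    """Try each provider in order and return (name, result) from the first success."""
    errors: List[str] = []
    for name, call in providers:
        try:
            return name, call()
        except Exception as exc:  # real code should catch narrower error types
            errors.append(f"{name}: {exc}")
    # Nothing worked: surface every failure so the on-call person sees the full picture.
    raise AllProvidersDown("; ".join(errors))


# Hypothetical stand-ins for real cloud clients (assumptions for this sketch).
def primary_cloud() -> str:
    raise ConnectionError("primary region unreachable")


def secondary_cloud() -> str:
    return "order saved"


used, result = call_with_failover(
    [("primary", primary_cloud), ("secondary", secondary_cloud)]
)
print(used, result)
```

The code is the easy part. The real work is keeping data and configuration in sync across providers so the second one can actually take over, which is why it pays to rehearse the switch before you need it.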
Communication Is Key
Let’s be real for a second. Outages are stressful. Your team will feel the pressure, and so will your customers. Being transparent and supportive goes a long way. Clear communication, empathy, and a calm approach can help everyone stay focused and productive.
For your customers, empathy is crucial. Acknowledge the inconvenience, offer support, and, if necessary, make it up to them. One thing Microsoft did relatively well during the outage was communication. They kept their users in the loop, although it took them a little longer than it should have. Imagine you’re running a marathon, and halfway through, you realise the route has changed. You’d want someone to tell you, right? Your customers feel the same during an outage. They need updates, even if it’s just to say, “Hey, we’re working on it.”
Have a communication plan in place. Use your social media channels, email, and any other platforms to keep your customers, internal and external, informed. It’s not just about fixing the problem but also about maintaining trust. Trust is key, especially when a startup is trying to build good customer relationships and loyalty.
Monitoring Is Key
If you’re not already monitoring your systems 24/7, start now. Seriously. The sooner you catch an issue, the quicker you can address it. During the outage, Microsoft’s monitoring systems helped them pinpoint the problem faster. CrowdStrike, on the other hand, didn’t catch the faulty code before the update went out. Use the best tools you can afford and ensure your team knows how to interpret the data.
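If you’re wondering what "catching an issue sooner" looks like in practice, here’s a small sketch of one common pattern: keep a rolling window of health-check results and raise an alert once too many of them fail. The `HealthMonitor` class and its thresholds are our own illustrative assumptions; a real setup would feed it results from actual probes and route alerts to your paging tool.

```python
from collections import deque


class HealthMonitor:
    """Tracks recent health checks and flags when too many have failed."""

    def __init__(self, window: int = 5, max_failures: int = 3) -> None:
        # Only the most recent `window` results are kept; True means the check passed.
        self.results = deque(maxlen=window)
        self.max_failures = max_failures

    def record(self, ok: bool) -> bool:
        """Record one check result and return True if an alert should fire."""
        self.results.append(ok)
        failures = sum(1 for passed in self.results if not passed)
        return failures >= self.max_failures


monitor = HealthMonitor(window=5, max_failures=3)
checks = [True, False, False, True, False]  # made-up sample data
alerts = [monitor.record(ok) for ok in checks]
print(alerts)
```

With this sample data the alert fires only on the last check, once three of the five most recent results are failures. Using a window rather than alerting on every single failure keeps one-off blips from waking the team at 3 a.m.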
Learn and Improve
Here’s the thing: an outage isn’t just a crisis. It’s also a learning opportunity. After the dust settles, do a deep dive into what happened. What went wrong? How did your team handle it? What can you do better next time? Don’t look at it only as a problem in the past; treat it as a lesson for the future.
Microsoft and CrowdStrike, like any smart company, should analyse this outage and make changes to prevent it from happening again. You should do the same. Every hiccup is a chance to improve, so try to see the glass as half full.
Wrapping Up
The Microsoft outage was a wake-up call for all of us in the tech world. It reminded us that no one is immune to technical issues. But with the right preparation, communication, and continuous improvement, we can turn these challenges into opportunities for growth.
So, fellow founders, let’s take this as a lesson. Let’s build resilient systems, communicate openly with our customers, and always strive to learn and improve. After all, in the tech world, it’s not about avoiding problems altogether – it’s about how we handle them when they come.