As the world recovers from the largest IT outage in history, the incident shows the danger of a single point of failure in IT infrastructure.

A global IT failure wreaked havoc on Friday, grounding flights and disrupting everything from hospitals to government agencies. Over all the chaos hung a question: how did a flawed update to Microsoft Windows software bring large swaths of society to a screeching halt?

The problem originated with an Austin, Texas-based cybersecurity firm called CrowdStrike, relied upon by most of the global technology industry, including Microsoft, for its Falcon program, which blocks the execution of malware and cyber-attacks. Falcon protects devices by securing access to a wide range of internal systems and automatically updating its defenses – a level of integration that means if Falcon falters, the computer is close behind. After CrowdStrike updated Falcon on Thursday night, Microsoft systems and Windows PCs were hit with a “blue screen of death” and rendered unusable as they were trapped in a recovery boot loop.

Microsoft is a juggernaut with significant market power, dominating cloud-computing infrastructure across Europe and the United States. So it wasn’t just computers that were affected, but servers and a host of other systems as well. Overwhelming requests from users, devices, services and businesses ushered in a cascading series of failures with Microsoft products – namely Azure Cloud and Microsoft 365. Failures plaguing Azure led to additional but separate disruptions with 365 services. A giant clusterfuck ensued.

  • yggdar@lemmy.world · ↑ 26 · edited · 4 months ago

    Am I missing something? I thought the outage was caused by CrowdStrike and had nothing to do with Microsoft or Windows?

    • pycorax@lemmy.world · ↑ 11 · 4 months ago

      The article actually talks about Azure, which was using CrowdStrike internally, so their point is valid, but the headline is absolutely wrong. Azure is nowhere near a monopoly, and the headline ends up implying that Windows, not Azure, was the issue they’re describing.

    • Blaster M@lemmy.world · ↑ 6 · 4 months ago

      This is the typical Guardian sensationalism. Gotta make it look like it was Microsoft’s fault, although this one falls squarely on CrowdStrike’s head. Imagine if a security update for a remote administration tool caused an on-boot kernel panic on every Linux server in the world…

    • hangonasecond@lemmy.world · ↑ 4 · 4 months ago

      Microsoft’s use of CrowdStrike meant that a significant number of their cloud and SaaS offerings also failed, impacting users who likely didn’t know what CrowdStrike was.

    • TrickDacy@lemmy.world · ↑ 1 · 4 months ago

      This is the extremely important akshually line anyway. Let’s all pretend that every OS is just as shitty because it lets us correct others on the Internet constantly

    • EtherWhack@lemmy.world · ↑ 1 · 4 months ago

      Only systems running CrowdStrike were affected, but all systems were Windows-based as that is the only OS it works with.

      I think it’s more touching on the vulnerability of infrastructure if a larger portion is run by only one OS. That’s something a lot of us here may realize, but the general public has never really understood it. A scenario like this, or a similar one, can cause a widespread blackout, all from a single bug, be it from popular software or the OS itself.

      • ImADifferentBird · ↑ 8 · edited · 4 months ago

        That’s not correct. CrowdStrike does also work with Mac and Linux, but this particular incident only impacted the Windows sensor.

        They actually had a similar issue with the Linux sensor a couple of months ago, which… doesn’t speak well of their update process.