The CrowdStrike Windows outage that hit the world this week stems back to an EU-Microsoft deal from 2009 that meant Microsoft had to give antivirus vendors the same Windows API access it had.
Personally, I don’t see the issue. Microsoft shouldn’t be responsible for when a third party creates a buggy kernel module.
And when you, as a company, decide to effectively install a low-level rootkit on all your machines in hopes that it will protect you against whatever, you accept the potential side effects. Last week, those side effects occurred.
Hard to say yet, if Microsoft is responsible or not. The thing is they certified it, as a stable and tested driver. But it isn’t just a driver, but an interpreter/loader that loads code at runtime and executes it. In kernel mode. If Microsoft knew this they’re definitely responsible for certifying it, but maybe crowdstrike hid this behavior until it was deployed to the customers.
A configuration file shouldn’t crash the kernel. I don’t understand how this solution could pass the certification. I don’t know the criteria of course, but on the surface it sounds like Crowdstrike created a workaround, and Microsoft either missed or allowed it.
How would you prove that no input exists that could crash a piece of code? The potential search space is enormous. Microsoft can’t prevent drivers from accepting external input, so there’s always a risk that something could trigger an undetected error in the code. Microsoft certainly ought to be fuzz testing drivers it certifies but that will only catch low hanging fruit. Unless they can see the source code, it’s hard to determine for sure that there are no memory safety bugs.
The driver developers are the ones with the source code and should have been using analysis tools to find these kinds of memory safety errors. Or they could have written it in a memory safe language like Rust.
You don’t need to prove that no input can crash the code. “Exhaustive testing is not possible” is one of the core testing principles, ISTQB teaches that. As far as we know, the input was a file filled with zeroes, and not some subtle configuration or instruction. That can definitely be expected, tested, and handled.
That said, their preliminary incident review doesn’t give us much to go on as to what was wrong with the file.
You’re speculating that it was something easy to test for by a third party. It certainly could have been but I would hope it’s a more subtle bug which, as you say, can’t be exhaustively tested for. Source code analysis definitely would have surfaced this bug so either they didn’t bother looking or didn’t bother fixing it.
You’re speculating that it was something easy to test for by a third party.
Based on the data that I have, which is of course very limited! I didn’t know about the recent news regarding the null bytes, thank you for sharing this info.
AFAIK, blue screen doesn’t mean kernel crash. Hell, windows crashing isn’t even rare.
Certification doesn’t mean it has Microsoft seal of approval either, only that it comes from a certified and approved vendor, with some checks at best.
Config files are not part of the driver, ever. How do you think you can change the settings of you GPU without asking Microsoft?
But hey, if you are so willing to blame Microsoft for the one time it’s not their fault, may I talk to you about our Lord Savior Linux? In my office we only knew because of the memes.
Personally, I don’t see the issue. Microsoft shouldn’t be responsible for when a third party creates a buggy kernel module.
And when you, as a company, decide to effectively install a low-level rootkit on all your machines in hopes that it will protect you against whatever, you accept the potential side effects. Last week, those side effects occurred.
Hard to say yet, if Microsoft is responsible or not. The thing is they certified it, as a stable and tested driver. But it isn’t just a driver, but an interpreter/loader that loads code at runtime and executes it. In kernel mode. If Microsoft knew this they’re definitely responsible for certifying it, but maybe crowdstrike hid this behavior until it was deployed to the customers.
It was my understanding that this wasn’t certified. Crowdstrike circumvented the signing process.
The driver was signed, the issue was with a configuration file for that’s not part of the driver.
A configuration file shouldn’t crash the kernel. I don’t understand how this solution could pass the certification. I don’t know the criteria of course, but on the surface it sounds like Crowdstrike created a workaround, and Microsoft either missed or allowed it.
How would you prove that no input exists that could crash a piece of code? The potential search space is enormous. Microsoft can’t prevent drivers from accepting external input, so there’s always a risk that something could trigger an undetected error in the code. Microsoft certainly ought to be fuzz testing drivers it certifies but that will only catch low hanging fruit. Unless they can see the source code, it’s hard to determine for sure that there are no memory safety bugs.
The driver developers are the ones with the source code and should have been using analysis tools to find these kinds of memory safety errors. Or they could have written it in a memory safe language like Rust.
You don’t need to prove that no input can crash the code. “Exhaustive testing is not possible” is one of the core testing principles, ISTQB teaches that. As far as we know, the input was a file filled with zeroes, and not some subtle configuration or instruction. That can definitely be expected, tested, and handled.
CrowdStrike have said that was not the problem:
That said, their preliminary incident review doesn’t give us much to go on as to what was wrong with the file.
You’re speculating that it was something easy to test for by a third party. It certainly could have been but I would hope it’s a more subtle bug which, as you say, can’t be exhaustively tested for. Source code analysis definitely would have surfaced this bug so either they didn’t bother looking or didn’t bother fixing it.
Based on the data that I have, which is of course very limited! I didn’t know about the recent news regarding the null bytes, thank you for sharing this info.
AFAIK, blue screen doesn’t mean kernel crash. Hell, windows crashing isn’t even rare.
Certification doesn’t mean it has Microsoft seal of approval either, only that it comes from a certified and approved vendor, with some checks at best.
Config files are not part of the driver, ever. How do you think you can change the settings of you GPU without asking Microsoft?
But hey, if you are so willing to blame Microsoft for the one time it’s not their fault, may I talk to you about our Lord Savior Linux? In my office we only knew because of the memes.
Maybe it should be. At least part of the package that’s signed.