If it's true that a bad patch was the reason for this, I assume someone, or multiple people, will have a really bad day today. It makes me wonder what kind of testing they have in place for patches like this; normally I wouldn't expect something to go out immediately to all clients, but rather as a gradual rollout. But who knows: Microsoft keeps their master keys on a USB stick while selling cloud HSMs, so maybe CrowdStrike just yolos their critical software updates as well while selling security software to the world.
Sounds like it was a 'channel file', which I think is akin to an AV definition file, that caused the problem, rather than an actual software change. So they must have had a bug lurking in their kernel driver that was uncovered by a particular channel file. Still, it seems like someone skipped some testing.
How about a try-catch block? The software reading the definition file should be minimally resilient against malformed input. That's like programming 101.
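To make that concrete, here's a minimal sketch of what "resilient against malformed input" could look like. The 'CHNL' magic, record layout, and error names are all made up for illustration — nothing here is CrowdStrike's actual format or code — and since kernel code can't rely on try-catch, errors come back as values and a bad file is simply rejected:

```rust
#[derive(Debug)]
enum ChannelFileError {
    TooShort,
    BadMagic,
    TruncatedRecord,
}

struct ChannelFile<'a> {
    records: Vec<&'a [u8]>,
}

fn parse_channel_file(buf: &[u8]) -> Result<ChannelFile<'_>, ChannelFileError> {
    // Hypothetical layout: 4-byte magic, 4-byte record count, then
    // length-prefixed records. The point is the bounds checks, not the format.
    if buf.len() < 8 {
        return Err(ChannelFileError::TooShort);
    }
    if &buf[0..4] != b"CHNL" {
        return Err(ChannelFileError::BadMagic);
    }
    let count = u32::from_le_bytes(buf[4..8].try_into().unwrap()) as usize;

    let mut records = Vec::with_capacity(count.min(1024));
    let mut offset = 8;
    for _ in 0..count {
        // Every length field is checked against the buffer before use,
        // so a corrupt or truncated file is rejected instead of causing
        // an out-of-bounds read.
        if offset + 4 > buf.len() {
            return Err(ChannelFileError::TruncatedRecord);
        }
        let len = u32::from_le_bytes(buf[offset..offset + 4].try_into().unwrap()) as usize;
        offset += 4;
        if offset + len > buf.len() {
            return Err(ChannelFileError::TruncatedRecord);
        }
        records.push(&buf[offset..offset + len]);
        offset += len;
    }
    Ok(ChannelFile { records })
}

fn main() {
    // A file full of zero bytes should be rejected cleanly, not crash the host.
    match parse_channel_file(&[0u8; 64]) {
        Ok(f) => println!("loaded {} records", f.records.len()),
        Err(e) => println!("rejected malformed channel file: {e:?}"),
    }
}
```

Point being: a buffer of all zero bytes falls out as an error result here, not a crash that takes the machine down with it.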
Reputational damage from this is going to be catastrophic. Even if that's the limit of their liability, it's hard not to see customers leaving en masse.
Ironically, some /r/wallstreetbets poster put out an ill-informed “due diligence” post 11 hours ago arguing that CrowdStrike isn't worth its $83 billion valuation, and placed puts on the stock.
Everybody took the piss out of them for the post. Now they are quite likely to become very rich.
Not sure what material in their post is ill-informed. Looks like what happened today is exactly what that poster warned of in one of their bullet points.
Yeah, everyone is dunking on the OP here, but they essentially said that CrowdStrike's customers were all vulnerable to something like this, and we saw a similar thing play out only a few years ago with SolarWinds. It's not surprising that this happened. Of course, when it comes to making money, the timing is the crucial part, and that's hard to predict.
Is the alternative "mass hacking"? I thought all this software did was check a box on some compliance list. And slow down everyone's work laptop by unnecessarily scanning the same files over and over again.
As someone said earlier in these comments the software is required if you want to operate with government entities. So until that requirement changes it is not going anywhere and continues to print money for the company.
But then, if what you say is true and their software is indeed mandatory in some contexts, they also have no incentive or motivation to care about the quality of their product, about whether it brings actual value, or even about whether it's reliable.
They may just misuse this unique position in the market and squeeze as much profit from it as possible.
The mere fact that such a position exists in the market is, in my opinion, a problem, because it creates an entity with a guaranteed revenue stream and no incentive to actually deliver material results.
If the government agencies insist on using this particular product, then you're right. If it's a choice between many such products, then there should be some competition between them.
Having experienced different AV products at various jobs: they all use kernel-level code to do their thing, so any one of them could have this happen.
You, the admin, don't get to see what Falcon is doing before it does it.
Your security people have a dashboard that might show them alerts from selected systems if they've configured it, but Crowdstrike central can send commands to agents without any approval whatsoever.
We had a general login/build host at my site that users began having terrible problems using. Configure/compile runs were breaking all the time. We suspected corrupted source downloads, a bad compiler version, faulty RAM... finally, we started running repeated test builds.
A guy from our security org then calls us. He says: "Crowdstrike thinks someone has gotten onto linux host <host> and has been trying to set up exploits for it and other machines on the network; it's been killing off the suspicious processes but they keep coming back..."
We had to explain to our security that it was a machine where people were expected to be building software, and that perhaps they could explain this to CS.
"No problem; they'll put in an exception for that particular use. Just let us know if you might running anything else unusual that might trigger CS."
TL;DR: please submit a formal whitelist request for every single executable on your linux box so that our corporate-mandated spyware doesn't break everyone's workflow with no warning.
Extremely unlikely. This isn't the first blowup Crowdstrike has had, though it is the worst (IIRC). Crowdstrike is "too big to fail", with tons of enterprise customers who face insane switching costs, even after this nonsense.
Unfortunately for all of us, Crowdstrike will be around for a while.
Businesses would be crazy to continue with Crowdstrike after this. It's going to cause billions in losses across a huge number of companies. If I were a risk assessment officer at a large company, I'd be speed-dialling every alternative right now.
A friend of mine who used to work for Crowdstrike tells me they're a hot mess internally and it's amazing they haven't had worse problems than this already.
That sounds like every company I have ever worked for: looks great from the outside but a hot mess on the inside.
I have never worked for a company where everything is smooth sailing.
What I've noticed is that the smaller the company, the less of a hot mess it is, but at the same time it's also struggling to pay the bills because it isn't innovating fast enough.
As of 4am NY time, CRWD has lost $10bn (~13%) in market cap. Of course they've tested, just not enough to catch this issue (as is often the case).
This is probably several seemingly inconsequential issues coming together.
I'm not sure why, though, when the system is this important, even successfully tested updates aren't rolled out piecemeal (or perhaps they are, and we're only seeing the result of partial failures around the world).
Testing is never enough. In fact, it won't catch 99% of issues, because tests usually cover only the happy paths, or only the failure modes humans can think of, and they are by no means exhaustive.
A robust canarying mechanism is the only way you can limit the blast radius.
Set up A/B testing infra at the binary level so you can ship updates selectively and compare their metrics.
Been doing this for more than 10 years now, it's the ONLY way.
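For what it's worth, here's roughly what that kind of gate can look like, stripped down to the bare idea. The cohort sizes, crash-rate metric, and tolerance are illustrative, not anyone's real pipeline:

```rust
// Canary gate: ship the new build to a small cohort, compare its health
// metrics against the control cohort, and only promote if the canary
// isn't measurably worse.

struct CohortMetrics {
    hosts: u64,
    crash_reports: u64,
}

impl CohortMetrics {
    fn crash_rate(&self) -> f64 {
        if self.hosts == 0 {
            return 0.0;
        }
        self.crash_reports as f64 / self.hosts as f64
    }
}

/// Promote only if the canary's crash rate is within `tolerance` of control.
fn promote_canary(control: &CohortMetrics, canary: &CohortMetrics, tolerance: f64) -> bool {
    canary.crash_rate() <= control.crash_rate() + tolerance
}

fn main() {
    let control = CohortMetrics { hosts: 1_000_000, crash_reports: 120 };
    // Hypothetical bad build: most of the canary fleet is crashing.
    let canary = CohortMetrics { hosts: 10_000, crash_reports: 9_500 };

    if promote_canary(&control, &canary, 0.001) {
        println!("promote to next ring");
    } else {
        println!("halt rollout, page the release owner");
    }
}
```

In practice you'd compare more than one metric and require a minimum soak time, but even something this crude stops a build that's crashing most of the canary fleet from ever reaching the next ring.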
I'm not sure that justifies potentially bricking the devices of hundreds(?) of your clients by shipping untested updates to them. Of course it depends... and would require deeper financial analysis.
> They won't be able to test exhaustively every failure mode that could lead to such issues.
That might be acceptable. My point is that if you are incapable of having even absolutely basic automated tests (which would take a few minutes at most) for extremely impactful software like this, then starting with something more complex seems like a waste of time (clearly the company is run by incompetent people, so they'd just mess it up).
And when it’s more costly for customers to walk back the mistake of adopting your service.
Yeah, I get the impression a lot of SaaS companies operate on this model these days. We just signed with a relatively unknown CI platform, because they were available for support during our evaluation. I wonder how available they’ll be when we have a contract in place…
Doesn't matter what testing exists. More scale. More complexity. More bugs.
It's like building a gigantic factory farm, and then realizing that the environment itself is the birthing chamber and breeding ground of superbugs with the capacity to wipe out everything.
I used to work at a global response center for big tech once upon a time. We would get hundreds of issues we couldn't replicate, because we'd literally have to set up our own govt or airline or bank or telco to test certain things.
So I used to joke with the corporate robots that they should just hurry up and take over govts, airlines, banks and telcos already, because that's the only path to better control.
> It's like building a gigantic factory farm, and then realizing that the environment itself is the birthing chamber and breeding ground of superbugs with the capacity to wipe out everything.
Testing plus a careful, incremental rollout in stages is the solution. Don't patch all systems worldwide at once; start with a few, add a few more, and so on. Choose them randomly.
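Randomized staging is easy to make deterministic, too: hash each host id into a stable bucket and widen the percentage over time, so each stage only ever adds hosts. A rough sketch (host names and ring percentages are made up):

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Stable pseudo-random bucket in [0, 100) derived from the host id.
fn rollout_bucket(host_id: &str) -> u64 {
    let mut h = DefaultHasher::new();
    host_id.hash(&mut h);
    h.finish() % 100
}

/// A host takes the update only once the rollout percentage covers its bucket.
fn should_update(host_id: &str, rollout_percent: u64) -> bool {
    rollout_bucket(host_id) < rollout_percent
}

fn main() {
    let hosts = ["dc1-web-001", "dc1-web-002", "lhr-edge-017", "syd-db-004"];
    // Widening 1% -> 5% -> 25% -> 100% only ever adds hosts to the cohort,
    // so problems surface on a small slice of the fleet first.
    for pct in [1, 5, 25, 100] {
        let cohort: Vec<&str> = hosts
            .iter()
            .copied()
            .filter(|h| should_update(h, pct))
            .collect();
        println!("{pct}% stage -> {cohort:?}");
    }
}
```

Seeding the hash with a release id as well would reshuffle which hosts go first for each update, which is closer to "choose them randomly" than always hitting the same 1% first.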