If it's true that a bad patch was the reason for this, I assume someone, or multiple people, will have a really bad day today. It makes me wonder what kind of testing they have in place for patches like this; normally I wouldn't expect something to go out immediately to all clients, but rather as a gradual rollout. But who knows: Microsoft keeps their master keys on a USB stick while selling cloud HSMs, so maybe CrowdStrike just yolos their critical software updates as well while selling security software to the world.
Sounds like it was a 'channel file', which I think is akin to an AV definition file, that caused the problem, rather than an actual software change. So they must have had a bug lurking in their kernel driver that was uncovered by a particular channel file. Still, it seems like someone skipped some testing.
How about a try-catch block? The software reading the definition file should be minimally resilient against malformed input. That's like programming 101.
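To make that concrete, here's a minimal sketch of what "resilient against malformed input" could look like. The 'CHNL' magic, record layout, and error names are all made up for illustration — nothing here is CrowdStrike's actual format or code — and since kernel code can't rely on try-catch, errors come back as values and a bad file is simply rejected:

```rust
#[derive(Debug)]
enum ChannelFileError {
    TooShort,
    BadMagic,
    TruncatedRecord,
}

struct ChannelFile<'a> {
    records: Vec<&'a [u8]>,
}

fn parse_channel_file(buf: &[u8]) -> Result<ChannelFile<'_>, ChannelFileError> {
    // Hypothetical layout: 4-byte magic, 4-byte record count, then
    // length-prefixed records. The point is the bounds checks, not the format.
    if buf.len() < 8 {
        return Err(ChannelFileError::TooShort);
    }
    if &buf[0..4] != b"CHNL" {
        return Err(ChannelFileError::BadMagic);
    }
    let count = u32::from_le_bytes(buf[4..8].try_into().unwrap()) as usize;

    let mut records = Vec::with_capacity(count.min(1024));
    let mut offset = 8;
    for _ in 0..count {
        // Every length field is checked against the buffer before use,
        // so a corrupt or truncated file is rejected instead of causing
        // an out-of-bounds read.
        if offset + 4 > buf.len() {
            return Err(ChannelFileError::TruncatedRecord);
        }
        let len = u32::from_le_bytes(buf[offset..offset + 4].try_into().unwrap()) as usize;
        offset += 4;
        if offset + len > buf.len() {
            return Err(ChannelFileError::TruncatedRecord);
        }
        records.push(&buf[offset..offset + len]);
        offset += len;
    }
    Ok(ChannelFile { records })
}

fn main() {
    // A file full of zero bytes should be rejected cleanly, not crash the host.
    match parse_channel_file(&[0u8; 64]) {
        Ok(f) => println!("loaded {} records", f.records.len()),
        Err(e) => println!("rejected malformed channel file: {e:?}"),
    }
}
```

Point being: a buffer of all zero bytes falls out as an error result here, not a crash that takes the machine down with it.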
Reputational damage from this is going to be catastrophic. Even if that's the limit of their liability, it's hard not to see customers leaving en masse.
Ironically, some /r/wallstreetbets poster put out an ill-informed “due diligence” post 11 hours ago arguing that CrowdStrike isn't worth its $83 billion valuation, and placed puts on the stock.
Everybody took the piss out of them for the post. Now they are quite likely to become very rich.
Not sure what material in their post is ill-informed. Looks like what happened today is exactly what that poster warned of in one of their bullet points.
Yeah, everyone is dunking on the OP here, but they essentially said that CrowdStrike's customers were all vulnerable to something like this, and we saw a similar thing play out only a few years ago with SolarWinds. It's not surprising that this happened. Of course, when it comes to making money, the timing is the crucial part, and that's hard to predict.
Is the alternative "mass hacking"? I thought all this software did was check a box on some compliance list. And slow down everyone's work laptop by unnecessarily scanning the same files over and over again.
As someone said earlier in these comments the software is required if you want to operate with government entities. So until that requirement changes it is not going anywhere and continues to print money for the company.
But then, if what you say is true and their software is indeed mandatory in some contexts, they also have no incentive or motivation to care about the quality of their product, about whether it brings actual value, or even about whether it's reliable.
They may just misuse this unique position in the market and squeeze as much profit from it as possible.
The mere fact that such a position exists in the market is, in my opinion, a problem, because it creates an entity with a guaranteed revenue stream and no incentive to actually deliver material results.
If the government agencies insist on using this particular product, then you're right. If it's a choice between many such products, then there should be some competition between them.
Having experienced different AV products at various jobs: they all use kernel-level code to do their thing, so any one of them could have this happen.
You, the admin, don't get to see what Falcon is doing before it does it.
Your security people have a dashboard that might show them alerts from selected systems if they've configured it, but Crowdstrike central can send commands to agents without any approval whatsoever.
We had a general login/build host at my site that users began having terrible problems using. Configure/compile runs were breaking all the time. We suspected corrupted source downloads, a bad compiler version, faulty RAM... finally, we started running repeated test builds.
A guy from our security org then calls us. He says: "Crowdstrike thinks someone has gotten onto linux host <host> and has been trying to set up exploits for it and other machines on the network; it's been killing off the suspicious processes but they keep coming back..."
We had to explain to our security that it was a machine where people were expected to be building software, and that perhaps they could explain this to CS.
"No problem; they'll put in an exception for that particular use. Just let us know if you might running anything else unusual that might trigger CS."
TL;DR: please submit a formal whitelist request for every single executable on your linux box so that our corporate-mandated spyware doesn't break everyone's workflow with no warning.
Extremely unlikely. This isn't the first blowup Crowdstrike has had, though it is the worst (IIRC). Crowdstrike is "too big to fail", with tons of enterprise customers who face insane switching costs, even after this nonsense.
Unfortunately for all of us, Crowdstrike will be around for a while.
Businesses would be crazy to continue with Crowdstrike after this. It's going to cause billions in losses across a huge number of companies. If I were a risk assessment officer at a large company, I'd be speed-dialling every alternative right now.
A friend of mine who used to work for Crowdstrike tells me they're a hot mess internally and it's amazing they haven't had worse problems than this already.
That sounds like every company I have ever worked for: looks great from the outside but a hot mess on the inside.
I have never worked for a company where everything is smooth sailing.
What I've noticed is that the smaller the company, the less of a hot mess it is, but at the same time it's also struggling to pay the bills because it isn't innovating fast enough.
As of 4am NY time, CRWD has lost $10bn (~13%) in market cap. Of course they've tested, just not enough to catch this issue (as is often the case).
This is probably several seemingly inconsequential issues coming together.
I'm not sure why, though, when the system is this important, even successfully tested updates aren't rolled out piecemeal (or perhaps they are, and we're only seeing the result of partial failures around the world).
Testing is never enough. In fact, it won't catch 99% of issues, because tests usually cover only the happy paths, or only the failure modes humans can think of, and they are by no means exhaustive.
A robust canarying mechanism is the only way you can limit the blast radius.
Set up A/B testing infra at the binary level so you can ship updates selectively and compare their metrics.
Been doing this for more than 10 years now, it's the ONLY way.
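For what it's worth, here's roughly what that kind of gate can look like, stripped down to the bare idea. The cohort sizes, crash-rate metric, and tolerance are illustrative, not anyone's real pipeline:

```rust
// Canary gate: ship the new build to a small cohort, compare its health
// metrics against the control cohort, and only promote if the canary
// isn't measurably worse.

struct CohortMetrics {
    hosts: u64,
    crash_reports: u64,
}

impl CohortMetrics {
    fn crash_rate(&self) -> f64 {
        if self.hosts == 0 {
            return 0.0;
        }
        self.crash_reports as f64 / self.hosts as f64
    }
}

/// Promote only if the canary's crash rate is within `tolerance` of control.
fn promote_canary(control: &CohortMetrics, canary: &CohortMetrics, tolerance: f64) -> bool {
    canary.crash_rate() <= control.crash_rate() + tolerance
}

fn main() {
    let control = CohortMetrics { hosts: 1_000_000, crash_reports: 120 };
    // Hypothetical bad build: most of the canary fleet is crashing.
    let canary = CohortMetrics { hosts: 10_000, crash_reports: 9_500 };

    if promote_canary(&control, &canary, 0.001) {
        println!("promote to next ring");
    } else {
        println!("halt rollout, page the release owner");
    }
}
```

In practice you'd compare more than one metric and require a minimum soak time, but even something this crude stops a build that's crashing most of the canary fleet from ever reaching the next ring.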
I'm not sure that justifies potentially bricking the devices of hundreds(?) of your clients by shipping untested updates to them. Of course it depends... and would require deeper financial analysis.
> They won't be able to test exhaustively every failure mode that could lead to such issues.
That might be acceptable. My point is that if you are incapable of having even absolutely basic automated tests (which would take a few minutes at most) for extremely impactful software like this, then starting with something more complex seems like a waste of time (clearly the company is run by incompetent people, so they'd just mess it up).
And when it’s more costly for customers to walk back the mistake of adopting your service.
Yeah, I get the impression a lot of SaaS companies operate on this model these days. We just signed with a relatively unknown CI platform, because they were available for support during our evaluation. I wonder how available they’ll be when we have a contract in place…
Doesn't matter what testing exists. More scale. More complexity. More bugs.
It's like building a gigantic factory farm, and then realizing that the environment itself is the birthing chamber and breeding ground of superbugs with the capacity to wipe out everything.
I used to work at a global response center for big tech once upon a time. We would get hundreds of issues we couldn't replicate, because we'd literally have to set up our own govt or airline or bank or telco to test certain things.
So I used to joke with the corporate robots that they should just hurry up and take over govts, airlines, banks and telcos already, because that's the only path to better control.
> It's like building a gigantic factory farm, and then realizing that the environment itself is the birthing chamber and breeding ground of superbugs with the capacity to wipe out everything.
Testing plus a careful, incremental rollout in stages is the solution. Don't patch all systems worldwide at once; start with a few, add a few more, and so on. Choose them randomly.
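Randomized staging is easy to make deterministic, too: hash each host id into a stable bucket and widen the percentage over time, so each stage only ever adds hosts. A rough sketch (host names and ring percentages are made up):

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Stable pseudo-random bucket in [0, 100) derived from the host id.
fn rollout_bucket(host_id: &str) -> u64 {
    let mut h = DefaultHasher::new();
    host_id.hash(&mut h);
    h.finish() % 100
}

/// A host takes the update only once the rollout percentage covers its bucket.
fn should_update(host_id: &str, rollout_percent: u64) -> bool {
    rollout_bucket(host_id) < rollout_percent
}

fn main() {
    let hosts = ["dc1-web-001", "dc1-web-002", "lhr-edge-017", "syd-db-004"];
    // Widening 1% -> 5% -> 25% -> 100% only ever adds hosts to the cohort,
    // so problems surface on a small slice of the fleet first.
    for pct in [1, 5, 25, 100] {
        let cohort: Vec<&str> = hosts
            .iter()
            .copied()
            .filter(|h| should_update(h, pct))
            .collect();
        println!("{pct}% stage -> {cohort:?}");
    }
}
```

Seeding the hash with a release id as well would reshuffle which hosts go first for each update, which is closer to "choose them randomly" than always hitting the same 1% first.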