Disaster Cloudflare has admitted that one of its engineers "stepped beyond the bounds of its policies" and throttled traffic to a customer's website. - Website and API became unresponsive due to extensive throttling

UPDATED Cloudflare has admitted that one of its engineers stepped beyond the bounds of its policies and throttled traffic to a customer's website.

The internet-grooming outfit has 'fessed up to the incident and explained it started on February 2 when a network engineer "received an alert for a congesting interface" between an Equinix datacenter and a Cloudflare facility.

Cloudflare's post about the matter states that such alerts aren't unusual – but this one was due to a sudden and extreme spike of traffic and had occurred twice in successive days.

"The engineer in charge identified the customer's domain … as being responsible for this sudden spike of traffic between Cloudflare and their origin network, a storage provider," the post states. "Traffic from this customer went suddenly from an average of 1,500 requests per second, and a 0.5MB payload per request, to 3,000 requests per second (2x) and more than 12MB payload per request (25x)."

As the spike created congestion on a physical interface, it impacted many Cloudflare customers and peers.
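
For a sense of scale, the request rates and payload sizes quoted above can be multiplied out. This is a rough sketch and not a calculation from Cloudflare's post: it assumes the quoted payloads are per-request averages, treats 1MB as 10^6 bytes, and uses 12MB as a lower bound because the post says "more than 12MB". The function and variable names are made up for illustration.

```python
def throughput_mb_per_s(requests_per_second: float, payload_mb: float) -> float:
    """Aggregate origin traffic in MB/s, assuming every request carries the average payload."""
    return requests_per_second * payload_mb


# Figures quoted in the article; 12 MB is a lower bound ("more than 12MB").
before = throughput_mb_per_s(1_500, 0.5)
after = throughput_mb_per_s(3_000, 12)

# The 2x and roughly 25x factors compound rather than add.
growth = after / before

# Convert MB/s to Gbit/s, assuming 1 MB = 10**6 bytes and 8 bits per byte.
before_gbps = before * 8 / 1_000
after_gbps = after * 8 / 1_000

print(f"before: {before:.0f} MB/s (~{before_gbps:.0f} Gbit/s)")
print(f"after:  {after:.0f} MB/s (~{after_gbps:.0f} Gbit/s)")
print(f"growth: {growth:.0f}x")
```

Under those assumptions, traffic from this one zone grew by at least 48x, from roughly 6 Gbit/s to roughly 288 Gbit/s, which helps explain why a single customer could congest a physical interface.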

Cloudflare's automated remedies swung into action, but weren't sufficient to completely fix the problem.

An unidentified engineer "decided to apply a throttling mechanism to prevent the zone from pulling so much traffic from their origin."

A post to Hacker News that Cloudflare's post links to – and which The Register therefore assumes was posted by the throttled customer – states the throttle was applied without warning and caused the customer's site and API to become effectively unavailable due to slow responses leading to timeouts.

Cloudflare has issued a mea culpa for its decision to impose the throttle.

"Let's be very clear on this action: Cloudflare does not have an established process to throttle customers that consume large amounts of bandwidth, and does not intend to have one," wrote Cloudflare senior veep for production engineering Jeremy Hartman and veep for networking engineering Jérôme Fleury.

"This remediation was a mistake, it was not sanctioned, and we deeply regret it."

Cloudflare has promised to change its policies and procedures so this can't happen again – at least not without multiple execs signing off on it.

"To make sure a similar incident does not happen, we are establishing clear rules to mitigate issues like this one. Any action taken against a customer domain, paying or not, will require multiple levels of approval and clear communication to the customer," Hartman and Fleury state. "Our tooling will be improved to reflect this. We have many ways of traffic shaping in situations where a huge spike of traffic affects a link and could have applied a different mitigation in this instance."

The Hacker News post referenced above sparked a 300-plus comment conversation in which few authors have kind things to say about Cloudflare. Nor do various folks in some of the darker reaches of the web, where Cloudflare has often been accused of throttling traffic as a political act, given its track record of declining to serve sites that host hate speech.

Actually throttling a customer without warning will likely fuel theories that Cloudflare, like its Big Tech peers, is an activist organization that does not treat all types of speech fairly.

Hartman and Fleury promised that Cloudflare is re-writing its legalese to better explain what customers can expect. "We will follow up with a blog post dedicated to these changes later," the pair wrote.

The post does not mention what, if anything, happened to the engineer who applied the throttle. ®

Updated to add at 2350 UTC, February 9
Cloudflare contacted The Register with the following statement: "There were no punitive measures taken against anyone involved in this unfortunate incident. We have a blame-free culture at Cloudflare. People make mistakes. It's the responsibility of the organization to make sure that the damage from those mistakes is limited."

 
It seems there are at least three major takeaways from this:

1) any single employee, even one who isn't supposed to, can take any customer offline
2) there are no internal procedures to ensure the customer is notified in a timely manner, even when it actually is necessary
3) there is a "blame-free culture," whatever the fuck that means, but it seems to mean there will be no penalties for any tranny janny who does this maliciously

So a company that supposedly provides security is about as lackadaisical about security and delegating power as Twitter, and there will be no consequences for fuckups, so why would anyone care about not fucking up?

Anyone who trusts this company with anything important is a fool. I hope some more competent company drinks their milkshake.

They're basically a company that has ridden on an unearned reputation as an elephant protection company, and pointed at how no elephants have knocked down your house lately, but if a rampaging bull elephant actually does show up, they run away and say "oops."
 
Even Feds don't want their shit fucked with. Janke needs to step out of the shadows and take over cloudflare before some troon fucks with the Feds. Then again it'll be very entertaining.
 
That's less of a theory and more of a fact, isn't it?
Pretty sure bitching about how slow the Law is and taking matters into your own hands with absolutely no basis for your actions makes it less of a theory.

But what do I know? It's not like my peers are Big Tech. Or famous toe-mushroom farmers. Or Rapists.
What's that? I don't have proof?

Seems like that's proof enough to fuck over Cloudflare, with no repercussions on my end.
 