
AWS US-EAST-1 incident: regulators concentrate on concentration risk

In what is now becoming an annual event, IT admins around the world woke up today to discover that large portions of their IT estates were down. The problem this time? An outage in the AWS US-EAST-1 region – Amazon’s largest data centre region, and the cause of a similar outage in 2017.

Impact of the AWS incident today

The impact has been global. It is the headline on most news sites, and reports indicate that the incident has affected businesses across multiple industries. In the UK, there have been major outages affecting Lloyds Bank, HMRC and National Rail, among others. Further afield, major services like Coinbase, Robinhood, Snapchat, Duolingo and Signal have all experienced significant disruption.

According to the AWS incident logs, even some AWS services in other regions were affected where they rely on US-EAST-1, which may have allowed the incident to cascade and impact more customers. The root cause appears to have been a DNS configuration error. At the time of writing, AWS say that the underlying issue has been resolved but service disruption is still ongoing.
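To illustrate how a failure in one region can cascade, here is a minimal Python sketch that walks a dependency map and lists every service that relies, directly or indirectly, on a failed component. The service names and the dependency map are hypothetical and chosen purely for illustration; they are not taken from the AWS incident logs.

```python
from collections import deque

# Hypothetical dependency map: each service lists the services it relies on.
# All names are illustrative assumptions, not real AWS service names.
DEPENDS_ON = {
    "dns-us-east-1": [],
    "database-us-east-1": ["dns-us-east-1"],
    "identity-global": ["database-us-east-1"],
    "compute-eu-west-2": ["identity-global"],
    "storage-eu-west-2": [],
    "bank-app": ["compute-eu-west-2", "storage-eu-west-2"],
}


def affected_by(failed, depends_on):
    """Return every service that directly or transitively depends on `failed`."""
    # Invert the map so we can look up "who relies on X?".
    dependents = {name: [] for name in depends_on}
    for service, deps in depends_on.items():
        for dep in deps:
            dependents.setdefault(dep, []).append(service)

    # Breadth-first walk outwards from the failed component.
    affected = set()
    queue = deque([failed])
    while queue:
        current = queue.popleft()
        for service in dependents.get(current, []):
            if service not in affected:
                affected.add(service)
                queue.append(service)
    return affected


impacted = affected_by("dns-us-east-1", DEPENDS_ON)
print(sorted(impacted))
```

In this toy map, a service hosted in a different region is still caught because it relies on a global component that itself depends on the failed region.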

We expect that everyone will be talking about IT concentration risk over the next few days. This article explains what it is, and how the legal and regulatory landscape is evolving to address it.

What is concentration risk?

Even the most resilient service is only as good as the infrastructure that supports it, the software that runs it, and the humans that manage it. The affected AWS services will have been designed with robust architecture, hosted in resilient data centres and protected by a strict change management process. But a complicated service will always be vulnerable to the wrong combination of events – it’s no surprise that IT services sometimes have outages.

What is remarkable, however, is that a DNS problem affecting just a single region of just a single service provider is able to “break half the internet”. Between them, the big three hyperscalers (Amazon Web Services, Microsoft Azure and Google Cloud Platform) have approximately 63% of the worldwide market. So many businesses and governments rely on them for such a large portion of their day-to-day operations that a tiny configuration mistake can have a disastrous impact on individual businesses and the wider economy.

Some consider concentration in the technology market to be a vicious circle. The larger the key players become, the greater their economies of scale and ability to invest, and the harder it is for new entrants to launch a competitive offering. The cheaper the offering is, the more businesses migrate to it and the more concentrated the landscape becomes. Especially in the public cloud space, which is notoriously capital intensive and where even giants like IBM have struggled to gain market share, it is difficult to see how the economics will permit any organic change.

Concentration isn’t just a cloud issue

While reliance on a few cloud hyperscalers is a key source of concentration risk, it isn’t the only one. The CrowdStrike incident last year showed what can happen when large numbers of businesses rely on a single piece of software for business-critical functions. As vendors and customers have become more security conscious, they are deploying updates more frequently and with less time for testing, and the opportunity for widespread incidents has increased accordingly (albeit perhaps a price worth paying for improved cyber security).

More generally, many businesses have outsourced business-critical tasks to one of the major managed services vendors or business process outsourcing providers. A fairly small number of IT suppliers are trusted to maintain infrastructure and applications for many of the world’s largest companies.

Early regulation – regulating the customers

Financial regulators were among the first to identify the risk: concentration has been on their agenda for at least 20 years. Broadly speaking, the EBA, PRA and FCA have all required regulated entities to consider concentration risk when making procurement decisions for the last decade. Financial institutions have also been required to report on their cloud usage so that regulators can monitor concentration within the wider financial system.

So far as we can tell, it hasn’t done much to reduce concentration on a few cloud vendors in the financial sector. That said, it may have led to more awareness and better service design among financial entities.

The modern regulatory approach – regulating the providers directly

Perhaps perceiving that nothing was changing, regulators have adopted a stronger approach over the last few years.

The EU Digital Operational Resilience Act (DORA) takes aim specifically at “potential systemic risk entailed by increased outsourcing practices and by […] ICT third party concentration”. It allows the European Supervisory Authorities to designate and directly regulate “critical ICT third-party service providers” (including cloud providers) who provide services to financial entities and are systemically important.

These powers include the right to request information, conduct investigations, impose security and operational requirements, require contractual terms and control subcontracting. They are backed up by periodic penalty payments for non-compliance of up to 1% of average daily worldwide turnover for each day that the non-compliance continues.

The UK has adopted a comparable approach with the Financial Services and Markets Act 2023 and corresponding guidance from the regulators. The Treasury can designate “critical third parties” (not just IT providers) whose services are important to the stability of, and confidence in, the UK financial system. These critical third parties are then subject to direct regulation by the UK financial regulators. 

Beyond financial services – regulatory expansion

The trend of directly regulating critical IT providers has caught on beyond the financial sector. In the EU, the Network and Information Security 2 (NIS2) Directive applies to cloud providers along with providers of various other forms of critical digital infrastructure. They now face stricter cyber security rules (including incident reporting, BCDR requirements and supply chain monitoring obligations) and are subject to regulatory oversight and enforcement. While NIS2 is primarily aimed at cyber security rather than concentration risk, the inclusion of cloud providers reflects their systemic importance to the EU economy (which is in large part a result of concentration).

The UK is on a similar path with the retained NIS Regulations, soon to be updated by the proposed Cyber Security and Resilience Bill, which will bring managed service providers into scope.

Will regulation work?

In our view, regulation alone is unlikely to have any significant impact on concentration risk. Regulating customers may be a losing battle – the economics of cloud computing will continue to drive businesses to adopt the largest platforms, although regulations have made customers aware of (and prepared for) the downsides of concentration risk. We are also sceptical of the benefits of regulating providers – hyperscalers already work to the highest technical and operational standards of any IT organisations, and yet (as today has shown) there will still be outages. So long as the concentration remains, it is difficult to see what action a regulator could require a provider to take that would meaningfully reduce the risk.

In any case, the regulatory focus still seems to be on service providers to the exclusion of software vendors, where in some cases the concentration risk is significantly greater.

The most effective solutions are technical and commercial (read: difficult and expensive). Businesses need to focus on vendor-agnostic technical architecture, resilient service design and careful BCDR planning. This means designing out any single points of failure by using multiple availability zones, or even, for the most critical services, by using multiple providers through multi-cloud or hybrid cloud arrangements.
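As a rough illustration of what "designing out a single point of failure" can mean at the application level, the Python sketch below tries a list of providers in order and falls back to the next one when a call fails. It is a simplified sketch under stated assumptions: the provider functions, their names and the simulated DNS error are all hypothetical stand-ins, and a real multi-cloud design would also need to address data replication, health checks and cost.

```python
from __future__ import annotations

from typing import Callable, Sequence


class AllProvidersFailed(Exception):
    """Raised when every configured provider has failed."""


def call_with_failover(
    providers: Sequence[tuple[str, Callable[[], str]]],
) -> tuple[str, str]:
    """Try each provider in order; return (provider name, result) from the first success."""
    errors = []
    for name, fetch in providers:
        try:
            return name, fetch()
        except Exception as exc:  # broad on purpose: any failure triggers failover
            errors.append(f"{name}: {exc}")
    raise AllProvidersFailed("; ".join(errors))


# Hypothetical stand-ins for the same service hosted with two different providers.
def primary_cloud() -> str:
    raise ConnectionError("DNS resolution failed")  # simulated outage


def secondary_cloud() -> str:
    return "balance: 100.00"


served_by, result = call_with_failover(
    [("primary", primary_cloud), ("secondary", secondary_cloud)]
)
print(served_by, result)
```

The ordering of the list is the design choice here: the cheaper or faster provider goes first, and the fallback is only exercised when it fails, which is part of why this kind of resilience is expensive to build and to test.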

In the meantime, we’ll start writing our article on next year’s outage.

