Risk Assessing Single Points of Failure

Single points of failure (SPOF) creep into many business processes, often unintentionally. Some exist from the outset but were simply not assessed, or were assessed and deemed low risk. That legacy server running a critical piece of code wasn’t legacy at the beginning. That retiring SME, the one who wrote the code, had just started. That supplier, now the only company that still maintains that legacy server, was once one of many who could provide support. The process may not even have been that important back then, but it is now critical to the success of the business. We may know instinctively that these single points of failure exist. We may have assessed the risk at the beginning, but do you know how much risk exists now? Even if you do, what happens when the SPOF can’t be engineered out, or is too expensive to fix? How do we understand the risks associated with SPOF and, over time, ensure we maintain effective mitigation should something (inevitably) go wrong? Using the recent blockage in the Suez Canal as a case study, let’s find out…

What happened in the Suez?

On Tuesday 23rd March 2021, a cargo ship by the name of Ever Given (not Evergreen, as is plastered on the side of the ship in the picture above) got…well, stuck…midway along the Suez Canal. The 400-metre cargo ship, weighing in at around 220,000 tonnes, blocked one of the most important man-made shipping lanes in the world. According to media reports, the blockage is holding up the transit of around £7 billion (yes, billion) worth of trade, and that figure is getting bigger every day. Lloyd’s List estimates the cost to the global economy to be a sobering $400 million per hour (yes, per hour!). Oil prices rose 3% on the news that the ship is likely to take more than a week to dislodge.

Why didn’t the ships just go another way?

In short, there are two routes and the Suez Canal is nearly a fortnight quicker.

Basically, the Suez Canal is a man-made waterway that connects the Mediterranean Sea in the north with the Red Sea in the south. Prior to the opening of the Suez Canal in 1869, ships transiting between Europe and Asia were forced to take a long, perilous route around the Cape of Good Hope at the southern tip of Africa. The Suez Canal effectively provided an 8-10 day shortcut. This shortcut naturally saved shipping companies a significant amount of money and, as such, it is the preferred route for 19,000 vessels a year and 12% of the world’s freight.

Even though it is located wholly in the state of Egypt, the Suez Canal is of such material geopolitical importance that its use is governed by the Convention of Constantinople, an international treaty which states the Canal:

“may be used in time of war as in time of peace, by every vessel of commerce or of war, without distinction of flag”

So, whilst ships could go another way, the cost of the alternative route would make many journeys economically unviable.

Single Point of Failure

The canal’s importance to global trade, however, gives rise to what is technically a single point of failure (SPOF). For those not acquainted with the term, a single point of failure is something that, when broken, causes everything else connected to it to shut down or become significantly impacted. In the case of the Suez Canal, if the canal is blocked, a significant volume of shipping is left idling on either side with nowhere to go. As you can see in the VesselFinder snapshot below, the number of vessels awaiting passage has built up significantly. For every day of delay, costs are mounting and so is the environmental impact.

Beyond the cost of delays, the blockage increases the amount of maritime traffic around the Horn of Africa, specifically off the coasts of Somalia, Djibouti and Eritrea, no doubt increasing the risk associated with piracy. Economic impact, environmental impact, piracy impact. Not something that you want at the best of times, and certainly not when you’re already in the midst of a global pandemic.

Could the Suez SPOF be avoided?

In short, yes. Like most things, a SPOF can usually be engineered out if it is identified and someone is willing to throw enough money at the problem. But just because you can engineer out a single point of failure doesn’t mean you should. The system as a whole should be risk assessed first. Where the impact of the SPOF is deemed to be greater than the cost to mitigate, it is appropriate to bake in further resilience. If the impact of the SPOF manifesting is lower than the cost of control, investing in mitigation may not be cost-effective: Risk Management 101. Up until recently, the impact and associated mitigation have been fairly balanced. Of course, there was the Yellow Fleet incident, which lasted 8 years between 1967 and 1975. There were also two incidents in 2016 in which ships blocked the entrance to the canal, one of which resulted in the canal shutting down for 2 days. In the case of the most recent Suez Canal blockage, what we’re seeing is the impact resulting from a failure to periodically re-assess inherent risk.
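
For those who like to see the "Risk Management 101" comparison written down, here is a minimal Python sketch. It assumes a simple expected-loss model (likelihood per year multiplied by impact per failure), which is only one of several ways to weigh impact against the cost of control. The `Spof` class, the `mitigation_is_cost_effective` helper and every figure below are hypothetical and purely illustrative; none of them come from the Suez incident.

```python
from dataclasses import dataclass


@dataclass
class Spof:
    """A single point of failure with illustrative (assumed) risk figures."""
    name: str
    annual_likelihood: float   # expected failures per year (assumption)
    impact_per_failure: float  # cost of one failure (assumption)

    def expected_annual_loss(self) -> float:
        # Simple expected-loss model: likelihood x impact.
        return self.annual_likelihood * self.impact_per_failure


def mitigation_is_cost_effective(spof: Spof,
                                 annual_control_cost: float,
                                 likelihood_after: float) -> bool:
    """True when the reduction in expected loss exceeds the cost of the control."""
    loss_before = spof.expected_annual_loss()
    loss_after = likelihood_after * spof.impact_per_failure
    return (loss_before - loss_after) > annual_control_cost


# Hypothetical example: a legacy server expected to fail once every four years.
legacy_server = Spof("legacy server", annual_likelihood=0.25,
                     impact_per_failure=400_000)

print(legacy_server.expected_annual_loss())
# A cheap control that cuts the likelihood to 0.05 pays for itself...
print(mitigation_is_cost_effective(legacy_server, 30_000, 0.05))
# ...whereas an expensive one with the same effect does not.
print(mitigation_is_cost_effective(legacy_server, 150_000, 0.05))
```

The point of the sketch is the comparison, not the numbers: if the likelihood or impact inputs change over time, the same control can move from "not worth it" to "clearly worth it" without anyone noticing unless the sums are re-run.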

Increased Inherent Risk

In 2015, Egypt’s government chose not to invest in engineering out the single point of failure inherent in a single channel design and instead chose a path that ultimately increased the likelihood of this type of full-channel blockage occurring. Instead of investing in a second channel, the Egyptian government invested in making the canal wider and deeper. This was so that bigger ships could navigate the canal. Bigger ships are harder to control and thus more likely to ground in the mud. The Ever Given is not the first ship to ground, but when such a large ship grounds, recovery is more complex. These larger ships can block the whole canal, are harder to dig out and ultimately it takes a lot longer to restore the canal to normal service. The same issues can occur with smaller ships but their recovery is swifter and more straightforward. In some cases, traffic can still navigate the canal whilst the grounded vessel is recovered. The decision of the Egyptian government, to support these bigger ships, means the inherent risk of a situation like the Ever Given incident occurring has increased materially. The likelihood of future events occurring is now something shipping companies should factor into their risk management programmes. But can the rest of us learn too?

What can businesses learn from the 2021 Suez incident?

Whilst you may not be an international shipping magnate or someone in the maritime insurance business, you can still learn from the Ever Given Suez Incident. Here are some of the more salient takeaways:

Design-phase risk assessment must seek to identify SPOF

Whilst probably not a major component of civil engineering projects in the late 19th century when the Suez was first constructed, risk assessment is now fundamental. Time spent assessing risk at the design phase of a project is seldom wasted. It is the point at which the proposed design can be pulled apart, nay ripped apart, for possible weakness. Ensure this initial risk assessment explicitly seeks out SPOF and the associated impacts should a SPOF fail. If you are running a project, ensure you get the right stakeholders involved as early as possible. Risk, InfoSec, Business Resilience, DPO, Legal and Compliance will be well-placed to look at the design and call out potential issues. The earlier these teams are involved, the more likely it is that single points of failure will be identified and a mitigation plan put into place.

Model inherent risk against future system use

The risks associated with the Suez changed as the ships transiting through the canal got bigger. In risk terminology, the level of inherent risk increased. The inherent risk associated with your business-critical processes may be doing something similar: drifting upwards unchecked. Inherent risk is brought down to an acceptable “residual” risk level using mitigating controls. The larger the likely impact, the greater the investment in compensating controls that should be considered. If, however, inherent risk drifts upwards unmonitored, what was once deemed to be adequate investment in control for the then-perceived risk may now be in material deficit. An effective way to keep a check on this drift is, first, to model different scenarios at project inception, up to and including a worst-case scenario. The outputs of these assessments should inform investment in mitigating controls and the point at which these controls should be introduced. Secondly, periodically reassess inherent risk and then factor this new level of inherent risk into your worst-case scenario models. The output should again inform what additional controls may be needed, including the response to incidents should they occur.
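
The drift described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: risk is scored on an arbitrary scale, residual risk is modelled as inherent risk reduced by a fixed control effectiveness, and the scenario names, scores, risk appetite and the `first_breach_year` helper are all hypothetical.

```python
from typing import Dict, Optional


def residual_risk(inherent: float, control_effectiveness: float) -> float:
    """Residual risk left after controls (assumed simple proportional model)."""
    return inherent * (1.0 - control_effectiveness)


def first_breach_year(inherent_by_year: Dict[int, float],
                      control_effectiveness: float,
                      appetite: float) -> Optional[int]:
    """Return the first year in which residual risk exceeds the risk appetite."""
    for year, inherent in sorted(inherent_by_year.items()):
        if residual_risk(inherent, control_effectiveness) > appetite:
            return year
    return None  # controls remain adequate across the whole projection


# Hypothetical inherent risk scores, projected over project years 1 to 4.
scenarios = {
    "expected": {1: 40, 2: 45, 3: 50, 4: 55},
    "worst_case": {1: 40, 2: 60, 3: 80, 4: 100},
}

for name, projection in scenarios.items():
    year = first_breach_year(projection, control_effectiveness=0.5, appetite=25)
    print(name, year)
```

With these made-up figures, the same controls hold until year 4 in the expected scenario but are outgrown by year 2 in the worst case. Re-running the model whenever inherent risk is reassessed shows when additional controls need to be in place, rather than discovering the deficit during an incident.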

Regularly test incident response

Once a single point of failure fails, what you don’t want is to be in a situation where your corrective controls can no longer support recovery because they weren’t designed to cope with a big mother of a ship-sized disaster! What exacerbated the Suez blockage was the incident response. The initial response appears to have fallen woefully short of where it needed to be, as can be seen below, where Digger McDigface is clearly outgunned!

At the end of your periodic risk assessment, feed the worst-case scenario into your incident response exercises. Test as close to the real-life worst-case scenario as you can. Was the current response effective? If not, ensure you update your plans accordingly, supported by resources commensurate with the tasks responders are likely to face.

Summing up…

Single points of failure can and do happen, but they don’t need to be a complete blockage of the Suez. It’s better to find out about SPOF early. Take steps in your risk assessment process to first identify SPOF and mitigate where possible. If you can’t engineer out the problem during the design phase, make sure your compensating controls remain effective as demand on the underlying processes grows. Keep a check on the current inherent risk and re-assess the adequacy of your current controls. If they are not effective, do something about that. And if the worst comes to the worst, and you haven’t invested in stopping the failure, make sure you can unblock your pipes as quickly as possible! If you need help to identify single points of failure in your systems or test your incident response plans, get in touch. Fox Red Risk can help!

About Fox Red Risk

Fox Red Risk is a boutique data protection and cybersecurity consultancy and Managed Security Service Provider which, amongst other things, helps client organisations with implementing control frameworks for resilience, data protection and information security risk management. Call us on 020 8242 6047 or contact us via the website to discuss your needs.

Denial of Suez: What can we learn about risk assessing SPOF?