
Colonial Ransomware Attack: It’s time to rethink your backup & restoration strategy.

Oil Storage

No doubt, if you’re following the news, you will have seen an uptick in the number of ransomware attacks doing the rounds. There have been quite a few, the Colonial Pipeline attack in particular. It’s beginning to feel a lot like Groundhog Day! You would think, after seeing how a ransomware attack crippled Travelex in January 2020, we would not be seeing these kinds of attack occurring so frequently. You would hope that company Boards would be asking questions of their CIOs and CISOs about preparedness. What is becoming apparent, though, is that the way organisations think about and architect their backup systems requires a sea change. In this article, we will look at the Colonial Pipeline ransomware attack, highlight what went wrong (again) and explain why restoring from offsite backups is no longer an appropriate first-line solution for business-critical systems.

What happened at Colonial?

At some point before 6th May 2021, hackers gained access to Colonial’s IT systems. On 6th May, they stole 100GB of data and then deployed ransomware, forcing the shutdown of the Colonial pipeline. The pipeline is one of the United States’ major oil pipelines and is classed as critical national infrastructure: it transports refined oil products from Houston, Texas on a roughly 5,500-mile journey all the way to New York. The attack caused the pipeline to shut down for six days and led to fuel shortages on the East Coast, made worse by widespread panic buying. On 7th May, Colonial’s CEO authorised a payment of approximately US$4.4 million to an Eastern European hacker group known as DarkSide. After payment, DarkSide promptly provided a decryption key, but the decryption process took so long that backups had to be used in parallel. Restoring a minimum level of service from backups took until around 1700 on 12th May.

Based on the testimony of Joseph Blount, Colonial’s CEO, the root cause appears to be that hackers were able to access the Colonial environment through a disused VPN and a compromised password. As some may recall, Travelex was also breached via vulnerabilities in its VPN software. Blount also confirmed that Colonial did pay the ransom once it was clear the company would not be committing a federal offence by paying money to an organisation on a sanctions list. Whether the CEO was right or wrong to pay the ransom is a separate discussion, but such a payout is likely to embolden further attempts, putting more organisations at risk of similar attacks.

Why did it take 6 days to restore if the ransom was paid?

Whilst the decryption key was provided fairly quickly once the ransom was paid, it still takes time to decrypt the data, and the larger the data source, the longer the process takes. The decrypted output also needs to be tested: remember, you are essentially taking the word of an organised criminal gang that what they have given you is safe. Caveat emptor has never been a truer warning! Just in case, Colonial decided to restore from backups in parallel. Blount highlighted during his testimony to the US Senate that it was not fully understood how long it would take to restore systems from backups. CEOs reading this should take note…backups are not a silver bullet.
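To put some very rough numbers on this, the back-of-the-envelope sketch below shows why decryption alone can run to days or weeks. The data volume and throughput figures are assumptions for illustration only, not anything from the Colonial incident:

```python
# Back-of-the-envelope estimate of bulk decryption time.
# The data volume and throughput figures are illustrative assumptions only.

def hours_to_decrypt(data_tb: float, mb_per_second: float) -> float:
    """Estimated hours to decrypt `data_tb` terabytes at a given throughput."""
    total_mb = data_tb * 1_000_000            # 1 TB ~= 1,000,000 MB (decimal units)
    return total_mb / mb_per_second / 3600    # seconds -> hours

# 100 TB of encrypted data with a slow, single-threaded decryptor (~50 MB/s)
print(f"{hours_to_decrypt(100, 50):.0f} hours")   # ~556 hours, i.e. over three weeks
# Ten times the throughput across parallel workers is still days of work
print(f"{hours_to_decrypt(100, 500):.0f} hours")  # ~56 hours
```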

Rethinking backups

“We have backups” is the rallying cry when the IT team is asked whether it can recover systems after a ransomware attack. “OK…how long will it take to restore those backups?” is what should be asked next. Then ask: “is your answer hypothetical, or has it been tested?” If it’s hypothetical, insist it is tested. If it has been tested, check the test was sufficiently robust. If full recovery can’t be done within the expected time period, you know you have a problem. Better to know that now than during a live incident, when the restoration time is what it is and you simply have to accept it.

Remember, when it comes to restoring from many backup sources, you can only send so much data down a network connection at any one time, and the bandwidth available for restoration (which is likely also being used for other business-critical services) will determine how long that restoration takes. If you are performing a full system recovery you need to access the backups, decompress them if they are in a compressed format, transfer them to where they need to be, verify they are complete, reintroduce the data into the production system, test the system works as expected, and monitor the recovery closely for anomalies. This all takes time, and these steps are only described at a high level; there are many other steps and considerations that need to be thought through thoroughly. You can’t be working all of this out for the first time mid-restoration, when all hell is breaking loose – something will be missed!
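As a rough illustration of why this matters, the sketch below models a full restore from off-site backups. Every figure in it is an assumption for illustration and should be replaced with numbers from your own tested restores:

```python
# Rough restore-time model for a full recovery from off-site backups.
# Every figure here is an assumption for illustration; plug in numbers from
# your own tested restores, not hypothetical ones.

RESTORE_SIZE_TB = 20          # compressed backup set to pull back on-site
LINK_GBPS = 1.0               # bandwidth actually available for restoration
LINK_UTILISATION = 0.6        # the link is shared with other business traffic

def transfer_hours(size_tb: float, gbps: float, utilisation: float) -> float:
    """Hours just to move the backups across the network."""
    bits = size_tb * 8e12                      # decimal TB -> bits
    return bits / (gbps * 1e9 * utilisation) / 3600

steps = {
    "transfer backups on-site": transfer_hours(RESTORE_SIZE_TB, LINK_GBPS, LINK_UTILISATION),
    "decompress and verify integrity": 6.0,    # assumed
    "reintroduce data into production": 8.0,   # assumed
    "functional testing and sign-off": 4.0,    # assumed
}

for step, hrs in steps.items():
    print(f"{step}: {hrs:.1f} h")
print(f"TOTAL: {sum(steps.values()):.1f} h")   # roughly 92 hours before anything goes wrong
```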

Your network connection to those backups is quite likely to have been done on the cheap too. This means it’s going to take a long time to restore from backups – especially if you’re relying solely on off-site backups as your solution of choice. Organisations using such an approach must urgently rethink it if they need their systems back online in hours or less. They must ensure data sources are architected to provide not just high system availability but high data and configuration availability too. There is a nuance here that many in senior positions don’t always appreciate: what I see quite often is a DR solution that still points at the original PROD data sources. If that PROD data source is unavailable, say because it’s been encrypted by ransomware, the DR is just not going to work – make sure you don’t get caught out by this!
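One cheap way to catch the “DR still points at PROD” trap is a scheduled scan of your DR configuration for production endpoints. A minimal sketch follows; the hostnames, naming patterns and config layout are all made up for illustration:

```python
# Minimal sketch: flag DR configuration entries that still point at production
# data sources. Hostnames and the config layout are illustrative assumptions.

PROD_MARKERS = ("prod-db", ".prod.internal", "prd-")   # hypothetical PROD naming patterns

dr_config = {
    "orders_service":  {"db_host": "orders-db.dr.internal"},
    "billing_service": {"db_host": "billing-db.prod.internal"},   # <- misconfigured
    "reporting":       {"db_host": "prd-warehouse-01"},           # <- misconfigured
}

def find_prod_references(config: dict) -> list[str]:
    """Return services whose DR config still references a production endpoint."""
    return [
        service
        for service, settings in config.items()
        if any(marker in settings.get("db_host", "") for marker in PROD_MARKERS)
    ]

offenders = find_prod_references(dr_config)
if offenders:
    print("DR config still depends on PROD data sources:", ", ".join(offenders))
```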

Adopting the THREE-TWO-THREE backup strategy is a great place to start. Beyond that, organisations should also consider the following:

Cloud-based backup solutions must be strictly access-controlled. It should not be possible to access this backup data except in a break-glass scenario, and it must then be possible to connect to and use the backup data source immediately when required, until primary systems can be restored.
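If that off-site copy lives in cloud object storage, the access-control element can be enforced with a deny-by-default bucket policy that only a dedicated break-glass role can get past. A minimal sketch using AWS S3 and boto3; the account ID, bucket name and role name are placeholders, and in practice you would also exempt whatever identity writes the backups:

```python
# Minimal sketch: deny all access to a backup bucket except a dedicated
# break-glass role. Account ID, bucket and role names are placeholders.
import json
import boto3

BUCKET = "example-org-backups"                                          # hypothetical
BREAK_GLASS_ROLE = "arn:aws:iam::111122223333:role/BreakGlassRestore"   # hypothetical

policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "DenyAllExceptBreakGlass",
            "Effect": "Deny",
            "Principal": "*",
            "Action": "s3:*",
            "Resource": [
                f"arn:aws:s3:::{BUCKET}",
                f"arn:aws:s3:::{BUCKET}/*",
            ],
            # Deny any principal that is NOT the break-glass role.
            # In production you would also exclude the backup-writing identity.
            "Condition": {"ArnNotEquals": {"aws:PrincipalArn": BREAK_GLASS_ROLE}},
        }
    ],
}

s3 = boto3.client("s3")
s3.put_bucket_policy(Bucket=BUCKET, Policy=json.dumps(policy))
```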

An optimised model would be to switch regularly between PROD and DR environments: switch over from PROD to DR one month and back the next. Make switching second nature to the operations team, so that when an extraordinary event occurs, it just gets done. This approach also makes it a lot easier to keep the PROD and DR environments in close parity.
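The switch itself should be scripted and rehearsed, not reconstructed from memory on the day. A very rough skeleton of what a monthly drill might automate is shown below; every helper function is a hypothetical placeholder for your own replication, DNS and monitoring tooling:

```python
# Skeleton of a scripted PROD <-> DR switchover drill. All helper functions
# are hypothetical placeholders for your own tooling.

def replication_lag_seconds(target: str) -> int:
    """Placeholder: ask your replication tooling how far behind `target` is."""
    return 0  # pretend the standby is fully caught up

def repoint_service_alias(active: str) -> None:
    """Placeholder: move the service alias (DNS / load balancer) to `active`."""
    print(f"[alias] traffic now routed to {active}")

def smoke_test(environment: str) -> bool:
    """Placeholder: run the agreed post-switch health checks."""
    return True

def switch_over(current: str, target: str, max_lag_seconds: int = 60) -> None:
    lag = replication_lag_seconds(target)
    if lag > max_lag_seconds:
        raise RuntimeError(f"{target} is {lag}s behind; refusing to switch")
    repoint_service_alias(target)
    if not smoke_test(target):
        repoint_service_alias(current)        # roll straight back
        raise RuntimeError(f"smoke tests failed on {target}; rolled back")
    print(f"Now serving from {target}; {current} becomes the standby")

switch_over("PROD", "DR")   # one month
switch_over("DR", "PROD")   # the next
```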

Now, of course, there are data protection and cost considerations that need to be baked into such a solution. The key is to ensure the solution is commensurate with the risk associated with system unavailability. Just remember, when assessing availability risk, that it is often the “little” server that glues everything together that gets omitted as not being important enough. Make sure you have a holistic strategy that considers interoperability, and make sure you also consider backup and restoration impacts during change advisory board (CAB) reviews. System design and reality quickly part company in my experience of operational IT!

Prevention is better than cure.

Whilst ensuring backup and restoration strategies are appropriate and effective, prevention should be the primary objective. It has been reported that Colonial spends an estimated US$40 million on cybersecurity annually. You would think they would be pretty on it when it comes to preventing ransomware; spending a tenth of that should have been more than enough to stop this attack from happening. The question I would ask, if I were reviewing that security budget as a new CISO, is: what is it being spent on? Security resources that document security issues but don’t actually do anything about fixing them? Expensive but low-quality resources from big consulting firms? Vanity security tools that are often not needed, poorly implemented and misconfigured? A SIEM solution that doesn’t identify the existence of a disused VPN…?! It’s easy to ask these questions, but what can actually be done to prevent ransomware? Here are my top five:

  1. Implement effective asset (software and hardware) management. Know what you have, so you know when something pops up that shouldn’t be there (or is still there when it is no longer used; see the sketch after this list for a simple way to flag such stale assets). Make sure you are running software and hardware on stable, up-to-date releases!
  2. Harden your infrastructure. CIS provides guides and tools to automate this process, and it’s not expensive to implement. Consider using stripped-down server operating systems, such as Microsoft’s Server Core, to reduce your attack surface. Where CIS doesn’t provide a guide, reputable vendors should also publish best practice on how to lock down their kit!
  3. Monitor ALL systems, but tune logs using a risk-based approach. Organisations often feel the need to triage what they monitor because of storage cost constraints, without ever realising that if they strip the fat out of their event logs they can monitor everything!
  4. Implement robust access control. Use multi-factor authentication (MFA) for all systems where it is available, and use just-in-time authentication on top of MFA for critical infrastructure.
  5. Where systems can’t be air-gapped, use hardened bastion hosts to access that infrastructure, and monitor activity on these hosts (yes, build in redundancy) extremely closely.
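On the asset management point, even a very simple “last seen” check over an exported inventory will surface things like a disused VPN profile or a forgotten server. A minimal sketch, assuming your inventory can be exported with a last-seen timestamp; the data below is made up:

```python
# Minimal sketch: flag inventory entries not seen recently, e.g. a disused VPN
# profile or a forgotten server. The inventory data below is made up.
from datetime import date, timedelta

STALE_AFTER = timedelta(days=90)
TODAY = date(2021, 6, 14)   # fixed date so the example is reproducible

inventory = [
    {"asset": "vpn-profile-legacy-01", "type": "vpn",    "last_seen": date(2020, 11, 2)},
    {"asset": "fileserver-03",         "type": "server", "last_seen": date(2021, 6, 10)},
    {"asset": "jump-host-old",         "type": "server", "last_seen": date(2021, 1, 19)},
]

stale = [a for a in inventory if TODAY - a["last_seen"] > STALE_AFTER]

for asset in stale:
    days = (TODAY - asset["last_seen"]).days
    print(f"REVIEW: {asset['asset']} ({asset['type']}) not seen for {days} days")
# Anything on this list should be decommissioned or explained, not left lurking.
```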

Most ransomware attacks can be avoided…

You don’t need to be a victim. Basic hygiene will make a material difference in preventing ransomware infections. It’s not sexy but it needs to be done. If you’re relying on backups to get your business-critical systems back up and running quickly – think again. It’s going to take a long, long time to recover – if at all. Want to know how well you are protected? Get in touch and see how we can help prevent a similar attack in your organisation.

Contact Us

If you would like to have a conversation to discuss your needs, fill out the form below and we can arrange a time to call you over Teams, Zoom, or Google Hangouts.


About Fox Red Risk

Fox Red Risk is a boutique data protection and cybersecurity consultancy and Managed Security Service Provider which, amongst other things, helps client organisations with implementing control frameworks for resilience, data protection and information security risk management. Call us on 020 8242 6047 or contact us via the website to discuss your needs.

