This week I’ve been reading through the recent judgment from the Swedish FSA on the Swedbank outage. If you’re unfamiliar with this story, Swedbank had a major outage in April 2022 that was caused by an unapproved change to their IT systems. It temporarily left nearly a million customers with incorrect balances, and many of them were unable to make payments.
After an investigation, the regulator found that Swedbank had not followed its change management process and issued a SEK850M (~85M USD) fine. That’s a lot of money to you and me, but it probably didn’t impact the bank’s bottom line very much. Either way, I’m sure the whole episode will have been a huge wake-up call for the people at the bank whose job it is to ensure adequate risk and change controls. So, what went wrong, and how could it have been avoided?
How did the Swedbank incident happen?
The judgment doesn’t describe the technical details behind the incident, but it does provide glimpses into how the regulator assessed what went wrong:
- “The deficiencies that existed in Swedbank’s internal control made it possible to make changes to one of the bank’s most central IT systems without following the process in place at the bank to ensure continuity and reliable operations. This violation is therefore neither insignificant nor excusable.”
- “none of the bank’s control mechanisms were able to catch the deviation and ensure that the process was followed”
- “one of the main causes underlying the IT incident was non-compliance with the change management process and that it is likely that this also resulted in a slower analysis of the incident and a greater impact on the operations.”
- “good internal control is a prerequisite for a bank to be able to meet the requirements on risk management”
Even if you think $85M isn’t much of a fine – simply the cost of doing business – the full range of options open to the regulator included removing Swedbank’s banking license: “It is therefore not relevant to revoke Swedbank’s authorisation or issue the bank a warning. The sanction should instead be limited to a remark and an administrative fine.” Gulp.
Change management doesn’t mitigate risk
What really interests me about cases like this is that, even when followed to the letter, the old ways of managing change with manual approvals and change meetings do not mitigate risk in today’s technology organizations. These processes don’t work because following them is no guarantee that changes are being made safely and securely.
Tell me if you’ve heard this one before?
- Bank has a major IT outage/incident
- A change was made without following the change management process
- Bank claims the risk controls work if they are followed
- Regulator fines bank for not following process + having insufficient controls
- Bank adds more change controls
The regulator’s position rests on self-referential logic: you said you’d do something to control risk, it wasn’t done, therefore you are in violation. But is change management the best way to control IT risk?
What the UK FCA says about change
I’ve written previously about some great research released by the Financial Conduct Authority in the UK. They took a data-driven approach to understanding how change management processes work in practice, which uncovered some interesting findings:
“One of the key assurance controls firms used when implementing major changes was the Change Advisory Board (CAB). However, we found that CABs approved over 90% of the major changes they reviewed, and in some firms the CAB had not rejected a single change during 2019. This raises questions over the effectiveness of CABs as an assurance mechanism.”
Change as a control gate doesn’t work, but everyone does it. Why? To avoid $85M fines. In the UK and USA these can be issued to individuals as well as organizations. So, if you have followed the process, at the very least you are compliant and not liable for heavy financial penalties. It’s also about covering your back: “It’s not my fault, I ticked all the boxes.” But is the bank actually safe? Are the systems themselves secure?
Change management produces documentation of process adherence, but it doesn’t reduce risk in the way that you’d think. It reduces the risk of undocumented changes, but risks in changes that are fully documented can sail through the approval process unnoticed. This is an important and quite shocking finding: adherence to traditional change management doesn’t work to control the risk of changes.
Research shows external approvals don’t work
The science of DevOps backs this up. Here’s the unvarnished truth on external approvals and CABs, based on research by Dr. Nicole Forsgren, Jez Humble, and Gene Kim in their 2018 book, Accelerate: Building and Scaling High Performing Technology Organizations.
“We found that external approvals were negatively correlated with lead time, deployment frequency, and restore time, and had no correlation with change fail rate. In short, approval by an external body (such as a change manager or CAB) simply doesn’t work to increase the stability of production systems, measured by the time to restore service and change fail rate. However, it certainly slows things down. It is, in fact, worse than having no change approval process at all.”
Worse than no change approval process at all. So, if you want to avoid fines, cover your back AND reduce the likelihood of production incidents, what should you do?
Change is not the problem. It’s unaddressed risk
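The two stability measures in that quote, change fail rate and time to restore service, are straightforward to compute once you record deployments. Below is a minimal sketch, assuming a hypothetical `Deployment` record per production change and, as a simplification, that a failure starts at the moment of deployment (real incident timelines often begin later).

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median
from typing import Optional


@dataclass
class Deployment:
    # Hypothetical record of one production deployment
    deployed_at: datetime
    caused_failure: bool                      # did this change degrade service?
    restored_at: Optional[datetime] = None    # when service was restored, if it failed


def change_fail_rate(deploys: list[Deployment]) -> float:
    """Fraction of deployments that caused a failure in production."""
    if not deploys:
        return 0.0
    return sum(d.caused_failure for d in deploys) / len(deploys)


def median_time_to_restore(deploys: list[Deployment]) -> Optional[timedelta]:
    """Median time from a failing deployment to service restoration.

    Simplifying assumption: the failure begins when the change is deployed.
    """
    durations = [
        d.restored_at - d.deployed_at
        for d in deploys
        if d.caused_failure and d.restored_at is not None
    ]
    return median(durations) if durations else None


# Illustrative data only: four deployments, one of which failed for two hours.
t0 = datetime(2024, 1, 1, 9, 0)
deploys = [
    Deployment(t0, False),
    Deployment(t0 + timedelta(days=1), True, t0 + timedelta(days=1, hours=2)),
    Deployment(t0 + timedelta(days=2), False),
    Deployment(t0 + timedelta(days=3), False),
]
print(change_fail_rate(deploys))        # 0.25
print(median_time_to_restore(deploys))  # 2:00:00
```

Tracking these numbers over time is what lets you test claims like the one above against your own organization, rather than relying on whether an approval was recorded.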
If change is not the problem, then what is?
What would work? Well, the FCA has some insights on this:
“Frequent releases and agile delivery can help firms to reduce the likelihood and impact of change related incidents:
Overall, we found that firms that deployed smaller, more frequent releases had higher change success rates than those with longer release cycles. Firms that made effective use of agile delivery methodologies were also less likely to experience a change incident.”
In short – paperwork doesn’t reduce risk. Less risky changes reduce risk. I’m going out on a limb here, but if Swedbank had in fact followed its processes and still had the outage, I suspect Finansinspektionen (the Swedish FSA) would still have issued a fine, just for insufficient risk management instead.
Story time: streams feeding the lake
We can think of software changes as streams feeding into our environments, which are lakes. Change management puts a gate in the stream to control what flows into the lake, but doesn’t monitor the lake.
If it is possible to make a change to production without detection, then change management only addresses one source of risk. The only way to be sure you don’t have undocumented production changes is with runtime monitoring.
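To make the lake analogy concrete, here is a minimal sketch of runtime monitoring. It assumes that every change going through the process records a SHA-256 fingerprint of its artifact, and that you can take an inventory of what is actually running in production. The names `approved`, `running_now`, and `find_unrecorded_changes` are hypothetical stand-ins for your change records and runtime inventory, not any particular tool.

```python
import hashlib


def fingerprint(artifact: bytes) -> str:
    """SHA-256 digest identifying an artifact's exact contents."""
    return hashlib.sha256(artifact).hexdigest()


def find_unrecorded_changes(
    approved_digests: set[str], running: dict[str, bytes]
) -> list[str]:
    """Return names of running artifacts whose digest was never recorded.

    approved_digests: digests captured when changes went through the process
    running: hypothetical inventory of what is deployed, name -> artifact bytes
    """
    return sorted(
        name
        for name, artifact in running.items()
        if fingerprint(artifact) not in approved_digests
    )


# Illustrative example: one service was patched in place, bypassing the gate.
approved = {fingerprint(b"payments v1.4"), fingerprint(b"ledger v2.0")}
running_now = {
    "payments": b"payments v1.4",
    "ledger": b"ledger v2.0-hotfix",  # changed without a record
}
for name in find_unrecorded_changes(approved, running_now):
    print(f"ALERT: unrecorded change detected in {name}")
```

The point of the design is that it watches the lake rather than the stream: it flags the hotfixed `ledger` regardless of how the change got there.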
For me, what is really interesting about this story is the echoes and parallels it has with the Knight Capital incident, so well documented by the SEC. In both cases, an incomplete understanding of how changes had been made to production systems, due to insufficient observability and traceability, prolonged and amplified the scale of the outages.
And that leaves an open question: how many similar changes have been made that didn’t cause an outage? Without monitoring it is really difficult to know.
If change management doesn’t work, why do we do it?
It all goes back to software history. Traditionally, changes were rare, big and risky. It was the annual upgrade, or the monthly patch. Because these big batches of change were risky, companies introduced long testing and qualification processes, change management, service windows, and a large number of checklists to help mitigate the risks and test-in quality.
Before we had contemporary practices like test automation, continuous delivery, DevSecOps, and rolling deployments with fast rollback, this was the only way. The trouble is, the financial services industry is chock-full of legacy systems and outsourcing, where implementing these practices is technically challenging and uneconomic.
Maybe it is time we acknowledged that legacy software, risk management, and outsourcing are a major systemic risk in the financial sector?
The flipside is also true. Many next-generation systems in financial services are so dynamic and distributed that it is really difficult to get a handle on the volume of changes occurring.
Risk management that works
The only way to not get burned is to avoid playing with fire. Checklists can help, but if you have a lot of IT risk, the only way to really reduce it is to do the technical work to make changes less risky, and move to smaller, more frequent changes. You can reduce this work by automating change controls and documentation, and by introducing monitoring and alerting systems to detect unauthorized changes. It’s all part of a DevSecOps approach to change management that aligns the speed of software delivery with the demands that cybersecurity, audit and compliance place on organizations.
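As a minimal sketch of what automating a change control could look like, assume a pipeline that already collects evidence for each change (test results, peer review, a security scan). The check below replaces a manual approval with an automatic one and produces a machine-readable audit record as a side effect. `REQUIRED_EVIDENCE`, `Change`, and `evaluate_change` are hypothetical names, not any particular tool’s API.

```python
import json
from dataclasses import dataclass, field
from datetime import datetime, timezone

# Hypothetical evidence a pipeline might require before a change ships.
REQUIRED_EVIDENCE = ("tests_passed", "peer_reviewed", "security_scan_clean")


@dataclass
class Change:
    change_id: str
    artifact_digest: str
    evidence: dict[str, bool] = field(default_factory=dict)


def evaluate_change(change: Change) -> dict:
    """Check required evidence automatically and return an audit record."""
    missing = [k for k in REQUIRED_EVIDENCE if not change.evidence.get(k, False)]
    return {
        "change_id": change.change_id,
        "artifact_digest": change.artifact_digest,
        "evidence": change.evidence,
        "missing": missing,
        "approved": not missing,
        "evaluated_at": datetime.now(timezone.utc).isoformat(),
    }


# Illustrative change with a failing security scan (digest is a placeholder).
change = Change(
    change_id="CHG-001",
    artifact_digest="sha256:abc123",
    evidence={"tests_passed": True, "peer_reviewed": True, "security_scan_clean": False},
)
record = evaluate_change(change)
# In practice, append this to a tamper-evident audit log instead of printing it.
print(json.dumps(record, indent=2))
```

Because the record is generated from evidence rather than a meeting, every change is documented the same way, and the same artifact digest can later be checked against what is actually running.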