Have you ever had a moment when a concept you could never quite figure out suddenly clicks in your head? I had long struggled to understand a key element of Risk Management – how to perform a risk assessment that included likelihood. And a strange confluence of circumstances made my light bulb go off.
Now before I go into the story, let’s cover a bit of background. Risk Management is a field I admire and consider critical to any organization and its operations – and especially important to my own field, Information Security. Being able to communicate risk to an organization using tangible descriptions is critical. But I could never quite seem to do it with the precision I felt was necessary.
I always stumbled on the issue of likelihood. I could estimate the cost of an incident with surprising ease. I have mastered the process of asking key business groups about the cost of failure, and I know how to test their attributions of cost. I have always been extremely comfortable identifying the costs of an incident – the cost of lost productivity, lost sales, lost intellectual property – and a “range of losses” was a concept I could easily make tangible. For retail companies I could estimate a range of lost revenues by looking at highest-day revenues (Black Friday) and lowest-day revenues. That became my range. I’d find the median and we’d have three values to work with. I could also factor in idle time of workers and unused infrastructure and equipment, and compute these down to the last dollar if I cared to be that detailed (which I usually didn’t – getting to the nearest $100,000 was more than enough for these companies). I could even sit with a marketing team and estimate lost goodwill based on the cost of advertising to win back lost customers, and the revenue downturns those lost customers would cause.
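To make that arithmetic concrete, here is a minimal sketch of the range estimate, with invented daily revenue figures standing in for what a finance team would actually provide:

```python
from statistics import median

# Hypothetical daily revenue figures (invented for illustration; the
# real numbers would come from the finance team's sales data).
daily_revenue = [150_000, 280_000, 300_000, 420_000, 2_400_000]  # max = Black Friday

low, mid, high = min(daily_revenue), median(daily_revenue), max(daily_revenue)

def lost_revenue_range(outage_days: float) -> dict:
    """Best-, typical-, and worst-case lost revenue for an outage,
    rounded to the nearest $100,000 (more precision adds nothing here)."""
    return {
        "best_case":  round(outage_days * low, -5),
        "typical":    round(outage_days * mid, -5),
        "worst_case": round(outage_days * high, -5),
    }

print(lost_revenue_range(outage_days=2))
# {'best_case': 300000, 'typical': 600000, 'worst_case': 4800000}
```

Three values, no false precision – exactly the shape of answer the business groups could sanity-check.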
But I could never feel comfortable with creating a picture of the likelihood that some event would occur.
Why? I wanted it to be perfect. I wanted no one to question the numbers – they would be facts, let the chips fall where they may. I wanted people to know in absolutes, with absolute precision. Except there is no such thing as an absolute – especially in risk. The light bulb that went off in my head was the light bulb of “imperfect knowledge”. Risk is an estimate of possible outcomes. It is about being accurate, not about being precise. Bad risk analysis is when you pretend you can give absolutes, or when you make no attempt to find the range of things that are “more likely”. Do I have you scratching your head yet? Good.
Let me give you an analogy to illustrate what I mean by accuracy and precision. In a battle, accuracy would be knowing where your enemy is attacking from, or even where they are most likely attacking from. If you find out that your attacker has the capability to scale the 3,000-foot cliff you had discounted due to its level of difficulty, you would add it back in, because doing so gives a more accurate picture of all the possible ways your enemy could attack you. Accuracy is accounting for all possible outcomes. Precision is knowing exactly where to aim your cannon so that it hits your enemy at an exact spot (biggest tank, largest warship, best group of archers). Accuracy won’t help you aim the cannon, but it will tell you where to put the cannon and what range of fire it will need. Precision is about aiming your cannon, but it will fall short on telling you where to position your entire army.
The problem I have struggled with in risk analysis is that I wanted precision – and that made me struggle with determining likelihood. The confluence of ideas hit me two days ago. Somehow the idea of Alex Hutton’s and Josh Corman’s “HDMoore’s Law” (an InfoSec bastardization of the “Mendoza Line”) combined with a quick chat about CVSS scores and the idea of “difficulty” associated with vulnerability scores, and something clicked. That, and a peek at a risk analysis methodology that didn’t try to make likelihood a precise number. Instead it asked a simple question – describe the skill required to achieve the event, and provide a range of frequency with which the event would occur. Bing! I could work with descriptions, and so could executives! If you try to arrive at a precise number, executives who play with numbers all day long will probably rip it apart. If you give them probable ranges and descriptions of the likelihood, they get the information they need to make their decision. It is imperfect knowledge. And executives make decisions using imperfect knowledge every day. The more accurate the imperfect knowledge is, the more comfortable the executive will feel making the decision. And the easier it is for an executive to understand the imperfect knowledge you give him, the more he will appreciate your message.
So what did my epiphany look like?
First I realized that likelihood is a balance between understanding the level of difficulty for an event to occur and its frequency. Level of difficulty is really about the level of effort, or confluence of circumstances, required to bring about an event. Take a vulnerability (please, take them all). How much skill would a person need to exploit it? Is it something even the average person could exploit (an open unauthenticated file share), something that is available in Metasploit, or a rare, highly complex attack requiring unknown tools and ninja skills? This is not to say the exploit cannot be done – it is determining whether the population that can perform the exploit is smaller than the universe, and hence the likelihood reduced. The difficulty of having a tsunami hit the eastern coast of the United States is based on the rarity of unstable geologic features in the Atlantic Ocean that could generate one. The Pacific Ocean, on the other hand, has a large population of unstable areas that can generate a tsunami. The skill required to exploit an unauthenticated file share or FTP server is far different from the skill required to decrypt AES or to mount spoofed man-in-the-middle attacks against SSL. I can already see the binary technologists fuming – “but, but, people can do it!” Sure they can. Any attack that has been published can be done – and there are many more that haven’t been made public yet that also can be done. But a business cannot afford to block against everything, much like we cannot stop every car thief. What we can do is avoid the stupid things, the easy things, and more importantly – the most likely things. This is a calculated defense – address those things that are more likely to occur until you run out of reasonable money to stop them.
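One way to put this into practice – and this is purely my own illustration, not CVSS or any other standard scale – is to treat difficulty as an ordinal tier that maps to a description of the population able to perform the attack:

```python
# A hypothetical ordinal scale for exploit difficulty. The tiers and
# descriptions are invented for illustration, not part of any standard.
DIFFICULTY_TIERS = {
    "anyone":      "effectively everyone with access (open unauthenticated file share)",
    "script":      "anyone willing to download and run a tool (Metasploit module exists)",
    "skilled":     "a much smaller pool of capable attackers (custom tooling required)",
    "exceptional": "a handful of researchers or well-funded governments (breaking AES)",
}

def describe_difficulty(tier: str) -> str:
    """Translate a difficulty tier into the population that could
    realistically perform the attack -- a description, not a number."""
    return f"Population able to exploit: {DIFFICULTY_TIERS[tier]}"

print(describe_difficulty("script"))
```

Note that the output is a sentence about who can do it, not a probability pretending to four decimal places.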
Then I took an old concept I had around frequency. There are multiple sources I can use to extrapolate frequency. Courtesy of the three different highly data-driven analyses of breaches produced by the major forensics organizations, we can begin to estimate the frequency of various types of attacks. Data repositories like VERIS, the various incident reports, and general research of the news can give us a decent picture of how often various breach types occur. A great illustration of this is Jay Jacobs’s research on the Verizon DBIR data, looking for the number of times that encryption was broken in the breaches studied. The answer was a grandiose zero (0). Frequency can safely be ruled “low”.
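As a toy illustration of that kind of ruling (the thresholds and sample size below are invented; the point is the bands, not the numbers), a frequency check against breach-report counts might look like:

```python
# Toy mapping from breach-report counts to a frequency band. The
# thresholds and the sample size are invented for illustration only.
def frequency_band(observed: int, total_breaches: int) -> str:
    if observed == 0:
        return "low (never observed in the data set)"
    rate = observed / total_breaches
    if rate < 0.01:
        return "low"
    if rate < 0.10:
        return "moderate"
    return "high"

# The broken-encryption case: zero occurrences in the breaches studied.
print(frequency_band(observed=0, total_breaches=750))  # hypothetical sample size
```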
Suddenly I was able to walk through a vulnerability report I had been handed and put together a quick risk analysis. I asked five questions:
- What assets are on the affected systems? (for example email, payment card data, PII, intellectual property…)
- What population of people would have access to directly exploit this vulnerability? (Internal employees, administrators, or anyone on the Internet)
- What is the level of difficulty in exploiting this vulnerability? (CVSS provides a numerical scale which I was more than happy to defer to, and in some cases where the general user population could exploit it, we created a “-1” category)
- What is the frequency that this type of exploit has occurred elsewhere, and what have we seen in our organization? (research into DBIR, asking security team at client site)
- What controls are in place that would mitigate the ability of someone to exploit this vulnerability? (such as a firewall blocking access to it, user authentication, application whitelisting, etc.)
I took all the data that was collected and turned the risk into a sentence that read something like this:
“Examining the risk of being able to see information sent in encrypted communications: Anyone on the Internet would have access to attempt to exploit this, however a very high level of competency and skill is needed to decrypt the communications. The frequency that this type of attack occurs is very low (typically done in research or government environments with mad skills, and lots of money). There are no additional controls in place that would mitigate this risk.”
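To show how mechanical this assembly can be, here is a minimal sketch (the field names and sentence template are my own, not any formal methodology) that turns the five answers into a narrative like the one above:

```python
from dataclasses import dataclass

@dataclass
class RiskObservation:
    asset: str       # what is exposed
    population: str  # who could attempt the exploit
    difficulty: str  # skill required to pull it off
    frequency: str   # how often this attack type is seen in the wild
    controls: str    # mitigating controls, or the lack of them

    def narrative(self) -> str:
        """Assemble the five answers into a single risk sentence."""
        return (
            f"Examining the risk of {self.asset}: {self.population} would have "
            f"access to attempt to exploit this, however {self.difficulty}. "
            f"The frequency that this type of attack occurs is {self.frequency}. "
            f"{self.controls}."
        )

risk = RiskObservation(
    asset="being able to see information sent in encrypted communications",
    population="anyone on the Internet",
    difficulty="a very high level of competency and skill is needed "
               "to decrypt the communications",
    frequency="very low (typically research or government environments)",
    controls="There are no additional controls in place that would mitigate this risk",
)
print(risk.narrative())
```

The template is deliberately dull – the value is in the answers, not the prose.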
The last bit of glue that held this all together was making all of the assumptions about the risk explicit. I’ve talked extensively about the value of being explicit – it makes the data easier to examine, challenge, correct, and improve. The result is a more accurate risk assessment based on more accurate data.
The true detractors of Risk Management would point out that none of this is perfect or certain. They would be correct, but then nothing in life is certain. We tend to want to be perfect, to be right and not wrong, because we fear being wrong. The sources of this tendency are boundless, but I suspect a bit of it comes from our high level of exposure to the highly precise and binary world of computers; as a result we try to make the rest of the world fit this model that we idealize. Ones or zeros, exact probabilities, exact measures of cost… but life outside the artificial construct of computers is not like that. It is full of uncertainty and non-binary answers. Those subtleties are what Risk Management can capture and help us understand, in a way that is closer to our binary desires – but never completely. What Risk Management does give us is better accuracy – so we can make more accurate decisions and be less erroneous.
So step away from perfection. Give your team a view of the risk in terms they understand. You might just find that giving them a description, a narrative, and ranges to draw from is much more accurate than anything they’ve used in the past. But whatever you do – do not aim for precision. Aim for accuracy, even if that means the estimate is less precise. Your management wants the accuracy. Just like their profit projections, your estimates will never be precise, but data can make them more accurate.
Now you might still have a question of “so how do I quantify this?” Ah, that’s for next time…