The Three Phases to DevOps in Security

Many of those who aspire to create a high-performing security function within a company are looking at DevSecOps and what it represents.  This is laudable, as the concepts represented in DevSecOps mirror many of the successful organizations I’ve experienced, as well as the views of dozens of CSOs I’ve interviewed since 2010.  The CSOs I interviewed often reflected that many of the skills they valued were not traditional technology skills, but instead skills in critical thinking and collaborative discourse.  (Before you toss this assertion aside, bear in mind that they also asserted technology skill is needed – just not in isolation.)

When a group of us were reviewing very early drafts of “The DevOps Cookbook”, many of us felt something was missing in its approach.  It was David Mortman who first put to paper what many felt was the underpinning concept or theme that was needed: culture.  DevOps depended upon a culture – one that was seemingly at odds with how things were currently being done, and that required buy-in to change.  It required an agent of change, and a long-term commitment to overcome dysfunction within an organization – work that may feel counter to existing dogma.

The journey I’ve taken over the past eight years has allowed me to codify some of the successful approaches I’ve taken and understand the why behind their success.  This is my collection of ideas, and the basis for them.  It maps the path I’ve charted with my teams towards a culture of broad collaboration, empathy for the “customer”, and a willingness to take chances and learn.  The results I and others have seen are quite rewarding, and you’ll see how each played out in my stories.  These are far from the be-all and end-all of approaches, but if they help give you a jumpstart in your journey, then this has achieved what I hoped.

People, Process, Technology

We all know the mantra (principle) of People, Process, Technology.  It is a fantastic model to explain how things should be.  I even stack them much like I might order Maslow’s Hierarchy of Needs.  People are needed to operate an environment; they embody the culture of doing and the organization’s knowledge.  These people build processes that reflect their views on how things get done.  And then you build technology to facilitate the speed of those processes designed by these people.

That is wonderful if you are an anthropologist examining how an organization works.  The challenge is that this is rarely how people try to transform their organizations.  They (mistakenly) start upside down.

Technology, Process, People

How many of you have watched an organization declare it’s starting its “DevOps Transformation” and bring in a bunch of technology tools (automation, deployment, cloud)?  This is the “technology shall cure all our ills” club.  I will often tell them, if a process is broken and bad, then all technology will do is make the “bad” faster.  If your process for approving user access requires five different approvals from people who have no idea what they are approving, what system it refers to, or what data it exposes, all technology will do is make that inappropriate access happen faster.  Garbage In, Garbage Out, at Speed.  Have you really made anything better?  How would you know?

How many of you have seen an organization build an isolated team in IT and give it the title “DevOps”?  This one irritates me – if you call your team DevOps, you don’t get it.  Either it implies that only this one team needs to do DevOps, or, more likely, a naïve notion that it’s all about the automation.  Have you helped the company move forward and improve?  How are other groups improving, and how is work across the organization getting better?

Security teams are no better at fostering DevOps.  Too frequently I encounter teams sitting behind walls, throwing darts (findings) over those walls at groups they barely know.  This grates on me more than someone calling their team “DevOps”.  I call this the “We Do This, You Do That” club.  (By the way, I also see Development teams and Infrastructure teams doing the same.)  How do your findings relate to what the company is trying to achieve?  How do they relate to the company’s tolerance for risk?

DevOps is the Journey

You would think that by now people would have learned what DevOps is, but instead DevOps has been miscast as purely automation or, more commonly, deployment tooling.  Let’s get over this myth.  Tooling is an outcome.  Even refinement of work is an outcome.  Make no mistake, I love the technology solutions that have come out of the DevOps movement – methods and tools that have refined the flow of work and increased its speed.  But these solutions are the outcome.  DevOps is the ongoing journey of getting there.  It’s about how we work together with a common goal of making things better (maybe even faster and stronger…) in a way that makes it possible to focus on the real customers, (blamelessly) identify inefficiencies, collectively learn and make leaps of faith, and create rapid and large shifts in how we do things.  It’s a mindset – or, I would say, a culture change – that allows us to get to the state where we can make these changes.

My Strategy of Change

When I start working with an organization, I put most of my effort into organizational behavior.  The words I use for this are: Embedded, Collaboration, Discourse, Learning, Growth and Refinement.  I’ll concede that there is often badly broken technology or nasty compliance failures – but even these situations I use as an opportunity to teach Security, Development, Infrastructure and Operations teams to work together and learn the cultural workings of DevSecOps.

I first focus on changing how people work and think – their perceptions, understanding, as well as interpersonal and organizational interactions.  I call this Changing People.

I next weave in changing the processes of working – how they communicate, problem solve, and learn.  This overlaps with changing people and can even overlap with changing the company’s operational processes as people try to refine how they work.  I call this Changing Process.

Lastly we look to evolve the processes, tools, and technology used in our operational work.  This can and usually does include changing security controls and operational processes as well as looking at techniques of refinement.  I call this Changing Technology – but in reality it’s about everything that DevSecOps can consume, refine, and make better.  By this time the ideas will start flowing, and the DevSecOps machine is in motion.

Change People

My objectives are to lead the team towards collaboration, communication, discourse, and learning while avoiding anonymity, disconnection, and debilitating blame:

  • Building “emotional capital” with customers
  • Broad collaboration as the core to succeeding
  • Making the Team feel valued – contributing
  • Using Empathy & Discourse to collaborate and solve problems
  • Leading by Example
  1. Meet Everyone (the Customers): The very first thing I do with a Security team is ask them: who do you know in the organization?  So far, other than one lone individual at one company, they respond only with other IT people – usually the support desk, or infrastructure (usually networking).  My response is to task them with getting out and meeting all of the company.  I remind them that what the company does pays their salaries and bonuses, so it would be really good if they knew what that was.  We set up a grand tour where we meet every business unit within the company, and the Security team is given only one task – listening.  They visit Marketing, Sales, Manufacturing, Distribution, Finance, Legal…any and all groups.  I challenge the Security team to ask: “What is it that your department does?”, “How does what you do provide value to the company?”, “What keeps you up at night?”, “If your department weren’t available, what would happen?”, “What processes are critical?  What technology is critical for you?”

Oh, and I remind them that there isn’t a wall between IT and “The Business”.  IT is part of the Business (thank you to David Schenk for that mantra).  Further references to “The Business” as a “them” costs them a quarter in a cookie jar.

Result: The team finds out who their customer is.  They will gain an appreciation for what the company does, and what is important to it.  Their customer’s value, concerns, and problems become real.  Now the things that Security thinks about can become grounded in what the organization values.  There will be an affinity and empathy towards what the organization does, what pain points exist, and how Security and its actions have an impact.  It changes the team’s approach from an abstract “Do this so the company doesn’t fail!” to “This will help Distribution because the system won’t be disrupted!” or “Charge records will get to Fraud Prevention on time.”

  1. Communication: I mandate communication patterns between the teams.  I set down a few rules, many of which will sound like some training you had at some HR event:
    • If your email exchange goes beyond two messages, make it a phone call.
    • Better yet, always start with a phone call. Email is only for transmitting data (files, file manager…)
    • No, better yet, if you can, walk over and talk to the person face-to-face (I go by the theory of “Managing by Walking Around”).
    • Group meetings are either Face-to-Face or with Video. Video is good when everyone is remote (global).
    • Communicate frequently – have team group meetings that everyone attends.  I hold one-on-ones weekly to ensure people feel listened to.

Result: The team members know each other and each other’s faces.  The team learns that most of communication is carried in facial expressions, vocal tones, and other things that don’t transmit via email.  Do not let people become anonymous.  Encourage people to feel included.

  1. Collaboration & Discourse: During meetings, encourage feedback and contribution from all team members on priorities, learning, teaching, and what gets done.  I have found that putting people on the spot for feedback doesn’t work well for those who may be more introverted; however, making it clear that feedback is welcome and will be considered opens the opportunity for them to speak.  This is achieved by making it clear that you expect your ideas to be challenged, and that you allow the team to do so.  Consider any feedback you receive carefully – testing it with the team members offering it, examining how it can disprove your ideas (not how it confirms them).  Make sure the comments are focused on the idea, not the person suggesting it.  We all have ideas that have faults, so there is no sense in blaming.  Rather, it is better to refine the idea, which becomes a learning experience.  You show value in their ideas and feedback by publicly considering them.  As a manager, allow your statements to be challenged.  Ask your team to disprove them – how could my idea be disproved?

Another technique I use is to ask each person on the team to come with updates on what they are doing, so that they understand that everything going on is important and we should discuss it together.  Give them praise publicly for presenting the idea (not just when it’s right).

Result: You’ll be modelling what you expect your team to do in their interactions.  You’ll surface assumptions, find faults in designs and ideas, and gain a lot of opportunities to teach, and learn!  You’ll create feedback loops – a willingness to discuss openly any issues, problems or concerns.  You will do it in a manner that is open, lacking in blaming the individual, but focusing on the idea.  You will create an environment where people will feel they can participate.

  1. Be Willing to Fail: Model this from the top. Admit when you make mistakes.  Give others in the team credit and make it wildly public.  Recognize success globally but keep mistakes internal.  If a team member makes a mistake, take the blame on your shoulders to address, and have the conversation one-on-one with the team member.  Understand the issue, and encourage the learning process.  As the organization as a whole learns blameless environments, you can let mistakes be examined more broadly, but until that adoption occurs, you need to ensure that the team knows that you won’t hold mistakes against them (unless they are systemic and chronic).

Result: You’ll have a dedicated, loyal team.  One that sees learning as a sign of strength.  One that feels they contribute, they’re recognized, and that faults, while always painful and frustrating, will be less so – that they feel they can move forward, learn, grow and correct what goes wrong.

Change Process

  1. Allocate Expertise: I take stock of the team – their expertise, strengths, and, of course, challenges.  I also ask what their interests and goals are – what do they like doing, what do they want to do?  With this information I divvy up responsibilities across the Security team.  While the structure depends on the needs of the organization and the available skills, I make sure I’ve created comprehensive coverage.  I then collectively let the team know what my thoughts are, let them challenge them, point out what I might have missed, what things need to be added, and where someone feels strengths are not being leveraged properly.  Ultimately it’s about recognizing expertise in the team, and making sure that expertise is externalized – made public so that everyone knows who they can turn to.

Result: Recognition of the expertise in your team, and a public pointer to the go-to people for answers.

  1. Embed in Projects: Now to break down more walls.  This is how I ensure that the team not only learns about what is going on in the organization, but also participates in creating the solution.  I assign one of the more experienced security people in my teams (those with broad insight) to projects within the company.  If the effort has a significant need for security, they become the Security Program Manager – the person who triages all requests from security, and who acts as liaison between the project personnel and security specialists within the security team.  This Program Manager needs to be very involved – participating in as many project meetings as possible, engaging with the project personnel, regularly communicating needs, and “Managing by Walking Around”.

I’ve made this arrangement at every client I’ve led.  I’ve had some people take to this like a fish to water – they love the interaction and actively participate with the project team, feel part of the team, and take its success personally.  In another case we attended a new project initiation.  We listened, provided non-security feedback and questions, and were rewarded with a big thank you.  They appreciated our input and made a habit of inviting us to every new project they considered.

Result: Engaging and embedding in projects.  Knowing what happens in the organization.  Creating low-friction, high-return work environments where Security is perceived as being invested in the success of the project – through the time committed and the willingness to listen and care about the goal of the project.

  1. For Every Control You Implement You Must Give Something Back: This one probably sounds like process, but at its heart is empathy.  Security teams have a tendency to impose controls that make tasks harder or take longer.  This is a problem for those trying to get their work done in time for a deadline imposed by their manager.  In an effort to meet the deadline, controls will be circumvented and shortcuts taken, all for the sake of doing things quicker and the (selfish) goal of getting work done to receive the adulation (or avoid the wrath) of their boss.  Security needs to empathize with this.  Hence my rule.

Result: A mentality around the potential effects of Security, and a thoughtful approach that looks to minimize that impact.  A view that Security actually cares and is sensitive to personal success.

  1. Prioritize – Be Great at Important Things: This is where I insert a bit of Security – but where understanding the customer comes strongly into play. I force the team through what I call “Risk Week”.  It’s a week-long session (that gets shorter over time as they get great at it) where we create our Risk model and mitigation priorities for the year.  It is a highly collaborative effort.  It includes revisiting all the organizational groups.  It includes assigned responsibilities within the team so they all participate.  It involves presenting their ideas, each participant challenging assumptions, and creating active discourse as priorities are weighed.  We even include an executive presentation where the team is welcome to present so that they gain the experience and the exposure.

Result: Risk Assessment that is based on the company’s goals and priorities, as well as reinforcing the collaborative nature and interaction that we want to foster.

  1. Manage to the Priorities:  Everyone has had the situation where a problem or finding crops up, and suddenly there is belief it needs to be the foremost problem we solve.  It is a “hair-on-fire” moment, and the belief is that all other work must stop so this can be fixed.  While Lean promotes pulling the Andon cord, I like to point out that there are likely many issues in Quality when Security is involved.  I stop everyone in the moment of “hair-on-fire” and ask them to calm down for a minute.  Breathe.  And then look at the list of prioritized items we agreed to work on.  I ask if this “hair-on-fire” issue should displace any of those issues.  If the answer is yes, we codify it with a risk profile that matches what we did during the Risk Assessment.  If it doesn’t (which is almost always the case), we add it to our master list of “all-the-things-we-should-do” so it’s not forgotten.

Result: You recognize the need to fix issues of Quality, but also to balance that against where the greatest returns are achieved, and how they align with the company’s objectives.  People still feel their concerns are valued, but you also ensure they maintain a balanced and normalized view of priorities.
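That triage rule is simple enough to sketch in a few lines of code.  This is a hypothetical illustration of the idea, not any team’s actual tooling – the `Issue` class, `risk_score` field, and list names are all mine:

```python
from dataclasses import dataclass


@dataclass
class Issue:
    name: str
    risk_score: float  # scored on the same scale used during the risk assessment


def triage(new_issue, priorities, backlog):
    """Apply the "hair-on-fire" rule: breathe, score the new issue the same
    way the committed work was scored, and only then decide."""
    lowest = min(priorities, key=lambda i: i.risk_score)
    if new_issue.risk_score > lowest.risk_score:
        # The rare case: it really does displace agreed-upon work.
        priorities.remove(lowest)
        backlog.append(lowest)  # displaced work is parked, not lost
        priorities.append(new_issue)
        return "displaces committed work"
    # The usual case: onto the master "all-the-things-we-should-do" list.
    backlog.append(new_issue)
    return "added to the master list"
```

The key design point is that the hair-on-fire issue gets a risk profile first; it never jumps the queue on urgency alone.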

  1. Manage the Flow of Work: This effort was far more ad-hoc in many respects, but I drew on numerous methods and tools for managing work.
    • Make Work Visible: Kanban – nearly every security team I’ve worked with has preferred Kanban as the way to visualize and manage their work.  One task in, one task out, pick up the next task.  Because so much of security’s work is intertwined with other teams, it is hard to march to sprint cycles.  We could instead weave in and out of activities – pushing things into “on hold” – and flow with any other style of work more easily.  What we gained was visibility into what was being worked on, and what was yet to be done.
    • Fit to Capacity & Level the Workload – we monitored the Kanban board, and I had conversations about people’s workload.  If I felt they were being overwhelmed, or if they put in more than 40-45 hours of real work (e.g. I found them in the office after hours all the time), then I would postpone work based on company priorities.  I recognized that quality was going to be the first thing sacrificed if I didn’t put things on hold (see Build for Quality).  The team came to respect that I valued their sanity, would avoid overwork, and would balance priorities.  They knew they could do the same.  I likewise drove those who didn’t put in the time to deliver.  Taking advantage of my desire for quality was not rewarded, and was confronted privately.  To quote Nick Galbreath from his time at Etsy: “If you don’t take responsibility, then you probably don’t belong.”
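The flow described in these two bullets – make work visible, pull within capacity, and park intertwined work “on hold” – can be sketched as a minimal board.  The class and method names below are illustrative, not drawn from any particular Kanban tool:

```python
class KanbanBoard:
    """Minimal Kanban board: make work visible and enforce a WIP limit
    ("one task in, one task out, pick up the next task")."""

    def __init__(self, wip_limit=3):
        self.todo, self.doing, self.on_hold, self.done = [], [], [], []
        self.wip_limit = wip_limit

    def pull(self):
        # Pull the next task only if we are under the work-in-progress limit.
        if self.todo and len(self.doing) < self.wip_limit:
            task = self.todo.pop(0)
            self.doing.append(task)
            return task
        return None  # at capacity (or nothing to do): finish something first

    def hold(self, task):
        # Security work is intertwined with other teams: park it, don't block.
        self.doing.remove(task)
        self.on_hold.append(task)

    def finish(self, task):
        self.doing.remove(task)
        self.done.append(task)
```

The WIP limit is what fits work to capacity: when `pull()` returns nothing, the answer is to finish or park something, not to start more.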
  1. Build for Quality:  In many projects I have seen people start with a goal and a deadline date.  I get frustrated by this model because most people are very bad at estimating how long something will take to accomplish.  Even with the concept of an MVP (Minimum Viable Product) they still underestimate the amount of work and time it will require.  To overcome this bias, I lay down a set of rules for every project:
    • Estimate your timeline and amount of work using the worst-case scenario.  We are so over-optimistic at time estimation that this will be far more accurate.
    • You are allowed to remove features, but you are not allowed to remove quality.  If the solution will fail to operate shortly after launch, or there is a probability of disruption to regular operations, go back and fix.  Features can be added later.  Quality failures are highly disruptive.
    • Test, Test, Test.  Do the thing and make the change as many times as you want.  In test (non-prod) environments.  Get good at it.  Make mistakes, practice.  Learn.  Then, when you get to production, it’s close to rote.  You’ve tested all the ways you can think of to fail, and have learned from them for the long term, not just for this change.
    • If a deadline is going to be missed, evaluate the cost of doing so.  Then evaluate the cost of shipping it with the missing quality (e.g. if it fails every other day, if operations stop, if it gives the wrong answers).  Measure this using money and time (which can be equated with money).  This will give deadline pushers pause.

Result: You will surface over-optimism (it will take time, but you’ll be right more often than the optimists).  You will keep a focus on Quality and make sure it stays at the forefront.  You’ll encourage learning during testing, so that failures are a reward and help you avoid them where they would be painful.  You’ll also provide a model for everyone to evaluate the impact of quality versus feature deadlines (there is no correct answer until the measure is made).
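The deadline evaluation in the last rule can be reduced to a back-of-the-envelope comparison.  The function name and the figures below are hypothetical; the point is simply that both sides of the decision get measured in money:

```python
def ship_or_slip(cost_of_delay_per_week, weeks_of_slip,
                 expected_failures_per_week, cost_per_failure, weeks_exposed):
    """Compare the cost of missing the deadline against the expected cost
    of shipping with the known quality gap.  Everything is in money."""
    delay_cost = cost_of_delay_per_week * weeks_of_slip
    quality_cost = expected_failures_per_week * cost_per_failure * weeks_exposed
    return "slip the deadline" if delay_cost < quality_cost else "ship now"


# Hypothetical numbers: a 2-week slip at $10k/week, versus a flaky release
# failing ~3 times a week at $20k per failure over a 4-week exposure window.
print(ship_or_slip(10_000, 2, 3, 20_000, 4))  # "slip the deadline"
```

There is no universally correct answer here – only the answer the measure gives for this project, which is exactly the point of the rule.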

Change Technology

By now, I shouldn’t even need to talk anymore.  You should have a team that is on a path to functioning, collaborating, and looking for ways to save time and effort.  They know what they want to do, where they are frustrated, and have surfaced these issues.

Now go Lean.  Find the weaknesses in your flow of work and in your security risks.  Where do you need speed, where do you need to mitigate risk, and where do you need more data?


Glass Houses…and Music Majors

First, a disclaimer…this post is *not* about bashing or ranting about Equifax’s security practices. Why? Because I do not have first-hand knowledge of what they did or did not do, or what specific exploits and vulnerabilities were leveraged throughout the kill-chain of the event. Frankly, it’s likely only the security investigators (internal and external), the legal team, and outside counsel will ever know the details. Which is just fine by me. If you wonder why, then you’ve obviously never been involved in a breach and the subsequent investigation. There is a lot of conjecture (some logical, some not so logical), a lot of hand-wringing, certainly a lot of drinking (after hours), and a whole lot of lost sleep and hair (if you have any to begin with).

So why would I mention that?

Because I want to rant for a moment about the security community and the press who seem to have taken issue with how Equifax was breached.

This has nothing to do with their response to the breach.  Let’s set aside Equifax’s horrible response after the breach. I will not condone, support, or even pretend to empathize with their response. To put it mildly, their response to the breach sucks. You were breached. Mea culpa, and treat your customers, constituents, and not-so-willing-public-whose-data-you-have like your own injured child whom you just accidentally knocked off a ladder and gave a lump on the head (and maybe a concussion).

Let’s instead talk about the blame we seem so eager to apportion.  Security professionals, take note of something we always say:

– It is not “IF” you will be breached, but “WHEN”

So suddenly Equifax is evil because they were breached?

You may counter, “but they had a vulnerability that was *3* months old!!!!”

Um, yeah….about that. Let me ask you how old the vulnerabilities are on the laptop that you use for your pen-testing. And if you are a CISO or other security professional employed at a company, and you believe you patch your public-facing systems perfectly in less than 90 days, you are *woefully* uninformed, I would argue “naive” in understanding how companies work, and not plugged into something called “risk acceptance”. Ouch, I think I just touched some nerves, but let me assure you, this is not personal. It is about the dynamics of an organization – something that outsizes the best of us.

Again, I cannot say this is Equifax, but I can say that nearly every company I’ve come in touch with struggles with this same problem.

Security Team: “Bad vulnerability, and it’s out there exposed to the Internet. We must patch it right away!”
Development Team: “Can we test this first? It’s probably going to break the application.”
Business Team: “This is a really critical app for our business group. Please don’t break it.”
Managers: “Don’t break the app. Can this wait?”
Executives: “We’re listening to all the people here, and can we please just not break things? Let’s take it slow and test.”
Development Team: “We have features to get out that are a priority and are scheduled to go out in the next three weeks for a customer.”
Business Team: “Please don’t interfere with this critical customer need.”
Executives: “Can we please not break things…”
Development Team: “The patch breaks something. It will take us a couple of months to figure out what. Meanwhile we have these other features to get in.”
….

See a trend? I don’t want to represent this as an endless cycle. The reality is (at least for the organizations I’ve worked with) that they do eventually, in a fairly reasonable period of time (which I will admit is a *very* subjective assessment), get around to figuring it out and fixing whatever is broken by the patch. Some organizations are great at it, and it might take one or two sprints to figure it out. Others have other priorities, or their backlogs are long and maintenance work doesn’t rank as a high priority, but they still get to it within 3-6 months. In some cases, depending upon the complexity of what a patch breaks, that’s pretty darn good. And if you are skeptical of that, you need to spend a bit more time embedded in a development team.

I remember quite a few years ago listening to a talk at BSidesSF (one of the early years) from someone whose day job was to teach companies how to write secure code and evaluate code for security vulnerabilities.  He talked about a program that a customer asked them to write, and how, in their efforts, they committed exactly the same secure programming mistakes they lectured their customers to avoid.  They had vulnerabilities in their code that were easily exploitable.  They found that deadlines made them take shortcuts and not get around to putting to use all the best practices that they could (or maybe should) have.  And these were individuals whom I held in very high regard in the application security field.  They admitted – “It’s hard in the real world to do it right.”

So what should we learn from Equifax?

Security isn’t perfect.  We shouldn’t gang up on an organization just because they had a breach.  Every organization is trying to balance a business opportunity with the risk being posed to those opportunities. It’s a balance. It’s a risk equation. It’s never pretty, but let’s face it, most organizations are not in business purely for the sake of security.  Every control costs money, causes customer frustration, and has an impact on revenue.  You may say a breach does too, and it does, but there is a balance.  Where exactly that balance lies can be a subject of great debate because it is not precise, and can never be predicted.

Patching is much more than just “patch and forget”.  Application patching is even more complex.  The alleged vulnerability cited in the Equifax breach was 90 days old.  Even if it was 180 days old, there are factors we cannot even begin to understand.  Competing business interests, a belief that its exploitation couldn’t be leveraged further, a penetration team that didn’t find it or the exposure it could lead to because the applications were too complex to understand, or even human error missing the finding through a reporting snafu.  Stuff happens….no one is perfect, and we shouldn’t throw stones when our own houses have (despite our own protestations otherwise) just as much glass in them.

Ultimately, there are some practices that can help, but I will put a disclaimer here – these may already have been in place at Equifax.  Again, we are human, and systems/organizations are complex.  Complexity is hard to get right.  We also don’t know the full kill-chain in the Equifax scenario.  There may be more things that would help, or, for that matter, these things may have been in place and it required even more complex efforts to address the root cause.  That said, here are some things I would suggest:

  • Try to understand every application in your environment and how they tie together.  Knowing the potential chains of connection can help you understand potential kill-chains.
  • Create multiple layers of protection – so you can avoid a single failure resulting in catastrophic loss.  You can liken this to the “Swiss cheese” model, where a failure passes through holes in multiple layers (or there aren’t any layers) and the breach easily cascades further and further into systems and data.
  • Run red-team exercises with targets as goals (e.g. that big database with customer data, or the AD domain user list).  Let your red team think like an outsider with a fun goal, and the flexibility of time to figure out how to get there.  The results will inform you where you can improve primary controls, or where you can add additional layers of protection.
  • Patch external systems with far more urgency than internal.  This seems pretty obvious, but sometimes how we represent vulnerabilities is too abstract.  I have found that using the language of FAIR has been an immense help.  Two factors I try to focus on: Exposure (what population of the world is it exposed to) and Skill/Effort to exploit (is it easy or hard).  Given the volume of opportunistic threat attempts (a.k.a. door knob twisting), it makes sense to point to those values as key indicators of what will happen with exposed vulnerabilities.  I once pointed to the inordinate number of queries on a specific service port that a client used as proof that the “Internet knew they were there…” which leads to my last point…
  • Communicate in a language that people can understand, and in ways that make it real.  If you talk in CVSS scores, you need to go home.  Sorry, but to quote a favorite line of mine, it’s “Jet engine times peanut butter equals shiny.” (thank you Alex Hutton, your quote is always immortalized in that fan-boy t-shirt).  Put it in terms like: “The vulnerability is exposed to the Internet, there is nothing blocking or stopping anyone from accessing it, and the tools to exploit it are available in code distributed openly to anyone who has Metasploit (an open-source, freely available toolkit).  The attacker can then execute any command on your server that the attacker wants, including getting full, unfettered access to that server, its data, and….”
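As a rough sketch of how the two factors I focus on – Exposure and Skill/Effort to exploit – can drive patch priority: the ordinal scales, scores, and vulnerability names below are invented for illustration, and are not official FAIR taxonomy values.

```python
# Illustrative ordinal scales – hypothetical, not official FAIR factors.
EXPOSURE = {"internet": 3, "partner_network": 2, "internal_only": 1}
EFFORT = {"metasploit_module": 3, "public_poc": 2, "expert_research": 1}


def urgency(exposure, effort):
    """Internet-facing plus point-and-click exploitability sorts to the top
    of the patch queue; hard-to-reach, hard-to-exploit bugs sort down."""
    return EXPOSURE[exposure] * EFFORT[effort]


# Hypothetical findings for illustration.
vulns = [
    ("app_server_rce", "internet", "metasploit_module"),
    ("priv_esc_bug", "internal_only", "expert_research"),
    ("api_auth_bypass", "partner_network", "public_poc"),
]
for name, exp, eff in sorted(vulns, key=lambda v: urgency(v[1], v[2]),
                             reverse=True):
    print(name, urgency(exp, eff))
```

A scheme this simple won’t replace a full risk analysis, but it makes the “patch external systems with far more urgency” rule visible in the queue itself.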

Those are things I coach my teams on.  Things we should look at and learn from.  Because we need to find data that helps us get better.

One last thing that chafed my hide…

Some people had the audacity to say “…who would hire a CISO with a college major in music…”

Setting aside the rather comical philosophical rant I could make based on UCI’s research on the effects of Mozart on students studying mathematics, I’d like to put forth my own experience.

I hold a Bachelor of Architecture (yes, buildings!) and have a minor in Music, and two years post-bachelors in organizational psychology.  I am a fairly accomplished security consultant (who has done penetration testing and programming) and CISO.  My degree is not a disqualification from being a CISO, any more than a Music degree disqualified the former CISO of Equifax from holding her job.  Simply put, “COMPUTER SCIENCE IS NOT A PREREQUISITE FOR BEING A CISO”.

I have interviewed dozens of CISOs around the world.  Nearly every one of them said they liked having liberal arts majors and people outside of Computer Science fields in their teams because they brought a very different insight and analysis to the team.  It is my opinion that by the time you have reached five (5) years of experience, your college education is largely immaterial.  There are theories and data that college informs you of – such as what a petaflop is, if-then statements, and the theory of asymmetric encryption – but college does not tell you how to use these skills in the ever-changing dynamic of real life.  I call these skills the ability to analyze, synthesize, and respond.  In other words, the act of design.

For the CISO of Equifax, it is likely that her skills in analysis and design, and her ability to communicate those thoughts to executives, were highly developed.  It is also likely that she had experience with software, with networks, and other technical areas.  I can relate because in my undergraduate education for Architecture we had to take a Pascal programming class in our freshman year.  We had to take a “Computers in Architecture” class.  What I did with it was unique, and I would suspect what the former CISO of Equifax did with her experiences was unique as well.  Putting a blanket assumption over anyone’s experience is ill-informed, and frankly, quite naive.  Have a chat with them.  Know their skills, learn what made them capable and skilled, or at least trusted at what they do.  Then critique what they have brought to the table *today* as a result of all of their experiences (school included, but also all their work since then).

So let everyone put down the pitchforks and stones we were going to throw at someone else’s glass house, and go back to tending our own – noting how someone else’s glass house got broken as a way to learn how to protect ours.  Because what I’m hearing so far isn’t helping; it is based on a lot of arm flapping by people far too interested in pointing at other people’s glass houses to tend their own.


Shifting the Conversation (An SDLC Story)

I’d like to tell a story (a mostly real one) that can help you think through how to make your DevOps transition a little smoother, level set some over-exuberance, and ensure everyone feels they are getting a fair shake in a way that is collaborative.

I had a customer whose teams talked endlessly about how they wanted to get to DevOps, continuous integration, and high velocity of deployments.

The challenge is that they talked about DevOps as making deployment go faster.  They wanted rapid deployment, daily changes, and to push code to production every day.  As a result, everyone latched onto what they thought it meant.  They talked about faster creation and deployment of new features.  They talked about end outcomes and the excitement of reaching that end goal of daily pushes.  Developers thought they had reached nirvana and could get all the code that was backlogged into production whenever they wanted it.  Operations teams thought it meant that development would write cookbooks and test everything, and that they could focus on undoing technical debt, getting rid of crappy code, and making things work right in production.

Now these are all valid goals of DevOps.  They all are things we want to strive for.  But they were being framed in the legacy biases of Dev vs. Ops.  As an example, someone who is typically production and operations focused could quickly admonish the developers for being “unrealistic” in their expectation to jump straight to daily releases and rapid increases in speed and velocity.  You don’t jump straight from typing in code to putting it in production.  At least not in reality, and certainly not with quality.  While there is truth in these statements, any admonishment is going to be perceived by developers as a blocker to the speed and velocity they want.  And they’ll push back and say, “It’s all Operations being the blocker and slowing us down!”  And they’d be partially right.

So instead of admonishing the developers, we changed our language and effort to focus on one of the sources of our issues – environmental stability.  The development and QA environments were unstable, systems were undersized to run any meaningful tests or even run the programs that ran on production systems, and they did not have representative data to work with.

We started saying “we are going to give you stable Dev and test environments”, “we’re going to increase the speed and accuracy of testing”, “we’re going to get you good test data that is as close to current, complete production data as possible”, “we’re going to give you any data you need to identify, debug, analyze and respond to test and prod failures”.  This shifted the conversation from being adversarial (Devs pointing at Ops and saying they’re obstructionists) to being collaborative (ooooh, they’re going to give us shiny new toys!).

Ops focused on building a proper development and QA environment that could very accurately depict production.  We first sized resources (hardware, networks) that could support the effort.  This might seem “wasteful” – since development doesn’t generate money, why not go with leftover systems?  But the point I raised was that development was where the real work was taking place – where undersizing would be a mistake and lead to all the mistakes happening in production, where mistakes cost money.  Let’s instead make mistakes in an environment where it doesn’t cost the company money.  This doesn’t mean that we spend exorbitantly, but that we shouldn’t be foolishly cheap.  Development/QA was built in the way that the teams wanted to build production.  It used the tools they wanted to use in production.  And we ignored any further work on production.  Yes, you heard that, we didn’t go after technical debt in production (unless it caused an outage).  Why did we do that?  Because there was no sense in fixing things when we didn’t yet know whether those fixes were appropriate.  We needed to test the entire infrastructure, not just the code, as a development effort.  We needed to get code that was tested and optimized and architected the best way through prototyping.  We needed to test building systems, deploying the operating system configured in the way the development teams needed it configured, installing databases, and doing anything else that was needed to give the developers the environment they would expect to deploy on top of.  We needed to do this in an environment where we could make mistakes, learn from them, and correct them – all without impacting generation of revenue.

What we accomplished was a double win.  We gave Developers the resources they needed to be productive.  We gave them tools, stability, data, and capacity to experiment.  We gave them testing tools…and the operations teams got to test right alongside them.  They got to build the tools, build the stability, learn how to handle the data, and build the capacity.  It was no longer pie in the sky but what each Dev team wanted and needed to go faster, and lessons on how Operations could clean up the technical debt in a way that mirrored the Developers’ intention.  It was about how we could positively influence the lives of our Dev teams, and Ops teams.


Random Favorite Quotes

The following are quotes or paraphrased notes taken from talks I have seen, podcasts, or general conversations with people I know.  If you feel you didn’t say these words, or wish to correct them, just contact me.

———

Microsoft gets it: you don’t teach programmers to be security people.  You do it for them (or make it hard for them to do it wrong). – Unknown

——–

“Don’t make people security experts, make it easy for people.  Get out of the echo chamber.  Make accessible the message that people care about.  People don’t want to think about security in what they do – they just want it to be there.”  – Josh Corman

——–

“Make things simple and they will do it.  Make it easier so people will use it.”  – Unknown

——–

“People respond to transparency and openness.  When issues are exposed – surfaced.” – Unknown SIRAcon 2016

——–

“We have to accept that it’s not our risk tolerance that matters as risk practitioners or security professionals.  It’s the person accountable for the risk at the end of the day.  And until you overcome that you’re almost a barrier to what you’re trying to achieve.”  – Chris Hayes

——-

“We have to work with the biz to get them to understand the risk, and design with it (for better solutions). This is why security should have 2 parts (maybe 3). A) understand and design ways to mitigate the risk for the new, B) manage risk day to day, operations C) Analyze the performance and effectiveness over time”

———

Risk Manager’s job is helping CSO sell security – sell the project.  Whether its a great big investment decision, or small item – what are the attributes, the Risk and Opportunity measures (estimates and forces at play).  – Alex Hutton

———

Risk Management / Security Metrics is a Security Optimization Program


The Legacy of Controls (A DevOps Story)

I recently had a pair of encounters that have opened my eyes further to the causes of our current messy state of IT affairs, and given me hope for a better future.  In both cases the issue that came up was access to production environments.

In one particular case a user had their access removed – ostensibly on the grounds that their access violated “segregation of duties between development and production”.  There are numerous control frameworks that demand a segregation of production and development environments.  There are even others that say personnel should be fully segregated.  Let’s look at where this came from, and what the outcome has been:

  • Segregation of duties came about as a control for preventing one person from performing an end-to-end activity by introducing a check that the activity was appropriate.  It started largely as a financial control.  The most obvious is preventing an Accounts Payable clerk from inputting a purchase or payment request, and then processing that payment request themselves – all for the benefit of their own personal bank account.
  • This control was extended to IT – especially during the Sarbanes-Oxley days – as a way to ensure that a developer could not introduce ways into the programs to siphon off pennies all for the benefit of their own personal bank account.
  • This control was then extended further to include personnel access to anything in production because (again ostensibly) it was believed that sharing information about production would create knowledge that developers could exploit.

Let’s be clear.  Controls that prevent the theft of money (fraud) are important.  However, the lengths to which this control has been extended have become ludicrous.  What it has done is damage the workflow, trust, collaboration and functioning of the IT department and its ability to support the business needs of all other parts of the company.  How, you ask?

  • The segregation-of-duties controls are extended to deny developers visibility into the environment, which means their situational awareness of how their programs are running is removed.
  • They lose the belief that other groups trust them since their visibility is removed.  They pull up a wall.
  • They now view the operation of a program as “someone else’s problem since they don’t let us in”.  They pull that wall up higher.
  • They now throw programs over the wall – because “we’re not responsible for them in operations”.  Operations hates when this happens.
  • Myriads of other controls flow in to stop-gap the problems that development teams don’t have the visibility to understand.  Testing requirements increase to address the problems since it is believed the problem is insufficient testing.  The testing becomes cumbersome, laborious, and yet largely ignorant of the problems that happen in production.
  • Costs go up, blame goes up, and failures happen…and the speed of work goes down.

Sounding familiar yet?

So how does this fit into my realization?  Access into production for developers is not a bad thing.  Developers should have visibility into application and system logs so they can view the reaction of their code in real world situations.  Developers should have the ability to see elements that are not sensitive.  They likely shouldn’t see sensitive data like payment cards, or encryption keys, but they should be able to see configuration files, data types and definitions.  Give developers what they need to create a feedback loop that is clear, unobstructed, but doesn’t violate regulations.

That being said, developers promoting code into production without checks and balances is a bad thing.  That I think we can agree on, but how does that fly with a DevOps mentality?  How about:

  • Changes can go into production once they go through an automated test suite.  They are only available for check-out when they meet the criteria of that automated test suite.
  • Production personnel (ops) can promote into production anything that has gone through the test suite and is available for check-out into production.
  • Development personnel can check problems and push fixes through this same chain.
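The three bullets above could be sketched as a minimal promotion gate.  This is an illustration of the control objective, not any particular CI/CD tool – the `Build` fields and the `promote` function are hypothetical names I made up for the sketch:

```python
# Hypothetical sketch of the promotion gate: only builds that have passed
# the automated test suite are eligible for check-out into production.
from dataclasses import dataclass
from typing import Optional

@dataclass
class Build:
    version: str
    tests_passed: bool                  # set by the automated test suite
    promoted_by: Optional[str] = None   # ops person who checked it out

def promote(build: Build, operator: str) -> Build:
    """Ops can promote anything that cleared the test suite; nothing else."""
    if not build.tests_passed:
        raise PermissionError(f"{build.version}: test suite not passed, not eligible")
    build.promoted_by = operator
    return build

release = promote(Build("1.4.2", tests_passed=True), operator="ops-team")
print(release.promoted_by)
```

The key control objective survives in this shape: developers push fixes through the same gated chain, but the test suite and a second party stand between code and production, which is what reduces the probability of fraud without removing visibility.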

If you notice, in the better world, developers have access to view, and monitor the production environment – they have a feedback loop.  In the better world, developers still have to have their programs vetted by a testing procedure before changes are pushed to production.  The key control objective is still met – reduce the probability for fraud – but with controls that keep the collaboration, accountability, and teamwork in place.

Now in the two cases that I came across, both arrived at the same conclusion.  Both believed that visibility was important.  Both believed that it could be achieved.  The challenge was to educate those who have accepted the de facto standard of full segregation without understanding the original goal, and the impact of such a decision.


Velocity vs. Anti-Velocity

No, it’s not the new anti-matter, or maybe it is.

I’ve watched IT organizations now for 26 years.  The sadness I feel is that I’ve continuously seen the same downward spiral:

  • Failures are reacted to as only that – failures.  And failures cannot be tolerated.
  • Someone gets blamed, because of course it is always a human error.
  • Focus is put solely on slowing things down because if we slow down, of course things will get better (right?)  More time can be spent on analyzing every action to make sure it never happens again.
  • More steps are added to processes to make things less prone to failure – usually manual, because of course humans can imbue greater success and less failure into IT systems (remember the old joke: there are two problems in every computer problem, and the first is blaming the computer)
  • Changes, features and maintenance slow down because it requires more manual intervention to get them in place
  • Management, sales, and all that revenue focus pushes for those changes, those features, those requirements and usually overrides the slow down – but for features that are not ready and not tested, because we’re still working on the old changes from months prior
  • The CIO and IT Managers fight with sales and management because they are asking too much
  • CIO and IT Managers quit, or are fired because someone always loses that battle.

I dub this cycle anti-velocity.  It is the failure of IT organizations to create velocity.  Organizations reduce their movement to a crawl – frozen and frustrated, unable to move forward and certainly unable to move back.  They freeze themselves in fear, in a misguided notion of what it takes to correct failures.  “Slow things down so we can study them more.” “Find out who did it and fire their *ss!” “We never test this stuff enough – we need weeks to do this right.” “This requires full review of all test documentation during the Change Control Meeting with all documentation brought to the meeting where everyone must attend.”  (Yes, the last one is a real procedure for Change Control that I’ve encountered.)

Now, let’s talk about what builds velocity, or the ability to move forward at a constant and ever growing speed.

  • Find the root cause – the honest root cause.  What really caused the failure?  Be honest and open about it.  Track the causes and know where they come from.  Look for patterns in the analyses.
  • Don’t believe that rote assumptions will tell you where to fix it – use the data you collect and the root cause analysis to really identify patterns.  I have watched companies assume that certain activities are the reason they have failures because they have been schooled to think this way – without ever questioning, “How would I know if my assumption was wrong, how could I test it?”
  • Do not go on a witch hunt, and do not go about the task of root cause analysis looking for someone to hang.  Remember that failures are where you learn where you need to improve.  If you fire someone, who says his replacement is going to be any better?
  • Identify ways to prevent the failure that do not slow down the process.  Remember the death spiral of anti-velocity above?  Remember that you want to do everything to avoid it.  Slow downs are the beginning of that death spiral.
  • I’ll give one caveat allowing for slow-downs: if your slow down is temporary to get a correction to your process in place that allows you to go faster, be more accurate, and be more resilient, then it is okay…because you are gaining a longer term velocity for the sake of what I would call a hiccup.
  • Build solutions that eradicate the faults, the errors and anti-velocity in your environment.  You will learn over time how to do this – through a process of continuous improvement.
  • We want to eradicate the faults, bad practices and build an environment that can sustain itself through human errors.  (Because let’s face it, we are the first problem in every computer problem.)

I become quite excited when I see velocity and a process that is fluid and working to speed itself.  The greatest excitement is that their change processes improve dramatically.  They process more changes, they do so with a higher success rate of implementation, and recover from failed implementations because every process has failures.  I have watched four different organizations recover from anti-velocity.  I have seen two who knew how to create velocity, and we were able to build powerful sets of controls that did nothing to slow that velocity.

Unfortunately I have seen just as many mired in their anti-velocity and unwilling to emerge.  They believe in big-bang changes – long cycles of review, backlogs of changes due to failures, blocking pre-requisite implementations stuck in review, and long cycles to get through a cumbersome process.

But then, from what I’ve heard, companies that have anti-velocity in IT, have this tendency to gather anti-velocity in their business as well….hmmmm…..


Loving the John In All of Us

I found myself in one of my least favorite moments a few weeks ago.  I was having a discussion about the build out of a new environment.  Someone brought up the subject of how people should access the environment and I started laying out my vision.  It included several specific and significantly restrictive controls and requirements.  I got through half of my list and the most senior person in the room jumped up and said they were unreasonable.  I almost had a knee-jerk reaction of defending them with a “You must do this to be secure!”, but stopped myself as I realized I fell into a trap I so often preach against.

What I had done was bury my head in the sand of a regulation, a checklist of requirements and let myself preach from what I thought security was, and not try to find what the business or the operational environment needed.  I was wrong.  Dead wrong.

Finger To The Forehead

I had the great fortune to be invited by Gene Kim to read his early drafts of his book “The Phoenix Project”.  It is the story of one company’s attempt to overcome its obstacles and survive.  One of the characters in the book is named John.  He is in charge of Information Security at the company.  He carries a binder of controls, and is continuously focused on security because he needs to save the company from its security failures.  Except it isn’t security the company is struggling with – it is struggling with its own business and operational survival.  John however is not attuned to this.  He is focused on a checklist of requirements that are completely tangential to the company’s needs.  John has his own climactic scene where the antagonist of the story finally beats down John’s character with a finger to the forehead and a stern lecture that he better find out what’s important to the company and get out of the way.  I laughed hard as I read this scene.  I laughed because I can think of all the times I deserved that finger in the forehead.  If you can’t think of the times you deserved that finger in your forehead you are deluding yourself.

Why Do We Act This Way

There are probably a litany of reasons why we tend to operate this way.  The one reason that always seems to make the most sense to me is the simple constraint in our ability to operate outside of what we know.  We use the skills and knowledge (cognitive domain, awareness, call it what you will) we know best, what we have been schooled in, read and heard.  I have been, like many of us in information security, fed lists of controls, told that things had to be a certain way, and that breaches, like burglary or murder, carried huge consequences.  I was taught responses to situations from the perspective of security – a professional deference – because that was my job and task.

And we are not alone.  Others do the same within their profession.  There are people in marketing who only see the world through a marketing perspective; or sales; or financial; and the list goes on.  Even our own children see the world from the limits of what they know and what they’ve been taught.  If we all knew the bigger picture we likely wouldn’t have had the embarrassing stories from our high school and college years, and use the phrase “If I only knew then what I know now.”  We all have a bit of John in us – even when we consider ourselves enlightened.

Learn To Embrace the John in All of Us

We all have our constraints so the best way to overcome them is to first accept that we have them.  Acknowledge them.  Admit that many of the things that we discuss, propose, and recommend to people come from our perspective on the problem.  This suddenly makes the problem have multiple angles that it can be viewed from.  You may not be able to see all of them, but you certainly can ask someone else to tell you how it looks from their angle.

Ask questions.  One of the first things I do when I find myself in the situation of being dead wrong is to set aside all my security concerns, suspend my preconceptions, pretend to be a complete outsider, and ask what is important to the business – what is the real business goal and objective.  Things like how it creates revenue, how does it help the company, and what would happen if it was to stop working.  The perspective is suddenly very different than when I look at it as a security person.

Then, I take one of my favorite steps.  I create a solution that focuses on achieving the business goal, and that gives back just as much as it takes away.  I have a rule with my teams, “For every control you put in place, you must give something back to the people affected by the control.”  This creates some shock, some amusement, and then very puzzled looks.  Several people have asked me why I do this.  Some have resisted the rule, but I rarely waver.  This rule forces my teams to focus on and understand the impact of what they are doing when they put controls, policies, rules or anything else in place that is restrictive.  And then it forces them to think of how they can make it less restrictive, or provide some benefit that is in line with the original business objectives and goals.  It makes them understand what the affected people need to do their jobs better and what really matters to them.  You also create some raving fans when they realize you understand their needs.

And lastly, and most importantly, recognize the Johns in all of us – in everyone around you.  Encourage them to do the same as you – to learn to accept their inner John, to explore and ask questions, and to look from different perspectives.  As role models we can develop the patterns in others and they will begin to mirror our behavior.  Poke people in the forehead once in a while, and remind them to learn what is really important, and listen a little better.


The Quantum Vulnerability Tunneling Effect

I know I had promised to talk about how to implement a risk management program in your small organization, but bear with me for a blog (or two).  Given that my brain has been wrapping itself carefully around risk management for the last few weeks, I have found myself revisiting ideas from my past.  One particular incident this week reminded me of a subject that I’ve talked and written about before.

One of the individuals on my client’s InfoSec team is responsible for vulnerability scanning and management.  He’s quite talented, has good insight on the vulnerabilities, but like many others in InfoSec, he suffers from the blinding effects of Quantum Vulnerability Tunneling.

“The What?” you ask.

Yes, you heard me, Quantum Vulnerability Tunneling Effect.  For those of you not familiar with physics, this is akin to a process whereby a particle can bypass barriers that it should not normally be able to surmount.  So what does that have to do with vulnerabilities?

The barrier we place to separate the vulnerabilities we address from those we accept is typically an arbitrary line we set that says “We’ll address fives and fours, but we’re going to let threes, twos and ones go for now.”  This is our barrier, and heaven help the vulnerability that thinks it is going to make its way over that line.  Except….

Did you ever do a vulnerability scan, read through the findings, and find yourself stopping on one vulnerability in particular?  You see it and the thought runs through your head, “Oh, Scheiße!”  Suddenly the world around you stops and you focus on the vulnerability.  You know how it can be exploited.  You’ve read about it in magazines, and you’ve even done some of the necessary tricks yourself in a lab using your kit of tools.  In this case the individual at my client’s site had found a vulnerability that had been classified by the vulnerability scanner as just below the event horizon of “critical vulnerabilities”.

He saw this and upon looking at it had his “Oh, Scheiße!” moment.  He went to his manager and presented his case for why this vulnerability should be remediated.  Immediately.  He proceeded in a very animated fashion to demonstrate with his hands and his words how this vulnerability could be exploited and how dangerous it was.  His manager had some good replies to his demand, but the individual walked away unsatisfied – probably because the replies talked to business impact and other metrics that did not have meaning to a vulnerability guru.  When all you have is a vulnerability scanner everything looks like a…

So I sat him down and had a little chat so he could consider the same answer from a different perspective.  I didn’t focus on the impact to the business operations since I saw that it was not clicking for him.  What I did was ask him to do a risk assessment of the vulnerability with me:

I asked, “What is the population of threat actors?”  We had already had a chat within the group that we would classify threat actors by loose groups of individuals so we could get groupings of actors.  We agreed on classifications of Universe/Internet, Company Internal, (specific) Department, Local Machine Users, Administrators, and No One.  He replied that it was *anyone* Internal (said with animation).

I asked him, “What level of difficulty is the vulnerability, keeping in mind commonly known mitigating controls in our environment?”  He commented that it was a module in Metasploit.  Ah, so it was below HD Moore’s Line.  I asked him how certain simple controls we had in place would mitigate it.  His reply: it would make it pretty difficult but not impossible, and it had been documented.  So we agreed to put it right at HD Moore’s Line.  (We haven’t really qualitatively classified difficulty yet, working on that definition still, but HD Moore’s Line is the start.)

I asked, “What is the frequency of attempts to exploit this vulnerability?”  We use attempts since there is rarely good data on actual breach counts, but with a good honey-pot we’ve found we can estimate pretty well the frequency of attempts.  I’m really warming up to the importance of a honey-pot in a company’s environment.  The data you can collect!  And it makes frequency something you can lump into categories.  In this case we didn’t have any data at all since no one would set up an internal honey-pot, so we deferred to Threat Actors as a reference point.

I asked, “What’s the value of assets that are vulnerable?”  The individual responded, “All things on the computers!”  I whittled him down to some tangible types of data.

We merged all of his answers into a sentence that he could say.
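Those four answers can be sketched as a crude ordinal scoring exercise.  The category scales loosely follow the ones in the story; the numeric weights and the final multiplication are purely my own illustrative assumption, not a formal risk model:

```python
# Illustrative risk-scoring sketch; scales and weights are invented.
ACTORS = {"universe": 5, "company_internal": 4, "department": 3,
          "local_users": 2, "administrators": 1, "no_one": 0}
DIFFICULTY = {"trivial": 3, "hd_moores_line": 2, "hard": 1}  # Metasploit module ~ HD Moore's Line
FREQUENCY = {"constant": 3, "occasional": 2, "rare": 1}
ASSET_VALUE = {"crown_jewels": 3, "sensitive": 2, "low": 1}

def risk_score(actors: str, difficulty: str, frequency: str, asset_value: str) -> int:
    """Bigger product = higher relative priority; only useful for ranking."""
    return (ACTORS[actors] * DIFFICULTY[difficulty]
            * FREQUENCY[frequency] * ASSET_VALUE[asset_value])

# The vulnerability from the story: anyone internal, right at HD Moore's Line.
print(risk_score("company_internal", "hd_moores_line", "occasional", "sensitive"))
```

The number itself means nothing in isolation – which is the point of the magic questions that follow: score the whole backlog the same way, then ask where this one lands relative to the rest.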

And then I asked the magic questions.

“How many vulnerabilities have we identified in the environment?”

He gave me a number.

“Using the same risk measures, how many of these vulnerabilities are a greater risk than the one you just pointed out to your manager?”

Silence for a moment, and a sheepish smile came across his face, and he said, “I get it.”

I have seen this situation many times before.  In the moment of discovery we get too close to a vulnerability or a threat, and we obsess on it.  We study it intently and learn everything we can about how to leverage it, how it can work.  It becomes real because we can understand it and perform at least portions of the attack ourselves.  We focus on it because it is tangible and at the forefront of our mind.  We become obsessed and let that item tunnel its way beyond any barriers of urgency to place itself at the front of our priorities.  The Quantum Vulnerability Tunneling Effect.  We’ve all fallen prey to it.  We’ve all tunneled our issues to the forefront out of fear and uncertainty.  That’s why I liked using the risk assessment.  It required that he re-examine his assumption that this vulnerability was critical, and test it with facts through a risk assessment.  It reset the perspective of the vulnerability in relation to everything else it should be considered with.  He wasn’t happy that the vulnerability was going to be accepted as a risk, but he also recognized where it belonged in the universe of risks.  He could look at the forest and see that it was filled with trees, and some were more worth harvesting than others.

I used to do a similar exercise with my team when I was leading security.  We did an in-house risk assessment.  I made the team list all of their perceived priorities regardless of how big or small, how insane or sane, and regardless of whether they thought it urgent or not urgent.  I wanted them to know that their ideas and concerns were going to be considered.  We then went through a highly interactive risk analysis session that resulted in a list of priorities based on those ideas.  We put the top ten that we felt we could accomplish during the year on a board at my desk, and the remainder went in a book on my desk so we could say they never got lost.

Someone on my team would invariably come to my desk, hair on fire, to say they had a risk that *had* to be taken care of right away.  My response was cool and calm.  I would simply ask, “Does it require greater attention than any of the items on that board?”  This would stop them in their tracks and make them think.  They would look at the board, think for a few minutes, and respond with a “Yes” or a “No”.  Usually it was a “No”.  If it was a No, we would pull out my book and write down the issue.  If it was a Yes, I would have them write it on the board where they thought it should go, and put their name next to it.  They could claim the success, or suffer the ridicule from our team if they were way off.  Priorities and perspective were maintained.

The Quantum Vulnerability Tunneling Effect was avoided, we stayed calm and on course, and we could react well when a real emergency came along.

But those are just the effects of thinking in terms of risk.


Accuracy vs. Precision – My Risk Epiphany

Did you ever have a moment where a concept you have never been able to figure out or understand suddenly clicks in your head?  I had long struggled to understand a key element of Risk Management – how to build a risk assessment model that included likelihood.  And a strange confluence of circumstances made my light bulb go off.

Now before I go into the story, let’s cover a bit of background on this.  Risk Management is a field that I admire, and consider critical to any organization and its operations, and especially important to my field, which is Information Security.  Being able to communicate risks to an organization using tangible descriptions is critical.  But I could never quite seem to do it with the precision that I felt necessary.

I always stumbled on the issue of likelihood.  I could estimate with surprising ease the cost of an incident.  I have mastered the process of asking key business groups about the cost of failure and know how to test their attributions of cost.  I have been extremely comfortable identifying the costs of an incident – the cost of lost productivity, the cost of lost sales, the cost of lost intellectual property – and “range of losses” was a concept I could easily make tangible.  For retail companies I could estimate a range of lost revenues by looking at highest day revenues (Black Friday) and lowest day revenues.  That became my range.  I’d find the median and we’d have three values to work with.  I would also be able to factor in idle time of workers and unused infrastructure and equipment, and compute these down to the last dollar if I cared to be that detailed (which I usually didn’t – getting to the nearest $100,000 was more than enough for these companies).  I could even sit with a marketing team and estimate lost goodwill based on the cost of advertising to regain those customers lost, and revenue downturns due to those lost customers.

But I could never feel comfortable with creating a picture of the likelihood that some event would occur.

Why? I wanted it to be perfect.  I wanted no one to question the numbers – they would be facts, let the chips fall where they may.  I wanted people to know in absolutes, with absolute precision.  Except there is no such thing as an absolute – especially in risk.  The light bulb that went off in my head was the light bulb of “imperfect knowledge”.  Risk is an estimate of possible outcomes.  It is about being accurate, not about being precise.  Bad risk analysis is when you pretend you can give absolutes, or when you make no attempt to find a range of things that are “more likely”.  Do I have you scratching your head yet?  Good.

Let me give you an analogy to illustrate what I mean by accuracy and precision.  In a battle, accuracy would be knowing where your enemy is attacking from, or even where they are most likely attacking from.  If you find out that your attacker has the capability to scale that 3000 foot cliff that you discounted due to its level of difficulty, you would add that route to the picture, because it would show a more accurate view of all possible ways your enemy could attack you.  That accuracy is accounting for all possible outcomes.  Precision is knowing exactly where to aim your cannon so that it hits your enemy at an exact spot (biggest tank, largest warship, best group of archers).  Accuracy won’t help you aim the cannon.  Accuracy will tell you where to put the cannon and what range of fire it will need.  Precision will help you aim your cannon, but it will fall short on telling you where to position your entire army.

The problem I have struggled with in risk analysis is that I wanted precision – and that made me struggle with determining likelihood.  The confluence of ideas hit me two days ago.  Somehow the idea of Alex Hutton’s and Josh Corman’s “HDMoore’s Law” (an InfoSec bastardization of the “Mendoza Line”) combined with having just chatted quickly about CVSS scores and the idea of “difficulty” associated with vulnerability scores made something click.  That, and a peek at a risk analysis methodology that didn’t try to make likelihood a precise number.  Instead it asked a simple question – describe the skill required to achieve the event, and provide a range of frequency at which the event would occur.  Bing!  I could work with descriptions, and so could executives!  If you try to arrive at a precise number, executives who play with numbers all day long will probably rip it apart.  If you give them probable ranges and descriptions of the likelihood, they get the information they need to make their decision.  It is imperfect knowledge.  And executives make decisions using this imperfect knowledge every day.  The more accurate the imperfect knowledge is, the more comfortable the executive will feel making the decision.  And for an executive, the easier it is for him to understand the imperfect knowledge you give him, the more he will appreciate your message.

So what did my epiphany look like?

First I realized likelihood is a balance of understanding the level of difficulty for an event to occur and its frequency.  Level of difficulty is really about the level of effort or confluence of circumstances it would require to bring about an event.  Take a vulnerability (please, take them all).  How much skill would a person require to exploit a given vulnerability?  Is the exploit something that even the average person could pull off (an open unauthenticated file share), something that is available in Metasploit, or is it a rare, highly complex attack requiring unknown tools and ninja skills?  This is not to say that the exploit cannot be done – it is determining whether the population that can perform the exploit is smaller than the universe, and hence the likelihood is reduced.  The difficulty of having a tsunami hit the eastern coast of the United States is based on the rarity of unstable geographic features in the Atlantic Ocean that would generate one.  The Pacific Ocean, on the other hand, has a large population of unstable areas that can generate a tsunami.  The skill required to exploit an unauthenticated file share or FTP server is far different than the skill to decrypt AES or to play spoofed man-in-the-middle attacks against SSL.  I can already see the binary technologists fuming – “but, but, people can do it!”  Sure they can.  Any attack that has been published can be done – and there are many more that haven’t even been made public yet that also can be done.  A business cannot afford to block against everything, much like we cannot stop every car thief.  What we can do is avoid the stupid things, the easy things, and more importantly – the most likely things.  This is a calculated defense – choose those things that are more likely to occur until you run out of reasonable money to stop them.

Then I took an old concept I had around frequency.  For me there are multiple sources that I can use to extrapolate frequency.  Courtesy of the three different highly data-driven analyses of breaches produced by the major forensics organizations, we can begin to estimate the frequency of various types of attacks.  Data repositories like VERIS, the various incident reports, and general research of the news can give us a decent picture of how often various breach types occur.  A great illustration of this is Jay Jacobs’s research on the Verizon DBIR data, looking for the number of times that encryption was broken in the breaches researched.  The data set was a grandiose zero (0).  Frequency can be safely ruled “low”.
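One way to picture this balance of difficulty and frequency is a small lookup sketch.  This is a hypothetical illustration of the idea only – the descriptor names, weights, and thresholds are my own assumptions, not a formal standard or methodology:

```python
# Hypothetical sketch: combine a qualitative difficulty descriptor and a
# frequency descriptor into a qualitative likelihood. All values below are
# illustrative assumptions, not part of any formal risk framework.

DIFFICULTY = {"trivial": 3, "metasploit": 2, "ninja": 1}  # easier attack => higher weight
FREQUENCY = {"low": 1, "medium": 2, "high": 3}            # more common => higher weight

def likelihood(difficulty: str, frequency: str) -> str:
    """Map the two descriptors to a single qualitative likelihood."""
    score = DIFFICULTY[difficulty] * FREQUENCY[frequency]
    if score >= 6:
        return "high"
    if score >= 3:
        return "medium"
    return "low"

print(likelihood("trivial", "high"))  # open file share, common attack -> high
print(likelihood("ninja", "low"))     # breaking AES in the wild -> low
```

The point is not the arithmetic – it is that descriptive inputs (“anyone could do this”, “we see it constantly”) are enough to place an event on a scale executives can reason about.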

Suddenly I was able to walk through a vulnerability report I had been handed and put together a quick risk analysis.  I asked five questions:

  1. What assets are on the affected systems?  (for example email, payment card data, PII, intellectual property…)
  2. What population of people would have access to directly exploit this vulnerability? (Internal employees, administrators, or anyone on the Internet)
  3. What is the level of difficulty in exploiting this vulnerability? (CVSS provides a numerical scale which I was more than happy to defer to, and in some cases where the general user population could exploit it, we created a “-1” category)
  4. What is the frequency that this type of exploit has occurred elsewhere, and what have we seen in our organization? (research into DBIR, asking security team at client site)
  5. What controls are in place that would mitigate the ability of someone to exploit this vulnerability? (such as a firewall blocking access to it, or user authentication, application white-listing etc.)

I took all the data that was collected and turned the risk into a sentence that read something like this:

“Examining the risk of being able to see information sent in encrypted communications:  Anyone on the Internet would have access to attempt to exploit this, however a very high level of competency and skill is needed to decrypt the communications.  The frequency that this type of attack occurs is very low (typically done in research or government environments with mad skills, and lots of money).  There are no additional controls in place that would mitigate this risk.”
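The five questions and the resulting sentence can be sketched as a small template.  This is a hypothetical illustration – the function and its wording are mine, not a formal tool:

```python
# Hypothetical sketch: assemble the answers to the five questions into a
# plain-language risk statement. The template wording is illustrative only.

def risk_narrative(assets, population, difficulty, frequency, controls):
    """Turn five qualitative answers into a narrative risk sentence."""
    control_text = (
        "Mitigating controls in place: " + ", ".join(controls)
        if controls
        else "There are no additional controls in place that would mitigate this risk"
    )
    return (
        f"Assets exposed: {assets}.  "
        f"{population} would have access to attempt to exploit this, "
        f"and the level of difficulty is {difficulty}.  "
        f"The frequency that this type of attack occurs is {frequency}.  "
        f"{control_text}."
    )

print(risk_narrative(
    assets="encrypted customer communications",
    population="Anyone on the Internet",
    difficulty="very high (decrypting the traffic requires rare skill and resources)",
    frequency="very low (typically seen only in research or government settings)",
    controls=[],
))
```

The output reads much like the sentence above – a description and ranges rather than a falsely precise number.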

The last glue that fit this all together was making all of your assumptions about the risk explicit.  I’ve talked extensively about the value of being explicit – it makes the data easier to examine, challenge, correct, and make even better.  The result is a more accurate risk assessment based on more accurate data.

The true detractors of Risk Management would point out that none of this is perfect or certain.  They would be correct, but then nothing in life is certain.  We tend to want to be perfect, to be right and not wrong, because we fear being wrong.  The sources of this tendency are boundless, but a bit of it, I suspect, comes from our high level of exposure to the highly precise and binary world of computers; as a result we look to make the rest of the world much like this model that we idealize.  Ones or zeros, exact probabilities, exact measures of cost… but life outside the artificial construct of computers is not like that.  It is full of uncertainty and non-binary answers.  Those subtleties are what Risk Management can capture and help us understand in a way that is closer to our binary desires.  But never completely.  What Risk Management does do is give us better accuracy – so we can make more accurate decisions and be less erroneous.

So step away from the perfection.  Give your team a view of the risk in terms they understand.  You might just find that giving them a description, a narrative, and ranges to draw from is much more accurate than anything they’ve used in the past.  But whatever you do – do not aim for precision.  Aim for accuracy, even if that means the guess is even less precise.  Your management wants the accuracy.  Just like their profits, the numbers will never be precise, but data can make them more accurate.

Now you might still have a question of “so how do I quantify this?” Ah, that’s for next time…


BSides San Francisco Presentation

So I did a little talk at BSides San Francisco 2012.  It’s a prequel to my book “So You Want to Be the CSO…”  The talk was recorded, so you can view it at your leisure.  Just pity the poor guy in the front row who I accused of being “sexy”.

A BrightTALK Channel
