Ultimately, all security metrics are an attempt to answer the question of “how secure are we?”
But rather than killing myself trying to answer that question (the very fact that KPI #1 is relevant implies that my coverage & control is less than adequate), I’ll settle for targeting metrics that allow me to answer the question, “Are we secure enough?”
How much is “enough?” It depends. In the case of a corporation, “enough” is defined by its security policies. So I consider how we measure compliance with policies:
- Audit findings, which document non-compliance with policy, process, or standards (process and configuration compliance)
- Risk Assessments, which are effectively operational audits of the architectural decisions and work quality for an application (ensure IT teams are making design decisions which match the accepted level of risk)
- Documented Exceptions, which are where the application owners formally state that they’re ignoring policy (portion of the environment that’s knowingly non-compliant)
- Proportion of the IT environment participating in the policy regimen (coverage & control for extrapolation of above)
Now this does make one huge assumption–that our policies, procedures, standards and guidelines are, in fact, an accurate representation of the company’s risk tolerance. Assuring that to be the case is part of my job, which I do by measuring, analyzing, and doing any number of other activities.
From a KPI perspective, though, this is OK. It’s like the way that most drivers only care about the reading on their speedometer and maybe the gas gauge. They leave worrying about the rest of the things that go into making the car run to the engine’s computer and focus on getting the thing from A to B. They let my team be the engine’s computer, and just like with cars, it generally runs better that way.
And when it breaks, I’m also the mechanic they take it to to complain that it’s leaking oil, too slow off the line, or the air conditioning isn’t cold enough.
Getting back to what I can do with my KPI, though: if we want to go a little bit further, we can (and are going to) look at information such as how many Gaps identified through the risk assessment process are eventually closed versus converted into Exceptions as the application goes live. Initial Gaps are a good indicator of how {risk|security}-aware the system owners and implementers are. Go-live Gaps (Exceptions) are a good indicator of how serious they are about hitting the accepted level of risk as documented through policy.
Looking at the Gap-to-Exception flow through the projects’ System Development Lifecycle (SDLC), we now also have a first stab at a leading indicator of risk.
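The Gap-to-Exception flow comes down to simple arithmetic per project. Here is a minimal sketch, assuming each project records how many Gaps the risk assessment found, how many were closed, and how many became Exceptions at go-live. The `ProjectGaps` record, its field names, and the sample numbers are all hypothetical, invented purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class ProjectGaps:
    """Hypothetical per-project record; field names are assumptions."""
    name: str
    initial_gaps: int  # Gaps identified by the risk assessment
    closed_gaps: int   # Gaps remediated before go-live
    exceptions: int    # Gaps converted to documented Exceptions at go-live


def gap_to_exception_rate(project: ProjectGaps) -> float:
    """Share of initial Gaps that went live as Exceptions (0.0 if none found)."""
    if project.initial_gaps == 0:
        return 0.0
    return project.exceptions / project.initial_gaps


# Made-up sample data, not real measurements.
projects = [
    ProjectGaps("billing", initial_gaps=10, closed_gaps=8, exceptions=2),
    ProjectGaps("plant-floor", initial_gaps=4, closed_gaps=1, exceptions=3),
]

rates = {p.name: gap_to_exception_rate(p) for p in projects}
for name, rate in rates.items():
    print(f"{name}: {rate:.0%} of initial Gaps went live as Exceptions")
```

A high count of initial Gaps with a low rate suggests a team that starts out unaware but takes the policy seriously; a high rate suggests the opposite.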
When I combine this with the baseline level of control over the total environment, as measured by overall IT hygiene (primarily participation in centralized management such as Active Directory, patch compliance, and basic secure configuration), we actually have a feel for “how secure am I?” which can be adjusted over time by setting the SLAs for the various operations groups. This, in turn, means that we now have a metric that both resonates with Senior Leadership and can be translated into specific goals for the people doing the actual work: it passes both sides of the “so what?” test.
Additional metrics can be built by tuning the slicing & dicing of the information (sub-reports on “critical” applications, manufacturing, R&D, SOX, systems with some regulatory requirement, etc.), which we can then use to document actions that address those risks at a more macro level.
For example, I feel that we need to do some internal firewalling. No one disagrees, but thus far we have not been able to effectively document the expected benefit, which is important both to justify the effort and to demonstrate how we expect it to improve things. Now, I expect to point to the reduction of risk from adding this compensating control, which will protect both the rest of the environment from manufacturing and the manufacturing environments from the rest of the environment.
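As a rough illustration of the hygiene baseline, the sketch below scores an inventory by the fraction of systems passing all three checks named above and compares it to an SLA target. The inventory, the field names, the all-three-must-pass rule, and the 90% target are assumptions chosen for the example, not how any particular shop measures it.

```python
# Hypothetical inventory: one record per system. Field names are
# illustrative assumptions, as is the sample data.
systems = [
    {"host": "app01", "centrally_managed": True, "patched": True, "secure_config": True},
    {"host": "app02", "centrally_managed": True, "patched": False, "secure_config": True},
    {"host": "mfg01", "centrally_managed": False, "patched": False, "secure_config": False},
    {"host": "lab01", "centrally_managed": True, "patched": True, "secure_config": True},
]

CHECKS = ("centrally_managed", "patched", "secure_config")


def hygiene_score(inventory):
    """Fraction of systems that pass every hygiene check (0.0 if empty)."""
    if not inventory:
        return 0.0
    passing = sum(1 for s in inventory if all(s[c] for c in CHECKS))
    return passing / len(inventory)


SLA_TARGET = 0.90  # assumed target; in practice set per operations group

score = hygiene_score(systems)
meets_sla = score >= SLA_TARGET
print(f"hygiene: {score:.0%} (target {SLA_TARGET:.0%}, met: {meets_sla})")
```

Raising or lowering `SLA_TARGET` over time is the adjustment lever: Senior Leadership sees one number, and the operations groups see the specific checks they need to pass.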
This is a Good Thing since every manufacturing enterprise I’ve ever worked in has been (internally) famous for running outdated applications and platforms (NT4, anyone?), being unwilling to patch (“We know you have a patching maintenance window, but we use your patching window for other things!”), and being supremely high-impact if there is an outage–which they can measure to the minute (“If this factory goes down, we lose $4 million an hour!” Nice metric, btw–something that Andrew Jaquith also points out in his book). Similar rules and tales apply to the Mad Scientists in the R&D labs, too.
Eventually, if we agree that the model is valid, we might look at applying it to other environments, as well, based on agreement that risk is excessive and needs to be partitioned–we need only partition the measurement and agree accordingly.
It’s not the holy grail, but I’ll argue that it’s good enough for Senior Leadership to decide by, and in the KPI game, that’s all that really matters.
I think this is leading you to the same conclusion I have reached (which is a seemingly unique approach when compared to the rest of the world):
Governance and Compliance are priors for Risk Management, and not the other way around.
Chandler Howell Says:
Alex,
Unfortunately, I have to agree with you. I do, however, like your turn of phrase.
One of the questions that I’m now pondering in my hours and hours of spare time is exactly *why* so many people seem to find this approach counter-intuitive.
Iang Says:
Secure: unfortunately in the English language it is a binary, it is either yes or no, whereas in our field it is more a catchall phrase for a whole bunch of things.
Which leads to (a) grave difficulties in deciding where the binary bar is, and (b) semantic difficulties in usage when we mean something soft or wide instead of hard and narrow, and (c) consequent noise creeping into our attempt to actually do anything in this area. Possibly it would help to have a definition upfront?
And I don’t think it’s so bad to invent new terms. (Click to see one attempt.)
Iang Says:
OK, I don’t see it! What do you mean by “Governance and Compliance are priors for Risk Management” ??
What I see right now is a model of dividing the topic into two areas, being policy and operations. The first takes as input a chosen definition of risk/security and any other external forces (regulations, “best” practices) and creates a policy. The latter takes the policy as the input (baseline?) and measures how close we are to it.
For the latter “operations security/risk” view, I agree with the “priors” claim. But not for the former. If you don’t have a way to adjust the “g&c” then you will die … it will get out of whack as the threats move around.
Is that close?
Chandler Howell Says:
Ian,
First, you and I both know there’s no such thing as “secure,” only “secure enough.”
Once we remember that we use the first as shorthand for the second, I think a lot of your syntactic concerns should go away.
As for the question of getting out-of-whack, I also agree–but part of governance is ensuring that the rules (laws, policies, etc) evolve to match the changing threat landscape.
I’ve got more to say on this particular topic, but it merits a post of its own, probably tomorrow.
Iang Says:
OK, so using your terms, I now think what you are saying is this:
Risk Analysis ==> Compliance + Governance ==> Risk Management
Is that it?
Alex Says: