» Archive for the 'Security metrics' Category

Scooter-nomics meets metrics judo

Tuesday, June 3rd, 2008

There has been a flurry of news stories of late about how high gas prices have people turning to scooters. You’re expected to listen to the “high gas prices drive scooter demand” angle, but the whole thing falls apart if you actually work the numbers. I like to think of this as “Metrics Judo”–turning someone’s own facts and figures back on them to get to the core of the argument.

As my old boss used to advise, “Before you get into any discussion of the issues, always make sure everyone agrees on the facts.”

For example, this morning I heard this story on NPR while driving to work. I was going to ride my scooter, but they’re predicting severe thunderstorms for the afternoon commute, which is no fun at all on two wheels, so I took my wife’s car. She’s not terribly pleased, but that’s life sometimes. If I had a public transit option, I’d take it in a second.

Getting back on topic, this story has a stronger safety component than most stories of its ilk–it actually stresses that scooters are dangerous, for a change, something that I have harped on in the past.

That part I liked.

But what I didn’t like was that the facts don’t actually support the story. First, the shop owner in SanFran said that his sales are up from ~6 scooters/month to his full shipment of 40/month. But he says he’s “thinking of” offering a promotion of “free gas all summer,” which he figures would cost him only $40/scooter. He also says that a scooter along with “all the safety gear” is only $3,000.**

If he’s selling out his inventory every month, why would he spend $1,600/month on a promotion? He’s supply-constrained, so the promo can’t increase revenue. The owner of a successful small business has to know that, or he won’t be a business owner for long. So, ignore everything but the cost per scooter–$40, or about 10 gallons of gas per scooter for the summer.

Then we take that $40 cost over the course of the summer and compare it to the $3k-6k that people are spending on a scooter. They are spending that to save somewhere between $40 and about $275* over the course of a “summer” (which I’ll define as May-Sept, or 5 months), depending on how heavily you want to weight the model in the scooter’s favor. Even if you generously scale that up to a full 12 months of riding, the annual savings come to somewhere between $100 and $700/year. This obviously doesn’t add up to the cost of the scooter, and it didn’t take anything but paying attention and some arithmetic to know that.
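The arithmetic can be sketched in a few lines of Python. This is only a back-of-the-envelope model built on the footnote's assumptions (600 miles, $4/gallon, 60 mpg vs. 30 mpg, $0.53/mile). The function names are made up for illustration, and scaling a 5-month summer linearly to 12 months is an assumption that flatters the scooter.

```python
# Back-of-the-envelope scooter payback, using the footnote's assumptions.
SUMMER_MILES = 600        # 60 mpg * 10 gallons
GAS_PRICE = 4.00          # dollars per gallon
SCOOTER_MPG = 60
CAR_MPG = 30
CAR_COST_PER_MILE = 0.53  # fully loaded per-mile figure from the footnote
SUMMER_MONTHS = 5         # May through September


def fuel_only_savings() -> float:
    """Dollars of gas saved by riding the summer miles instead of driving them."""
    gallons_saved = SUMMER_MILES / CAR_MPG - SUMMER_MILES / SCOOTER_MPG
    return gallons_saved * GAS_PRICE


def fully_loaded_savings() -> float:
    """Fully loaded car cost avoided, less the scooter's gas (and nothing else)."""
    scooter_gas = SUMMER_MILES / SCOOTER_MPG * GAS_PRICE
    return SUMMER_MILES * CAR_COST_PER_MILE - scooter_gas


def annualize(summer_savings: float) -> float:
    """Assumption: scale the 5-month figure linearly to 12 months of riding."""
    return summer_savings * 12 / SUMMER_MONTHS


def payback_years(price: float, annual_savings: float) -> float:
    """Years of savings needed to cover the purchase price."""
    return price / annual_savings


if __name__ == "__main__":
    for label, savings in [("fuel only", fuel_only_savings()),
                           ("fully loaded", fully_loaded_savings())]:
        yearly = annualize(savings)
        print(f"{label}: ${savings:.0f}/summer, ${yearly:.0f}/year, "
              f"{payback_years(3000, yearly):.1f} years to pay off $3,000")
```

Even at the cheapest price quoted ($3,000), the fuel-only case takes roughly 31 years to pay for itself, and the fully loaded case about four and a half years.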

The only way that a scooter makes any sense at all is to do what I did–get rid of the car and buy the scooter instead. I did that and have been money ahead for years. The fact that I live in a highly-congested urban area where a scooter provides a tactical advantage in maneuvering through gridlock traffic and parking when I get to my destination is just gravy.

Now, turn this back to IT and/or Risk Management. How many times have we all been presented with data by some vendor which, with a little analysis, can easily be pulled apart? Sometimes that analysis produces the real value (rather than the one the marketing people want us to focus on). Other times, getting algebraic for a moment, it lets us solve for the “real” data which, if we have to jump through these sorts of hoops, probably isn’t going to say what the vendor wants us to hear.

* Assume 60 mpg*10 gallons = 600 miles on a scooter, 30mpg*20 gallons = 600 miles in a car, so net fuel reduction of 10 gallons @ $4/gallon. Or, for a more fully loaded value, use $0.53/mile, the standard deduction for operating a vehicle, and now you’re looking at a cost avoidance of, at most, $278 ($318 - $40 for gas on the scooter, but no operating costs, so it’s still apples-to-oranges).

If gas is at $4/gallon, $40 buys 10 gallons. Figure that most scooters, despite the 100mpg claims, get about 60 mpg (that’s about what my Stella, a 149cc 2-stroke, 4-speed manual transmission, gets–YMMV, of course), and that’s 600 miles–not much riding unless you live in and only ride around your neighborhood.

** He’s selling cheaper scooters than I ride. My scooter was $2,800, my helmet was about $250 on sale, I have two jackets, which were each about $200, my boots were over $100 (leather, steel shank & toe, hard vibram soles), gloves another $50. A pair of armored trousers would be another $120 or so, which I’ll probably buy sooner rather than later now that I’m riding to work on a regular basis. Yes, I’m a bit of a safety nerd–I got off easily on learning that lesson the hard way.

Metrics and oranges

Monday, April 14th, 2008

I’ve been pretty busy lately, which has impacted, among other things, my time and energy for blogging. I don’t know when I expect this to materially improve, so I’m going to fire these thoughts out before I run off to my next meeting even though they may still be a bit partially-formed.

Lately, I’m feeling that there are two fundamental problems with risk and security metrics.

The first, which I’ve written about previously, is that they don’t scale the corporate ladder well. The second, and perhaps more serious from an industry perspective, is that the metrics which do scale the corporate ladder well don’t compare well across industries or even within industries. Thus, there seems to be a paradox here: the more business value a security metric represents, the less either generic or share-able it will be.

For example, take metrics related to policy compliance, one of my KPI’s. I assume that policy (or lack thereof) is an expression of or proxy for a company’s tolerated level of risk. Given that no two companies have the same policies (unless they both cribbed them whole-cloth from SANS), the risk measurement is going to be inherently different between companies. Throw in the fact that most companies won’t be willing to share this data in anything but a tightly-controlled forum, and you’ve got a real problem.

Nevertheless, I’d still be pretty happy if we could get general agreement (or even understanding) across so-called risk managers that, like it or not, policy effectively defines organizational risk acceptance. With that starting point, we might then actually be able to begin doing meaningful comparisons of different policy/control sets (e.g. does CObIT+SoGP produce better compliance (as measured through audit findings & exceptions) than CObIT+ISO-27001? *That* would be an interesting and worthy research project, IMHO), although the vested interest factor could definitely hurt the effort.

And one final piece of the puzzle (and this is probably too much to even dream of, but keystrokes are cheap) would be to then correlate these measures of relative compliance to operational metrics. While correlation is not causation, we still might then be able to begin using compliance as an attribute to describe our accepted level of risk rather than as an end unto itself.

What you measure matters

Wednesday, March 19th, 2008

Don’t assume that traditional measures are good measures. For an example, The Economist looks at GDP growth:

WHICH economy has enjoyed the best economic performance over the past five years: America’s or Japan’s? Most people will pick America. The popular perception is that America’s vibrant economy was sprinting ahead (albeit fuelled by credit and housing bubbles that have now painfully burst), whereas Japan crawled along at a snail’s pace. And it is true that America’s average annual real GDP growth of 2.9% was much faster than Japan’s 2.1%. However, the single best gauge of economic performance is not growth in GDP, but GDP per person, which is a rough guide to average living standards. It tells a completely different story.

(emphasis mine)

For example…

Using growth in GDP per head rather than crude GDP growth reveals a strikingly different picture of other countries’ economic health. For example, Australian politicians often boast that their economy has had one of the fastest growth rates among the major developed nations—an average of 3.3% over the past five years. But Australia has also had one of the biggest increases in population; its GDP per head has grown no faster than Japan’s over this period. Likewise, Spain has been one of the euro area’s star performers in terms of GDP growth, but over the past three years output per person has grown more slowly than in Germany, which like Japan, has a shrinking population.

Some emerging economies also look less impressive when growth is compared on a per-person basis. One of the supposedly booming BRIC countries, Brazil, has seen its GDP per head increase by only 2.3% per year since 2003, barely any faster than Japan’s. Russia, by contrast, enjoyed annual average growth in GDP per head of 7.4% because the population is falling faster than in any other large country (by 0.5% a year). Indians love to boast that their economy’s growth rate has almost caught up with China’s, but its population is also expanding much faster. Over the past five years, the 10.2% average increase in China’s income per head dwarfed India’s 6.8% gain.

So, if you’re a Finance Minister, you’re apparently going to go with the number that makes you look best (total growth) rather than the number that most accurately reflects the economic fortunes of your populace–and even that number is probably not as good as median per-capita growth, especially as a measure of relative change. The Minister knows better (I hope), but presents the less-honest number and knows that the vast majority will never catch him at it.
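To see how much the denominator matters, here is a small sketch of the conversion from total GDP growth to growth per head. The two GDP growth rates are ones quoted above; the population growth rates are illustrative placeholders, not figures from the article.

```python
def per_capita_growth(gdp_growth: float, population_growth: float) -> float:
    """Growth in GDP per head implied by total GDP growth and population growth.

    Both arguments and the result are fractions (0.03 means 3%).
    """
    return (1 + gdp_growth) / (1 + population_growth) - 1


# GDP growth rates are the ones quoted above; the population growth rates
# are illustrative placeholders, NOT figures from the article.
economies = {
    "fast-growing population": (0.033, 0.012),
    "flat population": (0.021, 0.0),
}

if __name__ == "__main__":
    for name, (gdp, pop) in economies.items():
        print(f"{name}: GDP {gdp:.1%}, per head {per_capita_growth(gdp, pop):.1%}")
```

With these placeholder populations, the two economies end up almost level per head even though one headline number is more than a point higher.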

a problem I may not actually have

Tuesday, March 18th, 2008

I’ve been looking at my anti-virus metrics of late, and I’m thinking that I’ve been asking the wrong questions there. Basically, I’ve got two different sets of anti-virus metrics, coverage rates (% of machines with anti-virus deployed by region) and infection rates (% of machines with infections, again per-region).

But I noticed this morning that, depending on how I’m defining my population, we’re only seeing help desk calls for 1-2% of the identified infections. The infected machines are, themselves, only 7% of my total system population, which works out to about 0.1% (1/10th of 1%) of my total population calling the help desk due to malware problems every month.

So I’ve been failing my own first question for any security issue–is this a problem I have?

Amrit rocks the house with some Desktop Security Agent BOTE calc’s

Friday, March 14th, 2008

Amrit asks, “Is the cure costlier than the disease?” regarding desktop security agents. His story starts out familiarly enough:

When I was still an analyst I was part of the mobile workforce, coming into the office maybe once or twice a year. The company owned laptop I was provided ran 4 different security agents, plus several other agents for various systems management functions (asset, configuration, etc) and remote access. Since the majority of the time the company had no ability to manage these mobile systems they would enforce some fairly draconian security policies, such as locking down aspects of the OS, disallowing certain protocols and applications to traverse the network VPN, as well as configuring the various scan-based security technologies to scan the system on a recurring basis (OK so maybe these are all reasonable and I felt they were draconian because I suffer from a Nietzsche “super-employee” complex and believe myself to be above the normal security policies of other employees - coincidentally I stopped using the corporate supplied laptop and switched to a Mac) .

Here is the kicker, my machine suffered from significant performance problems. Not only did it now take a good 5+ minutes to restart, it was unusable during a scan - which meant I was unable to work several hours a week

This is the story of the life of the average “enterprise” worker. In a past life, we were effectively told, “you can’t add any more agents unless you take one of the existing ones away.” Today, I “only” suffer from two or three different security-related agents on my laptop, which is especially ironic given that I do much of my work inside a virtual machine running Ubuntu Linux.

Getting back to Amrit, though, he’s kind enough to provide a great Back-of-the-Envelope (BOTE) analysis of the costs of providing desktop “security” for a theoretical 5,000 person company.

How’s it stack up, and to what? Amrit uses the same data source I used for my DLP and Full Disk Encryption BOTE analysis, the FBI/CSI annual survey, which told him

“The Computer Security Institute conducted a survey of 538 computer security practitioners in corporations, government agencies, financial institutions, medical institutions, and universities in the United States. Their results revealed that 85 percent of respondents had detected computer security breaches within a twelve-month period. The 35 percent who listed a financial impact reported $377,828,700 in financial losses. Of these, many cited their Internet connection as the point of attack for hackers.”

I’m not going to give you the spoiler–you can go read it yourself–other than to say I wholeheartedly agree with his assumptions, his methodology, and his conclusion.

BOTE analysis of DLP vs. full-disk encryption

Wednesday, February 20th, 2008

I did some Back-of-the-Envelope (BOTE) analysis yesterday to explain why I think that Digital Leakage Protection (DLP) is *not* where we need to be spending my company’s money right now. The overall analysis was much larger than this, but I did have a little lightweight numerical analysis which I found quite entertaining:

Using data from the notoriously-inaccurate-but-about-as-good-as-anything-else-out-there 2007 FBI/CSI study, I worked out that:
1) 194 respondents actually reported losses (total loss divided by average loss per respondent)

2) Two categories of identified losses could reasonably be argued to be preventable via DLP (assuming a number of other security management practices were in place):
- “all data losses but mobile devices”
- “Unauthorized access to information”
Totaling $6,727,700 in reported losses

3) Divided by 194 to get $34,678.87 (call it “under $35k”) in average losses per respondent.

Even when, just for grins, I decided to assume that only Large Enterprises (revenues > $1b/year, 36% of respondents) suffered data loss, the average annual loss only jumped to $96,330.18.

Not much of a business justification for a multi-million dollar product (and that’s just the technology–it ignores everything that has to come before and after to actually make it perform) for any enterprise without either a zero-tolerance for loss or extra-large business and/or regulatory risk associated with data leakage.

Today, I decided to see how full-disk encryption of my laptops would stack up against the same analysis. Going back to the 2007 FBI-CSI survey, I came up with three categories of loss which would be addressed by disk encryption and remote wipe tools:
- Laptop or mobile hardware theft
- Theft of proprietary info from mobile device
- Theft of confidential data from mobile device
Totalling $8,429,150 in reported losses

This gave me an annual average loss of $43,448.61 or $120,690 if I assumed a 100% weighting to large enterprises.
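Both BOTE calculations are the same division, so they are easy to reproduce. The sketch below uses only the totals, the respondent count, and the 36% large-enterprise share quoted above; the names are made up for illustration. The DLP figures reproduce to the cent, while the full-disk encryption figures come out within a few dollars of the ones quoted.

```python
RESPONDENTS = 194              # derived above: total loss / average loss
LARGE_ENTERPRISE_SHARE = 0.36  # share of respondents with revenues > $1b/year

DLP_ADDRESSABLE_LOSSES = 6_727_700  # the two DLP-relevant loss categories
FDE_ADDRESSABLE_LOSSES = 8_429_150  # the three mobile-device loss categories


def average_loss(total_losses: float, share_of_respondents: float = 1.0) -> float:
    """Average loss if the total is spread over the given share of respondents."""
    return total_losses / (RESPONDENTS * share_of_respondents)


if __name__ == "__main__":
    for name, total in [("DLP", DLP_ADDRESSABLE_LOSSES),
                        ("Full-disk encryption", FDE_ADDRESSABLE_LOSSES)]:
        print(f"{name}: ${average_loss(total):,.2f} per respondent, "
              f"${average_loss(total, LARGE_ENTERPRISE_SHARE):,.2f} "
              f"if only large enterprises lost data")
```

Note that the large-enterprise case divides by 194 * 0.36 = 69.84 respondents, a fractional head count, which is fine for an envelope but worth remembering before quoting the result to the cent.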

Also on the upside, the list of supporting activities required for an effective rollout of full-disk encryption is a lot shorter. You just have to decide whose laptops get it and in what order, then do the deployments. The candidates can be pretty easily identified with nothing more than an org chart and some common sense (either “do all” or some picking & choosing: “HR? Yes. Sales? Yes. Media Relations? Probably not–we wish more people were reading the press releases,” etc.). What’s more, it’s only a few thousand devices and once it’s in place, the support and maintenance overhead is fairly minimal.

So when I start to look at my priorities, this becomes pretty much a no-brainer. DLP costs more, reduces risk less (including some specific, high-profile regulatory risks), is much harder to implement, much costlier to support, and at the end of all that, is less likely to actually make a difference in our losses (IMHO).

Definitions

Tuesday, February 19th, 2008

Since IanG has been wondering about it in comments, I thought I’d take a moment to follow up on the theme that Alex Hutton so nicely summarized in one of his comments on my post:

Governance and Compliance are priors for Risk Management, and not the other way around.

So, starting with my favourite data source, Wikipedia, Compliance is defined as:

conforming to a specification, standard or law that has been clearly defined.

Governance, similarly, is defined as:

In the case of a business or of a non-profit organization, governance relates to consistent management, cohesive policies, processes and decision-rights for a given area of responsibility. For example, managing at a corporate level might involve evolving policies on privacy, on internal investment, and on the use of data.

Governance and compliance are two sides of the same coin–compliance is about following the rules, governance is about making sure the rules are clearly, consistently defined and enforced.

I think that one mistake people tend to make is to confuse Risk Analysis, effectively the process of compressing the three axes of risk (impact and likelihood over time) into a single value, with Risk Management, the process of ensuring that risks are identified and kept at some desired level.

How is this accomplished? Well, first and foremost, we must define the level–that’s where the clearly defined policies, processes, etc. come in. Once we define the rules, we (try to) ensure that people are aware of them and following them, then enforce and update them over time.

When people are in compliance, they are implicitly at our accepted level of risk. If they get too far outside of tolerances, then we now have a risk that must be managed. But without knowing what our accepted level of risk is, we don’t know which risks can be accepted and which risks must be managed to that level.

Hence, Alex’s observation.

Measuring risk reduction

Monday, February 18th, 2008

Another thought on KPI #2, “Are we secure enough?”:

Once management agrees that the approach (tracking compliance, gaps, and exceptions, extrapolated for coverage) is valid, we can effectively calculate the cost-per-gap-closed of a particular mitigation approach.

I’ll use a trivial example to demonstrate what I mean.

Say I have 10 network-level exceptions related to systems on a particular network, say a production line in a factory. I want to mitigate the risk (really, partition the risk, but I’ll argue/assume that the effect on the aggregate network is mitigation). To do so, I need to demonstrate that firewalling off the network is not only effective, but also a cost-effective approach to the problem.

Suppose, also, that I know from my risk assessment efforts that I have 50 exceptions on the network and 50% of my systems have been assessed for risk (based on best estimates from KPI #1, coverage and control). Extrapolating conservatively (reality is that the assessed systems are probably somewhat better-than-average from a compliance perspective), I assume that I have at least 100 documented or potential exceptions.

Therefore, deploying the firewall to mitigate the risk of 10 of them will reduce risk in excess of tolerance by 10%. This means that I can now provide a cost-per-exception to mitigate of 1/10th the cost of the firewall.
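The trivial example translates directly into four-function arithmetic. A minimal sketch follows; the function names and the $20,000 firewall price are hypothetical and are there only to make the numbers concrete.

```python
def extrapolated_exceptions(documented: int, coverage: float) -> float:
    """Scale documented exceptions up to the whole environment.

    coverage is the fraction of systems assessed so far (0.5 means 50%).
    """
    if not 0 < coverage <= 1:
        raise ValueError("coverage must be in (0, 1]")
    return documented / coverage


def risk_reduction(mitigated: int, total_exceptions: float) -> float:
    """Fraction of out-of-tolerance exceptions that a control addresses."""
    return mitigated / total_exceptions


def cost_per_exception(control_cost: float, mitigated: int) -> float:
    """Cost of the control spread over the exceptions it mitigates."""
    return control_cost / mitigated


FIREWALL_COST = 20_000  # hypothetical price, not from the post

total = extrapolated_exceptions(documented=50, coverage=0.5)        # 100 exceptions
reduction = risk_reduction(mitigated=10, total_exceptions=total)    # 10% of the total
unit_cost = cost_per_exception(FIREWALL_COST, mitigated=10)         # 1/10th of the firewall
```

Keeping the three steps as separate functions makes it easy to rerun the numbers when coverage improves or a different control is on the table.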

If I have some estimate of the impact of the risk (lifted, say, from the BIA for the systems/applications), then I can determine if the firewall is a cost-effective approach to protect those systems, or if I need to come up with something cheaper. This also allows me to prioritize my risk reduction efforts to maximize efficiency, and also explain to others why I’ve ranked them in the order I have.

I’ve also managed to turn my risk assessment into dollars, and the dollar amounts all come from the people I’m managing risk for–no accusations by the “customer” of FUD’ing up my numbers, either.

So, no math that’s more complex than four-function arithmetic. It’s simple enough both to maintain over time and to explain to any half-way competent business or IT leader. What’s not to love? (I’m sure you’ll let me know in comments)

KPI #2: How secure are we?

Thursday, February 14th, 2008

Ultimately, all security metrics are an attempt to answer the question of “how secure are we?”

But rather than killing myself trying to answer that question (the very fact that KPI #1 is relevant implies that my coverage & control is less than adequate), I’ll settle for targeting metrics that allow me to answer the question, “Are we secure enough?”

How much is “enough?” It depends. In the case of a corporation, “enough” is defined by its security policies. So I consider how we measure compliance to policies:

- Audit findings, which document non-compliance to policy, process, or standards (process and configuration compliance)
- Risk Assessments, which are effectively operational audits of the architectural decisions and work quality for an application (ensure IT teams are making design decisions which match the accepted level of risk)
- Documented Exceptions, which are where the application owners formally state that they’re ignoring policy (portion of the environment that’s knowingly non-compliant)
- Proportion of the IT environment participating in the policy regimen (coverage & control for extrapolation of above)

Now this does make one huge assumption–that our policies, procedures, standards and guidelines are, in fact, an accurate representation of the company’s risk tolerance. Assuring that to be the case is part of my job, which I do by measuring, analyzing, and doing any number of other activities.

From a KPI perspective, though, this is OK. It’s like the way that most drivers only care about the reading on their speedometer and maybe the gas gauge. They leave worrying about the rest of the things that go into making the car run to the engine’s computer and focus on getting the thing from A to B. They let my team be the engine’s computer, and just like with cars, it generally runs better that way.

And when it breaks, I’m also the mechanic they take it to to complain that it’s leaking oil, too slow off the line, or the air conditioning isn’t cold enough.

Getting back to what I can do with my KPI, though, if we want to go a little bit further, we can (and are going to) look at information such as how many Gaps identified through the risk assessment process are eventually closed versus converted into Exceptions as the application goes live. Initial Gaps are a good indicator of how {risk|security}-aware the system owners and implementers are. Go-live gaps (exceptions) are a good indicator of how serious they are about hitting the accepted level of risk as documented through policy.

Looking at the Gap-to-Exception flow through the projects’ System Development Lifecycle (SDLC), we now also have a first stab at a leading indicator of risk.
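One way the Gap-to-Exception flow might be tracked per project is sketched below. The record layout, the project names, and the idea of ranking by conversion rate are all assumptions for illustration, not a description of any existing tooling.

```python
from dataclasses import dataclass


@dataclass
class Project:
    name: str
    initial_gaps: int        # gaps found during the risk assessment
    go_live_exceptions: int  # gaps still open, and excepted, at go-live


def gap_to_exception_rate(project: Project) -> float:
    """Fraction of identified gaps that became exceptions instead of being closed."""
    if project.initial_gaps == 0:
        return 0.0
    return project.go_live_exceptions / project.initial_gaps


# Hypothetical projects for illustration only.
projects = [
    Project("order-portal", initial_gaps=12, go_live_exceptions=3),
    Project("plant-historian", initial_gaps=4, go_live_exceptions=4),
]

# Rank projects so the ones converting the most gaps into exceptions come first.
ranked = sorted(projects, key=gap_to_exception_rate, reverse=True)
```

A project with few initial gaps but a high conversion rate tells a different story from one with many gaps that mostly get closed, which is why both numbers are kept rather than just the ratio.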

When I combine this with the baseline level of control over the total environment, as measured by overall IT hygiene (primarily participation in centralized management (e.g. Active Directory), patch compliance, and basic secure configuration), we actually have a feel for “how secure am I?” which can be adjusted over time by setting the SLA’s for the various operations groups. This, in turn, means that we now have a metric that both resonates with Senior Leadership and can be translated into specific goals for the people doing the actual work–it passes both sides of the “so what?” test.

Additional metrics can be built by tuning the slicing & dicing of the information (sub-reports on “critical” applications, manufacturing, R&D, SoX, systems with some regulatory requirement, etc.), which we can then use to document actions which address those risks at a more macro level.

For example, I feel that we need to do some internal firewalling. No one disagrees, but thus far we have not been able to effectively document the expected benefit, which is important both to justify the effort and also to demonstrate how we expect it to improve things. Now, I expect to point to reduction of risk by addition of this compensating control, which will protect both the rest of the environment from manufacturing and the manufacturing environments from the rest of the environment.

This is a Good Thing since every manufacturing enterprise I’ve ever worked in has been (internally) famous for running outdated applications and platforms (NT4, anyone?), being unwilling to patch (“We know you have a patching maintenance window, but we use your patching window for other things!”), and being supremely high-impact if there is an outage–which they can measure to the minute (“If this factory goes down, we lose $4 million an hour!” Nice metric, btw–something that Andrew Jaquith also points out in his book). Similar rules and tales apply to the Mad Scientists in the R&D labs, too.

Eventually, if we agree that the model is valid, we might look at applying it to other environments, as well, based on agreement that risk is excessive and needs to be partitioned–we need only partition the measurement and agree accordingly.

It’s not the holy grail, but I’ll argue that it’s good enough for Senior Leadership to decide by, and in the KPI game, that’s all that really matters.

KPI #1: Knowing what we don’t know

Friday, February 8th, 2008

As G.I. Joe taught us many moons ago, knowing is half the battle.

To test this theory, I’ve been trying out potential KPI’s by mentioning the issues or concerns they represent to relevant parties in various conversations. Hands-down, the one that has gotten the most interest is how little we actually know about our risk landscape.

From this exercise, it has become obvious to me that my first Security Metrics KPI must be related to Coverage and Control:

percentage of internal hosts which are centrally managed and protected.

No matter what else I might try to tell people about our risk profile, I look like either Chicken Little or a buffoon if I don’t know how much of the total enterprise I’m actually speaking knowledgeably about.

And while this is really more of a table which rolls up to the KPI, and also while we can debate what exactly is required to be “centrally managed and controlled,” we cannot manage what we cannot control, and as such anything which is outside the framework (even if it meets compliance without our help) doesn’t matter in this case.

To know the percentage, I need to know the total number of active nodes on the internal network. From there, I can begin to provide detail around what type or level of control I have over those hosts. Things like:
- Number of Windows hosts that are members of Active Directory
- Number of Windows hosts with centrally-managed anti-virus
- Number of Linux/Unix hosts which are managed by IT
- Number of hosts which are patched by IT (and in keeping with our patching SLA’s–but that’s another metric for another day)
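The roll-up from those counts to a single KPI could be sketched like this. The `Host` record, its field names, and especially the `is_covered` rule are assumptions; the definition of “centrally managed and protected” is exactly the part that is up for debate, so treat it as a placeholder.

```python
from dataclasses import dataclass


@dataclass
class Host:
    name: str
    os: str                  # "windows", "linux", "unix", ...
    centrally_managed: bool  # AD member (Windows) or managed by IT (Linux/Unix)
    managed_av: bool         # centrally-managed anti-virus installed
    patched_by_it: bool


def pct(part: int, whole: int) -> float:
    """Percentage of whole, returning 0.0 for an empty population."""
    return 100.0 * part / whole if whole else 0.0


def is_covered(host: Host) -> bool:
    """Assumed definition of 'centrally managed and protected'."""
    protected = host.patched_by_it and (host.managed_av or host.os != "windows")
    return host.centrally_managed and protected


def coverage_report(hosts: list[Host]) -> dict[str, float]:
    """Detail table plus the headline KPI, all as percentages."""
    windows = [h for h in hosts if h.os == "windows"]
    nix = [h for h in hosts if h.os in ("linux", "unix")]
    return {
        "windows_in_ad_pct": pct(sum(h.centrally_managed for h in windows), len(windows)),
        "windows_managed_av_pct": pct(sum(h.managed_av for h in windows), len(windows)),
        "nix_managed_pct": pct(sum(h.centrally_managed for h in nix), len(nix)),
        "patched_by_it_pct": pct(sum(h.patched_by_it for h in hosts), len(hosts)),
        "kpi_covered_pct": pct(sum(is_covered(h) for h in hosts), len(hosts)),
    }


# Hypothetical inventory for illustration only.
inventory = [
    Host("ws-001", "windows", True, True, True),
    Host("ws-002", "windows", True, False, True),
    Host("lab-pc", "windows", False, False, False),
    Host("build-01", "linux", True, False, True),
]
report = coverage_report(inventory)
```

The denominator is the hard part in practice: the percentages only mean something once the total number of active nodes on the internal network is known.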

We have a fair amount of AS400 out there, too, but from a host count perspective, it’s small and it’s all centrally-managed. How well-managed is another deal entirely. There just isn’t anyone who says, “We can just order an iSeries and turn it in on the corporate card as a team dinner.”

But once I have this, I can provide not only my KPI, but also a measurable definition of what comprises it and from that, provide the operational roadmap of what must be done in order to achieve the necessary level of control for our network, given the stated risk tolerance for host security.

I would like to be able to do something similar for “applications,” but that creates a couple of problems which we can’t actually solve right now. First, IT can’t provide me with the inventory data that I would need to provide an accurate assessment. Second, finding “applications” is much more difficult than finding hosts on the network and determining simple characteristics like operating system and domain membership.