Risk analysis. It sounds obscure, but we do it all the time. Decide to cross the street, and you're performing risk analysis. It also turns out that we're pretty bad at analyzing risk. This explains, in part, why so many people get hurt crossing the road. A lot of research has gone into figuring out why we do such a poor job, and it turns out there are lots of reasons. Sometimes you don't have all the information. Sometimes you have the information and ignore it. And many times you don't understand the potential consequences of different courses of action. The bottom line is that risk analysis and risk management often go awry.
In the world of enterprise networks, we are under a barrage of information about risk. Every day, we hear news about some security problems lurking on our networks, and we're urged to fix them immediately. When we deal with that information, we're performing risk analysis. When we run out and install every security patch we read about, we're performing poor risk analysis.
Every major software vendor maintains a "security alert" mailing list, and if you're a conscientious network manager, you subscribe to the ones most relevant to you. But dozens of other mailing lists and web sites are also dedicated to making security patch information available in a timely way. Network World runs one, edited by Jason Meserve. Last weekend, I learned about a business, Threat Focus, selling customized security alerts.
One factor that contributes to poor risk analysis is having too much awareness of a problem. Get hypersensitized to an issue, such as security threats, and you're bound to react in ways out of proportion to the real facts of the situation. We're not just inundated with security information: we're overwhelmed by it. This sets us up to make poor decisions.
The reality of today's software development life cycle is that full production releases don't come out bug-free. What does this mean for quickly made, poorly tested security patches? They're just as likely to have bugs, if not more so. Microsoft, because it releases so many patches, has hit the news with reports of updates that made things worse, but it's not alone. A few weeks ago, Apple introduced 10.2.4, a bug-and-security patch to its OS X operating system. People who installed it suddenly discovered problems with their power management and PPP stacks. Anyone can make these errors.
The complexity of systems, the difficulty of doing good quality assurance, and the rush to push products out as quickly as possible have put us all on an upgrade-and-patch treadmill. Experienced network managers know, however, that patching a working system is often worse than leaving it alone. The old saying "if it ain't broke, don't fix it" has become firmly ingrained.
Why, then, do we throw normal caution and good business sense out the window when it comes to security patches? Our normal strategies of testing, containment, and problem avoidance disappear, replaced by prevention and anticipatory self-defense. A company I work with rushed last week to react to the most recent sendmail security patch and ended up trashing its email system - this for a bug whose worst effect was the potential to crash the mail-handling process and require a restart.
The bug hunters are also partially to blame. The notoriety of discovering a problem means that there is an incentive to blow the threat out of proportion. Any unchecked buffer copy is automatically described as a way for "a third party to potentially gain control of the system," even if the likelihood of this being true is infinitesimal. All security, all encryption, all authentication is based on probabilities, and one factor contributing to poor risk analysis is failing to pay attention to the probability of a risk actually becoming a problem.
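To make that concrete, here is a minimal sketch in C of the kind of unchecked buffer copy these advisories describe. It is my own illustration, not code from any particular product; the handle_request routine and its 64-byte buffer are hypothetical.

    #include <stdio.h>
    #include <string.h>

    /* Hypothetical parsing routine of the sort advisories flag.
     * strcpy() never checks the length of its source, so anything
     * longer than 63 bytes overruns 'name' and scribbles on
     * whatever follows it in memory. */
    void handle_request(const char *client_supplied)
    {
        char name[64];

        strcpy(name, client_supplied);          /* unchecked copy */
        printf("hello, %s\n", name);
    }

    /* The safer form most patches substitute: bound the copy and
     * terminate the string explicitly. */
    void handle_request_fixed(const char *client_supplied)
    {
        char name[64];

        strncpy(name, client_supplied, sizeof(name) - 1);
        name[sizeof(name) - 1] = '\0';
        printf("hello, %s\n", name);
    }

In many real programs an overrun like this does nothing worse than crash the process; turning it into the advertised "control of the system" takes a very particular memory layout and a great deal of work.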
A recent paper from security researchers at Stanford showed how, in some implementations of OpenSSL, it is possible to recover the server's private key remotely by timing its responses. It's innovative and interesting research, and it will help make cryptographic software better. But it also requires a system with a clock of gigahertz precision sitting less than a millisecond away from the server being attacked. The attack is, in practice, impossible to mount over the Internet. But that didn't keep system managers all over the Internet from updating their OpenSSL code. One of those organizations is a client of ours, and we spent several days trying to figure out why an OpenSSL-based application they were building wouldn't work. The answer turned out to be a change in OpenSSL's behavior that nobody noticed when an eager system manager upgraded the library during development.
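To get a feel for why distance matters, here is a toy timing measurement in C. It is my own sketch, not the Stanford attack, which targets RSA arithmetic inside OpenSSL; the check() routine and its made-up secret simply stand in for any code whose running time depends on private data.

    #define _POSIX_C_SOURCE 199309L
    #include <stdio.h>
    #include <string.h>
    #include <time.h>

    /* Toy secret-dependent routine: it bails out at the first
     * mismatch, so its running time leaks how many leading bytes
     * of the guess were right. */
    static int check(const char *guess, const char *secret)
    {
        for (size_t i = 0; i < strlen(secret); i++)
            if (guess[i] != secret[i])
                return 0;
        return 1;
    }

    int main(void)
    {
        const char *secret = "s3kr1t-key";
        const char *guesses[] = { "xxxxxxxxxx", "s3kr1txxxx", "s3kr1t-kex" };
        volatile int sink = 0;  /* keeps the compiler from discarding the calls */

        for (int g = 0; g < 3; g++) {
            struct timespec a, b;
            long long best = -1;

            /* Repeat the measurement and keep the fastest run to
             * average out local noise - exactly what an attacker
             * must do, only with far less noise than a WAN adds. */
            for (int trial = 0; trial < 100000; trial++) {
                clock_gettime(CLOCK_MONOTONIC, &a);
                sink += check(guesses[g], secret);
                clock_gettime(CLOCK_MONOTONIC, &b);

                long long ns = (b.tv_sec - a.tv_sec) * 1000000000LL
                             + (b.tv_nsec - a.tv_nsec);
                if (best < 0 || ns < best)
                    best = ns;
            }
            printf("guess %d: ~%lld ns\n", g, best);
        }
        return 0;
    }

Run locally, the three guesses should differ by no more than a handful of nanoseconds, which repeated measurements on the same machine can just about resolve. Send the same requests across the Internet and every measurement picks up milliseconds of routing and queuing jitter, orders of magnitude larger than the signal being hunted. That is the gap between an elegant laboratory result and a threat worth breaking your systems over.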
I'm not saying that patching systems is a bad idea. What I want is for my fellow network managers to step back a second and do a real risk analysis on these perceived threats. Is the cure, in fact, worse than the disease?