By Adam Dodge
Lately, there has been a flurry of activity in the land of security breach reports, with organizations such as Debix, Verizon, the Identity Theft Resource Center and the Department of Justice all releasing reports looking at security breaches, breach notification laws and the state of information security in general. As someone who has been tracking and monitoring breaches for two years now through Educational Security Incidents, I am excited about the increased attention, the information coming forth and the lessons that can be learned from these breaches. However, it is important to remember that there are inherent limitations on the applicability of breach statistics, and therefore we must all be cautious about reading too deeply and arriving at conclusions that the information in these reports does not support.
Before we go any further: yes, I develop a similar report each year, and yes, my report is subject to the same limitations as all of these other reports. My point here is not that all other reports are wrong while the ESI YiR is the shining beacon of truth. The point is that the information delivered in these reports is simply that, information. It is up to the reader to interpret this information in a meaningful way. The problem, then, stems from misinterpretation.
What do I mean by “misinterpretation”? A common problem with the statistics provided in these reports (remember, I’m including my own report as well) is that the numbers are based on a sample set, and the ability to generalize from these numbers depends a great deal upon the size of the sample and how randomly the sample was chosen from the total population. Alright, that might not be a good enough answer, so allow me to explain further.
The Verizon report has made a big splash in the security world and for good reason. Verizon did an amazing job with this report. If you haven’t read it, go do so now. Seriously, stop reading this and go read the report. It is that good.
However, the report is based on more than 500 forensic investigations performed by Verizon’s Business RISK team between 2004 and 2007. These 500+ breaches that Verizon analyzed for this report were not randomly chosen from all breaches that occurred. Instead, the information was mined from investigations stemming from breaches that were serious enough for a company to reach out and contract with Verizon for assistance. This is a potential point of bias for this survey.
Most companies are not going to spend money on investigations for small breaches or those that are easily explainable. Therefore, it is very likely that breaches such as information left in public, information accidentally placed on a public web site, etc. are underrepresented in the sample Verizon used. It is also likely that smaller companies and non-profit organizations are underrepresented, since these entities lack the funding that larger, for-profit organizations have at their disposal.
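To make this kind of selection bias concrete, here is a toy simulation. The proportions and investigation odds below are entirely made up for illustration; they are not drawn from any of the reports discussed here. The sketch simply shows how sampling only the breaches serious enough to warrant a paid forensic investigation can make one cause look far more common than it is in the full population:

```python
import random

random.seed(42)

# Hypothetical population of 10,000 breaches (made-up proportions):
# 70% employee mistakes, 30% external attacks.
population = ["employee mistake"] * 7000 + ["external attack"] * 3000

# Assume (again, purely hypothetically) that external attacks are far
# more likely to be serious enough to trigger a paid investigation.
investigation_odds = {"employee mistake": 0.02, "external attack": 0.30}

# The "report sample" is only the breaches that were investigated.
investigated = [b for b in population
                if random.random() < investigation_odds[b]]

pop_share = population.count("external attack") / len(population)
sample_share = investigated.count("external attack") / len(investigated)

print(f"External attacks in the full population: {pop_share:.0%}")
print(f"External attacks among investigated breaches: {sample_share:.0%}")
```

With these invented numbers, external attacks are a minority of all breaches but a large majority of the investigated ones; the statistic is accurate for the sample, yet misleading if read as a statement about all breaches.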
What does this sample bias mean for the validity of the Verizon report? Nothing. Nothing at all. There is no problem with the sample bias of the Verizon report in particular. The simple fact is that all security breach reports (again, including the ESI YiR) suffer from the same problem, and unfortunately there is no good way around it yet. Everyone I talk to who is involved with tracking breaches has the same complaint: there is no centralized reporting of breaches in the United States, and those states that do require breach reporting to a central authority differ in their reporting requirements, litmus tests and public access to breach information.
So am I suggesting that everyone stop reading these reports? Absolutely not. It is not just self-preservation that makes me say this, however much I enjoy my work with ESI. These reports are an excellent way for information security practitioners to track the movement of threats and discover what types of security threats similar organizations are facing. The point of all of this is that each and every one of us (including the media) needs to make sure that we are interpreting the data in these reports properly before we remove our firewall because the 2007 ESI YiR said that employee mistakes outnumber hackers as the cause of a breach 2:1, or before we discontinue our security awareness and training programs because the Verizon report says that 73% of all breaches came from external sources.
How can these reports be so different and yet both be correct? Simple: look at the samples used to compile them.