Monday, April 21, 2003

Counterpane: Crypto-Gram: April 15, 2003
National Crime Information Center (NCIC) Database Accuracy

Last month the U.S. Justice Department administratively discharged the FBI of its statutory duty to ensure the accuracy and completeness of the National Crime Information Center (NCIC) database. This database is enormous. It contains over 39 million criminal records. It contains information on wanted persons, missing persons, and gang members, as well as information about stolen cars, boats, and other property. Over 80,000 law enforcement agencies have access to this database. On average, there are 2.8 million transactions processed each day.

The Privacy Act of 1974 requires the FBI to make reasonable efforts to ensure the accuracy and completeness of the records in this database. Last month, the Justice Department exempted the system from the law's accuracy requirements.

This isn't just bad social practice, it's bad security. A database with more errors is much less useful than a database with fewer errors, and an error-filled security database is much more likely to target innocents than it is to let the guilty go free.

To see this, let's walk through an example. Assume a simple database -- name and a single code indicating "innocent" or "guilty." When a policeman encounters someone, he looks that person up in the database, and then arrests him if the database says "guilty."

Example 1: Assume the database is 100% accurate. If that is the case, there won't be any false arrests because of bad data. It works perfectly.

Example 2: Assume a 0.0001% error rate: one error in a million. (An error is defined as a person having an "innocent" code when he is guilty, or a "guilty" code when he is innocent.) Furthermore, assume that one in 10,000 people is guilty. In this case, for every 100 guilty people the database correctly identifies, it will mistakenly identify one innocent person as guilty (because of an error). And the number of guilty people erroneously listed as innocent is tiny: one in a million.

Example 3: Assume a 1% error rate -- one in a hundred -- and the same one in 10,000 ratio of guilty people. The results are very different. For every 100 guilty people the database correctly identifies, it will mistakenly identify 10,000 innocent people as guilty. The number of guilty people erroneously listed as innocent is larger, but still very small: one in 100.

The differences between examples 2 and 3 are striking. In example 2, one person is erroneously arrested for every 100 people correctly arrested. In example 3, one person is correctly arrested for every 100 people erroneously arrested. The increase in error rate makes the database all but useless as a system for figuring out whom to arrest. And this is despite the fact that, in both cases, almost no guilty people get away because of a database error.
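The arithmetic behind examples 2 and 3 can be checked with a short Python sketch. The function name `flagged_counts` and the population of 100 million are illustrative assumptions, not part of the original example; the model is the one described above, with a single error rate that flips either code.

```python
from fractions import Fraction

def flagged_counts(population, guilty_fraction, error_rate):
    """Return (guilty correctly flagged, innocents wrongly flagged).

    Assumes one symmetric error rate: any record's code is flipped
    with probability error_rate, as in the examples above.
    """
    guilty = population * guilty_fraction
    innocent = population - guilty
    true_positives = guilty * (1 - error_rate)   # guilty, coded "guilty"
    false_positives = innocent * error_rate      # innocent, coded "guilty"
    return true_positives, false_positives

POPULATION = 100_000_000          # hypothetical population size
GUILTY = Fraction(1, 10_000)      # one in 10,000 is guilty

for label, rate in [("Example 2", Fraction(1, 1_000_000)),
                    ("Example 3", Fraction(1, 100))]:
    tp, fp = flagged_counts(POPULATION, GUILTY, rate)
    print(f"{label}: {float(tp):,.0f} guilty flagged, "
          f"{float(fp):,.0f} innocents flagged, "
          f"{float(fp / tp):.4f} innocents per guilty")
```

Under these assumptions the ratio of innocents flagged to guilty flagged moves from roughly 0.01 in example 2 to roughly 101 in example 3, which is the reversal described above.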

The reason for this phenomenon is that the number of guilty people is a very small percentage of the population. If one in ten people were guilty, then a 0.0001% error rate would mistakenly arrest about one innocent for every 100,000 guilty, and a 1% error rate would arrest approximately one innocent for every ten guilty. And if the proportion of guilty people is even smaller than one in ten thousand, then the problem of arresting innocents magnifies even more as the database has more errors.
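The dependence on how rare guilt is can be sketched the same way. The helper `innocents_per_guilty` is a hypothetical name, and the one-in-a-million case is added purely for illustration; the symmetric error rate is the same assumption used in the examples.

```python
from fractions import Fraction

def innocents_per_guilty(guilty_fraction, error_rate):
    """Innocents wrongly flagged per guilty person correctly flagged,
    assuming a single symmetric error rate."""
    false_positives = (1 - guilty_fraction) * error_rate
    true_positives = guilty_fraction * (1 - error_rate)
    return false_positives / true_positives

ERROR_RATE = Fraction(1, 100)  # the 1% rate from example 3

# Guilty fractions for illustration; the last one does not appear in the text.
for denominator in (10, 10_000, 1_000_000):
    ratio = innocents_per_guilty(Fraction(1, denominator), ERROR_RATE)
    print(f"1 in {denominator:,} guilty: "
          f"{float(ratio):,.2f} innocents flagged per guilty person")
```

Holding the error rate fixed, the ratio grows roughly in proportion to how rare guilty people are, which is why the rarest targets produce the worst results.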

Now this is a simple example, and the NCIC database has far more complex data and tries to make more complex correlations. And I am assuming that the error rate for false positives is the same as the error rate for false negatives, and that there aren't any data dependencies that complicate the analysis. But even with these complications, the problems are still the same. Because there are so few terrorists (for example) amongst the general population, an error-filled database is far more likely to identify innocent people as terrorists than it is to catch actual terrorists.
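One of those simplifications, the equal false-positive and false-negative rates, is easy to relax in a sketch. The function name and the rate pairs below are hypothetical, chosen only to show which rate matters when guilt is rare; nothing here is drawn from real NCIC figures.

```python
from fractions import Fraction

def share_of_flagged_who_are_guilty(guilty_fraction,
                                    false_positive_rate,
                                    false_negative_rate):
    """Fraction of "guilty"-coded records that belong to guilty people."""
    true_positives = guilty_fraction * (1 - false_negative_rate)
    false_positives = (1 - guilty_fraction) * false_positive_rate
    return true_positives / (true_positives + false_positives)

GUILTY = Fraction(1, 10_000)  # same base rate as the examples

# Hypothetical (false positive, false negative) rate pairs.
for fp_rate, fn_rate in [(Fraction(1, 100), Fraction(1, 100)),
                         (Fraction(1, 100), Fraction(0)),
                         (Fraction(1, 1000), Fraction(1, 10))]:
    share = share_of_flagged_who_are_guilty(GUILTY, fp_rate, fn_rate)
    print(f"FP {float(fp_rate):.3%}, FN {float(fn_rate):.1%}: "
          f"{float(share):.2%} of flagged people are guilty")
```

In this model, with a 1% false-positive rate, fewer than one in 100 flagged people is guilty even when no guilty person is missed, so it is the false-positive rate that dominates the outcome.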

This kind of thing is already happening. There are 13 million people on the FBI's terrorist watch list. That's ridiculous; it's simply inconceivable that a number of people equal to 4.5% of the population of the United States are terrorists. There are far more innocents on that list than there are guilty people not on that list. And these innocents are regularly harassed by police trying to do their job. And in any case, any watch list with 13 million people is basically useless. How many resources can anyone afford to spend watching about one-twentieth of the population, anyway?

That 13-million-person list feels a whole lot like CYA on the part of the FBI. Adding someone to the list probably has no cost and, in fact, may be one criterion for how your performance is evaluated at the FBI. Removing someone from the list probably takes considerable courage, since someone is going to have to take the fall when "the warnings were ignored" and "they failed to connect the dots." Best to leave that risky stuff to other people, and to keep innocent people on the list forever.…
http://www.counterpane.com/crypto-gram-0304.html#7
