The Cold Hit Problem | Math ∞ Blog

The previous article, Are Fingerprints Unique?, discussed the case of Brandon Mayfield, a Muslim American lawyer from the Portland, Oregon area who was wrongly identified by the FBI as one of the Madrid train bombers in 2004 on the basis of an incorrect fingerprint identification.

The Mayfield case is probably the most famous case of an incorrect fingerprint identification. It is an example of a “cold hit,” in which a large biometric database was searched for a possible match to an unknown fingerprint taken from a crime scene. Unlike suspects with plausible links to the crime, there was nothing specific to connect Mayfield to the crime other than the database search match.

There are subtle and serious mathematical and statistical problems with cold hits, which occur with both DNA profiling and fingerprint identification. This article explores in detail the mathematics and statistics of the cold hit problem.

The cold hit problem is closely related to a well-known problem in probability and statistics known as the birthday problem. Consider a room full of people: Bob, Frank, Mary, Estelle, and others. Each person has a birthday: May 1, December 13, March 11, July 17, and so on.

Not knowing the birthdays of the people in the room, what is the probability that at least two people in the room have the same birthday? How many people must be in the room for there to be an even (50/50) chance that at least two people in the room have the same birthday?

A naive and incorrect answer would be to reason as follows. There are three hundred and sixty-five (365) days in the year. The probability that two people have the same birthday is 1/365. Therefore, the probability that at least one pair of people in a room with N people have the same birthday is about N/365. Thus the room needs about 183 people for an even chance of a match. The actual answer is twenty-three (23) people, much smaller than 183!

Let us consider the problem in detail. First, what is the probability that Bob and Frank have the same birthday? There is a 1/365 chance that Bob was born on January 1. There is a 1/365 chance that Frank was born on January 1. Thus, there is a 1/(365*365) chance that both Bob and Frank were born on January 1. There are, however, three hundred and sixty-five days in the year, so the probability that Bob and Frank were born on the same day is 365/(365*365) or 1/365.

We need to find the probability that at least one pair of people in the room (Bob and Frank, Bob and Mary, Bob and Estelle, Frank and Mary, Frank and Estelle, Mary and Estelle, and all other possible distinct pairs) have the same birthday. If there are N people in the room, there will be [tex]N(N-1)/2[/tex] distinct possible pairs of people. Each pair will have a probability of 1/365 of having the same birthday.
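The pair count can be checked by brute-force enumeration. The following is a minimal Python sketch, not part of the original scripts; the function name count_pairs and the four-person room are illustrative choices taken from the example names above.

```python
from itertools import combinations


def count_pairs(n):
    """Number of distinct pairs among n people: n(n-1)/2."""
    return n * (n - 1) // 2


# Illustrative room with the four people named in the text.
people = ["Bob", "Frank", "Mary", "Estelle"]
pairs = list(combinations(people, 2))

# List every distinct pair exactly once.
for first, second in pairs:
    print(first, "and", second)

# The enumerated count and the formula should agree.
print(len(pairs), count_pairs(len(people)))
```

For a room of twenty-three people the same function gives 253 pairs, far more than the number of people.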

The probability that at least one pair of people have the same birthday is:


P = 1.0 - (Probability that No Pair Has the Same Birthday)

which, treating the pairs as independent of one another (a close approximation), is


P = 1.0 - (Probability that a Pair Does Not Have the Same Birthday)^(Number of Distinct Pairs of People)

or 

P = 1.0 - (1.0 - 1/365)^(N(N-1)/2)

It turns out that P is 0.50048, almost exactly even, for N = 23. The number of distinct pairs of people in the room grows as the square of the number of people in the room [tex](N(N-1)/2)[/tex], not as the number of people in the room (N). Hence, it takes far fewer people in the room than one would naively expect for there to be an even chance that at least two people in the room have the same birthday.
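The figure for N = 23 can be reproduced in a few lines. The sketch below is in Python rather than Octave, and the function names (p_pairs, p_exact, smallest_even_chance) are hypothetical. It evaluates the pair formula above and, for comparison, the exact probability under the added assumption that birthdays are uniform over 365 days and independent between people.

```python
def p_pairs(n, m=365):
    """Pair formula from the text: 1 - (1 - 1/m)^(n(n-1)/2)."""
    return 1.0 - (1.0 - 1.0 / m) ** (n * (n - 1) // 2)


def p_exact(n, m=365):
    """Exact probability of a shared birthday, assuming uniform birthdays."""
    p_none = 1.0
    for k in range(n):
        # Person k+1 must avoid the k birthdays already taken.
        p_none *= (m - k) / m
    return 1.0 - p_none


def smallest_even_chance(prob, m=365):
    """Smallest room size n for which prob(n, m) reaches 0.5."""
    n = 1
    while prob(n, m) < 0.5:
        n += 1
    return n


print("pair formula, N = 23:", round(p_pairs(23), 5))
print("exact,        N = 23:", round(p_exact(23), 5))
print("even chance at N =", smallest_even_chance(p_pairs),
      "(pair formula) and N =", smallest_even_chance(p_exact), "(exact)")
```

Both calculations cross the even-chance line at twenty-three people, so the independence approximation in the pair formula does not change the answer here.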

Figure: Probability At Least Two People in Room Have Same Birthday

The plot of the probability of at least two people in a room having the same birthday was generated using the two Octave scripts below: birthday.m and plot_bday.m.

Octave is a free open-source numerical programming environment that is mostly compatible with MATLAB.

birthday.m



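function [p] = birthday(n, m, bTrace)
% p = birthday(n [, m, bTrace])
% probability that at least one pair of members of set of N have same birthday (M days in year)
% n  number of people 
% m  number of "days" in year (default value = 365)
% bTrace flag to trace operation of function (default value = false)
%
% (C) 2011 John F. McGowan
% E-Mail: [email protected]
% 

% NOTE: the original listing is cut off after "if nargin"; everything below
% is reconstructed from the formula P = 1 - (1 - 1/m)^(n(n-1)/2) in the text.
if nargin < 2, m = 365; end
if nargin < 3, bTrace = false; end

npairs = n*(n-1)/2;              % number of distinct pairs of people
p = 1.0 - (1.0 - 1.0/m)^npairs;  % at least one pair shares a birthday

if bTrace
	printf("n = %d, pairs = %d, p = %g\n", n, npairs, p);
end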

plot_bday.m



% plot probability of at least two people having the same birthday
% in a room full of N people
%
% (C) 2011 John F. McGowan, Ph.D.
% E-Mail: [email protected]
%

p = zeros(1,100);

for i=1:100
	if mod(i, 10) == 0
		printf("processing %d people in the room\n", i);
	end
	p(i) = birthday(i);
end

printf("displaying graph\n");
fflush(stdout);

figure(1);
plot(p);
title('Probability At Least Two People Have Same Birthday');
ylabel('P');
xlabel('Number of People in Room');

printf("writing plot to file prob_bday.jpg\n");
fflush(stdout);

print('prob_bday.jpg');




What does the birthday problem have to do with fingerprint identification, DNA profiling, or other forms of biometric identification? Replace the people in the room with fingerprints or other biometric identifiers (DNA profiles, iris images, faces,...) in a database.

Replace the three hundred and sixty-five distinct birthdays with thousands, millions or more distinct biometric identification codes derived from the fingerprint, DNA profile, iris, or other form of identification. The pairs of people with the same birthday become pairs of people with the same fingerprint or other biometric identifier: the actual criminal who commits a crime and at least one other innocent person.

What happens if a fingerprint database has one hundred million people and the chance of two people having the same fingerprint (we are referring to the same partial prints, such as a thumb print lifted from a crime scene) is only one in a trillion ([tex]10^{12}[/tex])?

Astonishingly, the probability of at least two people in the database having the same fingerprint is almost one (1.0). This is because there are (100,000,000)(99,999,999)/2 possible pairs of people in the database: about five quadrillion (five thousand trillion) possible pairs. Though the probability of any two people having the same fingerprint is extremely low, at least one misidentification occurring somewhere in the system is almost certain (probability close to 1.0).
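The arithmetic for the hundred-million-person database can be checked directly. The Python sketch below applies the same pair formula as the birthday calculation, with the one-in-a-trillion figure taken at face value and every pair assumed independent; the function name p_any_match is hypothetical. It uses log1p and expm1 because a naive 1 - (1 - p)^k loses precision when p is this small.

```python
import math


def p_any_match(n, p_pair):
    """Probability that at least one of the n(n-1)/2 pairs matches.

    Computes 1 - (1 - p_pair)^pairs through logarithms so that very
    small p_pair and very large pair counts are handled accurately.
    """
    pairs = n * (n - 1) // 2
    return -math.expm1(pairs * math.log1p(-p_pair))


n = 100_000_000      # people in the database (figure from the text)
p_pair = 1e-12       # assumed chance that a given pair matches

pairs = n * (n - 1) // 2
expected_matches = pairs * p_pair

print("distinct pairs:         ", pairs)
print("expected matching pairs:", expected_matches)
print("P(at least one match):  ", p_any_match(n, p_pair))
```

Under these assumptions the expected number of coincidentally matching pairs is about five thousand, which is why the probability of at least one match is indistinguishable from 1.0.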

The FBI fingerprint database contains about 200 million people, collected since the 1920s, and the probability of two people having identical or indistinguishable partial fingerprints (or even all ten fingerprints) is unknown.

DNA profiles are currently claimed to have a probability of two people having the same profile of about one in ten trillion. With cold hits, in a search of a large database of DNA profiles such as those currently being collected, it is actually likely that there will be incorrect matches somewhere in the system.
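One way to make this concrete is to ask how large a database must be before a coincidental match somewhere in it becomes an even chance. The Python sketch below inverts the pair formula used for the birthday problem; it takes the claimed one-in-ten-trillion figure at face value and assumes it applies independently to every pair, both of which are assumptions rather than established facts. The function name even_chance_size is hypothetical.

```python
import math


def even_chance_size(p_pair):
    """Smallest n with 1 - (1 - p_pair)^(n(n-1)/2) >= 0.5.

    Solves (1 - p_pair)^pairs = 0.5 for the number of pairs, then
    solves n(n-1)/2 = pairs for n with the quadratic formula.
    """
    pairs_needed = math.log(2.0) / -math.log1p(-p_pair)
    return math.ceil((1.0 + math.sqrt(1.0 + 8.0 * pairs_needed)) / 2.0)


# Sanity check against the birthday problem: 365 days gives 23 people.
print("birthdays:   ", even_chance_size(1.0 / 365))

# Claimed DNA profile figure: one in ten trillion.
print("DNA profiles:", even_chance_size(1e-13))
```

With these inputs the even-chance point falls at a few million profiles, well below the size of a national database, and because the number of pairs grows as the square of the database size, the probability climbs quickly beyond that point.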

Brandon Mayfield probably fell victim, in part, to the counter-intuitive statistics of the birthday problem. As the size of biometric databases collected by governments, law enforcement agencies, intelligence agencies, and private companies grows, the cold hit problem will grow as the square of the number of entries in the databases.

If everyone, all of the nearly seven billion people on Earth, were in the databases, one could produce a list of all possible suspects based on fingerprint or other biometric identification alone. This could easily be hundreds or thousands or even more people.

How does one handle possible suspects who lack an adequate alibi and could have flown to a crime? How many of these possible suspects will have some tenuous seven degrees of separation connection to the crime? Brandon Mayfield was a Muslim American who had represented an alleged Islamic terrorist in a child custody case: a tenuous but possible connection to the terrorists responsible for the Madrid train bombings. This is the crux of the cold hit problem.

© 2011 John F. McGowan

About the Author

John F. McGowan, Ph.D. solves problems using mathematics and mathematical software, including developing video compression and speech recognition technologies. He has extensive experience developing software in C, C++, Visual Basic, Mathematica, MATLAB, and many other programming languages. He is probably best known for his AVI Overview, an Internet FAQ (Frequently Asked Questions) on the Microsoft AVI (Audio Video Interleave) file format. He has worked as a contractor at NASA Ames Research Center involved in the research and development of image and video processing algorithms and technology. He has published articles on the origin and evolution of life, the exploration of Mars (anticipating the discovery of methane on Mars), and cheap access to space. He has a Ph.D. in physics from the University of Illinois at Urbana-Champaign and a B.S. in physics from the California Institute of Technology (Caltech). He can be reached at [email protected].