Hash: collision attack (birthday problem)

When Is It Likely That Two Inputs Map to the Same Output?

Problem: Assume a hash function H that pretty much randomly maps an integer input to an integer output. Suppose the number of output values for H is k. Pick n input integers randomly. How large should n be so that the probability that at least one pair of input integers map to the same output is 1/2?

Setup:

Outline of Solution: By definition of conditional expectation write

E(X)=Pr(X=0)*E(X|X=0) + Pr(X>0)*E(X|X>0)

=Pr(X>0)*E(X|X > 0)

and therefore, after rearranging,

Pr(X>0) = E(X)/E(X|X > 0)

We can solve for E(X) exactly. Because the sum of expectations is the expectation of the sum (well known fact), we can write

E(X)=E(X1,2)+E(X1,3)+...+E(Xn-1,n) .

But

E(Xi,j)=0*Pr(Xi,j=0)+1*Pr(Xi,j=1) .

The first of these two terms is always 0 and the second is the probability that the ith and jth input values map to the same output value. This is k/k2=1/k. Since this is the same for any i and j, E(X)=n(n-1)/(2k).

It is convenient to bound the term E(X|X>0) rather than compute it exactly. When conditioning on X>0 there must be at least one pair of inputs that map to the same output. Without loss of generality suppose inputs n-1 and n map to the same output. Let P denote the event that inputs n-1 and n map to the same number. Then we can write

Pr(X>0) = E(X)/E(X|P).

Consider all other inputs, namely those numbered 1 to n-2. There are (n-2)(n-3)/2 pairs of these numbers. Whether or not one of these pairs maps to the same number is independent of the mapping of inputs n-1 and n. Hence,

E(X1,2|P) + E(X1,3|P) + ... + E(Xn-3,n-2|P) = (n-2)(n-3)/(2k).

We also know

E(Xn-1,n|P) = 1.

That leaves 2*(n-2) pairs unaccounted for, each pair, say i and j, containing exactly one input, say j, that either maps to the target of n-1 or n as the case may be. Expectations of all such Xi,j are the probability that input i maps to the same number as both input j and n-1 or n, as the case may be. This is 1/(k-2). Hence,

Pr(X > 0) = (n(n-1)/(2k))  /  (1+(n-2)(n-3)/(2k)+2*(n-2)/(k-2)).

This is 1/2 approximately (find upper and lower bounds for the denominator - these differ by a term vanishing in n) when n(n-1)/(2k) = 1, that is, roughly when n2 = 2*k. This shows how large n should be so that the probability that at least one pair of input integers map to the same output (that is, Pr(X > 0)) is 1/2.

Hash: preimage attack

When Is It Likely That The Input for a Given Output is Found?

Brute Force Attack: The number of output values, k. So, a brute force preimage attack on SHA-1 with a 160 bit hash should take 2160 hashes.