20-CS-5153 | Network Security | Spring 2017 |
---|---|---|
Some Odd Ciphers |
Hash: collision attack (birthday problem)
When Is It Likely That Two Inputs Map to the Same Output?
Problem: Assume a hash function
H that pretty much randomly maps an integer input to an integer
output. Suppose the number of output values for H is k.
Pick n input integers randomly. How large should n be
so that the probability that at least one pair of input integers map to
the same output is 1/2?
Setup:
Outline of Solution: By definition of conditional expectation write
and therefore, after rearranging,
We can solve for E(X) exactly. Because the sum of expectations is the expectation of the sum (well known fact), we can write
But
The first of these two terms is always 0 and the second is the probability that the ith and jth input values map to the same output value. This is k/k^{2}=1/k. Since this is the same for any i and j, E(X)=n(n-1)/(2k). It is convenient to bound the term E(X|X>0) rather than compute it exactly. When conditioning on X>0 there must be at least one pair of inputs that map to the same output. Without loss of generality suppose inputs n-1 and n map to the same output. Let P denote the event that inputs n-1 and n map to the same number. Then we can write
Consider all other inputs, namely those numbered 1 to n-2. There are (n-2)(n-3)/2 pairs of these numbers. Whether or not one of these pairs maps to the same number is independent of the mapping of inputs n-1 and n. Hence,
We also know E(X_{n-1,n}|P) = 1. That leaves 2*(n-2) pairs unaccounted for, each pair, say i and j, containing exactly one input, say j, that either maps to the target of n-1 or n as the case may be. Expectations of all such X_{i,j} are the probability that input i maps to the same number as both input j and n-1 or n, as the case may be. This is 1/(k-2). Hence,
This is 1/2 approximately (find upper and lower bounds for the denominator - these differ by a term vanishing in n) when n(n-1)/(2k) = 1, that is, roughly when n^{2} = 2*k. This shows how large n should be so that the probability that at least one pair of input integers map to the same output (that is, Pr(X > 0)) is 1/2.
When Is It Likely That The Input for a Given Output is Found? Brute Force Attack: The number of output values, k. So, a brute force preimage attack on SHA-1 with a 160 bit hash should take 2^{160} hashes. |