Advanced Algorithm - Hash Table - Lec2

Perfect Hashing

Hashing 的一個 criteria 是希望儘可能不要發生 collision (也就是 Worst Case 也在 $\mathcal{O}(1)$ lookup),今天假設我們已知元素的個數,該如何設計一個 hash function 使得不會有任何 collition 發生?

Def.

A **Perfect Hash Function** is a hash function w/o collision

想法的共通性是利用空間換取時間。

Naive method - $\mathcal{O}(N^2)$ space method

(w/ $\mathcal{O}(1)$time)

Let the number of slots be $N^2$, then

任意從 2-Universal Hash Family $\mathcal{H}$ 中抽一個 $h$ 出來 $$ \Pr_{h \in \mathcal{H}}[\,\text{collision happened}\,] = \binom{N}{2} \cdot \frac{1}{N^2} \leq \frac{1}{2} $$

抽取次數的期望值為 $2$ (成功率的倒數)

$\mathcal{O}(N)$ space method

(w/ $\mathcal{O}(N)$ time)

  1. $\mathcal{H}$ 中抽出 $h$$N$ 個 element hash 到 $N$ 個 slot 當中 (Note: 這裡會發生 collision) \
    Denote $m_i$ be the number of elements with hash value $= i$

  2. Redo Step1. until $\sum m_i^2 \leq 4 N$

  3. For each slot $i$ with number of elements $m_i$\
    Pick a hash function from 2-Universal Hash family $\mathcal{H}_i: , \text{slot} , i \rightarrow \lbrace 0, 1, \cdots, m_i^2 - 1\rbrace$\
    Repeat until no collision happened\
    Note: Need to do for every slots (from $0 \sim N-1$)

所以最後每個元素 $x$的鍵值會是一個 pair ($h(x),h_{h(x)} \normalsize(x)$) ,所需總 slot 數 $\leq 4 N$

Running Time

先來看 Step3.,跟上面 Naive method 的方法一樣,每個 slot 抽取 no collision hash function 的次數期望值 $\leq 2$

再來看 Step1. , Number of total collision is

$$ \sum_{i=1}^N \frac{(m_i)(m_i - 1)}{2} = \frac{1}{2}\sum_{i=1}^N m_i^2 - \frac{N}{2} $$

又已知

$$ \mathbb{E}[\,\text{# of collisions}\,] \leq \binom{N}{2} \cdot \frac{1}{N} = \frac{N-1}{2} $$

綜合兩式,可得

$$ \mathbb{E}[\, \sum_{i=1}^N m_i^2 \,] \leq 2 N $$

根據 Markov Ineq.

$$ \Pr[\, \sum_{i=1}^N m_i^2 \, > 4 N] \leq \frac{1}{2} $$

$\Rightarrow$ 抽取次數的期望值 $\leq 2$

 
comments powered by Disqus