在 第一講 中,我們證明了有很高的機率,任意 slot 中 element 的個數均為 Θ ( ln n ln ( ln n ) ) \mathcal{\Theta}(\frac{\ln n}{\ln (\ln n)}) Θ ( l n ( l n n ) l n n ) ,而這一講想討論的是,如果今天我們手上有兩個 hash function (from same H \mathcal{H} H )可以選,每次我們就看哪一個 h h h 回傳的 slot 中元素比較少,就將 element hash 到 slot,那麼現在 E [ , max num of elements in any slots , ] \mathbb{E}[, \text{max num of elements in any slots} , ] E [ , max num of elements in any slots , ] ?
Chernoff Bound
n n n 個 identical R.V. 的相加,想 measure 其偏離平均值的機率
Let X 1 , X 2 , ⋯ , X n X_1, X_2, \cdots, X_n X 1 , X 2 , ⋯ , X n be independent binary R.V with E [ X i ] = p i \mathbb{E}[X_i] = p_i E [ X i ] = p i
X = ∑ i X i X = \sum_i X_i X = ∑ i X i , ; μ = E [ X ] = ∑ i p i ; \mu = \mathbb{E}[X] = \sum_i p_i ; μ = E [ X ] = ∑ i p i
Pr [ X ≥ e μ ] ≤ e − μ
\boxed{\Pr[X \geq e \mu] \leq e^{- \mu}}
Pr [ X ≥ e μ ] ≤ e − μ
Power of 2 choices
Throw n n n balls into n n n bins, \
For every ball, we pick 2 bins uniformly at random, throw the ball into the bin with less load
Let B k = ∣ ; { i , ∣ , b i ≥ k } ; ∣ B_k = | ;\lbrace i , | ,b_i \geq k \rbrace;| B k = ∣ ; { i , ∣ , b i ≥ k } ; ∣ (有 ≥ k \geq k ≥ k 顆球的 bin 數)\
β k \beta_k β k 為 B k B_k B k 的 upper bound ; E k \quad \mathcal{E}_k E k 為 β k > B k \beta_k > B_k β k > B k (上限符合要求) 的 event
\
假定我們已經有了 B k ≤ β k B_k \leq \beta_k B k ≤ β k 這個條件 (E k \mathcal{E}_k E k 成立),該如何 inductively 去找 β k + 1 \beta \scriptstyle k+1 β k + 1 ?
For any balls j j j , we say j j j is r e d \color{red}{red} r e d if the number of balls below it ≤ k \leq k ≤ k
Pr [ , j , is red , ] ≤ ( β k n ) 2 \Pr[ , j , \text{is red} ,] \leq (\frac{\beta_k}{n})^2 Pr [ , j , is red , ] ≤ ( n β k ) 2 (兩個 bin 球數都要 ≥ k \geq k ≥ k )
E [ B k + 1 , ∣ E k ] ≤ E [ num of red balls , ∣ E k ] ≤ n ⋅ ( β k n ) 2 \mathbb{E}[B\scriptstyle k+1 \normalsize , | \mathcal{E}_k] \leq \mathbb{E}[\text{num of red balls} , | \mathcal{E}_k] \leq n \cdot (\frac{\beta_k}{n})^2 E [ B k + 1 , ∣ E k ] ≤ E [ num of red balls , ∣ E k ] ≤ n ⋅ ( n β k ) 2 \
(第一個等號因為有紅色球的 bin 一定有 ≥ \geq ≥ k + 1 k+1 k + 1 個球,後面那個等號是因為 Union Bound)
Using Chernoff Bound \color{blue}{\text{Chernoff Bound}} Chernoff Bound , Pr [ B k + 1 ≥ e β k 2 n ∣ E k ] ≤ exp ( − β k 2 n ) \Pr[B \scriptstyle k+1 \normalsize \geq e \frac{\beta_k^2}{n} | \mathcal{E}_k] \leq \exp(-\frac{\beta_k^2}{n}) Pr [ B k + 1 ≥ e n β k 2 ∣ E k ] ≤ exp ( − n β k 2 ) \
\
(如果 exp ( − β k 2 n ) ≤ 1 n 2 \exp(-\frac{\beta_k^2}{n}) \leq \frac{1}{n^2} exp ( − n β k 2 ) ≤ n 2 1 ) , 則我們可以選 β k + 1 = e , β k 2 n \beta \scriptstyle k+1 \normalsize = e , \frac{\beta_k^2}{n} β k + 1 = e , n β k 2 \
整理一下此遞迴式,我們有了以下關係式
$$ \boxed{\frac{\beta_{k+m}}{n} = \frac{1}{e} \, (\frac{e \beta_k}{n})^{2^m}} $$
接下來要選定初始值 k k k ,就可以確定 β \beta β 的值了,\
注意到 β 6 ≤ ⌊ n 6 ⌋ ≤ n 2 e \beta_6 \leq \lfloor \frac{n}{6} \rfloor \leq \frac{n}{2e} β 6 ≤ ⌊ 6 n ⌋ ≤ 2 e n (顯然會超過 6 顆球
的bin數 一定會小於均分的 case)\
代回上式,可得
$$ \frac{\beta_{6+m}}{n} \leq \frac{1}{e} \, \frac{1}{2^{2^m}} $$
當 m = O ( ln ln n ) m = \mathcal{O}(\ln \ln n) m = O ( ln ln n ) , we can find that β 6 + m < 1 \beta \scriptstyle 6+m \normalsize < 1 β 6 + m < 1 \
(亦即超過 O ( 1 ) + ln ln n \mathcal{O}(1) + \ln \ln n O ( 1 ) + ln ln n 顆球的 bin 出現的機率極小)
這裡只的機率很小是指 Pr [ ¬ , E m ] \Pr[\neg , \mathcal{E}_m] Pr [ ¬ , E m ] 很小,根據上方的遞迴式,我們可以給出一個明確的上界
Fact
$$ \boxed{\Pr[ \neg \, \mathcal{E}_{k+1}] \leq \Pr[\neg \, \mathcal{E}_{k+1} \,| \, \mathcal{E}_k] \cdot \Pr[\mathcal{E}_k] + \Pr[\neg \, \mathcal{E}_k]} $$
proof.
$$ \begin{align} \text{RHS} \, &= \, \Pr[\neg \, \mathcal{E}_{k+1} \,| \, \mathcal{E}_k] \cdot \Pr[\mathcal{E}_k] + \Pr[\neg \, \mathcal{E}_k] \\ & = \Pr[\neg \, \mathcal{E}_{k+1} \, \land \, \mathcal{E}_k] + \Pr[\neg \, \mathcal{E}_k]\\ & = \underbrace{\Pr[\neg \, \mathcal{E}_{k+1} \, \land \, \mathcal{E}_k] + \Pr[\neg \, \mathcal{E}_k \, \land \, \neg \, \mathcal{E}_{k+1}]}_{\Pr[\neg \, \mathcal{E}_{k+1}]} + \Pr[\neg \, \mathcal{E}_k \, \land \, \mathcal{E}_{k+1}] \\ & \geq \text{LHS} \end{align} $$
⇒ Pr [ ¬ , E k ] ≤ 1 n \Rightarrow \Pr[\neg , \mathcal{E} \scriptstyle k \normalsize] \leq \frac{1}{n} \quad ⇒ Pr [ ¬ , E k ] ≤ n 1 (利用 Pr [ ¬ , E 6 ] = 0 \Pr[\neg , \mathcal{E}_6 ] = 0 Pr [ ¬ , E 6 ] = 0 ,遞迴求出)
More Discussion
l l l -choice
第 2 步變為 Pr [ , j , is red , ] ≤ ( β k n ) l \Pr[ , j , \text{is red} ,] \leq (\frac{\beta_k}{n})^l Pr [ , j , is red , ] ≤ ( n β k ) l (l l l 個 bin 球數都要 ≥ k \geq k ≥ k )
$$ \boxed{\frac{\beta_{k+m}}{n} = \frac{1}{e} \, (\frac{e \beta_k}{n})^{l^m}} $$
當 m = O ( ln ln n ln l ) m = \mathcal{O}(\frac{\ln \ln n}{\ln l}) m = O ( l n l l n l n n ) , we can find that β 6 + m < 1 \beta \scriptstyle 6+m \normalsize < 1 β 6 + m < 1 ,但其實還是在同個 order
What if β k \beta_k β k is so small
That is, 第 4 步的 bound 並不足夠小,也就是隨著 β k \beta_k β k 的 decay, exp ( − β k 2 n ) \exp(-\frac{\beta_k^2}{n}) exp ( − n β k 2 ) 終會 > 1 n 2 > \frac{1}{n^2} > n 2 1 ⇒ β k < 2 n ln n \Rightarrow \beta_k < \sqrt{2n \ln n} ⇒ β k < 2 n ln n \
Pick 使得此式成立的 min. k ⋆ k^\star k ⋆ , k ⋆ = ln ln n ln 2 + O ( 1 ) k^\star = \frac{\ln \ln n}{\ln 2} + \mathcal{O}(1) k ⋆ = l n 2 l n l n n + O ( 1 )
可以證明有 bin 超過 k ⋆ + 2 k^\star + 2 k ⋆ + 2 的機率足夠小 (但證明我沒看懂QQ)
Please enable JavaScript to view the comments powered by Disqus.