NTU Machine Learning - Lec4 · Stay hungry. Stay foolish

28 Dec 2017

Machine Learning / NTU

這一講要討論的是機器學習真的是可行的嗎？能推廣到所有 scenerio 嗎？或是只有部份？是否需要某些假設才能證明可行呢？

前言

課堂中給出了一個 learning puzzle 的例子，簡單來說，對於那些 training data 以外的資料，總是有個小惡魔設計與我們找到的 $g$ 之 predction，恰恰相反的 label ，來反駁 ML 並沒有學到任何事情(儘管在 training data 上全部 align 了…)。對我們未知的部份，我們只能任人擺佈，但這些未知，也正是需要 ML 發揮價值的地方。

Infer sth unknown

讓我們先跳出 learning 的框架，想一想從 training data 中，我們能否去推估關於未知部份的一些統計量？從已知連向未知的過程中，嘗試解決(partially)上方遇到的困難。這裡以簡單的 bin model 作為介紹。

Bin Model

A bin consist many $\textcolor{orange}{\text{orange}}$ and $\textcolor{green}{\text{green}}$ balls
Want: estimate the portion of $\textcolor{orange}{\text{orange}}$ balls $\mu$

先隨機抓取一把球，算出當中 $\textcolor{orange}{\text{orange}}$ ball 的比例 $\upsilon$，是否能以$\upsilon$ 去 estimate 未知的 $\mu$？

Hoeffding’s Inequality

\[ \Pr[|\upsilon - \mu| > \textcolor{red}{\epsilon}] \leq 2 \, \exp(-2\textcolor{red}{\epsilon}^2\textcolor{blue}{N}) \]

Probably approximately correct (PAC)

When $\Pr[|\upsilon - \mu| > \textcolor{red}{\epsilon}]$ is small enough,
we say the statement $\upsilon = \mu$ is PAC (which can be achieved by sampling large $\textcolor{blue}{N}$)

Connect with learning framework

Given fixed hypothesis function $\textcolor{orange}{h}(x)$

$h$ is $\textcolor{orange}{\text{wrong}}$，mark as $\textcolor{orange}{\text{orange}}$
$h$ is $\textcolor{green}{\text{correct}}$，mark as $\textcolor{green}{\text{green}}$

bin 可以想成是 input space $\mathcal{X}$，但我們僅從中 sample 出 $\mathcal{D}$ 作為 training data ，當$|\mathcal{D}|$ (i.e $\textcolor{blue}{N}$) 夠大時，可以說 $\textcolor{orange}{h}$ 在 training data 與未知的世界表現是 probably consistent 的 ($\Pr[|E_i(h) - E_o(h)| > \textcolor{red}{\epsilon}]$ is sufficiently small)。

How to find sufficiently good $h$

我們的目標是要找一個 $E_o(h)$ 夠小的 $h$ ，而能推估$E_o(h)$ 的唯一已知量便是 $E_i(h)$ 了，而 Hoeffding Ineq. 告訴我們對於單一 $h$，這件事是 PAC。但 $\mathcal{H}$ 不會只有一個 $h$ ，學習的流程是對每一個 $h$ 都去做 sample，並從中挑選出 sample error rate $E_i(h)$ 最小者。然而，這件事情會放大 Hoeffding Ineq. 的 upper bound。舉例來說，對 150 個公平銅板，各作 5 次隨機試驗，其中有一個銅板連續 5 次都是正面的機率 > 99% ，但我們能因此便說該銅板正面的機率為 1 嗎？我們把這類使得對給定銅板($\textcolor{orange}{h}$) $E_i$ 跟 $E_o$ 差很多的試驗($\mathcal{D}$)稱作 $\textcolor{red}{\text{BAD}} \, \text{sample}$(也就是說，雖然單看一個銅板，$\Pr[\textcolor{red}{\text{BAD}}]$ 很小，但對 150 個銅板而言， $\Pr[\textcolor{red}{\text{BAD}}]$ 便超高。

從 learning 的角度來看，就是給定 $\mathcal{D}$，它對各個 $\textcolor{orange}{h}$ 為 $\textcolor{red}{\text{BAD}}$ 的機率雖小，但對 $\mathcal{H}$ 中至少一個 $\textcolor{orange}{h}$ 是 $\textcolor{red}{\text{BAD}}$ 的機率卻是不小的。

$$ \begin{align} \Pr_D[\color{red}{\text{BAD}} \, \text{for} \, \mathcal{H}] &= \Pr_D[\color{red}{\text{BAD}} \, \text{for}\, h_1 \, \lor \color{red}{\text{BAD}} \, \text{for}\, h_2 \, \lor \, \cdots \, \color{red}{\text{BAD}} \, \text{for}\, h_M ] \\ & \leq \Pr_D[\color{red}{\text{BAD}} \, \text{for}\, h_1] \, + \, \Pr_D[\color{red}{\text{BAD}} \, \text{for}\, h_2] \, + \, \cdots \, \Pr_D[\color{red}{\text{BAD}} \, \text{for}\, h_M] \, (\because \, \text{Union Bound}) \\ & \leq 2 \, M \, \exp(-2 \color{red}{\epsilon}^2 \color{blue}{N}) (\because \, \text{Hoeffding Ineq.}) \end{align} $$

$\Rightarrow$ learning is possible when $|\mathcal{H}|$ is finite and $E_i(h)$ is small
所謂 possible 是指，在 data 足夠的情況下，是可以找到 $E_o$ probably approximately and sufficiently small 的 $\textcolor{orange}{h}$ 。

Reference

林軒田老師的講義

View times

NTU Machine Learning - Lec4

Feasibility of Learning

前言