Learning theory by Zhangtong_数学联邦政治世界观

3.1 PAC learning

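• Applies only to a concept class, i.e., a set of Boolean-valued functions.

• For any target function in the concept class and any data distribution, the target can be learned with polynomial complexity.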

3.2 Analysis of PAC

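• Generalization error: at this point it is still the expectation, under the distribution, of an indicator (0-1 error) function. It can be bounded by the maximum, over all functions in the class, of the gap between the empirical mean and the true mean.

• Union bound: when the class contains finitely many functions, they can all be bounded simultaneously: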


Proposition 3.5 (Union Bound). Consider $m$ events $E_1, \ldots, E_m$. The following probability inequality holds:

$$\Pr(E_1 \cup \cdots \cup E_m) \le \sum_{j=1}^{m} \Pr(E_j).$$

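• For each individual function, the gap between the empirical mean error and the true mean error is controlled by the Chernoff bound from Chapter 2.

• Finally, to control the true mean error, it suffices to make the empirical mean error small enough.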
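The three steps above can be chained into one uniform bound. The sketch below assumes the additive (Hoeffding-type) form of the Chernoff bound for $[0,1]$-valued variables. The notation $\mathrm{err}_{S_n}$ for the empirical error and the constant in the exponent are assumptions of this sketch, not quoted from the chapter.

```latex
\begin{align*}
\Pr\Big(\exists f \in \mathcal{C}: \mathrm{err}_D(f) > \mathrm{err}_{S_n}(f) + \epsilon\Big)
  &\le \sum_{f \in \mathcal{C}} \Pr\big(\mathrm{err}_D(f) > \mathrm{err}_{S_n}(f) + \epsilon\big)
     && \text{(union bound)} \\
  &\le N e^{-2 n \epsilon^2}
     && \text{(additive Chernoff bound, per function)}.
\end{align*}
% Setting the right-hand side equal to \delta: with probability at least 1 - \delta,
% simultaneously for all f in the class,
\[
\mathrm{err}_D(f) \le \mathrm{err}_{S_n}(f) + \sqrt{\frac{\ln(N/\delta)}{2n}} .
\]
```

So a small empirical error, plus a deviation term that grows only logarithmically in $N$, gives a small true error.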

Theorem 3.6. Consider a concept class $\mathcal{C}$ with $N$ elements. With probability at least $1 - \delta$, the ERM PAC learner (3.1) with

$$\epsilon' = \gamma^2 \, \frac{2\ln(N/\delta)}{n}$$

for some $\gamma > 0$ satisfies

$$\mathrm{err}_D(\hat{f}) \le (1+\gamma)^2 \, \frac{2\ln(N/\delta)}{n}.$$

(Realizable PAC, finite case.)
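To get a feel for the rate in Theorem 3.6, here is a minimal Python sketch that evaluates the bound and runs exact ERM on a toy finite class. The threshold concepts, the uniform input distribution, and all function names are assumptions made for illustration. Exact ERM reaches zero training error in the realizable case, so it meets the $\epsilon'$ requirement for any $\gamma > 0$.

```python
import math
import random

def realizable_pac_bound(num_concepts, n, delta, gamma):
    """Bound of Theorem 3.6 as written above: (1 + gamma)^2 * 2 ln(N / delta) / n."""
    return (1.0 + gamma) ** 2 * 2.0 * math.log(num_concepts / delta) / n

def erm_threshold(sample, thresholds):
    """Return a threshold with the smallest empirical 0-1 error (exact ERM)."""
    def emp_err(t):
        return sum((x >= t) != y for x, y in sample) / len(sample)
    return min(thresholds, key=emp_err)

def simulate(n=500, num_concepts=50, true_index=17, seed=0):
    """Toy realizable experiment with concepts x -> 1(x >= t) on a grid of thresholds."""
    rng = random.Random(seed)
    thresholds = [k / num_concepts for k in range(num_concepts)]
    t_star = thresholds[true_index]
    # Realizable case: labels come from a concept inside the class, with no noise.
    sample = [(x, x >= t_star) for x in (rng.random() for _ in range(n))]
    t_hat = erm_threshold(sample, thresholds)
    # For X ~ Uniform[0, 1], the true error is the length of the disagreement interval.
    true_err = abs(t_hat - t_star)
    return t_hat, true_err

bound = realizable_pac_bound(num_concepts=50, n=500, delta=0.05, gamma=1.0)
t_hat, true_err = simulate()
```

Comparing `true_err` with `bound` over several seeds shows how conservative the guarantee is on this easy class.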

3.3 Empirical Process

Three main issues remain:

1. General non-binary-valued function classes, which may contain an infinite number of functions.

2. The non-realizable case, where $f_*(x) \notin \mathcal{C}$.

3. The observation $Y$ contains noise.

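• The first extension drops the binary-valued restriction by introducing a loss function $\phi(w, z)$. What the (approximate) ERM method guarantees is

$$\phi(\hat{w}, S_n) \le \inf_{w \in \Omega} \phi(w, S_n) + \epsilon',$$

where $\phi(w, S_n)$ is the training error.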
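The ERM guarantee above only constrains the training error, which a small Python sketch makes concrete. The clipped squared loss, the linear predictor, the candidate grid, and the noisy data model are all hypothetical choices for illustration; only the acceptance rule "training error within $\epsilon'$ of the minimum" comes from the text.

```python
import random

def empirical_risk(phi, w, sample):
    """phi(w, S_n): average loss of parameter w over the sample."""
    return sum(phi(w, z) for z in sample) / len(sample)

def approximate_erm(phi, candidates, sample, eps_prime, rng):
    """Return any candidate whose training error is within eps_prime of the minimum."""
    risks = {w: empirical_risk(phi, w, sample) for w in candidates}
    best = min(risks.values())
    admissible = [w for w in candidates if risks[w] <= best + eps_prime]
    return rng.choice(admissible), risks

def squared_loss(w, z):
    """Hypothetical non-binary loss: clipped squared error of the predictor x -> w * x."""
    x, y = z
    return min(1.0, (w * x - y) ** 2)

rng = random.Random(0)
sample = []
for _ in range(200):
    x = rng.uniform(-1.0, 1.0)
    sample.append((x, 0.5 * x + rng.gauss(0.0, 0.1)))  # noisy observations Y
candidates = [k / 10 for k in range(-10, 11)]
w_hat, risks = approximate_erm(squared_loss, candidates, sample, eps_prime=0.01, rng=rng)
```

Any admissible candidate is a valid output, which is why the selection is left arbitrary here.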

The following lemma controls the generalization error:

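Lemma 3.11. Assume that for any $\delta \in (0, 1)$, the following uniform convergence result holds with some $\alpha > 0$ (we allow $\alpha$ to depend on $S_n$). With probability at least $1 - \delta_1$,

$$\forall w \in \Omega: \quad \alpha\,\phi(w, D) \le \phi(w, S_n) + \epsilon_n(\delta_1, w).$$

Moreover, $\forall w \in \Omega$ the following inequality holds with some $\alpha' > 0$ (we allow $\alpha'$ to depend on $S_n$). With probability at least $1 - \delta_2$,

$$\phi(w, S_n) \le \alpha'\,\phi(w, D) + \epsilon'_n(\delta_2, w).$$

Then the following statement holds. With probability at least $1 - \delta_1 - \delta_2$, the approximate ERM method (3.7) satisfies the oracle inequality:

$$\alpha\,\phi(\hat{w}, D) \le \inf_{w \in \Omega} \big[\alpha'\,\phi(w, D) + \epsilon'_n(\delta_2, w)\big] + \epsilon' + \epsilon_n(\delta_1, \hat{w}).$$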
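The oracle inequality in Lemma 3.11 follows from a three-step chain. The derivation below is a sketch of that standard argument, written with $\hat{w}$ for the approximate ERM output and a fixed comparison parameter $w$. It is a reconstruction, not the chapter's proof verbatim.

```latex
% Fix any w \in \Omega. On the event where both assumptions hold
% (probability at least 1 - \delta_1 - \delta_2 by the union bound):
\begin{align*}
\alpha\,\phi(\hat{w}, D)
  &\le \phi(\hat{w}, S_n) + \epsilon_n(\delta_1, \hat{w})
     && \text{(uniform convergence, applied at } \hat{w}\text{)} \\
  &\le \phi(w, S_n) + \epsilon' + \epsilon_n(\delta_1, \hat{w})
     && \text{(approximate ERM)} \\
  &\le \alpha'\,\phi(w, D) + \epsilon'_n(\delta_2, w) + \epsilon' + \epsilon_n(\delta_1, \hat{w})
     && \text{(individual bound at the fixed } w\text{)}.
\end{align*}
```

The first step needs uniformity because $\hat{w}$ depends on the sample. The last step is applied to a single $w$ chosen in advance, so taking that $w$ to (nearly) attain the infimum of the bracketed term gives the stated inequality.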

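One can show that the loss $\phi(w, z)$ arising from PAC learning satisfies the conditions of Lemma 3.11, even when the optimal solution is not in the concept class.

Note that the first condition is a uniform convergence statement, whereas the second is for an individual function, so it does not need to be multiplied by the number of functions.

This settles the non-binary-valued function and $f_*(x) \notin \mathcal{C}$ issues.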

3.4 Covering number

The lower bracketing cover is introduced to handle function classes that contain infinitely many functions.

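Corollary 3.15. Assume that $\phi(w, z) \in [0, 1]$ for all $w \in \Omega$ and $z \in \mathcal{Z}$. Let $\mathcal{G} = \{\phi(w, z) : w \in \Omega\}$. With probability at least $1 - \delta$, the approximate ERM method (3.7) satisfies the (additive) oracle inequality:

$$\phi(\hat{w}, D) \le \inf_{w \in \Omega} \phi(w, D) + \epsilon' + \inf_{\epsilon > 0} \left[\epsilon + \sqrt{\frac{2\ln\big(2 N_{LB}(\epsilon, \mathcal{G}, L_1(D))/\delta\big)}{n}}\right].$$

Moreover, with probability at least $1 - \delta$, we have the following (multiplicative)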

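For the additive inequality in Corollary 3.15, the trade-off inside the infimum over $\epsilon$ can be explored numerically. The Python sketch below assumes a hypothetical growth rate $N_{LB}(\epsilon) = \lceil 1/\epsilon \rceil$ purely for illustration; real bracketing numbers depend on the function class. A grid search also only approximates the infimum from above, which still yields a valid bound since the inequality holds for every $\epsilon > 0$.

```python
import math

def additive_gap(eps, n, delta, n_lb):
    """eps + sqrt(2 ln(2 N_LB(eps) / delta) / n) for one bracket size eps."""
    return eps + math.sqrt(2.0 * math.log(2.0 * n_lb(eps) / delta) / n)

def best_additive_gap(n, delta, n_lb, grid_size=1000):
    """Approximate the infimum over eps > 0 by a search over a grid in (0, 1]."""
    grid = [(k + 1) / grid_size for k in range(grid_size)]
    return min((additive_gap(eps, n, delta, n_lb), eps) for eps in grid)

def toy_bracketing_number(eps):
    """Hypothetical growth N_LB(eps) = ceil(1 / eps), for illustration only."""
    return math.ceil(1.0 / eps)

gap, eps_star = best_additive_gap(n=10_000, delta=0.05, n_lb=toy_bracketing_number)
```

Smaller $\epsilon$ shrinks the first term but inflates the logarithm, so the minimizing `eps_star` moves toward zero as `n` grows.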