Information Theory – Chapter 2: Properties of Entropy
In this chapter, we further explore the properties of entropy.
Theorem (Jensen’s inequality): Let f be a convex function and X a discrete random variable taking values in \mathcal{X}=\{x_1, x_2, \cdots, x_n\} with P_X=(p_1, \cdots, p_n). Then
\mathbb{E}[f(X)]\ge f(\mathbb{E}[X]).
This can also be written as
\sum_{i=1}^n p_i f(x_i)\ge f\left(\sum_{i=1}^n p_i x_i\right).
If f is strictly convex, equality holds iff X=\mathbb{E}[X] with probability 1, i.e., X is constant.
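For a quick check, take the strictly convex function f(x)=x^2 and X uniform on \{0, 2\}: then \mathbb{E}[X]=1 and f(\mathbb{E}[X])=1, while \mathbb{E}[f(X)]=\frac{1}{2}\cdot 0+\frac{1}{2}\cdot 4=2\ge 1; the inequality is strict because X is not constant.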
Theorem (Non-negativity of divergence): Let P(x), Q(x) be distributions on \mathcal{X}. Then D(P||Q)\ge 0, with equality iff P(x)=Q(x), \forall x\in \mathcal{X}.
Proof: Let A=\{x\in\mathcal{X}: P(x)>0\} be the support of P.
We may assume Q(x)>0 for all x\in A (otherwise Q(x)=0 for some x\in A, so D(P||Q)=\infty and the inequality is trivial).
Then,
\begin{aligned}
-D(P||Q)&=-\sum_{x\in A}P(x)\log\frac{P(x)}{Q(x)}\\
&=\sum_{x\in A}P(x)\log\frac{Q(x)}{P(x)}\\
&\le \log \sum_{x\in A} P(x) \frac{Q(x)}{P(x)}\\
&=\log\sum_{x\in A}Q(x)\\
&\le \log\sum_{x\in \mathcal{X}}Q(x)\\
&=0.
\end{aligned}
Since f(x)=\log(x) is strictly concave, equality in the Jensen step holds iff \frac{Q(x)}{P(x)}=c for some constant c and all x\in A, and equality in the last step requires \sum_{x\in A}Q(x)=\sum_{x\in \mathcal{X}}Q(x)=1. As \sum_{x\in A}P(x)=1, this forces c=1, i.e., P(x)=Q(x) for all x\in \mathcal{X}. ∎
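As a small numerical illustration (logs taken base 2): for P=(1/2, 1/2) and Q=(1/4, 3/4) on a binary alphabet,
D(P||Q)=\frac{1}{2}\log\frac{1/2}{1/4}+\frac{1}{2}\log\frac{1/2}{3/4}=1-\frac{1}{2}\log 3\approx 0.208\ \text{bits}>0,
while D(Q||P)\approx 0.189 bits, so the divergence is positive but not symmetric.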
Corollary: I(X;Y)\ge 0 with equality iff X and Y are independent.
Proof:
I(X;Y)=D(P_{X, Y}||P_XP_Y)\ge 0, with equality iff P_{X,Y}(x, y)=P_X(x)P_Y(y) for all (x, y), i.e., X and Y are independent. ∎
Corollary: D(P_{Y|X}||Q_{Y|X})\ge 0 with equality iff P_{Y|X}(y|x)=Q_{Y|X}(y|x) for all (y, x)\in \mathcal{Y}\times \mathcal{X} s.t. P_X(x)>0.
Corollary: I(X;Y|Z)\ge 0 with equality iff P_{XY|Z}(x, y|z)=P_{X|Z}(x|z)P_{Y|Z}(y|z) for all (x, y, z) with P_Z(z)>0.
Theorem: H(X)\le \log |\mathcal{X}| with equality iff X is uniformly distributed over \mathcal{X}.
Proof:
Let U(x)=\frac{1}{|\mathcal{X}|} for x\in \mathcal{X}.
\begin{aligned}
D(P_X||U)&=\sum_{x\in \mathcal{X}} P_X(x)\log\frac{P_X(x)}{U(x)}\\
&=\sum_{x\in \mathcal{X}} P_X(x)\log P_X(x)+\log |\mathcal{X}|\\
&=\log |\mathcal{X}|-H(X)\ge 0. ∎
\end{aligned}
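For example, a binary X with P_X=(0.9, 0.1) has H(X)\approx 0.469 bits \le \log 2=1 bit; the bound is attained only by the uniform distribution (0.5, 0.5).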
Theorem (Conditioning reduces entropy): H(X|Y)\le H(X), with equality iff X and Y are independent.
Proof: H(X)-H(X|Y)=I(X;Y)\ge 0.∎
However, note that conditioning on a specific event Y=y may or may not decrease entropy: H(X|Y=y) can exceed H(X) for some y, even though the average H(X|Y)=\sum_y P_Y(y)H(X|Y=y) never does.
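For instance, consider the joint PMF (a standard illustrative choice):
P_{XY}(1, 1)=0,\quad P_{XY}(2, 1)=\tfrac{3}{4},\quad P_{XY}(1, 2)=\tfrac{1}{8},\quad P_{XY}(2, 2)=\tfrac{1}{8}.
Then P_X=(\tfrac{1}{8}, \tfrac{7}{8}) and H(X)\approx 0.544 bits. Given Y=1, X=2 with certainty, so H(X|Y=1)=0<H(X); given Y=2, X is uniform on \{1, 2\}, so H(X|Y=2)=1 bit >H(X). Yet on average H(X|Y)=\tfrac{3}{4}\cdot 0+\tfrac{1}{4}\cdot 1=0.25 bits \le H(X).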
Theorem (Independence bound): H(X_1, \cdots, X_n)\le \sum_{i=1}^nH(X_i) with equality iff X_1, \cdots, X_n are independent.
Theorem (Han’s inequality):
H(X_1, X_2, \cdots, X_n)\le \frac{1}{n-1}\sum_{i=1}^nH(X_1, \cdots, X_{i-1}, X_{i+1}, \cdots, X_n).
Proof: For every i, the chain rule gives
H(X_1, X_2, \cdots, X_n)=H(X_1, \cdots, X_{i-1}, X_{i+1}, \cdots, X_n)+H(X_i|X_1, \cdots, X_{i-1}, X_{i+1}, \cdots, X_n).
So
\begin{aligned}
\sum_{i=1}^nH(X_1, X_2, \cdots, X_n)&=\sum_{i=1}^n H(X_1, \cdots, X_{i-1}, X_{i+1}, \cdots, X_n) \\
&\quad+ \sum_{i=1}^n H(X_i|X_1, \cdots, X_{i-1}, X_{i+1}, \cdots, X_n) \\
&\le \sum_{i=1}^nH(X_1, \cdots, X_{i-1}, X_{i+1}, \cdots, X_n) + \sum_{i=1}^nH(X_i|X_1, \cdots, X_{i-1}) \\
&= \sum_{i=1}^nH(X_1, \cdots, X_{i-1}, X_{i+1}, \cdots, X_n) + H(X_1,\cdots, X_n),
\end{aligned}
where the inequality holds because conditioning on fewer variables cannot decrease entropy, and the last equality is the chain rule. The left-hand side equals nH(X_1, \cdots, X_n), so rearranging gives
(n-1)H(X_1, \cdots, X_n)\le \sum_{i=1}^nH(X_1, \cdots, X_{i-1}, X_{i+1}, \cdots, X_n). ∎
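As a quick sanity check, for n=2 Han’s inequality reads H(X_1, X_2)\le H(X_2)+H(X_1), which is exactly the independence bound above.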
We can also derive Han’s inequality by another method, via submodularity.
Definition (Submodular functions): Let S be a finite set and f a real-valued function on the subsets of S.
The function f is called submodular if
f(T_1)+f(T_2)\ge f(T_1\cup T_2)+f(T_1\cap T_2), \quad \forall T_1, T_2\subseteq S.
This definition is equivalent to: \forall T_1\subset T_2\subset S, b\in S-T_2,
f(T_2\cup\{b\})-f(T_2)\le f(T_1\cup\{b\})-f(T_1),
which means adding elements brings diminishing marginal returns.
Theorem: Let X_1, X_2, \cdots, X_n be discrete random variables and S=\{1, 2, \cdots, n\}.
For T\subseteq S, let X_T=(X_i: i\in T).
Then the function T\mapsto H(X_T) is submodular.
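A short justification (a sketch filling in the argument): for T\subseteq S and b\notin T, the chain rule gives H(X_{T\cup\{b\}})-H(X_T)=H(X_b|X_T); since conditioning on additional variables cannot increase entropy, T_1\subset T_2 implies H(X_b|X_{T_2})\le H(X_b|X_{T_1}), which is precisely the diminishing-returns form of submodularity.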
With this theorem, we can derive Han’s inequality more conveniently.
Fix i\in S and denote
T_1=\{1, \cdots, i-1\},\quad T_2=\{1, \cdots, i-1, i+1, \cdots, n\},\quad b=i.
Using the equivalent definition,
\begin{aligned}
H(X_{T_2\cup\{i\}})-H(X_{T_2}) &\le H(X_{T_1\cup \{i\}})-H(X_{T_1})\\
\Rightarrow H(X_1, \cdots, X_n) - H(X_1,\cdots, X_{i-1}, X_{i+1}, \cdots, X_n) &\le H(X_1, \cdots, X_i)-H(X_1, \cdots, X_{i-1})\\
&=H(X_i|X_1, \cdots, X_{i-1}).
\end{aligned}
Summing over i=1, 2, \cdots, n, we obtain Han’s inequality.
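Explicitly, summing the left-hand side over i gives nH(X_1, \cdots, X_n)-\sum_{i=1}^nH(X_1, \cdots, X_{i-1}, X_{i+1}, \cdots, X_n), while the right-hand side sums to \sum_{i=1}^nH(X_i|X_1, \cdots, X_{i-1})=H(X_1, \cdots, X_n) by the chain rule; rearranging yields (n-1)H(X_1, \cdots, X_n)\le \sum_{i=1}^nH(X_1, \cdots, X_{i-1}, X_{i+1}, \cdots, X_n).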
Theorem (Log-sum inequality): Let a_i\ge 0, b_i\ge 0, i=1, 2, \cdots, n, and let a=\sum_{i=1}^n a_i, b=\sum_{i=1}^n b_i. Then
\sum_{i=1}^n a_i\log\frac{a_i}{b_i}\ge a\log\frac{a}{b},
with equality iff \frac{a_i}{b_i} is constant.
Proof:
\begin{aligned}
\sum_{i=1}^n a_i \log \frac{a_i}{b_i} &= \sum_{i=1}^n b_i \frac{a_i}{b_i}\log\frac{a_i}{b_i}\\
&=b\sum_{i=1}^n \frac{b_i}{b} \frac{a_i}{b_i} \log\frac{a_i}{b_i}\\
&\ge b\left(\sum_{i=1}^n \frac{b_i}{b}\frac{a_i}{b_i}\right)\log\left(\sum_{i=1}^n\frac{b_i}{b}\frac{a_i}{b_i}\right)\\
&=b \frac{a}{b}\log\frac{a}{b}\\
&=a\log\frac{a}{b},
\end{aligned}
where the inequality is Jensen’s inequality applied to the convex function f(t)=t\log t with weights \frac{b_i}{b}. ∎
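For instance, with a=(1, 2) and b=(2, 1) (logs base 2), the left-hand side is 1\cdot\log\frac{1}{2}+2\cdot\log\frac{2}{1}=-1+2=1, while the right-hand side is 3\log\frac{3}{3}=0; the inequality is strict because \frac{a_1}{b_1}\ne\frac{a_2}{b_2}.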
Theorem (Convexity of D): Let P(x), Q(x) be PMFs on \mathcal{X}. Then D(P||Q) is jointly convex in the pair (P, Q), i.e., for \lambda\in[0, 1],
D(\lambda P_1+(1-\lambda)P_2||\lambda Q_1+(1-\lambda)Q_2)\le \lambda D(P_1||Q_1) + (1-\lambda) D(P_2||Q_2).
Proof: Apply the log-sum inequality termwise. Fix x\in \mathcal{X} and take n=2, a_1=\lambda P_1(x), a_2=(1-\lambda)P_2(x), b_1=\lambda Q_1(x), b_2=(1-\lambda)Q_2(x); the log-sum inequality gives
(\lambda P_1(x)+(1-\lambda)P_2(x))\log\frac{\lambda P_1(x)+(1-\lambda)P_2(x)}{\lambda Q_1(x)+(1-\lambda)Q_2(x)}\le \lambda P_1(x)\log\frac{P_1(x)}{Q_1(x)}+(1-\lambda)P_2(x)\log\frac{P_2(x)}{Q_2(x)},
where the \lambda and (1-\lambda) factors cancel inside the logarithms on the right. Summing over x\in \mathcal{X} gives the desired result. ∎
Theorem (Concavity of H): H(P) is a concave function of P.
Proof:
Let U be the uniform distribution on \mathcal{X}. Note that
D(P||U)=\sum_{x\in \mathcal{X}}P(x)\log\left(|\mathcal{X}|P(x)\right)=\log|\mathcal{X}|-H(P),
so H(P)=\log|\mathcal{X}|-D(P||U). Since D(P||U) is convex in P (by the previous theorem with Q_1=Q_2=U), H(P) is concave in P. ∎
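For example, mixing the point masses P_1=(1, 0) and P_2=(0, 1) with \lambda=\frac{1}{2} gives H\left(\tfrac{1}{2}P_1+\tfrac{1}{2}P_2\right)=H\left(\tfrac{1}{2}, \tfrac{1}{2}\right)=1\ \text{bit}\ge \tfrac{1}{2}H(P_1)+\tfrac{1}{2}H(P_2)=0: averaging distributions can only increase entropy.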
Theorem: Given RVs X, Y with joint PMF P_{XY},
(1). I(X; Y) is concave in P_X for a fixed P_{Y|X}.
(2). I(X;Y) is convex in P_{Y|X} for a fixed P_X.
Proof:
(1). Note that
I(X;Y)=H(Y)-H(Y|X),\qquad H(Y|X)=\sum_{x\in \mathcal{X}}P_X(x)H(Y|X=x),
so for fixed P_{Y|X} the term H(Y|X) is linear in P_X. As
P_Y(y)=\sum_{x\in \mathcal{X}}P_X(x)P_{Y|X}(y|x),
P_Y is linear in P_X. Since H(Y) is concave in P_Y, it is therefore concave in P_X, and a concave function minus a linear function is concave; so I(X;Y) is concave in P_X when P_{Y|X} is fixed.
(2). Fix P_X.
Let P_1(y|x), P_2(y|x) be two candidate conditional PMFs for P_{Y|X}.
Denote P_i(x, y)=P_X(x)P_i(y|x), P_i(y)=\sum_{x\in \mathcal{X}}P_i(x, y), and Q_i(x, y)=P_X(x)P_i(y) for i=1, 2.
Choose \lambda\in [0, 1] and consider the mixture P_\lambda(y|x)=\lambda P_1(y|x)+(1-\lambda)P_2(y|x).
So
\begin{aligned}
P_\lambda(x, y)&=P_X(x)P_\lambda(y|x)=\lambda P_1(y|x)P_X(x)+(1-\lambda)P_2(y|x) P_X(x)=\lambda P_1(x, y)+(1-\lambda)P_2(x, y), \\
P_\lambda(y)&=\sum_{x\in \mathcal{X}} P_\lambda(x, y)=\lambda P_1(y) + (1-\lambda) P_2(y).
\end{aligned}
Let Q_\lambda(x, y)=P_X(x)P_\lambda(y)=\lambda P_1(y)P_X(x)+(1-\lambda)P_2(y)P_X(x)=\lambda Q_1(x,y)+(1-\lambda)Q_2(x, y).
So, with convexity of D,
I_\lambda(X;Y)=D(P_\lambda(x, y)||Q_\lambda(x, y))\le \lambda D(P_1(x, y)||Q_1(x, y))+(1-\lambda)D(P_2(x, y)||Q_2(x, y))=\lambda I_1(X;Y)+(1-\lambda)I_2(X;Y),
where I_i(X;Y) denotes the mutual information under P_{Y|X}=P_i(y|x). Hence I(X;Y) is convex in P_{Y|X} for fixed P_X. ∎