Information Theory – Chapter X Lempel-Ziv coding
This chapter discusses LZ78 and its asymptotic optimality.
LZ78: Move along the sequence; whenever the segment observed since the last reset is not yet in the dictionary, add it to the dictionary and encode it as (\text{pointer to its prefix}, \text{last symbol}), then reset. An example is shown below:
For the 0-1 string 0000001101010000011010, first parse the string into phrases: (0, 00, 000, 1, 10, 101, 0000, 01, 1010). Then encode each phrase as a pair: the first part is the index of its prefix (0 if the phrase has no prefix), written here as a 4-bit binary number, and the second part is the last digit. The encoding result is shown below:
phrase | (prefix index, last digit) |
---|---|
0 | (0000, 0) |
00 | (0001, 0) |
000 | (0010, 0) |
1 | (0000, 1) |
10 | (0100, 0) |
101 | (0101, 1) |
0000 | (0011, 0) |
01 | (0001, 1) |
1010 | (0110, 0) |
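A minimal sketch of this encoder in Python (the function name lz78_encode and the fixed 4-bit index field are choices made for this illustration; a practical coder would let the index width grow with the dictionary):

```python
def lz78_encode(s, index_bits=4):
    """LZ78-parse s into phrases and emit, for each phrase,
    (prefix index as a binary string, last symbol). Index 0 means no prefix."""
    dictionary = {}                # phrase -> 1-based index
    phrases, codewords = [], []
    current = ""
    for symbol in s:
        current += symbol
        if current not in dictionary:                  # new phrase found
            dictionary[current] = len(dictionary) + 1
            prefix_index = dictionary.get(current[:-1], 0)
            phrases.append(current)
            codewords.append((format(prefix_index, f"0{index_bits}b"), symbol))
            current = ""                               # reset and continue
    return phrases, codewords                          # trailing partial phrase ignored

phrases, codewords = lz78_encode("0000001101010000011010")
print(phrases)    # ['0', '00', '000', '1', '10', '101', '0000', '01', '1010']
print(codewords)  # [('0000', '0'), ('0001', '0'), ('0010', '0'), ('0000', '1'), ...]
```

Running it on the example string reproduces the phrase list and the table above.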
Now we consider its performance.
The worst case: The number of phrases C(n) in a sequence of length n satisfies C(n)\le \frac{n}{(1-\epsilon_n)\log n}, where \epsilon_n\to 0 as n\to\infty.
Proof:
First, denote by n_k the sum of the lengths of all distinct binary strings of length at most k, so n_k=\sum_{j=1}^k j2^j=(k-1)2^{k+1}+2. Since the phrases are distinct, a sequence of length n_k can contain at most \sum_{j=1}^k 2^j phrases, so
\begin{aligned}
C(n_k)\le \sum_{j=1}^k2^j = 2^{k+1}-2&=\frac{(k-1)2^{k+1}-2(k-1)}{k-1} \\
&<\frac{(k-1)2^{k+1}+2}{k-1} = \frac{n_k}{k-1}.
\end{aligned}
Now let n_k\le n<n_{k+1}; more precisely, write n=n_k+m with 0\le m<n_{k+1}-n_k=(k+1)2^{k+1}. The extra m symbols form phrases of length at least k+1 (all shorter phrases have already been used), so they contribute at most \frac{m}{k+1} additional phrases. Hence
\begin{aligned}
C(n)&\le \frac{n_k}{k-1} + \frac{m}{k+1}\\
&\le\frac{n_k}{k-1} + \frac{m}{k-1}\\
&=\frac{n}{k-1}.
\end{aligned}
To bound k, note that
\begin{aligned}
&n\ge n_k\ge 2^k \Rightarrow k\le \log n, \\
&n< n_{k+1}=k2^{k+2}+2\le (k+2)2^{k+2} \le (\log n +2)2^{k+2},
\end{aligned}
\begin{aligned}
\Rightarrow k-1&\ge \log n - \log(\log n +2) -3 \\
&\ge (1-\epsilon_n)\log n,
\end{aligned}
where \epsilon_n = \frac{\log(\log n+2)+3}{\log n}\to 0 as n\to\infty.
So C(n)\le \frac{n}{k-1}\le \frac{n}{(1-\epsilon_n)\log n}. ∎
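As a sanity check of this bound (a sketch, not part of the original argument; \log is base 2 and \epsilon_n is the expression above), the snippet below parses the worst-case-style string formed by concatenating all binary strings of length at most k and compares its phrase count C(n) with n/((1-\epsilon_n)\log n):

```python
from itertools import product
from math import log2

def count_phrases(s):
    """Number of phrases in the LZ78 distinct parsing of s."""
    seen, current, count = set(), "", 0
    for symbol in s:
        current += symbol
        if current not in seen:
            seen.add(current)
            count += 1
            current = ""
    return count + (1 if current else 0)    # trailing partial phrase, if any

k = 6
# Concatenate every binary string of length 1, 2, ..., k.
s = "".join("".join(bits) for j in range(1, k + 1) for bits in product("01", repeat=j))
n = len(s)                                  # equals n_k = (k-1)*2**(k+1) + 2
eps = (log2(log2(n) + 2) + 3) / log2(n)
print(n, count_phrases(s), n / ((1 - eps) * log2(n)))   # C(n) stays below the bound
```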
For a random string, LZ78 asymptotically approaches the entropy.
Random case: Suppose X is a memoryless source with PMF P_X(i)=P_i, so that P(x^n)=\prod_{i=1}^nP_X(x_i)=\prod_{i=1}^n P_{x_i}. Parse x^n=(x_1, x_2, \cdots, x_n) into distinct phrases (x_1, x_2, \cdots, x_n)=(y_1, y_2, \cdots , y_C), so that P(x^n)=\prod_{i=1}^C P(y_i). We claim that \frac{C\log C}{n}\le H(P)+o(1).
Proof:
Let c_l be the number of phrases of length l among y_i, i=1, \cdots, C, so that \sum_l c_l=C and \sum_l l\,c_l=n.
Before going on, we first prove the following lemma.
Ziv’s Lemma: \log P(x^n)\le \sum_l c_l\log\frac{1}{c_l}=-\sum_l c_l\log c_l.
Proof of lemma:
We know that all y_i are distinct, so the c_l phrases of length l have total probability at most 1; by the AM-GM inequality their product is largest when each equals \frac{1}{c_l}. Hence
\begin{aligned}
&\sum_{y_i:|y_i|=l} P(y_i)\le 1 \\
\Rightarrow& \prod_{y_i:|y_i|=l} P(y_i)\le \left(\frac{1}{c_l}\right)^{c_l} \\
\Rightarrow& \log P(x^n)=\sum_l\sum_{y_i:|y_i|=l}\log P(y_i)\le \sum_{l}c_l\log \frac{1}{c_l}. ∎
\end{aligned}
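A quick numerical check of the lemma (a sketch assuming a Bernoulli(p) source and base-2 logarithms; the helper parse_distinct is the same distinct parsing used above, with any trailing partial phrase dropped):

```python
import random
from collections import Counter
from math import log2

def parse_distinct(s):
    """Return the LZ78 distinct-phrase parse of s (trailing partial phrase dropped)."""
    seen, current, phrases = set(), "", []
    for symbol in s:
        current += symbol
        if current not in seen:
            seen.add(current)
            phrases.append(current)
            current = ""
    return phrases

p, n = 0.3, 20000
x = "".join("1" if random.random() < p else "0" for _ in range(n))
phrases = parse_distinct(x)
parsed = "".join(phrases)                    # the prefix of x that was fully parsed

# Left-hand side: log P(parsed prefix) under the i.i.d. source.
log_p = sum(log2(p) if c == "1" else log2(1 - p) for c in parsed)

# Right-hand side: -sum_l c_l log c_l, with c_l = number of phrases of length l.
c = Counter(len(y) for y in phrases)
rhs = -sum(cl * log2(cl) for cl in c.values())

print(log_p <= rhs, log_p, rhs)              # Ziv's inequality should hold
```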
Returning to the main proof, suppose there are m_i occurrences of symbol i among x_1, \cdots, x_n. Then \log P(x^n)=\sum_i m_i\log P_i.
Invoking the LLN, m_i \approx nP_i, so \log P(x^n)\approx n\sum_i P_i\log P_i=-nH(P).
So, with Ziv’s lemma, nH(P)\approx -\log P(x^n)\ge \sum_l c_l\log c_l.
So we will have proved that LZ78 compresses down to the entropy once we show that \sum_l c_l\log c_l\ge C\log C-o(n), i.e., that \frac{C\log C}{n}\le H(P)+o(1).
We note that
\begin{aligned}
\sum_l c_l\log c_l&=\sum_l c_l\log C+\sum_l c_l\log\frac{c_l}{C}\\
&=C\log C-C\left(-\sum_l\frac{c_l}{C}\log\frac{c_l}{C}\right),
\end{aligned}
where -\sum_l\frac{c_l}{C}\log\frac{c_l}{C} is the entropy of the distribution Q=(q_l=\frac{c_l}{C},\ l=1, 2, \cdots).
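As a concrete check of this identity, the parse of the example string above has c_1=2, c_2=3, c_3=2, c_4=2, so C=9 and n=22, and indeed (values rounded)
\begin{aligned}
\sum_l c_l\log c_l &= 2\log 2+3\log 3+2\log 2+2\log 2 \approx 10.75,\\
C\log C - C\,H(Q) &= 9\log 9 - 9\,H\!\left(\tfrac{2}{9},\tfrac{3}{9},\tfrac{2}{9},\tfrac{2}{9}\right)\approx 28.53-17.77\approx 10.75.
\end{aligned}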
Consider the following maximization problem:
\begin{aligned}
\max\ &H(Q)\\
\text{s.t. } &\sum_l l\,c_l=n.
\end{aligned}
An exact solution can be found, but a very crude estimate suffices: any distribution Q on the positive integers with mean \sum_l l\,q_l=\frac{n}{C} has entropy H(Q)\le\log\frac{n}{C}+\log e (the maximizer is a geometric distribution). Combining this with the worst-case bound C\le\frac{n}{(1-\epsilon_n)\log n}, and noting that x\mapsto x\log\frac{n}{x} is increasing for x\le n/e, we get C\,H(Q)\le C\log\frac{n}{C}+C\log e=O\!\left(\frac{n\log\log n}{\log n}\right)=o(n), so
\begin{aligned}
nH(P)&\ge\sum_{l}c_l\log c_l=C\log C-C\,H(Q)\ge C\log C-o(n)\\
&\Rightarrow \frac{C\log C}{n}\le H(P) + o(1). ∎
\end{aligned}
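For instance, for the example parse above (C=9, n=22, so \frac{n}{C}\approx 2.44), H(Q)\approx 1.97, the exact maximum over distributions with this mean (a geometric Q) is about 2.39, and the crude bound gives \log\frac{n}{C}+\log e\approx 1.29+1.44=2.73, all consistent with the estimate.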
As C is the number of phrases and each phrase is encoded with roughly \log C bits for the prefix index plus one bit for the last symbol, the total code length is about C(\log C+1) bits. Since \frac{C}{n}\to 0, the per-symbol code length \frac{C(\log C+1)}{n} is asymptotically bounded by H(P), so asymptotic optimality holds.
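As an illustration of this conclusion (a sketch assuming a Bernoulli(p) source; count_phrases is the phrase counter sketched earlier), one can watch the per-symbol code length C(\log C+1)/n drift down toward H(P) as n grows; the convergence is known to be quite slow:

```python
import random
from math import log2

def count_phrases(s):
    """Number of phrases in the LZ78 distinct parsing of s."""
    seen, current, count = set(), "", 0
    for symbol in s:
        current += symbol
        if current not in seen:
            seen.add(current)
            count += 1
            current = ""
    return count + (1 if current else 0)    # trailing partial phrase, if any

p = 0.1
H = -p * log2(p) - (1 - p) * log2(1 - p)    # source entropy H(P)
for n in (10**3, 10**4, 10**5, 10**6):
    x = "".join("1" if random.random() < p else "0" for _ in range(n))
    C = count_phrases(x)
    print(n, round(C * (log2(C) + 1) / n, 3), round(H, 3))
```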