IS250 ComputerBased Communications Systems and Networks, Spring 2000
Project 1
ErrorCorrecting Codes. An Introduction.
1) Introduction
The theory of error detecting and correcting codes is that branch of engineering and mathematics which deals with the reliable transmission and storage of data. Information media are not 100% reliable in practice, in the sense that noise (any form of interference) frequently causes data to be distorted. To deal with this undesirable but inevitable situation, some form of redundancy is incorporated in the original data. With this redundancy, even if errors are introduced (up to some tolerance level), the original information can be recovered, or at least the presence of errors can be detected. We saw in class how adding to the original message the parity bit or the arithmetic sum allows the detection of a (certain type of) error. However, that kind of redundancy doesn't allow for the correction of the error. Errorcorrecting codes do exactly this: they add redundancy to the original message in such a way that it is possible for the receiver to detect the error and correct it, recovering the original message. This is crucial for certain applications where the resending of the message is not possible (for example, for interplanetary communications and storage of data). The crucial problem to be resolved then is how to add this redundancy in order to detect and correct as many errors as possible in the most efficient way.
This document starts with an overview of the applications of the errorcorrecting codes, followed by a session describing the fundamental concepts. It then addresses some of the issues involved in the design of the codes. The last session is a brief introduction of the very last discovery in the theory of errorcorrecting codes: the turbo codes.
2) Applications of Error Correcting Codes
The increasing reliance on digital communication and the emergence of the digital computer as an essential tool in a technological society have placed errorcorrecting codes in a most prominent position. We cite here a few specific applications, in an attempt to indicate their practicality and importance.
Many computers now have errorcorrecting capabilities built into their random access memories; it is less expensive to compensate for errors through the use of errorcorrecting codes than to build integrated circuits that are 100% reliable. The single errorcorrecting Hamming codes, and linear codes in general, are of use here. Disk storage is another area of computing where errorcoding is employed. Storage capacity has been greatly increased through the use of disks of higher and higher density. With this increase in density, error probability also increases, and therefore information is now stored on many disks using errorcorrecting codes.
In 1972, the Mariner space probe flew past Mars and transmitted pictures back to earth. The channel for such transmissions is space and the earth's atmosphere. Solar activity and atmospheric conditions can introduce errors into weak signals coming from the spacecraft. In order that most of the pictures sent could be correctly recovered here on earth, the following coding scheme was used. The source alphabet consisted of 64 shades of gray. The source encoder encoded each of these into binary 6tuples and the channel encoder produced binary 32tuples. The source decoder could correct up to 7 errors in any 32tuple. This was done with the ReedMuller codes.
In 1979, the Voyager probes began transmitting color pictures of Jupiter. For color pictures the source alphabet needed to be much larger and was chosen to have 4096 color shades. The source encoder produced binary 12tuples for each color shade and the channel encoder produced 24tuples (i.e. 24 dimensional vectors). This code would correct up to 3 errors in any 24tuple, and is the Golay code.
The increasing popularity of digital audio is due in part to the powerful errorcorrecting codes that the digitization process facilitates. Information is typically stored on a small aluminized disk as a series of microscopic pits and smooth areas, the pattern representing a sequence of 0's and 1's. A laser beam is used as the playback mechanism to retrieve stored data. Because the data is digital, error correction schemes can be easily incorporated into such a system. Given an errorcoded digital recording, a digital audio system can on playback, correct errors introduced by fingerprints, scratches, or even imperfections originally present in the storage medium. The compact disc system pioneered by Philips Corporation of the Netherlands in cooperation with Sony Corporation of Japan, allowing playback of prerecorded digital audio disks, is an excellent example of the application of errorcorrecting codes to digital communication; crossinterleaved ReedSolomon codes are used for error correction in this system. Digital audio tape (DAT) systems have also been developed, allowing digital recording as well as playback.
Errorcorrecting codes are particularly suited when the transmission channel is noisy. This is the case of wireless communication. Nowadays, all digital wireless communications use errorcorrecting codes.
3) Fundamental Concepts
We begin with a few definitions, in order to develop a working vocabulary.
Let A be an alphabet of q symbols. For example, A = {a,b,c,...,z) is the standard lower case alphabet for the English language, and A = (0,1) is the binary alphabet.
Definition  A block code of length n containing M codewords over the alphabet A is a set of M ntuples where each ntuple takes its components from A. We refer to such a block code as an [n,M]code over A. 
In practice, we most frequently take A to be the binary alphabet.
Given a code C of block length n over an alphabet A, those specific ntuples over A which are in C are referred to as codewords.
Note that while the channel encoder transmits codewords, the ntuples received by the channel decoder may or may not be codewords, due to the possible occurrence of errors during transmission.
Example 1
Suppose the information we are to transmit comes from the set of symbols {A, B, C, D}. For practical considerations we associate sequences of 0's and l's with each of these symbols.
A > 00

B > 01

C > 10

D > 11

This is the source encoding.
Now we want to add some redundancy (channel encoding).
A > 00 > 00000

B > 01 > 10110

C > 10 > 01011

D > 11 > 11101

We have just constructed a [5,4]code over a binary alphabet. That is, we constructed a code with 4 codewords, each being a 5tuple (block length 5), with each component of the 5tuple being O or 1. The code is the set of ntuples produced by the channel encoder (as opposed to the source encoder). The source encoder transforms messages into ktuples (k=2 in the example above) over the code alphabet A, and the channel encoder assigns to each of these information ktuples a codeword of length n (n=5 in the example). Since the channel encoder is adding redundancy, we have n > k and hence we have message expansion. While the added redundancy is desirable from the point of view of error control, it decreases the efficiency of the communication channel by reducing its effective capacity. The ratio k to n is a measure of the fraction of information in the channel which is nonredundant.
Definition 
The rate of an [n,M]code which encodes information ktuples is R = K/n 
The rate of the simple code given in example 1 is 2/5.
The quantity r = nk is sometimes called the redundancy of the code.
A fundamental parameter associated with an [n,M]code C is the Hamming distance for C. Before we can define the Hamming distance for a code, we must define the Hamming distance between two codewords.
Definition 
The Hamming distance d(x,y) between two codewords x and y is the number of coordinate positions in which they differ. 
Example 2.
Over the alphabet A = (0,1}, the codewords x and y x=(10110) y=(11O11) have Hamming distance d(x,y) =3.
Example 3.
The codewords u and v over the alphabet A = (0,1,2), given by u = (21002) v=(12001) have Hamming distance d(u,v) =3.
Definition 
Let C be an [n,M]code. The Hamming distance d of the code C is d= min {d(x,y): x,y belong to C, x != y}. 
In other words, the Hamming distance of a code is the minimum distance between two distinct codewords, over all pairs of codewords.
Example 4.
Consider C = (c0,c1,c2,c3} where c0=(00000) c1 =(10110) c2= (01011) c3= (11101) This code has distance d = 3. It is the [5,4] code constructed earlier.
Suppose we have an [n,M]code C with distance d. We need to adopt a strategy for the channel decoder (or just decoder). When the decoder receives an ntuple r it must make some decision. This decision may be one of
(i) no errors have occurred; accept r as a codeword. 
(ii) errors have occurred; correct r to a codeword c. 
(iii) errors have occurred; no correction is possible. 
In general, the decoder will not always make the correct decision; for example, consider the possibility of an error pattern occurring which changes a transmitted codeword into another codeword. The goal is that the decoder take the course of action which has the greatest probability of being correct. One usually makes the assumption that errors are introduced by the channel at random, and that the probability of an error in one coordinate is independent of errors in adjacent coordinates.
The decoding strategy we shall adopt, called nearest neighbor decoding, can then be specified as follows.
Nearest Neighbor Decoding  If an ntuple r is received, and there is a unique codeword c that belongs to C such that d(r,c) is a minimun, then correct r to the c. If no such c exists, report that errors have been detected, but no correction is possible. By nearest neighbor decoding, a received vector is decoded to the codewords "closest" to it, with respect to Hamming distance. 
The figure below illustrates the concept. The red dots are the codewords, the green ones are the received words that can be corrected to the closest codewords and the blue dot is a received word that is equidistant from 2 codewords and therefore cannot be corrected.
A code is said to correct e errors if a decoder using the above scheme is capable of correcting any pattern of e or fewer errors introduced by the channel. In this case, the decoder can correct any transmitted codeword which has been altered in e or fewer coordinate positions.
Theorem  Let C be an [n,M]code having distance d = 2e+1. Then C can correct e errors. If used for error detection only, C can detect 2e errors. 
Let c be one codeword of C and S be the set of all ntuples over the alphabet of C and define
S(c) = {x belongs to S: d(x,c) <= e).
S(c) is called the sphere of radius e about the codeword c. It consists of all ntuples within distance e of the codeword c, which we think of as being at the center of the sphere.
In the figure below, the "spheres" arount the codewords of the previous figure. The radius is in this case is 1.
Given 2 codewords, their spheres don't intersect, hence if codeword c is transmitted and t<= e errors are introduced, the received word r is an ntuple in the sphere S(c) and thus c is the unique codeword closest to r. The decoder can always correct any error pattern of this type. If we use the code only for error detection, then at least 2e+1 errors must occur in a codeword to carry it into another codeword. If at least 1 and at most 2e errors are introduced, the received word will never be a codeword and error detection is always possible.
The spheres have to be disjoint in order to be able to correct errors. But if a received word happens to be in the space between the spheres, correction is not possible: that space doesn't belong to any sphere, i.e. to any codeword: the error is detected but not corrected. Therefore, we want to find codes such that the the spheres around the codewords are disjoint (to correctly correct the errors) but as close as possible to each other, to correct ALL the errors. If all the space is taken by the sphere, all the words received will fall into the sphere of a codeword and therefore can be corrected to that codeword. These are the perfect codes. Lets give the formal definition of perfect codes.
Definition  A perfect code is an eerrorcorrecting [n, M]code over an alphabet A such that every ntuple over A is in the sphere of radius e about some codework. 
In a perfect code of block length n, the sphere of radius e about codeworks are not only disjoint (since the code is eerrorcorrecting) but exhaust the entire space of ntuples.
4) Issues in the design of errorcorrecting codes
Designing a good code is a very hard problem. Also, lots of factors need to be taken in account. For instance, in practice the code should be designed appropriately depending on the expected rate of errors for the particular channel being employed. If it is known that the probability that the channel will introduce 2 errors into any transmitted codeword is extremely small, then it is likely not necessary to construct a code that will correct all 2bit error patterns. A single error correcting code will likely suffice. Conversely, if double errors are frequent and a single error correcting code is being used, then decoding errors will be frequent. From this brief introduction, a number of questions arise.
Lets examine the issue of channel decoding in greater detail. When a word is received and it's not a codeword, finding the closest codeword involves computing the distance from the received word to each codeword, requiring M comparisons. While this might be acceptable for small M, in practice we usually find M quite large. Suppose a code C is used with M = 2^50 codewords, which is not unrealistic. If we could carry out 1 million distance computations per second, it would take around 20 years to make a single correction. Clearly, this is not tolerable and more efficient techniques are required. Wishing to retain the nearest neighbor decoding strategy, we seek more efficient techniques for its implementation. The theory of errorcorrecting code is concerned with constructing codes for various values of n, M and d, and the consideration of appropriate encoding and decoding techniques. Most of the best known codes are algebraic in nature like the linear codes. Linear codes are an important and general class of errorcorrecting codes. For example, the BCH codes used for CDs are linear codes. They are heavily mathematical and an appropriate description here is not possible. The next session describes instead the last "discovery" in the theory of error correcting codes, the Turbo codes.
5) Turbo Codes
In session 3, when we introduced the Nearest Neighbour Decoding, we didn't mention the fact that what the decoding actually does is maximizing the probability P(rc) that r is received, given that c is sent. i.e. choosing the nearest codeword is equivalent to choosing the most likely input message c given the received tuple r. This decoding strategy is known as maximum likelihood decoding
We can then state the decoding problem in the following manner.
Let U be the original message, X the encoded message and Y the message received after the noisy channel. This is schematized in the following figure.
The receiver knows Y and the model of the encoder, i.e. is has a knowledge of how the original message was originally encoded.
The problem now is finding the U that maximizes the probability that U was sent given that Y was received, i.e. the probability P(UY).
Having stated the decoding problem in probabilistic terms, we can take advantage of various methods that deal with probability estimations. A class of such methods is the class of graphical models.
The simplest graphical model:
A classical problem of probability estimation is finding the probability distribution P(XY) where X and Y are 2 random variables. The graphical model framework allows for such probability estimations to be calculated in an efficient way.
Note that the calculation of P(XY) from P(YX) and P(X) is just Bayes rule  P(XY) = P(YX)P(X)/P(Y)  and that graphical model inference algorithms can be viewed as generalizations of Bayes rule to arbitrary graphs.
Lets now see how we can view error correcting codes as graphical models. In the figure below, a convolution code as a graphical model. Let U be the original message, X the encoded message and Y the message received after the noisy channel.
Now, the decoding problem of finding U that maximizes p(UY) can be tackled within the framework of graphical models.
We have introduced the convolutional codes mainly because the Turbo codes are two convolutional codes put together. The following figure illustrate the graphical representation of a turbo code.
It should be mentioned that general inference in graphical models is NPhard, and, in particular, exact inference, calculating P(U  Y), for turbo codes is infeasible and therefore an approximate inference algorithm is used.
Turbo codes were discovered very recently. There is much excitement about them. They outperform all other kinds of errorcorrecting codes in the case of long block lengths, although the reason why they do so is not yet clear.
7) References and links
An Introduction to Error Correcting Codes with Application, Vanstone, van Oorschot, Kluwer Academic Publishers
A few experts  David MacKay 
Brendan J. Frey  
Stephen B. Wicker  
Robert J. McEliece 
A few words on Quantum ErrorCorrecting Codes
Want to buy errorcorrecting codes?