A high-throughput, low-latency open-source LDPC decoder optimized for x86 processors.
With the increasing popularity of Software Defined Radio (SDR), there is a growing need for high-performance software implementations of signal processing. An LDPC decoder implementation, for example, would greatly benefit the SDR community, since LDPC is used in many protocols, such as DVB 2nd generation, 802.16e (WiMAX) and 802.3an (10 Gbit Ethernet).
To that end, we propose a new software library that provides a high-throughput, low-latency LDPC decoder by using SIMD instructions on x86 processors.
LDPC stands for Low Density Parity Check. The idea behind these codes is to have a parity check matrix H that characterizes how a message (a set of bits) is encoded. This matrix describes the relationships between the bits of an LDPC codeword. The matrix is typically sparse, hence the name “Low Density”. You can think of each row of the matrix as a set of bits of the codeword: a ‘1’ means the bit is part of the set, a ‘0’ means it is not. The key property is that a codeword is valid only if, for every set, the number of bits with value ‘1’ is even.
For a more in-depth introduction to LDPC, check out this great video.
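To make the parity rule concrete, here is a minimal Python sketch. The matrix is a toy example (a small (7,4) Hamming matrix chosen for readability, not one of the sparse matrices the library supports), and the function names `syndrome` and `is_codeword` are made up for this illustration.

```python
# Toy parity check matrix: each row is one "set" of codeword bits.
# Assumption: a (7,4) Hamming matrix, used only because it is small;
# real LDPC matrices are far larger and much sparser.
H = [
    [1, 1, 0, 1, 1, 0, 0],
    [1, 0, 1, 1, 0, 1, 0],
    [0, 1, 1, 1, 0, 0, 1],
]


def syndrome(H, word):
    """Return one bit per row: 0 if the row's set holds an even number of ones."""
    return [sum(h & b for h, b in zip(row, word)) % 2 for row in H]


def is_codeword(H, word):
    """A word is a valid codeword when every parity check is satisfied."""
    return not any(syndrome(H, word))


valid = [1, 0, 1, 1, 0, 1, 0]      # satisfies all three checks
corrupted = [1, 0, 0, 1, 0, 1, 0]  # same word with the third bit flipped

print(is_codeword(H, valid))       # True
print(syndrome(H, corrupted))      # [0, 1, 1]
```

The flipped bit belongs to the second and third sets only, which is why exactly those two checks fail.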
Error correction is usually the most computationally demanding task when decoding a digital signal, so any optimization that speeds up the computation is welcome. On CPUs that support them, SIMD instructions have the potential to reduce the execution time. Unfortunately, decoding LDPC codewords is not a trivial task. The decoding process, based on a variant of Belief Propagation (BP), makes optimizations difficult to implement. The main difficulties are:
- non-consecutive memory accesses based on the structure of the parity check matrix
- low arithmetic density (few computations per memory access)
- non-trivial parallelism
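The sketch below shows where these difficulties come from. It is a plain Python illustration of layered decoding using the min-sum approximation of BP. Min-sum is an assumption made here for brevity, and nothing in this code reflects the library's actual implementation or API. The toy matrix, the function name `layered_min_sum` and the sign convention (a positive LLR means the bit is more likely 0) are all chosen for this example.

```python
# Toy parity check matrix (a (7,4) Hamming matrix, not a real LDPC matrix).
H = [
    [1, 1, 0, 1, 1, 0, 0],
    [1, 0, 1, 1, 0, 1, 0],
    [0, 1, 1, 1, 0, 0, 1],
]


def layered_min_sum(H, llr, max_iters=5):
    """Layered min-sum decoding sketch; returns the hard-decision bits."""
    # Column indices touched by each check: these scattered indices are
    # the source of the non-consecutive memory accesses.
    rows = [[n for n, h in enumerate(row) if h] for row in H]
    L = list(llr)                                   # posterior LLR per bit
    R = [[0.0] * len(cols) for cols in rows]        # check-to-bit messages
    bits = [1 if v < 0 else 0 for v in L]
    for _ in range(max_iters):
        for m, cols in enumerate(rows):             # one layer = one check row
            # Remove this check's previous contribution from the posteriors.
            Q = [L[n] - R[m][i] for i, n in enumerate(cols)]
            for i, n in enumerate(cols):
                others = Q[:i] + Q[i + 1:]
                sign = -1.0 if sum(q < 0 for q in others) % 2 else 1.0
                R[m][i] = sign * min(abs(q) for q in others)
                L[n] = Q[i] + R[m][i]               # updated at once (layered)
        bits = [1 if v < 0 else 0 for v in L]
        # Stop early once every parity check is satisfied.
        if all(sum(bits[n] for n in cols) % 2 == 0 for cols in rows):
            break
    return bits


# Codeword 1011010 with the third bit received weakly and with the wrong sign.
received_llr = [-2.0, 2.0, 0.5, -2.0, 2.0, -2.0, 2.0]
print(layered_min_sum(H, received_llr))  # [1, 0, 1, 1, 0, 1, 0]
```

Each check only does a handful of comparisons and additions per value it loads, and each layer depends on the posteriors updated by the previous one, which hints at why vectorizing this loop is not straightforward.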
Nevertheless, when certain conditions are met, SIMD optimizations yield a tenfold decrease in execution time compared to the baseline implementation on an Intel Core i7-3770 CPU. For example, you can expect around 50 Mb/s of decoding throughput with 4 threads enabled, 5 decoding iterations and the layered version of BP. If you want more details on the execution times, you can have a look at the performance table here.
As you can see in the next figure, the number of decoding iterations strongly influences the bit-error rate performance of the decoder. However, execution time is proportional to the iteration count. Hence you will need to find a compromise between bit-error rate and execution time.
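As a rough aid for picking an iteration count, the proportionality above can be turned into a back-of-the-envelope estimate. The sketch below simply scales the reference figure quoted earlier (about 50 Mb/s at 5 iterations). It assumes execution time is strictly proportional to the iteration count, with no fixed overhead and no early termination, so treat the output as an order of magnitude rather than a measurement. The function name is hypothetical.

```python
def estimated_throughput_mbps(iterations, ref_mbps=50.0, ref_iterations=5):
    """Scale a reference throughput assuming time grows linearly with iterations."""
    if iterations < 1:
        raise ValueError("iterations must be at least 1")
    return ref_mbps * ref_iterations / iterations


for k in (2, 5, 10):
    print(k, "iterations ->", estimated_throughput_mbps(k), "Mb/s (estimate)")
```

Measuring on your own setup remains the only reliable way to choose the trade-off.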
Have a look at the repository, where you can find the documentation, install the library and measure the performance you get on your setup. You are more than welcome to contribute to this project: further optimize the decoding time, add support for new LDPC codes, etc. For the time being, the library supports only one code (DVB-S2) with three code rates (1/2, 8/9, 9/10).
Finally, we are working on integrating this library into the GNU Radio gr-ccsds module currently being developed, in order to provide a missing piece of this software.
We would like to thank B. Le Gal for his work, on which this library is based.