An Experimental Comparison of Coded Modulation Strategies for 100 Gbit/s Transceivers

Coded modulation is a key technique to increase the spectral efficiency of coherent optical communication systems. Two popular strategies for coded modulation are turbo trellis-coded modulation (TTCM) and bit-interleaved coded modulation (BICM) based on low-density parity-check (LDPC) codes. Although BICM LDPC is suboptimal, its simplicity makes it very popular in practice. In this work, we compare the performance of TTCM and BICM LDPC using information-theoretic measures. Our information-theoretic results show that for the same overhead and modulation format only a very small penalty (less than 0.1 dB) is to be expected when an ideal BICM LDPC scheme is used. However, the results obtained for the coded modulation schemes implemented in this paper show that the TTCM outperforms BICM LDPC by a larger margin. For a 1000 km transmission at 100 Gbit/s, the observed gain was 0.4 dB.


I. INTRODUCTION
A promising alternative to increase the spectral efficiency (SE) of optical transmission systems is to use higher order modulation formats. To maintain reliable communication, the decreased sensitivity caused by high order modulation formats is compensated by forward error correction (FEC). The combination of a nonbinary (NB) modulation format and FEC is known as coded modulation (CM) [1]. Most current 100G transceivers use quadrature phase shift keying (QPSK), but future 400G transceivers are expected to employ CM based on 16-quadrature amplitude modulation (QAM) [2], [3]. Using higher order modulation formats is also a topic of current research, both in point-to-point links [4]-[6] and in the context of optical networks [7]-[10].
CM can be implemented in several ways. The most typical approach is to separate the coding (decoding) from the mapping (demapping) functions at the transmitter (receiver). This separation has the advantage that the binary FEC can be designed independently of the modulation format. This structure is typically known as bit-interleaved coded modulation (BICM) [11]-[13]. Another approach to CM is to combine the FEC and mapping into a single operation at the transmitter and to pass the channel outputs directly to a NB FEC (NB-FEC) decoder to recover the data bits at the receiver. This idea dates back to Ungerboeck's celebrated trellis-coded modulation (TCM) [1].
E. Sillekens, A. Alvarado
In this work, we compare the performance of these two CM strategies for two particular implementations, as shown in Fig. 1. We consider 8-ary phase-shift keying (8PSK) as the modulation format because it offers a higher SE than QPSK, yet a lower implementation complexity than 16QAM. Additionally, the use of phase-shift keyed (PSK) modulation formats is also motivated by recent results [14], where they are shown to outperform QAM formats in highly nonlinear channels (e.g., in dispersion-managed links). Furthermore, the code rate we consider is $R = 2/3$, which when combined with 8PSK results in a SE comparable to traditional QPSK-based systems.
The first strategy is shown in Fig. 1 (a) and is based on a symbol-wise receiver structure. Here, the encoder is a NB-FEC that transforms data bits ($c$) directly into nonbinary constellation symbols ($x$). After transmission, a NB-FEC decoder uses the received symbols ($y$) to retrieve the data bits ($\hat{c}$). The NB-FEC encoder in Fig. 1 (a) operates on a symbol level. The second strategy, shown in Fig. 1 (b), is a suboptimal implementation of the NB-FEC decoder in Fig. 1 (a). This strategy is based on a bit-wise receiver, also known as BICM [12], [13]. At the transmitter, a binary FEC encoder converts data bits ($c$) into encoded bits ($b = [b_1, \ldots, b_m]^T$), which are then mapped to constellation symbols ($x$) using a memoryless mapper ($\Phi$). These symbols are then transmitted over the channel. In BICM, the demapper ($\Phi^{-1}$) computes soft information on the encoded bits ($l = [l_1, \ldots, l_m]^T$) using the received symbols ($y$). This soft information is then passed to the binary FEC decoder to retrieve the data bits ($\hat{c}$). The suboptimality of this strategy originates from the reduction of soft information caused by the bit-wise demapper, i.e., the loss caused by replacing $2^m$ symbol likelihoods (one for every possible transmitted symbol) by $2m$ bit likelihoods (two for every transmitted bit), thereby passing less information to the decoder to estimate the transmitted bits.
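The reduction from symbol likelihoods to bit-wise soft information can be illustrated with a short Python sketch. It is only a minimal example under stated assumptions: the constellation ordering, the choice of $b_1$ as the most significant label bit, the LLR sign convention $\log P(b_k = 1)/P(b_k = 0)$, the circularly symmetric Gaussian likelihood, and all function names are hypothetical and not taken from the implementation used in this paper.

```python
import cmath
import math

M_BITS = 3  # m: bits per 8PSK symbol
# Binary reflected Gray code labels, assigned counter-clockwise (assumed ordering).
BRGC = [0b000, 0b001, 0b011, 0b010, 0b110, 0b111, 0b101, 0b100]
POINTS = [cmath.exp(2j * math.pi * i / 8) for i in range(8)]


def symbol_log_likelihoods(y, noise_var):
    """Return the 2^m = 8 symbol log-likelihoods, up to a common constant.

    Assumes circularly symmetric complex Gaussian noise of variance noise_var.
    """
    return [-abs(y - x) ** 2 / noise_var for x in POINTS]


def logsumexp(values):
    """Numerically stable log(sum(exp(v)))."""
    peak = max(values)
    return peak + math.log(sum(math.exp(v - peak) for v in values))


def bit_llrs(symbol_lls):
    """Collapse 8 symbol log-likelihoods into m = 3 bit-wise LLRs."""
    llrs = []
    for k in range(M_BITS):
        shift = M_BITS - 1 - k  # b_1 is taken as the most significant bit
        ones = [ll for ll, lab in zip(symbol_lls, BRGC) if (lab >> shift) & 1]
        zeros = [ll for ll, lab in zip(symbol_lls, BRGC) if not (lab >> shift) & 1]
        llrs.append(logsumexp(ones) - logsumexp(zeros))
    return llrs
```

The symbol-wise decoder would consume the output of `symbol_log_likelihoods` directly, whereas the bit-wise decoder only sees the output of `bit_llrs`, which is where the loss of soft information occurs.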
For the NB-FEC, we consider the 8PSK-based turbo trellis-coded modulation (TTCM) encoder from [15], where each transmitted symbol carries two data bits. At the receiver, we use a symbol-wise iterative decoder that approximates the maximum likelihood (ML) decision rule.
The binary FEC in Fig. 1 (b) can be any binary code. In this work, a rate $R = 2/3$ low-density parity-check (LDPC) code is considered. LDPC codes have recently received a great deal of attention due to their excellent performance [16], [17]. Furthermore, we consider an 8PSK constellation based on the binary reflected Gray code (BRGC) [18], [19]. The encoded bits ($b = [b_1, b_2, b_3]^T$) are then mapped to 8PSK symbols, giving a net data rate of 2 bit/symbol. This is the same rate achieved by TTCM in Fig. 1 (a).
Previously, TTCM has been shown to improve the performance of direct detection systems [20]; however, its performance was only compared to uncoded transmission. In our previous work [21], the performance of the iterative TTCM scheme discussed here, and shown in Fig. 1 (a), was compared with uncoded QPSK and also with noniterative TCM with 8PSK [1] at the same information rate of 2 bit/symbol. The results of [21] showed that iterative decoding provided the largest performance gain.
In this paper we consider two schemes that employ iterative decoding, and thus, are comparable in terms of decoding complexity. We investigate the benefits of TTCM over the more popular BICM scheme. An experimental comparison between these two schemes at a net data rate of 100 Gbit/s is presented for a dual polarisation (DP) 1000 km recirculating loop setup. The main contribution of this paper is to present this comparison based on information-theoretical metrics. In this paper we also present ready-to-use Monte Carlo expressions to evaluate these information-theoretical quantities.
This paper is organized as follows. In Sec. II, the implementation of the CM strategies is detailed. In Sec. III, the system performance metrics are explained, and the description of the experimental setup is given in Sec. IV. The results are presented in Sec. V and the conclusions in Sec. VI.

II. CODED MODULATION STRATEGIES
In this section, the implementation of the TTCM and LDPC schemes is described. The selection of the codeword length for both strategies is also discussed.

A. TTCM
The TTCM scheme we consider in this paper was introduced by Robertson and Wörz in [15] and is shown in Fig. 2. In this scheme, two $R = 2/3$ recursive systematic convolutional (RSC) encoders, with 8 states, encode the same data bits. The encoder structure is shown in Fig. 2 (b), where $Z$ are delay elements and the additions are modulo-2. The symbol-wise encoders work on pairs of data bits to create 3-bit symbols containing 2 data bits and one parity bit. One of the encoders (RSC$_1$) works directly on the 2-bit symbols, while the second encoder (RSC$_2$) works on symbol-wise interleaved ($\Pi_s$) 2-bit symbols (see Fig. 2). The output of the second encoder is then symbol-wise de-interleaved ($\Pi_s^{-1}$) to realign the parity bit from this encoder with the original data bit pairs. The encoded 3-bit symbols are then punctured, such that the output symbols consist of the odd symbols from the first encoder and the even symbols from the second encoder. The 3-bit symbols are then mapped to 8PSK symbols using a natural binary mapping. The symbol-wise interleaver is random and has the constraints that it maps odd to odd and is "s-random" to ensure that the corresponding trellis diagram has no parallel transitions [22].
Fig. 2. [Caption, beginning not recovered.] The second one works on the interleaved ($\Pi_s$) bits and its output is immediately deinterleaved ($\Pi_s^{-1}$). The encoder outputs are then punctured and mapped to 8PSK symbols for transmission. The decoder implements a symbol-wise soft demapper and then the odd and even symbols are split and sent into two BCJR decoders that pass only soft information on data bits to each other.
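The interleaving and puncturing steps described above can be sketched in Python as follows. This is an illustrative sketch only: the RSC encoders themselves are not modelled, the s-random spreading constraint is not enforced, the interleaver is assumed to preserve both odd and even positions, and "odd symbols" is assumed to mean the 1st, 3rd, ... symbols (0-based even indices). All function names are hypothetical.

```python
import random


def parity_preserving_interleaver(n_symbols, seed=0):
    """Random permutation mapping odd positions to odd and even to even.

    The additional s-random constraint mentioned in the text is NOT enforced.
    """
    rng = random.Random(seed)
    evens = list(range(0, n_symbols, 2))
    odds = list(range(1, n_symbols, 2))
    perm = [0] * n_symbols
    for src, dst in zip(evens, rng.sample(evens, len(evens))):
        perm[src] = dst
    for src, dst in zip(odds, rng.sample(odds, len(odds))):
        perm[src] = dst
    return perm


def interleave(seq, perm):
    """Move the element at position src to position perm[src]."""
    out = [None] * len(seq)
    for src, dst in enumerate(perm):
        out[dst] = seq[src]
    return out


def deinterleave(seq, perm):
    """Inverse of interleave for the same permutation."""
    return [seq[dst] for dst in perm]


def puncture(enc1_symbols, enc2_symbols_deinterleaved):
    """Alternate symbols: 1st, 3rd, ... from encoder 1; 2nd, 4th, ... from encoder 2."""
    return [a if n % 2 == 0 else b
            for n, (a, b) in enumerate(zip(enc1_symbols, enc2_symbols_deinterleaved))]
```

Because the permutation preserves position parity, alternating between the two encoder outputs after de-interleaving leaves every data-bit pair with exactly one transmitted parity bit.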
At the receiver, shown in Fig. 2 (c), the received symbols $y$ are converted into 8 log likelihoods (LLs) by the symbol-wise soft demapper $M^{-1}$. Because the odd symbols are produced by RSC$_1$ and the even symbols by RSC$_2$, at the receiver we then separate the odd and even symbols to send these to separate decoders. The two decoders are based on the Bahl, Cocke, Jelinek, and Raviv (BCJR) [23] algorithm and work independently of each other by interchanging only soft information on the data bits. The first decoder (BCJR$_1$) works on the LLs from the odd symbols, where the LLs from even symbols are substituted by 0. The first decoder also uses a priori information on the data bits provided by the second decoder (BCJR$_2$). The a priori information ($L_a$) is subtracted from the output of the first decoder to obtain the extrinsic information ($L_e$), which is then passed to the second decoder. The second decoder works on the LLs from even symbols, substituting the odd symbol LLs by 0. At the first iteration, and following [15], the a priori information at the first decoder is given by a special metric. This metric is calculated by taking the sum of the LLs of the symbols whose data bits are identical. This metric is only calculated at the positions of the even symbols; at the odd positions, zeros are used. The two decoders are then run sequentially for 10 iterations, passing extrinsic information at each iteration. We chose 10 iterations because that number resulted in a decoder performance within 0.1 dB of the best achievable performance (obtained with 100 iterations).
Fig. 3. LDPC encoder implementation. The incoming data bits are sequentially deserialised into 3 separate 2-bit-wide streams and fed into independent identical rate 2/3 LDPC encoders. The 3-bit-wide outputs are then serialised and bit-wise interleaved before mapping to 8PSK symbols using a BRGC.

B. BICM LDPC
The LDPC coding scheme we consider in this paper is based on the LDPC code from the DVB-S2 standard [24]. The employed encoder structure is depicted in Fig. 3. The data bits are deserialised into three streams, each two bits wide, and sent to three identical rate $R = 2/3$ LDPC encoders, each producing three output bits. All the encoded bits are re-serialised, and a bit-wise interleaver is then used to interleave the codewords of all three encoders. The interleaved bits are then mapped to 8PSK symbols using the BRGC.
The LDPC receiver is essentially the reverse of the encoder shown in Fig. 3. Similarly to the TTCM receiver, the symbols are first soft demapped into 8 LLs corresponding to the 8PSK symbols and then, unlike in the TTCM receiver, the symbol-wise LLs are converted into 3 bit-wise log likelihood ratios (LLRs). The LLRs are then de-interleaved and split into the three different LDPC codewords. After 50 iterations of the LDPC decoder, the performance of the system was assessed. The 50 iterations were also chosen such that the performance was within 0.1 dB of the asymptotic performance.

C. Codeword length
Throughout this paper we use $N_s$ to denote the number of symbols in the transmitted codeword, i.e., $x = [x^{(1)}, x^{(2)}, \ldots, x^{(N_s)}]$. In this section we study the impact of the codeword length $N_s$ on the performance of both CM schemes. This was done to ensure that the chosen values of $N_s$ did not have a significant impact on the obtained results. We consider $N_s = 64800$ and $N_s = 21600$. For the TTCM scheme of Fig. 2, which operates on symbols, the codeword length is the length of the interleaver. For the BICM LDPC scheme of Fig. 3, each of the three encoders produces a bit sequence of length 64800, which after serialisation, interleaving, and mapping to 3-bit symbols gives $N_s = 64800$. To generate $N_s = 21600$ symbols, only one of the LDPC encoders was used and the interleaver was omitted.
Fig. 4 shows the impact of reducing the codeword length from $N_s = 64800$ to $N_s = 21600$ symbols on the post-FEC bit error rate (BER) performance. These results were obtained using an additive white Gaussian noise (AWGN) channel and show that the impact of the codeword length $N_s$ on both schemes is minimal in the convergence region. However, a longer codeword length reduces the error floor for the LDPC scheme. Since a threefold increase in codeword length delivered only minor improvements in the convergence region, increasing the codeword length even further will deliver only diminishing returns. Furthermore, using $N_s > 64800$ brings the post-FEC BER below the hard decision FEC threshold of $5 \cdot 10^{-5}$ for a 1% overhead FEC [25]. Therefore, from now on, we use a codeword length of $N_s = 64800$ symbols. For $N_s = 64800$ symbols, the results in Fig. 4 also show that TTCM outperforms BICM LDPC by about 0.5 dB.

III. PERFORMANCE MEASURES
At the receiver side of the bit-wise receiver in Fig. 1 (b), the decoder works on soft information available on the encoded bits. In such a system, the most popular performance metric is the pre-FEC BER, computed after hard-decision demapping, or equivalently after hard decisions on the LLRs $l = [l_1, \ldots, l_m]^T$. This metric might be an accurate predictor of the performance of coded modulation for small constellation sizes and signal-to-noise ratios (SNRs); however, its use has no theoretical foundation and it is in general a poor predictor of the performance of coded modulation, as shown in [26]-[28]. Moreover, when considering the symbol-wise decoder in Fig. 1 (a), the encoded bits are completely absent at the receiver, and hence, the pre-FEC BER cannot be used either [27], [28]. In this work, we use an information-theoretical approach and consider achievable information rates (AIRs). In particular, following [26] and [27], [28], we consider mutual information (MI) and generalized mutual information (GMI) to assess and compare the performance of the two systems under consideration. We also consider the post-FEC MI as a way to estimate the ultimate performance of the system considering the BICM LDPC and TTCM decoders.
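The channel is modelled as a multi-dimensional correlated real memoryless channel $Y = X + Z$ with transmitted symbols $\mathbf{x} = [x_1, x_2, \ldots, x_{N_D}]^T \in \mathcal{X} \subset \mathbb{R}^{N_D}$, additive noise $\mathbf{z} = [z_1, z_2, \ldots, z_{N_D}]^T \in \mathbb{R}^{N_D}$ and received symbols $\mathbf{y} = [y_1, y_2, \ldots, y_{N_D}]^T \in \mathbb{R}^{N_D}$. Here, $\mathcal{X} = \{\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_M\}$ is the set of constellation points, where $|\mathcal{X}| = M = 2^m$ and $N_D$ is the number of dimensions. The channel transition probability is given by
$$f_{Y|X}(\mathbf{y}|\mathbf{x}) = \frac{1}{\sqrt{(2\pi)^{N_D} |\Sigma|}} \exp\left(-\frac{1}{2} (\mathbf{y} - \mathbf{x})^T \Sigma^{-1} (\mathbf{y} - \mathbf{x})\right) \tag{1}$$
where $\Sigma$ is the covariance matrix.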
In this paper, we consider the model in (1) because it has been previously shown to accurately model the noise from the optical transmission [29]. Furthermore, this model allows us to better describe phase noise acquired during transmission due to the Kerr effect. In all the performance assessments presented in this paper, we sweep the SNR [defining equation missing].
As mentioned before, in this work we use three information-theoretic performance measures: MI, GMI, and post-FEC MI. MI is an AIR for CM based on NB-FEC (e.g., for TTCM and also for BICM with iterative demapping and decoding) and GMI is an AIR for CM based on binary FEC and bit-wise decoding (e.g., for the LDPC scheme we consider in this paper). The post-FEC MI is an AIR for an outer code used after the CM decoder. Please note that the AIRs used in this paper are lower bounds on the AIRs of the true channel due to the mismatch between the chosen channel law and the true channel law [30]-[33]. In the following sections, we derive closed-form expressions to approximate the MI and GMI using channel observations and show an expression to calculate the post-FEC MI using bit-wise LLRs.

A. Mutual information
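The mutual information is defined as [34]
$$I(X;Y) = E\left[\log_2 \frac{f_{Y|X}(Y|X)}{f_Y(Y)}\right] \tag{2}$$
where $f_{Y|X}(\mathbf{y}|\mathbf{x})$ is the channel transition probability and, for equiprobable symbols, $f_Y(\mathbf{y}) = \frac{1}{M} \sum_{j=1}^{M} f_{Y|X}(\mathbf{y}|\mathbf{x}_j)$. In this paper we consider the correlated AWGN probability density function (PDF) given by (1) and we use a ready-to-use closed-form approximation for the MI of this channel. For a sequence of transmitted symbols $\mathbf{x}^{(n)}$ and received symbols $\mathbf{y}^{(n)}$ with $n = 1, 2, \ldots, N_s$, the MI in (2) can be approximated as
$$I(X;Y) \approx m - \frac{1}{N_s} \sum_{i=1}^{M} \sum_{n \in \mathcal{N}_i} \log_2 \sum_{j=1}^{M} \exp\left(-\frac{1}{2} \mathbf{d}_{ij}^T \Sigma^{-1} \mathbf{d}_{ij} - \mathbf{d}_{ij}^T \Sigma^{-1} \mathbf{z}^{(n)}\right) \tag{3}$$
where
$$\mathcal{N}_i = \{n \in \{1, 2, \ldots, N_s\} : \mathbf{x}^{(n)} = \mathbf{x}_i\} \tag{4}$$
is the set of all timeslots where the $i$th constellation point was sent, $\mathbf{z}^{(n)} = \mathbf{y}^{(n)} - \mathbf{x}^{(n)}$, $\mathbf{d}_{ij} = \mathbf{x}_i - \mathbf{x}_j$, and $\Sigma$ is the covariance matrix. The derivation of this expression can be found in the Appendix.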
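A Monte Carlo estimate of this kind can be sketched in a few lines of Python for $N_D = 2$. The sketch below is an illustration under assumptions rather than the code used for the results in this paper: it uses the form of the estimate that follows from the Appendix derivation, takes the SNR as $E_s/\mathrm{tr}(\Sigma)$ with unit-energy 8PSK, uses uncorrelated noise in the test signal, and all function names are hypothetical.

```python
import math
import random


def psk_points(order=8):
    """Unit-energy PSK constellation as 2-D real vectors (N_D = 2)."""
    return [(math.cos(2 * math.pi * i / order), math.sin(2 * math.pi * i / order))
            for i in range(order)]


def inverse_2x2(cov):
    """Invert a 2x2 covariance matrix given as nested tuples."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    return ((d / det, -b / det), (-c / det, a / det))


def bilinear(u, mat, v):
    """Return u^T mat v for 2-D vectors."""
    return (u[0] * (mat[0][0] * v[0] + mat[0][1] * v[1])
            + u[1] * (mat[1][0] * v[0] + mat[1][1] * v[1]))


def estimate_mi(tx_indices, rx, points, cov):
    """Monte Carlo MI estimate (bit/symbol) from sent indices and received vectors."""
    cov_inv = inverse_2x2(cov)
    total = 0.0
    for i, y in zip(tx_indices, rx):
        x_i = points[i]
        z = (y[0] - x_i[0], y[1] - x_i[1])           # z = y - x_i
        inner = 0.0
        for x_j in points:
            d = (x_i[0] - x_j[0], x_i[1] - x_j[1])   # d_ij = x_i - x_j
            inner += math.exp(-0.5 * bilinear(d, cov_inv, d) - bilinear(d, cov_inv, z))
        total += math.log2(inner)
    return math.log2(len(points)) - total / len(rx)


def simulate(snr_db, n_symbols=4000, seed=1):
    """Random 8PSK symbols plus uncorrelated Gaussian noise (illustrative only)."""
    rng = random.Random(seed)
    points = psk_points()
    var = 1.0 / (2.0 * 10 ** (snr_db / 10))  # per-dimension variance (assumed SNR definition)
    cov = ((var, 0.0), (0.0, var))
    tx = [rng.randrange(len(points)) for _ in range(n_symbols)]
    rx = [(points[i][0] + rng.gauss(0.0, math.sqrt(var)),
           points[i][1] + rng.gauss(0.0, math.sqrt(var))) for i in tx]
    return tx, rx, points, cov
```

For measured data, `cov` would instead be estimated from the noise samples $\mathbf{y}^{(n)} - \mathbf{x}^{(n)}$, which is what allows correlated noise to be taken into account.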

B. Generalized mutual information
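The GMI [35, eq. (10)] is defined as the sum of the mutual information between the encoded bits ($B_k$) and the received symbols ($Y$),
$$\mathrm{GMI} = \sum_{k=1}^{m} I(B_k; Y) \tag{5}$$
$$= \sum_{k=1}^{m} E\left[\log_2 \frac{f_{Y|B_k}(Y|B_k)}{f_Y(Y)}\right]. \tag{6}$$
The following expression gives a closed-form approximation for the GMI of a correlated AWGN channel. For a sequence of transmitted symbols $\mathbf{x}^{(n)}$ and received symbols $\mathbf{y}^{(n)}$ with $n = 1, 2, \ldots, N_s$, the GMI in (6) can be approximated as
$$\mathrm{GMI} \approx m - \frac{1}{N_s} \sum_{k=1}^{m} \sum_{l \in \{0,1\}} \sum_{i \in \mathcal{I}_{l,k}} \sum_{n \in \mathcal{N}_i} \log_2 \frac{\sum_{p=1}^{M} \exp\left(-\frac{1}{2} \mathbf{d}_{ip}^T \Sigma^{-1} \mathbf{d}_{ip} - \mathbf{d}_{ip}^T \Sigma^{-1} \mathbf{z}^{(n)}\right)}{\sum_{j \in \mathcal{I}_{l,k}} \exp\left(-\frac{1}{2} \mathbf{d}_{ij}^T \Sigma^{-1} \mathbf{d}_{ij} - \mathbf{d}_{ij}^T \Sigma^{-1} \mathbf{z}^{(n)}\right)} \tag{7}$$
where
$$\mathcal{I}_{l,k} = \{i \in \{1, 2, \ldots, M\} : \text{the $k$th bit of the label of } \mathbf{x}_i \text{ equals } l\} \tag{8}$$
is the set of indices of constellation points where the $k$th encoded bit in $b = [b_1, b_2, \ldots, b_m]^T$ has the value $l$ and $|\mathcal{I}_{l,k}| = M/2$. The derivation of (7) can be found in the Appendix. Similar expressions exist in the literature; however, expressions (3) and (7) are the first to present closed-form approximations for the MI and GMI of constellations for a multi-dimensional correlated AWGN channel and evaluate their use with experimentally obtained results.
Fig. 5. [Caption, beginning not recovered.] Optionally, extra ASE is added to the signal for transmitter noise loading. A 1000 km transmission is emulated using a recirculating loop. The signal is then received and is processed off-line. Optionally, extra noise is loaded at the receiver before the codeword is decoded.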
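The bit-wise counterpart of the MI estimate can be sketched in Python in the same way. As before, this is only an illustration under assumptions: the estimate is written in the form that follows from the Appendix derivation for $N_D = 2$, the BRGC label assignment and the SNR definition $E_s/\mathrm{tr}(\Sigma)$ are assumed, and the function names are hypothetical.

```python
import math
import random

# Binary reflected Gray code labels for 8PSK, counter-clockwise (assumed ordering).
BRGC = [0b000, 0b001, 0b011, 0b010, 0b110, 0b111, 0b101, 0b100]


def psk_points(order=8):
    """Unit-energy PSK constellation as 2-D real vectors."""
    return [(math.cos(2 * math.pi * i / order), math.sin(2 * math.pi * i / order))
            for i in range(order)]


def inverse_2x2(cov):
    (a, b), (c, d) = cov
    det = a * d - b * c
    return ((d / det, -b / det), (-c / det, a / det))


def bilinear(u, mat, v):
    """Return u^T mat v for 2-D vectors."""
    return (u[0] * (mat[0][0] * v[0] + mat[0][1] * v[1])
            + u[1] * (mat[1][0] * v[0] + mat[1][1] * v[1]))


def estimate_gmi(tx_indices, rx, points, labels, cov):
    """Monte Carlo GMI estimate (bit/symbol) for a given binary labeling."""
    m = len(points).bit_length() - 1
    cov_inv = inverse_2x2(cov)
    total = 0.0
    for i, y in zip(tx_indices, rx):
        x_i = points[i]
        z = (y[0] - x_i[0], y[1] - x_i[1])
        # Likelihood of each candidate point relative to the transmitted one.
        weights = []
        for x_j in points:
            d = (x_i[0] - x_j[0], x_i[1] - x_j[1])
            weights.append(math.exp(-0.5 * bilinear(d, cov_inv, d) - bilinear(d, cov_inv, z)))
        all_points = sum(weights)
        for k in range(m):
            sent_bit = (labels[i] >> k) & 1
            # Points whose kth label bit agrees with the transmitted bit.
            same_bit = sum(w for w, lab in zip(weights, labels)
                           if ((lab >> k) & 1) == sent_bit)
            total += math.log2(all_points / same_bit)
    return m - total / len(rx)


def simulate(snr_db, n_symbols=4000, seed=2):
    """Random 8PSK symbols plus uncorrelated Gaussian noise (illustrative only)."""
    rng = random.Random(seed)
    points = psk_points()
    var = 1.0 / (2.0 * 10 ** (snr_db / 10))  # per-dimension variance (assumed SNR definition)
    cov = ((var, 0.0), (0.0, var))
    tx = [rng.randrange(len(points)) for _ in range(n_symbols)]
    rx = [(points[i][0] + rng.gauss(0.0, math.sqrt(var)),
           points[i][1] + rng.gauss(0.0, math.sqrt(var))) for i in tx]
    return tx, rx, points, cov
```

Unlike the MI, the value returned by `estimate_gmi` depends on the labeling passed in, so replacing `BRGC` with another labeling changes the result for the same received samples.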

C. Post-FEC mutual information
CM is typically designed to be combined with a low rate outer code to get the BER down to the desired level (usually $10^{-15}$). In this section, we discuss two AIRs for this outer code, one for hard decision (HD) codes, and one for soft decision (SD) codes. The relevance of these metrics is that when compared to the MI and GMI, they allow us to visualize the suboptimality of particular CM implementations.
Both BICM LDPC and TTCM decoders produce soft information on the data bits. We denote this soft information as $\lambda = [\lambda^{(1)}, \lambda^{(2)}, \ldots, \lambda^{(N_s)}]$, where
$$\lambda_q^{(n)} = \log \frac{P(C_q^{(n)} = 1 \mid \mathbf{y})}{P(C_q^{(n)} = 0 \mid \mathbf{y})} \tag{9}$$
is the LLR of the $q$th data bit at the $n$th symbol given the sequence of received symbols $\mathbf{y}$. Note that this expression depends on the sequence of received symbols. This is because the decoder can use the whole sequence of received symbols to determine the bit probability. The fact that the decoder uses all the received symbols, however, does not imply that the channel has memory. As shown in Fig. 2 (c) (for TTCM), this soft information can be converted into (hard) bits, which we denote by $\hat{c}$.
When the outer code is SD, the information-theoretical quantity we consider is the post-FEC MI, which is defined as
$$I_{\mathrm{SD}} = \sum_{q=1}^{2} I(C_q; \Lambda_q) \tag{10}$$
where $C_q$ and $\Lambda_q$ are the random variables that describe the data bits and the LLRs $\lambda_q^{(n)}$, respectively. The MI in (10) can be approximated as
$$I_{\mathrm{SD}} \approx 2 - \frac{1}{N_s} \sum_{q=1}^{2} \sum_{n=1}^{N_s} \log_2\left(1 + \exp\left((-1)^{c_q^{(n)}} \lambda_q^{(n)}\right)\right) \tag{11}$$
where $c = [c^{(1)}, c^{(2)}, \ldots, c^{(N_s)}]$ and $c^{(n)} = [c_1^{(n)}, c_2^{(n)}]^T$. The approximation in (11) is obtained by assuming that the PDF of the LLRs in (9) satisfies the so-called consistency condition [13, Def. 3.8], [38, eq. (12)] and by using a Monte Carlo approximation of the one-dimensional integral. Note that under certain assumptions, the expression in (11) can also be used to approximate the GMI in (5). This can be done by using LLRs on encoded bits $b$ instead of data bits $c$, as done in [39, eq. (2)] and [26, eq. (30)].
When the outer code is HD, we consider the MI between the information bits $C_1$ and $C_2$ and their respective HD estimates after decoding, i.e.,
$$I_{\mathrm{HD}} = \sum_{q=1}^{2} I(C_q; \hat{C}_q) \tag{12}$$
$$= \sum_{q=1}^{2} \left(1 - H_b(\mathrm{BER}_q)\right) \tag{13}$$
where $\mathrm{BER}_q$ is the BER at the $q$th decoder output and
$$H_b(p) = -p \log_2 p - (1 - p) \log_2 (1 - p) \tag{14}$$
is the binary entropy function. Because of the data processing inequality, $I_{\mathrm{SD}} \geq I_{\mathrm{HD}}$.
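The two post-FEC quantities can be evaluated from decoder outputs with a short Python sketch. It is a minimal illustration, assuming the LLR sign convention $\lambda = \log P(c = 1 \mid \mathbf{y}) / P(c = 0 \mid \mathbf{y})$, hard decisions taken as $\hat{c} = 1$ when $\lambda \geq 0$, and a data layout `bits[q][n]`, `llrs[q][n]`; these conventions and the function names are assumptions, not taken from the paper's implementation.

```python
import math


def binary_entropy(p):
    """Binary entropy function H_b(p) in bits."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)


def log2_one_plus_exp(t):
    """log2(1 + exp(t)) evaluated without overflow for large |t|."""
    return (max(t, 0.0) + math.log1p(math.exp(-abs(t)))) / math.log(2.0)


def post_fec_mi_sd(bits, llrs):
    """Soft-decision post-FEC MI estimate, summed over the data-bit positions q."""
    total = 0.0
    for c_q, lam_q in zip(bits, llrs):
        penalty = sum(log2_one_plus_exp((-1) ** c * lam) for c, lam in zip(c_q, lam_q))
        total += 1.0 - penalty / len(c_q)
    return total


def post_fec_mi_hd(bits, llrs):
    """Hard-decision post-FEC MI: sum over q of 1 - H_b(BER_q)."""
    total = 0.0
    for c_q, lam_q in zip(bits, llrs):
        errors = sum((lam >= 0.0) != bool(c) for c, lam in zip(c_q, lam_q))
        total += 1.0 - binary_entropy(errors / len(c_q))
    return total
```

With two data bits per symbol, both functions saturate at 2 bit/symbol when the decoder output is error free and the LLR magnitudes are large.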

IV. EXPERIMENTAL SETUP
The experimental transmission setup is shown in Fig. 5. An external cavity laser at 1550 nm is modulated by a Mach-Zehnder modulator driven by an arbitrary waveform generator at 28 Gbaud for both in-phase and quadrature. Polarisation multiplexing is emulated by splitting the signal into two identical single polarisation signals, delaying one of the two signals and then recombining using a polarisation beam splitter. Transmitter-based noise loading was used to vary the SNR, by adding additional amplified spontaneous emission (ASE) from an Erbium doped fibre amplifier (EDFA). The signal was then transmitted using a recirculating loop. The recirculating loop consisted of a 75 km standard single mode fibre (SSMF) span with both EDFA and Raman amplification and was used to emulate transmission over a total distance of 1000 km. The launch power per span was set to 0 dBm to ensure linear propagation. A bandpass filter was used to remove the out-of-band noise and two acousto-optic modulators (AOMs) were used to control the loading of the signal into the loop. The signal is then received by a DP coherent receiver. Standard off-line digital signal processing (DSP) [40] was used to equalize the signals and recover the noisy 8PSK symbols. The recovered constellations are then passed to the CM decoder.
The transmitted sequences were generated by encoding identical pseudo-random bit sequences with either the TTCM or the LDPC encoder from Sec. II. Codewords consisting of $N_s = 64800$ 8PSK symbols were transmitted and, at the receiver, a single trace contained 7 codewords in each polarization, yielding $2.7 \times 10^6$ encoded bits or $1.8 \times 10^6$ data bits after decoding.
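The trace sizes quoted above follow directly from the codeword length, the constellation size, and the code rate. The short Python check below reproduces the arithmetic; the variable names are chosen for this example only.

```python
from fractions import Fraction

SYMBOLS_PER_CODEWORD = 64800       # N_s
BITS_PER_SYMBOL = 3                # 8PSK
CODE_RATE = Fraction(2, 3)         # R = 2/3
CODEWORDS_PER_POLARIZATION = 7
POLARIZATIONS = 2

encoded_bits = (SYMBOLS_PER_CODEWORD * BITS_PER_SYMBOL
                * CODEWORDS_PER_POLARIZATION * POLARIZATIONS)
data_bits = int(encoded_bits * CODE_RATE)

print(encoded_bits, data_bits)  # 2721600 1814400
```

These values round to the $2.7 \times 10^6$ encoded bits and $1.8 \times 10^6$ data bits per trace stated in the text.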
To further investigate the decoding performance, receiver-based noise loading was also employed. This was implemented by obtaining an experimental trace without transmitter noise loading after transmission over 1000 km and then adding AWGN digitally to the recovered constellation before decoding. The effectiveness of noise loading at the receiver will be explained in the next section.

V. EXPERIMENTAL RESULTS
Fig. 6 shows the post-FEC BER performance of the two coding schemes as a function of SNR, where the SNR is measured from the recovered constellations at the receiver and the covariance matrix is estimated for each SNR value. We find that the measured performance (markers) for both schemes matches well the calculated performance for the implemented schemes assuming an AWGN channel (thin lines). An implementation penalty of less than 0.1 dB for both schemes is observed. We also see that both transmitter-based noise loading and receiver-based noise loading give very similar performance. Fig. 6 also shows a theoretical lower bound on the BER for 8PSK and code rate $R = 2/3$. This bound is also known as the rate distortion bound [41]. At this bound, one minus the binary entropy of the BER, multiplied by the number of data bits per symbol, matches the AIR for the given SNR, i.e., $2(1 - H_b(\mathrm{BER})) = \mathrm{AIR}$. In the case of the TTCM, the AIR used in this equation is the MI in (2). In the case of BICM LDPC, the AIR used in the equation is the GMI in (6). The distance between the bound and the actual performance is the penalty incurred from design and implementation of the actual code. We see that the implemented TTCM and LDPC schemes are 0.5 dB and 0.8 dB away from the theoretical lower bounds on BER given by the MI and the GMI, respectively. As we will see below, these different gaps to the theoretical bounds also appear when AIRs are considered. Note also that the losses of 0.5 and 0.8 dB are due to the particular choice of TTCM and LDPC codes we consider here. The gaps for better codes could be smaller; however, the 0.1 dB gap given by the theoretical curves in Fig. 6 will always remain the same.
For transmitter-based noise loading (crosses in Fig. 6) we are only able to measure BERs down to $10^{-4}$ due to the length of the received sequence. Receiver-based noise loading, on the other hand, allows for the estimation of BERs down to much lower levels, as it is possible to noise load a single transmitted trace with many different noise realisations to build up the statistics. In these results we used 50000 different noise realizations in order to measure post-FEC BERs between $10^{-7}$ and $10^{-8}$. The results for noise loading at the transmitter are in agreement with the results for noise loading at the receiver, and therefore, from now on we only consider noise loading at the receiver for post-FEC results.
Another method of comparing the performance of the two schemes is using AIRs, as shown in Fig. 7. Here, the thick line is the AWGN capacity $\log_2(1 + \mathrm{SNR})$ [34]. We also consider the AIR for 8PSK and an AWGN channel, using the expression for MI in (3) and for GMI in (7). The curves are the results for a circularly symmetric (CS) AWGN channel, while the markers are obtained by calculating (3) and (7) using transmitter noise loaded traces from the experimental setup and $N_D = 2$. Here, the majority of the noise is generated by an EDFA and has co-propagated with the signal for 1000 km. We find that the MI and GMI calculated from the experimentally obtained traces (triangles in Fig. 7) show excellent agreement with the CS AWGN model, indicating that the optical channel in the linear propagation regime is well approximated by the AWGN model. These results give upper bounds for 8PSK-based transmissions of TTCM and binary BICM LDPC, respectively.
The post-FEC AIRs are also shown in Fig. 7, which saturate at 2 bit/sym. These metrics are calculated using (11) and (13) for both schemes. The curves are obtained by calculating (11) for a CS AWGN channel and the markers are obtained from the experimental setup. We see that for 2 bit/sym, there is only a 0.06 dB SNR penalty between MI and GMI; however, for the actual codes that were implemented, we find that at the maximum achievable rate the TTCM outperforms the LDPC by 0.4 dB. This difference in performance may be attributed to the suboptimality of the codes under consideration. With the codes we implemented in this paper, the performance difference is larger than the difference between the respective bounds. Fig. 7 also shows results for an HD outer code (circles). The differences between the SD and HD bounds are only minor in the region of interest (near 2 bit/sym), and thus, we conclude that only small penalties from choosing an HD outer code are to be expected. At lower SNR, where the AIR for SD codes is significantly higher than the AIR for HD codes, one can imagine that a code with a lower code rate can approach the MI and GMI bounds far closer than the codes used in this paper do. Around 6.3 dB SNR for the TTCM scheme and around 6.8 dB SNR for the LDPC scheme, the difference in AIR for the HD with respect to the SD codes is negligible.

VI. CONCLUSIONS
In this paper, we experimentally compared the performance of nonbinary FEC based on turbo trellis-coded modulation and LDPC-based binary FEC in terms of achievable information rates. These rates were evaluated using newly developed closed-form approximations for a correlated AWGN channel.
Unlike uncoded performance metrics, an information-theoretic analysis based on mutual information and generalized mutual information was shown to allow fair comparisons between different modulation strategies. The AIRs can be compared for different modulation formats, including geometrically-shaped and probabilistically-shaped formats. Although in this paper all the gains were reported in terms of SNR (for a given AIR), this does not always have to be the case. For example, the same methodology can be used to report gains in launch power or reach. This analysis, however, did not always exactly match the performance of the particular coded modulation implementations under consideration. This is because the information-theoretic analysis considers an idealized setup, e.g., infinite block lengths, unbounded decoding complexity, etc.
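In this paper we only considered one modulation and code rate; however, we conjecture our conclusions to also hold for other spectral efficiencies. The study of different combinations of modulation and code rate is left for further investigation.

APPENDIX
DERIVATION OF THE AIR EXPRESSIONS
1) Derivation of the MI expression: The MI in (2) can be approximated via Monte Carlo integration for any channel law $f_{Y|X}(Y|X)$ using the received symbols, which we denote as $\mathbf{y} = [\mathbf{y}^{(1)}, \mathbf{y}^{(2)}, \ldots, \mathbf{y}^{(N_s)}]$. In particular, this Monte Carlo approximation gives [26, eq. (17)]
$$I(X;Y) \approx \frac{1}{N_s} \sum_{i=1}^{M} \sum_{n \in \mathcal{N}_i} \log_2 \frac{f_{Y|X}(\mathbf{y}^{(n)}|\mathbf{x}_i)}{\frac{1}{M} \sum_{j=1}^{M} f_{Y|X}(\mathbf{y}^{(n)}|\mathbf{x}_j)} \tag{15}$$
where $\mathcal{N}_i$ is given by (4), and $\mathbf{x} = [\mathbf{x}^{(1)}, \mathbf{x}^{(2)}, \ldots, \mathbf{x}^{(N_s)}]$ are the transmitted symbols. Then, by substituting (1) into (15) and using $\mathbf{y}^{(n)} - \mathbf{x}_i = \mathbf{z}^{(n)}$ and $\mathbf{y}^{(n)} - \mathbf{x}_j = \mathbf{z}^{(n)} + \mathbf{d}_{ij}$ for $n \in \mathcal{N}_i$, we obtain
$$I(X;Y) \approx m - \frac{1}{N_s} \sum_{i=1}^{M} \sum_{n \in \mathcal{N}_i} \log_2 \sum_{j=1}^{M} \frac{\exp\left(-\frac{1}{2} (\mathbf{z}^{(n)} + \mathbf{d}_{ij})^T \Sigma^{-1} (\mathbf{z}^{(n)} + \mathbf{d}_{ij})\right)}{\exp\left(-\frac{1}{2} (\mathbf{z}^{(n)})^T \Sigma^{-1} \mathbf{z}^{(n)}\right)}. \tag{16}$$
Rewriting the argument of the logarithm in (16), combining the exponents, and using the distributive property of matrix multiplication, we obtain
$$\sum_{j=1}^{M} \exp\left(\frac{1}{2} (\mathbf{z}^{(n)})^T \Sigma^{-1} \mathbf{z}^{(n)} - \frac{1}{2} (\mathbf{z}^{(n)} + \mathbf{d}_{ij})^T \Sigma^{-1} (\mathbf{z}^{(n)} + \mathbf{d}_{ij})\right) = \sum_{j=1}^{M} \exp\left(-\frac{1}{2} \left[(\mathbf{z}^{(n)})^T \Sigma^{-1} \mathbf{d}_{ij} + \mathbf{d}_{ij}^T \Sigma^{-1} \mathbf{z}^{(n)} + \mathbf{d}_{ij}^T \Sigma^{-1} \mathbf{d}_{ij}\right]\right). \tag{17}$$
Any real covariance matrix is symmetric, and so is its inverse; thus, $(\mathbf{z}^{(n)})^T \Sigma^{-1} \mathbf{d}_{ij} = \mathbf{d}_{ij}^T \Sigma^{-1} \mathbf{z}^{(n)}$. Using this with (17) in (16) gives (3).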
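2) Derivation of the GMI expression: The GMI in (6) can be approximated via Monte Carlo integration as
$$\mathrm{GMI} \approx \frac{1}{N_s} \sum_{k=1}^{m} \sum_{l \in \{0,1\}} \sum_{i \in \mathcal{I}_{l,k}} \sum_{n \in \mathcal{N}_i} \log_2 \frac{\frac{2}{M} \sum_{j \in \mathcal{I}_{l,k}} f_{Y|X}(\mathbf{y}^{(n)}|\mathbf{x}_j)}{\frac{1}{M} \sum_{p=1}^{M} f_{Y|X}(\mathbf{y}^{(n)}|\mathbf{x}_p)} \tag{18}$$
where $\mathcal{N}_i$ is given by (4) and $\mathcal{I}_{l,k}$ by (8). In analogy to (15), the expression in (18) is a Monte Carlo approximation of the GMI for any channel law. The expression in (7) is obtained by using (1) in (18) and by following steps similar to (16)-(17).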