On the Impact of Optimal Modulation and FEC Overhead on Future Optical Networks

The potential of optimum selection of modulation and forward error correction (FEC) overhead (OH) in future transparent nonlinear optical mesh networks is studied from an information theory perspective. Different network topologies are studied as well as both ideal soft-decision (SD) and hard-decision (HD) FEC based on demap-and-decode (bit-wise) receivers. When compared to the de-facto QPSK with 7% OH, our results show large gains in network throughput. When compared to SD-FEC, HD-FEC is shown to cause network throughput losses of 12%, 15%, and 20% for a country, continental, and global network topology, respectively. Furthermore, it is shown that most of the theoretically possible gains can be achieved by using one modulation format and only two OHs. This is in contrast to the infinite number of OHs required in the ideal case. The obtained optimal OHs are between 5% and 80%, which highlights the potential advantage of using FEC with high OHs.


I. INTRODUCTION AND MOTIVATION
The rapid rise in the use of the Internet has led to increasing traffic demands putting severe pressure on backbone networks. The transport backbone of the Internet is formed of optical mesh networks, where optical fibre links connect nodes formed of reconfigurable optical add-drop multiplexers (ROADMs). Studying the ultimate transmission limits of optical mesh networks as well as the optimal utilization of the installed network resources is therefore key to avoid the so-called "capacity crunch" [1], [2].
Installed optical mesh networks utilize wavelength routing to transparently connect source and destination transceivers. The quality of the optical communication signal degrades due to transmission impairments, which in turn limits the maximum achievable data rate. This degradation is usually due to the noise added by the amplifiers in the link as well as nonlinear distortion caused by neighboring WDM channels. Furthermore, the optical signals within a transparent wavelength-routed network travel a variety of distances, and thus, experience different levels of signal degradation. The most conservative design alternative for an optical network is to choose the transceiver to operate error-free on the worst light path, i.e., for transmission between the furthest spaced nodes [3, p. 138], [4, Sec. 1]. Under this design paradigm, any route reconfiguration can be accommodated by the network. However, this leads to overprovisioning of resources.
The increase in traffic demand, together with the development of software-defined transceivers that can adapt the transmission parameters to the physical channel, has increased the interest in designing networks that utilize the resources more efficiently. The degrees of freedom in the transceiver include, e.g., the forward error correction (FEC) scheme, FEC overhead (OH), modulation format, frequency separation (in flex-grid networks), launch power, and symbol rate (see [5, Fig. 3]). The network resources can be better utilized if these degrees of freedom are jointly optimized in conjunction with the routing of the optical light path through the network.
To cope with increasing capacity demands, future optical networks will use multilevel modulation formats and FEC. This combination is known as coded modulation (CM), and its design requires the joint optimization of the FEC and the modulation format (see Fig. 1). The optimum receiver structure for CM is the maximum likelihood (ML) receiver, which finds the most likely coded sequence [6, Sec. 3.1]. The ML solution is in general impractical, and thus, the receiver is very often implemented as a (suboptimal) bit-wise (BW) receiver instead [6, Sec. 3.2]. In a BW receiver, hard or soft information on the code bits is calculated first, and then, an FEC decoder is used (see the receiver side of Fig. 1). In other words, practical receivers decouple the detection process: symbols are first converted into bits, and then, FEC decoding is applied. In this paper we consider BW receivers with both soft-decision FEC (SD-FEC) and hard-decision FEC (HD-FEC). BW receivers have been studied for optical communications in, e.g., [7], [8], [9].
Traditional analyses of optical networks assume a target pre-FEC bit error rate (BER), and thus, an HD-FEC with a fixed OH is implicitly assumed. Although the most common value for the OH is 7%, higher OHs have become increasingly popular. Furthermore, SD-FEC with high OHs (typically around 20%) is considered the most promising FEC alternative for 100G and 400G transceivers. From a theoretical point of view, fixing the OH is an artificial constraint that reduces both the flexibility of the CM design and the network throughput; in practice, however, there are good reasons for fixing it. The client rates are usually quantized to 10, 40, 100, and 400 Gb/s for compatibility with the Ethernet standards. The symbol rate is also often fixed to accommodate the transmitted bandwidth into a given fixed grid, leading to fixed OHs. Under these constraints, full flexibility in the selection of OHs is not possible. In this paper, however, we ignore these constraints and focus on finding the theoretically maximum network throughput.
Fig. 1. At the transmitter, a binary FEC code is concatenated with an M-ary QAM modulator. After transmission, the noisy received symbols are demapped and then decoded by a bit-wise receiver. When the demapper makes hard decisions on the symbols, an HD-FEC is used. When the demapper computes log-likelihood ratios (soft decisions), an SD-FEC is assumed. The achievable rates discussed in Sec. III are also shown.
To increase the network throughput, different approaches have been investigated in the literature. For example, [11] considered mixed line rates (10, 40, 100 Gb/s) for the NSF mesh topology, [12] considered variable FEC OHs with fixed symbol rate and modulation formats, and [13] considered variable modulation formats and SD-FEC OHs. Adaptive FEC based on practical codes was recently considered in [4]. Variable OHs with 16QAM and 64QAM were studied in [14], where probabilistically shaped constellations were considered. The optimal modulation format based on an approximation for the maximum achievable rates of HD-FEC was considered in [15]. Adaptive FEC OH for time-frequency packing transmission was studied in [16] and [17]. The problem of routing and spectrum assignment for flex-grid optical networks with orthogonal frequency-division multiplexing was studied in [18]. The key enabling technology for these approaches is the software-defined transceiver, which allows, for example, the modulation format and symbol rate to be varied, as done in [19], [20], [21], [22], [23]. In [24], [17], a software-defined transceiver with variable FEC was experimentally demonstrated.
To optimize the network design, a physical layer model is required. While very simple models (e.g., reach-based models) were considered in the past, nonlinear effects have recently been taken into account using the Gaussian noise (GN) model [25], [26], [27], [28], [29], [30], [31]. In [32], the closed-form solution of the GN model of [31] was used to adapt the routing and wavelength assignment problem for a target pre-FEC BER and four different modulation formats. The same GN model was used in [33] and [34] to jointly optimize power, modulation format, and carrier frequencies (flex-grid) for a fixed OH. Numerical integration of the GN model was used in [35] to assess SNR and throughput optimization via power and modulation adaptation. The numerically integrated GN model in [35] was also used in [36], [37] to sequentially optimize modulation format and power for a fixed FEC OH. The potential gains of adaptive FEC OH and modulation, or of adaptive launch power and modulation, were studied in [38].
In this paper, we study the problem of finding the optimal modulation and FEC OH from an information theory viewpoint. In particular, we use information theoretic quantities (i.e., achievable rates) and a realistic model for the nonlinear interference to study the maximum network throughput of optical mesh networks. For point-to-point links, and under a Gaussian assumption on the channel, the solution depends only on the signal-to-noise ratio (SNR). For an optical network, however, the solution depends on the SNR distribution of the connections. Therefore, the theoretically optimum CM design is obtained when the modulation size and FEC OH are jointly designed across the network. Significant increases in network throughput are shown. Furthermore, practically relevant schemes (based on either one or two OHs) are also considered, and their gap to the theoretical maximum is quantified. This paper extends our results in [39] by considering multiple network topologies as well as both HD- and SD-FEC. This paper is organized as follows. In Sec. II the system model, network topologies, and physical layer model are described. In Sec. III the optimal selection of modulation and coding is reviewed and the maximum network throughput is analyzed. In Sec. IV practically relevant schemes are considered. Conclusions are drawn in Sec. V.

A. System Model
We consider the CM transmitter shown in Fig. 1, where a binary FEC code maps the information bits $\boldsymbol{I} = [I_1, I_2, \ldots, I_{k_c}]$ into a sequence of code bits $\boldsymbol{B} = [B_1, B_2, \ldots, B_{n_c}]$, where $R_c = k_c/n_c$ is the code rate. At each discrete-time instant, $m$ code bits are mapped to a symbol from a discrete constellation with $M = 2^m$ constellation points. We consider polarization-multiplexed square QAM constellations with $m = 2, 4, 6, 8, 10$ (i.e., $M$QAM with $M \le 1024$) in each polarization. The constellations are labeled by the binary-reflected Gray code. For an FEC encoder with code rate $R_c$ and a modulation format with $M$ symbols, the spectral efficiency (SE) per two polarizations is
$$\mathrm{SE} = 2 R_c \log_2 M \quad \text{[bit/sym]}. \tag{1}$$
The FEC OH is
$$\mathrm{OH} = \left(\frac{1}{R_c} - 1\right) \cdot 100\%. \tag{2}$$
At the receiver side, a bit-wise (BW) receiver (also known as a bit-interleaved coded modulation receiver [6]) is used. In such a receiver, the noisy symbols are first demapped, and then, FEC decoding is performed. The FEC decoder gives an estimate $\hat{\boldsymbol{I}}$ of the transmitted information bits. Due to the separation of the detection process, BW receivers are suboptimal. They are, however, very popular in practice due to the use of off-the-shelf FEC decoders.
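As a quick numerical check of (1) and (2), the Python sketch below converts between code rate, OH, and SE, and evaluates the line rate of QPSK with 7% OH at the 32 GBaud symbol rate used later in this paper. The function names are our own, chosen for illustration only; OH is handled as a fraction (0.07 for 7%) rather than a percentage.

```python
import math


def spectral_efficiency(rc: float, M: int) -> float:
    """SE in (1): bits per dual-polarization symbol for code rate rc and MQAM."""
    return 2.0 * rc * math.log2(M)


def overhead(rc: float) -> float:
    """FEC OH in (2), returned as a fraction (0.07 means 7%)."""
    return 1.0 / rc - 1.0


def code_rate(oh: float) -> float:
    """Inverse of (2): code rate for a given fractional OH."""
    return 1.0 / (1.0 + oh)


if __name__ == "__main__":
    rc = code_rate(0.07)              # QPSK with 7% OH
    se = spectral_efficiency(rc, 4)   # equals 4/1.07 bit/sym
    print(f"Rc = {rc:.4f}, SE = {se:.3f} bit/sym, "
          f"line rate = {se * 32e9 / 1e9:.1f} Gb/s at 32 GBaud")
```

Keeping OH as a fraction avoids mixing percent and linear units inside the formulas; converting to a percentage is left to the printing step.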
In this paper we consider two types of FEC decoders: hard- and soft-decision FEC. This naturally leads to two different BW receiver structures, shown on the right-hand side of Fig. 1. In a hard-decision BW receiver, the demapper makes hard decisions on the bits (by making hard decisions on the symbols), which are then passed to an HD-FEC decoder. We assume that there is a random bit-level interleaver between the encoder and the mapper, which we consider part of the FEC encoder (decoder).
In a soft-decision BW receiver, the demapper calculates soft information on the code bits (also known as "soft bits"), which is then passed to an SD-FEC decoder. The soft information is usually represented in the form of log-likelihood ratios (LLRs), defined as
$$L_q = \log \frac{f_{Y|B_q}(y|1)}{f_{Y|B_q}(y|0)}, \tag{3}$$
where $B_q$ is the $q$th bit at the input of the mapper, and $f_{Y|B_q}(y|b)$, $b \in \{0, 1\}$, is the channel transition probability. The sign of an LLR corresponds to a hard decision on the bit, while its magnitude represents the reliability of the available information.
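The LLR in (3) can be evaluated exactly by summing the Gaussian likelihoods of all constellation points that carry a given bit value. The sketch below does this for a Gray-labeled 4-PAM (labels 00, 01, 11, 10), which is what one quadrature of Gray-labeled 16QAM reduces to. It assumes equiprobable points and real-valued AWGN with variance `sigma2` per quadrature. The labeling table and function names are illustrative choices, not taken from the paper, and the convention follows (3) as written above: a positive LLR favors bit 1.

```python
import math

# Gray-labeled 4-PAM: (amplitude, (b0, b1)); neighbouring points differ in one bit.
PAM4_GRAY = [(-3.0, (0, 0)), (-1.0, (0, 1)), (1.0, (1, 1)), (3.0, (1, 0))]


def log_sum_exp(values):
    """Numerically stable log(sum(exp(v)))."""
    vmax = max(values)
    return vmax + math.log(sum(math.exp(v - vmax) for v in values))


def exact_llrs(y, sigma2, constellation=PAM4_GRAY):
    """LLRs L_q = log f(y|B_q=1) - log f(y|B_q=0), as in (3).

    With equiprobable points and real AWGN of variance sigma2, the Gaussian
    normalisation constants (and the equal subset sizes) cancel in the ratio.
    """
    m = len(constellation[0][1])
    llrs = []
    for q in range(m):
        metrics = {0: [], 1: []}
        for x, bits in constellation:
            metrics[bits[q]].append(-(y - x) ** 2 / (2.0 * sigma2))
        llrs.append(log_sum_exp(metrics[1]) - log_sum_exp(metrics[0]))
    return llrs


if __name__ == "__main__":
    for y in (-2.7, 0.4, 2.1):
        llr = exact_llrs(y, sigma2=0.5)
        hard = [1 if l > 0 else 0 for l in llr]  # sign of the LLR -> hard decision
        print(y, [round(l, 2) for l in llr], hard)
```

The log-sum-exp form avoids underflow at high SNR, where the likelihoods of distant points become vanishingly small.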

B. Network Topologies
In this paper we consider the three networks shown in Figs. 2, 3, and 4. The first one is the exemplary network topology for Deutsche Telekom Germany (DTG) [40, Sec. II], where the two core nodes per city (see [40, Fig. 1]) were merged into one. The second one is the reference 14-node, 21-link NSF mesh topology [36, Fig. 1], [35, Fig. 7]. The last topology is the Google B4 (GB4) network connecting data centers [41, Fig. 1]. We chose to study these three topologies because they are representative of networks at three different scales: country, continental, and global.

C. Physical Layer Model
For the analysis in this paper, it is assumed that each node in the networks described in Sec. II-B is equipped with multiple transceivers. Furthermore, it is assumed that these transceivers are based on the structure in Fig. 1 and that they can ideally adapt the FEC OH and modulation order. We consider a fixed grid of 80 WDM channels spaced at 50 GHz and a symbol rate of 32 GBaud.
The nodes are connected with fiber pairs of standard single-mode fiber with the parameters shown in Table I. Erbium-doped fiber amplifiers (EDFAs) are regularly placed along the links, as shown in Figs. 2 and 3. The span length is 80 km and the EDFA noise figure is 5 dB. Every link in the network is modeled using a channel model (see Fig. 1) that encompasses all the transmitter digital signal processing (DSP) used after the mapper (i.e., pulse shaping, polarization multiplexing, filtering, electro-optical conversion, etc.), the physical channel (the fiber and amplifiers), and the receiver DSP (optical-to-electrical conversion, filtering, equalization, matched filtering, etc.). This channel is modeled as a dual-polarization, discrete-time, memoryless, additive white Gaussian noise (AWGN) channel.
For each polarization, and at each discrete time $k = 1, 2, \ldots, n$,
$$Y_k = X_k + Z_k,$$
where $X_k$ is the transmitted symbol, the $Z_k$ are independent, zero-mean, circularly symmetric, complex Gaussian random variables, and $n$ is the blocklength. This GN channel model characterizes optically amplified links dominated by amplified spontaneous emission (ASE) noise and nonlinear interference, where the SNR of a light path with $N_s$ spans is
$$\mathrm{SNR} = \frac{P}{N_s\left(P_{\mathrm{ASE}} + \eta P^3\right)}, \tag{4}$$
where $P$ is the launch power per channel, $P_{\mathrm{ASE}}$ is the ASE noise power added after each span (in the 32 GHz signal bandwidth), and $\eta$ is the nonlinear coefficient (per span).
The nonlinear interference is taken as that on the worst-case central DWDM channel, i.e., we assume that all the links are fully loaded with DWDM channels. The nonlinear coefficient $\eta$ is calculated using the incoherent GN model of nonlinear interference [30]; SPM is assumed to be ideally compensated, and the ROADM nodes are assumed to be lossless. Using the parameters in Table I, we obtain $\eta \approx 742\ \mathrm{W}^{-2}$. The launch power that maximizes the SNR in (4) is found to be −1 dBm. A summary of the parameters discussed in this section is given in Table I.
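Assuming the incoherent GN form $\mathrm{SNR} = P/(N_s(P_{\mathrm{ASE}} + \eta P^3))$, setting the derivative with respect to $P$ to zero gives $P_{\mathrm{ASE}} + \eta P^3 - 3\eta P^3 = 0$. The optimum launch power is therefore $P_{\mathrm{opt}} = (P_{\mathrm{ASE}}/(2\eta))^{1/3}$. At this power, the nonlinear interference equals half the ASE power, and $P_{\mathrm{opt}}$ does not depend on $N_s$, which is why a single optimum launch power can be quoted for the whole network. The Python sketch below evaluates these expressions. Table I is not reproduced here, so `P_ASE` is a hypothetical placeholder value; only $\eta \approx 742$ W$^{-2}$ is taken from the text.

```python
import math

ETA = 742.0    # nonlinear coefficient per span [1/W^2], value quoted in the text
P_ASE = 5e-7   # ASE power per span in 32 GHz [W]; hypothetical placeholder value


def snr(p_w, n_spans, p_ase=P_ASE, eta=ETA):
    """Incoherent GN-model SNR P / (N_s (P_ASE + eta P^3)), in linear units."""
    return p_w / (n_spans * (p_ase + eta * p_w ** 3))


def optimal_power(p_ase=P_ASE, eta=ETA):
    """Launch power maximising snr(); it does not depend on the number of spans."""
    return (p_ase / (2.0 * eta)) ** (1.0 / 3.0)


def to_dbm(p_w):
    return 10.0 * math.log10(p_w / 1e-3)


if __name__ == "__main__":
    p_opt = optimal_power()
    for n_spans in (5, 20, 80):
        print(f"N_s = {n_spans:3d}: P_opt = {to_dbm(p_opt):.2f} dBm, "
              f"SNR = {10 * math.log10(snr(p_opt, n_spans)):.2f} dB")
```

Because NLI accumulates incoherently in this model, doubling the number of spans halves the SNR at any fixed launch power. This is the reason the optimum power is the same for short and long light paths.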

D. Performance Metric
Throughout this paper, the main performance metric considered is the total network throughput, which we define below and denote by Θ. The network throughput is the total traffic transported by the network that satisfies the required traffic profile.
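The network is assumed to have $N$ nodes. The required connectivity is defined by the normalized traffic profile $\boldsymbol{T} = [T_{s,d}]$, where $T_{s,d}$ is the fraction of the total traffic from source node $s$ to destination node $d$, i.e.,
$$\sum_{s=1}^{N} \sum_{\substack{d=1\\ d \neq s}}^{N} T_{s,d} = 1. \tag{5}$$
We assume connectivity between all pairs of nodes is required, i.e.,
$$T_{s,d} > 0, \quad \forall s \neq d. \tag{6}$$
Let $C_{s,d}$ be the total available throughput between nodes $s$ and $d$ (across different routes), where
$$C_{s,d} = \sum_{r=1}^{R_{s,d}} C^{(r)}_{s,d}, \tag{7}$$
$C^{(r)}_{s,d}$ is the available throughput in the $r$th route, and $R_{s,d}$ is the number of active routes between nodes $s$ and $d$. The network throughput $\Theta$ is then defined as
$$\Theta \triangleq \min_{s \neq d} \frac{C_{s,d}}{T_{s,d}}. \tag{8}$$
In this paper, we consider a uniform traffic profile, i.e.,
$$T_{s,d} = \frac{1}{N(N-1)}, \quad \forall s \neq d. \tag{9}$$
The network throughput in this case can be expressed as
$$\Theta = N(N-1) \min_{s \neq d} C_{s,d}, \tag{10}$$
where (10) follows from (8) and (9).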
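The throughput definitions above translate directly into code. The sketch below sums per-route throughputs into $C_{s,d}$ and evaluates $\Theta$ as the minimum of $C_{s,d}/T_{s,d}$ over all node pairs with nonzero demand. It also builds the uniform profile $T_{s,d} = 1/(N(N-1))$, under which the result reduces to $N(N-1)\min_{s\neq d} C_{s,d}$. The function names and the toy per-route values are hypothetical.

```python
from itertools import permutations


def pair_throughput(route_throughputs):
    """C_{s,d}: sum of the throughputs of all active routes of a node pair."""
    return sum(route_throughputs)


def network_throughput(routes, traffic):
    """Theta = min over node pairs (with T_{s,d} > 0) of C_{s,d} / T_{s,d}."""
    return min(pair_throughput(routes.get(pair, [])) / t
               for pair, t in traffic.items() if t > 0)


def uniform_traffic(nodes):
    """Uniform normalised traffic profile: T_{s,d} = 1 / (N (N - 1)) for s != d."""
    n = len(nodes)
    return {(s, d): 1.0 / (n * (n - 1)) for s, d in permutations(nodes, 2)}


if __name__ == "__main__":
    # Toy 3-node example; per-route throughputs in Tb/s are made-up values.
    routes = {
        ("A", "B"): [0.4, 0.3], ("B", "A"): [0.4, 0.3],
        ("A", "C"): [0.2],      ("C", "A"): [0.2, 0.1],
        ("B", "C"): [0.5],      ("C", "B"): [0.5],
    }
    theta = network_throughput(routes, uniform_traffic(["A", "B", "C"]))
    print(f"Theta = {theta:.2f} Tb/s (limited by the weakest pair, A->C)")
```

The weakest pair sets $\Theta$: adding capacity elsewhere does not help until the bottleneck pair improves. This is what makes the minimum route count per pair decisive in Sec. IV-A.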

III. OPTIMAL MODULATION AND FEC OVERHEAD
In this section, we describe the selection of optimal FEC OH and modulation format from an information theory viewpoint. We first consider the ideal case of continuous constellations and then move to the case of modulation with discrete number of constellation points. The routing and wavelength assignment problem and the maximum network throughput are also discussed in this section.

A. Channel Capacity
The capacity of the AWGN channel (in [bit/sym]) under an average power constraint is
$$C = 2 \log_2 (1 + \mathrm{SNR}), \tag{11}$$
where the pre-log factor of 2 takes into account the two polarizations. The value of $C$ represents the maximum number of information bits per symbol that can be reliably transmitted through an AWGN channel. The capacity in (11) is achieved when the transmitted symbols are chosen from a zero-mean Gaussian distribution. In practice, however, the modulation has $M$ discrete levels, which reduces the achievable transmission rates. This case is studied in the next section.
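Inverting the AWGN capacity $C = 2\log_2(1 + \mathrm{SNR})$ gives the minimum SNR at which any dual-polarization scheme can support a given SE, $\mathrm{SNR} = 2^{\mathrm{SE}/2} - 1$. The short sketch below evaluates both directions. The QPSK-with-7%-OH case simply plugs $\mathrm{SE} = 4/1.07$ into this bound, so it is a capacity-based lower limit. It is not the threshold of any practical HD- or SD-FEC.

```python
import math


def capacity(snr_db):
    """AWGN capacity for two polarizations, 2 log2(1 + SNR), in bit/sym."""
    return 2.0 * math.log2(1.0 + 10.0 ** (snr_db / 10.0))


def min_snr_db(se):
    """Smallest SNR (dB) at which capacity(snr) >= se."""
    return 10.0 * math.log10(2.0 ** (se / 2.0) - 1.0)


if __name__ == "__main__":
    se_qpsk_7 = 2 * 2 / 1.07  # SE of QPSK with 7% OH
    print(f"capacity limit for QPSK + 7% OH: {min_snr_db(se_qpsk_7):.2f} dB")
    for snr_db in (5, 10, 15, 20):
        print(snr_db, "dB ->", round(capacity(snr_db), 2), "bit/sym")
```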

B. Achievable Rates for Discrete Constellations
From an information theory point of view, the optimal code rate and constellation size can be chosen from the mutual information (MI). The MI, usually denoted by $I(X;Y)$, is an achievable rate for an optimal receiver. MI curves for square QAM constellations indicate that, regardless of the SNR, in order to maximize the SE, the densest available constellation should always be used, with the code rate chosen between 0 and 1. This has been shown, e.g., in [6, Fig. 4.3], [14, Fig. 1], [30, Fig. 11].
The MI is not an achievable rate for the two receiver structures we consider in this paper (see Fig. 1). The first receiver in Fig. 1 is suboptimal because it makes hard decisions on the symbols (and thus, information is lost). The second receiver is suboptimal because the LLR calculation ignores the dependency of the bits within a symbol (i.e., $L_q$ in (3) does not depend on $B_l$ for $l \neq q$). In this paper we consider two different achievable rates, one for each of these receiver structures.
Fig. 6. OHs obtained from the achievable rates in Fig. 5 for a BW receiver with HD-FEC. The black asterisks show the SNR values where the modulation size should be changed and the corresponding FEC OH. The channel capacity in (11) is shown for comparison. The SNR required for QPSK with 7% FEC OH is also shown with a green pentagon.
For the case of a BW receiver with hard decisions, Shannon's coding theorem states that error-free transmission is possible when $n \to \infty$ if the rate of the encoder fulfills [15, eq. (5)], [42, eq. (8)]
$$R_c\, m \le I(\boldsymbol{B};\hat{\boldsymbol{B}}), \tag{12}$$
where $I(\boldsymbol{B};\hat{\boldsymbol{B}})$ is the MI between the transmitted and received code bits (see Fig. 1). The key difference between the achievable rates $I(\boldsymbol{B};\hat{\boldsymbol{B}})$ and $I(X;Y)$ is that the former cross each other for different values of $M$ (see the asterisks in Fig. 5). An important consequence of crossing achievable rate curves is that the theoretically optimal choice of $R_c$ and $M$ is not straightforward. For the BW receiver with HD-FEC considered here, QPSK should be used for SNR ≤ 5.8 dB, 16QAM for 5.8 ≤ SNR ≤ 14 dB, etc. The corresponding FEC OHs obtained via (2) are shown in Fig. 6. This figure also shows the optimum minimum and maximum OH values for each modulation format as well as the SNR required for QPSK with 7% FEC OH (green pentagon).
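For Gray-labeled QPSK, the two bits of each polarization are carried by independent quadratures. After hard decisions, each bit therefore sees a binary symmetric channel with crossover probability $p = Q(\sqrt{\mathrm{SNR}})$, where SNR is the symbol energy over the noise power. Under this assumption, the MI between a transmitted and a hard-decided code bit is $1 - h_2(p)$, with $h_2$ the binary entropy function, so code rates up to this value are supported. The sketch below computes this rate and uses bisection to find the SNR at which it equals $1/1.07$ (7% OH). It is an illustration for QPSK only. For larger Gray-labeled QAM constellations, the bit positions have different error probabilities and the rate has to be computed per bit position.

```python
import math


def q_function(x):
    """Gaussian tail probability Q(x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def binary_entropy(p):
    """h2(p) in bits; defined as 0 at the endpoints."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)


def hd_code_rate_qpsk(snr_db):
    """Largest code rate 1 - h2(p) for Gray QPSK with hard decisions (BSC model)."""
    p = q_function(math.sqrt(10.0 ** (snr_db / 10.0)))
    return 1.0 - binary_entropy(p)


def snr_threshold_db(target_rc, lo=0.0, hi=20.0, iters=60):
    """Bisection for the SNR (dB) at which hd_code_rate_qpsk equals target_rc."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if hd_code_rate_qpsk(mid) < target_rc:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    thr = snr_threshold_db(1.0 / 1.07)
    print(f"HD QPSK, 7% OH (BSC model): SNR >= {thr:.2f} dB")
```

Bisection is sufficient here because the rate increases monotonically with SNR, so the bracket [0, 20] dB always contains the crossing.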
When the BW receiver operates with soft decisions, and if the code bits are independent, an achievable rate is given by the generalized mutual information (GMI)
$$\mathrm{GMI} = \sum_{k=1}^{m} I(B_k;Y) = \sum_{k=1}^{m} I(B_k;L_k), \tag{13}$$
where the second equality holds if the LLRs are calculated via (3), and $I(B_k;L_k)$ is the MI between the code bits and the LLRs before FEC decoding. The three achievable rates considered above (MI, (12), and (13)) are schematically shown in Fig. 1. Fig. 7 shows the GMI in (13) for different constellations. Similarly to the achievable rates for HD-FEC, the GMI curves cross each other (see the black diamonds).⁶ Although this effect is less noticeable, the theoretical implications are the same: different SNRs call for different modulation sizes and FEC OHs. In Fig. 7, we also show (with asterisks) the crossing points of the achievable rates for HD-FEC taken from Fig. 5. We do this to emphasize that if SD-FEC is considered instead of HD-FEC, different (higher) SNR thresholds are obtained. This is also visible in Fig. 8, where the optimum FEC OHs for SD-FEC are shown. The results in Figs. 7 and 8 have been recently studied experimentally in [46], [47].
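In simulations, the GMI is commonly estimated from the transmitted bits and the corresponding exact LLRs with the sample average $\sum_{k=1}^{m}\bigl(1 - \frac{1}{N}\sum_{n=1}^{N}\log_2(1 + e^{(-1)^{b_{k,n}} l_{k,n}})\bigr)$. This estimator assumes equiprobable bits and LLRs computed as $\log f(y|1)/f(y|0)$, so positive values favor bit 1. The Python sketch below implements it. It is a generic illustration, not code used in the paper.

```python
import math


def softplus_log2(x):
    """Numerically stable log2(1 + exp(x))."""
    return (max(x, 0.0) + math.log1p(math.exp(-abs(x)))) / math.log(2.0)


def gmi_estimate(bits, llrs):
    """Sample estimate of sum_k I(B_k; L_k) in bit/symbol.

    bits, llrs: sequences of length N; each element is a length-m sequence
    holding the bit b_{k,n} and LLR l_{k,n} = log f(y|1)/f(y|0) per position k.
    """
    n_symbols = len(bits)
    m = len(bits[0])
    total = 0.0
    for k in range(m):
        # (-1)**b flips the LLR sign so that confident correct LLRs cost ~0 bits.
        penalty = sum(softplus_log2((-1) ** b[k] * l[k])
                      for b, l in zip(bits, llrs))
        total += 1.0 - penalty / n_symbols
    return total


if __name__ == "__main__":
    bits = [(0, 1), (1, 0), (1, 1), (0, 0)]
    llrs = [(-2.0, 1.5), (3.0, -0.5), (0.8, 2.2), (-1.1, -4.0)]
    print(f"GMI estimate: {gmi_estimate(bits, llrs):.3f} bit/sym")
```

Uninformative LLRs ($l = 0$) give an estimate of zero. Confidently wrong LLRs drive it negative, which is a useful sanity check on the demapper's sign convention.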

C. Routing and Wavelength Assignment Problem
The routing and wavelength assignment problem is solved numerically as an integer linear programming (ILP) problem as described in [37, Sec. 4.1]. In particular, we maximize the network throughput in (10) (i.e., under a uniform traffic demand), where $C_{s,d}$ is assumed to be given by the capacity function $C$ in (11). The ILP solution provides the number of active light paths and their routes between each node pair. From this, the total number of active light paths in the network is obtained. This solution also provides the SNR of each active light path.
⁶ We again emphasize that this is only due to the fact that a suboptimal receiver is considered. MI curves, on the other hand, do not cross each other for MQAM constellations, which has been known for many years (see, e.g., [44, Fig. 2], [45, Fig. 1
Fig. 8. OHs obtained from the achievable rates in Fig. 7 for a BW receiver with SD-FEC. The black diamonds show the SNR values where the modulation size should be changed and the corresponding FEC OH. The channel capacity in (11) is shown for comparison. The SNR required for QPSK with 7% FEC OH (SNR$_{7\%}$) is also shown with a green pentagon.
The SNR values obtained by solving the ILP problem for the three network topologies in Figs. 2, 3, and 4 are shown in Fig. 9. The vertical bars show the number of transceivers that need to be installed to maximize the network throughput. The numbers of (two-way) transceivers for the DTG, NSF, and GB4 networks are 1230, 1094, and 570, resp. The vertical bars in Fig. 9 can be interpreted as the distribution of SNR across the network. By comparing these distributions, it is clear that the average SNR across the network decreases as the size of the network increases. This is due to the presence of long links in continental and global networks (NSF and GB4). The SNR distributions in Fig. 9 also show that the spread of the SNR values is much larger for large networks: the SNR varies by about 10 dB across the DTG network, but by about 20 dB across the GB4 network.
Once the SNR values for the active light paths are found, the maximum network throughput can be calculated via (10). In particular, the SNRs of the routes shown in Fig. 9 are first grouped for each source-destination pair. Then, the SNRs are "mapped" to throughputs via (11), and the value of (10) is obtained. The resulting throughputs are 524, 278, and 88 Tbps for the DTG, NSF, and GB4 networks, resp. These throughput values are shown in the first columns of Table II together with the number of transceivers for each network.
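The mapping from route SNRs to network throughput described above can be sketched as follows. The sketch assumes, as in this section, that each route achieves the dual-polarization AWGN capacity $2\log_2(1+\mathrm{SNR})$ at a symbol rate of 32 GBaud. The dictionary layout and function name are illustrative choices, and a node pair without any route contributes zero throughput.

```python
import math
from itertools import permutations

SYMBOL_RATE = 32e9  # [baud], as assumed in Sec. II-C


def network_throughput_from_snrs(route_snrs_db):
    """Theta = N(N-1) * min_{s!=d} C_{s,d} [bit/s] with capacity-achieving routes.

    route_snrs_db maps an ordered node pair (s, d) to the SNRs (dB) of its routes.
    """
    nodes = {node for pair in route_snrs_db for node in pair}
    n = len(nodes)
    pair_rates = []
    for pair in permutations(sorted(nodes), 2):
        snrs = route_snrs_db.get(pair, [])  # missing pair -> zero throughput
        c_sd = sum(SYMBOL_RATE * 2.0 * math.log2(1.0 + 10.0 ** (s / 10.0))
                   for s in snrs)
        pair_rates.append(c_sd)
    return n * (n - 1) * min(pair_rates)


if __name__ == "__main__":
    toy = {(0, 1): [18.0, 15.0], (1, 0): [18.0, 15.0],
           (0, 2): [12.0], (2, 0): [12.0], (1, 2): [16.0], (2, 1): [16.0]}
    print(f"{network_throughput_from_snrs(toy) / 1e12:.2f} Tbps (toy values)")
```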

D. Ideal FEC
The results in the previous section assume that all the transceivers can achieve the capacity of the AWGN channel. This is never the case in practice, as it requires the use of continuous constellations. In this section we consider the case where all transceivers can choose any of the MQAM constellations considered in this paper. Although this is more practically relevant, we still assume that the code rate $R_c$ can be adjusted continuously, which again is never the case in practice. Nevertheless, the results in this section can be used to estimate the penalty caused by the use of discrete (and square) constellations. The selection of code rate and modulation format is assumed to be based on the achievable rates discussed in Sec. III-B. This idea is shown schematically in Fig. 9, where horizontal lines with different colors are included. These lines show the SNR ranges where different modulation formats should be used (lines with diamonds for SD-FEC and lines with asterisks for HD-FEC) and are obtained from Figs. 8 and 6.
Fig. 9. … Figs. 2, 3, and 4. These values are obtained by assuming all transceivers can achieve the capacity C in (11). The horizontal lines show the SNR ranges in which different formats should be used. As in Figs. 6 and 8, asterisks and diamonds represent the SNR thresholds for HD- and SD-FEC, resp. The SNR thresholds for QPSK with 7% OH are also shown with green pentagons.
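The threshold-based selection illustrated by the horizontal lines in Fig. 9 can be sketched as a lookup. First pick the format whose SNR range contains the route SNR, then set the code rate so that the SE $2R_c\log_2 M$ matches the achievable rate of that format. Only the two HD-FEC thresholds quoted in Sec. III-B (5.8 dB and 14 dB) are used below. Using 64QAM above 14 dB is an assumption of this sketch, because the thresholds for 256QAM and 1024QAM are not given in the text. `achievable_rate` is a hypothetical input in bit/sym for two polarizations.

```python
import math

# (lower SNR threshold in dB, M) for HD-FEC, ascending, from the values quoted in
# Sec. III-B. Using 64QAM above 14 dB is an assumption (higher thresholds omitted).
HD_THRESHOLDS = [(5.8, 16), (14.0, 64)]


def select_format(snr_db, thresholds=HD_THRESHOLDS):
    """Return M for a route SNR; QPSK (M = 4) below the first threshold."""
    M = 4
    for threshold_db, candidate in thresholds:
        if snr_db >= threshold_db:
            M = candidate
    return M


def matching_code_rate(achievable_rate, M):
    """Code rate R_c such that SE = 2 R_c log2(M) equals achievable_rate."""
    return achievable_rate / (2.0 * math.log2(M))


if __name__ == "__main__":
    for snr_db in (4.0, 9.0, 16.0):
        print(snr_db, "dB ->", select_format(snr_db), "QAM")
```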
To obtain the throughput achieved by ideal HD- and SD-FEC, we again use (10) and follow steps similar to those used in Sec. III-C. Namely, the SNRs of the routes (shown in Fig. 9) are first grouped for each source-destination pair, and the SNRs are then "mapped" to throughputs via the achievable rates discussed in Sec. III-B. The network throughputs obtained for HD-FEC are 431, 217, and 64 Tbps for the DTG, NSF, and GB4 networks, resp. For SD-FEC, these values become 488, 255, and 81 Tbps. These throughput results are shown in the last two columns of Table II. The results in Table II show that, when compared to the maximum throughput obtained via the AWGN capacity assumption (fourth column in Table II), the use of ideal SD-FEC causes a relative throughput decrease of 9%, 8%, and 7% for the DTG, NSF, and GB4 networks, resp. This indicates a relatively constant loss across different network topologies. On the other hand, the use of ideal HD-FEC causes relative losses of 18%, 22%, and 27%. These results show an increasing loss as the network size increases, which in turn shows the importance of considering SD-FEC for large networks. We conjecture that these increasing losses are due to the different shapes of the "envelopes" of the crossing achievable rates in Figs. 5 and 7.
When compared to SD-FEC, HD-FEC codes typically have low complexity and low latency. On the other hand, for the same SNR, HD-FEC codes need a higher OH to operate error-free, which causes a throughput loss. The relative throughput losses are approximately 12%, 15%, and 20% for the DTG, NSF, and GB4 networks, resp. This indicates that low complexity and latency can be traded for a 10% to 20% loss in throughput, and that the use of SD-FEC becomes more and more important as the network size increases.

IV. PRACTICAL SCHEMES
Due to the continuous code rate assumption, the throughputs in the last two columns of Table II are only upper bounds that cannot be achieved in practice. In this section we discuss practically relevant alternatives.

A. QPSK with 7% FEC OH
Probably the simplest (and most popular) alternative in terms of coding and modulation for an optical network is to use QPSK and a fixed FEC OH of 7% across the network. In this case, if the SNR of a given route is below the SNR required for QPSK with 7% OH, the route will not be used. If the SNR is above the threshold, then the available throughput in the $r$th route $C^{(r)}_{s,d}$ is given by the SE in (1) times the symbol rate. The total available throughput (in [Tbps]) in (7) is then given by
$$C_{s,d} = \frac{4}{1.07} \cdot 0.032 \sum_{r=1}^{R_{s,d}} \mathbb{I}\!\left[\mathrm{SNR}^{(r)}_{s,d} \ge \mathrm{SNR}_{7\%}\right] \approx 0.1196 \sum_{r=1}^{R_{s,d}} \mathbb{I}\!\left[\mathrm{SNR}^{(r)}_{s,d} \ge \mathrm{SNR}_{7\%}\right], \tag{14}$$
where $\mathrm{SNR}^{(r)}_{s,d}$ is the SNR of the $r$th route and $\mathrm{SNR}_{7\%}$ is the SNR required for QPSK with 7% OH (shown with green pentagons in Figs. 6, 8, and 9).
The total network throughput in (10) (in Tbps) is then
$$\Theta \approx 0.1196\, N(N-1) \min_{s \neq d} \sum_{r=1}^{R_{s,d}} \mathbb{I}\!\left[\mathrm{SNR}^{(r)}_{s,d} \ge \mathrm{SNR}_{7\%}\right], \tag{15}$$
where $\mathbb{I}[\nu]$ is an indicator function, i.e., $\mathbb{I}[\nu] = 1$ if $\nu$ is true, and $\mathbb{I}[\nu] = 0$ otherwise. The SNR threshold $\mathrm{SNR}_{7\%}$ in (15) is different for HD- and SD-FEC, and thus, the resulting throughputs might also be different. However, for both the DTG and NSF networks, all the route SNRs are above both thresholds, and thus, the total network throughput in (15) is
$$\Theta \approx 0.1196\, N(N-1) \min_{s \neq d} R_{s,d}. \tag{16}$$
The minimum numbers of routes for any source-destination pair in the DTG and NSF networks are 14 and 4, resp., which, combined with the number of nodes in each network, give throughputs of 120 and 87 Tbps. For QPSK and 7% FEC OH, SD-FEC offers a theoretical maximum sensitivity increase of about 1.15 dB (obtained by comparing the green pentagons for HD- and SD-FEC in Figs. 6 and 8). However, for the DTG and NSF networks, there is no difference between HD- and SD-FEC in terms of network throughput, as the minimum SNR of all routes is above the SNR threshold. This result highlights that, under these conditions and traffic assumptions, upgrading all transceivers from HD- to SD-FEC without changing the OH might not bring any benefit for small networks.
When the GB4 network is considered, however, most of the routes are in fact below the SNR required for QPSK with 7% OH (see Fig. 9). For this case, the network throughput given by (15) is in fact zero. This can be intuitively explained by the fact that there are node pairs that are very far apart, and thus, the uniform throughput constraint and full network connectivity cannot be satisfied.
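The QPSK-with-7%-OH throughput reduces to counting, for each node pair, the routes whose SNR clears $\mathrm{SNR}_{7\%}$. The sketch below does exactly that. Names and the dictionary layout are illustrative, and the 8 dB threshold and 15 dB route SNRs in the example are placeholders. With 14 nodes and four usable routes for the worst pair, the sketch gives about 87 Tbps, matching the NSF figure above. A single pair without any usable route drives the result to zero, as observed for GB4.

```python
from itertools import permutations

SE_QPSK_7 = 2 * 2 / 1.07   # SE in (1) for QPSK with 7% OH [bit/sym]
SYMBOL_RATE = 32e9         # [baud]


def qpsk7_throughput(route_snrs_db, snr7_db):
    """Network throughput [bit/s] when every transceiver uses QPSK with 7% OH."""
    nodes = sorted({node for pair in route_snrs_db for node in pair})
    n = len(nodes)
    usable = [
        sum(1 for s in route_snrs_db.get(pair, []) if s >= snr7_db)
        for pair in permutations(nodes, 2)
    ]
    return n * (n - 1) * SE_QPSK_7 * SYMBOL_RATE * min(usable)


if __name__ == "__main__":
    # NSF-like toy case: 14 nodes, every pair has 4 routes above the threshold.
    nsf_like = {pair: [15.0] * 4 for pair in permutations(range(14), 2)}
    print(f"{qpsk7_throughput(nsf_like, snr7_db=8.0) / 1e12:.1f} Tbps")
```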

B. One M Schemes
As an alternative to the QPSK with 7% FEC OH approach, in this section we consider two approaches, both of them based on the philosophy that only one modulation format should be implemented across the network.
The first scheme assumes all transceivers implement one modulation format and one code rate. We call this scheme $1R_c$1M. In this scheme, the code rate is chosen such that the network throughput (based on achievable rates) is maximized. The results obtained for $1R_c$1M are shown in Fig. 10 with red triangles for HD-FEC and blue triangles for SD-FEC. The top panels show the network throughput, while the bottom panels show the optimum code rates. The different networks under consideration are shown from left to right. In the throughput results, we also include the ideal network throughputs in Table II (solid horizontal lines) as well as the results obtained with QPSK and 7% FEC OH from Sec. IV-A (green pentagons).
The results in Fig. 10 show that for HD-FEC and $1R_c$1M, there is always a clearly optimum modulation format: $M = 64$ for DTG, $M = 16$ for NSF, and $M = 4$ for GB4. For both HD- and SD-FEC, the gains obtained (with respect to QPSK with 7% OH) by using $1R_c$1M are quite large. Interestingly, the optimum code rates for HD-FEC and $1R_c$1M are $R_c \approx 0.8$ for DTG, $R_c \approx 0.62$ for NSF, and $R_c \approx 0.56$ for GB4 (24%, 61%, and 79% FEC OH, resp.). This indicates a clear potential benefit of using large FEC OHs when HD-FEC is used in a network context and the modulation format is fixed across the network.
When it comes to $1R_c$1M with SD-FEC, the optimality of a given modulation format is less clear, as the throughput curves in this case do not have a clear peak. This is because the crossings between the GMI curves are not as pronounced as the crossings for HD-FEC. Nevertheless, by observing the trend of the curves, a good compromise would be to choose the same modulation formats as for HD-FEC, i.e., $M = 64$ for DTG, $M = 16$ for NSF, and $M = 4$ for GB4. In this case, the corresponding optimum code rates are $R_c \approx 0.89$, $R_c \approx 0.75$, and $R_c \approx 0.68$ (12%, 33%, and 47% FEC OH, resp.). In general, the optimum code rates in this case are slightly higher than the ones for HD-FEC. The gains of SD-FEC over HD-FEC in this case are approximately 50 Tbps for the DTG and NSF networks. On the other hand, only small gains are observed for the GB4 network.
As mentioned before, the throughput for QPSK with 7% FEC OH for the GB4 network is zero. On the other hand, for this network, $1R_c$1M gives throughputs of about 20 Tbps. This is obtained by using QPSK and an increased OH. This result highlights the need for considering high FEC OHs in large networks.
The second scheme we consider in this section is called $2R_c$1M. In this case, the transceivers are equipped with two code rates but only one modulation format. Again, the code rates are optimized so that the network throughput is maximized.
The results obtained for $2R_c$1M are shown in Fig. 10 with circles (red for HD-FEC and blue for SD-FEC). These results show that, regardless of the network under consideration, half of the gap between the ideal FEC limit (horizontal lines) and the throughput obtained by $1R_c$1M can be harvested by using an extra code rate. These results highlight the advantage of adapting the code rate to the variable channel conditions across the network. They also suggest that a good complexity-performance tradeoff is obtained by using one modulation format and two code rates.
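The code-rate optimization behind the one- and two-rate schemes with a fixed modulation format can be sketched as an exhaustive search. Suppose each route's largest supported code rate is known (e.g., its achievable rate divided by $2\log_2 M$). A route then uses the largest implemented rate not exceeding its limit. The implemented rates can be restricted to the set of route limits, because raising a rate up to the next route limit never removes a route and never lowers any route's rate. In the sketch below, the route limits are made-up numbers, the throughput is normalized to a unit symbol rate, and all function names are hypothetical.

```python
import math
from itertools import combinations, permutations


def route_rate(limit, rates):
    """Largest implemented code rate not exceeding the route's limit (0 if none)."""
    usable = [r for r in rates if r <= limit]
    return max(usable) if usable else 0.0


def theta(route_limits, rates, M):
    """Uniform-traffic throughput N(N-1) min_{s!=d} C_{s,d}, unit symbol rate."""
    nodes = sorted({node for pair in route_limits for node in pair})
    n = len(nodes)
    se = 2.0 * math.log2(M)  # SE per unit code rate, from (1)
    c = [sum(se * route_rate(lim, rates) for lim in route_limits.get(p, []))
         for p in permutations(nodes, 2)]
    return n * (n - 1) * min(c)


def best_rates(route_limits, M, k):
    """Exhaustive search over all k-subsets of the distinct route limits."""
    candidates = sorted({lim for lims in route_limits.values() for lim in lims})
    return max(((theta(route_limits, rs, M), rs)
                for rs in combinations(candidates, k)), key=lambda t: t[0])


if __name__ == "__main__":
    limits = {(0, 1): [0.9, 0.5], (1, 0): [0.9, 0.5]}
    for k in (1, 2):
        print(k, best_rates(limits, M=4, k=k))
```

In this toy case, the single best rate is the low one (0.5), which serves both routes. With two rates, the low rate keeps the weak route alive while the high rate lifts the strong one, consistent with the intuition given at the end of Sec. IV-C.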

C. Variable M Schemes
In this section we consider a design alternative where the modulation format is variable. In particular, we assume all the transceivers can choose any modulation format between QPSK and a given maximum value of $M$, which we denote by $\hat{M}$. The first scheme we consider assumes only one code rate is implemented across the network, and that the code rate is optimized to maximize the network throughput. We denote this scheme $1R_c$VarM. The results obtained using this scheme are shown in Fig. 11 (squares) and indicate that, in general, constellation sizes beyond 256QAM give little throughput increase (the throughput curves flatten out for large values of $\hat{M}$).
Figs. 12 and 13 show the percentage of transceivers using different modulation formats for $1R_c$VarM, for HD- and SD-FEC, resp., and for $\hat{M} = 16, 64, 256$. The value in the middle of each chart is the total throughput obtained. The general trend in these results is that QPSK (green) is only useful in the very large network (GB4). These results also show that considering 256QAM (light blue) gives very little throughput increase for the NSF and GB4 networks, for both HD- and SD-FEC. On the other hand, 256QAM gives a relevant throughput increase for the DTG network.
To compare the throughput contribution of different modulation formats for $1R_c$VarM, we show in Fig. 14 the obtained throughputs for both HD- and SD-FEC. Apart from showing the relative contributions, this figure also shows how the contributions change when SD-FEC is considered instead of HD-FEC. The throughput gains due to SD-FEC are also clearly visible.
The results in Figs. 11, 12, 13, and 14 show that $1R_c$VarM gives throughput results similar to those obtained by $2R_c$1M (two code rates and one modulation format) shown in Fig. 10. This indicates that similar (large) gains can be obtained by having either multiple code rates or multiple modulation formats.
The second scheme we consider in this section is one where two code rates are implemented across the network (and the modulation can be varied too). We call this scheme $2R_c$VarM. The results are shown with plus-circles in Fig. 11. These results indicate that including a second code rate gives a clear advantage with respect to $1R_c$VarM (squares). This gain is particularly visible for large values of $\hat{M}$ and large networks. In particular, for the GB4 network and $\hat{M} = 256$, the gains are approximately 15 Tbps for both HD- and SD-FEC.
We conclude this section by comparing the optimal code rates of the two-rate schemes with those of the one-rate schemes. In particular, we observe from Figs. 10 and 11 that the smallest code rate of each two-rate scheme is always quite close to the optimal code rate of the corresponding one-rate scheme. The intuition behind this is that when the transceivers are equipped with two code rates, the lower code rate can be used for the worst-performing connections, while the other rate is used to increase the overall network throughput.

V. CONCLUSIONS
Optimal constellation sizes and FEC overheads for optical networks were studied. Joint optimization of the constellation and FEC OHs was shown to yield large gains in terms of overall network throughput. The optimal values were shown to depend on the SNR distributions within a network. Two code rates and a single constellation (which varies as a function of the network size) gave a good throughput-complexity tradeoff.
In this paper we studied the problem from the point of view of the largest achievable rates. Practical FEC implementations, however, will operate a few decibels (or fractions of decibels) away from these achievable rates. Nevertheless, if these penalties are known a priori, the methodology used in this paper can be straightforwardly used to consider practical codes. This is left for future investigation.