InP photonic integrated multi-layer neural networks

We demonstrate the use of a wavelength converter, based on cross-gain modulation in a semiconductor optical amplifier (SOA), as a nonlinear function co-integrated within an all-optical neuron realized with SOA and wavelength-division multiplexing technology. We investigate the impact of fully monolithically integrated linear and nonlinear functions on the all-optical neuron output with respect to the number of synapses/neuron and data rate. Results suggest that the number of inputs can scale up to 64 while guaranteeing a large input power dynamic range of 36 dB with negligible error introduction. We also investigate the performance of its nonlinear transfer function by tuning the total input power and data rate: the monolithically integrated neuron performs about 10% better in accuracy than the corresponding hybrid device for the same data rate. These all-optical neurons are then used to simulate a 64:64:10 two-layer photonic deep neural network for handwritten digit classification, which shows an 89.5% best-case accuracy at 10 GS/s. Moreover, we analyze the energy consumption for synaptic operation, considering the full end-to-end system, which includes the transceivers, the optical neural network, and the electrical control part. This investigation shows that when the number of synapses/neuron is > 18, the energy per operation is < 20 pJ (6 times higher than when considering only the optical engine). The computation speed of this two-layer all-optical neural network system is 47 TMAC/s, 2.5 times faster than state-of-the-art graphics processing units, while the energy efficiency is 12 pJ/MAC, 2 times better. This result underlines the importance of scaling photonic integrated neural networks on chip.


I. INTRODUCTION
Massive volumes of data demand wider capacity and higher speed of information processing. The extraction of useful information from databases remains a challenge, as it requires huge power and processing time. The new computing paradigm of non-von Neumann architectures has begun to unfold, 1 leading to the development of large neuromorphic machines that now exceed the energy and size-efficiency walls of classical platforms, 2-8 because of their inherently parallel computational schemes. These deployments are mainly based on the spiking architectural model, 9 which very recently has shown the potential to outperform multi-layer perceptron (MLP) models. 10 Nevertheless, being more complex, these models are still not fully understood, unlike the more advanced Deep Learning (DL) models. The rich DL model portfolio can indeed be utilized in digital graphics processing unit (GPU) and tensor processing unit (TPU) engines as well as in the constantly growing number of emerging artificial neural network (ANN)-based analog electronic AI chipsets: Mythic's architecture, 11 for example, can yield high-accuracy inference applications with a remarkable energy efficiency of just 0.5 pJ/MAC. However, the size and energy advantages of electronic processing elements are naturally counteracted by the speed and power limits of the electronic interconnects inside the circuits due to RC parasitic effects, with current machines hardly exceeding GHz clock frequencies, exacerbating power dissipation issues, and limiting the achievable data throughput. 12 The neuromorphic approach has also been applied to optical computing: in contrast to electronics, there is negligible energy overhead for moving light-encoded information around, which enables unprecedented circuit interconnectivity and speed. Moreover, bitrate-agnostic photonics has the potential to enable higher bandwidth applications.
A number of photonic accelerators have been proposed based on discrete optical components and micro-optics as well as on photonic integrated devices. [13][14][15] This emerging technology is capable of producing high processing bandwidths with high power efficiency. 16 The large parallelism, energy efficiency, and ease of broadcast/multicast capabilities of photonics are well suited for the design of highly efficient and scalable neural network accelerators. By exploiting the properties of photonics, linear transformations can efficiently be performed at high data rates without consuming significant power. 17,18 The advantages of the parallel nature of light are now being exploited via coherent electrical field summation 19,20 and wavelength-division multiplexing (WDM) optical power addition based photonic integrated networks; [21][22][23][24] however, crosstalk, noise accumulation, and low dynamic range prevent further scalability, even when using phase change materials for zero-electrical power computation. 23 Recently, we have proposed a new deep neural network (DNN) architecture that exploits indium phosphide based photonic integrated circuits. 24,25 By setting the gain of the semiconductor optical amplifier (SOA) as the (trained) weighted factor to the WDM input, the cross-connect is used as an analog engine with off-line nonlinear functions. Feeding the layer output back to the optical input and reconfiguring the on-chip weight matrix, a feed-forward photonic neural network is demonstrated. 25 A linear synaptic function, also called weighted addition, and a nonlinear function, known as activation function, are the base functions of an artificial neuron. Photonic integrated linear 19,26,27 and nonlinear functions 26,28 have been recently demonstrated, relying on hybrid integration schemes or involving electro-optical conversions, preventing further scalability of photonic neural networks.
In this paper, we analyze the performance of a deep neural network architecture concept based on the use of photonic cross-connects, where the combination of space and wavelength selection is exploited to implement, respectively, the axon terminals and the synaptic operations in a photonic artificial neural network. After the description of the overall computational architecture in Sec. II, the co-integration of SOA-based synaptic operations [in the form of a combination of SOAs and arrayed waveguide gratings (AWGs)] and nonlinearity [in the form of a fully integrated wavelength converter (WC) based on cross-gain modulation (XGM)] is studied in Sec. III to enable a fully monolithically integrated all-optical neuron and therefore an all-optical neural network (AONN). In Sec. IV, we simulate the all-optical network on the handwritten digit classification problem to evaluate the final accuracy. Finally, in Sec. V, we analyze the energy consumption of the complete end-to-end system.

II. ALL-OPTICAL SOA-BASED DNN
The overall envisaged all-optical deep neural network scheme based on the use of the cross-connect circuitry is depicted in Fig. 1. The wavelength division multiplexed (WDM) signal from N input neurons (one wavelength from each input neuron) is fanned out toward the following M neurons of the first hidden layer. At each ith neuron of this layer (highlighted through an orange box), the multiple-wavelength signal is demultiplexed into N signals, which are multiplied, via an SOA, by the weight w_{i,j,k} of the ith neuron from the jth axon (λ_j) in the kth layer. The weighted signals, being encoded in different colors, are then summed up via an AWG-based multiplexer. This first circuitry (black dashed box in Fig. 1) corresponds to the linear part of the ith neuron, whose output is sent to the nonlinear function block NL_{i,k} of the same neuron of the kth hidden layer. This block is realized via an SOA (red dashed box in Fig. 1), where XGM is used to output wavelength-converted light, modulated by the total power of all WDM channels at the input of the SOA-based wavelength converter (SOA-WC). The conversion produces a different single wavelength, λ_i′, which represents the output of this neuron. The outputs from all the neurons of the kth hidden layer are then combined and broadcast again toward all the neurons of the next hidden layer, and so on. It is important to note that the shuffle network here is obtained by combining AWGs and one large 1:M splitter (for example, when moving from the first to the second hidden layer), instead of M separate AWGs, which would worsen crosstalk as well as introduce a deleterious path-dependent loss.
Here, the SOA technology is exploited in combination with the AWG technology for multiple reasons: the optical amplifiers are employed for setting the weight matrix and providing on-chip gain for scalability, while the AWGs are used to filter out the out-of-band noise built up by cascading multiple stages of SOAs in order to increase the weight resolution as well as to carry out the needed multiplexing and demultiplexing functions. In Ref. 25, we have demonstrated the synaptic operation using an 8 × 8 InP SOA-based cross-connect chip, followed by an array of photodetectors, to further process the signals in the electrical domain and to perform an analysis of the sources of error. Indeed, a reduction in accuracy occurred, which was dominated by the electro-optical conversions needed to move from the optical linear function to the electrical nonlinear function as well as to progress from one layer on chip to the next one, which suggested moving to an all-optical approach. 25 We now investigate other sources of error and the scalability properties of the linear neuron, specifically the crosstalk as a function of the number of channel inputs (axons). The optical crosstalk, coming from the AWGs, limits the scalability of the linear circuitry as the number of neuron inputs (channels) increases. For this reason, we analyze the normalized root mean square error (NRMSE) of the synaptic unit of the neuron, as shown in Fig. 1 (black dashed box), while tuning the total input power and the total number of neuron inputs for a channel spacing of 100 GHz. This has been analyzed via the VPIphotonic Design Suite (using parameters as detailed in Ref. 29). In Fig. 2(a), the colored lines represent the error obtained after the synaptic unit for 4, 8, 16, 32, and 64 inputs/neuron while tuning the total input power of the WDM input to the neuron from −25 to 20 dBm. The results show that there is an optimal optical power operation point at which the error is minimum.
It is notable how this point shifts downward and to the right for a larger number of neuron inputs. To better visualize and explain this trend, the same NRMSE is plotted as a function of the number of input channels for a fixed input power of 5 dBm at the AWG input [see Fig. 2(b)]. When increasing the number of neuron inputs, the error decreases, as shown by the blue line, and it starts slightly increasing only for a number of channels higher than 32. The vertical scaling of the neural network (height), and therefore a higher channel number, results in an increase in the resolution of the linear summation output, since more channels at the input contribute to increasing the total number of output signal levels within the same dynamic range, resulting in a smoother output signal pattern. In particular, the error is found to increase for 64 channel inputs due to the limited modeled SOA bandwidth (71.5 nm 3-dB gain bandwidth); in fact, 64 channels spaced by 100 GHz already fill up a 51.2 nm bandwidth. The red line in Fig. 2(b) plots the input power dynamic range (IPDR) as a function of the number of input channels per neuron for an NRMSE < 0.09: for this level of error, we have previously shown that a three-layer neural network results in <5% degradation of the prediction accuracy for an image classification problem. 29 The IPDR increases from 25 to 36 dB, which is partly attributable to the large SOA linear regime (−5 dBm input saturation power) but also to the fact that, with an increasing number of input channels, the power fed to each individual weight SOA is much lower than the input saturation power, so that the SOAs work in the linear regime over a wider input power range. The trend slows down when the number of channels approaches 64, since we come closer to the bandwidth edges of the SOA.
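The bandwidth occupancy quoted above can be checked with a few lines of arithmetic. The sketch below is only illustrative: it assumes a 1550 nm center wavelength (not stated in the text) and uses the small-span approximation Δλ ≈ λ²·Δf/c; the function name `comb_width_nm` is hypothetical.

```python
# Illustrative check of the spectral width occupied by a WDM comb.
# Assumption: 1550 nm center wavelength; small-span approximation.
C = 299_792_458.0  # speed of light in vacuum, m/s

def comb_width_nm(n_channels: int, spacing_hz: float, center_nm: float = 1550.0) -> float:
    """Convert the total frequency span n_channels * spacing_hz into nm."""
    span_hz = n_channels * spacing_hz
    center_m = center_nm * 1e-9
    return (center_m ** 2) * span_hz / C * 1e9

width = comb_width_nm(64, 100e9)
soa_bandwidth_nm = 71.5  # modeled 3-dB gain bandwidth from the text
print(f"64 x 100 GHz occupies about {width:.1f} nm "
      f"({100 * width / soa_bandwidth_nm:.0f}% of the SOA gain bandwidth)")
```

Under these assumptions the comb fills roughly 70% of the modeled gain bandwidth, which is consistent with the error increase reported for 64 channels.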

III. SOA-BASED INTEGRATED ALL-OPTICAL NEURON
So far, we have proposed to use SOA and AWG to implement optical linear neurons. 24 In this section, we investigate the possibility of realizing an all-optical SOA-based neuron to realize multi-layer neural networks and avoid electro-optical conversions for improving energy efficiency while still guaranteeing a good accuracy. To this aim, the optical output of the synaptic operation is input straight to an SOA-based nonlinear function. The exploitation of SOA-based circuits for both the linear and nonlinear functions of an artificial photonic neuron enables the monolithic integration of both functionalities to overcome optical loss issues deriving from a hybrid approach. We first study the nonlinear function based on a wavelength converter (Sec. III A), and then we investigate the overall performance of the complete neuron, integrating the optical linear neuron with the SOA-WC optical nonlinear function (Sec. III B).
Before describing the experimental measurements and simulations in Subsections III A and III B, it is important to discuss the assumptions made on any four-wave mixing (FWM) effect happening within the nonlinear SOA. Depending on the wavelength separation, input power, and the number of WDM channels, the FWM inside the SOA may have a non-negligible influence on the overall performance. However, this effect is not considered in the simulation, nor is it observed experimentally. In fact, the effect is negligible when the detuning between the probe channel and the pump frequency satisfies Δf ≫ 1/(2π ⋅ τ), where τ is the carrier lifetime of the SOA used. In this work, the carrier lifetime is estimated to be 200 ps in the worst case, 30 and the channel spacing at the input is 100 and 400 GHz for the simulations and the experimental work, respectively, which results in a detuning far greater than 1/(2π ⋅ τ) ≈ 1 GHz. By exploiting the methods in Refs. 31 and 32, we estimate that the conjugate signal generated by the FWM effect has a power below −64 dBm when the detuning used in simulation is >100 GHz, which is even lower than the spontaneous emission noise at the neuron output. Moreover, in order to suppress the FWM for a larger number of input channels, we control the total input power of the neuron by defining an appropriate scaling of the neural network. In our approach, the network size scales up with the number of input channels N and with the same number of neurons M. In this way, the total input power to the neuron stays constant when scaling N and M, i.e., for each channel power p_0, the total input power at the layer input, N ⋅ p_0, is split toward M neurons as N ⋅ p_0/M. When setting N = M, the total power to each neuron (yellow box in Fig. 1) is p_0, and the power for each channel is p_0/N.
Under this condition, the FWM effect is reduced due to the decrease in the input signal power as well as to the larger detuning of the individual input channels. Finally, by using unequal channel spacing at the WDM inputs, 33,34 the FWM effect can be suppressed even further.
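The two numerical arguments above (the detuning criterion and the constant per-neuron power) can be reproduced as follows. This is a minimal sketch assuming an ideal, lossless 1:M split; the function names are hypothetical and the channel power is normalized.

```python
import math

def fwm_corner_hz(carrier_lifetime_s: float) -> float:
    """Corner frequency 1/(2*pi*tau) used in the detuning criterion."""
    return 1.0 / (2.0 * math.pi * carrier_lifetime_s)

def power_per_neuron(p0: float, n_inputs: int, m_neurons: int) -> float:
    """Total power reaching one neuron: N * p0 split over M neurons.
    Assumption: ideal splitter without excess loss."""
    return n_inputs * p0 / m_neurons

corner = fwm_corner_hz(200e-12)   # worst-case carrier lifetime of 200 ps
ratio_sim = 100e9 / corner        # detuning margin for the 100 GHz grid (simulation)
ratio_exp = 400e9 / corner        # detuning margin for the 400 GHz grid (experiment)
print(f"corner = {corner / 1e9:.2f} GHz, margins: {ratio_sim:.0f}x and {ratio_exp:.0f}x")

# With N = M the per-neuron power equals p0 and each channel carries p0 / N.
p0 = 1.0  # normalized channel power
for n in (4, 16, 64):
    total = power_per_neuron(p0, n, n)
    print(n, total, total / n)
```

The corner frequency comes out slightly below 1 GHz, so both channel grids sit more than two orders of magnitude above it.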
The experimental setup used for the SOA-based all-optical neuron investigation is depicted in Fig. 3(a), with a micrograph of the fabricated chip shown in Fig. 3(b). A four-channel WDM optical input is composed of signal wavelengths set at 1544.0, 1546.0, 1551.0, and 1554.0 nm in order to match the nominal 3.2 nm channel separation of the on-chip AWGs, which have a 3-dB bandwidth of 0.8 nm, and to maximize the optical output power of each channel. The input is modulated with pseudorandom binary sequence (PRBS) on-off keyed (OOK) data, generated by an arbitrary waveform generator (Tektronix, AWG7122B), and sent to the input of the integrated all-optical neuron after de-correlation. The WDM optical inputs are equalized and set at −12 dBm power per channel.
Inside the neuron in Fig. 3(b), the inputs are amplified with a booster SOA, which is utilized to optimize the total power at the input of the weighting SOAs. Then, the signals are weighted with individual SOAs after channel demultiplexing by the AWG and combined again with an AWG-based multiplexer and fed to the SOA-WC based optical nonlinear function, whose pump laser is fully integrated on chip. This provides a converted output at 1549.0 nm, chosen to be close to the center of the WDM channel bandwidth for optimizing the wavelength conversion.
The weighting SOAs are controlled by a weighting current controller (Thorlabs, MLC8200CG), with 50 μA resolution, to provide the weights with 10-bit precision, which exceeds the precision required for image classification. When further scaling the number of neuron synapses, the synapse current control is envisioned to be realized by means of a field-programmable gate array (FPGA) controlling multi-channel current drivers. 35 In the future, the parallel development of ultra-compact driver ICs, of new electronic interface techniques, and of clever electrical control schemes seems to be a viable route toward enabling control of larger photonic networks on chip.
The individual weight SOAs are calibrated to compensate for the wavelength conversion non-uniformity among the different channels: this calibration happens prior to the assignment of the actual weighting factors. The noise figure and the saturation output power of these SOAs are 7 dB and 8 dBm, respectively. The output of the all-optical neuron is detected by a linear avalanche photodetector (PD), and the time trace is recorded by a digital phosphor oscilloscope. The performance of the neuron is again evaluated by calculating the NRMSE between the recorded and the expected time traces at the output of the NL function, calculated using the reference pre-recorded inputs. The synaptic operation of the neuron can be expressed as a weighted addition of parallel inputs: y = ∑_i w_i x_i, where w_i is the ith weight element for input x_i, and the final output of the neuron is o = φ(y), where φ is the nonlinear transfer function of the SOA-WC.
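The neuron model and the error metric above can be sketched numerically as follows. The text does not spell out how the NRMSE is normalized, so the range normalization used here is an assumption, as are the synthetic OOK data, the weights, and the placeholder inverter used for φ.

```python
import numpy as np

def neuron_output(x, w, phi):
    """x: (n_channels, n_samples) inputs, w: (n_channels,) weights.
    Implements y = sum_i w_i x_i followed by o = phi(y)."""
    y = w @ x
    return phi(y)

def nrmse(measured, expected):
    """RMSE normalized by the range of the expected trace.
    Assumption: the paper's exact normalization is not stated."""
    measured = np.asarray(measured, dtype=float)
    expected = np.asarray(expected, dtype=float)
    rmse = np.sqrt(np.mean((measured - expected) ** 2))
    return rmse / (expected.max() - expected.min())

rng = np.random.default_rng(0)
x = rng.integers(0, 2, size=(4, 1000)).astype(float)  # four synthetic OOK channels
w = np.array([0.1, 0.2, 0.3, 0.4])                     # hypothetical weights
inverter = lambda y: 1.0 - y                           # placeholder for phi, not the measured NL-TF

expected = neuron_output(x, w, inverter)
measured = expected + rng.normal(0.0, 0.05, expected.shape)  # synthetic detection noise
err = nrmse(measured, expected)
print(f"NRMSE = {err:.3f}")
```

In the experiment, `measured` would be the oscilloscope trace and `expected` the trace computed from the pre-recorded inputs.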

A. Integrated SOA-based non-linear function
The wavelength converter is the nonlinear device that we exploit as an optical nonlinear function within the neuron. The SOA-WC is also integrated on the InP platform, with an on-chip tunable laser. 36 The integration of the all-optical nonlinear function allows us to demonstrate a monolithically integrated SOA-based all-optical neuron. 37 In order to measure only the transfer function of the nonlinear part, working at first as a simple inverter, we record the PRBS OOK input of the neuron and the output of the SOA-WC. The correlation map of the two is the nonlinear transfer function (NL-TF), which we can use to calculate the expected output for the entire all-optical neuron. The blue time trace in Fig. 4(a) plots the pre-recorded 2 Gbit/s single-channel input signal. Figure 4(b) presents the detected output of the SOA-WC based NL-TF (blue line) and the expected inverted signal [calculated from Fig. 4(a), with the linear transform as reference, Lin. Ref.] (red line), resulting in an error of 0.14. By plotting the correlation map between the input and the output of the integrated SOA-WC detected at the PD, the optical nonlinear transfer function is illustrated in Fig. 4(d), where the blue crosses are the data, the black line is the linear transform, and the red line is the third-order polynomial fit of the nonlinear transform. The nonlinear function shape is mainly due to the nonlinear response of the SOA used as the wavelength converter when the booster works in transparency, with a current density of 1 kA/cm², and with a weighting current density of 3 kA/cm² on average (linear regime). Then, the same nonlinear function shape is utilized as a nonlinear reference (Nonlin. Ref.) to calculate the real expectation of the output, as shown in Fig. 4(c), resulting in a smaller error of 0.08.
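The comparison between the linear reference and the third-order polynomial reference can be illustrated with `numpy.polyfit`. The sketch below uses synthetic input/output samples from a made-up inverter-like curve; it does not reproduce the measured NL-TF, and all coefficients and noise levels are assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic stand-in for the recorded input/output correlation map.
x_in = rng.uniform(0.0, 1.0, 2000)
made_up_tf = lambda x: 1.0 - 1.5 * x + 0.9 * x**2 - 0.4 * x**3  # hypothetical curve
y_out = made_up_tf(x_in) + rng.normal(0.0, 0.02, x_in.size)

lin_coeffs = np.polyfit(x_in, y_out, 1)  # linear transform (Lin. Ref.)
cub_coeffs = np.polyfit(x_in, y_out, 3)  # third-order fit (Nonlin. Ref.)

def rms_residual(coeffs):
    """RMS difference between the fitted curve and the samples."""
    return float(np.sqrt(np.mean((np.polyval(coeffs, x_in) - y_out) ** 2)))

lin_err = rms_residual(lin_coeffs)
cub_err = rms_residual(cub_coeffs)
print(f"linear reference residual: {lin_err:.4f}")
print(f"cubic reference residual:  {cub_err:.4f}")
```

Because the cubic model contains the linear one, its least-squares residual can never be larger, which mirrors the lower error obtained with the nonlinear reference.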
The pump input power of the wavelength converter can be tuned by increasing the current of the booster: a different level of pump input power provides a different transfer function shape. The output level "−1" (for input level "1") tends to saturate when increasing the booster current because of the nonlinearity changes due to the increased input probe power to the SOA-WC and because of the nonlinearity contributed by the booster SOA itself. This confirms that we can tailor the nonlinear transfer function by acting on the booster current. We also explore the SOA-WC based NL-TF shape as a function of the data rate: Fig. 5(b) plots the nonlinear function when the input data rate is 2, 4, 6, 8, and 10 Gbit/s, with the blue, red, yellow, purple, and green lines, respectively, when the booster is at 1 kA/cm². The shape of the nonlinear function changes only slightly when increasing the input data rate. We then translate these findings into performance metrics of the optical nonlinear function by calculating the NRMSE with respect to different shapes of the nonlinear function. Figure 5(c) plots the error variations of the output of the NL-TF when tuning the injection current density of the booster SOA from 0 to 2.5 kA/cm², with the blue line obtained when considering the neuron output as the linear inverted output (with the linear transform as reference) and the red line when considering the nonlinear transform as reference. The booster SOA is operated in the linear regime to minimize the nonlinearities introduced at the weighting element inputs, since the overall weighted addition is meant to be a linear operation. By changing the booster SOA current, we can find the optimal operation point that minimizes the error induced by the nonlinear function, in this case corresponding to a current density of 1 kA/cm² [Fig. 5(c)]. The noticeable offset between the blue and red curves indicates that accounting for the nonlinearity of the SOA-WC substantially reduces the error. Figure 5(d) plots the error variation when changing the data rate of the input from 2 to 10 Gbit/s, with the blue line showing the error related to the linear transform reference and the red line showing the error related to the nonlinear transform reference. In both cases, the error of the nonlinear function increases with the input data rate. Again, using the nonlinear transform as reference gives the lower error, which rises from 0.08 to 0.15 NRMSE when increasing the data rate up to 10 Gbit/s. The deterioration in accuracy at the higher data rates is mainly due to the limited carrier lifetime of the integrated SOA-WC, which cannot fully follow the speed of the incoming optical signal. The offsets between the blue and red lines in both Figs. 5(c) and 5(d) show that the use of the correct nonlinear transfer function reduces the error by up to 50%, compared to the case in which the SOA-WC is assumed to have a simple linear response.

Fig. 5. (b) The nonlinear transfer function when the input data rate changes from 2 to 10 Gbit/s. Error obtained at the neuron output when tuning the booster current density (c) and when tuning the input data rate (d), compared with the expectation calculated using the linear transform as reference (blue) and the nonlinear transform as reference (red).

B. All-optical monolithically integrated neuron
The monolithic integration of the synaptic operation and the optical nonlinear function allows us to investigate the performance of the SOA-based all-optical neuron concept. The four-channel WDM PRBS-OOK input is coupled to the neuron input, with a data rate of up to 10 Gbit/s. The output is detected and compared to the time trace calculated with the NL-TF obtained following the procedure explained in Sec. III A. Figure 6(a) plots the time traces as a linear combination of the weighted input data, where the red line presents the recorded signal and the blue line is the expected linear combination of the weighted addition, without NL-fitting, resulting in an error of 0.17. Figure 6(b) instead compares the recorded output signal with the nonlinear transform reference, where the blue line is the recorded signal and the red line is the expectation, resulting in a smaller error of 0.15, i.e., an error reduction of about 10%. We also change the number of input channels and tune the data rate of the input signals to better analyze the performance of this all-optical neuron. Figure 6(c) illustrates the error of the complete optical neuron output. The blue circle, red triangle, and yellow square symbols represent the errors of the all-optical neuron for 1, 2, and 4 input channels, respectively. In line with Fig. 5(d), the curves show that the output error increases with the input data rate. Moreover, with the increase in the channel number, the error tends to increase as well. With more channels input to the SOA-WC, the nonlinearity at the SOA-WC is reduced, as the increasing input probe power pushes the cross-gain modulation regime toward a linear conversion regime. This means that an optimization of the operation regime of the wavelength converter is needed to help increase its nonlinearity, e.g., by tuning the power of the CW laser. Moreover, in Fig. 6(c), we also add a green-filled triangle to show an average error of 0.15, which is obtained when combining the integrated linear unit with a discrete nonlinear SOA wavelength converter, 38 with 10 Gbit/s per channel input and two-channel weighted addition. This shows that the monolithically integrated all-optical neuron performs 10% better in terms of error introduction than the hybrid case under the same data rate condition. One possible reason is that a discrete implementation generates additional noise due to the off-chip amplification. Finally, the integration of the tunable laser and SOA-WC also reduces the total power consumption, as neither the external laser nor the additional off-chip amplifier is required. Further investigation shows that, by using discrete SOA-WCs with optimized carrier dynamics, the multi-level conversion yields a calculation error of less than 0.09, 39 shown as green-unfilled triangles in Fig. 6(c). In Sec. IV, we show the simulation of an all-optical multi-layer neural network by exploiting both the synaptic operation and the nonlinear function as realized and measured so far.

IV. MNIST DATASET CLASSIFICATION WITH AN SOA-BASED ALL-OPTICAL NEURAL NETWORK
The combination of the linear neuron with the wavelength converter (Sec. III) eventually converts the multiple weighted wavelength inputs, after their addition, into one single wavelength, which is the actual output of the complete neuron (yellow box in Fig. 1). In particular, the recorded transfer function of the integrated SOA-based wavelength converter, shown in Fig. 4(d), has been evaluated in an analog manner, with the power summation at the input of the SOA-WC being a multi-level signal. Therefore, the same transfer function will also work with multi-level WDM signals. This nonlinear function is then used to train the neural network on a computer via TensorFlow, 40 while the pre-trained weight matrix can be applied to the all-optical neural network to run inference and evaluate the accuracy.
The handwritten digit classification problem 41 with the Modified National Institute of Standards and Technology (MNIST) dataset is one of the benchmarking problems used for the performance appraisal of a neural network. The MNIST dataset contains 60 000 training samples and 10 000 testing samples and includes ten categories of digits from 0 to 9. In Sec. II, we have discussed that the linear synaptic operation of the SOA-based neuron can support up to 64 channel inputs with the introduction of negligible error. Here, we indeed simulate the all-optical neural network with input layer neurons with 64 channel inputs each. To encode the input image into 64 channels by means of multi-level modulation with 9-bit resolution, we preprocess the images in the dataset to reduce their resolution from 28 × 28 to 8 × 8 pixels. Figure 7(a) illustrates the data preprocessing for the input of the neural network (NN). The 256-level gray data are first converted into a black and white image with a threshold at level 128 and cropped to 24 × 24 pixels at the center. The images are then converted to 8 × 8 pixels, with every 3 × 3 pixels encoded into 512 grayscale levels, i.e., 9-bit resolution. For solving this digit classification, a two-layer NN is structured as shown in Fig. 7(b). On the first layer, there are 64 neurons, where each weighted-addition output is followed by the optical nonlinear function obtained in Sec. III, and on the second (output) layer, ten linear neurons are used to represent the ten digits, from 0 to 9. In the optical neural network (ONN) implementation, the inputs and the weights are usually normalized in order to ease the optical modulation and the dynamic weighting control. This is implemented in simulation by applying batch normalization and weight normalization. To train the NN for MNIST dataset classification, the ADAM optimizer is utilized due to its fast convergence, 42 which makes the training process more efficient.
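The preprocessing chain described above can be sketched as follows. This is one possible reading, not the authors' code: it assumes that each 3 × 3 block of binary pixels is packed into a 9-bit integer (giving 512 levels), that the threshold is inclusive (≥ 128), and that the bits are ordered row by row with the most significant bit first. The function name `preprocess` is hypothetical.

```python
import numpy as np

def preprocess(img):
    """img: (28, 28) grayscale array with values 0..255.
    Returns an (8, 8) array of 9-bit codes in the range 0..511."""
    img = np.asarray(img)
    binary = (img >= 128).astype(np.int64)   # black/white threshold (inclusive: assumption)
    cropped = binary[2:26, 2:26]             # central 24 x 24 window
    # Arrange as an 8 x 8 grid of 3 x 3 blocks, each flattened to 9 bits.
    blocks = cropped.reshape(8, 3, 8, 3).transpose(0, 2, 1, 3).reshape(8, 8, 9)
    bit_weights = 1 << np.arange(8, -1, -1)  # MSB first (bit order: assumption)
    return blocks @ bit_weights

# Synthetic examples instead of real MNIST samples.
white = np.full((28, 28), 255, dtype=np.uint8)
single = np.zeros((28, 28), dtype=np.uint8)
single[2, 2] = 200  # top-left pixel of the cropped window

codes_white = preprocess(white)
codes_single = preprocess(single)
print(codes_white.shape, codes_white.max(), codes_single[0, 0])
```

Each of the 64 resulting codes would then drive one multi-level WDM input channel after normalization.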

ARTICLE scitation.org/journal/app
We train this two-layer structure with the current third order polynomial nonlinear function without noise induction as a reference. The trained weighted matrix is then applied to the ONN model to investigate the performance of the optical network under error induction and contribution from the linear and the nonlinear units. Moreover, we benchmark this same shallow neural network for the same data in Fig. 7(a), but using the sigmoid nonlinear function. The test accuracy is recorded after every update of the weighting matrices when training the neural network in TensorFlow. Figure 7(c) presents the test accuracy as a function of the training epochs for different nonlinear functions: when the nonlinear function is the conventional mathematical sigmoid function (blue), when it corresponds to the transfer function observed at 2 GS/s per channel (orange), and when it corresponds to the transfer function obtained at 10 GS/s as input (yellow). Note that here we do not consider yet the influence of the all-optical neuron impairments. The curves show that the NN is converging after 15 epochs of training and that all considered nonlinear functions yield a similar final test accuracy of ∼94.5% after training.
To take into account the error induced by the all-optical neuron, we consider the distortion contribution due to the linear part of the neuron (described in Sec. II) and the distortion contribution due to the nonlinear part of the neuron (analyzed in Sec. III). In particular, the distortions are included here as additive white Gaussian noise, assuming that the signal spontaneous emission beating noise dominates the contribution, 43 which is added after the linear output and the nonlinear output. By tuning the standard deviation of the Gaussian noise, the same error levels as the ones observed experimentally can be reproduced. The same inference is now run in the case that impairments are induced in the optical neuron: Figs. 8(a) and 8(b) illustrate the colormap of the prediction accuracy (Acc.) as a function of the noise levels of both the linear and nonlinear functions of a neuron: these are scanned from 0 to 0.5 for both 2 and 10 GS/s input per channel, respectively. The accuracy in both cases obviously decreases when increasing the error at the output of both linear and nonlinear units. The red line shapes in Figs. 8(a) and 8(b) show the expected accuracy that the AONN system will have for a measured error level ranging from 0.05 to 0.10 for the linear operation (according to the error induced with 64 channel inputs, discussed in Sec. II) and for the nonlinear errors ranging between 0.08 and 0.11 for 2 Gbit/s input and between 0.10 and 0.15 for 10 Gbit/s input, respectively, as recorded during the experiments. For these same areas, an accuracy degradation of 2%-8% and 5%-15% for 2 and 10 GS/s input, respectively, is obtained, compared to the trained accuracy of 94.5%. The elliptical shape in Fig. 8(a) is due to a different deviation of the Gaussian noise distribution on the linear and nonlinear unit, while the circular shape in Fig. 8(b) is due to a more uniform variation for both units. 
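The noise-injection procedure can be sketched as a forward pass with additive white Gaussian noise after the linear and the nonlinear units. The weights, inputs, and the stand-in nonlinearity below are random placeholders rather than the trained network or the measured NL-TF, and the noise standard deviations are applied to normalized signals, which is an assumption.

```python
import numpy as np

def noisy_forward(x, w1, w2, phi, sigma_lin, sigma_nl, rng):
    """x: (batch, 64), w1: (64, 64), w2: (64, 10).
    Returns the predicted class index for each sample."""
    y = x @ w1
    y = y + rng.normal(0.0, sigma_lin, y.shape)  # distortion after the linear unit
    h = phi(y)
    h = h + rng.normal(0.0, sigma_nl, h.shape)   # distortion after the nonlinear unit
    return np.argmax(h @ w2, axis=1)             # linear output layer

rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, (32, 64))              # placeholder inputs
w1 = rng.normal(0.0, 0.1, (64, 64))              # placeholder weights
w2 = rng.normal(0.0, 0.1, (64, 10))
phi = lambda y: 1.0 / (1.0 + np.exp(y))          # stand-in inverting nonlinearity

clean = noisy_forward(x, w1, w2, phi, 0.0, 0.0, rng)
noisy = noisy_forward(x, w1, w2, phi, 0.10, 0.15, rng)
agreement = float(np.mean(clean == noisy))
print(f"fraction of predictions unchanged by noise: {agreement:.2f}")
```

Scanning `sigma_lin` and `sigma_nl` over a grid and recording the test accuracy at each point would produce a map of the kind shown in Fig. 8.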
These results suggest that, with 10 GS/s input, the two-layer all-optical engine, comprising 64 neurons in the first layer with 64 synapses per neuron and 10 fully connected neurons in the second layer, can perform 4.7 × 10^13 MAC/s. This is ∼2.5 times faster than state-of-the-art GPUs 44 and of the same order as the TPU, 45 considering only 5% best-case accuracy degradation and 10 GHz nonlinear processing, which is not available in GPUs and TPUs. Training the AONN with the estimated distortion from the linear and the nonlinear units included is expected to reduce the influence of the noise and to preserve the high prediction accuracy of the NN when the wavelength converter, instead of the conventional sigmoid function, is used as the nonlinear function. In the future, we envision that the scaling to 64 input neurons in our network system can be realized by interfacing the chip with high-speed state-of-the-art transceiver modules 46 or with co-packaged optics 47 in a multi-chip package.
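The quoted throughput can be checked with a short calculation. The sketch below assumes that every synapse performs one MAC per input sample and that both layers run at the full 10 GS/s sample rate; the function names are illustrative only.

```python
def macs_per_inference(layer_sizes):
    """Number of MACs for one forward pass through fully connected layers."""
    return sum(n_in * n_out for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))


def throughput_mac_per_s(layer_sizes, sample_rate_hz):
    """MAC/s, assuming one complete inference per input sample period."""
    return macs_per_inference(layer_sizes) * sample_rate_hz


macs = macs_per_inference([64, 64, 10])          # 64*64 + 64*10
rate = throughput_mac_per_s([64, 64, 10], 10e9)  # 10 GS/s input
print(macs, f"{rate:.2e} MAC/s")                 # prints: 4736 4.74e+13 MAC/s
```

Under these assumptions, the 64:64:10 network gives 4736 MACs per sample, consistent with the ∼4.7 × 10^13 MAC/s figure.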

V. SYSTEM ENERGY CONSUMPTION ANALYSIS
In this section, we estimate the power consumption of the end-to-end (digital-to-optical-to-digital) system enabling the implementation of the optical neural network. Figure 9 shows the schematic of the complete ONN system, which includes the transmitter, the optical chip, the receiver, the digital signal processor, and the control unit. The overall system is controlled by the control unit (Ctrl), which is interfaced with the computer and includes a field-programmable gate array (FPGA) and a digital signal processor (DSP). Here, we use an FPGA for the sake of fast development and reconfiguration flexibility. 48,49 However, application-specific integrated circuits (ASICs) can also be used to reduce the power consumption even further. 50 To analyze the effective power consumption of the ONN, all the components in the system should be taken into account. The transmitter (Tx) includes lasers, modulators, and the DACs used to drive the modulators. The ONN includes the ONN chip and its control DACs and drivers for weighting. The receiver (Rx) consists of photodetectors and the corresponding ADCs.
The energy consumption of the system is analyzed by considering three operation modes of the ONN within the end-to-end system: (1) E/O/E, (2) AO-1L, and (3) AO-TL. In the E/O/E approach, the optical chip is used to calculate the linear matrix multiplication, while the nonlinear function is realized on the DSP, with the data received at the PDs. In the all-optical approaches, the nonlinear function is co-integrated with the linear optical neuron, and the output (at each layer output for the AO-1L case or at the end of the complete multiple-layer NN for the AO-TL case) is obtained via linear PDs. A more general ONN with N inputs, M outputs, and T layers is now analyzed, including the end-to-end system performance, for these three operation modes. The operations executed by the ONN systems differ among these cases, depending on whether a single layer or multiple layers are implemented. For the single-layer implementations, as in cases (1) and (2), the DNN needs to be decomposed into layers and analyzed layer by layer, which is not necessary in case (3) for the same network implementation.
For the inference of a trained DNN, the data and weight matrix are loaded to the FPGA via the interface with a computer. The FPGA generates the electrical patterns as well as the weight control currents, which are fed to the modulator DACs and to the weight DACs and drivers, respectively, as shown in Fig. 9. The electrical patterns are imprinted on the laser beams and sent to the optical neural network chip. The chip is controlled with the analog currents coming from the respective DACs and amplified at the drivers, with which the matrix multiplications are calculated. For the E/O/E case, the detected linear output is converted into digital signals by the ADCs, and the DSP unit then processes the signals, executing the nonlinear transfer function. The outputs are then sent back to the FPGA, which generates the patterns for the next layer. The next layer follows the same procedure. At the output layer, the outputs of the last-layer nonlinear functions are further processed by the FPGA and compared with the reference labels to provide the final prediction, which is then passed to the computer. Therefore, the power consumption of the E/O/E single-layer system can be calculated as in Eq. (1), where $P_{Tx}$ is the power of the transmitter per channel, $P_{w}$ is the power for each weighting, including the power of the DAC and the current driver for the ONN, $P_{Rx}$ is the power of the receiver, $P_{eNL}$ is the power for the electrical nonlinear function, and $P_{ctrl}$ is the power of the control.
For the AO-1L case, the procedure is similar to the E/O/E case, with the only difference that the nonlinear function is co-integrated on the optical chip. Therefore, the DSP does not carry out the nonlinear function calculation and only calculates the final accuracy at the output layer. Hence, the power of the AO-1L system can be calculated as in Eq. (2), where $P_{oNL}$ is the power of the photonic nonlinear function. Finally, for the AO-TL case, the FPGA and DSP are not required to process and update the inputs and weights for the next layer, but the DSP still calculates the loss and accuracy based on the final outputs and the reference labels. The power consumption of the AO-TL system can therefore be calculated as in Eq. (3). The required number of components for the three scenarios and the power values used in the system power analysis are listed in Table I. These values assume state-of-the-art components that fit into the scheme of the SOA-based all-optical neural network structure described in Sec. II. Considering the delays related to all the components, the total time to execute one epoch is given by Eq. (4) for the E/O/E system and by Eq. (5) for the AO-1L single-layer system, while for the AO-TL multi-layer system it is calculated as

$$t_{\text{AO-TL}} = \frac{S_N}{f_{Tx}} + \frac{1}{f_{Tx}} + t_{Tx} + T \times (t_{oLin} + t_{oNL}) + \frac{1}{f_{Rx}} + t_{Rx} + \frac{S_N}{f_{eIO}} + t_{FPGA} + t_{e\text{-}inter} + t_{acc}, \qquad (6)$$

where $S_N$ is the number of samples per epoch at the input of each layer; $f_{Tx}$ and $f_{Rx}$ are the speeds of the transmitter and receiver, respectively; $t_{Tx}$, $t_{oLin}$, $t_{oNL}$, $t_{Rx}$, $t_{eNL}$, and $t_{e\text{-}inter}$ are the time delays of the transmitter, the optical linear unit, the optical nonlinear unit, the receiver, the electrical nonlinear function, and the electrical interconnection, respectively; $t_{FPGA}$ is the computational time for the FPGA to generate the patterns and the current values for the weights; and $t_{acc}$ is the computational time of the DSP for the accuracy calculation.
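The AO-TL epoch time of Eq. (6) translates directly into a small function. The sketch below follows the equation term by term; the argument names are illustrative, and all numerical values in the usage example are hypothetical placeholders and not the Table II entries.

```python
def t_ao_tl(sn, layers, f_tx, f_rx, f_eio,
            t_tx, t_olin, t_onl, t_rx, t_fpga, t_e_inter, t_acc):
    """Epoch time of the AO-TL system, following Eq. (6) term by term.

    Frequencies are in Hz and delays in seconds. f_eio is the rate that
    appears as f_eIO in Eq. (6).
    """
    return (sn / f_tx + 1.0 / f_tx + t_tx      # loading samples at the transmitter
            + layers * (t_olin + t_onl)        # optical delay of the T layers
            + 1.0 / f_rx + t_rx                # detection at the receiver
            + sn / f_eio                       # electrical read-out of the samples
            + t_fpga + t_e_inter + t_acc)      # control, interconnect, accuracy


if __name__ == "__main__":
    # Hypothetical placeholder values, chosen only to exercise the function.
    t = t_ao_tl(sn=10_000, layers=10, f_tx=10e9, f_rx=10e9, f_eio=1e9,
                t_tx=1e-9, t_olin=1e-10, t_onl=1e-10, t_rx=1e-9,
                t_fpga=1e-6, t_e_inter=1e-8, t_acc=1e-6)
    print(f"{t * 1e6:.2f} us")
```

In this form it is easy to see that the optical delay enters only once per layer, whereas the sample-dependent terms $S_N/f_{Tx}$ and $S_N/f_{eIO}$ are paid once for the whole network.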
The average total energy consumption per epoch can be expressed as $E_{syst} = P_{syst} \cdot t_{syst}$, where $E_{syst}$ is the total energy consumption of the whole neural network system per epoch, $t_{syst}$ is the time for computing one epoch of samples, and $P_{syst}$ is the total power of the end-to-end system, each calculated for the respective operational system case. If only the optical engine is considered, the energy consumption for the optical MAC operation, i.e., the synaptic operation, depends on the number of controlled elements that provide the weights. Here, we use the same weighting elements, i.e., the SOAs, for which the power is 30 mW on average per weight, excluding the DACs. Therefore, for an operational input data rate of 10 GHz, the resulting energy consumption is 3.0 pJ/MAC, or 5.5 pJ/MAC if we include the weight DACs. However, this estimation misses the contribution of the transceiver, the overall electrical controller, the receiver, and the off-chip computations. Therefore, the end-to-end system power and the total computational time should be considered to obtain the real performance metrics of the optical neural network. For an N-input M-neuron T-layer DNN, the total number of MAC operations is $S_N \times M \times N \times T$. Hence, the effective energy consumption per MAC operation (effective because we now include the overall end-to-end system contribution) is the total power of the specific end-to-end system times the total time to execute one epoch, divided by the total number of MAC operations,

$$E_{\text{MAC-eff}} = \frac{P_{syst} \cdot t_{syst}}{S_N \times M \times N \times T}. \qquad (7)$$

The delays and computational speeds of the different components are listed in Table II. The values used in the calculations are based on off-the-shelf components. In particular, all optical delays are obtained from the actual path length, while all the electrical delays are related to the processing clock time of the off-the-shelf electronics.
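The two energy metrics discussed above can be illustrated with a short calculation. In this sketch, the 30 mW weight power and the 10 GHz data rate are taken from the text; the 25 mW DAC power per weight is not stated and is only inferred here from the 5.5 pJ/MAC figure, and the function names are illustrative.

```python
def energy_per_mac_optical(p_weight_w, data_rate_hz):
    """Energy per MAC of the optical engine alone: weight power / data rate."""
    return p_weight_w / data_rate_hz


def effective_energy_per_mac(p_syst_w, t_syst_s, sn, m, n, layers):
    """End-to-end energy per MAC: P_syst * t_syst / (S_N * M * N * T)."""
    return p_syst_w * t_syst_s / (sn * m * n * layers)


P_SOA_W = 30e-3  # average SOA power per weight (from the text)
P_DAC_W = 25e-3  # per-weight DAC power, inferred from 5.5 pJ/MAC (assumption)
RATE_HZ = 10e9   # operational input data rate (from the text)

e_soa = energy_per_mac_optical(P_SOA_W, RATE_HZ)
e_soa_dac = energy_per_mac_optical(P_SOA_W + P_DAC_W, RATE_HZ)
print(f"{e_soa * 1e12:.1f} pJ/MAC, {e_soa_dac * 1e12:.1f} pJ/MAC")
```

The second function needs the system power and epoch time of the chosen operation mode as inputs, which in turn depend on the component counts and delays listed in Tables I and II.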
We first investigate the size scaling of the optical neural network. As mentioned in Sec. II, the network is considered to scale up with M = N, i.e., this energy analysis is done with respect to a quadratic scaling of the network. As the number of neurons M increases, the splitting loss increases. As a consequence, we compensate these losses with additional laser power when increasing N, the number of input channels. From Table II, it is clear that the largest DNN that we investigate is an M × N × T DNN with a maximum of 64 inputs × 64 neurons/layer × 10 layers. Figure 10 illustrates $E_{\text{MAC-eff}}$ obtained from Eqs. (1)-(7) for the different system modes of operation and as a function of different parameters. Figure 10(a) illustrates the energy consumption per MAC operation

the power consumption as well as computing time after each layer. $E_{\text{MAC-eff}}$ gradually tends to the asymptotic value of 14 pJ/MAC. The lower limit of the energy consumption is set by the power consumption at the transmitter side and at the weighting elements (this power relates to the weight unit power and, therefore, does not depend on the number of synapses). For the AO-TL neural network system, which avoids the electronics-to-optics-to-electronics conversions when moving from layer to layer, the computing time is reduced considerably: $E_{\text{MAC-eff}}$ falls faster than for the E/O/E and AO-1L cases and reaches 12 pJ/MAC for 64 synapses/neuron. If the FPGA were replaced with an ASIC optimized for low power consumption, the effective energy consumption would not change dramatically since, in these large-scale network systems, the elements controlling the weights represent the main contribution. Also in Fig. 10(a), we observe that the number of synapses per neuron in the single-layer implementations should be greater than 20 for case (1) and greater than 18 for case (2) to keep the energy per operation below 20 pJ. The difference between the single-layer cases, (1) and (2), and the multi-layer case, AO-TL (3), is set by the synapse number: a bigger difference is expected for a smaller synapse number. All the curves in Fig. 10(a) tend to an asymptotic value because, for M = N > 64 and T > 10, the lower limit is bound to the energy consumption of each synapse control component. Hence, we carry out all the other investigations for M = N = 64 and T = 10 while changing other parameters, such as the input sample number $S_N$, the speed of the transceivers $f_{Tx/Rx}$, and the power of the weighting elements $P_{w}$. Figure 10(b) presents $E_{\text{MAC-eff}}$ and the total computing time when changing the number of input samples.
The power efficiency decreases only slightly when the number of input samples varies from 10 to 100 k (solid lines), for two reasons: $E_{\text{MAC-eff}}$ is calculated per MAC operation of each sample, and the total processing time (dashed lines) increases linearly, from 20 to 200 μs, for the single-layer E/O/E and AO-1L neural networks. Computation in the AO-TL case, instead, is at least 10 times faster. Figure 10(c), on the other hand, shows $E_{\text{MAC-eff}}$ as a function of the transmitter and receiver operation frequency. The energy consumption can be decreased by 5 pJ in all the cases when increasing the speed of the transmitter and receiver from 10 to 100 GHz, owing to the reduction of the total computing time. Improvements of the SOA performance are, however, needed to enable such high-speed all-optical signal processing: this is considered possible when exploiting concepts such as quantum dot SOAs 55 or SOAs with a carrier reservoir layer, 56 for which carrier recovery times down to 0.5-10 ps have been demonstrated, which can facilitate operation bandwidths up to 100 GHz.
Finally, we tune the power of the biased weighting elements to evaluate the energy consumption of the 64-input 64-neuron ten-layer implementation with a transceiver speed of 10 GHz. Figure 10(d) illustrates the resulting $E_{\text{MAC-eff}}$ when changing the power of the weighting elements from 0 to 30 mW (solid lines). The energy consumption per MAC rises linearly with the weight power, from 8 to 14 pJ for the single-layer cases and from 6 to 12 pJ for the AO-TL DNN, so that the use of an all-optical multi-layer network gives a 14% improvement in effective energy consumption per MAC with respect to the E/O/E and AO-1L implementations. In addition, the dashed lines show $E_{\text{MAC-eff}}$ for the case in which a non-volatile weighting element, such as a phase change material, 23 is used: for the single-layer cases, the energy consumption is 2.4 pJ/MAC, while for the AO-TL case, an energy consumption as low as 0.7 pJ/MAC is calculated. This energy is non-zero because of the transceiver and the post-processing on the FPGA, as shown in Eqs. (1)-(3) (setting $P_{w} = 0$ and $P_{DAC} = 0$). This result suggests that the current control of the weighting elements contributes 5.3 pJ/MAC more in all the cases and that the SOA weighting consumes 6.3 pJ/MAC (obtained by subtracting the energy consumption at 0 mW from the energy consumption at 30 mW). When substituting the volatile, current-biased elements with non-volatile elements in the AO-10L neural network, we can reach up to 94% energy saving for each MAC operation. In any case, the energy consumption of the AO-TL neural network outperforms that of the single-layer neural network system implementations.
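The two percentages quoted above follow from the energy values read from Fig. 10(d), as the short check below shows. The helper name is illustrative; the numerical inputs are the pJ/MAC values stated in the text.

```python
def relative_saving(reference, improved):
    """Fractional reduction of `improved` with respect to `reference`."""
    return (reference - improved) / reference


# AO-TL (12 pJ/MAC) versus single-layer cases (14 pJ/MAC) at 30 mW weight power
print(f"{relative_saving(14.0, 12.0):.0%}")  # prints: 14%

# Non-volatile weights (0.7 pJ/MAC) versus SOA weights (12 pJ/MAC) for AO-TL
print(f"{relative_saving(12.0, 0.7):.0%}")   # prints: 94%
```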

VI. CONCLUSION
We analyze the performance of an all-optical neural network structure with WDM connectivity and SOA-based all-optical neurons. The linear neural network can be easily scaled as a function of the number of WDM signals for multi-synapse neurons: the linear processing unit can scale up to 64 channels while guaranteeing a large input dynamic range with negligible error introduction. A fully monolithically integrated all-optical neuron is experimentally demonstrated, exploiting an SOA wavelength converter based on cross-gain modulation as the optical nonlinear function. The performance of the fully integrated all-optical neuron is 10% better than that of the hybrid case in terms of error introduction. The all-optical neural network is simulated with noise induction to benchmark the inference of a noisy DNN built for the MNIST handwritten digit classification problem, showing that, with 10 GS/s inputs, the all-optical approach is about 2.5 times faster than the state-of-the-art electronic GPU while guaranteeing similar accuracies.
Furthermore, we emulate the complete end-to-end system by including in the overall system performance calculation the contributions of the control unit and of the transmitter and receiver units, together with the D/A and A/D converters. The energy consumption is analyzed at the system level for an N-input M-neuron T-layer DNN. The calculation results show that the effective energy per MAC operation of an all-optically connected DNN always outperforms that of the single-layer DNN system. Ultimately, when the number of synapses/neuron increases, the energy efficiency is constrained by the speed and power consumption of the electronic side, including the DACs/ADCs at the transceivers and the control FPGA for the pattern generation and signal processing. Nevertheless, the AONN still performs more than 2 times better than state-of-the-art GPUs at the server level, excluding the energy for cooling.