Spiking Neural Network Nonlinear Demapping on Neuromorphic Hardware for IM/DD Optical Communication

Elias Arnold, Georg Böcherer, Member, IEEE, Member, Optica, Florian Strasser, Eric Müller, Philipp Spilger, Sebastian Billaudelle, Johannes Weis, Johannes Schemmel, Member, IEEE, Stefano Calabrò, Maxim Kuschnerov

Abstract—Neuromorphic computing implementing spiking neural networks (SNN) is a promising technology for reducing the footprint of optical transceivers, as required by the fast-paced growth of data center traffic. In this work, an SNN nonlinear demapper is designed and evaluated on a simulated intensity-modulation direct-detection link with chromatic dispersion. The SNN demapper is implemented in software and on the analog neuromorphic hardware system BrainScaleS-2 (BSS-2). For comparison, linear equalization (LE), Volterra nonlinear equalization (VNLE), and nonlinear demapping by an artificial neural network (ANN) implemented in software are considered. At a pre-forward error correction bit error rate of $2 \times 10^{-3}$ the software SNN outperforms LE by 1.5 dB, VNLE by 0.3 dB and the ANN by 0.5 dB. The hardware penalty of the SNN on BSS-2 is only 0.2 dB, i.e., also on hardware, the SNN performs better than all software implementations of the reference approaches. Hence, this work demonstrates that SNN demappers implemented on electrical analog hardware can realize powerful and accurate signal processing fulfilling the strict requirements of optical communications.

Index Terms—Spiking Neural Network, Optical Communication, Equalization, Data Centers, Intensity-Modulation Direct-Detection

I. INTRODUCTION

The fast-paced growth of data center traffic is the driver behind the increase in bit rate and, at the same time, the footprint reduction of the optical transceivers. This trend results in an urgent need to decrease the power consumption per bit. Whereas evolutionary steps can mitigate the problem, the exponential traffic growth asks for a paradigm shift. To resolve this dilemma, recent research envisions moving parts of digital signal processing (DSP) to analog frontends with lower power consumption.

One approach is photonic neuromorphic computing [1], which has been proposed, e.g., for chromatic dispersion (CD) compensation and nonlinear equalization in short-reach optical transmission [2], [3], [4]. However, although photonics can operate faster than electronic hardware, the latter scales better in terms of footprint and power consumption.

The return to analog electrical adaptive equalizers is also gaining traction. For example, in [5], the transmitter DSP feeds two electrical non-return-to-zero (NRZ) signals to an analog pulse-amplitude-modulation-4-level (PAM-4) encoder, whose output is filtered by a continuous-time linear equalizer (CTLE) and a 3-tap feed forward equalizer (FFE).

At the same time, the research community is striving to implement more powerful nonlinear algorithms, e.g., based on artificial intelligence (AI) techniques, on analog electronics. An important subfield is in-memory computing (IMC) [6], which aims for efficient calculation of vector-matrix multiplications. Research on IMC is mainly driven by the urgent need for AI accelerators for artificial neural networks (ANNs). Eventually, IMC may enable the use of ANNs for signal processing in the data path of communication systems, see, e.g., [7].

Analog electronic neuromorphic computing offers an alternative path towards AI-based signal processing. Spiking neural networks (SNNs) [8] in analog hardware [9] adopt the brain’s unique power efficiency by imitating the basic functioning of the human brain. They combine the sparse representation of information by event-based spiking signals with power-efficient IMC. In [10], we have shown that SNN FFEs emulated in software can compensate nonlinear impairments in intensity-modulation direct-detection (IM/DD) links. In [11], SNN decision feedback equalization (DFE) is considered for compensating severe linear inter-symbol interference (ISI).

Recently, in-the-loop (ITL) training of SNNs on analog hardware [12] has shown promising results by achieving state-of-the-art performance in inference tasks [13]. In [14], we presented preliminary results on the design and evaluation of an SNN demapper on the analog neuromorphic BrainScaleS-2 (BSS-2) system [9]. Specifically, we considered the detection of a PAM-4 signal in a simulated IM/DD link, which is impaired by CD and additive white Gaussian noise (AWGN), as displayed in Fig. 1. Our results in [14] show that SNNs emulated on the neuromorphic BSS-2 hardware outperform linear equalization in software, while the gap between software and hardware SNN is slightly below 1 dB.

In this work, we detail and extend our previous work on SNN-based neuromorphic demapping [14]. For the same IM/DD link model as in [14] (see Fig. 1), we reduce the SNN software-hardware penalty to below 0.2 dB. We achieve this by optimizing the hardware operation point, tuning the training procedure, and adjusting the input-spike encoding. Furthermore, we compare the proposed solution with software implementations of a linear equalizer, a 5th-order Volterra nonlinear equalizer (VNLE), and a nonlinear ANN demapper. Despite the nonzero hardware penalty, our hardware SNN demapper performs better than the considered simulated reference algorithms. At the assumed forward error correction (FEC) bit error rate (BER) threshold of $2 \times 10^{-3}$, the gain over a linear equalizer is approximately 1.5 dB.

The remainder of this work is organized as follows. In Section II, we outline the IM/DD link and explain the implementation of the reference demappers. Section III details the SNN demapper and the input encoding scheme. Subsequently, we provide an overview of the BSS-2 platform in Section IV. The training procedure is explained in Section V. In Section VI, we show our results and in Section VII, we present our conclusions.

II. IM/DD MODEL AND REFERENCE DEMAPPERS

In this section, we detail our IM/DD link model and specify the reference algorithms, i.e., linear equalization (LE) and VNLE followed by hard decision (HD) demapping, and ANN nonlinear demapping. All reference demappers are simulated in double-precision floating-point arithmetic, except for the ANN, which uses single-precision floating-point arithmetic. The considered ANN and VNLE architectures are rather complex, i.e., the ANN has two nonlinear hidden layers and the VNLE uses the full filter length also for the higher-order terms. The purpose of considering complex ANN and VNLE processing is to benchmark what performance nonlinear processing can achieve when resource usage is not a constraint, and then to compare the SNN performance to this benchmark.

A. Simulated IM/DD Link

We simulate the transmission of PAM-4 symbols in the O-band at a baudrate of 112 Gbd. Assuming an FEC overhead of 12% with a BER threshold of $2 \times 10^{-3}$, we target a corresponding net bit rate of 200 Gbit s$^{-1}$.
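The relation between baud rate, FEC overhead, and net bit rate can be checked with a few lines of arithmetic. The sketch below assumes that the 12% overhead is defined relative to the net rate (gross rate = 1.12 times net rate); the variable names are illustrative.

```python
import math

baud_rate = 112e9                  # symbols per second (112 GBd)
bits_per_symbol = math.log2(4)     # PAM-4 carries 2 bits per symbol
fec_overhead = 0.12                # assumed to be relative to the net rate

gross_bit_rate = baud_rate * bits_per_symbol
net_bit_rate = gross_bit_rate / (1.0 + fec_overhead)

print(f"gross: {gross_bit_rate / 1e9:.0f} Gbit/s, net: {net_bit_rate / 1e9:.0f} Gbit/s")
```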

We display the simulated link in Fig. 1 and the corresponding parameters in Table I. At the transmitter, a bit sequence $[b_1 b_2]_n$ is mapped to a length-$N$ PAM-4 signal $y = y^N$ according to a Gray-labelled alphabet $\mathcal{A} = \{-3, -1, 1, 3\}$. This signal is upsampled, root-raised-cosine (RRC) filtered, and offset by a bias. The resulting sequence is impaired by CD, modelled linearly following, e.g., [15, Sec. 3.2], to simulate the effect of the fiber on the propagating optical signal. We assume that the optical power launched into the fiber is low, so we ignore fiber nonlinearities in our simulated link. At the receiver, the signal goes through a photodiode (PD), which is modeled as a square-law device, and AWGN is added. The resulting signal is RRC filtered and downsampled, resulting in the received sequence $\tilde{y} = \tilde{y}^N$. Finally, bit decisions $[\hat{b}_1 \hat{b}_2]_n$ are output by the respective device. We index the bit sequence and signal elements with $n$, $0 \leq n < N$. Note that a constellation with non-equidistant signal points to precompensate the squeezing of the PD may be beneficial; however, this is beyond the scope of this work.

B. Linear Minimum Mean Squared Error (LMMSE) Equalization

Our first reference detector consists of LE followed by HD demapping. To simplify the notation in the following, we specify the samples considered for equalizing the $n$-th sample via double-indexing,

$$\tilde{\boldsymbol{y}}_n = \left[\tilde{y}_{n,0}, \tilde{y}_{n,1}, \ldots, \tilde{y}_{n,n_{\text{tap}}-1}\right] = \left[\tilde{y}_{n-\lfloor n_{\text{tap}}/2 \rfloor}, \tilde{y}_{n-\lfloor n_{\text{tap}}/2 \rfloor+1}, \ldots, \tilde{y}_{n+\lfloor n_{\text{tap}}/2 \rfloor}\right]. \tag{1}$$

Specifically, the LE calculates

$$\hat{y}_n = c + \sum_{j=0}^{n_{\text{tap}}-1} \tilde{y}_{n,j} h_j,$$

where the bias $c$ accounts for residual direct current (DC) and $h_j$ are the filter coefficients. The number $n_{\text{tap}}$ of taps is the filter width and is assumed to be odd. We use data-aided training to calculate $h$ and $c$ so as to minimize the mean squared error.
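As an illustration of the double-indexing and the affine filter above, the following minimal sketch extracts the chunk around sample $n$ and applies the LE. Zero-padding at the sequence edges and all function names are assumptions made for this example; the data-aided fitting of $h$ and $c$ is not shown.

```python
def extract_chunk(received, n, n_tap):
    """Return the n_tap samples centered on index n (zero-padded at the edges)."""
    half = n_tap // 2
    return [received[k] if 0 <= k < len(received) else 0.0
            for k in range(n - half, n + half + 1)]


def linear_equalize(received, taps, bias):
    """Apply y_hat[n] = bias + sum_j chunk_n[j] * taps[j] to every sample."""
    n_tap = len(taps)
    assert n_tap % 2 == 1, "the filter width is assumed to be odd"
    equalized = []
    for n in range(len(received)):
        chunk = extract_chunk(received, n, n_tap)
        equalized.append(bias + sum(y * h for y, h in zip(chunk, taps)))
    return equalized


# A filter with a single unit tap in the center passes the signal unchanged.
received = [0.5, 1.5, 10.5, 27.5, 1.5]
identity_taps = [0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0]
equalized = linear_equalize(received, identity_taps, bias=0.0)
```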

Remarks on the simulated IM/DD link parameters:

1) Wavelength and dispersion are in the range specified in [16, Table 9.6].
2) For the considered baudrate and fiber length, the dispersion in terms of delay spread between the frequency components at ± Nyquist frequency is 1.35 symbols.
3) The bias results in a carrier-to-signal-power-ratio (CSPR) of 9.6 dB.
4) The combination of CD and PD results in a band limitation, despite the fact that CD alone acts as an allpass filter. Consider $|\text{signal} + \text{carrier}|^2 = |\text{signal}|^2 + 2\text{Re}(\text{signal} \cdot \text{carrier}) + |\text{carrier}|^2$. For the considered parameters, CD and PD cause for the linear term $2\text{Re}(\text{signal} \cdot \text{carrier})$ an attenuation of 6.2 dB at the Nyquist frequency, compared to frequency 0.
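The numbers in remarks 2) and 4) can be reproduced from the parameters in Table I. The sketch below uses only the magnitude of the dispersion and assumes the textbook small-signal response $|\cos(\pi \lambda^2 D_{CD} L f^2 / c)|$ for the linear term after square-law detection; this model and the variable names are assumptions of the example and are not spelled out in the text.

```python
import math

C = 299_792_458.0          # speed of light in m/s
wavelength = 1270e-9       # m
dispersion = 5e-6          # |D_CD| = 5 ps/(nm km), expressed in s/m^2
fiber_length = 4e3         # m
baud_rate = 112e9          # Bd

# Remark 2: delay spread between the components at -f_Nyq and +f_Nyq.
delta_f = baud_rate                                   # equals 2 * f_Nyq
delta_lambda = wavelength**2 * delta_f / C            # spectral width in m
delay_spread = dispersion * fiber_length * delta_lambda   # s
delay_spread_symbols = delay_spread * baud_rate

# Remark 4: attenuation of the linear term at the Nyquist frequency.
f_nyq = baud_rate / 2
phi = math.pi * wavelength**2 * dispersion * fiber_length * f_nyq**2 / C
attenuation_db = -20 * math.log10(abs(math.cos(phi)))

print(f"delay spread: {delay_spread_symbols:.2f} symbols")
print(f"attenuation at Nyquist: {attenuation_db:.1f} dB")
```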

Table I

<table>
<thead>
<tr>
<th>Parameter</th>
<th>Value</th>
</tr>
</thead>
<tbody>
<tr>
<td>Net bit rate</td>
<td>200 Gbit s$^{-1}$</td>
</tr>
<tr>
<td>FEC overhead</td>
<td>12%</td>
</tr>
<tr>
<td>Baudrate</td>
<td>112 Gbd</td>
</tr>
<tr>
<td>Wavelength</td>
<td>1270 nm</td>
</tr>
<tr>
<td>Dispersion $D_{CD}$</td>
<td>-5 ps nm$^{-1}$ km$^{-1}$</td>
</tr>
<tr>
<td>Fiber length</td>
<td>4 km</td>
</tr>
<tr>
<td>Alphabet $\mathcal{A}$</td>
<td>$\{-3,-1,1,3\}$</td>
</tr>
<tr>
<td>Sequence length $N$</td>
<td>10000</td>
</tr>
<tr>
<td>Bias $b$</td>
<td>2.25</td>
</tr>
<tr>
<td>RRC roll-off $\alpha$</td>
<td>0.2</td>
</tr>
<tr>
<td>Upsampling $n_{up}$</td>
<td>3</td>
</tr>
<tr>
<td>Downsampling $n_{down}$</td>
<td>3</td>
</tr>
</tbody>
</table>

Figure 1. The simulated IM/DD link schematics. A bit sequence is mapped at the transmitter (Tx) to a PAM-4 signal and is impaired by CD in the fiber. At the receiver (Rx), after square-law detection, AWGN is added. An equalizer/demapper recovers the transmitted bits.

The demapper calculates an HD $[\hat{b}_1 \hat{b}_2]_n$ from the equalized sample $\hat{y}_n$ via three decision boundaries, which are chosen such that the BER is minimized. Note that at the transmitter, the signal points in the PAM-4 constellation are equidistant, while the received signal points are no longer equidistant because of the nonlinear transfer function of the PD. The LE cannot compensate nonlinear distortions, so the received signal points remain non-equidistant after LE. This is compensated in part by the demapper, as the decision boundaries are optimized with respect to the received and equalized signal points $\hat{y}_n$, not the transmitted signal points $y_n$. In the following, we refer to the combination of an LE and a memoryless demapper as linear minimum mean square error (LMMSE) equalization.
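A memoryless HD demapper with three decision boundaries can be sketched as follows. The specific Gray labelling and the threshold values are hypothetical; in the described setup the boundaries would be optimized for minimum BER on the equalized samples.

```python
from bisect import bisect_right

# Assumed Gray labelling of the PAM-4 levels (adjacent levels differ in one bit).
GRAY_LABELS = {-3: (0, 0), -1: (0, 1), 1: (1, 1), 3: (1, 0)}
LEVELS = sorted(GRAY_LABELS)


def hard_decision(sample, thresholds):
    """Map an equalized sample to two bits using three sorted decision boundaries."""
    level = LEVELS[bisect_right(thresholds, sample)]
    return GRAY_LABELS[level]


# Hypothetical, non-equidistant boundaries as they could result from BER minimization.
thresholds = [-1.7, 0.2, 2.4]
bits = [hard_decision(y, thresholds) for y in (-2.5, -0.3, 1.1, 3.2)]
```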

D. Nonlinear ANN Demapper

We consider an ANN with $n_{\text{tap}} = 7$ input units, a first hidden layer with 40 neurons, followed by a second hidden layer with 20 neurons, both activated by the tanh function, and a linear output layer with 4 neurons. The output values are interpreted as log-probabilities providing a soft decision (SD) on the PAM-4 symbols. A symbol-wise HD is obtained by choosing the symbol of highest probability and the bitwise HD is obtained from the symbol decisions via the Gray label.
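The forward pass of such a 7–40–20–4 network can be sketched in plain Python. The randomly initialized parameters are placeholders for trained values, and the helper names are hypothetical; the sketch only illustrates the layer structure and the symbol-wise HD.

```python
import math
import random


def dense(inputs, weights, biases, activation=None):
    """Fully connected layer with one weight row per output neuron."""
    outputs = [sum(w * x for w, x in zip(row, inputs)) + b
               for row, b in zip(weights, biases)]
    return [activation(o) for o in outputs] if activation else outputs


def init_layer(n_in, n_out, rng):
    """Random placeholder parameters; a trained model would load fitted values."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out


rng = random.Random(0)
layers = [init_layer(7, 40, rng), init_layer(40, 20, rng), init_layer(20, 4, rng)]
ALPHABET = [-3, -1, 1, 3]


def ann_demap(chunk):
    """Return the 4 output scores and the symbol with the highest score."""
    hidden1 = dense(chunk, *layers[0], activation=math.tanh)
    hidden2 = dense(hidden1, *layers[1], activation=math.tanh)
    scores = dense(hidden2, *layers[2])      # linear output layer
    best = max(range(4), key=lambda k: scores[k])
    return scores, ALPHABET[best]


scores, symbol = ann_demap([0.1, 0.4, 1.2, 2.0, 1.1, 0.3, 0.2])
```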

III. SPIKING NEURAL NETWORKS FOR EQUALIZATION

This section outlines the SNN demappers. We detail their emulation on BSS-2 in Sec. IV.

SNNs consist of neurons, evolving in time $t$, and communicating via binary spike events. The leaky integrate-and-fire (LIF) spiking neuron model [8, Sec. 1.3] captures some of the core dynamics observed in biological neurons while at the same time maintaining a tractable complexity. LIF neurons integrate synaptic input current $I(t)$ onto their internal membrane voltage state $v_m(t)$ according to the dynamics described by the ordinary differential equation (ODE)

$$\tau_m \frac{dv_m(t)}{dt} = \left[v_l - v_m(t)\right] + R_l \cdot I(t). \tag{12}$$

Here, $\tau_m$ is the membrane time constant, $R_l$ is the leakage resistance, and $v_l$ is the leakage potential. When the membrane potential reaches a threshold potential $\theta$ at time $t^*$, the neuron emits a spike $z(t) = \delta(t - t^*)$, with $\delta$ being the Dirac delta distribution, and $v_m$ is reset to a potential $v_m(t^*) = v_r$. The synaptic current $I$ is induced by presynaptic neurons $\{\nu_i\}$, projecting spike events $z_i(t) = \delta(t - t^*_i)$ at times $\{t^*_i\}$ onto the postsynaptic neuron through synapses with weights $w_i$, thereby causing an exponentially decaying current described by the ODE

$$\frac{dI(t)}{dt} = -\frac{I(t)}{\tau_s} + \sum_i w_i z_i(t). \tag{13}$$

Here, $\tau_s$ denotes the synaptic time constant. The LIF dynamics are exemplified in Fig. 2A. Neurons with a disabled spiking mechanism and membrane dynamics according to (12) are referred to as leaky integrator (LI) neurons [8, Sec. 1.3].
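The two ODEs can be integrated numerically with a forward-Euler scheme, as in the following sketch for a single neuron with one synapse. The time constants, threshold, and weights are hypothetical values chosen for illustration, not the parameters used in this work.

```python
def simulate_lif(input_spike_times, weight, dt=0.5e-6, duration=30e-6,
                 tau_m=6e-6, tau_s=6e-6, v_leak=0.0, v_reset=0.0,
                 threshold=1.0, r_leak=1.0):
    """Forward-Euler integration of the LIF equations (one neuron, one synapse)."""
    n_steps = int(round(duration / dt))
    spike_steps = {int(round(t / dt)) for t in input_spike_times}
    v, current = v_leak, 0.0
    trace, output_spikes = [], []
    for step in range(n_steps):
        # Synaptic current: exponential decay plus a jump of `weight` per input spike.
        current += -dt / tau_s * current
        if step in spike_steps:
            current += weight
        # Membrane: leak towards v_leak, driven by r_leak * current.
        v += dt / tau_m * ((v_leak - v) + r_leak * current)
        if v >= threshold:
            output_spikes.append(step * dt)
            v = v_reset
        trace.append(v)
    return trace, output_spikes


strong_trace, strong_spikes = simulate_lif([2e-6], weight=5.0)
weak_trace, weak_spikes = simulate_lif([2e-6], weight=0.5)
```

With these values, the strong input drives the membrane across the threshold, whereas the weak input only causes a sub-threshold deflection. Setting the threshold to infinity would turn the same routine into an LI neuron.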

In the following, we consider an SNN with the structure outlined in [14] and depicted in Fig. 2B. It consists of one hidden layer constituted by $N^h$ LIF neurons $\{\nu^{\mathrm{h}}_j\}$, projecting its spike events onto one output layer with $N_c = 4$ non-spiking LI readout neurons $\{\nu^{\mathrm{o}}_k\}$. The hidden layer receives spike events from the input layer, encoding a set of input samples $\tilde{\boldsymbol{y}}_n$. The readout layer’s outputs are translated to symbol-level log-probabilities. Spike-input encoding and output decoding are explained in the following.

**a) Input Spike-Encoding:** To demap a sample $\tilde{y}_n$, we consider the chunk $\tilde{\boldsymbol{y}}_n$ defined in (1) and assign to each sample $\tilde{y}_{n,\ell}$ a set of input neurons $\{\nu^{\mathrm{i}}_{i,\ell}\}$, encoding the sample value in their spike times $\{t^{\mathrm{s}}_{i,\ell}\}$. Here, $\ell = 0, \ldots, n_{\text{tap}}-1$ indexes the samples within $\tilde{\boldsymbol{y}}_n$ and $N^{\mathrm{i}}_\ell \in \mathbb{N}$ is the number of neurons associated to sample $\tilde{y}_{n,\ell}$, such that $N^{\mathrm{i}} = \sum_{\ell=0}^{n_{\text{tap}}-1} N^{\mathrm{i}}_\ell$ is the size of the input layer. Further, we assign each input neuron $\nu^{\mathrm{i}}_{i,\ell}$ a reference point $\chi_{i,\ell}$, which we choose together with $N^{\mathrm{i}}_\ell$ to be independent of $\ell$, $\chi_{i,\ell} = \chi_i$ and $N^{\mathrm{i}}_\ell = N^{\mathrm{i}}_{\mathrm{s}}$. Finally, we compute the spike time $t^{\mathrm{s}}_{i,\ell}$ by scaling the distance of $\tilde{y}_{n,\ell}$ to $\chi_i$,

$$t^{\mathrm{s}}_{i,\ell} = \alpha |\tilde{y}_{n,\ell} - \chi_i| + \sigma, \tag{14}$$

where $\alpha$ is a scaling factor and $\sigma$ is an offset. This spike encoding preserves all information and encodes the value $\tilde{y}_{n,\ell}$ redundantly in $N^{\mathrm{i}}_{\mathrm{s}}$ spike times in order to increase the network’s activity and enrich information in time. The values $\chi_i$, $N^{\mathrm{i}}_{\mathrm{s}}$, and $\alpha$ are subject to tuning and are chosen to augment the network’s activity by the right amount to achieve optimal performance. Here, the $\chi_i$ are equidistantly spaced in the domain of $\tilde{y}_{n,\ell}$ and $\alpha$ is selected to obtain spike times comparable to the membrane time constants. Note that, while a larger $N^{\mathrm{i}}_{\mathrm{s}}$ increases the network’s complexity, it potentially stabilizes the network’s performance on a noisy analog substrate like BSS-2, see Section IV. We further introduce a cutoff time $t_c$ after which input neurons are not allowed to emit spike events, as we do not expect the SNN to gain information afterwards. The spike encoding is illustrated in Fig. 2C. A sample $\tilde{y}_{n,\ell}$ (purple, dotted) is translated into spike times according to its distance to the reference points, e.g., the distance to $\chi_4$ (blue, solid) results in a spike from input neuron $\nu^{\mathrm{i}}_{4,\ell}$ depicted in blue. The input neuron associated with the reference point shown in yellow (dotted) remains silent.
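The encoding rule in (14) together with the cutoff time can be sketched as follows. The reference-point range, scaling factor, offset, and cutoff used in the example call are hypothetical, and treating a spike exactly at the cutoff as allowed is an assumption of this sketch.

```python
def equidistant_points(low, high, count):
    """Reference points chi_i, equidistantly spaced on [low, high]."""
    step = (high - low) / (count - 1)
    return [low + i * step for i in range(count)]


def encode_sample(value, reference_points, alpha, offset, t_cutoff):
    """Return {neuron index: spike time} with t = alpha * |value - chi_i| + offset.

    Neurons whose spike time would exceed the cutoff stay silent.
    """
    spike_times = {}
    for i, chi in enumerate(reference_points):
        t = alpha * abs(value - chi) + offset
        if t <= t_cutoff:
            spike_times[i] = t
    return spike_times


# Hypothetical values: 10 reference points on [0, 9], times in seconds.
reference_points = equidistant_points(0.0, 9.0, 10)
spikes = encode_sample(value=4.0, reference_points=reference_points,
                       alpha=5e-6, offset=1e-6, t_cutoff=15e-6)
```

In this example only the five neurons whose reference points lie within a distance of 2 from the sample value emit a spike; the neuron whose reference point coincides with the value fires first, at the offset time.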

**b) Output Membrane-Decoding:** Each of the 4 neurons in the readout layer is assigned to one element of the PAM-4 alphabet $\mathcal{A}$. We take the maximum membrane voltages produced over time, i.e., $s_k = \max_{t} v_k(t)$, which are interpreted as log-probabilities providing an SD on the PAM-4 symbols. Then, the symbol-wise HD is obtained by choosing the symbol of highest probability and the bitwise HD is obtained from the symbol decisions via the Gray label. Hence, the network learns to place its hidden layer spike events in time, such that the membrane trace of the correct output neuron is deflected upwards while the traces of the others are suppressed.
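The max-over-time decoding can be illustrated with a short sketch operating on sampled membrane traces. The Gray labelling and the example traces are hypothetical.

```python
ALPHABET = [-3, -1, 1, 3]
# Assumed Gray labelling, for illustration only.
GRAY_LABELS = {-3: (0, 0), -1: (0, 1), 1: (1, 1), 3: (1, 0)}


def decode_readout(membrane_traces):
    """membrane_traces[k] holds the sampled voltage v_k(t) of readout neuron k."""
    scores = [max(trace) for trace in membrane_traces]   # s_k = max_t v_k(t)
    best = max(range(len(scores)), key=scores.__getitem__)
    symbol = ALPHABET[best]
    return scores, symbol, GRAY_LABELS[symbol]


traces = [
    [0.0, -0.2, -0.4, -0.3],
    [0.0, 0.1, -0.1, -0.2],
    [0.0, 0.3, 0.8, 0.5],
    [0.0, -0.1, -0.3, -0.2],
]
scores, symbol, bits = decode_readout(traces)
```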

A. Training

Time-discretized SNNs are mathematically recurrent neural networks (RNNs) [19] and can be trained with gradient-based backpropagation through time (BPTT). For this, the derivative of the spiking output of the LIF neurons with respect to their membrane potential has to be known. This derivative is ill-defined due to the threshold activation function. Surrogate gradients, which smooth out the neurons’ activation functions, are often used to bypass this issue and allow the gradient to be backpropagated. Here, we rely on the SuperSpike [19] surrogate gradient. The model parameters are optimized by the Adam optimizer [20].
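To make the idea of a surrogate gradient concrete, the sketch below contrasts the threshold function used in the forward pass with a smooth stand-in for its derivative used in the backward pass. It uses a commonly cited form of the SuperSpike surrogate, $1/(\beta |v - \theta| + 1)^2$; the steepness $\beta$ and its value are assumptions of this example.

```python
def heaviside_spike(v, threshold):
    """Forward pass: emit a spike (1.0) when the membrane reaches the threshold."""
    return 1.0 if v >= threshold else 0.0


def superspike_surrogate(v, threshold, beta=10.0):
    """Backward pass: smooth replacement for the derivative of the spike function."""
    return 1.0 / (beta * abs(v - threshold) + 1.0) ** 2


for v in (0.0, 0.9, 1.0, 1.1):
    print(v, heaviside_spike(v, 1.0), round(superspike_surrogate(v, 1.0), 4))
```

The surrogate peaks at the threshold and decays on both sides, so neurons close to firing receive the largest gradient contribution.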

In the simulation, the SNN is integrated with a step size $\Delta t = 0.5\,\mu\text{s}$ for $T = 30\,\mu\text{s}$, suitable for BSS-2 (see Section IV). Our simulated SNN demappers are implemented using the PyTorch-based Norse [21] framework. To estimate the hardware gradient for the SNN demappers emulated on BSS-2 in continuous time with the BPTT algorithm, we discretize the hardware observables assuming the same step size, see Section IV. We allocate $N^{\mathrm{i}}_{\mathrm{s}} = 10$ input neurons per sample, of which only a subset is active, depending on the sample value, see Fig. 5. We use the cross entropy on the max-over-time voltage values as the objective function. The parameters of the SNN and input encoding are listed in Table II. In case of emulation on BSS-2, these parameters are used as calibration targets and for ITL training (see Section IV).
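The objective function can be written down compactly. The following sketch computes the cross entropy between the softmax of the max-over-time readout voltages and the transmitted symbol index for a single sample; it is a plain-Python illustration with hypothetical names, not the Norse or hxtorch implementation.

```python
import math


def max_over_time_cross_entropy(membrane_traces, target_index):
    """Cross entropy of softmax(max_t v_k(t)) with respect to the target symbol."""
    scores = [max(trace) for trace in membrane_traces]
    peak = max(scores)                      # subtracted for numerical stability
    log_norm = peak + math.log(sum(math.exp(s - peak) for s in scores))
    return log_norm - scores[target_index]


uniform_loss = max_over_time_cross_entropy([[0.0, 0.5]] * 4, target_index=2)
confident_loss = max_over_time_cross_entropy(
    [[0.0, -0.4], [0.0, -0.1], [0.0, 2.0], [0.0, -0.3]], target_index=2)
```

When all readout neurons reach the same maximum, the loss equals $\ln 4$; it decreases as the trace of the correct neuron is deflected upwards relative to the others.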

IV. BRAINSCALES-2 NEUROMORPHIC SYSTEM

We now discuss the emulation of the SNN demappers on the BrainScaleS-2 (BSS-2) system [9].
To obtain an equivalent experiment configuration on BSS-2, our software stack translates the high-level SNN experiment description to a data flow graph representation, places and routes neurons and synapses on the hardware substrate, and compiles stimulus inputs, recording settings and other runtime dynamics into an experiment program [23].

The analog circuits on BSS-2 are subject to device variations (fixed-pattern noise) that can be compensated for by calibration. Therefore, one part of the system configuration consists of a calibration data set that is loaded to obtain a chip operating point, which most closely resembles the desired target dynamics with minimal variation, e.g., with respect to model parameters such as neuron membrane time constants or synaptic efficacy.

To represent one signed software weight $w_{\text{sw}}$ on BSS-2, two hardware synapses with the inhibitory and excitatory weights

$$w_{\text{hw}}^{\text{inh}} = \max(0, -w_{\text{sw}}) \quad \text{and} \quad w_{\text{hw}}^{\text{exc}} = \max(0, w_{\text{sw}}),$$

respectively, are allocated and constitute one signed hardware weight $w_{\text{hw}} \in [-63, 63]$. We scale each weight $w_{\text{sw}}$ linearly into a hardware-compatible range and round it to the nearest value representable on BSS-2. The batched input spikes are injected into BSS-2 and the SNN is emulated for $T = 30\,\mu\text{s}$ per batch entry, i.e., for demapping a single sample. During emulation, spike events are recorded and the columnar analog-to-digital converter (CADC) samples the membrane voltages of the hidden neurons and the readout neurons. After the emulation, the host computer reads back and post-processes the recorded data. The post-processing step includes a linear interpolation to convert event-based CADC recordings to a `torch::Tensor` expressed on a fixed time grid. To facilitate hardware-ITL training on BSS-2, we use `hxtorch.snn` [23], a PyTorch-based [24] library that automates and abstracts away hardware-specific procedures and provides data conversions from and to PyTorch.
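The mapping from signed software weights to pairs of hardware synapses can be sketched as follows. The scaling rule (largest magnitude mapped to 63) and the function name are assumptions of this example; the software stack may use a different, fixed scaling.

```python
W_MAX = 63  # magnitude limit of a signed hardware weight


def to_hardware_weights(software_weights):
    """Scale linearly, round, and split each weight into (excitatory, inhibitory)."""
    largest = max(abs(w) for w in software_weights)
    # Assumed scaling rule: the largest magnitude maps to W_MAX.
    scale = W_MAX / largest if largest > 0 else 1.0
    pairs = []
    for w in software_weights:
        w_hw = max(-W_MAX, min(W_MAX, round(w * scale)))
        pairs.append((max(0, w_hw), max(0, -w_hw)))
    return pairs


pairs = to_hardware_weights([0.5, -0.2, 0.0, 0.1])
```

For each pair, at most one of the two synapses carries a nonzero weight, and the difference of excitatory and inhibitory weight recovers the signed hardware weight.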

V. TRAINING AND TESTING

To measure the BER of the demappers against the noise-level in the IM/DD link, we train our models with successively increasing noise-levels $\sigma^2$. At each noise-level, we perform validation runs on independent data and store the model parameters of the best performing demapper. At the next noise-level, we restore the best model from the previous noise-level and continue training. This procedure is repeated for five different random seeds, affecting model initialization, IM/DD-data generation and sampling permutations. We select the best-performing demappers for each noise-level over the seeds according to their respective validation runs and benchmark the models on independent test data. The tests are run until a minimum of 2000 bit error events are encountered.
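The stopping rule for the test runs can be sketched as a loop that accumulates batches until the minimum number of bit errors is reached. The toy error source below, with independent bit flips at a hypothetical error probability, merely stands in for demapping a batch of test data and counting errors.

```python
import random


def estimate_ber(count_bit_errors, bits_per_batch, min_errors=2000):
    """Accumulate test batches until at least `min_errors` bit errors were seen."""
    errors = bits = 0
    while errors < min_errors:
        errors += count_bit_errors(bits_per_batch)
        bits += bits_per_batch
    return errors / bits, errors, bits


# Stand-in for "demap one batch and count errors": independent bit flips
# with a hypothetical error probability of 2e-2.
rng = random.Random(1)


def toy_batch_errors(n_bits, p_error=2e-2):
    return sum(rng.random() < p_error for _ in range(n_bits))


ber, errors, bits = estimate_ber(toy_batch_errors, bits_per_batch=20_000)
```

Collecting a fixed number of error events, rather than a fixed number of bits, keeps the relative uncertainty of the BER estimate comparable across noise levels.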

VI. RESULTS

In Figure 4, we compare our 7-tap SNN demapper emulated on the analog neuromorphic BSS-2 system (SNN$^{hw}$) to a 7-tap SNN demapper simulated in software (SNN$^{sw}$) in terms of BER versus noise-level $\sigma^2$ in the link. We benchmark our SNN performances against the LMMSE, with 7 taps (LE7) and without memory (LE1). As additional nonlinear references, we consider a 7-tap ANN demapper with two hidden layers (see Section II-D) and a 7-tap VNLE. All demapper configurations are specified in Table III.

Figure 4. The BER of SNN equalizers in simulation and on the BSS-2 system over the noise-levels $\sigma^2$ in the IM/DD link compared to ANN, VNLE, and LMMSE reference equalizers. The error bars denote the 99% credibility intervals.

Table III

<table>
<thead>
<tr>
<th>Name</th>
<th>Type</th>
<th>Layers</th>
<th>$n_{tap}$</th>
</tr>
</thead>
<tbody>
<tr>
<td>LE1</td>
<td>LMMSE</td>
<td>1–1</td>
<td>1</td>
</tr>
<tr>
<td>LE7</td>
<td>LMMSE</td>
<td>7–1</td>
<td>7</td>
</tr>
<tr>
<td>ANN</td>
<td>ANN</td>
<td>7–40–20–4</td>
<td>7</td>
</tr>
<tr>
<td>VNLE</td>
<td>VNLE</td>
<td>7–1</td>
<td>7</td>
</tr>
<tr>
<td>SNN$^{sw}$</td>
<td>SNN in sim.</td>
<td>70–40–4</td>
<td>7</td>
</tr>
<tr>
<td>SNN$^{hw}$</td>
<td>SNN on BSS-2</td>
<td>70–40–4</td>
<td>7</td>
</tr>
</tbody>
</table>

Both the simulated SNN$^{sw}$ demapper and the SNN$^{hw}$ demapper on BSS-2 outperform the LMMSE demapper. At a pre-FEC BER of $2 \times 10^{-3}$, the SNN$^{sw}$ demapper achieves a gain of about 1.5 dB over the LE7 demapper and of 0.5 dB over the nonlinear ANN demapper. Compared to the VNLE demapper, the SNN$^{sw}$ demapper shows superior performance for noise levels higher than $\sim 21$ dB; in particular, at the considered pre-FEC BER threshold, it shows a 0.3 dB improvement. For noise levels lower than $\sim 21$ dB, however, the VNLE demapper achieves a lower BER.

The SNN$^{hw}$ demapper on BSS-2 approaches the performance of the simulated SNN and suffers only a small hardware penalty of about 0.2 dB with respect to the SNN$^{sw}$ at a BER of $2 \times 10^{-3}$. At this BER, it therefore still outperforms all reference demappers.

In Fig. 5A, we visualize the process of joint equalization and demapping on BSS-2 on four different samples. The upper row indicates the sample set $\tilde{\boldsymbol{y}}_n$, with the sample of interest $\tilde{y}_n$ highlighted. Each sample in this set is translated to spike times of 10 input neurons, depicted in the second row. For $n_{tap} = 7$, the hidden LIF layer receives spike events from 70 input neurons, of which the majority are silent due to a cutoff time of 15 $\mu$s (see Section III). These input spike events activate the 40 LIF neurons in the hidden layer, exciting them to emit spike events themselves as shown in the third row. These spike events constitute a meaningful pattern, driving the membrane voltage of the correct LI output neuron to the maximum voltage value over time, from which the bits are inferred via an HD. This behavior is observed in the analog membrane traces in the lowermost row. The membrane voltage of the readout neuron corresponding to the estimated symbol is deflected upwards while the others drop below zero and hence do not intervene in the decision. Note that the dynamics visualized in each column from the second to fourth row all happen simultaneously in BSS-2’s analog circuits.

The weight matrices learned on BSS-2 are shown in Fig. 5B. The input-to-hidden weight matrix $w_{ij}^{h}$ shows a greater weight magnitude for rows with indices $i \in [30, 39]$. This is expected as these rows receive the input spike events encoding the most significant sample to demap $\tilde{y}_n$ in the innermost tap. For the outer rows, one can observe a pattern repeating with the number of input rows per sample, $N^i_s = 10$. The lower plot depicts the hidden-to-output weight matrix $w_{jk}^{o}$. 

VII. CONCLUSION

This work showcases the implementation of SNN-based joint equalization and demapping emulated on the accelerated analog neuromorphic hardware system BSS-2. Our demapper on BSS-2 approaches the performance of an SNN demapper simulated in software while outperforming an LMMSE equalizer and a nonlinear ANN reference demapper, both with the same number of taps. A gain of 1.5 dB at a BER of $2 \times 10^{-3}$ of the simulated SNN over the LMMSE clearly demonstrates the nonlinear processing capability of the SNN demapper. A small hardware penalty of about 0.2 dB at the same BER with respect to the SNN simulated in software is observed and is attributed to hardware imperfections such as noise in the physical substrate, fixed-pattern noise artifacts of the production process, and potentially a sub-optimal hardware operation point. Typically, fixed-pattern noise effects are largely absorbed by gradient-based training. An additional cause might be the limited precision of the 6-bit hardware weights and the 8-bit CADC. Despite multiple sources of noise and the loss of information owing to limited precision, the SNN demapper on BSS-2 shows excellent performance and resilience to hardware impairments. We conjecture that the chosen size of the SNN with 40 hidden neurons ensures a robust behavior by encoding information redundantly. Accordingly, we expect to observe a larger hardware penalty as the number of hidden neurons decreases. An interesting direction for future research is to investigate how the complexity of the SNN on BSS-2 can be reduced while maintaining its performance.

With the implementation at hand, the equalization and demapping of a single sample take about $T = 30\,\mu$s. Therefore, the BSS-2 platform supports a maximum symbol rate of roughly 30 kBd. However, this upper bound is due to the specific design target of BSS-2 as a general purpose experimentation platform and does not follow from an intrinsic limitation of the underlying complementary metal–oxide–semiconductor (CMOS) technology itself. Significantly faster inference, and thus throughput, might be achieved by accelerating the emulation of the LIF dynamics. The authors of [25] presented a neuromorphic ASIC exhibiting an acceleration of up to two additional orders of magnitude (OOMs) with respect to BSS-2. Given that the cited implementation was fabricated in a 180 nm CMOS process, it is reasonable to assume that a modern FinFET process could potentially gain at least another 2 OOMs. This would result in a processing time on the order of nanoseconds per sample. The throughput can be increased further by parallelization. Several spiking network cores could be deployed in parallel, each of which could process multiple samples on the same physical substrate at once. To reach 200 Gbit/s with a processing time of nanoseconds per sample, a parallel processing factor of a few hundred is sufficient, which is similar to the time-interleaving of multiple analog-to-digital converters (ADCs) used in standard optical DSP solutions [26].
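The scaling argument above amounts to a short calculation, sketched below. The total speed-up of four OOMs combines the two figures quoted in the text and is a projection, not a measured value; the variable names are illustrative.

```python
emulation_time = 30e-6       # s per sample on BSS-2
speedup_ooms = 2 + 2         # [25]: up to 2 OOMs; modern process: assumed another 2
baud_rate = 112e9            # symbols per second of the considered link

bss2_symbol_rate = 1.0 / emulation_time                 # symbols per second today
time_per_sample = emulation_time / 10**speedup_ooms     # s, projected
parallel_factor = baud_rate * time_per_sample           # samples in flight at once

print(f"BSS-2 today: {bss2_symbol_rate / 1e3:.0f} kBd")
print(f"projected: {time_per_sample * 1e9:.1f} ns per sample, "
      f"{parallel_factor:.0f} parallel units")
```

Under these assumptions, a few hundred samples have to be processed in parallel to sustain the 112 GBd symbol rate of the considered link.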

The spatio-temporal sparsity of SNNs promises an intrinsically favorable energy footprint when contrasted to traditional ANN-based solutions – especially when combined with analog IMC [6]. Currently, the power consumption is dominated by I/O as well as the clock distribution and biasing of the individual subsystems – a fact largely attributed to the flexible general-purpose approach of BSS-2. Optimizing or omitting these subsystems in future, more specialized ASICs could dramatically reduce the overall energy footprint.

Future research aims to increase hardware resource efficiency by decreasing the architectural SNN complexity and investigate feature sharing in order to increase the throughput by parallelization. Importantly, the power consumption of neuromorphic signal processing shall be analyzed, compared to a digital implementation, and optimized by minimization of the firing activity of the neurons and efficient design of the input and output interfaces of the SNN.

The presented results demonstrate that electrical neuromorphic hardware can implement signal processing with the accuracy required in optical transceivers. To successfully integrate SNN equalization in optical transceivers, efficient conversion of received signals into input spikes must be researched.

ACKNOWLEDGMENT

We thank L. Blessing, B. Cramer, and C. Pehle for insightful discussions, C. Mauch for keeping the BSS-2 system on track, and all members of the Electronic Vision(s) research group who contributed to the BSS-2 system.

FUNDING

The contributions of the Electronic Vision(s) group have been supported by the EC Horizon 2020 Framework Programme under grant agreements 785907 (HBP SGA2) and 945539 (HBP SGA3), Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) under Germany’s Excellence Strategy EXC 2181/1-390900948 (the Heidelberg STRUCTURES Excellence Cluster), and the Helmholtz Association Initiative and Networking Fund Advanced Computing Architectures (ACA) under Project SO-092.

REFERENCES

voir computing based on optical filters in a loop as a high performance and low-power consumption equalizer for 100 Gb/s direct detection
systems,” in 2021 European Conference on Optical Communication

100Gb/s downstream PAM4 PON link with 34 dB power budget,” in
European Conference on Optical Communication (ECOC), Switzerland,
Basel, Sep. 2022, paper TuC3.

K. Virwani, M. Ishi, P. Narayanan, A. Fumarola et al., “Neuromorphic
computing using non-volatile memory,” Advances in Physics: X, vol. 2,

and U. Schlichtmann, “Power-efficient and robust nonlinear demapper
for 64QAM using in-memory computing,” in European Conference on
Optical Communication (ECOC), Switzerland, Basel, Sep. 2022, paper WeC3.

dynamics: From single neurons to networks and models of cognition.

mann, J. Weis, A. Leibfried, E. Müller, and J. Schemmel, “The BrainScaleS-2
accelerated neuromorphic system with hybrid plasticity,” Front. Neurosci.,

[10] E. Arnold, G. Böcherer, E. Müller, P. Spilger, J. Schemmel, S. Calabrò,
and M. Kuschnerov, “Spiking network equalization for IM/DD optical
communication,” in Optica Advanced Photonics Congress 2022.
Optica Publishing Group, 2022. doi: 10.1364/SPPCOM.2022.SpTu1J.2
Paper Sptu1J.2.

network decision feedback equalization,” arXiv preprint, 2022. [Online].
Available: https://arxiv.org/abs/2211.04756v2

S. Hartmann, D. Husmann, K. Husmann, J. Jetsch, M. Kleider, C. Koke,
A. Kononov, C. Mauch, E. Müller, P. Müller, J. Partzsch, M. A. Petrovici,
B. Vogginger, S. Schiefer, S. Scholze, V. Thanasolus, J. Schemmel,
R. Legenstein, W. Maass, C. Mayr, and K. Meier, “Neuromorphic hardware
in the loop: Training a deep spiking network on the BrainScaleS
wafer-scale system,” in Proceedings of the 2017 IEEE International Joint
Conference on Neural Networks (IJCNN), 2017, pp. 2227–2234. doi:
10.1109/IJCNN.2017.7966125

Karasenko, C. Pehle, K. Schreiber, Y. Stradmann, J. Weis et al.,
“Surrogate gradients for analog neuromorphic computing,” Proceedings of
the National Academy of Sciences, vol. 119, no. 4, 2022.

[14] E. Arnold, G. Böcherer, E. Müller, P. Spilger, J. Schemmel,
S. Calabrò, and M. Kuschnerov, “Spiking network equalization on neuromorphic hardware for IM/DD optical communication,” in
European Conference on Optical Communication (ECOC) 2022.
/opg.optica.org/abstract.cfm?URI=ECEOC-2022-Th1C.5


optical networks (50G-PON): Physical media dependent (PMD) layer


reach optical communication: A comparison of deep neural networks
and Volterra series,” Journal of Lightwave Technology, vol. 39, no. 10,

in spiking neural networks: Bringing the power of gradient-based optimization
to spiking neural networks,” IEEE Signal Processing Magazine,
vol. 36, no. 6, pp. 51–63, 2019. doi: 10.1109/MSP.2019.2931595


Available: https://doi.org/10.5281/zenodo.4429205

nal of Neurophysiology, vol. 94, no. 5, pp. 3637–3642, 2005. doi:
10.1152/jn.00686.2005

[23] E. Müller, E. Arnold, O. Breitwieser, M. Czerlinski, A. Emmel,
J. Kaiser, C. Mauch, S. Schmitt, P. Spilger, R. Stock, Y. Stradmann,
J. Weis, A. Baumbach, S. Billaudelle, B. Cramer, E. Ebert,
J. Göltz, J. Ilmberger, V. Karasenko, M. Kleider, A. Leibfried, C. Pehle,
and J. Schemmel, “A scalable approach to modeling on accelerated
neuromorphic hardware,” Front. Neurosci., vol. 16, 2022. doi:
10.3389/fnins.2022.884128

A. Desmaison, L. Antiga, and A. Lerer, “Automatic differentiation in

plasticity in a VLSI spiking neural network model,” in Proceedings of
the 2006 International Joint Conference on Neural Networks (IJCNN).
IEEE Press, 2006. doi: 10.1109/IJCNN.2006.246651

[26] C. Laperle and M. O’Sullivan, “Advances in high-speed DACs, ADCs, and
DSP for optical coherent transceivers,” Journal of Lightwave Technology,
vol. 32, no. 4, pp. 629–643, 2014. doi: 10.1109/JLT.2013.2284134