

**Delft University of Technology** 

#### Spin Wave Based Approximate Computing

Mahmoud, Abdulqader; Vanderveken, Frederic; Ciubotaru, Florin; Adelmann, Christoph; Hamdioui, Said; Cotofana, Sorin

DOI 10.1109/TETC.2021.3136299

**Publication date** 2022

**Document Version** Final published version

Published in IEEE Transactions on Emerging Topics in Computing

**Citation (APA)** Mahmoud, A., Vanderveken, F., Ciubotaru, F., Adelmann, C., Hamdioui, S., & Cotofana, S. (2022). Spin Wave Based Approximate Computing. *IEEE Transactions on Emerging Topics in Computing*, *10*(4), 1932-1940. https://doi.org/10.1109/TETC.2021.3136299


Received 23 April 2021; revised 11 December 2021; accepted 14 December 2021. Date of publication 24 December 2021; date of current version 6 December 2022.

Digital Object Identifier 10.1109/TETC.2021.3136299

## **Spin Wave Based Approximate Computing**

ABDULQADER MAHMOUD (Graduate Student Member, IEEE), FREDERIC VANDERVEKEN, FLORIN CIUBOTARU (Member, IEEE), CHRISTOPH ADELMANN, SAID HAMDIOUI (Senior Member, IEEE), AND SORIN COTOFANA (Fellow, IEEE)

Abdulqader Mahmoud, Said Hamdioui, and Sorin Cotofana are with the Computer Engineering Laboratory, Delft University of Technology, 2628 CD Delft, The Netherlands. Frederic Vanderveken, Florin Ciubotaru, and Christoph Adelmann are with IMEC, 5656 Leuven, Belgium.

(Corresponding author: ABDULQADER MAHMOUD (A.N.N.Mahmoud@tudelft.nl).)

This work was supported in part by the European Union's Horizon 2020 Research and Innovation Program within the FET-OPEN Project CHIRON under Grant 801055, and in part by Imec's Industrial Affiliate Program on Beyond-CMOS logic. The work of Frederic Vanderveken was supported in part by the Flanders Research Foundation (FWO) under Grant 1S05719N.

ABSTRACT By their very nature Spin Waves (SWs) enable the realization of energy efficient circuits, as they propagate and interfere within waveguides without consuming noticeable energy. However, SW computing can be even more energy efficient by taking advantage of the approximate computing paradigm as many applications, e.g., multimedia and social media, are error-tolerant. In this paper, we propose an ultra-low energy Approximate Full Adder (AFA) and an Approximate 2-bit inputs Multiplier (AMUL). AFA consists of one Majority gate whereas AMUL is built by means of 3 AND gates. We validate the correct functionality of our proposal by means of micromagnetic simulations and evaluate AFA's figures of merit against state-of-the-art accurate SW, 7nm CMOS, Spin Hall Effect (SHE), Domain Wall Motion (DWM), accurate and approximate 45nm CMOS, Magnetic Tunnel Junction (MTJ), and Spin-CMOS FA implementations. Our results indicate that AFA consumes 38% and 6% less energy than state-of-the-art accurate SW and 7nm CMOS FA implementations, respectively. Moreover, it saves 56% and 20% energy when compared with accurate and approximate 45nm CMOS counterparts, respectively. Furthermore, it provides 2 orders of magnitude energy reduction when compared with accurate SHE, accurate and approximate DWM, MTJ, and Spin-CMOS, counterparts. In addition, it achieves the same error rate as approximate 45nm CMOS and Spin-CMOS FAs whereas it exhibits 50% less error rate than the approximate DWM FA. Last but not least, it outperforms its contenders in terms of area by saving at least 29% chip real-estate. AMUL is evaluated and compared with state-of-the-art SW and 16nm CMOS accurate and approximate designs. The evaluation results indicate that AMUL energy consumption is at least 2.8x and 2.6x smaller than the one of state-of-the-art SW and 16nm CMOS accurate and approximate designs, respectively. 
AMUL has an error rate of 25%, whereas the approximate CMOS multiplier has an error rate of 38%, and requires at least 64% less chip real-estate than the CMOS counterpart.

**INDEX TERMS** Spin-wave, spin-wave computing, approximate computing, full adder, multiplication, energy consumption

#### I. INTRODUCTION

In the last decades, CMOS downscaling enabled the implementation of high performance computing platforms required to process the huge amount of data originating from the information technology revolution [1]. However, it became difficult to maintain the downscaling pace due to [2]: (i) leakage wall, (ii) reliability wall, and (iii) cost wall. This implies that Moore's law will come to an end sooner or later and, as a result, researchers have started to explore different technologies, e.g., memristors [3]–[6], graphene devices [7]–[9], and spintronics [10]–[13]. Among them, Spin Wave (SW) technology stands apart as one of the most promising due to its [14]–[18]: (i) Ultra-low energy consumption - SW computing depends on wave interference instead of charge movements, (ii) Acceptable delay, and (iii) High scalability - SW wavelengths can reach the nanometer range.

Driven by the potential to build energy efficient circuits, several SW based logic gates and circuits have been reported [16]-[30]. The Mach-Zehnder interferometer was utilized to build a SW NOT gate, which is considered the first SW computing device [19]. Moreover, XNOR, (N)AND, and (N)OR gates were reported by making use of this Mach-Zehnder interferometer [20]-[22]. Whereas the Mach-Zehnder interferometer utilizes SW amplitude to perform the logic operations, other devices utilize the SW phase or both phase and amplitude to build fanout enabled Majority, (N)AND, (N)OR, and X(N)OR gates [18]. The SW frequency was utilized as an additional parameter to improve data storage and computing capabilities of multi-frequency Majority and X(N)OR gates [17], [23]. In addition, physical realizations of spin wave devices were reported in [24]-[27]. Furthermore, SW circuits were proposed at conceptual level, i.e., without simulation or experimental results, [28], [29], at simulation level, e.g., 2-bit inputs SW multiplier [16] and magnonic half-adder [31], as well as practical mm range prototypes [30].

2168-6750 © 2021 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See https://www.ieee.org/publications/rights/index.html for more information.

All the aforementioned logic gates and circuits were designed to provide accurate results. However, many current applications like multimedia and social media processing are error tolerant and, within certain bounds, are not fundamentally perturbed by computational errors [32]. Therefore, such applications can benefit from approximate computing circuits, which can save significant amounts of energy, delay, and area, while providing acceptable accuracy. In view of this, this paper introduces a novel energy efficient Approximate SW-based Full Adder (AFA) and an Approximate 2-bit inputs Multiplier (AMUL). The main contributions can be summarized as follows:

- Developing and designing a SW based approximate FA which consists of one Majority gate and provides a 25% error rate.
- Developing and designing a SW based Approximate 2-bit inputs MUL which consists of 3 AND gates and provides a 25% error rate.
- Validating the functionality and demonstrating the superiority: We validated the proposed approximate circuits by means of MuMax3 simulations, assessed their performance, and compared them with accurate and approximate state-of-the-art counterparts. Our results indicate that AFA consumes at least 6% less energy and provides at least the same error rate as approximate state-of-the-art counterparts. In addition, the evaluation results point out that AMUL consumes at least 2.6x less energy, and has a lower error rate (25% vs 38%) than the state-of-the-art designs. Moreover, the proposed designs require the smallest chip real-estate.

The paper is organized as follows. Section II provides SW computing background. Section III introduces the proposed approximate circuits. Section IV presents the simulation setup and results. Section V provides performance evaluation and discusses variability and thermal noise effects. Finally, Section VI concludes the paper.

#### **II. SPIN WAVE BASED COMPUTING**

In this section we briefly discuss SW technology basics and the associated computing paradigm.



FIGURE 1. a) Constructive and Destructive Interference. b) Spin Wave Device.

#### A. SPIN WAVE FUNDAMENTALS

The Landau-Lifshitz-Gilbert (LLG) equation describes the magnetization dynamics when the magnetic material magnetization is out of equilibrium [14]

$$\frac{d\vec{M}}{dt} = -|\gamma|\mu_0\left(\vec{M}\times\vec{H}_{eff}\right) + \frac{\alpha}{M_s}\left(\vec{M}\times\frac{d\vec{M}}{dt}\right),\tag{1}$$

where  $\gamma$  is the gyromagnetic ratio,  $\alpha$  the damping factor, M the magnetization,  $M_s$  the saturation magnetization, and  $H_{eff}$  the effective field which contains the different magnetic interactions. In this work, the effective field is the summation of the external field, exchange field, demagnetizing field, and magneto-crystalline field.

For small magnetic perturbations, Eq. (1) can be linearized and results in wave-like solutions which are known as Spin Waves (SWs). They can be seen as collective excitations of the magnetization within the magnetic material. Just like any other wave, a SW is completely described by its amplitude *A*, phase  $\phi$ , frequency *f*, wavelength  $\lambda$ , and wavenumber  $k = \frac{2\pi}{\lambda}$ . The relation between frequency *f* and wavenumber *k* is called the dispersion relation and is very important for the design of the magnonic devices [14].

#### B. SW COMPUTATION PARADIGM

FIGURE 2. Approximate Spin Wave Based FA.

The SW amplitude and phase can be used to encode information at different frequencies, which enables parallelism [14], [17]. The interaction between multiple SWs present in the same waveguide is based on the interference principle. Figure 1(a) presents an example of interaction between 2 SWs excited with the same A, $\lambda$, and f in the same waveguide. If the 2 SWs have the same phase ($\Delta \phi = 0$), they interfere constructively, resulting in a SW with higher amplitude. On the other hand, if they are out of phase ($\Delta \phi = \pi$), they interfere destructively, resulting in an approximately zero amplitude SW. Moreover, SW interference provides natural support for Majority function evaluation: if an odd number of SWs interfere, the resultant SW is obtained by a Majority decision. For example, if 3 SWs with the same A, $\lambda$, and f interfere, then the resultant SW either has a phase of 0 if at most 1 input SW has a phase of $\pi$ or has a phase of $\pi$ if at most 1 input SW has a phase of 0. Note that such a 3-input Majority gate in CMOS implementation requires 18 transistors whereas in SW technology it is implemented using one waveguide only. More complex interference cases exist if the propagating SWs have different A, $\lambda$, and f, which might be of interest for designing novel magnonic computing systems. However, in this paper, we focus on the simplest case where all the excited SWs have the same A, $\lambda$, and f and can take two discrete phases $\phi = 0$ and $\phi = \pi$. Logic 0 refers to a SW with $\phi = 0$, and a logic 1 refers to a SW with $\phi = \pi$.
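The Majority behaviour of interfering SWs described above can be illustrated with a minimal sketch. It assumes idealized, lossless waves represented as complex phasors of equal amplitude and frequency; the helper names `interfere`, `phase_detect`, and `majority` are hypothetical and not part of any SW simulation tool.

```python
import cmath
import math
from itertools import product


def interfere(bits, amplitude=1.0):
    """Sum equal-amplitude waves; logic 0 -> phase 0, logic 1 -> phase pi."""
    return sum(amplitude * cmath.exp(1j * math.pi * b) for b in bits)


def phase_detect(wave):
    """Phase detection: resultant phase 0 reads as logic 0, phase pi as logic 1."""
    return 0 if wave.real > 0 else 1


def majority(bits):
    """Reference Boolean Majority of an odd number of bits."""
    return int(sum(bits) > len(bits) / 2)


# Idealized model only: damping and dispersion are ignored.
for bits in product((0, 1), repeat=3):
    wave = interfere(bits)
    print(bits, "phase bit:", phase_detect(wave), "amplitude:", round(abs(wave), 3))
```

In this idealized picture the detected phase always matches the Boolean Majority, while the resulting amplitude is three times the input amplitude for unanimous inputs and equal to it otherwise.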

Figure 1(b) presents a generic SW logic device that consists of four regions: Excitation Stage I, Waveguide B, Functional Region FR, and Detection Stage O [14]. In I, SWs are generated by means of, e.g., microstrip antennas [14], magnetoelectric cells [14], or Spin Orbit Torques [14]. B is the medium for SW propagation and can be made of different magnetic materials, e.g., Permalloy, Yttrium iron garnet (YIG), or CoFeB [14]. The waveguide material is an important parameter as it fundamentally determines the SW properties. In the FR, SWs can be amplified, normalized or interfere with other SWs. In O, the output SW is captured and converted to the electrical domain using the same type of cells as in *I*. Two main SW detection techniques are in place [14]: phase and threshold based. In phase detection, the output is determined by comparing the detected SW phase with a predefined phase. For example, if the detected SW has a phase of  $0/\pi$  the output is logic 0/1, respectively. On the other hand, threshold detection determines the output by comparing the detected SW amplitude with a predefined threshold. For instance, if the detected SW amplitude is larger than the predefined threshold, the output is logic 1 whereas it is logic 0 otherwise.

#### **III. SW APPROXIMATE FUNCTIONS**

In this section, we introduce and analyse the SW-based Approximate Full Adder (AFA) and the 2-bit inputs Multiplier (AMUL).

TABLE 1. Accurate and approximate SW-based FA.

| $XYC_i$ | $C_o$ | $S_{ac}$ | $S_{ap}$ |
|---------|-------|----------|----------|
| 000     | 0     | 0        | <u>1</u> |
| 001     | 0     | 1        | 1        |
| 010     | 0     | 1        | 1        |
| 011     | 1     | 0        | 0        |
| 100     | 0     | 1        | 1        |
| 101     | 1     | 0        | 0        |
| 110     | 1     | 0        | 0        |
| 111     | 1     | 1        | <u>0</u> |

#### A. SW APPROXIMATE FULL ADDER

Figure 2 presents the proposed Approximate FA (AFA) structure, consisting of 3 inputs *X*, *Y*, and *C<sub>i</sub>*, and 2 outputs *S* and *C<sub>o</sub>*. Its functionality corresponds to a 3-input Majority gate that evaluates $S = \overline{C_o} = \overline{MAJ(X, Y, C_i)}$ as suggested in [33]. The AFA generates *C<sub>o</sub>* without any error as it is detected as the Majority of *X*, *Y*, and *C<sub>i</sub>*, which is also the case in accurate FAs. On the other hand, *S* is detected with a 25% error rate as $S = \overline{MAJ(X, Y, C_i)}$ approximates the accurate FA Sum, which equals $S = XOR(XOR(X, Y), C_i)$. Table 1 presents the FA and AFA truth tables, which show that the approximate FA sum *S<sub>ap</sub>* is erroneous only when all inputs are 0 or all inputs are 1, i.e., for 2 of the 8 input combinations.
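The 25% figure can be cross-checked by enumerating the truth table. The following is a minimal sketch at the Boolean level only (no SW physics); the function names are illustrative.

```python
from itertools import product


def maj(x, y, c):
    """3-input Majority."""
    return int(x + y + c >= 2)


def accurate_fa(x, y, c):
    """Return (sum, carry) of an exact full adder."""
    return x ^ y ^ c, maj(x, y, c)


def approximate_fa(x, y, c):
    """AFA: the carry is the Majority, the sum is approximated by its inverse."""
    carry = maj(x, y, c)
    return 1 - carry, carry


def sum_errors():
    """Input combinations for which the approximate sum differs from the exact one."""
    return [bits for bits in product((0, 1), repeat=3)
            if accurate_fa(*bits)[0] != approximate_fa(*bits)[0]]


errors = sum_errors()
print(errors, len(errors) / 8)
```

The carry output is identical in both adders by construction, so only the sum contributes to the error rate.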

To achieve the expected AFA behaviour, the design in Figure 2 has to be properly dimensioned. The waveguide width must be smaller than or equal to the SW wavelength $\lambda$ to achieve a clear phase front. Moreover, the SW amplitude, wavelength, and frequency must be the same at every excitation cell. The structure dimensions must be precisely determined because the interference pattern depends on the location and distances between different excitation and detection cells. For example, if the constructive interference pattern is desired when the SWs have the same phase $\Delta \phi = 0$ and destructive when the SWs are out-of-phase $\Delta \phi = \pi$, then $d_1$, $d_2$, and $d_3$ must be equal and have length $n\lambda$ (where n = 0, 1, 2, 3, ...). If the inverted Majority operation is of interest, which is the case for S, then $d_4$ must be $(n + 1/2) \times \lambda$. In contrast, if the noninverted output is required, which is the case for $C_o$, then $d_5$ must be $n\lambda$. The AFA operation principle relies on the combined process of SW propagation and interference as follows: First, SWs are excited at X and Y and propagate diagonally until they interfere constructively or destructively depending on their phases at the connection point in the FR. Then, the resulting SW propagates and interferes constructively or destructively with the SW excited at $C_i$ at the next connection point. This interference result generates the final SW, which travels toward the outputs. Here, $\overline{MAJ(X, Y, C_i)}$ is detected at S and $MAJ(X, Y, C_i)$ is detected at $C_o$.
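The role of the output distances can be sketched with a simple phase-accumulation model. It assumes an ideal wave whose phase advances by $2\pi$ per wavelength; the function names and the 82.5 nm value are illustrative only, while the 55 nm wavelength and the 110 nm distance are the values used later in the simulation setup.

```python
import math


def accumulated_phase(distance_nm, wavelength_nm):
    """Phase in [0, 2*pi) picked up by a wave travelling distance_nm."""
    return (2 * math.pi * distance_nm / wavelength_nm) % (2 * math.pi)


def inverts(distance_nm, wavelength_nm, tol=1e-6):
    """True if the path adds a phase of pi, i.e. the detected logic value flips."""
    return abs(accumulated_phase(distance_nm, wavelength_nm) - math.pi) < tol


wavelength = 55.0  # nm
print(inverts(110.0, wavelength))  # n * lambda path (n = 2): non-inverted output
print(inverts(82.5, wavelength))   # (n + 1/2) * lambda path (n = 1): inverted output
```

Placing a detection cell at $(n + 1/2)\lambda$ therefore reads the complement of the value read at $n\lambda$, which is how the same waveguide can deliver both $C_o$ and its inverse.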

#### B. SW APPROXIMATE 2-BIT INPUTS MULTIPLIER

Figure 3 presents the proposed Approximate 2-bit inputs SW-based Multiplier (AMUL). Its inputs are the 2-bit operands $X = (X_1, X_0)$ and $Y = (Y_1, Y_0)$ and its 4-bit output is $Q = (Q_0, Q_1, Q_2, Q_3)$. The AMUL consists of 4 excitation cells, 4 detection cells, and 3 AND gates that evaluate AMUL outputs as $Q_0 = AND(X_0, Y_0)$, $Q_1 = Q_2 = AND(X_1, Y_1)$, and $Q_3 = AND(X_0, X_1, Y_1)$.

FIGURE 3. Approximate SW-based Multiplier.

To evaluate the error rate, we note that the accurate multiplier (MUL) output bits are computed as $Q_0 = AND(X_0, Y_0)$, $Q_1 = XOR(AND(X_0, Y_1), AND(X_1, Y_0))$, $Q_2 = XOR(AND(AND(X_0, Y_1), AND(X_1, Y_0)), AND(X_1, Y_1))$, and $Q_3 = AND(AND(X_0, Y_0), AND(X_1, Y_1))$. AMUL and MUL output bit values for all possible input combinations are summarized in Table 2, where $Q_0$, $Q_{1ac}$, $Q_{2ac}$, and $Q_{3ac}$ designate MUL outputs and $Q_0$, $Q_{1ap}$, $Q_{2ap}$, and $Q_{3ap}$ AMUL outputs. We note that since $Q_0$ is computed as $AND(X_0, Y_0)$ in both MUL and AMUL, $Q_{0ap}$ is omitted in the Table, and the erroneous AMUL output values are underlined. One can observe in the Table that AMUL outputs $Q_{1ap}$, $Q_{2ap}$, and $Q_{3ap}$ approximate $Q_1$, $Q_2$, and $Q_3$, respectively, with 31.25%, 6.25%, and 6.25% error rate. The error rates can be further reduced if threshold based output detection is utilized, as discussed in Section IV, which results in the reduction of $Q_1$ and $Q_3$ approximation error rate to 25% and 0%, respectively. Table 2 also includes the AMUL output values $Q_{1ap}*$ and $Q_{3ap}*$ obtained via threshold detection, while $Q_{0ap}*$ and $Q_{2ap}*$ are not reported as they are identical to $Q_0$ and $Q_{2ap}$, respectively. Thus, the AMUL error rate becomes 25% as it produces an erroneous result for 4 out of the 16 possible input combinations.

TABLE 2. Accurate and approximate SW-based multiplier.

| $X_1 X_0 Y_1 Y_0$ | $Q_0$ | $Q_{1ac}$ | $Q_{1ap}$ | $Q_{1ap}*$ | $Q_{2ac}$ | $Q_{2ap}$ | $Q_{3ac}$ | $Q_{3ap}$ | $Q_{3ap}*$ |
|-------------------|-------|-----------|-----------|------------|-----------|-----------|-----------|-----------|------------|
| 0000              | 0     | 0         | 0         | 0          | 0         | 0         | 0         | 0         | 0          |
| 0001              | 0     | 0         | 0         | 0          | 0         | 0         | 0         | 0         | 0          |
| 0010              | 0     | 0         | 0         | 0          | 0         | 0         | 0         | 0         | 0          |
| 0011              | 0     | 0         | 0         | <u>1</u>   | 0         | 0         | 0         | 0         | 0          |
| 0100              | 0     | 0         | 0         | 0          | 0         | 0         | 0         | 0         | 0          |
| 0101              | 1     | 0         | 0         | 0          | 0         | 0         | 0         | 0         | 0          |
| 0110              | 0     | 1         | <u>0</u>  | <u>0</u>   | 0         | 0         | 0         | 0         | 0          |
| 0111              | 1     | 1         | <u>0</u>  | 1          | 0         | 0         | 0         | 0         | 0          |
| 1000              | 0     | 0         | 0         | 0          | 0         | 0         | 0         | 0         | 0          |
| 1001              | 0     | 1         | <u>0</u>  | 1          | 0         | 0         | 0         | 0         | 0          |
| 1010              | 0     | 0         | <u>1</u>  | <u>1</u>   | 1         | 1         | 0         | 0         | 0          |
| 1011              | 0     | 1         | 1         | 1          | 1         | 1         | 0         | 0         | 0          |
| 1100              | 0     | 0         | 0         | 0          | 0         | 0         | 0         | 0         | 0          |
| 1101              | 1     | 1         | <u>0</u>  | 1          | 0         | 0         | 0         | 0         | 0          |
| 1110              | 0     | 1         | 1         | 1          | 1         | 1         | 0         | <u>1</u>  | 0          |
| 1111              | 1     | 0         | <u>1</u>  | <u>1</u>   | 0         | <u>1</u>  | 1         | 1         | 1          |

TABLE 3. Simulation parameters.

| Parameters                                  | Values                  |
|---------------------------------------------|-------------------------|
| Saturation magnetization $M_s$              | $1.1 \times 10^{6}$ A/m |
| Perpendicular anisotropy constant $k_{ani}$ | 0.83 MJ/m<sup>3</sup>   |
| Damping constant $\alpha$                   | 0.004                   |
| Exchange stiffness $A_{exch}$               | 18.5 pJ/m               |

The previously mentioned design parameters hold true for the AMUL as well. However, in contrast to AFA, AMUL relies on threshold based output detection, which means that the detection cells must be as close as possible to the last interference point. Therefore,  $d_4$ ,  $d_5$ ,  $d_6$ , and  $d_7$  values should be minimized for the AMUL design.
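The overall 25% AMUL error rate with threshold detection can be cross-checked by comparing the approximate output word against the exact product for all 16 inputs. The sketch below works at the Boolean level only: the $Q_{1ap}*$ values are copied from Table 2 as a lookup set, and all names are illustrative.

```python
from itertools import product

# Inputs, written X1 X0 Y1 Y0, for which Table 2 lists Q1ap* = 1 (threshold detection).
Q1_THRESHOLD_ONES = {"0011", "0111", "1001", "1010", "1011", "1101", "1110", "1111"}


def amul(x1, x0, y1, y0):
    """Approximate product as an integer, using the threshold-detected outputs."""
    q0 = x0 & y0
    q1 = int(f"{x1}{x0}{y1}{y0}" in Q1_THRESHOLD_ONES)  # Q1ap* column of Table 2
    q2 = x1 & y1
    q3 = x0 & x1 & y0 & y1  # Q3ap* column of Table 2: 1 only for input 1111
    return q0 + 2 * q1 + 4 * q2 + 8 * q3


def erroneous_inputs():
    """Inputs (as X1 X0 Y1 Y0 strings) whose approximate product is wrong."""
    wrong = []
    for x1, x0, y1, y0 in product((0, 1), repeat=4):
        exact = (2 * x1 + x0) * (2 * y1 + y0)
        if amul(x1, x0, y1, y0) != exact:
            wrong.append(f"{x1}{x0}{y1}{y0}")
    return wrong


wrong = erroneous_inputs()
print(wrong, len(wrong) / 16)
```

A result counts as erroneous if any output bit is wrong, which matches the error rate definition used in the performance evaluation.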

#### **IV. SIMULATION SETUP AND RESULTS**

In this section we describe the micromagnetic simulation setup and present and discuss AFA and AMUL simulation results.

#### A. SIMULATION SETUP

We make use of a 50nm wide and 1nm thick $Fe_{60}Co_{20}B_{20}$ waveguide and the parameters specified in Table 3 [34] to validate the proposed approximate designs (AFA and AMUL) by means of MuMax3 [35] simulations. As previously mentioned, the SW wavelength should be larger than the waveguide width to improve the interference pattern. Therefore, a 55nm SW wavelength was chosen. Consequently, AFA dimensions are determined as $d_1 = 330$nm, $d_2 = 880$nm, $d_3 = 220$nm, $d_4 = 80$nm, and $d_5 = 110$nm and AMUL dimensions as $d_1 = 330$nm, $d_2 = 880$nm, $d_3 = 220$nm, $d_4 = 40$nm, $d_5 = 40$nm, $d_6 = 40$nm, and $d_7 = 80$nm. Last, based on the SW dispersion relation, a wavenumber $k = 2\pi/\lambda = 50$rad/$\mu$m was calculated to correspond to a SW frequency of 10GHz.

#### B. SIMULATION RESULTS

1-Bit Approximate FA Based on Phase Detection

Figure 4(a) to 4(h) present the AFA MuMax3 simulation results for {$X, Y, C_i$} = {0,0,0}, {0,0,1}, {0,1,0}, {0,1,1}, {1,0,0}, {1,0,1}, {1,1,0}, and {1,1,1}, respectively. Note that blue represents logic 0 and red logic 1. One can observe in the Figure that the outputs *S* and $C_o$ are detected as expected. For instance, $C_o = 1$ for {$X, Y, C_i$} = {0,1,1}, {1,0,1}, {1,1,0}, and {1,1,1}, whereas $C_o = 0$ for {$X, Y, C_i$} = {0,0,0}, {0,0,1}, {0,1,0}, and {1,0,0}. Moreover, *S* is the inverse of $C_o$ in each case.

FIGURE 4. Approximate Spin Wave Based FA MuMax3 Simulation.

FIGURE 5. Normalized $Q_0$.

FIGURE 6. Normalized $Q_1$.

2-Bit Inputs Approximate MUL Based on Threshold Detection

Figures 5, 6, 7, and 8 present the AMUL MuMax3 simulation results. In the figures, the *y*-axis presents the SW $M_x$ over $M_s$ ratio, where $M_x$ is the magnetization projection along the *x*-direction and $M_s$ the saturation magnetization. Inspecting Figure 5, we observe that the dynamic magnetization amplitude at the output $Q_0$ at time 2.7ns for the input values $X_1Y_1X_0Y_0$ = {0011,0111,1011,1111} lies between $0.001M_s$ and $0.01M_s$, whereas it is below $0.001M_s$ for the rest of the input combinations. Thus, by setting the detection threshold to $0.001M_s$, $Q_0$ is always properly detected.



FIGURE 7. Normalized  $Q_2$ .



**FIGURE 8.** Normalized  $Q_3$ .

A similar approach can be applied to Figure 6 for the determination of the $Q_1$ threshold value. For instance, the SW amplitude for the input combinations $X_1Y_1X_0Y_0$ = {0101,0111,1001,1011,1100,1101,1110,1111} is larger than 0 when reading it at time 2.76ns. On the other hand, for the other input combinations, the magnetization amplitude is less than 0. Therefore, if the detection threshold is set to 0, the $Q_1$ value can be derived. Note that this approach for determining the threshold value further reduces the theoretically predicted $Q_1$ error rate from 31.25% to 25%.

The threshold in Figure 7 is determined in the same way. The SW magnetization for input combinations $X_1Y_1X_0Y_0$ = {1100,1101,1110,1111} is larger than $0.0005M_s$ when reading it at time 2.76ns, whereas, in the other cases, the magnetization amplitude is less than $0.0005M_s$. Therefore, if the detection threshold is set to $0.0005M_s$, $Q_2$ is properly obtained with 6.25% error rate.

Finally, Figure 8 is analyzed in the same manner. The SW magnetization for input combination $X_1Y_1X_0Y_0$ = {1111} is larger than $0.0014M_s$ when reading it at time 2.76ns, whereas, in the other cases, the magnetization amplitude is less than $0.0014M_s$. Therefore, if the detection threshold is set to $0.0014M_s$, $Q_3$ can be obtained with 0% error rate.
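The threshold based read-out described above amounts to one comparison per output. The following sketch uses the four thresholds quoted in the text; the sample readings are hypothetical values chosen for illustration and are not taken from the simulations.

```python
# Detection thresholds, in units of M_s, quoted in the text for each output.
THRESHOLDS = {"Q0": 0.001, "Q1": 0.0, "Q2": 0.0005, "Q3": 0.0014}


def threshold_detect(output, mx_over_ms):
    """Logic 1 if the normalized magnetization exceeds the output's threshold."""
    return int(mx_over_ms > THRESHOLDS[output])


# Hypothetical M_x / M_s readings at the sampling time, for illustration only.
sample = {"Q0": 0.004, "Q1": -0.0003, "Q2": 0.0009, "Q3": 0.0010}
print({name: threshold_detect(name, value) for name, value in sample.items()})
```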

#### V. PERFORMANCE EVALUATION AND DISCUSSION

In this section, the proposed AFA and AMUL are evaluated and compared with the state-of-the-art designs. Furthermore, variability and thermal noise effects are discussed in addition to some SW technology open issues and challenges.

TABLE 4. FA performance comparison.

| Technology     | Type        | Error Rate | Energy (fJ) | Delay (ns) | Device No. |
|----------------|-------------|------------|-------------|------------|------------|
| CMOS [37]      | Accurate    | 0          | 0.066       | 0.005      | 28         |
| CMOS [40]      | Accurate    | 0          | 0.14        | 0.12       | 24         |
| CMOS [40]      | Approximate | 0.25       | 0.077       | 0.1        | 14         |
| MTJ [41]       | Accurate    | 0          | 5685        | 3.019      | 29         |
| MTJ [41]       | Approximate | 0.5        | 5109        | 3.016      | 25         |
| MTJ [41]       | Approximate | 0.5        | 2471        | 3.152      | 29         |
| SHE [38]       | Accurate    | 0          | 4970        | 7          | 26         |
| DWM [39]       | Accurate    | 0          | 74.5        | 0.877      | 26         |
| Spin CMOS [33] | Accurate    | 0          | 166.7       | 3          | 34         |
| Spin CMOS [33] | Approximate | 0.25       | 58          | 2          | 34         |
| Spin Wave [36] | Accurate    | 0          | 0.1         | 2.86       | 7          |
| Spin Wave      | Approximate | 0.25       | 0.062       | 1.84       | 5          |

#### Performance Evaluation

To gain more insight into the practical implications of our proposal, we compare the AFA with the state-of-the-art accurate SW [36], 7nm CMOS [37], SHE [38], DWM [39], accurate and approximate 45nm CMOS [40], MTJ [41], and Spin-CMOS [33] counterparts in terms of energy, delay, and area (the number of utilized devices). We base our evaluation on the following assumptions: (i) Magnetoelectric (ME) cells are utilized for SW excitation and detection. ME power consumption and delay are 34nW and 0.42ns, respectively [42]. (ii) During propagation and interference, SWs consume a negligible amount of energy. (iii) Pulse signals are used to excite SWs. The energy and delay of the pulse signal generation in addition to the synchronization are not taken into consideration in the energy and delay calculations because it is not yet known which transducer will be utilized to excite the spin waves. We note that, due to the early stage of development of SW technology, the aforementioned assumptions might need to be re-evaluated when it becomes more mature.
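As a rough illustration of assumption (i), the per-cell figures can be turned into an energy and a delay contribution. This sketch assumes that the energy of one ME cell activation is simply its power multiplied by its delay, which is a simplification made here and not a statement of the exact evaluation procedure; the function names are illustrative.

```python
ME_POWER_W = 34e-9    # ME cell power consumption from assumption (i)
ME_DELAY_S = 0.42e-9  # ME cell delay from assumption (i)


def me_cell_energy_aj(power_w=ME_POWER_W, delay_s=ME_DELAY_S):
    """Energy of one ME cell activation in attojoules, assuming E = P * t."""
    return power_w * delay_s * 1e18


def transducer_delay_ns(n_cells=2, delay_s=ME_DELAY_S):
    """Delay contributed by n_cells ME cells traversed in series, in nanoseconds."""
    return n_cells * delay_s * 1e9


print(round(me_cell_energy_aj(), 2), "aJ per ME cell")
print(round(transducer_delay_ns(), 2), "ns for one excitation plus one detection cell")
```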

The AFA delay is calculated by adding the delay of two ME cells to the 1.84ns SW propagation delay through the waveguide determined by means of micromagnetic simulations. Table 4 presents the results of the evaluation and comparison. Inspecting the Table, it is clear that AFA outperforms the state-of-the-art 7nm CMOS [37] accurate FA by an energy reduction of approximately 6%, while exhibiting a more than 2 orders of magnitude larger delay. Furthermore, AFA saves approximately 56% and 20% energy while requiring 15x and 18x larger delay when compared with 45nm CMOS based accurate and approximate FAs, respectively, while having the same error rate as the approximate FA in [40]. When compared with other emerging technology-based designs, AFA consumes 5 orders of magnitude less energy than MTJ based accurate and approximate FAs while exhibiting 42% lower delay and having 50% better error rate than the MTJ approximate FA in [41]. Moreover, AFA consumes 5 and 3 orders of magnitude less energy than SHE- and DWM-based accurate FAs, respectively, and has 3.8x lower and 52% more delay than the SHE [38] and DWM [39] based FAs, respectively. Furthermore, AFA consumes approximately 4 and 3 orders of magnitude less energy while providing 38% and 8% lower delay in comparison with the accurate and approximate Spin-CMOS based FAs, respectively, while having the same error rate as the approximate FA in [33]. Last but not least, AFA outperforms the SW based accurate FA [36] by diminishing the energy consumption and delay by 38% and 35%, respectively. Note that, as a chip real-estate estimation, the proposed approximate FA requires the lowest number of devices.

TABLE 5. 2-bit inputs multiplier performance comparison.

| Design             | CMOS [43], [44] | CMOS [43], [44] | SW                | SW                   | Proposed SW MUL |
|--------------------|-----------------|-----------------|-------------------|----------------------|-----------------|
| Implemented method | -               | -               | Coupler Cascading | Conversion Cascading | -               |
| Type               | Accurate        | Approximate     | Accurate          | Accurate             | Approximate     |
| Error Rate         | 0               | 0.38            | 0                 | 0                    | 0.25            |
| Energy (aJ)        | 959             | 300             | 320               | 430                  | 115             |
| Delay (ns)         | 0.1             | 0.06            | 21                | 1.68                 | 3.6             |
| Device No.         | 52              | 30              | 22                | 30                   | 8               |

Under the same assumptions, AMUL delay is 3.6ns and we compare it with the state-of-the-art SW [16] and CMOS [43] counterparts. As delay figures are not mentioned for the approximate multiplier in [43], its energy consumption was estimated based on the 16nm CMOS figures provided in [44]. Table 5 presents the results of the evaluation and comparison. Inspecting the Table, it is clear that AMUL outperforms accurate 16nm CMOS [43] and approximate 16nm CMOS [43] counterparts by diminishing the energy consumption by 8x and 2.6x while exhibiting 36x and 60x larger delay, respectively. AMUL provides an error rate of 25% while 38% is the error rate for the approximate CMOS counterpart [43]. Note that the error rate is calculated as the number of erroneous multiplication results generated by the multiplier divided by the total number of cases, which is 16 in this case.

When compared with accurate MUL SW implementations, AMUL provides 2.8x and 3.7x energy reduction and has approximately 6x lower and 2.5x higher delay in comparison with the SW coupler and conversion based MUL implementations, respectively. We note that the SW propagation delay is neglected in the evaluation of the SW conversion based MUL in [16]. One can observe from the table that the proposed MUL requires fewer ME cells than the SW designs in [16], which indicates that the designs in [16] have a larger area and, by implication, a larger delay when SW propagation is also considered.
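The energy ratios quoted above follow from the energy row of Table 5. A minimal sketch of that arithmetic is given below; the dictionary keys are labels chosen here for illustration, and the mapping of table columns to designs is inferred from the surrounding text.

```python
# Energy row of Table 5 in aJ; key names are illustrative labels, not from the paper.
ENERGY_AJ = {
    "cmos_accurate": 959,
    "cmos_approximate": 300,
    "sw_coupler": 320,
    "sw_conversion": 430,
    "amul": 115,
}


def energy_reduction(reference: str, proposed: str = "amul") -> float:
    """How many times less energy `proposed` consumes than `reference`."""
    return ENERGY_AJ[reference] / ENERGY_AJ[proposed]


for name in ("cmos_accurate", "cmos_approximate", "sw_coupler", "sw_conversion"):
    print(f"{name}: {energy_reduction(name):.1f}x")
```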

#### Variability and Thermal Effect

In this paper, the main target is to propose and validate, by means of micromagnetic simulations, the approximate FA and MUL as a proof of concept, without considering the impact of thermal noise and variability. However, it was reported that thermal noise has a limited effect on gate functionality and, consequently, the gate works correctly at different temperatures [45]. In addition, the effects of edge roughness and of a trapezoidal waveguide cross section were examined [45]. It was suggested that both effects are very small and that the gate operates correctly in their presence as well [45]. Therefore, we expect neither the thermal noise nor the geometrical variability to have a large impact on the proposed circuits. However, we plan to investigate these phenomena in the future.

FIGURE 9. a) Cascaded MAJ3 Gates, b) Spin Wave Waveform Analysis at $I_1I_2I_3I_4I_5 = 00011$.

#### Discussion

Although our evaluation demonstrates that SW technology has the potential to improve the state of the art in terms of energy as well as area consumption, a number of open issues are still to be solved [14].

*Cascadability.* The design and realization of SW-based circuits requires, apart from the availability of SW-based gate structures, gate cascading techniques. Note that gate cascading in the SW domain is not as straightforward as in CMOS, where logic values 0 and 1 are represented as 0V and VDD, respectively, at both gate inputs and outputs. Thus, CMOS gate outputs can directly drive other gates' inputs without requiring any type of post-processing. Unfortunately, this is not the case for SW gates operating with phase-encoded information. SW interference in such a gate always generates the correct output phase, but the output SW may have different strengths (amplitudes): a strong SW if the interference was constructive (the interfering input waves all have the same phase) or a weak SW if it was destructive (at least one of the interfering waves has a different phase). For example, if two inputs of a SW majority gate are 0 while the third one is 1, a weak 0 SW (amplitude A) is generated, whereas if all inputs are 0, a strong 0 SW (amplitude 3A) is produced. Thus, if two majority gates are cascaded, the amplitude difference at the driving gate output can induce wrong results at the driven gate output, because the driven gate has been designed to operate on input SWs of amplitude A and cannot properly accommodate a 3A SW input. For example, consider the circuit in Figure 9 and the input combination $I_1I_2I_3I_4I_5 = 00011$. Given that the first gate produces a strong 0, the second gate output will not be 1, as it should, but 0, because the 3A logic 0 SW input is dominant. Therefore, a mechanism for SW amplitude normalization is required at the SW gate output in order to guarantee proper circuit behavior. Two intermediate transducers or a repeater can be utilized to normalize the output of the first gate before passing it to the next gate, but such an approach is power inefficient. Alternatively, normalization can be achieved by means of a Directional Coupler (DC) [16], but it adds large delay and area to the circuit. Thus, to enable the realization of low-power SW circuits, more efficient DC implementations or alternative amplitude normalization approaches should be identified.
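The cascading failure described above can be reproduced with a deliberately simplified model. The sketch below is an assumption-laden toy, not the micromagnetic behavior: each SW is reduced to a signed scalar (phase 0 as +A, phase π as -A), interference is an ideal lossless sum, and `normalize` stands in for any amplitude normalizer (e.g., a directional coupler or repeater). All function names are hypothetical.

```python
from typing import Iterable, Sequence

A = 1.0  # nominal input amplitude in arbitrary units (assumption of this toy model)


def encode(bit: int, amplitude: float = A) -> float:
    """Logic 0 -> phase 0 -> +amplitude; logic 1 -> phase pi -> -amplitude."""
    return amplitude if bit == 0 else -amplitude


def interfere(waves: Iterable[float]) -> float:
    """Ideal, lossless interference: signed amplitudes simply add."""
    return sum(waves)


def decode(wave: float) -> int:
    """Phase detection: a positive sum reads as logic 0, a negative sum as logic 1."""
    return 0 if wave > 0 else 1


def normalize(wave: float, amplitude: float = A) -> float:
    """Idealized normalizer: keep the phase, reset the amplitude to A."""
    return amplitude if wave > 0 else -amplitude


def cascaded_maj3(bits: Sequence[int], normalized: bool) -> int:
    """Two cascaded MAJ3 gates: MAJ(MAJ(I1, I2, I3), I4, I5)."""
    i1, i2, i3, i4, i5 = bits
    first = interfere(encode(b) for b in (i1, i2, i3))
    if normalized:
        first = normalize(first)
    second = interfere([first, encode(i4), encode(i5)])
    return decode(second)


inputs = (0, 0, 0, 1, 1)
print(cascaded_maj3(inputs, normalized=False))  # strong 0 (3A) outweighs the two 1 inputs
print(cascaded_maj3(inputs, normalized=True))   # agrees with MAJ(MAJ(0,0,0),1,1)
```

In this model the unnormalized cascade sums 3A - A - A = A and reads logic 0, whereas the normalized cascade sums A - A - A = -A and reads the expected logic 1.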

*Fanout Achievement.* The realization of any relevant circuit requires gate fanout capabilities, as one gate output is often utilized as input for more than one gate. Majority gates with special topologies that provide a fanout of up to 4, which is sufficient for most practical circuit implementations, have been proposed in [18]. Note that fanout capabilities larger than 4 could potentially be achieved by means of a SW amplifier and a splitter, but power-efficient SW splitters and amplifiers are still to be developed.

*Clocking and Operation Mode.* Clocking, the necessary evil without which the large majority of computation platforms cannot properly function, is also an important contributor to the overall circuit complexity and performance. If information transfer from SW to charge and back is performed at each and every circuit gate output, a complex clocking system is required to control the gate output sampling process. However, if SW amplitude normalizers are at hand, only the SW circuit outputs have to be sampled, in the same way that pipeline stage outputs are sampled in a processor pipeline. This substantially diminishes the clock distribution network complexity and allows for lower clock frequency utilization, which significantly reduces the energy consumption. Another essential aspect for SW circuit energy consumption is the transducer operation mode. If input SW excitation is performed by the continuous application of voltages to the input transducers, the overall energy consumption is determined by the transducer power and the island critical path delay. However, if transducers are operated in pulses, the energy becomes independent of the island delay, as it is mainly determined by the transducer power and delay; thus, pulse operation should be targeted.
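The contrast between the two operation modes can be summarized with a back-of-the-envelope relation. This is only a sketch of the statement above; the symbols $N_T$ (number of excitation transducers), $P_T$ (power per transducer), $t_{cp}$ (island critical path delay), and $t_T$ (transducer delay, i.e., the pulse duration) are notation introduced here for illustration:

$$E_{\text{continuous}} \approx N_T \, P_T \, t_{cp}, \qquad E_{\text{pulse}} \approx N_T \, P_T \, t_T .$$

Under these assumptions, pulse operation is preferable whenever $t_T < t_{cp}$, since the excitation energy no longer grows with the length of the critical path.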

*Immature Technology.* It seems that ME cells are the right option to excite and detect SWs based on phase or threshold detection due to their ultra-low energy consumption, acceptable delay, and scalability [14]. For phase-based detection, the transducer can be general purpose, as it only needs to tell whether the sensed SW is in phase or out of phase with a certain reference SW. For threshold detection, the transducer needs to embed a built-in threshold value according to the gate whose output it senses. Conceptually speaking, threshold value tuning can be achieved by properly adjusting the transducer's dimensions and layered stack structure. However, the actual threshold value adjustment is still an open issue, as to date no scalable, energy-efficient spin wave transducers have been experimentally realized.

*Scalability.* In terms of area, SW circuits have great scaling potential because, for proper functionality, SW device dimensions must be greater than or equal to the SW wavelength, which can reach down to the nm range. Several SW circuit area benchmarks have been reported [42], which indicate that hybrid spin-wave–CMOS circuits have a very small area. Although the assumptions in the benchmarking might not be fully realistic, they give an indication regarding the expected area. For example, the area of a 32-bit divider (DIV32) implemented in hybrid SW-CMOS is roughly 3.5x smaller than that of its 10nm CMOS counterpart.

#### **VI. CONCLUSION**

We proposed and validated, by means of micromagnetic simulations, a novel approximate energy-efficient spin-wave based Full Adder (AFA) and a 2-bit input multiplier (AMUL). Both designs were evaluated and compared with their state-of-the-art counterparts. AFA saves 35% and 6% energy when compared with the state-of-the-art SW and 7nm CMOS, respectively, and 56% and 20% in comparison with accurate and approximate 45nm CMOS, respectively. In addition, it saves more than 2 orders of magnitude when compared with accurate Spin Hall Effect (SHE), and accurate and approximate Domain Wall Motion (DWM), Magnetic Tunnel Junction (MTJ), and Spin-CMOS FAs. Moreover, it achieves the same error rate as approximate 45nm CMOS and Spin-CMOS FAs, whereas it exhibits a 50% lower error rate than the approximate DWM FA and requires at least 29% less chip real-estate in comparison with the other state-of-the-art designs. In turn, the AMUL energy consumption is at least 2.5x smaller than that of state-of-the-art accurate SW designs and 16nm CMOS accurate and approximate designs. Moreover, AMUL exhibits an error rate of 25%, compared with 38% for the approximate CMOS MUL, and requires at least 64% less chip real-estate.

#### REFERENCES

- [1] N. D. Shah, E. W. Steyerberg, and D. M. Kent, "Big data and predictive analytics: Recalibrating expectations," *J. Amer. Med. Assoc.*, vol. 320, pp. 27–28, 2018.
- [2] D. Mamaluy and X. Gao, "The fundamental downscaling limit of field effect transistors," *Appl. Phys. Lett.*, vol. 106, no. 19, 2015, Art. no. 193503.
- [3] Y. Halawani, B. Mohammad, M. Al-Qutayri, and S. F. Al-Sarawi, "Memristor-based hardware accelerator for image compression," *IEEE Trans. Very Large Scale Integr. (VLSI) Syst.*, vol. 26, no. 12, pp. 2749–2758, Dec. 2018.
- [4] I. Vourkas, D. Stathis, and G. H. Sirakoulis, "Massively parallel analog computing: Ariadne's thread was made of memristors," *IEEE Trans. Emerg. Top. Comput.*, vol. 6, no. 01, pp. 145–155, Jan. 2018.
- [5] M. Maestro-Izquierdo *et al.*, "Experimental verification of memristor-based material implication NAND operation," *IEEE Trans. Emerg. Top. Comput.*, vol. 7, no. 04, pp. 545–552, Oct. 2019.
- [6] P. Pouyan, E. Amat, and A. Rubio, "Memristive crossbar memory lifetime evaluation and reconfiguration strategies," *IEEE Trans. Emerg. Top. Comput.*, vol. 6, no. 02, pp. 207–218, Apr. 2018.
- [7] R. N. Sajjad and A. W. Ghosh, "Novel switching mechanism with angle dependent transmission through graphene based pn junction," in *Proc. 71st Device Res. Conf.*, 2013, pp. 107–108.

- [8] Y. Banadaki and A. Srivastava, "Scaling effects on static metrics and switching attributes of graphene nanoribbon FET for emerging technology," *IEEE Trans. Emerg. Top. Comput.*, vol. 3, no. 04, pp. 458–469, Oct. 2015.
- [9] A. Nishad and R. Sharma, "Performance improvement in SC-MLGNRS interconnects using interlayer dielectric insertion," *IEEE Trans. Emerg. Top. Comput.*, vol. 3, no. 04, pp. 470–482, Oct. 2015.
- [10] S. Agarwal *et al.*, "International roadmap of devices and systems 2017 edition: Beyond CMOS chapter," Sandia National Lab.(SNL-NM), Albuquerque, NM (United States), Tech. Rep. SAND2018-3550R, 2018. [Online]. Available: https://irds.ieee.org/images/files/pdf/2017/2017IRDS\_BC.pdf
- [11] M. Zabihi, Z. Chowdhury, Z. Zhao, U. R. Karpuzcu, J. Wang, and S. S. Sapatnekar, "In-memory processing on the spintronic CRAM: From hardware design to application mapping," *IEEE Trans. Comput.*, vol. 68, no. 08, pp. 1159–1173, Aug. 2019.
- [12] Y. Bai, R. F. DeMara, J. Di, and M. Lin, "Clockless spintronic logic: A robust and ultra-low power computing paradigm," *IEEE Trans. Comput.*, vol. 67, no. 05, pp. 631–645, May 2018.
- [13] V. Vyas, L. Jiang-Wei, P. Zhou, X. Hu, and J. S. Friedman, "Karnaugh map method for memristive and spintronic asymmetric basis logic functions," *IEEE Trans. Comput.*, vol. 70, no. 01, pp. 128–138, Jan. 2021.
- [14] A. Mahmoud *et al.*, "Introduction to spin wave computing," *J. Appl. Phys.*, vol. 128, no. 16, 2020, Art. no. 161101. [Online]. Available: https://doi.org/10.1063/5.0019328
- [15] A. Barman *et al.*, "The 2021 magnonics roadmap," *J. Phys., Condens. Matter*, vol. 33, 2021, Art. no. 413001.
- [16] A. N. Mahmoud, F. Vanderveken, C. Adelmann, F. Ciubotaru, S. Cotofana, and S. Hamdioui, "Spin wave normalization toward all magnonic circuits," *IEEE Trans. Circuits Syst. I, Reg. Papers*, vol. 68, no. 1, pp. 536–549, Jan. 2021.
- [17] A. N. Mahmoud, F. Vanderveken, C. Adelmann, F. Ciubotaru, S. Hamdioui, and S. Cotofana, "Multi-frequency data parallel spin wave logic gates," *IEEE Trans. Magn.*, vol. 57, no. 5, pp. 1–12, May 2021.
- [18] A. Mahmoud, F. Vanderveken, C. Adelmann, F. Ciubotaru, S. Hamdioui, and S. Cotofana, "4-output programmable spin wave logic gate," in *Proc. IEEE 38th Int. Conf. Comput. Des.*, 2020, pp. 332–335.
- [19] M. P. Kostylev, A. A. Serga, T. Schneider, B. Leven, and B. Hillebrands, "Spin-wave logical gates," *Appl. Phys. Lett.*, vol. 87, no. 15, 2005, Art. no. 153501. [Online]. Available: https://doi.org/10.1063/1.2089147
- [20] T. Schneider, A. A. Serga, B. Leven, B. Hillebrands, R. L. Stamps, and M. P. Kostylev, "Realization of spin-wave logic gates," *Appl. Phys. Lett.*, vol. 92, no. 2, 2008, Art. no. 022505. [Online]. Available: https://doi.org/10.1063/1.2834714
- [21] K.-S. Lee and S.-K. Kim, "Conceptual design of spin wave logic gates based on a Mach-Zehnder-type spin wave interferometer for universal logic functions," *J. Appl. Phys.*, vol. 104, no. 5, 2008, Art. no. 053909. [Online]. Available: https://doi.org/10.1063/1.2975235
- [22] I. A. Ustinova, A. A. Nikitin, A. B. Ustinov, B. A. Kalinikos, and E. Lähderanta, "Logic gates based on multiferroic microwave interferometers," in *Proc. 11th Int. Workshop Electromagn. Compat. Integr. Circuits*, 2017, pp. 104–107.
- [23] A. Khitun, "Multi-frequency magnonic logic circuits for parallel data processing," *J. Appl. Phys.*, vol. 111, no. 5, 2012, Art. no. 054307. [Online]. Available: https://doi.org/10.1063/1.3689011
- [24] K. Vogt, F. Y. Fradin, J. E. Pearson, T. Sebastian, S. D. Bader, B. Hillebrands, A. P. Hoffmann, and H. Schultheiss, "Realization of a spin-wave multiplexer," *Nature Commun.*, vol. 5, 2014, Art. no. 3727.
- [25] M. Balinskiy, H. Chiang, and A. Khitun, "Realization of spin wave switch for data processing," *AIP Adv.*, vol. 8, no. 5, 2018, Art. no. 056628. [Online]. Available: https://doi.org/10.1063/1.5004992
- [26] M. Balynskiy, H. Chiang, D. Gutierrez, A. Kozhevnikov, Y. Filimonov, and A. Khitun, "Reversible magnetic logic gates based on spin wave interference," *J. Appl. Phys.*, vol. 123, no. 14, 2018, Art. no. 144501. [Online]. Available: https://doi.org/10.1063/1.5011772
- [27] P. Fischer, D. Sanz-Hernández, R. Streubel, and A. Fernández-Pacheco, "Launching a new dimension with 3D magnetic nanostructures," *APL Mater.*, vol. 8, no. 1, 2020, Art. no. 010701.
- [28] A. Khitun and K. L. Wang, "Non-volatile magnonic logic circuits engineering," *J. Appl. Phys.*, vol. 110, no. 3, 2011, Art. no. 034306. [Online]. Available: https://doi.org/10.1063/1.3609062
- [29] M. Rahman, S. Khasanvis, J. Shi, and C. A. Moritz, "Wave interference functions for neuromorphic computing," *IEEE Trans. Nanotechnol.*, vol. 14, no. 4, pp. 742–750, Jul. 2015.
- [30] F. Gertz, A. Kozhevnikov, Y. Filimonov, and A. Khitun, "Magnonic holographic memory," *IEEE Trans. Magn.*, vol. 51, no. 4, pp. 1–5, Apr. 2015.

- [31] D. Petti, "Building a half-adder based on spin waves," *Nature Electron.*, vol. 3, no. 12, pp. 736–737, 2020.
- [32] S. Mittal, "A survey of techniques for approximate computing," *ACM Comput. Surv.*, vol. 48, no. 4, pp. 1–33, 2016.
- [33] S. Angizi, H. Jiang, R. F. DeMara, J. Han, and D. Fan, "Majority-based spin-CMOS primitives for approximate computing," *IEEE Trans. Nanotechnol.*, vol. 17, no. 4, pp. 795–806, Jul. 2018.
- [34] T. Devolder *et al.*, "Time-resolved spin-torque switching in MgO-based perpendicularly magnetized tunnel junctions," *Phys. Rev. B*, vol. 93, Jan. 2016, Art. no. 024420. [Online]. Available: https://link.aps.org/doi/10.1103/PhysRevB.93.024420
- [35] A. Vansteenkiste, J. Leliaert, M. Dvornik, M. Helsen, F. Garcia-Sanchez, and B. Van Waeyenberge, "The design and verification of MuMax3," *AIP Adv.*, vol. 4, no. 10, 2014, Art. no. 107133. [Online]. Available: https://doi.org/10.1063/1.4899186
- [36] A. Mahmoud, F. Vanderveken, F. Ciubotaru, C. Adelmann, S. Cotofana, and S. Hamdioui, "Spin wave based full adder," in *Proc. IEEE Int. Symp. Circuits Syst.*, 2021, pp. 1–5.
- [37] T. F. Canan, S. Kaya, A. Karanth, and A. Louri, "Ultracompact and low-power logic circuits via workfunction engineering," *IEEE J. Explor. Solid-State Comput. Devices Circuits*, vol. 5, no. 2, pp. 94–102, 2019.
- [38] A. Roohi, R. Zand, D. Fan, and R. F. DeMara, "Voltage-based concatenatable full adder using spin Hall effect switching," *IEEE Trans. Comput.-Aided Design Integr. Circuits Syst.*, vol. 36, no. 12, pp. 2134–2138, Dec. 2017.
- [39] A. Roohi, R. Zand, and R. F. DeMara, "A tunable majority gate-based full adder using current-induced domain wall nanomagnets," *IEEE Trans. Magn.*, vol. 52, no. 8, pp. 1–7, Aug. 2016.
- [40] V. Gupta, D. Mohapatra, S. P. Park, A. Raghunathan, and K. Roy, "IMPACT: IMPrecise adders for low-power approximate computing," in *Proc. IEEE/ACM Int. Symp. Low Power Electron. Des.*, 2011, pp. 409–414.
- [41] H. Cai, Y. Wang, L. A. De Barros Naviner, and W. Zhao, "Robust ultra-low power non-volatile logic-in-memory circuits in FD-SOI technology," *IEEE Trans. Circuits Syst. I, Reg. Papers*, vol. 64, no. 4, pp. 847–857, Apr. 2017.
- [42] O. Zografos *et al.*, "Design and benchmarking of hybrid CMOS-Spin wave device circuits compared to 10nm CMOS," in *Proc. IEEE 15th Int. Conf. Nanotechnol.*, 2015, pp. 686–689.
- [43] P. Kulkarni, P. Gupta, and M. Ercegovac, "Trading accuracy for power with an underdesigned multiplier architecture," in *Proc. 24th Int. Conf. Very Large Scale Integration Des.*, 2011, pp. 346–351.
- [44] Y. Chen, A. Sangai, M. Gholipour, and D. Chen, "Schottky-barrier-type graphene nano-ribbon field-effect transistors: A study on compact modeling, process variation, and circuit performance," in *Proc. IEEE/ACM Int. Symp. Nanoscale Architect.*, 2013, pp. 82–88.
- [45] Q. Wang, P. Pirro, R. Verba, A. Slavin, B. Hillebrands, and A. V. Chumak, "Reconfigurable nanoscale spin-wave directional coupler," *Sci. Adv.*, 2018. [Online]. Available: https://advances.sciencemag.org/content/4/1/e1701517



**ABDULQADER MAHMOUD** (Graduate Student Member, IEEE) received the bachelor of science degree in electrical and electronics engineering and the MS degree in electrical and computer engineering from Khalifa University, Abu Dhabi, United Arab Emirates, in 2015 and 2017, respectively. He is currently working toward the PhD degree at the Delft University of Technology, Delft, The Netherlands, where he focuses on spin wave based circuit design. His MSc research focused on low-power mixed signal integrated circuit design, including DC–DC power converters targeting high power efficiency.



**FREDERIC VANDERVEKEN** received the BS degree in electrical engineering and the MS degree in nanotechnology, nanoscience, and nano-engineering from KU Leuven, Leuven, Belgium, in 2016 and 2018, respectively. He joined Imec as a PhD researcher to expand the knowledge of magnetoelastic coupling for high-frequency applications. His current research interests include modelling of magnetoelastic coupling, simulations of spintronic devices, optimizations of magnetoelectric transducers, and spin logic applications.



**FLORIN CIUBOTARU** (Member, IEEE) received the BSc and MSc degrees in physics from Alexandru Ioan Cuza University, Iasi, Romania, in 2003 and 2005, respectively, and the PhD degree in physics from Technische Universität Kaiserslautern, Kaiserslautern, Germany, in 2012. He was the beneficiary of an EU Marie Curie Fellowship. In 2014, he joined KU Leuven and IMEC, Belgium, as a postdoctoral research fellow. In 2016, he started as a senior scientist with IMEC, and in 2019 he became a principal member of technical staff. He currently works on the development of logic, radio-frequency, and sensor devices based on magnetic spin-related phenomena. He is the co-author of 26 papers, two granted patents, and two patent applications.



**CHRISTOPH ADELMANN** received the PhD degree in condensed matter physics from the Université Grenoble Alpes, Grenoble, France, in 2002 for work at the CEA Grenoble. Until 2006, he was a post-doctoral research associate with the Department of Chemical Engineering and Materials Science, University of Minnesota, working on spintronic materials and devices. He subsequently joined imec, where he is currently working as a scientific director with the Thin Films Group on metallic and dielectric materials for logic, interconnects, and memory as well as on novel devices for nanoelectronic applications. He is the technical lead for magnetoelectric logic at imec and has authored or co-authored more than 270 scientific publications in peer-reviewed journals and conference proceedings (h-index of 40), as well as 10 granted patent families and 18 pending patent applications.



**SAID HAMDIOUI** (Senior Member, IEEE) received the MSEE and PhD degrees (both with honors) from TU Delft, Delft, The Netherlands. He is currently chair professor on Dependable and Emerging Computer Technologies, head of the Computer Engineering Laboratory (CE-Lab), and also serving as head of the Quantum and Computer Engineering Department, Delft University of Technology, The Netherlands. Prior to joining TU Delft as a professor, he worked with Intel Corporation (California, USA), with Philips Semiconductors R&D (Crolles, France), and with Philips/NXP Semiconductors (Nijmegen, The Netherlands). His research focuses on two domains: dependable CMOS nano-computing (including testability, reliability, hardware security) and emerging technologies and computing paradigms (including memristors for logic and storage, computation-in-memory, neuromorphic computing, etc.). He owns two patents, has published one book and contributed to two others, and has published more than 200 conference and journal papers. He has been on the editorial board of many journals, and is the recipient of many international/national awards.



**SORIN COTOFANA** (Fellow, IEEE) received the MSc degree in computer science from the "Politechnica" University of Bucharest, Bucharest, Romania, and the PhD degree in electrical engineering from the Delft University of Technology, Delft, The Netherlands. He is currently with the Electrical Engineering, Mathematics and Computer Science Faculty, Delft University of Technology, The Netherlands. His current research is focused on: (i) emerging nano-devices based unconventional computing, (ii) design of dependable/reliable systems out of unpredictable/unreliable components, and (iii) ageing assessment/prediction and lifetime reliability aware resource management. He coauthored more than 250 papers in peer-reviewed international journals and conferences, and received 12 international conference best paper awards. He served as, e.g., associate editor of the *IEEE Transactions on Circuits and Systems I: Regular Papers*, IEEE CASS Nano-Giga TC chair, and reviewer, TPC member/chair, and general (co)-chair for numerous international conferences. He is currently editor in chief of the *IEEE Transactions on Nanotechnology*, associate editor of the *IEEE Transactions on Computers*, CASS distinguished lecturer, and CASS BoG member. He is a HiPEAC member.