

**Delft University of Technology** 

### A 200Gb/s PAM-4 Transmitter with Hybrid Sub-Sampling PLL in 28nm CMOS Technology

Wang, Zhongkai; Choi, Minsoo; Kwon, Paul; Lee, Kyoungtae; Yin, Bozhi; Liu, Zhaokai; Park, Kwanseo; Biswas, Ayan; Du, Sijun; More Authors

**DOI** 10.1109/VLSITechnologyandCir46769.2022.9830237

**Publication date** 2022

**Document Version** Final published version

**Published in** 2022 IEEE Symposium on VLSI Technology and Circuits, VLSI Technology and Circuits 2022

### Citation (APA)

Wang, Z., Choi, M., Kwon, P., Lee, K., Yin, B., Liu, Z., Park, K., Biswas, A., Du, S., & More Authors (2022). A 200Gb/s PAM-4 Transmitter with Hybrid Sub-Sampling PLL in 28nm CMOS Technology. In *2022 IEEE Symposium on VLSI Technology and Circuits, VLSI Technology and Circuits 2022* (pp. 34-35). (Digest of Technical Papers - Symposium on VLSI Technology; Vol. 2022-June). Institute of Electrical and Electronics Engineers (IEEE). https://doi.org/10.1109/VLSITechnologyandCir46769.2022.9830237

### Important note

To cite this publication, please use the final published version (if applicable). Please check the document version above.

#### Copyright

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

#### Takedown policy

Please contact us and provide details if you believe this document breaches copyrights. We will remove access to the work immediately and investigate your claim.

# Green Open Access added to TU Delft Institutional Repository

## 'You share, we take care!' - Taverne project

https://www.openaccess.nl/en/you-share-we-take-care

Otherwise as indicated in the copyright section: the publisher is the copyright holder of this work and the author uses the Dutch legislation to make this work public.

# A 200Gb/s PAM-4 Transmitter with Hybrid Sub-Sampling PLL in 28nm CMOS Technology

Zhongkai Wang<sup>1</sup>, Minsoo Choi<sup>2</sup>, Paul Kwon<sup>1</sup>, Kyoungtae Lee<sup>3</sup>, Bozhi Yin<sup>1</sup>, Zhaokai Liu<sup>1</sup>, Kwanseo Park<sup>1</sup>, Ayan Biswas<sup>1</sup>, Jaeduk Han<sup>4</sup>, Sijun Du<sup>5</sup>, Elad Alon<sup>1</sup>

<sup>1</sup>University of California, Berkeley, CA, <sup>2</sup>Samsung Semiconductor, Inc., San Jose, CA, <sup>3</sup>University of California, San Francisco, CA, <sup>4</sup>Hanyang University, Seoul, South Korea, <sup>5</sup>Delft University of Technology, Delft, Netherlands

zhongkai@berkeley.edu

#### Abstract

This paper presents a complete 200Gb/s PAM-4 transmitter (TX) in 28nm CMOS technology. The transmitter features a hybrid sub-sampling PLL (SSPLL) with a delta-sigma ($\Delta\Sigma$) modulator, a clock distribution network with flexible timing control, and a data path with a hybrid 5-tap feed-forward equalizer (FFE) and T-coils for bandwidth extension. The prototype chip achieves 4.69pJ/bit efficiency, 54mV eye height, 0.27UI eye width, and 97% RLM under ~6dB channel loss at 50GHz.

Keywords: SerDes, Transmitter, Sub-sampling PLL, 28nm, CMOS.

#### Introduction

As the data rate of ultra-high-speed wireline IOs doubles every 3-4 years, transmitters are required to support data rates of 200+Gb/s [1-2], which is extremely challenging for standard planar CMOS technologies and necessitates significant design effort and advanced techniques. We therefore present a complete 200Gb/s PAM-4 TX in 28nm CMOS technology that addresses the stringent bandwidth and jitter requirements by using a low-jitter hybrid sub-sampling PLL with a delta-sigma ($\Delta\Sigma$) modulator, a clock distribution network with flexible timing control, and a high-bandwidth data path with a 5-tap FFE. The design is implemented using a generator-based flow [3] to enhance design productivity.

#### TX Architecture and Circuit Design

The overall TX architecture is shown in Fig. 1. The SSPLL and quadrature clock generator produce low-jitter 4-phase 25GHz clocks, while quadrature and duty-cycle errors are corrected by current-starved variable delays and ac-coupled resonant buffers, respectively. The buffers drive the 4-to-1 multiplexers (MUXs) in the data path and the C<sup>2</sup>MOS clock dividers in the clock distribution network. The phases of the divided clocks are adjusted by an inverter-based phase interpolator and a digitally controlled delay line to maximize the timing margin between the retimers, 8-to-4 MUXs, and 4-to-1 MUXs across PVT corners. The data path consists of a pattern generator, 128:8 serializer, thermometer encoder, retimer, and 18 FFE driver segments. To extend the data path bandwidth, T-coils are added at the output node to isolate the MUX/driver cells and ESD diodes (supporting >1kV HBM and 250V CDM levels) from the capacitive loading of the resistors, pull-up current sources, and pads. The flexible 5-tap coarse-fine FFE is adjusted to cancel pre- and post-cursor inter-symbol interference (ISI).
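The rate bookkeeping behind this architecture can be sketched as follows. This is an illustrative calculation only: the helper names are hypothetical, and the per-lane rates assume that every MUX stage preserves the aggregate symbol rate, which the text implies but does not state explicitly.

```python
# Illustrative rate bookkeeping for a quarter-rate PAM-4 transmitter.
# Function names are hypothetical; only the 200Gb/s rate, the PAM-4 format,
# and the 128/8/4/1 lane counts come from the text.

BITS_PER_PAM4_SYMBOL = 2  # PAM-4 carries 2 bits per symbol


def symbol_rate_gbaud(data_rate_gbps: float) -> float:
    """Symbol (baud) rate of a PAM-4 link in GBaud."""
    return data_rate_gbps / BITS_PER_PAM4_SYMBOL


def unit_interval_ps(baud_gbaud: float) -> float:
    """Unit interval in picoseconds for a given baud rate in GBaud."""
    return 1e3 / baud_gbaud


def lane_rates_gbaud(baud_gbaud: float, lane_counts):
    """Per-lane rate at each stage, assuming the aggregate rate is preserved."""
    return {n: baud_gbaud / n for n in lane_counts}


baud = symbol_rate_gbaud(200.0)        # 100 GBaud
ui_ps = unit_interval_ps(baud)         # 10 ps
quarter_rate_clock_ghz = baud / 4      # 25 GHz, matching the 4-phase clocks
lanes = lane_rates_gbaud(baud, [128, 8, 4, 1])

for n, rate in lanes.items():
    print(f"{n:>3} lanes -> {rate} GBaud per lane")
```

Under these assumptions the 8-lane stage runs at 12.5GBaud per lane, which is consistent with a divide-by-two of the 25GHz clock feeding the 8-to-4 MUXs.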

A. Quadrature Clock Generation: One of the main considerations for clock generation and distribution in planar technologies is the topology of the voltage-controlled oscillator (VCO) and the quadrature clock generator. A quadrature VCO that generates the 4-phase clocks directly occupies a large layout area because it requires two coupled resonators. On the other hand, quadrature clock phases can be generated by a single VCO and clock divider operating at half rate (50GHz), which significantly increases power and complexity. In this design, a differential LC VCO operating at quarter rate and a quadrature clock generation circuit are selected to address these issues. As shown in Fig. 2, a Class-C VCO with a tail inductor is used to reduce the noise contribution of the current source, which improves the phase noise by ~5dB. The VCO tank includes a 4-bit capacitor DAC, a fine-tuning varactor, and a proportional varactor. An injection-locked quadrature generator [4] is used to generate the 4-phase clocks.
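As a rough illustration of what the capacitor DAC and varactors must cover, the sketch below applies the ideal LC resonance formula $f = 1/(2\pi\sqrt{LC})$ to the 23.6GHz to 28.6GHz range reported for the VCO. The 100pH tank inductance is a purely hypothetical value chosen for illustration; the paper does not give the inductance, and parasitics are ignored.

```python
import math


def lc_resonance_hz(l_henry: float, c_farad: float) -> float:
    """Resonance frequency of an ideal LC tank."""
    return 1.0 / (2.0 * math.pi * math.sqrt(l_henry * c_farad))


def tank_capacitance_f(l_henry: float, f_hz: float) -> float:
    """Total tank capacitance needed to resonate at f_hz with inductance l_henry."""
    return 1.0 / (l_henry * (2.0 * math.pi * f_hz) ** 2)


def required_cap_ratio(f_min_hz: float, f_max_hz: float) -> float:
    """Cmax/Cmin needed to cover [f_min, f_max] with a fixed inductance."""
    return (f_max_hz / f_min_hz) ** 2


F_MIN_HZ, F_MAX_HZ = 23.6e9, 28.6e9   # tuning range reported in the text
L_ASSUMED_H = 100e-12                 # hypothetical inductance, not from the paper

ratio = required_cap_ratio(F_MIN_HZ, F_MAX_HZ)
c_max = tank_capacitance_f(L_ASSUMED_H, F_MIN_HZ)  # lowest frequency, largest C
c_min = tank_capacitance_f(L_ASSUMED_H, F_MAX_HZ)  # highest frequency, smallest C

print(f"required Cmax/Cmin = {ratio:.3f}")
print(f"Cmax = {c_max * 1e15:.0f} fF, Cmin = {c_min * 1e15:.0f} fF (for the assumed L)")
```

The capacitance ratio depends only on the frequency range, so it holds for any inductance; the absolute capacitances change with the assumed inductance.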

B. Hybrid SSPLL: As shown in Fig. 2, the hybrid SSPLL is built from an analog proportional path and a digital integral path. The VCO generates differential clocks with a frequency between 23.6GHz and 28.6GHz and drives an ac-coupled TIA-based buffer. The buffer output is directly sampled by the sub-sampling phase detector (SSPD) and is also connected to the divider chain, which is composed of a C<sup>2</sup>MOS divider and static flip-flop (FF) dividers. The divider output and reference clock feed the dead-zone phase detector (DZPD) for frequency locking. In the proportional path, the SSPD and DZPD drive the sub-sampling charge pump (SSCP) and the traditional charge pump (CP), respectively, whose currents are summed into the analog loop filter (LF). In the integral path, the outputs of the SSPD are sampled by a comparator, and the DZPD output is synchronized by the reference clock. These digital outputs are added together and integrated by the digital LF. The hybrid loop reduces the capacitor area of the LF and provides more programmable integral control. A $\Delta\Sigma$ modulator is inserted between the digital filter and the 9-bit DAC to increase the integral control accuracy and reduce ripples on the proportional and integral control signals.
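How a $\Delta\Sigma$ modulator raises the effective resolution of a 9-bit DAC can be illustrated with a minimal behavioral model. The sketch assumes a first-order error-feedback modulator; the paper does not state the modulator order or structure, and the function name and the target code of 200.3 are hypothetical.

```python
import math


def delta_sigma_codes(target: float, n_samples: int, n_bits: int = 9):
    """First-order error-feedback modulator (assumed structure, for illustration).

    Converts a fractional target code into a stream of integer DAC codes
    whose running average approaches the target.
    """
    max_code = (1 << n_bits) - 1
    acc_error = 0.0
    codes = []
    for _ in range(n_samples):
        wanted = target + acc_error                 # add back the previous error
        code = int(math.floor(wanted + 0.5))        # quantize to an integer code
        code = min(max(code, 0), max_code)          # clamp to the DAC range
        acc_error = wanted - code                   # carry the quantization error
        codes.append(code)
    return codes


codes = delta_sigma_codes(target=200.3, n_samples=2000)
average = sum(codes) / len(codes)
print(f"codes used: {sorted(set(codes))}, average = {average:.4f}")
```

The DAC only ever toggles between two adjacent codes, while the average seen after low-pass filtering sits between them, which is the mechanism by which the integral control gains sub-LSB accuracy.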

C. Feed-Forward Equalizer: Fig. 3 shows the circuit diagram of the coarse-fine FFE scheme in an FFE driver segment. For coarse coefficient adjustment, the 1-UI delay is implemented at quarter rate by multiplexing the proper data (D8<0:7>) and clock (C8<0:3>) into the 8-to-4 MUX, which increases the timing margin for the MUX and enables the implementation of the 5-tap FFE at a lower data rate. The fine FFE is adjusted in the 4-to-1 MUX/driver stage by tuning the gate voltage of the cascode transistor (M2) with a 7-bit DAC, which provides a tap weight resolution of 0.6mV<sub>ppd</sub>.
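The effect of FFE taps on ISI can be sketched with a symbol-spaced FIR model. The example below is not the circuit described above: it assumes a hypothetical channel pulse response with two post-cursors and solves for two post-cursor tap weights by zero forcing, whereas the actual design has five taps and sets coarse and fine weights in hardware.

```python
def convolve(a, b):
    """Discrete convolution of two symbol-spaced sequences."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out


def zero_forcing_post_taps(pulse, n_post):
    """Tap weights [1, w1, ..., wn] that null the first n_post post-cursors.

    pulse[0] is the main cursor and pulse[k] is the k-th post-cursor.
    """
    taps = [1.0]
    for k in range(1, n_post + 1):
        # Contribution of the taps chosen so far to combined sample k.
        residual = sum(
            taps[i] * pulse[k - i] for i in range(k) if k - i < len(pulse)
        )
        taps.append(-residual / pulse[0])
    return taps


# Hypothetical pulse response (main cursor, post-cursor 1, post-cursor 2).
pulse = [1.0, 0.30, 0.10]
taps = zero_forcing_post_taps(pulse, n_post=2)
equalized = convolve(taps, pulse)

print("taps:     ", [round(t, 4) for t in taps])
print("equalized:", [round(v, 4) for v in equalized])
```

A finite-length FFE pushes residual ISI to later cursors rather than removing it entirely, and a real driver is peak-limited, so the tap weights would be scaled so that their absolute values sum to the available swing.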

#### Measurement Results

The proposed TX was designed, generated, and fabricated in a 28nm planar CMOS technology using an open-source circuit generation framework [3] (Fig. 4). Due to the limited bandwidth of the probes, cables, DC blocks, and scope headers, the total channel loss at baud-rate frequency is ~6dB. Fig. 5 shows the measured PLL phase noise and spectrum at the C<sup>2</sup>MOS divider output. The phase noise of the 12.5GHz clock output at 1MHz offset frequency is -120dBc/Hz. Integrating the phase noise from 1kHz to 100MHz gives an RMS jitter of 62.97fs, and the spur from reference coupling is -62.33dBc. The PLL consumes 17mW, corresponding to a FOM of -251.7dB, which is comparable to state-of-the-art customized PLLs. Fig. 6 compares the pulse response without and with FFE. By properly setting the first and second post-cursor FFE taps, the ISI is eliminated and the FFE improves the loss by 4.5dB at baud-rate frequency. Fig. 7 shows the measured eye-diagrams of PRBS7 NRZ and PAM-4 patterns with FFE. The measured eye heights and eye widths of the 200Gb/s PAM-4 data are 62/54/60mV and 3.5/2.7/3.3ps, respectively, with a ratio of level mismatch (RLM) of 96.7%. The whole TX consumes 937mW (4.69pJ/b, 17mW in SSPLL, 348mW in clock distribution, and 566mW in data path). Table I compares the TX performance with other state-of-the-art designs. This design achieves the highest data rate with an on-chip PLL in a standard planar CMOS process.
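The headline figures of merit can be reproduced from the reported measurements. The sketch below assumes the commonly used jitter-power FOM, $10\log_{10}\left[(\sigma_t/1\,\mathrm{s})^2 \cdot (P/1\,\mathrm{mW})\right]$; the paper does not spell out its FOM definition, so this formula is an assumption, and the function names are hypothetical.

```python
import math


def pll_jitter_fom_db(rms_jitter_s: float, power_w: float) -> float:
    """Jitter-power FOM in dB (assumed definition, normalized to 1 s and 1 mW)."""
    return 10.0 * math.log10((rms_jitter_s / 1.0) ** 2 * (power_w / 1e-3))


def energy_per_bit_pj(power_w: float, data_rate_bps: float) -> float:
    """Energy efficiency in pJ/bit."""
    return power_w / data_rate_bps * 1e12


fom_db = pll_jitter_fom_db(rms_jitter_s=62.97e-15, power_w=17e-3)
efficiency_pj = energy_per_bit_pj(power_w=0.937, data_rate_bps=200e9)
worst_eye_width_ui = 2.7 / 10.0   # 2.7 ps eye width over a 10 ps unit interval

print(f"PLL FOM        = {fom_db:.1f} dB")
print(f"TX efficiency  = {efficiency_pj:.3f} pJ/bit")
print(f"min eye width  = {worst_eye_width_ui:.2f} UI")
```

With these inputs the assumed FOM definition evaluates to about -251.7dB and the efficiency to about 4.69pJ/bit, in line with the reported values.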

#### References

- [1] J. Kim et al., ISSCC, 2021, pp. 126-128.
- [2] M. Choi et al., ISSCC, 2021, pp. 128-130.
- [3] E. Chang et al., CICC, 2019.
- [4] J. Kim et al., ISSCC, 2018, pp. 102-103.
- [5] P.-J. Peng et al., ISSCC, 2020, pp. 130-131.
- [6] M. A. Kossel et al., ISSCC, 2021, pp. 130-132.
- [7] R. Yousry et al., ISSCC, 2021, pp. 180-182.



Fig. 1 TX architecture.



Fig. 2 Structure of hybrid SSPLL and quadrature clock generation.



Fig. 3 Coarse and fine FFE scheme in an FFE driver segment.



|   | Block        | Layout      | Area (mm²) |
|---|--------------|-------------|------------|
| 1 | Pattern Gen. | Synthesized | 0.1357     |
| 2 | Data Path    | Generated   | 0.1292     |
| 3 | Bias Circuit | Generated   | 0.1646     |
| 4 | Clock Dis.   | Generated   | 0.0472     |
| 5 | PLL          | Generated   | 0.0766     |
| 6 | PLL Digital  | Generated   | 0.0084     |
|   | Top          | Manual      | 0.5617     |

Fig. 4 Die micrograph.



Fig. 5 Phase noise and spectrum of hybrid SSPLL at the divider output.



Fig. 6 Pulse response and channel loss w/o and w/ FFE.



Fig. 7 Eye-diagram of NRZ pattern and PAM-4 pattern.

TABLE I: PERFORMANCE SUMMARY AND COMPARISON

|                                   | ISSCC'21[1]         | ISSCC'21[2]         |        | ISSCC'18[4]          |      | ISSCC'20[5]          | ISSCC'21[6]   | ISSCC'21[7]   | This work            |      |
|-----------------------------------|---------------------|---------------------|--------|----------------------|------|----------------------|---------------|---------------|----------------------|------|
| Technology                        | 10nm<br>FinFET      | 28nm<br>CMOS        |        | 10nm<br>FinFET       |      | 40nm<br>CMOS         | 7nm<br>FinFET | 7nm<br>FinFET | 28nm<br>CMOS         |      |
| Architecture                      | Quarter-rate        | Quarter-rate        |        | Quarter-rate         |      | Quarter-rate         | Quarter-rate  | Quarter-rate  | Quarter-rate         |      |
| Clock<br>Source                   | On-Chip             | External            |        | On-chip              |      | External             | External      | External      | On-chip              |      |
| Output<br>Swing w/o<br>FFE        | 1.0V <sub>ppd</sub> | 0.8V <sub>ppd</sub> |        | 0.75V <sub>ppd</sub> |      | 0.56V <sub>ppd</sub> | -             | -             | 0.75V <sub>ppd</sub> |      |
| FFE                               | 8-tap               | 5-tap               |        | 3-tap                |      | 8-tap                | 8-tap         | 5-tap         | 5-tap                |      |
| ESD                               | Yes                 | Yes                 |        | Yes                  |      | No                   | Yes           | Yes           | Yes                  |      |
| RJ (fs<sub>rms</sub>)             | 65<br>(4MHz CDR)    | 204                 |        | 150                  |      | -                    | -             | -             | 134                  |      |
| Signaling                         | PAM-4               | PAM-4               | NRZ    | PAM-4                | NRZ  | NRZ                  | PAM-4         | PAM-4         | PAM-4                | NRZ  |
| Data Rate<br>(Gb/s)               | 224                 | 180                 | 90     | 112                  | 56   | 100                  | 112           | 112           | 200                  | 100  |
| Efficiency<br>(pJ/bit)            | 2.25                | 4.59*               | 9.18*  | 1.72                 | 3.44 | 6.19*                | 1.40*         | 1.71*         | 4.69                 | 9.37 |
| Eye Height<br>(mV)                | 90                  | 53                  | 234    | 30                   | 260  | 73                   | 59            | 46            | 51                   | 270  |
| Active Area<br>(mm <sup>2</sup> ) | 0.088               | 0.432               |        | 0.0302               |      | 0.504                | 0.032         | 0.228         | 0.541                |      |

\*Excluding PLL