## Designing VLSI Interconnects with Monolithically Integrated Silicon-Photonics

### Vladimir Stojanović, MIT





#### SSCS DL series – Santa Clara, CA, November, 2012

## Acknowledgments

- Rajeev Ram, Henry Smith, Hanqing Li (MIT), Milos Popović (Boulder), Krste Asanović (UC Berkeley)
- Jason Orcutt, Jeffrey Shainline, Christopher Batten, Ajay Joshi, Anatoly Khilo
- Karan Mehta, Mark Wade, Erman Timurdogan, Stevan Urosevic, Jie Sun, Cheryl Sorace, Josh Wang
- Michael Georgas, Jonathan Leu, Benjamin Moss, Chen Sun
- Yong-Jin Kwon, Scott Beamer, Yunsup Lee, Andrew Waterman, Miquel Planas
- DARPA, NSF and FCRP IFC
- IBM Trusted Foundry, Solid-State Circuits Society

## Chip design is going through a change

- We already have more devices than we can use at once
- Limited by power density and bandwidth









- Oracle T5: 16 cores, 128 threads
- Nvidia Fermi: 540 CUDA cores
- IBM Power 7: 8 cores, 32 threads
- Intel Knights Corner: 50 cores, 200 threads





"The Processor is the new Transistor" [Rowen]

#### Bandwidth, pin count and power scaling



#### **Memory interface scaling problems: Energy-cost and bandwidth density**



Energy cost [pJ/bit]

#### Power and pins required for 10TFlop/s



## Monolithic Si-Photonics for core-to-core and core-to-DRAM networks



- Bandwidth density – need dense WDM
- Energy-efficiency – need monolithic integration

#### Integrated photonic interconnects



#### **Monolithic CMOS photonic integration**



Thin BOX SOI CMOS Electronics

**Bulk CMOS Electronics** 

#### Si and polySi waveguide formation





### Single channel link tradeoffs



#### **Resonance sensitivity**



- Process and temperature shift resonances
- Direct thermal tuning cost prohibitive

Georgas CICC 2011, Sun NOCS 2012

#### **Smarter wavelength tuning**



## Need to optimize carefully



- Laser energy increases with data-rate
  - Limited Rx sensitivity
  - Modulation more expensive -> lower extinction ratio
- Tuning costs decrease with data-rate
- Moderate data rates most energy-efficient

Assuming 32nm CMOS

#### Georgas CICC 2011

Figure legend (partially recovered): Laser, Buffer, lock
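The tradeoff on this slide (laser energy per bit rising with data rate, tuning cost per bit falling with it) can be sketched with a toy model. Every constant and the power-law form below are hypothetical placeholders chosen only to show why a moderate data rate can minimize energy per bit; they are not the model or the values from Georgas CICC 2011.

```python
# Toy model of the tradeoff above. All numbers and functional forms are
# hypothetical placeholders, not values from Georgas CICC 2011.
TUNING_POWER_W = 1e-3          # assumed fixed tuning power per channel
LASER_ENERGY_REF_J = 50e-15    # assumed laser energy per bit at the reference rate
REF_RATE_BPS = 10e9            # assumed reference data rate
LASER_EXPONENT = 1.0           # assumed growth of laser energy/bit with rate


def tuning_energy_per_bit(rate_bps):
    """Fixed tuning power amortized over more bits as the rate rises."""
    return TUNING_POWER_W / rate_bps


def laser_energy_per_bit(rate_bps):
    """Laser energy per bit, assumed to grow as a power law in data rate."""
    return LASER_ENERGY_REF_J * (rate_bps / REF_RATE_BPS) ** LASER_EXPONENT


def total_energy_per_bit(rate_bps):
    """Sum of the two competing terms (other link components are ignored)."""
    return tuning_energy_per_bit(rate_bps) + laser_energy_per_bit(rate_bps)


candidate_rates = [2.5e9, 5e9, 10e9, 15e9, 20e9, 25e9, 30e9]
best_rate = min(candidate_rates, key=total_energy_per_bit)

for rate in candidate_rates:
    print(f"{rate / 1e9:5.1f} Gb/s: {total_energy_per_bit(rate) * 1e15:6.1f} fJ/bit")
print(f"Lowest energy per bit at {best_rate / 1e9:.1f} Gb/s")
```

With these placeholder values the minimum falls at an intermediate rate; changing the assumed constants moves the optimum but not the shape of the curve.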

## **DWDM link efficiency optimization**



#### **Optimize for min energy-cost**

## Bandwidth density dominated by circuit and photonics area (not coupler pitch)

- 10x better than the electrical bump limit
- 200x better than the electrical package pin limit

# Photonic memory interface – leveraging optical bandwidth density



#### Important Concepts

- Power/message switching (only to active DRAM chip in DRAM cube/super DIMM)
- Vertical die-to-die coupling (minimizes cabling - 8 dies per DRAM cube)
- Command distributed electrically (broadcast)
- Data photonic (single writer multiple readers)

Enables energy-efficient throughput and capacity scaling per memory channel

#### Beamer ISCA 2010

#### **Laser Power Guiding Effectiveness**



Enables capacity scaling per channel and significant savings in laser energy

Beamer ISCA 2010

## **Optimizing DRAM with photonics**

![](_page_17_Figure_1.jpeg)

### **Design Space Exploration of Networks Tool**

DSENT – A Tool Connecting Emerging Photonics with Electronics for Opto-Electronic Networks Modeling

![](_page_18_Figure_2.jpeg)

## Significant integration activity, but hybrid and older processes ...

![](_page_19_Picture_1.jpeg)

#### [Luxtera/Oracle/Kotura]

![](_page_19_Figure_3.jpeg)

[Intel]

![](_page_19_Picture_5.jpeg)

[IBM]

![](_page_19_Picture_7.jpeg)

[Watts/Sandia/MIT] [Lipson/Cornell]

[Kimerling/MIT]

![](_page_19_Figure_10.jpeg)

[HP]

# EOS Platform for Monolithic CMOS photonic integration 2011

![](_page_20_Figure_1.jpeg)

**65 nm bulk CMOS** Texas Instruments

**Create integration platform to accelerate technology development and adoption**

#### **EOS Platform: EOS8 fabricated in IBM12SOI**

![](_page_21_Figure_1.jpeg)

Orcutt et al, Optics Express, 2012

3 x 3 mm die

45nm Thin Box SOI Technology (used for Power 7 and Cell processors)

**3M Transistors** 

400 Pads

ARM Standard Cells and custom link circuits

#### **EOS8 performance summary**

![](_page_22_Figure_1.jpeg)

Fiber-to-chip grating couplers with 3.5 dB insertion loss

Waveguides under 4dB/cm propagation loss

10 dB extinction optical modulators

8 channel wavelength division multiplexing filter bank with <-20 dB cross talk

>20 GHz SiGe photodetectors

All integrated with electronic circuits
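As a rough illustration of how these figures combine, the sketch below adds up a passive optical loss budget in dB. The coupler and waveguide numbers come from the list above (with 4 dB/cm taken as an upper bound); the path itself (two couplers, 0.5 cm of waveguide) is a hypothetical example, and modulator and filter losses are left out.

```python
COUPLER_LOSS_DB = 3.5            # from the slide, per grating coupler
WAVEGUIDE_LOSS_DB_PER_CM = 4.0   # slide quotes "under 4dB/cm"; used as an upper bound


def link_loss_db(num_couplers, waveguide_length_cm):
    """Total passive loss in dB: couplers plus waveguide propagation."""
    return (num_couplers * COUPLER_LOSS_DB
            + waveguide_length_cm * WAVEGUIDE_LOSS_DB_PER_CM)


def db_to_power_fraction(loss_db):
    """Fraction of optical power remaining after a loss given in dB."""
    return 10 ** (-loss_db / 10)


# Hypothetical path: couple in, 0.5 cm of on-chip waveguide, couple out.
loss = link_loss_db(num_couplers=2, waveguide_length_cm=0.5)
fraction = db_to_power_fraction(loss)
print(f"Total loss: {loss:.1f} dB, transmitted fraction: {fraction:.3f}")
```

Because losses add in dB, the two couplers dominate this hypothetical path.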

### Full integration of photonics into VLSI tools

![](_page_23_Figure_1.jpeg)

### **Platform Organization**

![](_page_24_Figure_1.jpeg)

#### **Chips fully packaged**

![](_page_25_Picture_1.jpeg)

# Best waveguide losses ever reported in a sub-100nm production CMOS line

- Body-Si waveguides
  3-4dB/cm loss
- Poly waveguides
  50dB/cm loss

- Body-Si ring Q factor
  - 227k @ 1280nm
  - 112k @ 1550nm
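For a sense of scale, the quoted Q factors can be converted to resonance linewidths with the standard relations $\Delta\lambda = \lambda / Q$ and $\Delta f = f / Q$. The sketch below applies them to the two values above; treating the quoted Q as the loaded Q is an assumption.

```python
SPEED_OF_LIGHT_M_PER_S = 299_792_458.0


def linewidth_pm(wavelength_nm, q_factor):
    """Resonance full width in picometers: delta_lambda = lambda / Q."""
    return wavelength_nm / q_factor * 1e3


def linewidth_ghz(wavelength_nm, q_factor):
    """Resonance full width in GHz: delta_f = f / Q, with f = c / lambda."""
    frequency_hz = SPEED_OF_LIGHT_M_PER_S / (wavelength_nm * 1e-9)
    return frequency_hz / q_factor / 1e9


# Q values as quoted on the slide.
for wavelength_nm, q_factor in [(1280, 227_000), (1550, 112_000)]:
    print(f"{wavelength_nm} nm, Q = {q_factor}: "
          f"{linewidth_pm(wavelength_nm, q_factor):.2f} pm, "
          f"{linewidth_ghz(wavelength_nm, q_factor):.2f} GHz")
```

Both linewidths come out on the order of 1-2 GHz, which is why small process or temperature shifts matter for resonance alignment.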

![](_page_26_Figure_6.jpeg)

#### **Exceptional dimensional control in 45nm node**

![](_page_27_Figure_1.jpeg)

- 8-wavelength filterbank results
  - Filter channels fabricated in order
  - Less than 1nm variation
- Excellent channel isolation (>20dB at 250GHz spacing)
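To compare the 250GHz channel spacing with the sub-1nm variation, the spacing can be converted to wavelength with $\Delta\lambda \approx \lambda^2 \Delta f / c$. The slide does not state the filterbank's center wavelength, so the two wavelengths below are assumptions borrowed from the ring Q-factor measurements.

```python
SPEED_OF_LIGHT_M_PER_S = 299_792_458.0


def spacing_nm(center_wavelength_nm, spacing_ghz):
    """Convert a frequency spacing to a wavelength spacing.

    Uses the small-spacing approximation d_lambda = lambda^2 * d_f / c.
    """
    wavelength_m = center_wavelength_nm * 1e-9
    spacing_m = wavelength_m ** 2 * spacing_ghz * 1e9 / SPEED_OF_LIGHT_M_PER_S
    return spacing_m * 1e9


# Center wavelengths are assumed; only the 250 GHz spacing is from the slide.
for center_nm in (1280, 1550):
    print(f"250 GHz at {center_nm} nm is about {spacing_nm(center_nm, 250):.2f} nm")
```

Under these assumptions the channel spacing is roughly 1.4 nm to 2 nm.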

### Integrated thermal tuning circuits

![](_page_28_Figure_1.jpeg)

![](_page_28_Figure_2.jpeg)

- 10mW required to retune all 8 rings
  - Negligible overhead of tuning circuits (thermal BW < 500kHz)
  - Tuning efficiency 130uW/K (32.4mW/2π) on fully substrate-released chips
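A minimal sketch of what the quoted tuning efficiency implies for heater power, assuming the heater response is linear in temperature and phase. The 130uW/K and 32.4mW/2π figures are from the slide; the 20 K offset is a hypothetical example.

```python
import math

TUNING_EFFICIENCY_W_PER_K = 130e-6   # from the slide: 130 uW/K
POWER_PER_2PI_W = 32.4e-3            # from the slide: 32.4 mW per 2*pi of phase


def heater_power_for_temperature(delta_t_kelvin):
    """Heater power needed to cancel a temperature offset (linear model)."""
    return TUNING_EFFICIENCY_W_PER_K * delta_t_kelvin


def heater_power_for_phase(phase_rad):
    """Heater power needed for a given round-trip phase shift (linear model)."""
    return POWER_PER_2PI_W * phase_rad / (2 * math.pi)


# Temperature swing equivalent to one full 2*pi shift, implied by the two figures.
kelvin_per_2pi = POWER_PER_2PI_W / TUNING_EFFICIENCY_W_PER_K

# Hypothetical example: one ring needs to compensate a 20 K offset.
print(f"20 K offset: {heater_power_for_temperature(20) * 1e3:.2f} mW")
print(f"pi shift:    {heater_power_for_phase(math.pi) * 1e3:.2f} mW")
print(f"2*pi shift corresponds to about {kelvin_per_2pi:.0f} K")
```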

#### Low-power current-sensing optical receiver

![](_page_29_Figure_1.jpeg)

Georgas ESSCIRC 2011, JSSC 2012

## **Optical modulator design**

#### Shainline, Popović

![](_page_30_Figure_2.jpeg)

[Figure: spectrum plot, x-axis Wavelength [nm] from 1534.0 to 1535.6; device micrographs with 10 µm scale bars]

- Extinction ratio 19dB
- 45GHz 3dB optical bw

#### at 1280nm

- Extinction ratio 9dB
- 60GHz 3dB optical bw

#### **Optical modulator – electrical tests**

![](_page_31_Picture_1.jpeg)

- Carrier-lifetime 2-3ns
- Diffusion time constant affected by
  - Recombination time
  - Drift conditions

![](_page_31_Figure_6.jpeg)

### First dynamic electro-optic test in 45nm SOI

![](_page_32_Figure_1.jpeg)

#### Memory interface scaling problems: **Energy-cost and bandwidth density**

![](_page_33_Figure_1.jpeg)

Energy cost [pJ/bit]

#### Power and pins required for 10TFlop/s

![](_page_34_Figure_1.jpeg)

#### **Uncooled laser sources for system efficiency**

![](_page_35_Figure_1.jpeg)

## Laser Source Options (Uncooled)

- Multi-λ PIC
- FP Comb Source
- Binned DFB Bars
- Injection-Locked FP

#### $\lambda = 1.2$-$1.3\,\mu m$ Target

- Lower Laser Threshold
- Higher Published Efficiency
- Uncooled MQW Operation
- Quantum Dot Gain Media
- Larger Resonator FSR
- Smaller Optical Components

# Laser reliability – Si-photonics needs fewer lasers than VCSEL links

$FIT = \frac{\#\,\text{Failures}}{\#\,\text{Devices}} \cdot \frac{1 \times 10^9 \text{ Hours}}{\text{Hours of Operation}}$

$\text{Mean Time Between Failures (MTBF)} = \frac{1 \times 10^9 \text{ Hours}}{FIT}$

#### **VCSEL Laser Reliability Concerns**

- Finisar 10Gb study = 2.3 FIT
- Linear data rate increases cause super-linear reliability reductions
- 100 Tbps = 10,000 VCSELs
- MTBF = 2.3 years
- Intel MoBo MTBF = 19-24 years (2009-2011 Server Data)

IBM's Blue Waters required 1M VCSELs: Expected MTBF = 18 days

#### Silicon Photonics Reliability Overview

- Laser power is split for many links
- CW laser operation eliminates overdrive reliability degradation
- CyOptics 1310nm uncooled DFBs <15 FIT (200B field hour 0°C-85°C) including direct-mod. operation
- 100 Tbps = 64 DFBs (1 laser per  $\lambda$ )
- MTBF @ 15 FIT/laser = 120 Years

$\lambda$ = 0.98 µm Pump Laser Reliability

![](_page_36_Figure_16.jpeg)
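The system-level MTBF figures on this slide follow from the FIT definition if FIT rates of individual lasers simply add. The sketch below reproduces two of them under that assumption (independent devices, constant failure rate, any single laser failure counted as a system failure); the function name is illustrative.

```python
HOURS_PER_YEAR = 24 * 365  # 8760, ignoring leap years


def system_mtbf_hours(num_devices, fit_per_device):
    """MTBF of a system that fails when any one of its lasers fails.

    Assumes independent devices, so FIT rates add, and MTBF = 1e9 hours / FIT.
    """
    total_fit = num_devices * fit_per_device
    return 1e9 / total_fit


# 64 uncooled DFBs at the 15 FIT upper bound quoted on the slide.
dfb_years = system_mtbf_hours(64, 15.0) / HOURS_PER_YEAR

# 1M VCSELs at 2.3 FIT each (the Blue Waters example on the slide).
vcsel_days = system_mtbf_hours(1_000_000, 2.3) / 24

print(f"64 DFBs:   about {dfb_years:.0f} years")
print(f"1M VCSELs: about {vcsel_days:.0f} days")
```

The results land close to the slide's rounded values of 120 years and 18 days.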

## Packaging

- CPU package
  - Flip-chip <5um C4 tolerance o.k. for coupling
- DRAM package
  - Die on board
  - Connector-to-fiber alignment <2um

![](_page_37_Figure_6.jpeg)

## Summary

- Silicon-photonics can push both critical dimensions
  - Energy-efficiency monolithic integration
  - Bandwidth Density dense WDM
- Need to optimize across layers
  - Connect devices to circuits, and links to networks
- Building early technology development platforms
  - Feedback to device and circuit designers
  - Accelerated adoption
- EOS Platform designed for multi-project wafer runs
  - Best end-of-line passives in sub-100nm process (3-4dB/cm loss)
  - 50 fJ/b receivers with uA sensitivities
  - Record-high tuning efficiency with undercut ~ 25uW/K
  - First modulation demonstrated in 45nm process