

Fraunhofer Institute for Reliability and Microintegration IZM

Keynote

Wafer Scale Integration for High Performance Computing

Dr. Frank Windrich
Deputy Head of Fraunhofer IZM-ASSID

**Electronic Packaging Days 2025** 

### Wafer Scale Integration for High Performance Computing

OUTLINE

1 Future prospects and technological challenges

2 Wafer scale integration introduction

3 Wafer scale integration – industry examples

4 Wafer scale integration – how to contribute

#### Future prospects and technological challenges

AI Driving Hyper-Exponential Demand for High-Performance Compute



FLOPS + Memory Wall + I/O bandwidth gap

Amir Gholami, DOI: 10.1109/MM.2024.3373763

AMD



Massive compute demand → more data centers → more power required → power generation and grid limits will set a ceiling on growth



#### Future prospects and technological challenges

#### AI Driving Hyper-Exponential Demand for High-Performance Compute

#### THE EVOLUTION OF NVIDIA CHIP TDP<sup>1</sup> & COOLING REQUIREMENTS (WATT)





Source: Nvidia, Altman Solon



<sup>1</sup> Thermal Design Power: maximum amount of heat generated by the GPU that the cooling system is designed to dissipate under typical load conditions. It provides an estimate of the power consumption of the GPU under normal workload

#### What is it?

Instead of using individual chips in a chiplet-based system, the entire wafer is treated as one single system.

A long history dating back to the 1980s


#### 1991 International Conference on Wafer Scale Integration

Jan. 29 1991 to Jan. 31 1991


WSI stands for wafer-scale integration: rather than being diced into individual dies and packaged, the entire wafer is a single device. Such technology has been explored and developed since the 1970s, and continual failures have led to many projects being scrapped or forgotten. Even Clive Sinclair, the inventor of the ZX computer line (such as the ZX Spectrum), explored the idea of wafer-scale integration as an alternative to individual dies.

Microelectronics Journal

Volume 19, Issue 2, March–April 1988, Pages 4-35



Moore's Law proceeded by scaling for 40 years

WAFER-SCALE INTEGRATION OF

Early commercial attempts in the 1980s failed, and start-ups were abandoned by the industry for decades


But, the death of Carbon Machine Car

Hiroaki Kitano and Moritoshi Yasunaga

Center for Machine Translation, Carnegie Mellon University

hiroaki@cs.cmu.edu, myasunag@cs.cmu.edu

The ever-increasing size and complexity of integrated circuit devices seems to lead inevitably to the ultimate "chip", occupying the area of a whole wafer. This concept, which has been studied for many years, […] implications in terms of technology and […] economically realised. This paper reviews the history of WSI development and describes the current state of the art, together with as-yet unresolved problems.

Multi-core SoC integration emerged



# **Transistor performance improvements are slowing**

Compute performance is bound by thermal limitations, nearby memory, data bandwidth and latency







- Bandwidth and latency between chiplets become limiting
- Each chiplet hop adds communication energy consumption
- I/O to compute performance gap



The competitive advantage of a chip company increasingly depends on its packaging capabilities.

Packaging Defines Performance



Leading edge industry examples for high performance compute systems for AI

Aug 03, 2022 Computer History Museum Honors Cerebras Systems





Everyone said wafer-scale computing was impossible - too big, too hot, too risky. We did it anyway.

In 2015, every AI researcher said the same thing: "The hardware's holding us back."

Leading edge industry examples for high performance compute systems for AI

84 chips packed as a wafer-scale system ~46000 mm<sup>2</sup> area

#### **High-speed inference and AI training**





MONOLITHIC SILICON







AMD, MI300





NVIDIA, B200 ~1600 mm<sup>2</sup> Si

Tesla Dojo:

25 D1 chips packaged as a wafer-scale-like system (training tile), ~16,000 mm<sup>2</sup> (re-configured FO system)







Source: Tesla AI day 2021

Cerebras Wafer-Scale-Engine (WSE)



|                            | WSE-3                  | WSE-2                  | WSE-1                  | B200 GPU                    |
|----------------------------|------------------------|------------------------|------------------------|-----------------------------|
| Transistors                | 4 trillion             | 2.6 trillion           | 1.2 trillion           | 208 billion                 |
| Cores                      | 900,000                | 850,000                | 400,000                | 16,896 (CUDA)               |
| On-chip memory             | 44 GB                  | 40 GB                  | 18 GB                  | 192 GB HBM3E memory         |
| Memory bandwidth           | 21 PB/s                | 20 PB/s                | 9 PB/s                 | 8 TB/s                      |
| Fabric bandwidth           | 214 Pbit/s             | 220 Pbit/s             | 100 Pbit/s             | 1.8 TB/s per GPU (NVLink 5) |
| Fabrication process (TSMC) | 5 nm                   | 7 nm                   | 16 nm                  | 4NP                         |
| Year of introduction       | 2024                   | 2021                   | 2019                   | 2024                        |
| Size                       | 46,225 mm<sup>2</sup>  | 46,225 mm<sup>2</sup>  | 46,225 mm<sup>2</sup>  | 1,600 mm<sup>2</sup>        |
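To make the orders of magnitude in the table easier to read, the following minimal sketch computes WSE-3 to B200 ratios from the tabulated figures only. The dictionary keys and the helper `ratios` are illustrative names, not part of any vendor tooling. Note that the memory figures compare on-chip memory bandwidth (WSE-3) with HBM3E bandwidth (B200), and that the two "core" counts are not architecturally equivalent, so the ratios indicate scale rather than benchmark performance.

```python
# Figures copied from the comparison table (WSE-3 vs. B200 GPU).
wse3 = {
    "transistors": 4e12,                     # 4 trillion
    "cores": 900_000,
    "memory_bandwidth_bytes_per_s": 21e15,   # 21 PB/s
    "area_mm2": 46_225,
}
b200 = {
    "transistors": 208e9,                    # 208 billion
    "cores": 16_896,                         # CUDA cores
    "memory_bandwidth_bytes_per_s": 8e12,    # 8 TB/s
    "area_mm2": 1_600,
}


def ratios(a, b):
    """Return a[key] / b[key] for every key the two dictionaries share."""
    return {key: a[key] / b[key] for key in a if key in b}


wse3_vs_b200 = ratios(wse3, b200)
for key, value in wse3_vs_b200.items():
    print(f"{key}: {value:,.1f}x")
```

With these inputs the silicon area ratio is about 29x, while the memory bandwidth ratio is 2,625x, which illustrates why on-wafer memory is the central argument for wafer-scale systems.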



#### CS-3 Chassis





### Wafer scale integration – industry examples

Cerebras Wafer-Scale-Engine (WSE)



Wafer Scale Engine (WSE)



Engine block



Source: Cerebras Systems

Cerebras Wafer-Scale Data Centers







Cerebras Data Center Dec, 2025

**SPEED matters** 

There are two kinds of inference:

Batch jobs: these are workloads where speed doesn't matter. If you're running a job to generate 10 billion tokens of synthetic data, you don't care if it takes two days instead of one. You just care that it's cheap.

Interactive inference: these workloads are dependent on speed. Code generation, chat, copilots, search - have humans waiting on the other side of a screen.

If you're waiting for an answer, five minutes is game over.

→ For REASONING and AGENTIC AI workloads, milliseconds matter.



Grok (@grok), May 6: "Not everyone's that slow"

**SPEED matters** 

#### **Speed unlocks new business models for AI**

This was true for the internet 25 years ago and it's just as true for AI.

When the internet was slow, Netflix mailed DVDs in envelopes; today it's a movie studio.



3D Co-packaged silicon photonics - Passage™ by Lightmatter





Source: Lightmatter 04/2025



3D Co-packaged silicon photonics - Passage™ by Lightmatter



- Built-in solid state optical circuit switching
- Cross-reticle stitching
- Integrated transistor and photonics control technology to work with custom XPUs.

Source: Lightmatter

Wafer-scale silicon photonics interposer - Passage™ by Lightmatter

NOT TO SCALE



Uniform architecture allows flexible dicing based upon end application



Inter-reticle optical die to die communication


Die to wafer packaging







Source: Lightmatter



3D Co-packaged silicon photonics - Passage™ by Lightmatter







Photonic waveguides with ~4 µm pitch.

#### World's first wafer-scale programmable photonic interconnect fabric

- Enables low energy per bit (<2.3 pJ/bit) and low latency (<5 ns) for data communication between custom compute ASICs
- Enables very high communication bandwidth (~114 Tbps for 2x4 Passage tiles)
- Leverages 3D die-to-wafer (D2W) approaches and a form of wafer-scale integration
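The energy-per-bit and bandwidth figures above can be combined into a rough communication power budget, since power is energy per bit multiplied by bits per second. The sketch below is a back-of-the-envelope estimate. It assumes that the <2.3 pJ/bit bound and the ~114 Tbps figure apply to the same configuration at full utilization, which the slides do not state. The function name `link_power_watts` is illustrative.

```python
def link_power_watts(energy_pj_per_bit: float, bandwidth_tbps: float) -> float:
    """Communication power = energy per bit x bits per second."""
    joules_per_bit = energy_pj_per_bit * 1e-12   # pJ -> J
    bits_per_second = bandwidth_tbps * 1e12      # Tbps -> bit/s
    return joules_per_bit * bits_per_second


# Assumption: both quoted figures hold simultaneously at full load.
upper_bound_w = link_power_watts(2.3, 114)
print(f"Upper bound on link power: {upper_bound_w:.1f} W")
```

Under these assumptions the upper bound is roughly 262 W for the 2x4 tile configuration. Because 2.3 pJ/bit is quoted as an upper limit, the actual figure would be lower.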

#### Passage™ Alpha Silicon

- <50 Watts
- 32 channels per site, 1.024 Tbps
- 32 Gbps per channel NRZ
- 48 × 800 mm<sup>2</sup> tiles
- 288 × 50 mW lasers
- 6,144 DACs
- 6,144 MZIs
- 150,000 photonic components
- JTAG interface
- Integrated Lasers, transistors, photonics
- Programmable interconnect topologies
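The listed Alpha Silicon figures can be cross-checked with simple arithmetic. The sketch below derives the per-site bandwidth, the total tile area and the aggregate laser power from the bullet values. The per-tile averages assume that lasers and DACs are distributed evenly across the 48 tiles, which is an assumption and not stated in the source. All variable names are illustrative.

```python
# Values taken from the Passage Alpha Silicon bullet list.
channels_per_site = 32
gbps_per_channel = 32        # NRZ
tiles = 48
tile_area_mm2 = 800
lasers = 288
laser_power_mw = 50
dacs = 6_144

# Derived quantities.
site_bandwidth_tbps = channels_per_site * gbps_per_channel / 1000
total_tile_area_mm2 = tiles * tile_area_mm2
total_laser_power_w = lasers * laser_power_mw / 1000

# Assumption: components are spread uniformly over all tiles.
lasers_per_tile = lasers / tiles
dacs_per_tile = dacs / tiles

print(f"Bandwidth per site: {site_bandwidth_tbps} Tbps")
print(f"Total tile area:    {total_tile_area_mm2} mm^2")
print(f"Total laser power:  {total_laser_power_w} W")
print(f"Per tile (uniform): {lasers_per_tile:.0f} lasers, {dacs_per_tile:.0f} DACs")
```

The 32 channels at 32 Gbps reproduce the quoted 1.024 Tbps per site, and the 48 tiles add up to 38,400 mm<sup>2</sup> of photonic interposer area.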

#### Passage





Source: Lightmatter



### Wafer scale integration

Hyperscaling connectivity



Communication happens at the chip perimeter, but there is not enough shoreline

FLOPS $\propto$ Chip Area ($L^2$)

I/O Bandwidth $\propto$ Chip Perimeter ($L$)



I/O to Compute Performance Gap
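The gap follows directly from the two proportionalities: for a square die of edge length L, the available shoreline per unit of compute area scales as 4L / L<sup>2</sup> = 4 / L, so it shrinks as the die grows. The sketch below illustrates this with two edge lengths. Modelling 1,600 mm<sup>2</sup> of silicon as a single 40 mm square is a simplifying assumption, and `io_per_compute` is an illustrative name.

```python
def io_per_compute(edge_mm: float) -> float:
    """Relative I/O per unit of compute for a square die: perimeter / area = 4 / L."""
    perimeter = 4 * edge_mm   # I/O bandwidth scales with the perimeter
    area = edge_mm ** 2       # FLOPS scale with the area
    return perimeter / area


# Assumption: 1,600 mm^2 of silicon modelled as one 40 mm x 40 mm square.
small = io_per_compute(40.0)
# 46,225 mm^2 corresponds to a 215 mm x 215 mm square.
large = io_per_compute(215.0)

print(f"Shoreline per area, 40 mm die:  {small:.4f} per mm")
print(f"Shoreline per area, 215 mm die: {large:.4f} per mm")
print(f"Reduction factor: {small / large:.3f}x")
```

In this simplified model, perimeter-only I/O per unit of compute drops by a factor of 215 / 40 = 5.375 when moving to wafer scale. Area-based vertical connections scale with L<sup>2</sup> instead, which is the motivation for using the third dimension.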



**Explore the 3rd dimension** 

### Wafer scale integration – how to contribute

Hyperscaling connectivity

#### How to overcome the I/O to compute performance gap → 3D wafer-on-wafer packaging

- 1. Think of the entire system
- 2. Provide on-silicon inter-reticle connectivity over large areas
- 3. Provide solutions to integrate power delivery
- 5. Provide solutions to enable 3D co-packaged optics
- 6. Provide solutions to integrate cooling
- 7. Provide solutions to extend memory
- 8. Next technological hurdle ....

- → STCO: system technology co-optimization
- → Advanced i-line lithography for 3D-packaging
- → 3D power delivery (e.g. by TSVs, eDTC, eIVR)
- → explore 3<sup>rd</sup> dimension with 3D silicon photonics
- → explore 3<sup>rd</sup> dimension for 3D cooling in the stack
- → explore 3<sup>rd</sup> dimension and stack memory wafers
- → use advanced memory technologies



### Wafer scale integration – how to contribute

300mm wafer-level building blocks

#### → 3D wafer-level processing

- 3D quasi-monolithic integration 3D-QMI
- Large-area advanced i-line lithography (sub-µm scale)
- Through-silicon via integration (TSV mid, TSV last)
- Double-sided processing
- Deposition technologies CVD, PVD, ALD

#### → Advanced wafer thinning, pre-assembly and singulation

- <10 µm thin wafer and thin die processing
- BEOL layer transfer technology
- Bonding / De-Bonding
- Stress free chiplet singulation

#### → Multi die assembly at state of the art pitches

- D2W hybrid bonding with sub-μm accuracy
- D2W re-configured wafer for 3D-QMI
- Damascene processing
- Mix-pitch assembly with mix interconnect technologies









#### → In-line metrology and test

- Defect inspection at CDs <1 μm
- CD and OVL characterization at <1 μm
- Planarity, material properties
- Electrical characterization

#### → 3D Wafer-Stacking

- Compute / Memory / Power delivery wafer stacks
- Hybrid bonding W2W with <200 nm accuracy
- Integrated cooling
- Optical I/O communication









#### **Conclusion**

Andrew Feldman, Founder and CEO - Cerebras Systems

"Progress doesn't come from safety rails. It comes from putting sharp tools in people's hands - and trusting them to use them."

Fraunhofer IZM can help by applying very sharp tools for 3D integration across the entire system value chain, from 200/300 mm wafers to advanced substrates.





#### **Contact**



Dr. Frank Windrich
Deputy Head of IZM-ASSID
Wafer Level System Integration (WLSI)

Phone: +49 351 795572-49

E-Mail: frank.windrich@assid.izm.fraunhofer.de



#### Fraunhofer Institute for Reliability and Microintegration IZM

Gustav-Meyer-Allee 25, 13355 Berlin, Germany, +49 30 46403-100

#### Fraunhofer IZM-ASSID

Ringstraße 12, 01468 Dresden-Moritzburg, Germany, +49 351 795572-12

#### Fraunhofer IZM, Cottbus Branch (Außenstelle Cottbus)

Karl-Marx-Straße 69, 03044 Cottbus, Germany, +49 355 383 770-12



