









### A Scalable In-Context Design and Extraction Flow for Heterogeneous 2.5D Chiplet-Package Co-Optimization

MD Arafat Kabir<sup>1</sup>, Dusan Petranovic<sup>2</sup>, Yarui Peng<sup>1</sup>

<sup>1</sup>University of Arkansas, Fayetteville, AR, US

<sup>2</sup>Siemens EDA, Fremont, CA, US





### Introduction



#### Packaging becomes increasingly critical in the post-Moore's Law era

- Transistor scaling is saturating, and chips are reaching the reticle limit.
- 2.5D and 3D packages provide high bandwidth in a compact size.
- They enable novel design techniques such as plug-and-play, drop-in, and hardware security.
- They offer heterogeneous integration capabilities (AMD EPYC family, Intel Lakefield).
- They support large systems with tens of Known-Good-Dies (AMD's EPYC 7532).





\*From public domain



### Introduction



#### Need for a cross-boundary package-aware design strategy

- In high-density packages, interactions between the package and chiplets are significant.
- These interactions affect overall system performance.
- No existing standard flow can design heterogeneous 2.5D systems with high extraction and analysis accuracy while remaining scalable.

### Objectives

- An accurate and scalable extraction and optimization strategy for heterogeneous 2.5D systems
- Comparative study with existing flows for accuracy and scalability





\*From public domain



### **Need for a Cross-Boundary Flow**



#### Need for a cross-boundary design strategy

- Traditional "die-by-die" flow treats each chiplet separately
- There exists significant coupling between chiplet and package in high density 2.5D packages [1,2]
- These interactions can be exploited in the optimization process to reduce package overhead by 60%-80% [1]
- Cross-boundary analysis is required to ensure system reliability and accurately predict the final performance.

[1] MD Arafat Kabir and Yarui Peng, "Holistic Chiplet-Package Co-Optimization for Agile Custom 2.5D Design", IEEE Transactions on Components, Packaging, and Manufacturing Technology, vol. 11, no. 5, pp. 715–726, 2021.

[2] Chuei-Tang Wang, Jeng-Shien Hsieh, et al., "Signal Integrity of Submicron InFO Heterogeneous Integration for High Performance Computing Applications", in Proc. IEEE 69th Electronic Components and Technology Conference, pp. 688–694, 2019.





### **Existing Flows: Holistic [1]**



#### Holistic flow [1] performs extraction and analysis on the entire system together

- Provides the most accurate view of the entire system.
- Can capture all interactions between all components.
- Can perform system-level optimization through iterations.

### Limitations

- Cannot handle heterogeneous technologies using existing industry-standard tools
- Not scalable: too much complexity for a very large system
- Requires IP sharing, since the entire system is extracted and analyzed together





### **Existing Flows: In-Context [3]**



#### Breaks down the package into regions (contexts) around chiplets and creates an extended partition for each chiplet

- Part of the package around a chiplet is separated.
- Extraction is performed on each context and stitched later for analysis.
- Takes advantage of divide-and-conquer: scalable for a large system.
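
The divide step can be pictured as clipping package geometry to a window around each chiplet. The following is a minimal sketch, assuming axis-aligned rectangles and a fixed `margin` around the chiplet footprint; `Rect`, `context_window`, `clip`, and `clip_to_context` are hypothetical names for illustration, and the way [3] actually defines its contexts may differ.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Rect:
    """Axis-aligned rectangle from (x0, y0) to (x1, y1); units are arbitrary."""
    x0: float
    y0: float
    x1: float
    y1: float


def context_window(chiplet: Rect, margin: float) -> Rect:
    """Grow the chiplet footprint by `margin` (an assumed parameter) to form its context."""
    return Rect(chiplet.x0 - margin, chiplet.y0 - margin,
                chiplet.x1 + margin, chiplet.y1 + margin)


def clip(shape: Rect, window: Rect) -> Optional[Rect]:
    """Return the part of `shape` inside `window`, or None if they do not overlap."""
    x0, y0 = max(shape.x0, window.x0), max(shape.y0, window.y0)
    x1, y1 = min(shape.x1, window.x1), min(shape.y1, window.y1)
    if x0 >= x1 or y0 >= y1:
        return None
    return Rect(x0, y0, x1, y1)


def clip_to_context(package_shapes: List[Rect], window: Rect) -> List[Rect]:
    """Keep only the package geometry that falls inside one chiplet's context."""
    clipped = (clip(shape, window) for shape in package_shapes)
    return [c for c in clipped if c is not None]
```

In this picture, a package wire that crosses the window boundary is cut where it leaves the window, which creates an artificial wire end inside the context.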

### Limitations

- Each context is unaware of the other parts of the package, even neighboring ones.
- Significantly overestimates the ground capacitance on RDLs [3] (by more than 20%) due to fringe capacitance at the cut edges.

[3] MD Arafat Kabir, Dusan Petranovic, and Yarui Peng, "Coupling Extraction and Optimization for Heterogeneous 2.5D Chiplet-Package Co-Design", in Proc. International Conference on Computer-Aided Design, pp. 1–8, Nov 2020.





### **Our Proposed In-Context Flow**



#### Takes advantage of the divide-and-conquer approach and improves accuracy through careful in-context extraction

- Perform holistic planning and budgeting (in-house tool).
- Define package contexts and initial plans for all chiplets.
- Implement the package and chiplets through physical design.
- Use the package and chiplet physical designs to create an in-context extraction setup for each chiplet (an elaborate step).
- Perform in-context extraction on each chiplet.
- Perform some hierarchy adjustment of parasitic netlists for stitching.
- Perform analysis and verification:
  - On the chiplet context using the in-context parasitic netlist,
  - On the entire-system using the stitched parasitic netlist.
- Perform iterative optimization
- Sign-off verifications
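
The hierarchy adjustment and stitching steps can be sketched as follows. This is an illustration under stated assumptions, not the flow's actual implementation: a parasitic netlist is reduced to a list of `(node_a, node_b, capacitance)` tuples, chiplet-local nodes are made unique by prefixing the chiplet instance name, and nodes listed in `shared` (for example ground and inter-chiplet package nets) keep their names so they line up across contexts. The names `add_hierarchy` and `stitch` are hypothetical.

```python
from typing import Dict, Iterable, List, Set, Tuple

Cap = Tuple[str, str, float]  # (node_a, node_b, capacitance)


def add_hierarchy(netlist: Iterable[Cap], instance: str, shared: Set[str]) -> List[Cap]:
    """Prefix chiplet-local nodes with the instance name; leave shared nodes unchanged."""
    def rename(node: str) -> str:
        return node if node in shared else f"{instance}/{node}"

    return [(rename(a), rename(b), c) for a, b, c in netlist]


def stitch(contexts: Dict[str, List[Cap]], shared: Set[str]) -> List[Cap]:
    """Concatenate the per-context netlists after hierarchy adjustment."""
    stitched: List[Cap] = []
    for instance, netlist in contexts.items():
        stitched.extend(add_hierarchy(netlist, instance, shared))
    return stitched
```

With this naming, two chiplets can both contain a local net called `clk` without colliding in the stitched netlist, while a shared package net remains a single node.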






### Layout Reconstruction: In-Context Ext. Setup




#### This step generates the design files that define the package context for the extraction tool, so that extraction is performed within the context of a given chiplet

- Extraction is performed on the full in-context design; other chiplets are treated as black boxes.
- Coupling between *in-context design* wires and *extraction environment* wires is converted to ground capacitance on the *in-context design* wire segment.
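
The conversion of environment coupling into ground capacitance can be sketched as below. This is a simplified illustration that works on whole nets rather than wire segments, and it assumes the extracted parasitics are available as `(net_a, net_b, capacitance)` tuples plus a per-net ground capacitance map; the function name `ground_environment_coupling` and the data layout are hypothetical, not the extraction tool's interface.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

Coupling = Tuple[str, str, float]  # (net_a, net_b, capacitance)


def ground_environment_coupling(
    couplings: Iterable[Coupling],
    ground_caps: Dict[str, float],
    in_context: Set[str],
) -> Tuple[List[Coupling], Dict[str, float]]:
    """Fold coupling to extraction-environment nets into in-context ground caps."""
    kept: List[Coupling] = []
    grounded = defaultdict(float, ground_caps)
    for a, b, c in couplings:
        a_in, b_in = a in in_context, b in in_context
        if a_in and b_in:
            kept.append((a, b, c))  # coupling inside the context is preserved
        elif a_in:
            grounded[a] += c        # environment side is treated as ground
        elif b_in:
            grounded[b] += c
        # coupling between two environment nets is not part of this context
    return kept, dict(grounded)
```

The total capacitance seen by each in-context net is unchanged by this step; only the far end of the environment coupling is replaced by ground.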







### **Scalability Features**



#### The proposed flow offers scalability through divide-and-conquer, while maintaining system-level analysis accuracy

- A per-chiplet context reduces design complexity.
- Cross-boundary analysis and iterative optimizations can be performed at the context level.
- Several design houses can collaborate, without revealing IP details.
- A system-level holistic view can be created to perform full-system analysis, optimization, and verification.



Proposed In-Context Flow





#### **ARM Cortex-M0 based micro-controller system**

- Consists of an ARM Cortex-M0 core, 16 KB memory, and some common peripheral devices
- Two-chiplet system: Core and Memory
- The 16 KB memory is divided into two parts, 8 KB each.



System architecture and chiplet partitions








### **Technology Settings**



#### We use two modified versions of Nangate45nm as our PDK

- 7M3R: 7 metal layers used for chiplet routing
- 6M3R: 6 metal layers used for chiplet routing

#### Top three layers for package RDLs

- Dimensions are similar to the TSMC 2.5D InFO technology

|           | M6   | via6 | M7  | via7 | RDL1 | viar1 | RDL2 | viar2 | RDL3 |
|-----------|------|------|-----|------|------|-------|------|-------|------|
| Height    | 2.28 | 3.08 | 3.9 | 7.5  | 12.5 | 17.5  | 22.5 | 27.5  | 32.5 |
| Thickness | 0.8  | 0.82 | 3.6 | 5    | 5    | 5     | 5    | 5     | 5    |
| Width     | 0.4  | 0.4  | 2   | 5    | 10   | 10    | 10   | 10    | 10   |
| Spacing   | 0.4  | 0.44 | 2   | 10   | 10   | 20    | 10   | 20    | 10   |
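
The Height and Thickness rows of the table are consistent with reading Height as the bottom of each layer: each layer starts where the previous one ends. The short check below is a sketch that verifies this reading on the table values; the interpretation of Height and the helper names `stack_gaps` and `is_contiguous` are assumptions made for illustration.

```python
import math

LAYERS = ["M6", "via6", "M7", "via7", "RDL1", "viar1", "RDL2", "viar2", "RDL3"]
HEIGHT = [2.28, 3.08, 3.9, 7.5, 12.5, 17.5, 22.5, 27.5, 32.5]
THICKNESS = [0.8, 0.82, 3.6, 5, 5, 5, 5, 5, 5]


def stack_gaps(layers, height, thickness):
    """Return (lower, upper, gap) per adjacent pair; gap is ~0 for a contiguous stack."""
    return [
        (layers[i], layers[i + 1], height[i + 1] - (height[i] + thickness[i]))
        for i in range(len(layers) - 1)
    ]


def is_contiguous(layers, height, thickness, tol=1e-9):
    """True if every layer begins where the layer below it ends (within `tol`)."""
    return all(
        math.isclose(gap, 0.0, abs_tol=tol)
        for _, _, gap in stack_gaps(layers, height, thickness)
    )
```

A check like this is a cheap way to catch a mistyped entry when a modified layer stack is entered by hand.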








### **Designs for Comparative Study**



#### Three versions of the system are implemented for the comparative study

- Two homogeneous implementations with 7M3R and the Nangate45 cell library
  - In the holistic flow
  - In our proposed in-context flow
- A (pseudo-)heterogeneous implementation with 7M3R and 6M3R using cells from the Nangate45 and FreePDK45 cell libraries.







#### We perform in-context extraction on the homogeneous design for the comparative study

- Coupling capacitance (CCAP) between the package and chiplets is preserved with holistic-like accuracy.
- The old flow overestimates the total RDL capacitance by up to 4.5% per layer.
- Our new flow reduces the error to within 1.7% per layer.

| Metric    | Flow                 | M1-M5 | M6    | M7     | R1    | R2     | R3     |
|-----------|----------------------|-------|-------|--------|-------|--------|--------|
| CCAP      | Holistic             | 9275  | 1172  | 196    | 1529  | 2441   | 1685   |
| CCAP      | In-Context (old)     | 9346  | 1181  | 188    | 1564  | 2478   | 1690   |
| CCAP      | In-Context (new)     | 8992  | 1203  | 193    | 1517  | 2390   | 1640   |
| Total CAP | Holistic             | 31056 | 3307  | 498    | 2547  | 2669   | 2209   |
| Total CAP | In-Context (old)     | 31140 | 3324  | 489    | 2661  | 2749   | 2251   |
| Total CAP | In-Context (old) Err% | 0.27% | 0.51% | -1.79% | 4.49% | 3.01%  | 1.91%  |
| Total CAP | In-Context (new)     | 31238 | 3350  | 495    | 2591  | 2654   | 2192   |
| Total CAP | In-Context (new) Err% | 0.59% | 1.31% | -0.59% | 1.74% | -0.55% | -0.76% |
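
The Err% rows are consistent with a signed relative error against the holistic result, i.e. 100 × (in-context − holistic) / holistic. The sketch below recomputes them from the Total CAP rows of the table; the helper names are hypothetical, and because the table entries are rounded, the recomputed values can differ from the listed percentages in the second decimal place.

```python
LAYER_NAMES = ["M1-M5", "M6", "M7", "R1", "R2", "R3"]

# Total capacitance per layer group, copied from the table above.
HOLISTIC = [31056, 3307, 498, 2547, 2669, 2209]
IN_CONTEXT_OLD = [31140, 3324, 489, 2661, 2749, 2251]
IN_CONTEXT_NEW = [31238, 3350, 495, 2591, 2654, 2192]


def percent_error(value: float, reference: float) -> float:
    """Signed relative error of `value` against `reference`, in percent."""
    return 100.0 * (value - reference) / reference


def per_layer_error(flow, reference):
    """Map each layer group to its signed percent error versus the reference flow."""
    return {
        name: percent_error(v, r)
        for name, v, r in zip(LAYER_NAMES, flow, reference)
    }


old_err = per_layer_error(IN_CONTEXT_OLD, HOLISTIC)
new_err = per_layer_error(IN_CONTEXT_NEW, HOLISTIC)
```

A positive value means the in-context flow overestimates the capacitance relative to the holistic flow; a negative value means it underestimates.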



### In-Context Extraction Per-Net Comparison



#### The accuracy improvement is more evident in a per-net comparison

- The previous flow [3] overestimates almost all 100 nets, with errors of up to 7%.
- The proposed flow achieves holistic-like accuracy, within a 1% error margin.






### **Iterative Optimization Results**



#### The iterative optimization results from the proposed flow closely match those of the holistic flow

- The heterogeneous 45nm design is comparable to the homogeneous design, with slight differences due to the multiple cell libraries used

| Metric      | Item                | Homogeneous, Holistic | Homogeneous, In-Context (New Flow) | Heterogeneous, In-Context (New Flow) |
|-------------|---------------------|-----------------------|------------------------------------|--------------------------------------|
| Performance | Initial             | 288 MHz               | 287 MHz                            | 278 MHz                              |
| Performance | 1st iteration       | 293 MHz               | 290 MHz                            | 294 MHz                              |
| Performance | 2nd/final iteration | 300 MHz               | 300 MHz                            | 300 MHz                              |
| Power       | Wire                | 4.35 mW               | 4.37 mW                            | 4.21 mW                              |
| Power       | Cell                | 6.39 mW               | 6.36 mW                            | 6.20 mW                              |
| Power       | Total               | 10.74 mW              | 10.73 mW                           | 10.41 mW                             |







### Conclusions

- Chiplet-Package interactions need to be considered in 2.5D systems.
- Our flow can handle heterogeneous systems and effectively captures interactions between package and chiplet designs for holistic planning and optimization.
- Unlike existing in-context flows, it provides both accuracy and scalability features for cross-boundary analysis and optimization.
- Our flow enables large-scale 2.5D system design through collaboration among several design houses, while maintaining IP protection and parallelism in the design process.

### Future Work

- Preserving the coupling with out-of-context package wires
- Cross-boundary RCLM extraction and study of their impacts
- Study of timing, signal and power integrity with full RCLM models
- System-level performance and SI-aware package design





10/18/2021





## Thank You

Do you have any questions?



