Chip Architect: VLSI symposia abstracts

April 16, 2002: VLSI symposia abstracts

Abstracts from the Intel presentations.

2002 VLSI Symposia

A 4GHz 130nm Address Generation Unit with 32-bit Sparse-tree Adder Core

Sanu Mathew, Mark Anders, Ram K. Krishnamurthy and Shekhar Borkar

Circuits Research, Intel Labs, Intel Corporation, Hillsboro, OR 97124, USA, sanu.k.mathew@intel.com

This paper describes a 32-bit Address Generation Unit (AGU) designed for 4GHz operation in 1.2V, 130nm technology. The AGU utilizes a 152ps dual-Vt sparse-tree adder core to achieve 20% delay reduction, 80% lower interconnect density and a low (1%) active energy leakage component. The semi-dynamic implementation enables an average energy profile similar to static CMOS, with a good sub-130nm scaling trend.
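As a rough illustration of the sparse-tree idea, the Python sketch below computes carries only at 4-bit group boundaries with a Kogge-Stone style prefix tree and uses precomputed carry-select sums inside each group. The group size of 4, the group-level (rather than bit-level) tree and all function names are assumptions made for this sketch. It is a behavioral model of the general technique, not the paper's circuit.

```python
WIDTH = 32
GROUP = 4  # assumed sparseness: the carry tree produces one carry per 4 bits


def combine(hi, lo):
    """Prefix (carry-merge) operator on (generate, propagate) pairs."""
    g_hi, p_hi = hi
    g_lo, p_lo = lo
    return g_hi | (p_hi & g_lo), p_hi & p_lo


def group_gp(a, b, lo):
    """Group generate/propagate for bits lo .. lo+GROUP-1."""
    gp = (0, 1)  # identity element of the merge operator
    for i in range(lo, lo + GROUP):
        ai, bi = (a >> i) & 1, (b >> i) & 1
        gp = combine((ai & bi, ai ^ bi), gp)
    return gp


def conditional_sum(a, b, lo, carry):
    """Ripple sum of one 4-bit group for an assumed carry-in (0 or 1)."""
    s = 0
    for i in range(lo, lo + GROUP):
        ai, bi = (a >> i) & 1, (b >> i) & 1
        s |= (ai ^ bi ^ carry) << i
        carry = (ai & bi) | (carry & (ai ^ bi))
    return s


def sparse_tree_add(a, b, cin=0):
    """Return (sum mod 2**WIDTH, carry-out) using a sparse carry tree."""
    n = WIDTH // GROUP
    gp = [group_gp(a, b, k * GROUP) for k in range(n)]
    # Kogge-Stone style prefix tree over the 8 groups: 3 levels (dist 1, 2, 4).
    dist = 1
    while dist < n:
        gp = [gp[k] if k < dist else combine(gp[k], gp[k - dist]) for k in range(n)]
        dist *= 2
    total = 0
    for k in range(n):
        if k == 0:
            carry_in = cin
        else:
            g, p = gp[k - 1]
            carry_in = g | (p & cin)
        # Both conditional sums are precomputed; the sparse carry selects one.
        s0 = conditional_sum(a, b, k * GROUP, 0)
        s1 = conditional_sum(a, b, k * GROUP, 1)
        total |= s1 if carry_in else s0
    g, p = gp[-1]
    return total, g | (p & cin)
```

For example, `sparse_tree_add(0xFFFFFFFF, 1)` returns `(0, 1)`. Only 8 tree carries are computed instead of 32. This reduction is the source of the lower wiring density a sparse tree offers over a full prefix tree.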

Dual Supply Voltage Clocking for 5GHz 130nm Integer Execution Core

Ram K. Krishnamurthy, Steven Hsu, Mark Anders, Brad Bloechel, Bhaskar Chatterjee*, Manoj Sachdev*, Shekhar Borkar

Circuits Research, Intel Labs, Intel Corporation, Hillsboro, OR 97124, USA, ramk@ichips.intel.com
*University of Waterloo, Ontario N2L3G1, Canada, bhaskar@vlsi.uwaterloo.ca

This paper describes dual-Vcc clocking on a 1.2V, 5GHz integer execution core fabricated in 130nm CMOS, achieving up to 71% reduction in measured clock power (including 15% active leakage). A write-port style pass-transistor latch and a split-output level-converting local clock buffer are described for robust, DC-power-free low-Vcc clock operation.
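To first order, the switching power of a clock network scales with the square of its supply voltage, which is why clocking from a lower Vcc pays off. The relation below is a textbook CV²f approximation, not the paper's power model. The symbols α (activity), C_clk, V_L and V_H are introduced here, and leakage is ignored, although the measured 71% figure includes it.

```latex
P_{\mathrm{clk}} \approx \alpha\, C_{\mathrm{clk}}\, V_{CC}^{2}\, f,
\qquad
\frac{P_{\mathrm{clk}}(V_L)}{P_{\mathrm{clk}}(V_H)} = \left(\frac{V_L}{V_H}\right)^{2}
```

As an illustration only, a clock network run at 80% of the core supply would dissipate roughly 0.8² = 0.64 of its original switching power under this approximation.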

A 4.5GHz 130nm 32KB L0 Cache with a Self Reverse Bias Scheme

Steven K. Hsu, Atila Alvandpour, Sanu Mathew, Shih-Lien Lu, Ram K. Krishnamurthy, Shekhar Borkar

Circuits Research, Intel Labs, Intel Corporation, Hillsboro, OR 97124, USA, steven.k.hsu@intel.com

This paper describes a 32KB dual-ported L0 cache for 4.5GHz operation in 1.2V, 130nm CMOS. The local bitline uses a Self Reverse Bias scheme to achieve −220mV access transistor underdrive without external bias voltage or gate-oxide overstress. An 11% reduction in read delay and 104% higher DC robustness (including 7x measured active leakage reduction) are achieved over an optimized high-performance dual-Vt scheme.

Designing a 3GHz, 130nm, Intel® Pentium® 4 Processor

Daniel Deleganes, Jonathon Douglas, Badari Kommandur, Marek Patyra

Intel Architecture Group, 2501 NW 229th Ave. MS RA2-401, Hillsboro, OR 97124, USA

The design of an IA32 processor fabricated on a state-of-the-art 130nm CMOS process with six layers of improved dual-damascene copper metallization is described. Engineering an IA32 processor for server, desktop, and mobile platforms, particularly meeting diverse power and thermal constraints, poses numerous challenges. This presentation focuses on methods applied to achieve high frequency and low power on the same chip, in particular the use of a dual-Vt process, clock skew design, and thermal management techniques.

Forward Body Bias for Microprocessors in 130nm Technology Generation and Beyond

Ali Keshavarzi, Siva Narendra, Bradley Bloechel, Shekhar Borkar and Vivek De

Microprocessor Research, Intel Labs, Hillsboro, OR, USA

Device and testchip measurements show that forward body bias (FBB) can be used effectively to improve performance and reduce the complexity of a 130nm dual-VT technology, reduce leakage power during burn-in and standby, improve circuit delay and robustness, and reduce active power. FBB allows the performance advantages of low-temperature operation to be realized fully without requiring transistor redesign. It also reduces VT variation and mismatch and improves the gm × ro product.

A 6GHz, 16Kbytes L1 Cache in a 100nm Dual-VT Technology Using a Bitline Leakage Reduction (BLR) Technique

Yibin Ye, Muhammad Khellah, Dinesh Somasekhar, Ali Farhang and Vivek De

Microprocessor Research, Intel Labs, Hillsboro, OR, USA

An L1 cache testchip with a dual-VT cell and a bitline leakage reduction (BLR) technique has been implemented in a 100nm dual-VT technology. The area of a 2KBytes array is 263µm × 204µm, which is virtually the same as the best conventional design with a high-VT cell. BLR eliminates the impact of bitline leakage on performance and noise margin with minimal area overhead. Bitline delay improves by 23%, enabling 6GHz operation. Energy consumption per cycle is 15% higher.

A Leakage-Tolerant Dynamic Register File Using Leakage Bypass with Stack Forcing (LBSF) and Source Follower NMOS (SFN) Techniques

Stephen Tang, Steven Hsu, Yibin Ye, James Tschanz, Dinesh Somasekhar, Siva Narendra, Shih-Lien Lu, Ram Krishnamurthy and Vivek De

Microprocessor Research, Intel Labs, Hillsboro, OR, USA

LBSF and SFN leakage-tolerant techniques improve the robustness of leakage-sensitive and performance-critical wide dynamic circuits in the local and global bitlines of a 256×32b register file in a 100nm dual-VT technology. The full LBSF design improves clock frequency by 50% or reduces energy by 37%, compared to the best dual-VT (DVT) design. The performance advantages of LBSF and SFN become more significant as leakage increases.

Four-Way Processor 800 MT/s Front Side Bus with Ground Referenced Voltage Source I/O

Thomas P. Thomas, Ian A. Young

Intel Corporation, Portland Technology Development, RA1-309, 5200 NE Elam Young Parkway, Hillsboro, OR 97124, USA

A 40cm multi-drop bus shared by 5 test chips to emulate 4 processors and a chipset runs error free at 800MT/s with 130mV margin using Ground Referenced Voltage Source (GRVS) I/O scheme. For comparison, when the same test chip is programmed to use Gunning Transceiver Logic (GTL), the bus speed is 500 MT/s for the same 130mV margin under identical conditions.

Static Pulsed Bus for On-Chip Interconnects

Muhammad Khellah, James Tschanz, Yibin Ye, Siva Narendra and Vivek De

Circuit Research, Intel Labs, Hillsboro, OR, USA

A Static Pulsed Bus (SPB) technique offers significant advantages over a conventional static bus (SB) in delay, energy, total device width and peak VCC current for 1500µm to 4500µm long M4 buses in a 100nm technology. These improvements are due to a reduction in effective coupling capacitance and to repeater skewing enabled by monotonic signal transitions. Unlike dynamic schemes, the energy savings of SPB are maintained across all activity factors without any clock power or routing overhead.

A Transition-Encoded Dynamic Bus Technique for High-Performance Interconnects

Mark Anders, Nivruti Rai*, Ram Krishnamurthy, Shekhar Borkar

Circuit Research, Intel Labs, Intel Corporation, Hillsboro, OR 97124, USA, mark.a.anders@intel.com
*Desktop Products Group, Intel Corporation, Hillsboro, OR 97124, USA, nivruti.rai@intel.com

A transition-encoded dynamic bus technique enables interconnect delay reduction while maintaining the robustness and switching energy behavior of a static bus. Efficient circuits, designed as a drop-in replacement, enable significant delay and peak-current reduction even for short buses, while obtaining energy savings at aggressive delay targets. In a 180nm 32-bit microprocessor, 79% of all global buses exhibit a 10%-35% performance improvement.
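The behavioral sketch below shows why transition encoding keeps a dynamic bus's activity data-dependent, as on a static bus, rather than clock-dependent. It assumes that a wire fires only when its data bit changes and that the receiver toggles a stored bit on each pulse. These assumptions and all function names are hypothetical. The sketch models generic transition signaling, not the paper's circuits, and counts events only, not energy per event.

```python
def encode(words, initial=0):
    """Per cycle, assert a pulse only on wires whose data bit changed."""
    prev = initial
    pulses = []
    for w in words:
        pulses.append(w ^ prev)
        prev = w
    return pulses


def decode(pulses, initial=0):
    """Receiver toggles its stored bit on every pulse it sees."""
    state = initial
    out = []
    for p in pulses:
        state ^= p
        out.append(state)
    return out


def popcount(x):
    return bin(x).count("1")


def wire_events(pulses):
    """Total pulses fired; equals the toggles a static bus would make."""
    return sum(popcount(p) for p in pulses)
```

For unchanging data such as `encode([5, 5, 5])`, only the first cycle fires (`[5, 0, 0]`). A conventional dynamic bus would instead precharge and potentially discharge every cycle.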

An Accurate and Efficient Analysis Method for Multi-Gb/s Chip-to-chip Signaling Schemes

Bryan K. Casper, Matthew Haycock, Randy Mooney

Circuit Research, Intel Labs, Hillsboro, OR, bryan.k.casper@intel.com

This paper introduces an accurate method of modeling the performance of high-speed chip-to-chip signaling systems. Implemented in a simulation tool, it precisely accounts for intersymbol interference, cross-talk and echoes as well as circuit-related effects such as thermal noise, power supply noise and receiver jitter. We correlated the simulation tool to actual measurements of a high-speed signaling system and then used this tool to make tradeoffs between different methods of chip-to-chip signaling with and without equalization.
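As a much simpler point of comparison, the sketch below uses peak-distortion analysis, a deterministic worst-case technique swapped in here for illustration. It is not the statistical method or tool described in the paper. Given a hypothetical UI-spaced pulse response, it finds the worst-case inner-eye sample for binary (+1/−1) signaling in two ways: in closed form, and by brute-force enumeration of neighboring bits. The `noise` budget parameter is an assumption standing in for effects such as crosstalk or supply noise.

```python
from itertools import product


def peak_distortion_eye(pulse, cursor, noise=0.0):
    """Worst-case sample for a transmitted +1 under binary (+1/-1) signaling.

    Every non-cursor tap is assumed to align with the sign that hurts most,
    so the inner eye is the main cursor minus the sum of |ISI| taps and any
    extra noise budget supplied by the caller.
    """
    isi = sum(abs(h) for i, h in enumerate(pulse) if i != cursor)
    return pulse[cursor] - isi - noise


def brute_force_eye(pulse, cursor):
    """Same worst case, found by enumerating every neighboring bit pattern."""
    worst = float("inf")
    for others in product((-1, 1), repeat=len(pulse) - 1):
        symbols = list(others[:cursor]) + [1] + list(others[cursor:])
        sample = sum(s * h for s, h in zip(symbols, pulse))
        worst = min(worst, sample)
    return worst
```

With the hypothetical pulse response `[0.05, 0.6, 0.2, 0.1]` and `cursor=1`, both functions give about 0.25. Brute force grows as 2ⁿ in the pulse length, which is one reason analytical or statistical methods are preferred for long channels.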

We present a technique to enable the integration of sensitive analog circuits with a high-performance microprocessor (Pentium® 4) on a lossy substrate.

We show that by exploiting the spectral content of substrate noise and using appropriately tuned analog amplification, it is possible to limit the isolation requirements to 70dB. Using a combination of measurement and field-solver results, we show that a minimal process enhancement (i.e. a deep n-well) will yield 50dB of isolation, and the remainder can be achieved by layout and differential circuit techniques.
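The isolation budget in this abstract is additive in decibels. The split below assumes that the figures are voltage ratios (20 log₁₀) and that the deep n-well and the layout/differential techniques act as cascaded attenuation stages. This stage model is an assumption, not taken from the paper.

```latex
70\,\mathrm{dB} = \underbrace{50\,\mathrm{dB}}_{\text{deep n-well}}
               + \underbrace{20\,\mathrm{dB}}_{\text{layout, differential circuits}}
\quad\Longleftrightarrow\quad
10^{70/20} \approx 3162 \approx 10^{50/20} \times 10^{20/20} \approx 316 \times 10
```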

Selective Node Engineering for Chip-Level Soft Error Rate Improvement

Tanay Karnik, Sriram Vangal, V. Veeramachaneni, Peter Hazucha, Vasantha Erraguntla, Shekhar Borkar

Circuit Research, Intel Labs, Hillsboro, OR, USA

This paper presents a technique to selectively engineer sequential or domino nodes in high-performance circuits to improve the soft error rate (SER) induced by cosmic rays or alpha particles. In a 0.18 µm process, the SER improvement is as much as 3X at the cell level, 1.8X at the block level and 1.3X at the chip level, without any penalty in performance or area and with a <3% power penalty. The node selection, hardening and SER quantification steps are fully automated.

Design Optimizations of a High Performance Microprocessor Using Combinations of Dual-VT Allocation and Transistor Sizing

James Tschanz, Yibin Ye, Liqiong Wei¹, Venkatesh Govindarajulu, Nitin Borkar, Steven Burns², Tanay Karnik, Shekhar Borkar and Vivek De

Microprocessor Research, ¹Mobile Architecture, ²Strategic CAD, Intel Labs, Hillsboro, OR, USA

Joint optimizations of dual-VT allocation and transistor sizing for a high-performance microprocessor reduce low-VT usage by 36%-64%, compared to a design where only dual-VT allocation is optimized. Designs optimized for minimum power (DVT+S) and minimum area (L-SDVT) reduce leakage power by 20%, with minimal impact on total power and die area. An enhancement of the optimum DVT+S design allows processor frequency to be increased efficiently during manufacturing through a low-VT device leakage push only.

Design & Validation of the Pentium® III and Pentium® 4 Processors Power Delivery

Tawfik Rahal-Arabi, Greg Taylor, Matthew Ma, and Clair Webb

Intel Corporation / Logic Technology Development, 5200 NE Elam Young Parkway, Hillsboro, Oregon 97124, Email: Tawfik.r.Arabi@intel.com

In this paper, we present an empirical approach for the validation of the power supply impedance model. The model is widely used to design the power delivery for high-performance systems. For this purpose, several silicon wafers of the Pentium® III and Pentium® 4 processors were built with various amounts of decoupling. The measured data showed significant discrepancies with the model predictions and provided useful insights into the model's regions of validity.

Effectiveness of Adaptive Supply Voltage and Body Bias for Reducing Impact of Parameter Variations in Low Power and High Performance Microprocessors

James Tschanz, James Kao¹, Siva Narendra, Raj Nair and Vivek De

Microprocessor Research, Intel Labs, Hillsboro, OR, USA
¹Massachusetts Institute of Technology

Testchip measurements show that adaptive VCC is useful for reducing the impact of die-to-die and within-die (WID) parameter variations on the frequency, active power and leakage power distributions of both low-power and high-performance microprocessors. Using adaptive VCC together with adaptive VBS or WID-VBS is much more effective than using any of them individually. Adaptive VCC + WID-VBS increases the number of dies accepted in the highest two frequency bins to 80%.
