





## A Serial Comparator Consists of Three Parts

Let's analyze the area of a serial comparator.

We have:

- one bit slice,
- two flip-flops, and
- two 2-input NOR gates (selection logic).

## A Serial Comparator Contains One Bit Slice

Assume the smaller version of the bit slice. So we need **six 2-input NAND gates** and **two inverters**.



ECE 120: Introduction to Computing



## A Serial Comparator Consists of Three Parts

Let's analyze the area of a serial comparator.

We have:

- one bit slice (6 2-input gates and 2 inverters),
- two flip-flops (16 2-input gates and 4 inverters), and
- two 2-input NOR gates (selection logic).

Total: 6 + 16 + 2 = 24 2-input gates and 2 + 4 = 6 inverters.

## Serial Design is Smaller for $N \ge 4$

To handle **N-bit** operands, a bit-sliced design requires:

- 6N 2-input gates, and
- 2N inverters.

A serial design (independent of N) requires:

- 24 2-input gates, and
- 6 inverters.

The serial design is smaller for N ≥ 4. (At N = 4 the two designs tie at 24 2-input gates, but the serial design uses 6 inverters instead of 8.)

ECE 120: Introduction to Computing © 2016 Steven S. Lumetta. All rights reserved.
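The area comparison can be checked with a few lines of Python. This is a minimal sketch: the function names are hypothetical, and treating every 2-input gate and every inverter as one unit of area is a simplifying assumption made only for this illustration.

```python
def bit_sliced_area(n):
    """Return (2-input gates, inverters) for an n-bit bit-sliced comparator."""
    return 6 * n, 2 * n


def serial_area():
    """Return (2-input gates, inverters) for the serial comparator."""
    gates = 6 + 16 + 2   # bit slice + two flip-flops + selection logic
    inverters = 2 + 4    # bit slice + two flip-flops
    return gates, inverters


def smallest_n_where_serial_is_smaller(limit=64):
    """Find the first n at which the serial design has fewer total components.

    Assumption: gates and inverters are weighted equally (one unit each).
    """
    serial_total = sum(serial_area())
    for n in range(1, limit + 1):
        if serial_total < sum(bit_sliced_area(n)):
            return n
    return None


if __name__ == "__main__":
    print("serial:", serial_area())
    for n in (2, 3, 4, 8):
        print(n, "bits, bit-sliced:", bit_sliced_area(n))
    print("crossover N:", smallest_n_where_serial_is_smaller())
```

Under this equal-weight assumption the crossover lands at N = 4, which agrees with the slide.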

## Serial Designs are Slower than Bit-Sliced Designs

The tradeoff? Serial designs are slower than bit-sliced designs.

Why? There are three reasons:

1. All paths matter.
2. Selection logic and flip-flops add to delay.
3. Other logic may further reduce the speed of the common clock.

Let's look at each in more detail.

## All Paths Matter in a Serial Design

In an N-bit bit-sliced design,

- all external inputs appear at time 0,
- so only the slice-to-slice paths in the bit slice contribute to the multiplier on N, and
- other paths contribute only constant time to the overall delay in the design.

In a serial design, all paths matter:

- all input bits arrive in the cycle in which they are consumed, so
- long paths from any input can slow down the design overall.

## Flip-Flops and Selection Logic Add to Delay

Flip-flops take time

- to store values, and
- to produce values.

And the selection logic sits between the flip-flops and the bit-slice inputs.

The clock cycle must be long enough to account for all of these delays.

## Clock Speed is Determined by the Slowest Logic

The longest path through combinational logic determines the speed of the common clock.

In practice, engineers identify complex and/or important elements and

- work hard to make them fast, or
- split them into several cycles.

Even if a serial design's logic needs only 0.1 clock cycles, operating on N-bit operands still takes N clock cycles.
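To make the "N clock cycles" point concrete, here is a small behavioral sketch of a serial comparator in Python. Several details are assumptions made for illustration and are not taken from the slides: the result encoding (00 for A = B, 01 for A < B, 10 for A > B), feeding bits least significant first, a first-cycle signal called `f`, and selection logic of the form NOR(F, NOT Q). The bit slice is modeled by its behavior, not gate by gate.

```python
def nor(a, b):
    """2-input NOR on 0/1 values."""
    return int(not (a or b))


def bit_slice(a, b, c1, c0):
    """Behavioral model of one comparator bit slice.

    Assumed encoding: 00 means equal so far, 01 means A < B, 10 means A > B.
    Bits arrive least significant first, so a difference in the current
    (more significant) bit overrides the result passed in from earlier bits.
    """
    if a == b:
        return c1, c0
    return (1, 0) if a > b else (0, 1)


def serial_compare(a, b, n, stale_state=(1, 0)):
    """Compare two n-bit unsigned values one bit per clock cycle.

    Returns ((z1, z0), cycles). stale_state stands for whatever the two
    flip-flops held before the comparison started.
    """
    q1, q0 = stale_state
    cycles = 0
    for i in range(n):
        f = 1 if i == 0 else 0           # F = 1 only in the first cycle
        # Selection logic: NOR(F, NOT Q) is 0 when F = 1 and Q when F = 0.
        c1 = nor(f, 1 - q1)
        c0 = nor(f, 1 - q0)
        z1, z0 = bit_slice((a >> i) & 1, (b >> i) & 1, c1, c0)
        q1, q0 = z1, z0                  # flip-flops store on the rising edge
        cycles += 1
    return (q1, q0), cycles


if __name__ == "__main__":
    print(serial_compare(5, 3, 4))
    print(serial_compare(3, 5, 4))
    print(serial_compare(7, 7, 4))
```

However little logic each cycle contains, the loop body runs once per bit, so an N-bit comparison occupies N cycles of the common clock.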





## When Are Results Stored?

A rising clock edge arrives at t = 0 (time is measured in gate delays).



## Serial Design is At Least 5.5x Slower

To handle N-bit operands, a bit-sliced design requires 2N + 1 gate delays.

For a serial design,

- the clock cycle must be at least 11 gate delays, and
- we must execute for N cycles, so
- N-bit operands require at least 11N gate delays.

Comparing 11N with 2N + 1, the serial design is roughly 5.5x slower for large N. (And may be even slower!)

## Bit-Sliced and Serial Designs are Extrema

Both designs are simple.

- Serial designs are relatively small, but slow.
- Bit-sliced designs are fast, but large.

But we can build anything in between:

- 2 bit slices per cycle,
- 3 bit slices per cycle,
- and so forth.

And/or optimize more than one bit slice (increase complexity).
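The delay comparison is simple arithmetic, sketched below in Python. The function names are hypothetical, and the 11-gate-delay cycle is the lower bound quoted on the slide, so the computed slowdown is a lower bound as well. With exactly 11 gate delays per cycle, the ratio 11N / (2N + 1) approaches 5.5 from below as N grows.

```python
def bit_sliced_delay(n):
    """Gate delays for an n-bit bit-sliced comparator (2N + 1)."""
    return 2 * n + 1


def serial_delay(n, cycle_gate_delays=11):
    """Gate delays for an n-bit serial comparison: N cycles of at least 11 each."""
    return cycle_gate_delays * n


def slowdown(n, cycle_gate_delays=11):
    """Ratio of serial delay to bit-sliced delay for n-bit operands."""
    return serial_delay(n, cycle_gate_delays) / bit_sliced_delay(n)


if __name__ == "__main__":
    for n in (4, 8, 16, 32, 64):
        print(n, bit_sliced_delay(n), serial_delay(n), round(slowdown(n), 3))
```

A longer clock cycle, for example one stretched by other logic sharing the common clock, can be modeled by passing a larger `cycle_gate_delays`.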

## An Example of Partial Serialization in Practice

In one generation of Intel processors,

- the designers included **16-bit adders**
- clocked at twice the main clock speed (6 GHz instead of 3 GHz).

These adders could be used to:

- perform a single 32-bit add (two cycles at 6 GHz), or
- perform two 16-bit adds for multimedia codes.
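The idea of building a 32-bit add from two passes through a 16-bit adder can be sketched as follows. This is only an illustration of the technique and not a description of the actual Intel hardware: the function names are hypothetical, and the sketch assumes the carry out of the low half is held between the two passes and fed into the high half.

```python
MASK16 = 0xFFFF


def add16(a, b, carry_in=0):
    """Model a 16-bit adder: return (16-bit sum, carry out)."""
    total = (a & MASK16) + (b & MASK16) + carry_in
    return total & MASK16, total >> 16


def add32_two_passes(a, b):
    """Add two 32-bit values using the 16-bit adder twice.

    Pass 1 adds the low halves. Pass 2 adds the high halves plus the
    carry saved from pass 1 (assumed to be stored between passes).
    Returns (32-bit sum, carry out).
    """
    low, carry = add16(a, b, 0)
    high, carry_out = add16(a >> 16, b >> 16, carry)
    return (high << 16) | low, carry_out


if __name__ == "__main__":
    print(hex(add32_two_passes(0x0001FFFF, 0x00000001)[0]))
    print(add32_two_passes(0xFFFFFFFF, 0x00000001))
```

Calling `add16` twice with independent operands and `carry_in=0` corresponds to the other mode on the slide, two separate 16-bit adds.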
