<div><div class='wrapper'>
<p><a name='1'></a></p>
<h1>CS 150 - Digital Design</h1>
<h2>August 23, 2012</h2>
<p>30 lab stations. Initially, individual, but later will partner up. Can
admit up to 60 people -- limit. Waitlist + enrolled is a little over
that. Lab lecture that many people miss: Friday 2-3. Specific lab sections
(that you're in). Go to the assigned section at least for the first several
weeks.</p>
<p>This is a lab-intensive course. You will get to know 125 Cory very
well. Food and drink on round tables only. Very reasonable policy.</p>
<p>As mentioned: first few are individual labs, after that you'll pair up for
the projects. The right strategy is to work really hard on the first few
labs so you look good and get a good partner.</p>
<p>The book is Harris & Harris, Digital Design and Computer Architecture.</p>
<p>Reading: (skim chapter 1, read section 5.6.2 on FPGAs) -- H&H, start to
look at Ch. 5 of the Virtex User's Guide.</p>
<p>H&H Ch. 2 is on combinational circuits. Assuming you took 61C, not doing
proofs of equivalence, etc.</p>
<p>Ch. 3 is sequential logic. Combinational is history-agnostic; sequential
allows us to store state (dynamical time-variant system).</p>
<p>With memory and a NAND gate, you can make everything.</p>
<p>Chapter 4 is HDLs. Probably good to flip through for now. We're going to
use Verilog this semester. Book gives comparisons between Verilog and VHDL.</p>
<p>First lab next week, you will be writing simple Verilog code to implement
simple boolean functions. Chapter 5 is building blocks like ALUs. Chapter 6 is
architecture. Chapter 7 is microarchitecture: why does it work, and how do you make
pipelined processors. May find there's actually code useful to final
project. Chapter 8 is on memory.</p>
<p>Would suggest that you read the book sooner rather than later. Can sit down
in first couple of weeks and read entire thing through.</p>
<p>Lecture notes: will be using the whiteboard. If you want lecture notes, go
to the web. Tons of resources out there. If there's something particular
about the thing Kris says, use <em>Piazza</em>. Probably used several times by
now, so not an issue.</p>
<p>Cheating vs. collaboration: link on website that points to Kris's version
of a cheating policy.</p>
<p>Grading! There will be homeworks, and there will probably be homework
quizzes (a handful, so probably 10 + 5%). There will be a midterm at least
(possibly two), so that's like 15%. Labs and project are like 10 and 30%,
and the final is 30%.</p>
<p>Couple things to note: lab is very important, because this is a lab
course. If you take the final and the midterm and the quizzes into account,
that's 50% of your grade.</p>
<p>Lab lecture in this room (306 Soda, F 2-3p). Will probably have five weeks
of lab lecture. Section's 3-4. Starting tomorrow.</p>
<p>Office hours to be posted -- as soon as website is up. Hopefully by
tomorrow morning.</p>
<h2>King Silicon</h2>
<p>FinFETs (from Berkeley, in use by Intel). What can you do with 22nm tech?
Logic? You get something more than <mathjax>$10^6$</mathjax> digital gates per <mathjax>$mm^2$</mathjax>. SRAM,
you get something like <mathjax>$10 Mb/mm^2$</mathjax>, Flash and DRAM, you get something like
<mathjax>$10 MB/mm^2$</mathjax>. You want to put your MIPS processor on there, or a 32-bit ARM
Cortex? A small but efficient machine? On the order of <mathjax>$10^5$</mathjax> gates, so
about <mathjax>$0.1 mm^2$</mathjax>. Don't need a whole lot of RAM and flash for your
program. Maybe a megabit of RAM, a megabyte of flash, and that adds up to
<mathjax>$0.3 mm^2$</mathjax>. Even taking into account cost of packaging and testing, you're
making a chip for a few pennies.</p>
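<p>As a rough check of that budget, using only the order-of-magnitude densities quoted above (lecture estimates, not datasheet values):</p>
<p><mathjax>$$\begin{aligned}
A_{\text{logic}} &= \frac{10^5 \text{ gates}}{10^6 \text{ gates}/mm^2} = 0.1\ mm^2 \\
A_{\text{SRAM}} &= \frac{1\ Mb}{10\ Mb/mm^2} = 0.1\ mm^2 \\
A_{\text{flash}} &= \frac{1\ MB}{10\ MB/mm^2} = 0.1\ mm^2 \\
A_{\text{total}} &\approx 0.3\ mm^2
\end{aligned}$$</mathjax></p>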
<p>Think of the cell phone: processor surrounded by a whole bunch of
peripherals. I/O devices (speakers, screens, LEDs, buzzer; microphone,
keypad, buttons, N-megapixel camera, 3-axis accelerometer, 3-axis
gyroscope, 3-axis magnetometer, touchscreen, etc), networking devices
(cell, Wi-Fi, Bluetooth, etc). Cool thing here is that you can
get all of these sensors in a little chip package. The
microprocessor in general will not want a direct interface. A whole cloud of
"glue logic" that frees the processor from having to deal with
idiosyncrasies of these things. Lots of different interfaces that you have
to deal with. Another way of looking at this: microprocessor at the core,
glue logic around the outside of that, that is talking to the analog
circuitry, which talks to the actual transducers (something has to do
conversions between different energy domains).</p>
<p>Another way of looking at this is that you have this narrow waist of
microprocessors, which connects all of this stuff. The real reason we do
this is to get up to software. One goal of this class is to make you
understand the tradeoffs between HW and SW. HW is faster, lower power,
cheaper, more accurate (along several axes, e.g. timing); SW is more flexible. If we
knew everything people wanted to do, we'd put it in HW. Everything is
better in HW, except that once you put it in HW, it's fixed. In general, you've
got a bunch more people working in SW than in HW. This class is nice in
that it connects these two worlds.</p>
<p>If you can cross that bridge and learn to handle a software
problem at the hardware design stage, or solve a HW bug through software, you
can be the magician.</p>
<p>What we're going to do this semester in the project is similar to previous
projects in that we'll have a MIPS processor. Looks like a 3-stage pipeline
design, and we'll have you do a bunch of hardware interfaces (going across
the HW/SW boundary, not into analog obviously). We have, for example,
video; we might do audio out. We'll do a keyboard for sure and a
timer. All of this will end up being
memory-mapped I/O so that software on the processor can get to it, and it'll
also have interrupts (exceptions). There are not that many people who understand
interrupts and can design that interface well so that SW people are happy
with it.</p>
<p>You will <em>not</em> be wiring up chips on breadboards (or protoboards), the way
we used to in this class. You'll be writing in Verilog. You'll basically be
using a text editor to write Verilog, which is an HDL. There's a couple of
forms: one is a structural one, where you actually specify the nodes. First
lab you'll do that, but afterwards, you'll be working at a behavioral
level. You'll let the synthesis engine figure out how to take that
high-level description and turn it into the right stuff in the target
technology. Has to do a mapping function, and eventually a place & route.</p>
<p>Lot of logic. Whole bunch of underlying techs you might map to, and in the
end, you might go to an IC where there's some cell library that the
particular foundry gives you. Very different than if you're mapping to a
FPGA, which is what you'll be doing this semester. Job of the synthesis
tool to turn text into right set of circuits, and that goes into simulation
engine(s), and it lets you go around one of these loops in minutes for
small designs, hours for larger designs, and iterate on your design and
make sure it's right.</p>
<p>Some of you have used LTSpice and are used to using drawing to get a
schematic, and that's another way to get into this kind of system as well,
but that's structural. A big part of this course, unfortunately, is
learning and understanding how these tools work: how I go through the
simulation and synthesis process.</p>
<p>The better you get at navigating the software, the better a digital
designer you'll be. Painful truth of it all. Reality is that this is
exactly the way it works in industry. Nature of the IC CAD world. Something
like a $10B/yr industry. Whole lot better than plugging into a board.</p>
<p>FPGA board: fast and cheap upfront, but expensive per part. Other part of
spectrum is to go with an IC or application-specific integrated circuit
(ASIC), which is slow and costly upfront, but cheap per unit. Something in
between: use an FPGA + commercial off-the-shelf chips, and a custom
PCB. Still expensive per part (less so), but it's pretty fast.</p>
<p>FPGA? Field-programmable gate array. The core of an FPGA is the
configurable logic block. The whole idea behind a CLB is that you have the
dataplane, where you have a bunch of digital inputs to the box, and some
number of outputs in the data plane, and there's a separate control plane
(configuration) that's often loaded from a ROM or flash chip externally
when the chip boots up. Depending on what you put in, it can look
different. Fast, since it's implemented at the HW level.</p>
<p>If you take a bunch of CLBs and put them in an array, and you put a bunch
of configurable wiring through that array, you have an FPGA.</p>
<p>If we took everything we put into this FPGA and went through the process
of turning it into a single custom chip, it'd be less than a
square millimeter of 22nm silicon.</p>
<p>Talk about how FPGAs make it easy to connect external devices to a
microprocessor.</p>
<p>Course material: we can start from systems perspective: systems are
composed of datapath and control (FSM), or we can start from the very
bottom: transistors compose gates, which get turned into registers and
combinational logic, which gets turned into state and next state logic,
which make up the control. Also, storage and math/ALU (from registers and
combinational logic) makes up the datapath.</p>
<p><a name='2'></a></p>
<h1>CS 150 Lab Lecture 0</h1>
<h2>August 24, 2012</h2>
<p>Note: please make sure to finish labs 2 and 4, since those will be going
into your final project. Labs will run first 6 weeks, after which we will
be starting the final project.</p>
<p>Large design checkpoint, group sizes < 3.</p>
<p>stuff.</p>
<p><a name='3'></a></p>
<h1>CS 150: Digital Design & Computer Architecture</h1>
<h2>August 28, 2012</h2>
<h2>Admin</h2>
<p>Lab: cardkey, working on that. Not a problem yet. Labs: T 5:30-8:30, W 5-8,
θ 5-8. Discussion section: looking for a room. Office hours online. θ
11-12, F 10:30-11:30. In 512 Cory.</p>
<p>Reading: Ch. 4 (HDL). This week: through 4.3 (section that talks about
structural Verilog). For next week: the rest. It is Verilog only; we're not
going to need VHDL. If you get an interview question in VHDL, answer it in
Verilog.</p>
<h2>Taxonomy</h2>
<p>You've got HDLs, and there are really two flavors: VHDL and
Verilog. Inside of Verilog you've got structural (from gates) and
behavioral (say output is equal to a & b).</p>
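<p>A minimal sketch of the two styles for the same function. The module names are made up for illustration; both describe a two-input AND.</p>
<pre><code>// Structural: instantiate a gate primitive and wire it up explicitly.
module and2_structural (input a, input b, output y);
  and g0 (y, a, b);        // built-in primitive: output first, then inputs
endmodule

// Behavioral: say what the output equals and let synthesis pick the gates.
module and2_behavioral (input a, input b, output y);
  assign y = a & b;
endmodule
</code></pre>
<p>Both should synthesize to the same logic; the difference is only in how much of the structure you spell out yourself.</p>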
<h2>Abstraction</h2>
<p>Real signals are continuous in time and magnitude. Before you do anything
with signals, these days, you'll discretize in time and magnitude --
generally close enough to continuous for practical purposes. If it's a
serial line, there's some dividing line in the middle, and the HW has to
make a decision. Regular time interval called CLK; two values of
magnitude is called binary.</p>
<h2>Hierarchy</h2>
<p>Compose bigger blocks from smaller blocks. Principle of reuse -- modularity
based on abstraction is how we get things done (Liskov). Reuse tested
modules. Very important design habit to get into. Both partners work on and
define interface specification. Layering. Expose inputs and outputs and
behavior. Define spec, then divide labor.</p>
<p>One partner implements module, one partner implements test harness.</p>
<p>Regularity: because transistors are cheap, and design time is expensive,
sometimes you build smaller (simpler) blocks out of tested bigger
blocks. Key pieces of what we want to do with our digital abstraction.</p>
<p>Abstraction is not reality. Simulation: Intel FDIV bug in the original
Pentium. Voltage sag because of relatively high wire resistance.</p>
<p>Lab 0: our abstraction is structural Verilog. There are tons of online
tutorials on Verilog; Ch 4.3 in H&H is a good reference on that; your TAs
are a good reference. Pister's not a good reference on the syntax. You're
allowed to drop a small number of different components on your circuit
board and wire them up. If you want to make some circuit, you can.</p>
<p>Powers of two!</p>
<p>FDIV bug: from EE dept at UCLA: outraged that they had not done exhaustive
testing.</p>
<p>note: <mathjax>$1 yr \approx \pi \cdot 10^7 s$</mathjax>. With this approximation, <mathjax>$\pi$</mathjax> and
<mathjax>$2$</mathjax> are about the same. Combinatorial problem.</p>
<p>Combinational logic vs. sequential logic. Combinational logic: outputs are
a function of current inputs, potentially after some delay (memoryless),
versus sequential, where output can be a function of previous inputs.</p>
<p>Combinational circuits have no loops (no feedback), whereas circuits with
memory have feedback. Classic: SR latch (2 NOR gates hooked to each other).</p>
<p>So let's look at the high-level top-down big picture that we drew before:
system design comes from a combination of datapath and control (FSM). On
the midterm (or every midterm Pister's given for this course), there's
going to be a problem about SRAM, and you're going to have to design a
simple system with that SRAM.</p>
<p>e.g. Given a 64k x 16 SRAM, design a HW solution to find the min and
max two's complement numbers in that SRAM.</p>
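<p>One possible shape for such a solution, as a sketch rather than the expected exam answer. It assumes an SRAM with a combinational (asynchronous) read, so the word at <code>addr</code> is valid before the next clock edge; the module and port names are hypothetical. The datapath is an address counter plus two compare-and-update registers, and the control is a single <code>done</code> flag.</p>
<pre><code>module sram_minmax (
  input                    clk,
  input                    rst,      // synchronous reset; restarts the scan
  output reg        [15:0] addr,     // address presented to the SRAM
  input             [15:0] data,     // assumed: word at addr, valid before the next edge
  output reg signed [15:0] min_val,
  output reg signed [15:0] max_val,
  output reg               done
);
  wire signed [15:0] d = data;       // treat the word as two's complement

  always @(posedge clk) begin
    if (rst) begin
      addr    <= 16'd0;
      min_val <= 16'sh7FFF;          // largest value, so any word can replace it
      max_val <= 16'sh8000;          // smallest value
      done    <= 1'b0;
    end else if (!done) begin
      if (d < min_val) min_val <= d; // signed comparisons
      if (d > max_val) max_val <= d;
      if (addr == 16'hFFFF) done <= 1'b1;   // last of the 64k words
      else                  addr <= addr + 16'd1;
    end
  end
endmodule
</code></pre>
<p>Under that read-timing assumption the scan takes one clock per word, so 65536 cycles after reset is released. A synchronous SRAM would need the compare delayed by a cycle.</p>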
<p>Things you need to know about transistors for this class: you already know
them.</p>
<p>Wired OR (could be a wired AND, depending on how you look at it). Open
drain or open collector: this sort of thing.</p>
<p>Zero static power: CMOS inverter. No longer true; power per transistor is going down, but
the number of transistors goes up. Leakage current ends up being on the order of an amp. Also,
increasingly, gates leak.</p>
<p>Switching current: charging and discharging capacitors. The resulting dynamic power is <mathjax>$\alpha C V^2 f$</mathjax>.</p>
<p>Crowbar current: <mathjax>$I_{CC}$</mathjax>. While voltage is swinging from min to max or
vice versa, this current exists. All of these things come together to limit
performance of microprocessor.</p>
<p>A minterm is a product containing every input variable or its complement,
and a maxterm is a sum containing every input variable or its complement.</p>
<p><a name='4'></a></p>
<h1>CS 150: Digital Design & Computer Architecture</h1>
<h2>August 30, 2012</h2>
<h2>Introduction</h2>
<p>Finite state machines, namely in Verilog. If we have time, canonical forms
and more going from transistor to inverter to flip-flop.</p>
<p>So. The idea with lab1 is that you're going to be making a digital
lock. The real idea is that you're going to be learning behavioral Verilog.</p>
<h2>Finite State Machines</h2>
<p>Finite state machines are sequential circuits (as opposed to combinational
circuits), so they may depend on previous inputs. What we're interested in
are synchronous (clocked) sequential circuits. In a synchronous circuit,
the additional restriction is that you only care about in/out values on the
(almost always) positive-going edge of the clock.</p>
<p>Drawing with a caret on it refers to a circuit sensitive on a positive
clock edge. A bubble corresponds to the negative edge.</p>
<p>If we have a clock, some input D, and output Q, we have our standard
positive edge-triggered D flip-flop. The way we draw an unknown value, we
draw both values.</p>
<p>A register is one or more D flip-flops with a shared clock.</p>
<p>Blocking vs. non-blocking assignments.</p>
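<p>A small sketch tying the last two points together: two D flip-flops on a shared clock, written with non-blocking assignments. The module name is made up for illustration.</p>
<pre><code>// Two D flip-flops sharing one clock, connected as a 2-stage shift register.
module shift2 (input clk, input d, output reg q1, output reg q2);
  always @(posedge clk) begin
    q1 <= d;     // non-blocking: all right-hand sides are sampled first,
    q2 <= q1;    // so q2 receives the OLD q1 (two flip-flops in series)
  end
  // With blocking assignments (q1 = d; q2 = q1;) the second statement would
  // see the new q1, so q1 and q2 would both just be d delayed by one clock.
endmodule
</code></pre>
<p>The usual rule of thumb is non-blocking assignments in clocked blocks and blocking assignments in combinational ones.</p>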
<p>So. We have three parts to a Moore machine: state, output logic, and next-state
logic. A Mealy machine is not very different: its output logic also depends on the current inputs.</p>
<h2>Canonical forms</h2>
<p>Minterms and maxterms. Truth table is the most obvious way of writing down
a canonical form. And then there's minterm expansion and maxterm
expansion. Both are popular and useful for different reasons. A minterm is
a product term containing every input variable (or its complement), while a maxterm is a sum
term containing every input variable (or its complement). Think of a minterm as a way of
specifying one of the ones in the truth table. Construction looks like disjunctive
normal form.</p>
<p>Maxterms are just the opposite: you're trying to knock out rows of the
truth table. If you've got some function that's mostly ones, you have to
write a bunch of minterms to get it, as opposed to a handful of
maxterms. Construction looks like conjunctive normal form.</p>
<p>Both the maxterm and minterm expansions are unique.</p>
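<p>A worked example, chosen here only for illustration: two-input OR, whose truth table is mostly ones (only the row <mathjax>$A = 0, B = 0$</mathjax> is zero). The minterm expansion needs one product per one-row; the maxterm expansion needs one sum per zero-row.</p>
<p><mathjax>$$\begin{aligned}
f(A, B) &= \overline{A} B + A \overline{B} + A B && \text{(three minterms)} \\
f(A, B) &= (A + B) && \text{(one maxterm)}
\end{aligned}$$</mathjax></p>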
<p>de Morgan's law: "bubble-pushing".</p>
<p><a name='5'></a></p>
<h1>CS 150: Digital Design & Computer Architecture</h1>
<h2>September 4, 2012</h2>
<p>FSM design: Problem statement, block diagram (inputs and outputs), state
transition diagram (bubbles and arcs), state assignment, state transition
function, output function.</p>
<p>Classic example: string recognizer. Output a 1 every time you see a one
followed by two zeroes on the input.</p>
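<p>A sketch of that recognizer as a Moore machine, following the steps above. The state names and encoding are an arbitrary choice, and the module and port names are made up for illustration.</p>
<pre><code>module recognize_100 (input clk, input rst, input din, output out);
  localparam S0 = 2'd0,   // no useful history
             S1 = 2'd1,   // last bit was 1
             S2 = 2'd2,   // last bits were 1, 0
             S3 = 2'd3;   // last bits were 1, 0, 0
  reg [1:0] state, next_state;

  // State register
  always @(posedge clk)
    if (rst) state <= S0;
    else     state <= next_state;

  // Next-state logic
  always @(*)
    case (state)
      S0:      next_state = din ? S1 : S0;
      S1:      next_state = din ? S1 : S2;
      S2:      next_state = din ? S1 : S3;
      S3:      next_state = din ? S1 : S0;
      default: next_state = S0;
    endcase

  // Moore output: a function of the state only
  assign out = (state == S3);
endmodule
</code></pre>
<p>Because this is a Moore machine, <code>out</code> goes high in the cycle after the second zero is sampled.</p>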
<p>When talking about systems, there's typically datapath and the FSM
controller, and you've got stuff going between the two (and outside world
interacts with the control).</p>
<p>Just go through the steps.</p>
<h2>Low-level stuff</h2>
<p>Transistor turns into inverter, which turns into inverter with enable,
which turns into D flip-flop.</p>
<p>Last time: standard CMOS inverter. If you want to put an enable on it, there are
several ways to do that, e.g. feed its output through an NMOS transistor. When enable
is low, output is Z (high impedance) -- it's not trying to drag output
anywhere.</p>
<p>It turns out (beyond scope of things for this class) that NMOS are good at
pulling things down, but not so much at pulling things up. Turns out you
really want to add a PMOS transistor to pull up. We want this transistor to
be on when enable is 1, but it turns on when the gate is low. So we stick
an inverter on enable. Common; called a pass-gate (butterfly gate). Pass
gates are useful, but they're not actually driving anything. They just
allow current to flow through. If you put too many in series, though,
things slow down.</p>
<p>Pass-gates as controlled inverters; can be used to create a mux.</p>
<p>SR (set/reset) latch. Requires a NOR gate. Useful thing about NOR and NAND
is that with the right constant, they can make inverters. That is why they
are useful in making latches (if we cross-couple two of them).</p>
<p>If S = R = 0, then NOR gates turn into inverters, and this thing
effectively turns into a bistable storage element. If I feed in a 1, it'll
force that gate's output to be 0, which forces the other gate's output (and so the original gate's other input) to be a 1.</p>
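<p>A structural sketch of the cross-coupled pair, using Verilog's built-in <code>nor</code> primitive. The module and instance names are made up for illustration.</p>
<pre><code>module sr_latch (input s, input r, output q, output qn);
  nor g_q  (q,  r, qn);   // r = 1 forces q to 0
  nor g_qn (qn, s, q);    // s = 1 forces qn to 0, which lets q go to 1
endmodule
</code></pre>
<p>In simulation <code>q</code> and <code>qn</code> stay X until <code>s</code> or <code>r</code> has been pulsed once, since nothing else decides which of the two stable states the loop starts in.</p>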
<p>Clock systems. Suppose we take our SR latch and put an AND gate in front of
S and R with an enable line on it, we can now turn off this functionality,
and when enable is low, S and R can do whatever they want; they're not
going to affect the outputs of this thing. You can design synchronous
digital systems using simple level sensitive latches.</p>
<p>Contrast with ring oscillator (3-stage; simplest). That is unstable -- if I
put an odd number of inverters in series, there is no stable
configuration. Very useful for generating a clock. Standard crystal
oscillator: Pierce configuration.</p>
<p>Odd number of stages unstable, two stages stable; with more stages you have to
worry about other things.</p>
<p>Can be clocked, but you have to be careful. For example: if I wanted to
design a 1-bit counter, with a clocked system, we can consider a
level-sensitive D latch. This is what you get when you infer a latch in
Verilog: if you don't assign a value on every path, the synthesis tool will have it keep its previous
value. If you do that, it turns out that the latch probably gives you enough delay
that while the clock is high, the output will probably oscillate. So
that's bad; maybe we'll make the enable line (the clock line) really
narrow. And not surprisingly, that's called narrow clocking. For simple
systems, you can get away with that. Make the pulse about as wide as a single gate's or
a few gates' delay. However, it's ugly, so don't do that. Back in the day,
people did this, and their designs were simple enough that they could get away with
it.</p>
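<p>For reference, a sketch of how a latch typically gets inferred in Verilog, next to a version that stays combinational. The module names are made up for illustration.</p>
<pre><code>// Incomplete assignment: q is not assigned when en is 0, so it has to hold
// its value, and the synthesis tool infers a level-sensitive latch.
module d_latch (input en, input d, output reg q);
  always @(*)
    if (en) q = d;
endmodule

// Assigning q on every path gives plain combinational logic instead.
module gated_d (input en, input d, output reg q);
  always @(*)
    if (en) q = d;
    else    q = 1'b0;
endmodule
</code></pre>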
<p>What I really want is my state register and its output going through some
combinational logic block with some input to set my next state, and only
once per clock period does this thing happen. The problem here is in a
single clock period, I get a couple of iterations through the loop. So how
do I take my level-sensitive latch (I've turned it into a D latch with
enable, and that's my clock), and when clock is low, there's no problem. I
don't worry that my input's going to cruise through this thing; and when
it's high, I want my input (the D input) to remain constant.</p>
<p>As long as clock is high, I don't care; it'll maintain its state, since I'm
not looking at those inputs. There are a whole bunch of ways you can do it
(all of which get used somewhere), but the safest (and probably most
common) is to stick another latch (another clocked level-sensitive latch)
in front of it with an inverter. That's now my input.</p>
<p>So when the clock is low, the first one is enabled, and it's transparent
(it's a buffer). This is called an edge-triggered master/slave D flip-flop.</p>
<p>The modern way of implementing the basic D latch is by using feedback for
the storage element, and an input (both the feedback and the input are
driven by out-of-phase enable). My front end (the master) is driving the
signal line when the clock is low, and, conversely, when the clock is high,
the feedback inverter will be driving the line. Bistable storage element
maintaining its state, and input is disconnected.</p>
<p>Now, with the slave, same picture. Sensitive when clock is high, as opposed
to master, which is sensitive when clock is low. The idea is that the slave
prevents anything from getting into the storage element until it
stabilizes.</p>
<p>At the end of the day, the rising edge of the clock latched D to
Q. Variation that happened after, doesn't propagate to master; variation
that happened before, slave wasn't listening. So now we have flip-flops and
can make FSMs.</p>
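<p>A behavioral sketch of the master/slave arrangement built from two level-sensitive latches. This is a simulation model meant to mirror the description above, with made-up names; in a real design you would normally just write <code>always @(posedge clk) q <= d;</code> and let the tools pick the flip-flop.</p>
<pre><code>module ms_dff (input clk, input d, output reg q);
  reg m;                          // output of the master latch
  always @(*) if (!clk) m = d;    // master: transparent while clk is low
  always @(*) if (clk)  q = m;    // slave:  transparent while clk is high
endmodule
</code></pre>
<p>While <code>clk</code> is low the master follows <code>d</code> and the slave holds; on the rising edge the master freezes and the slave passes the frozen value to <code>q</code>.</p>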
<p><a name='6'></a></p>
<h1>CS 150: Digital Design & Computer Architecture</h1>
<h2>September 6, 2012</h2>
<h2>Verilog synthesis vs. testbenches (H&H 4.8)</h2>
<p>There's the subset of the language that's actually synthesizable, and then
there's more stuff just for the purpose of simulation and testing. Way
easier to debug via simulation.</p>
<p>Constructs that don't synthesize:</p>
<ul>
<li>#t: used for adding a simulation delay.</li>
<li>===, !==: 4-state comparisons (0, 1, X, Z).</li>
<li>System tasks (e.g. $display, prints to console in a C printf-style
format that's pretty easy to figure out; $monitor, prints to console
whenever its arguments change).</li>
</ul>
<p>In industry, it's not at all uncommon to write the spec, write the
testbench, then implement the module. Once the testbench is written, it
becomes the real spec. "You no longer have bugs in your code; you only have
bugs in your specification."</p>
<p>How do we build a clock in Verilog?</p>
<pre><code>parameter halfT = 5;      // half the clock period, in simulation time units
reg CLK;
initial CLK = 0;
always begin
  #(halfT) CLK = ~CLK;    // toggle every half period
end
</code></pre>
<p>H&H example 4.39 shows you how to read from a file.</p>
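<pre><code>// Testbench fragment: lives inside the testbench module, next to the clock above.
reg a, b, c, out_exp;
wire s;
reg [31:0] num;
reg [3:0] testvect[10000:0];      // each vector is {a, b, c, out_exp}
silly_function dut(.a(a), .b(b), .c(c), .s(s));
initial begin
  $readmemb("test.tv", testvect);
  num = 0;
end
always @(posedge CLK)
  #1 {a, b, c, out_exp} = testvect[num];
always @(negedge CLK) begin
  if (s !== out_exp) $display("error ... ");
  num <= num + 1;
  if (testvect[num + 1] === 4'bx) $finish;  // done when input is 4 bits of X (don't care)
end
</code></pre>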
<p>How big can you make shift registers? At some point, IBM decreed that every
register on every IBM chip would be part of one gigantic shift register. So
you've got your register file feeding your ALU; it's a 32 x 32
register. There's a test signal; when it's high, the entire thing becomes
one shift register. Why? Testing. This became the basis of JTAG. Another
thing: dynamic fault imaging. Take a chip and run it inside a scanning
electron microscope. Detects backscatter from electrons. Turns out that
metals absorb depending on what voltage they're at, and oxides absorb
depending on the voltage of the metal beneath them. So you get a different
intensity depending on the voltage.</p>
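<p>A sketch of the scan idea for one register: in normal mode it loads in parallel, and when the test signal is high it becomes one link in a long shift register. The module and port names are made up for illustration, and the width parameter is assumed to be at least 2.</p>
<pre><code>module scan_reg #(parameter N = 32) (
  input              clk,
  input              test,      // 1: shift mode, 0: normal operation
  input              scan_in,   // serial input from the previous register in the chain
  input      [N-1:0] d,         // normal parallel input
  output reg [N-1:0] q,
  output             scan_out   // serial output to the next register in the chain
);
  always @(posedge clk)
    if (test) q <= {q[N-2:0], scan_in};   // shift one bit toward the MSB
    else      q <= d;

  assign scan_out = q[N-1];
endmodule
</code></pre>
<p>Chaining <code>scan_out</code> of one register into <code>scan_in</code> of the next lets a tester shift every stored bit out through a single pin, and shift a known state back in.</p>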
<p>We can also take these passgates and make variable interconnects. So if
I've got two wires that don't touch, I can put a passgate on there and call
that the connect input.</p>
<p>Last time we talked about MUXes. I can make a configurable MUX -- the MUX,
we did a two-to-one mux, and if I've got some input over here, I select
according to what I have as my select input.</p>
<p>Next time: more MIPS, memory.</p>
<p><a name='7'></a></p>
<h1>CS 150: Digital Design & Computer Architecture</h1>
<h2>September 11, 2012</h2>
<p>Changed office hours this week. CLBs, SAR, K-maps.</p>
<p>Last time: we went from transistors to inverters with enable to D-flipflops
to a shift register with some inputs and outputs, and, from there to the
idea that once you have that shift register, then you can hook that up with
an n-input mux and make an arbitrary function of n variables.</p>
<p>This gives me configurable logic and configurable interconnects, and
naturally I take the shift out of one and into another, and I've got an
FPGA.</p>
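<p>A sketch of that construction: a shift register holding the truth table, read out through a mux whose select lines are the logic inputs. The names are made up for illustration, and real FPGA LUTs differ in detail.</p>
<pre><code>module lut #(parameter N = 4) (
  input          cfg_clk,   // configuration clock
  input          cfg_en,    // shift configuration bits in while high
  input          cfg_in,    // serial configuration input
  output         cfg_out,   // serial output, feeds the next LUT's cfg_in
  input  [N-1:0] x,         // the n logic inputs
  output         y
);
  reg [(1 << N) - 1:0] truth;   // one stored bit per row of the truth table

  always @(posedge cfg_clk)
    if (cfg_en) truth <= {truth[(1 << N) - 2:0], cfg_in};

  assign cfg_out = truth[(1 << N) - 1];
  assign y       = truth[x];    // 2^N-to-1 mux: the inputs select a stored bit
endmodule
</code></pre>
<p>Shifting in a different bit pattern changes the function without changing the hardware, which is the sense in which the logic is configurable.</p>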
<p>The LUT is the basic building block: I get four of those per slice, and
some other stuff: includes fast ripple carry logic for adders, the ability
to take two 6s and put them together to form a 7-LUT. So: pretty flexible,
fair amount of logic, and that's a slice. One CLB is equal to two slices.</p>
<p>And I've got, what is that, 8640 CLBs per chip. Also: DSP48 blocks. 64 of
these, and each one is a little 48-bit configurable ALU. So that gives you
something like 70000 (6-LUTs + D-ff): 8640 CLBs x 2 slices x 4 LUTs = 69120. So that's what you've got, what we
work with.</p>
<p>Now, let's talk about a successive approximation register analog to digital
converter. A very popular device. Link to a chip that implemented the
digital piece of this thirty years ago. Why are we looking at this? It's
nice; it's an example of a "mixed-signal" (i.e. analog mixed with digital
in the same block, where you have a good amount of both) system.</p>
<p>It turns out that analog designers need to be good digital designers these
days. I was doing some consulting for a company recently. They had
brilliant analog designers, but they had to do some digital blocks on their
chips.</p>
<p>"Real world" interfaces. Has some number of output bits that go into the
DAC; the DAC's output is simply "linear". You trust the analog designer to
give you this piece, along with the comparator and the sample-and-hold circuit
(with its sample input and your analog input voltage). So real quick
we'll look at what's in those blocks (even though this isn't 150
material). S/H: simplest example is just a transistor. Maybe it's a
butterfly gate; typically, there's some storage capacitor on the outside so
that if you've got your input voltage; when it goes low, that is held on
there.</p>
<p>Maybe there's some buffer amplifier (little CMOS opamp so it can drive nice
loads); capacitive input, so signal will stay there for a long time. Not
150 material.</p>
<p>The DAC: a simple way of making this is to generate a reference voltage
(diode-connected PMOS with voltage division, say), which you mirror, tied
together with a switch, and all of these share the same gate. Comparator's
a little more subtle. Maybe when we talk about SRAMs and DRAMs.</p>
<p>Anyway. So. Now we have the ability to generate an analog voltage under
digital control. We sample that input and are going to use that
signal. This tells us whether the output of the DAC is too big. That
together is called a SAR.</p>
<p>So what does that thing do? There's a very simple (dumb) SAR: a
counter. From reset, its digital output increases linearly in time; at some
point it crosses the analog <mathjax>$V_{in}$</mathjax>, and at that point, you stop. But
that's not such a great thing to do: for 10 bits, it takes between 1 and 1024 cycles to get the
result. The better way is to do a binary search, which takes one cycle per bit. Fun to do with
dictionaries and kids. Also works here. FSM: go bit-by-bit, starting with
most significant bit.</p>
<p>Better solution (instead of using oversized tree -- better in the sense of
less logic required): use a shift register (and compute all bits
sequentially). Or counter going into a decoder; sixteen outputs of which I
only need 10.</p>
<p>Next piece: another common challenge and where a lot of mistakes get made:
analog stuff does not simulate as well. While you're developing and
debugging, you have to come up with some way of simulating.</p>
<p>Good news: you can often go in and fix things like that. Sort of an aside
(although it sometimes shows up in my exams), once you put these
transistors down, and then you've got all these layers of metal on
top. Turns out that you can actually put this thing in a scanning electron
microscope and use undedicated logic and go in with a FIB (focused ion
beam) and fix problems. "Metal spin".</p>
<p>Back to chapter two: basic gates again. de Morgan's law: <mathjax>$\overline{AB} =
\bar{A} + \bar{B}$</mathjax>; in general, <mathjax>$\overline{\prod A_i} = \sum \bar{A_i}$</mathjax>. Similarly,
<mathjax>$\overline{\sum A_i} = \prod \bar{A_i}$</mathjax>. Suppose you have a two-level
NAND/NAND gate: that becomes a sum of products (SoP). Similarly, NOR/NOR is
equivalent to a product of sums (PoS).</p>
<p>Now, if I do NOR/NOR/INV, this is a sum of products, but the inputs are
inverted. This is an important one. This particular one is useful because
of the way you can design logic. The way we used to design logic a few
decades ago (and the way we might go back in the future) was with big long
strings of NOR gates.</p>
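<p>These identities are small enough to check by brute force. The sketch
below tries every input combination for a pair of two-input first-level
gates; the helper names <code>nand</code>, <code>nor</code> and <code>check_all</code> are illustrative only.</p>

```python
from itertools import product


def nand(*xs):
    return int(not all(xs))


def nor(*xs):
    return int(not any(xs))


def check_all():
    for a, b, c, d in product((0, 1), repeat=4):
        # de Morgan: NOT(AB) = NOT A + NOT B, and NOT(A+B) = NOT A * NOT B
        assert nand(a, b) == ((1 - a) | (1 - b))
        assert nor(a, b) == ((1 - a) & (1 - b))
        # NAND/NAND is a sum of products
        assert nand(nand(a, b), nand(c, d)) == ((a & b) | (c & d))
        # NOR/NOR is a product of sums
        assert nor(nor(a, b), nor(c, d)) == ((a | b) & (c | d))
        # NOR/NOR/INV is a sum of products of the inverted inputs
        inverted = 1 - nor(nor(a, b), nor(c, d))
        assert inverted == (((1 - a) & (1 - b)) | ((1 - c) & (1 - d)))
    return True
```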
<p>So if I go back to our picture of a common source amplifier (erm,
inverter), and we stick a bunch of other transistors in parallel, then we
have a NOR gate. Remember: MOS devices have parasitic capacitance.</p>
<p>Consider another configuration. Suppose we invert our initial input and
connect to both of these a circled wire, which can be any of the following:
fuse / anti-fuse, mask-programmable (when you make the mask, decision to
add a contact), transistor with flipflop (part of shift register, e.g.), an
extra gate (double-gate transistors).</p>
<p>So now if I chain a bunch more of these together (all NOR'd together), then
I can program many functions. In particular, it could just be an inverter.</p>
<p>I can put a bunch of these together, and I can combine the function outputs
with another set of NORs, invert it all at the end, and I end up with
NOR/NOR/INV.</p>
<p>These guys are called PLAs (programmable logic arrays), and you can still
buy them, and they're still useful. Not uncommon to have a couple of
flipflops on them. Will have a homework assignment where you use a 30 cent
PLA and design something. Quick and dirty way of getting something for
cheap.</p>
<p>Not done anymore because slow (huge capacitances), but may come back
because of carbon nanotubes. Prof. Ali Javey managed to make a nanotube
transistor with a source, drain, and gate; he got transport, the highest
current density per square micron of cross-section ever, and showed that
the physics all worked, and this thing is 1nm around. What he's doing now
is working with nanowires and showing that you can grow these things on a
roller and roll them onto a surface (like a plastic surface), putting down
layers of nanowires in alternating directions.</p>
<p>You can imagine (we're a ways away from this) where you get a transistor at
each of these locations, and you've got some CMOS on this side generating
signals, and CMOS on the output taking the output (made with big fat
gigantic 14nm transistors), and you can put <mathjax>$10^5$</mathjax> transistors per square
micron (not pushing it, since density can get up to <mathjax>$10^6$</mathjax>). End of road
for CMOS doesn't mean you ditch CMOS. Imagine making this into a jungle
gym; then you're talking about <mathjax>$10^8$</mathjax> carbon nanotubes per cubic micron,
etc. The fact that we can make these long thin transistors on their lengths
means that this might come back into fashion.</p>
<p><a name='8'></a></p>
<h1>CS 150: Digital Design & Computer Architecture</h1>
<h2>September 13, 2012</h2>
<p>Questions of the form 16x16 SRAM, design a circuit that will find the
smallest positive integer or biggest even number, or count number of times
17 appears in memory, etc. Kris loves these questions where you figure out
design (remember: separate datapath and control, come up with it on your
own) -- will probably show up on both midterm and final.</p>
<p>Office hours moved.</p>
<p>So... last time, we were talking about PLAs (programmable logic arrays) and stuff
(NOR/NOR equivalent to AND/OR). You'll hear people talking about AND plane
and OR plane, even though they're both NORs. If you look at Fig 2.2.3,
they'll show the same regular and inverted signals, and they just draw this
as a line with an AND gate at the end. Pretty common way to draw this;
lines with OR gates.</p>
<p>Variant of PLA called a PAL -- subsets of "product" terms going to "OR"
gates.</p>
<p>Beginning of complex programmable logic devices (CPLDs, FPGAs). You can
still buy these registered PALs.</p>
<p>Why would you use this over a microprocessor? Faster. Niche. The "oh crap"
moment when you finish your board and you find that you left something
out.</p>
<p>I want to say a little about memory, because you'll be using block ram in
your lab next week. There's a ton of different variations of memory, but
they all have a couple of things in common: a decoder (address decoder)
where you take <mathjax>$n$</mathjax> input bits and turn them into <mathjax>$2^n$</mathjax> word lines in a
memory that has <mathjax>$2^n$</mathjax> words. Also have cell array. Going through cell array
you have some number of bit lines, we'll call this either <mathjax>$k$</mathjax> or <mathjax>$2k$</mathjax>,
depending on the memory. That goes into some amps / drivers, and then out
the other side you have <mathjax>$k$</mathjax> inputs and/or outputs. Sometimes shared
(depends whether or not there's output-enable). Write-enable,
output-enable, sometimes clock, sometimes d-in as well as d-out, sometimes
multiple d-outs (multiple address data pairs); whole bunch of variation in
how this happens. Conceptually, though all comes down to something that
looks like this.</p>
<p>So what's that decoder look like? Decoders are very popular circuits: they
generate all minterms of their input (gigantic products). Note that if you
invert all of the outputs, we get the maxterms (sums).</p>
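<p>As a rough illustration of "a decoder generates all minterms", here is a
behavioural sketch of an n-to-2^n address decoder. The function name
<code>decode</code> and the least-significant-bit-first ordering are choices made for
this example.</p>

```python
def decode(address, n):
    """n-to-2**n decoder: returns the word lines, exactly one of which is 1."""
    bits = [(address >> i) & 1 for i in range(n)]   # bit 0 is least significant
    lines = []
    for m in range(2 ** n):
        # Word line m is minterm m: the AND of every address bit,
        # taken true where m has a 1 and complemented where m has a 0.
        term = 1
        for i, b in enumerate(bits):
            want = (m >> i) & 1
            term &= b if want else 1 - b
        lines.append(term)
    return lines


def inverted(lines):
    """Inverting every output turns the minterms into maxterms."""
    return [1 - x for x in lines]
```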
<p>That was DRAM. Now, SRAM:</p>
<p>Still have word line going across; now I have a bit line and negated bit
line. Inside, I have two cross-coupled inverters (bistable storage
element). That's already four transistors in there (vs. 1 for DRAM), and I
still have to access it: an access transistor goes to each side, hooked up
to the word line, for six transistors total (6T). When I read this thing, I
put in an n-bit address, and the cell's transistors pull the bit lines. We
want these as small as possible for the bit density, so a sense amp is
needed. What you usually do is pre-charge <mathjax>$BL$</mathjax>, <mathjax>$\overline{BL}$</mathjax>. As soon as
you raise the word line for this particular row, what you find is that one
of them starts discharging, and the other stays constant. Analog sensing is
there so you can make a decision much much faster.</p>
<p>That's how reads work; writes are interesting. Suppose I have some
<mathjax>$D_{in}$</mathjax>, what do I do? I could put an output-enable so that when writing,
they don't send anything to the output, but that would increase size
significantly. So what do I do? I just make big burly inverters and drive
the lines. Big transistors down there overcome small transistors up there;
and they flip the bit. PMOS is also generally weaker than NMOS, etc. Just
overpower it. One of rare times that you have PMOS pulling up and NMOS
pulling down. (notion of "bigger": <mathjax>$W/L$</mathjax>).</p>
<p>Transistors leak. They can leak a substantial amount. By lowering voltage,
I reduce power. It turns out there's a nonlinear relationship here, and so
the transistors leak a lot less.</p>
<p>So that's SRAM. The other question? What about a register? What's the
difference between this and a register file? Comes back to what's in the
cell array. We talked that a register is a bunch of flipflops with a shared
clock and maybe a shared enable. Think of a register as having the common
word line, and you've got a D flipflop in there. There's some clock shared
across the entire array, and there's an enable on it and possibly an
output, depending on what kind of system you've got set up. We've got D-in,
D-out, and if I'm selecting this thing, presumably I want output-enable; if
I'm writing, I need to enable write-enable.</p>
<p>So. You clearly have the ability to make registers on chips, so you can
clearly do this on the FPGA. Turns out there's some SRAMs on there,
too. There's an external SRAM that we may end up using for the class
project, and there's a whole bunch of DDR DRAM on there as well.</p>
<h2>Canonical forms</h2>
<p>Truth tables, minterm / maxterm expansions. These we've seen.</p>
<p>If you have a function equal to the sum of minterms 1,3,5,6,7, we could
implement this with fewer gates by using the maxterm expansion.</p>
<p>"Minimum sum of products", "minimum product of sums".</p>
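<p>A worked version of the 1,3,5,6,7 example, assuming it is a function of
three variables with A as the most significant bit (the notes do not say
which ordering was used). The complement has only three rows (0, 2, 4), so
the maxterm expansion has three terms instead of five; the minimum
sum of products works out to C + AB.</p>

```python
from itertools import product

MINTERMS = {1, 3, 5, 6, 7}
MAXTERMS = set(range(8)) - MINTERMS      # rows where the function is 0


def index(a, b, c):
    return (a << 2) | (b << 1) | c       # assumption: a is the MSB


def sum_of_minterms(a, b, c):
    return int(index(a, b, c) in MINTERMS)


def product_of_maxterms(a, b, c):
    # Maxterm M_i is 0 only on row i, so the product is 0 exactly on MAXTERMS.
    result = 1
    for i in MAXTERMS:
        result &= int(index(a, b, c) != i)
    return result


def minimal_sop(a, b, c):
    return c | (a & b)                   # C + AB


def all_agree():
    return all(
        sum_of_minterms(a, b, c) == product_of_maxterms(a, b, c) == minimal_sop(a, b, c)
        for a, b, c in product((0, 1), repeat=3)
    )
```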
<h2>Karnaugh Maps</h2>
<p>Easy way to reduce to minimum sum of products or minimum product of
sums. (Section 2.7). Based on the combining theorem, which says that <mathjax>$XA +
X\bar{A} = X$</mathjax>. Ideally: every row should just have a single value
changing. So, I use Gray codes. (e.g. 00, 01, 11, 10). Graphical
representation!</p>
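<p>The Gray-code ordering used for K-map rows and columns can be generated
with the standard reflected-binary formula. This is a small sketch; the
function names are just for illustration.</p>

```python
def gray(n_bits):
    """Reflected binary Gray code: neighbours differ in exactly one bit."""
    return [i ^ (i >> 1) for i in range(2 ** n_bits)]


def labels(n_bits):
    return [format(g, "0{}b".format(n_bits)) for g in gray(n_bits)]


def one_bit_apart(codes):
    """True if every adjacent pair (including the wrap-around) differs in one bit."""
    pairs = zip(codes, codes[1:] + codes[:1])
    return all(bin(x ^ y).count("1") == 1 for x, y in pairs)
```

<p>The wrap-around check matters for K-maps, since the leftmost and
rightmost columns are also adjacent.</p>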
<p><a name='9'></a></p>
<h1>CS 150: Digital Design & Computer Architecture</h1>
<h2>September 18, 2012</h2>
<p>Lab this week you are learning about Chipscope. Chipscope is kinda like
what it sounds: it allows you to monitor things happening in the FPGA. One
of the interesting things about Chipscope is that it's an FSM monitoring
stuff in your FPGA; it also gets compiled down, and it changes the location
of everything that goes into your chip. It can actually make your bug go
away (e.g. timing bugs).</p>
<p>So. Counters. How do counters work? If I've got a 4-bit counter and I'm
counting from 0, what's going on here?</p>
<p>D-ff with an inverter and enable line? This is a T-ff (toggle
flipflop). That'll get me my first bit, but my second bit is slower. <mathjax>$Q_1$</mathjax>
wants to toggle only when <mathjax>$Q_0$</mathjax> is 1. With subsequent bits, they want to
toggle when all lower bits are 1.</p>
<p>Counter with en: enable is tied to the toggle of the first bit. Counter
with ld: four input bits, four output bits. Clock. Load. Then we're going
to want to do a counter with ld, en, rst. Put in logic, etc.</p>
<p>Quite common: ripple carry out (RCO), where we AND <mathjax>$Q[3:0]$</mathjax> and feed this
into the enable of <mathjax>$T_4$</mathjax>.</p>
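<p>A behavioural sketch of the toggle rule described above: each bit
toggles only when the counter is enabled and all lower bits are 1. The
names <code>tick</code> and <code>rco</code> are made up for this example, and the bit list is
stored least significant bit first.</p>

```python
def tick(q, enable=1):
    """One clock edge of a synchronous counter built from T flipflops."""
    nxt = list(q)
    toggle = enable                  # bit 0 toggles whenever the counter is enabled
    for i in range(len(q)):
        if toggle:
            nxt[i] = 1 - q[i]
        toggle = toggle & q[i]       # the next bit toggles only if all lower bits are 1
    return nxt


def rco(q):
    """Ripple carry out: AND of all bits, used as the enable of the next stage."""
    return int(all(q))


def value(q):
    return sum(bit << i for i, bit in enumerate(q))
```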
<p>Ring counter (shift register with one-hot output): if reset is low I just
shift this thing around and make a circular shift register. If high, I clear
the out bit.</p>
<p>Mobius counter: just a ring counter with a feedback inverter in it. Just
going to take whatever state in there, and after n clock ticks, it inverts
itself. So you have <mathjax>$n$</mathjax> flipflops, and you get <mathjax>$2n$</mathjax> states.</p>
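<p>The state counts can be confirmed with a quick simulation. This sketch
models both counters as lists of bits; <code>ring_step</code>, <code>mobius_step</code> and
<code>count_states</code> are illustrative names.</p>

```python
def ring_step(state):
    # Circular shift: the last flipflop feeds the first.
    return [state[-1]] + state[:-1]


def mobius_step(state):
    # Same shift, but the feedback path goes through an inverter.
    return [1 - state[-1]] + state[:-1]


def count_states(step, start):
    """Number of distinct states visited before the sequence repeats."""
    seen = []
    s = list(start)
    while s not in seen:
        seen.append(s)
        s = step(s)
    return len(seen)
```

<p>With four flipflops, the one-hot ring counter visits 4 states and the
Mobius counter starting from all zeros visits 8.</p>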
<p>And then you've got LFSRs (linear feedback shift registers). Given N
flipflops, we know that a straight up or down counter will give us <mathjax>$2^N$</mathjax>
states. Turns out that a well-chosen LFSR gives you almost that: <mathjax>$2^N - 1$</mathjax>
(every state except all zeros). So why do that instead of an up-counter?
This can give you a PRNG. Fun times with Galois fields.</p>
<p>Various uses, seeds, high enough periods (Mersenne twisters are higher).</p>
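<p>As a small example, here is a 4-bit LFSR sketch with XOR feedback taken
from the top two bits (the polynomial x^4 + x^3 + 1, a standard
maximal-length choice). The tap positions and function names are picked for
this illustration, not taken from the lecture.</p>

```python
def lfsr_step(state, n=4, taps=(3, 2)):
    """Shift left by one; the new low bit is the XOR of the tapped bits."""
    feedback = 0
    for t in taps:
        feedback ^= (state >> t) & 1
    return ((state << 1) | feedback) & ((1 << n) - 1)


def period(seed, n=4, taps=(3, 2)):
    """Number of steps before the register returns to the seed."""
    s = lfsr_step(seed, n, taps)
    count = 1
    while s != seed:
        s = lfsr_step(s, n, taps)
        count += 1
    return count
```

<p>Any nonzero seed cycles through all 15 nonzero states; the all-zeros
state maps to itself, which is why it is excluded.</p>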
<h2>RAM</h2>
<p>Remember, decoder, cell array, <mathjax>$2^n$</mathjax> rows, <mathjax>$2^n$</mathjax> word lines, some number of
bit lines coming out of that cell array for I/O with output-enable and
write-enable.</p>
<p>When output-enable is low, D goes to high-Z. At some point, some external
device starts driving some Din (not from memory). Then I can apply a write
pulse (write strobe), which causes our data to be written into the memory
at this address location. Whatever was driving it releases, so it goes back
to high-impedance, and if we turn output-enable on again, we'll see "Din" from
the cell array.</p>
<p>During the write pulse, we need Din stable and address stable. We have a
pulse because we don't want to break things. Bad things happen.</p>
<p>Notice: no clock anywhere. Your FPGA (in particular, the block ram on the
ML505) is a little different in that it has registered input (addr &
data). First off, very configurable. All sorts of ways you can set this up,
etc. Addr in particular goes into a register and comes out of there, and
then goes into a decoder before it goes into the cell array, and what comes
out of that cell array is a little bit different also in that there's a
data-in line that goes into a register and some data-out as well that's
separate and can be configured in a whole bunch of different ways so that
you can do a bunch of different things.</p>
<p>The important thing is that you can apply your address to those inputs, and
it doesn't show up until the rising edge of the clock. There's the option
of having either registered or non-registered output (non-registered for
this lab).</p>
<p>So now we've got an ALU and RAM. And so we can build some simple
datapaths. For sure you're going to see on the final (and most likely the
midterm) problems like "given a 16-bit ALU and a 1024x16 sync SRAM, design
a system to find the largest unsigned int in the SRAM."</p>
<p>Demonstration of clock cycles, etc. So what's our FSM look like? Either
LOAD or HOLD.</p>
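<p>One possible cycle-by-cycle model of that exam-style problem, assuming a
synchronous SRAM whose data shows up one clock after the address, an address
counter, and a "max" register that is either loaded or held. The structure
and the names <code>find_max</code>, <code>dout</code> and <code>best</code> are a sketch, not the official
solution.</p>

```python
def find_max(sram):
    """Returns (largest value, list of control actions, one per clock cycle)."""
    best = 0        # max register, cleared on reset (values are unsigned)
    dout = None     # SRAM output register: nothing has been read yet
    trace = []
    for cycle in range(len(sram) + 1):
        # Control: compare the word just read with the max register.
        # In hardware the ALU would do this by subtracting and checking the borrow.
        if dout is not None and dout > best:
            action = "LOAD"
        else:
            action = "HOLD"
        # Clock edge: update the registers.
        if action == "LOAD":
            best = dout
        # The address counter equals the cycle number; data arrives next cycle.
        dout = sram[cycle] if cycle < len(sram) else None
        trace.append(action)
    return best, trace
```

<p>Because of the registered read, the scan takes one more cycle than there
are words.</p>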
<p>On homework, did not say sync SRAM. Will probably change.</p>
<p><a name='10'></a></p>
<h1>CS 150: Digital Design & Computer Architecture</h1>
<h2>September 20, 2012</h2>
<p>Non-overlapping clocks. n-phase means that you've got n different outputs,
and at most one high at any time. Guaranteed dead time between when one
goes low and next goes high.</p>
<h2>K-maps</h2>
<p>Finding minimal sum-of-products and product-of-sums expressions for
functions. <strong>On-set</strong>: all the ones of a function; <strong>implicant</strong>: one or
more circled ones in the onset; a <strong>minterm</strong> is the smallest implicant you
can have, and they go up by powers of two in the number of things you can
have; a <strong>prime implicant</strong> can't be combined with another (by circling);
an <strong>essential prime implicant</strong> is a prime implicant that contains at
least one one not in any other prime implicant. A <strong>cover</strong> is any
collection of implicants that contains all of the ones in the on-set, and a
<strong>minimal cover</strong> is one made up of essential prime implicants and the
minimum number of implicants.</p>
<p>Hazards vs. glitches. Glitches are when timing issues result in dips (or
spikes) in the output; hazards are if they might happen. Mostly
irrelevant in synchronous logic, as long as everything settles before the
next clock edge.</p>
<h2>Project</h2>
<p>3-stage pipeline MIPS150 processor. Serial port, graphics accelerator. If
we look at the datapath elements, the storage elements, you've got your
program counter, your instruction memory, register file, and data
memory. Figure 7.1 from the book. If you mix that in with figure 8.28,
which talks about MMIO, that data memory, there's an address and data bus
that this is hooked up to, and if you want to talk to a serial port on a
MIPS processor (or an ARM processor, or something like that), you don't
address a particular port (not like x86). Most ports are
memory-mapped. Actually got a MMIO module that is also hooked up to the
address and data bus. For some range of addresses, it's the one that
handles reads and writes.</p>
<p>You've got a handful of different modules down here such as a UART receive
module and a UART transmit module. In your project, you'll have your
personal computer that has a serial port on it, and that will be hooked up
to your project, which contains the MIPS150 processor. Somehow, you've got
to be able to handle characters transmitted in each direction.</p>
<h2>UART</h2>
<p>Common ground, TX on one side connected to RX port on other side, and vice
versa. Whole bunch more in different connectors. Basic protocol is called
RS232, common (people often refer to it by connector name: DB9, rarely
DB25); fortunately, we've moved away from this world and use USB. We'll
talk about these other protocols later, some sync, some async. Workhorse
for long time, still all over the place.</p>
<p>You're going to build the UART receiver/transmitter and MMIO module that
interfaces them. See when something's coming in from software /
hardware. Going to start out with polling; we will implement interrupts
later on in the project (for timing and serial IO on the MIPS
processor). That's really the hardcore place where software and hardware
meet. People who understand how each interface works and how to use those
optimally together are valuable and rare people.</p>
<p>What you're doing in Lab 4, there's really two concepts of (1) how does
serial / UART work and (2) ready / valid handshake.</p>
<p>On the MIPS side, you've got some addresses. Anything that starts with FFFF
is part of the memory-mapped region. In particular, the first four are
mapped to the UART: they are RX control, RX data, TX control, and TX data.</p>
<p>When you want to send something out the UART, you write the byte -- there's
just one bit for the control and one byte for data.</p>
<p>Data goes into some FSM system, and you've got an RX shift register and a
TX shift register.</p>
<p>There's one other piece of this, which is that inside of here, the thing
interfacing to this IO-mapped module uses this ready bit. If you have two
modules, a source and a sink (diagram from the document), the source has
some data that it is sending out and tells the sink when the data is valid,
and the sink tells the source when it is ready. And there's a shared "clock"
(baud rate), and this is a synchronous interface.</p>
<ul>
<li>source presents data</li>
<li>source raises valid</li>
<li>when ready & valid on posedge clock, both sides know the transaction was
successful.</li>
</ul>
<p>Whatever order this happens in, source is responsible for making sure data
is valid.</p>
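<p>The handshake rule can be sketched as a per-cycle simulation: a transfer
happens only on a clock edge where both ready and valid are high. The
function name <code>simulate</code> and its arguments are hypothetical.</p>

```python
def simulate(data, sink_ready):
    """data: items the source wants to send, in order.
    sink_ready: the sink's ready bit for each clock cycle.
    Returns the (cycle, item) pairs for each successful transfer."""
    transfers = []
    i = 0
    for cycle, ready in enumerate(sink_ready):
        valid = i < len(data)          # source holds valid high while it has data
        if valid and ready:            # both high on the clock edge
            transfers.append((cycle, data[i]))
            i += 1                     # source may now present the next item
    return transfers
```

<p>Note that the source keeps presenting the same item until the sink
accepts it, which is the "source is responsible for valid data" rule.</p>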
<p>HDLC? Takes bytes and puts into packets, ACKs, etc.</p>
<p>Talk about quartz crystals, resonators. <mathjax>$\pi \cdot 10^7$</mathjax>.</p>
<p>So: before I let you go, parallel load, n bits in, serial out, etc.</p>
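<p>A sketch of the parallel-load, serial-out idea for a UART transmitter,
assuming the usual framing of one start bit (0), eight data bits sent least
significant bit first, and one stop bit (1). The function names are
illustrative.</p>

```python
def uart_frame(byte):
    """Parallel-load a 10-bit shift register, then shift out one bit per tick."""
    data_bits = [(byte >> i) & 1 for i in range(8)]   # LSB first
    shift_reg = [0] + data_bits + [1]                 # start, data, stop
    line = []
    while shift_reg:
        line.append(shift_reg.pop(0))                 # serial out
    return line


def uart_receive(line):
    """Serial in, parallel out: rebuild the byte from a 10-bit frame."""
    if line[0] != 0 or line[9] != 1:
        raise ValueError("bad start or stop bit")
    return sum(bit << i for i, bit in enumerate(line[1:9]))
```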
<p><a name='11'></a></p>
<h1>UART, MIPS and Timing</h1>
<h2>September 25, 2012</h2>
<p>Timing: motivation for next lecture (pipelining). Lot of online resources
(resources, period) on MIPS. Should have lived + breathed this thing during
61C. For sure, you've got your 61C lecture notes and CS150 lecture notes
(both from last semester). Also the green card (reference) and there's
obviously the book. Should have tons of material on the MIPS processor out
there.</p>
<p>So, from last time: we talked about a universal asynchronous receiver
transmitter. On your homework, I want you to draw a couple of boxes
(control and datapath; they exchange signals). Datapath is mostly shift
registers. May be transmitting and receiving at same time; one may be idle;
any mix. Some serial IO lines going to some other system not synchronized
with you. Talked about clock and how much clock accuracy you need. For
eight-bit data, the two clocks need to match to within a couple percent. In
years past, we've used N64 game controllers as input for the project. All
they had was an RC relaxation oscillator. Had same format: start bit, two
data bits, and stop bit. Data was sent Manchester-coded (0 -> 01; 1 -> 10).
In principle, I can have a 33% error, which is something I can do with an
RC oscillator.</p>
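<p>A back-of-the-envelope version of the "couple percent" figure, assuming
the receiver resynchronizes on the start edge and samples each bit at its
centre, so the accumulated drift at the last bit must stay under half a bit
time. This is a simplified model; <code>max_clock_mismatch</code> is a made-up helper.</p>

```python
def max_clock_mismatch(data_bits=8, start_bits=1, stop_bits=1):
    """Largest relative mismatch between the two clocks that still samples
    the final bit of the frame inside its bit period."""
    frame_bits = start_bits + data_bits + stop_bits
    last_sample = frame_bits - 0.5      # centre of the final bit, in bit times
    return 0.5 / last_sample            # drift budget: half a bit


def per_side_budget(data_bits=8):
    # If transmitter and receiver may each be off, split the budget in two.
    return max_clock_mismatch(data_bits) / 2
```

<p>For an eight-bit frame this gives a total budget of about 5%, or roughly
2.6% per side, before allowing for any other margin.</p>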
<p>Also part of the datapath, 8-bit data going in and out. Whatever, going to
be MIPS interface. Set of memory-mapped addresses on the MIPS, so you can
read/write on the serial port. Also some ready/valid stuff up
here. Parallel data to/from MIPS datapath.</p>
<p>MIPS: invented by our own Dave Patterson and John Hennessy from
Stanford. Started company, Kris saw business plan. Was confidential, now
probably safe to talk about. Started off and said they're going to end up
getting venture capital, and VCs going to take equity, which is going to
dilute their equity. Simple solution, don't take venture money. These guys
have seen enough of this. By the time they're all done, it would be awesome
if they each had 4% of the company. They set things up so that they started
at 4%. Were going to allocate 20% for all of the employees, series A going
to take half, series B, they'll give up a third, and C, 15%. Interesting
bit about MIPS that you didn't learn in 61C.</p>
<p>One of the resources, the green sheet, once you've got this thing, you know
a whole bunch about the processor. You know you've got a program counter
over here, and you've got a register file in here, and how big it
is. Obviously you've got an ALU and some data memory over here, and you
know the instruction format. You don't explicitly know that you've got a
separate instruction memory (that's a choice you get to make as an
implementor); you don't know how many cycles it'll be (or pipelined,
etc). People tend to have separate data and instruction memory for embedded
systems, and locally, it looks like separate memories (even on more
powerful systems).</p>
<p>We haven't talked yet about what a register file looks like inside. Not
absolute requirement about register file, but it would be nice if your
register file had two read and one write address.</p>
<p>We go from a D-ff, and we know that sticking an enable line on there lets
us turn this into a D-ff with enable. Then if I string 32 of these in
parallel, I now have a register (clocked), with a write-enable on it.</p>
<p>Not going to talk about ALU today: probably after midterm.</p>
<p>So now, I've got a set of 32 registers. Considerations of cost. Costs on
the order of a hundredth of a cent.</p>
<p>Now I've made my register file. How big is that logic? NAND gates to
implement a 5->32 bit decoder.</p>
<p>Asynchronous reads. At the rising edge of the clock, synchronous write.</p>
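<p>A behavioural sketch of such a register file: two asynchronous read
ports, one synchronous write port, and a 5-to-32 decode of the write address
into per-register enables. The class and method names are hypothetical, and
keeping register 0 at zero follows the MIPS convention for <code>$zero</code>.</p>

```python
class RegFile:
    def __init__(self, n_regs=32):
        self.regs = [0] * n_regs

    def read(self, ra1, ra2):
        """Asynchronous (combinational) read of both ports."""
        return self.regs[ra1], self.regs[ra2]

    def clock(self, we, wa, wd):
        """Rising clock edge: synchronous write through the address decoder."""
        # Decoder: one enable line per register, at most one of them high.
        enables = [bool(we) and wa == i for i in range(len(self.regs))]
        for i, en in enumerate(enables):
            if en and i != 0:                    # $zero is never written
                self.regs[i] = wd & 0xFFFFFFFF   # 32 flipflops per register
```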
<p>So, now we get back to MIPS review. The MIPS instructions: you've got
R/I/J-type instructions. All start with opcode (same length: 6 bits). Tiny
fraction of all 32-bit instructions.</p>
<p>More constraints as we get more stuff. If we then want to constrain that
this is a single-cycle processor, then you end up with a pretty clear
picture of what you want. PC doesn't need 32 bits (two LSBs are always 0);
can implement PC with a counter.</p>
<p>PC goes into instruction memory, and out comes my instruction. If, for
example, we want to execute <code>lw $s0, 12($s3)</code>, then we look at the green
card, and it tells us the RTL.</p>
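<p>For the load word above, the green card RTL is
<code>R[rt] = M[R[rs] + SignExtImm]</code>. The sketch below decodes the I-type fields
and applies that RTL to a toy register list and memory dictionary; the
helper names and the memory model are assumptions for illustration.</p>

```python
LW_OPCODE = 0x23
LW_S0_12_S3 = 0x8E70000C   # lw with rt = $s0 (16), rs = $s3 (19), imm = 12


def decode_itype(word):
    return {
        "opcode": (word >> 26) & 0x3F,
        "rs": (word >> 21) & 0x1F,
        "rt": (word >> 16) & 0x1F,
        "imm": word & 0xFFFF,
    }


def sign_extend16(imm):
    return imm - 0x10000 if imm & 0x8000 else imm


def execute_lw(word, regs, mem):
    """Apply R[rt] = M[R[rs] + SignExtImm]; returns the address used."""
    f = decode_itype(word)
    if f["opcode"] != LW_OPCODE:
        raise ValueError("not a lw instruction")
    addr = (regs[f["rs"]] + sign_extend16(f["imm"])) & 0xFFFFFFFF
    regs[f["rt"]] = mem[addr]
    return addr
```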
<p>Adding R-type to the I-type datapath adds three muxes. Not too bad.</p>
<p><a name='12'></a></p>
<h1>Pipelining</h1>
<h2>September 27, 2012</h2>
<p>Last time, I just mentioned in passing that we will always be reading
32-bit instruction words in this class, but ARM has both 32- and 16-bit
instruction sets. MicroMIPS does the same thing.</p>
<p>Optimized for size rather than speed; will run at 100 MHz (not very good
compared to desktop microprocessors made in the same process, which run in
the gigahertz range), but it burns 3 mW. <mathjax>$0.06\ \text{mm}^2$</mathjax>. Questions
about power monitor -- you've got a chip that's somehow hanging off of the
power plug and manages one way or the other to get a voltage and current
signal. You know the voltage is going to look like a sinusoid with an
amplitude of about 155 V.</p>
<p>Serial! Your serial line, the thing I want you to play around with is the
receiver. We give this to you in the lab, but the thing is I want you to
design the basic architecture.</p>
<p>Start, stop, some bits between. You've got a counter on here that's running
at 1024 ticks per bit of input. Eye diagrams.</p>
<p>Notion of factoring state machines. Or you can draw 10000 states if you
want.</p>
<p>Something about Kris + scanners, it always ends badly. Will be putting
lectures on the course website (and announce on Piazza). High-level, look
at pipelines.</p>
<p>MIPS pipeline</p>
<p>For sure, you should be reading 7.5, if you haven't already. H&H do a great
job. Slightly different way of looking at pipelines, which is probably
inferior, but it's different.</p>
<p>First off, suppose I've got something like my Golden Bear power monitor,
and <mathjax>$f = (A+B)C + D$</mathjax>. It's going to give me this ALU that does addition, ALU
that does multiplication, and then an ALU that does addition again, and
that will end up in my output register.</p>
<p>There is a critical path (how fast can I clock this thing?). For now,
assume "perfect" fast registers. This, however, is a bad assumption.</p>
<p>So let's talk about propagation delay in registers.</p>
<h2>Timing & Delay (H&H 3.5; Fig 3.35,36)</h2>
<p>Suppose I have a simple edge-triggered D flipflop, and these things come
with some specs on the input and output, and in particular, there is a
setup time (<mathjax>$t_{\mathrm{setup}}$</mathjax>) and a hold time (<mathjax>$t_{\mathrm{hold}}$</mathjax>).</p>
<p>On the FPGA, these are each like 0.4 ns, whereas in 22nm, these are more
like 10 ps.</p>
<p>And then the output is not going to change immediately (going to remain
constant for some period of time before it changes), <mathjax>$t_{ccq}$</mathjax> is the
minimum time for clock to contamination (change) in Q. And then there's a
maximum called <mathjax>$t_{pcq}$</mathjax>, the maximum (worst-case) for clock to stable
Q. Just parameters that you can't control (aside from choosing a different
flipflop).</p>
<p>So what do we want to do? We want to combine these flipflops through some
combinational logic with some propagation delay (<mathjax>$t_{pd}$</mathjax>) and see what our
constraints are going to be on the timing.</p>
<p>Once the output is stable (<mathjax>$t_{pcq}$</mathjax>), it has to go through my
combinational logic (<mathjax>$t_{pd}$</mathjax>), and then counting backwards, I've got
<mathjax>$t_{setup}$</mathjax>, and that overall has to be less than my cycle. Tells you how
complex logic can be, and how many stages of pipelines you need. Part of
the story of selling microprocessors was clock speed. Some of the people
who got bachelors in EE cared, but people only really bought the higher
clock speeds. So there'd be like 4 NAND gate delays, and that was it. One
of the reasons why Intel machines have such incredibly deep pipelines:
everything was cut into pieces so they could have these clock speeds.</p>
<p>So. <mathjax>$t_{pd}$</mathjax> on your Xilinx FPGA for block RAM, which you care about, is
something like 2 ns from clock to data. 32-bit adders are also on the order
of 2 ns. What you're likely to end up with is a 50 MHz part. I also have to
worry about fast combinational logic -- what happens if, right after the
rising edge, my new input contaminates and messes up this register before
the hold time is over? Therefore <mathjax>$t_{ccq} + t_{pd} > t_{hold}$</mathjax>, necessarily
(with <mathjax>$t_{pd}$</mathjax> here being the shortest path through the logic), so we
need <mathjax>$t_{ccq} > t_{hold}$</mathjax> for a good flipflop (consider shift registers,
where we have basically no propagation delay).</p>
<p>Therefore <mathjax>$t_{pcq} + t_{setup} + t_{pd} < t_{cycle}$</mathjax>.</p>
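<p>Plugging rough numbers into the two constraints gives a feel for the
scale. The values in the usage note reuse the ballpark figures quoted in
this lecture (2 ns clock-to-data, a 2 ns adder, 0.4 ns setup and hold); the
particular path is hypothetical, and so are the helper names.</p>

```python
def min_cycle_ns(t_pcq, t_pd, t_setup):
    """Setup constraint: t_pcq + t_pd + t_setup must fit in one cycle."""
    return t_pcq + t_pd + t_setup


def max_freq_mhz(t_pcq, t_pd, t_setup):
    return 1000.0 / min_cycle_ns(t_pcq, t_pd, t_setup)


def hold_ok(t_ccq, t_pd_min, t_hold):
    """Hold constraint: the earliest change must arrive after the hold window."""
    return t_ccq + t_pd_min > t_hold
```

<p>For example, <code>min_cycle_ns(2.0, 2.0, 0.4)</code> is about 4.4 ns for a block RAM
read feeding one adder into a register. With no logic between two flipflops,
<code>hold_ok(0.5, 0.0, 0.4)</code> passes while <code>hold_ok(0.1, 0.0, 0.4)</code> does not.</p>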
<p>What does this have to do with the flipflop we know about? If we look at
the flipflop that we've done in the past (with inverters, controlled
buffers, etc), what is <mathjax>$t_{setup}$</mathjax>? We have several delays; <mathjax>$t_{setup}$</mathjax>
should ideally have D propagate to X and Y. How long is the hold
afterwards? You'd like <mathjax>$D$</mathjax> to be constant for an inverter delay (so that it
can stop having an effect). That's pretty stable. <mathjax>$t_{hold}$</mathjax> is something
like the delay of an inverter (if you want to be really safe, you'd say
twice that number). <mathjax>$t_{pcq}$</mathjax>, assuming we have valid setup, the D value
will be sitting on Y, and we've got two inverter delays, and <mathjax>$t_{ccq}$</mathjax> is
also 2 inverter delays.</p>
<p>Good midterm-like question for you: if I have a flipflop with some
characteristic setup and hold time, and I put a delay of 1 ps on the input,
and I called this a new flipflop, how does that change any of these things?
Can make <mathjax>$t_{hold}$</mathjax> negative. How do I add more delay? Just add more
inverters in the front. Hold time can in fact go negative. Lot of 141-style
stuff in here that you can play with.</p>
<p>Given that, you have to deal with the fact that you've got this propagation
time and the setup time. Cost of pipelined registers.</p>
<p>Critical path time, various calculations.</p>
<p><a name='13'></a></p>
<h1>Hazards, Stalls, Delay slots, Three-stage pipeline</h1>
<h2>October 2, 2012</h2>
<p>:)</p>
<p>Let's look at some hazards on the five-stage and then talk about what they
would look like in the three-stage. In the book, 7.51, this is where they
go through and look at what happens with the load word.</p>
<p>Must stall or use delay slot.</p>
<p><a name='14'></a></p>
<h1>MMIO</h1>
<h2>October 4, 2012</h2>
<p>Section 8.5. Not exactly perfect; we'll talk a bit about that this lecture,
but it gives you a good idea how that works. Talk a little bit about
3-stage pipeline and look at what happens if you put the regfile next to
the ALU instead of next to the instruction memory.</p>
<p>Last time, we had IMEM, Regfile, ALU all by itself, and then data memory
all over there. Let's now see what happens if we stick the regfile and ALU
together. Not at all clear to me on the FPGA you've got whether there's
going to be a substantial benefit to one or the other. Don't think it's
going to affect speed or complexity tremendously.</p>
<p>What you should be doing for your project is draw the basic single-cycle
MIPS (figure 7.11?), then add pipeline registers, label every single wire
in there; everything should be lined up; make sure that you put ALUOut
(there's going to be more than one of those: it's going to cross from the
execution phase to the memory phase, and so you're going to have at least
two of these).</p>
<p>Good midterm question: you decide you're going to forward. If we choose to
do that, what is <mathjax>$T_{c,min}$</mathjax> in this case?</p>
<p>Note: memory map in book is not the same as what we're using in the
project, but the concepts are all the same. So we've got a text segment
where the program actually goes; your global variables have some place in
here, and you initialize <code>$gp</code> to that, and the top chunk is the I/O (called
reserved in their diagram). You've got the heap that grows up, and the stack
that grows down. It's this region where you've got your MMIO, and I want to
make this clear what's going on.</p>
<p>From the book, figure 8.28, you've got your ALU, regfile, muxes, and
there's two things that come out of here that are important when you're
doing a read or a write. You've got the address, and you've got a <code>DataIn</code>
if you're doing a write, and there's your memory. There's also a DataOut
from memory (32 bits).</p>
<p>So far, we've been saying that this is just Dmem. In reality, Dmem is just
one of the things that lives in here. We've got a block that we've been
calling Dmem. There are other things: in particular, there's your UART
controller, and your UART controller has a bunch of lines that go to and
from the actual UART that you build, which has a single SIn and SOut. This
is what you connect your terminal to on this side, and there's a bunch of
things that go across this interface. Two sets of three lines to represent
the ready/valid interface.</p>
<p>Control line that tells the memory when to write; address and data going
into Dmem (it's only 12 bits that go in; you have to figure that out). This
guy up here, this is your decoder. You also have to have your instruction
memory live inside of here, and it for sure needs to get that Din and the
address input as well. It also only presumably has 12 bits of input for
your project, and it also has a write on it. And this controller needs to
be able to see some bits of address.</p>
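<p>One way to picture the decoder is as a function from the address and the
write control to per-device enables. The sketch assumes the memory-mapped
region is "anything that starts with FFFF", as mentioned earlier in these
notes, and that Dmem takes a 12-bit word address; the project's real memory
map may differ, and which 12 bits to use is a guess here.</p>

```python
MMIO_PREFIX = 0xFFFF   # assumption: upper 16 bits select the I/O region


def decode(addr, mem_write):
    """Route one memory access to either Dmem or the MMIO/UART controller."""
    is_mmio = (addr >> 16) == MMIO_PREFIX
    return {
        "dmem_we": bool(mem_write) and not is_mmio,
        "mmio_we": bool(mem_write) and is_mmio,
        "dmem_addr": (addr >> 2) & 0xFFF,      # hypothetical 12-bit word address
        "read_sel": "mmio" if is_mmio else "dmem",
    }
```

<p>The <code>read_sel</code> field stands in for the mux that picks which block drives
DataOut back to the processor.</p>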
<p><a name='15'></a></p>
<h1>Stack, Procedure Calls, Exceptions</h1>
<h2>October 9, 2012</h2>
<p>The homework this week is pretty much just things you're working on: how
did you implement j and jal? We are going to talk about the stack and
procedure calls, and also exceptions (interrupts!) -- 6.7.2, 7.7. Also,
look at your green sheet.</p>
<p>From the book, like we drew last time, we've got our memory allocation
system that starts at 0 and goes up to FFFFFFFC somewhere way at the top,
and that address space gets chopped up into pieces. In particular, in the
normal memory map, everything at the top is reserved, which ends up being
memory-mapped IO devices, and you start off with your stack pointer
pointing right below that at 7FFFFFFC. Then you've got another reserved
section on the bottom, followed by text, static, and room for your stack to
grow down and your heap to grow up, along with your stack pointer and
global pointer. Your program counter ends up being initialized at the
bottom of the text section.</p>
<p>Some differences.</p>
<p>How do we do procedure calls? In our book, that's section 6.4.6. We'll
start with the simplest case (no args, no return value), then we'll see how
to do args and return values, then local and global variables. Your code is
<code>main</code>, which just calls a function <code>simple</code>, and <code>simple</code>, which is a
<code>void</code> function, just returns.</p>
<p>61C material: the callee returns with <code>jr $ra</code>. Turns out the book's
version doesn't work as written on a real MIPS architecture. What is the
address that goes into the return address when this is called? 0x8, not
0x4: the book does not have delay slots, but we do, so <code>jal</code> links to the
instruction after the delay slot. In particular, you're going to have to
put NOPs in the delay slots to get this to work. Suppose that at memory
location 1004 you just happened to have an instruction that was all 0s,
except it had a 1000 at the very end (the binary funct field 001000).
That's <code>jr</code>, so we end up with nastiness: infinite looping, potentially.</p>
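<p>To make the arithmetic concrete, here is a small sketch in C. The helper
names (<code>jal_return_addr</code>, <code>is_jr</code>, <code>jr_source_reg</code>) are made up for this
example; the field positions are the standard MIPS R-type layout. The <code>jr</code>
check only looks at the opcode and funct fields and ignores the rest of the
word, which is a simplification.</p>

```c
#include <stdint.h>

/* Address that jal writes into $ra. With a branch delay slot, the
   instruction right after the jal runs before the jump takes effect,
   so the return point is two instructions (8 bytes) past the jal.
   Without a delay slot (as in the book) it is 4 bytes past. */
uint32_t jal_return_addr(uint32_t jal_pc, int has_delay_slot) {
    return jal_pc + (has_delay_slot ? 8u : 4u);
}

/* Nonzero if the word looks like jr: opcode 0 in bits 31:26 and
   funct 001000 (0x08) in bits 5:0. Other fields are not checked. */
int is_jr(uint32_t word) {
    return (word >> 26) == 0 && (word & 0x3F) == 0x08;
}

/* Register number that jr jumps through: the rs field, bits 25:21. */
unsigned jr_source_reg(uint32_t word) {
    return (word >> 21) & 0x1F;
}
```

<p>A word that is all 0s except for the trailing 1000 is 0x00000008, which
this decodes as a <code>jr</code> through register 0, while the usual <code>jr $ra</code> is
0x03E00008 with register 31 in the rs field.</p>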
<p>So suppose I have args and stuff. We end up with <code>$s0 = y</code>. Suppose we're
at 1000, how does this thing get called? Args go in the <code>$a</code> registers,
and return values come back in the <code>$v</code> registers.</p>
<p>Utilization of delay slots.</p>
<p>What needs to be saved on the stack, and when?</p>
<p>Arrays on the stack, etc. When you complete a procedure, the stack pointer
should be back where it was before the call. So, you put it all together,
and you get the stack frame shown on your green card. Any args beyond the
first four go on the stack; you do your <code>jal</code>; <code>a0-3</code> hold the first four
args, and <code>ra</code> has the return address we talked about. The standard order
is <code>a0-3</code>, <code>ra</code>, <code>s0-7</code>, and local variables, down to where <code>sp</code> points
during the procedure. Then you're going to do a <code>jr ra</code> at the end, with
<code>v0-1</code> holding any return values.</p>
<pre><code>void input() {
    char s[20];
    gets(s);
}
</code></pre>
<p>So what's the stack going to look like? We need to save <code>ra</code>, among other
things. Suppose the input that some friendly person puts in is some string
of characters, then 7fff0028, and then some string of 20 more characters.
What's going to end up happening? I'm going to jump and link to <code>gets</code>,
and when I'm all done, I want to do a <code>jr</code> to my return address, so I'm
going to have to load it back into <code>ra</code>. I'll fix up my stack pointer
<code>sp</code>, and then I can finally return.</p>
<p>Buffer overflows (7fff0028)!</p>
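<p>Here is a minimal sketch in C of why that input is a problem. It does not
touch the real stack; it models the frame as a struct so the overflow stays
well defined. The layout (the 20-byte buffer sitting directly below the
saved <code>ra</code>) is an assumption for illustration, and the names <code>frame_t</code>,
<code>unchecked_read</code>, and <code>ra_after_attack</code> are hypothetical.</p>

```c
#include <stdint.h>
#include <string.h>

#define BUF_LEN 20

/* ASSUMED frame layout, lowest address first: s[20], then saved $ra. */
typedef struct {
    char     s[BUF_LEN];
    uint32_t saved_ra;
} frame_t;

/* Models an unchecked copy like gets(): writes len bytes starting at s
   and spills into saved_ra once len exceeds BUF_LEN. The copy is
   clamped to the struct so the model itself never goes out of bounds. */
void unchecked_read(frame_t *f, const void *input, size_t len) {
    if (len > sizeof *f) len = sizeof *f;
    memcpy((unsigned char *)f, input, len);
}

/* Feeds the frame 20 filler bytes followed by a chosen address and
   reports what the procedure would later load back into $ra. */
uint32_t ra_after_attack(uint32_t good_ra, uint32_t target) {
    frame_t f;
    unsigned char input[BUF_LEN + sizeof(uint32_t)];

    f.saved_ra = good_ra;                             /* saved on entry */
    memset(input, 'A', BUF_LEN);                      /* fills s[]      */
    memcpy(input + BUF_LEN, &target, sizeof target);  /* lands on ra    */

    unchecked_read(&f, input, sizeof input);
    return f.saved_ra;
}
```

<p>In this model, calling <code>ra_after_attack</code> with a legitimate return address
and a target of 0x7fff0028 returns 0x7fff0028, so the final <code>jr</code> would go
wherever the input said to go. A bounded read such as <code>fgets</code> with the
buffer size avoids the spill.</p>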
<p>Notion of stack traces.</p>
<p>Finally, you've got global or static or extern variables, which go into
that part of memory called static. Pretty straightforward. Something you