psram4drv-dualCE.spin2
{{
--------------------------------------------------------------------------------------------------
Propeller 2 PSRAM Driver COG
============================
This driver enables P2 COGs to access multiple PSRAM devices sharing a common data bus.
Features:
--------
* single COG PSRAM based driver
* supports single 4 bit PSRAM device(s), with common clock and chip enable
* up to 16 banks of memory devices can be mapped on 16MB/32MB/64MB/128MB/256MB boundaries
* device selected on the bus will be based on the address bank in the memory request
* configurable control pins and shared data bus group for the memory devices
* uses a 3 long mailbox per COG for reading memory requests and for writing results
* error reporting for all failed requests
* supports strict priority and round-robin request polling (selectable per COG)
* optional notification of request completion with the COGATN signal to the requesting COG
* re-configurable maximum transfer burst size limits setup per COG, and per device
* automatic fragmentation of transfers exceeding configured burst sizes or crossing page boundaries
* sysclk/2 read/write transfer rates are supported
* provides single byte/word/long and burst transfers for reading/writing external memory
* input delay can be controlled to allow driver to operate with varying P2 clocks/temps/boards
* graphics copies/fill and other external memory to memory copy operations are supported
* request lists supported allowing multiple requests with one mailbox transaction (DMA engine)
* maskable read-modify-write support for atomic memory changes and sub-byte sized pixel writes
* unserviced COGs can be removed from the polling loop to reduce latency
Revision history:
----------------
0.91b 2 MAR 2022 rogloh -pre-release demo-
0.93b2 1 AUG 2022 rogloh -special custom build that allows sysclk/3 & sysclk/4 rates-
--------------------------------------------------------------------------------------------------
}}
CON
'standard memory request masks
R_READBYTE = %1000 << 28 ' read (or RMW) a byte from device address
R_READWORD = %1001 << 28 ' read (or RMW) a word from device address
R_READLONG = %1010 << 28 ' read (or RMW) a long from device address
R_READBURST = %1011 << 28 ' read a burst of data from device into HUB RAM
R_WRITEBYTE = %1100 << 28 ' write byte(s) into device
R_WRITEWORD = %1101 << 28 ' write word(s) into device
R_WRITELONG = %1110 << 28 ' write long(s) into device
R_WRITEBURST= %1111 << 28 ' write a burst of HUB RAM data into device
'control request masks
R_GETLATENCY= %10000000 << 24 ' read driver's latency for a bank
R_GETREG = %10010000 << 24 ' read device register
R_GETPARAMS = %10100000 << 24 ' read a bank parameter long
R_DUMPSTATE = %10110000 << 24 ' dump COG+LUT RAM areas into HUB RAM for debug
R_SETLATENCY= %11000000 << 24 ' write driver's latency for a bank
R_SETREG = %11010000 << 24 ' write device register
R_SETPARAMS = %11100000 << 24 ' write a bank parameter long
R_CONFIG = %11110000 << 24 ' reconfigure COG QoS settings & poller code
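'
' Example (a sketch, not part of the driver): composing the first mailbox long of a standard
' memory request. It assumes the request code occupies bits 31..28 and the bank number bits
' 27..24, as the handlers' "getbyte request, addr1, #3" lookup suggests. The names "req",
' "bank" and "offset" are hypothetical caller variables.
'
'   req := R_READLONG | ((bank & $F) << 24) | (offset & $FF_FFFF)
'
' Every request mask above has bit 31 set, which is the bit the poller's TJS instructions test.
'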
'errors returned by driver
ERR_INVALID_BANK = -1 ' invalid bank is accessed
ERR_UNSUPPORTED = -2 ' operation not supported
ERR_INVALID_LIST = -3 ' invalid request in list
ERR_ALIGNMENT = -4 ' address is not aligned for type of request
ERR_BUSY = -5 ' flash is busy
'flag bits per COG
PRIORITY_BIT = 15 ' COG is strict priority polled if set
NOTIFY_BIT = 11 ' COG is also notified with COGATN if set
LOCKED_BIT = 10 ' COG's transfers are bus locked if set
LIST_BIT = 9 ' COG is executing a list when set
'flag type bits per bank
PROT_BIT = 11 ' bit is set if HyperFlash bank is exclusively protected by a COG
FLASH_BIT = 10 ' bit is set for HyperFlash or R/O RAM, cleared for R/W RAM
'driver configuration flag bits
FASTREAD_BIT = 31 ' bit set when reads are done at sysclk/1 transfer rate instead of sysclk/2
FASTWRITE_BIT = 30 ' bit set when writes are done at sysclk/1 transfer rate instead of sysclk/2
UNREGCLK_BIT = 29 ' bit set when unregistered clock pins are enabled (experimental only)
EXPANSION_BIT = 28 ' bit set to expand driver to run HUB exec code
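SLOWCLK_BIT = 27 ' bit set to enable a slower transfer rate (sysclk/3 or sysclk/4)
CLKSEL_BIT = 26 ' when SLOWCLK_BIT is set: 1 selects sysclk/3, 0 selects sysclk/4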
'reset timing
MIN_CS_DELAY_US = 150 ' minimum delay in microseconds after reset ends before first memory access
' size of the effective PSRAM page in bytes (1 device * 1kB)
PAGE_SIZE = 1024
PAGE_BITS = encod PAGE_SIZE
' number of P2 clocks in slowest path holding up real time COGs
SLOWEST_PATH = 388
' number of P2 clocks to read a burst from polling and back to polling,
' this is excluding transfer time and per fragment overheads which are added separately
BURST_READ_OVERHEAD_CLKS = 228
' additional P2 overhead clocks per fragment excluding transfer time
FRAGMENT_OVERHEAD_CLKS = 90
' number of P2 clocks to transfer a long
CLKS_PER_LONG = 16
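'
' Worked estimate (a sketch, assuming that every fragment, including the first, costs
' FRAGMENT_OVERHEAD_CLKS; "burstClks", "fragments" and "longs" are hypothetical names that
' the driver itself does not use):
'
'   burstClks := BURST_READ_OVERHEAD_CLKS + fragments * FRAGMENT_OVERHEAD_CLKS + longs * CLKS_PER_LONG
'
' For a 64 long read serviced as a single fragment this gives 228 + 1*90 + 64*16 = 1342 P2 clocks.
' CLKS_PER_LONG = 16 corresponds to 8 nibbles per long on the 4 bit bus at sysclk/2.
'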
'--------------------------------------------------------------------------------------------------
' This Cog is started by the SPIN2 memory driver API, however it can also be started independently
' in non-SPIN2 setups. The startup parameters and data formats it requires are identified below if
' started directly by PASM2 code or other environments. The HUB address of the startup parameter
' structure is to be passed into this driver COG when it starts up via PTRA.
'
' The first long of the structure (operating frequency) gets zeroed when the driver COG has
' completed its initialization so the caller can determine when it can start using the mailbox.
'
' COG startup parameter format (8 consecutive longs)
' startupParams[0]: P2 operating frequency in Hz (is zeroed after startup)
' startupParams[1]: configuration flags for driver
' startupParams[2]: port A reset mask (lower 32 P2 pins)
' startupParams[3]: port B reset mask (upper 32 P2 pins)
' startupParams[4]: base P2 pin number of data bus used by driver (0,4,8,12,...,60)
' startupParams[5]: pointer to 32 long device parameter structure in HUB RAM
' startupParams[6]: pointer to 8 long COG parameter structure in HUB RAM
' startupParams[7]: mailbox base address for this driver to read in HUB RAM (driver clears the mailbox memory)
'
'
' Configuration flags:
'
' bit31 - reserved (should be set to 0)
' bit30 - reserved (should be set to 0)
' bit29 - set 1 to enable unregistered clock pin, otherwise 0 for registered clock pins (default)
' bit28 - set 1 to enable expanded graphics functions using HUB-exec calls, 0 to disable
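' bit27 - set 1 to enable a slower transfer rate (sysclk/3 or sysclk/4), 0 for sysclk/2 (default)
' bit26 - when bit27 is set: 1 selects sysclk/3, 0 selects sysclk/4
' bit25..0 - reserved (set to 0)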
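'
' Example (a sketch only, assuming a Spin2 caller). The names "drv", "startupParams",
' "deviceData", "cogData", "mailboxes", "driverCog" and DATABUS_PIN are hypothetical, and the
' zero reset masks are placeholders rather than recommended values:
'
'   OBJ drv : "psram4drv-dualCE"
'   VAR long startupParams[8], deviceData[32], cogData[8], mailboxes[24], driverCog
'
'   startupParams[0] := clkfreq          ' P2 operating frequency in Hz
'   startupParams[1] := 0                ' default configuration flags
'   startupParams[2] := 0                ' port A reset mask (placeholder)
'   startupParams[3] := 0                ' port B reset mask (placeholder)
'   startupParams[4] := DATABUS_PIN      ' base pin of the 4 bit data bus
'   startupParams[5] := @deviceData      ' 16 bank longs followed by 16 pin longs
'   startupParams[6] := @cogData         ' 8 per COG parameter longs
'   startupParams[7] := @mailboxes       ' 8 COGs x 3 longs
'   driverCog := coginit(COGEXEC_NEW, drv.getDriverAddr(), @startupParams)
'   repeat while startupParams[0]        ' first long is zeroed once the driver is ready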
'
' Structure for COG parameters:
' 8 longs in the following format, one per COG id
' bits
'  31               16 15      12 11 10  9   8         0
' -------------------------------------------------------
' | COG's Burst Limit | Priority |  Flags  |  Internal  |
' -------------------------------------------------------
' COG's Burst Limit:
' This restricts a COG's maximum burst size which helps bound the latency for servicing the
' highest priority COG. A smaller burst size setup for lower priority COGs would let a video
' COG have its requests serviced sooner, for example. This burst size is measured in bytes.
'
' Priority (4 bits):
' This field holds an optional priority assigned to the COG being serviced by the driver.
' Strict priority COGs are serviced before any round-robin COGs. Round-robin polled COGs
' share the remaining bandwidth fairly on a per-request basis, though not necessarily by bytes transferred.
'
' Pattern   Description
' -------   -----------
' 1_111     Highest priority polled COG
' 1_110     2nd highest priority polled COG
' 1_...     ...
' 1_000     Lowest priority polled COG (but still above round-robin COGs)
' 0_xxx     Round-robin polled COG
'             if xxx == 000, COG stalls if accessed flash is locked by another COG
'             if xxx <> 000, COG returns with ERR_BUSY if flash is locked by another COG
'
' Flag bits:
' 11 0 = serviced COG is notified of completion/error via its mailbox only
' 1 = serviced COG is notified with COGATN on completion as well as via its mailbox
' 10 0 = COG can be pre-empted when its burst is broken up (polling restarts then)
' 1 = COG's transfer is locked until it completes, even if broken into sub-bursts
' 9 Internally used flag for tracking request list processing state (value is ignored)
'
' Internal:
' 8-0 Internally maintained field that holds the COG ID within the driver. Value ignored.
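'
' Example (a sketch; "cogParam" is a hypothetical caller variable and the burst limits are
' arbitrary illustration values):
'
'   ' highest strict priority COG, 256 byte burst limit, also notified with COGATN
'   cogParam := (256 << 16) | (%1_111 << 12) | (1 << NOTIFY_BIT)
'   ' round-robin COG with a 1024 byte burst limit, notified via its mailbox only
'   cogParam := (1024 << 16) | (%0_000 << 12)
'   ' COG removed from the polling loop (the driver excludes COGs whose burst limit is zero)
'   cogParam := 0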
'
' Device parameter structure:
'
' This is arranged as 16 bank parameter longs followed by 16 pin parameter longs, one each per bank.
' Devices over 16MB in size span multiple bank addresses (bits 24-27) and require duplicate values
' configured in each associated bank's parameter and pin parameter longs.
'
' Bank parameter long format (16 consecutive longs, one per bank):
' bit
'  31               16 15   12 11    8 7      0
' ----------------------------------------------
' |   Maximum Burst   | Delay | Flags |  Size  |
' ----------------------------------------------
' Maximum Burst:
' This is the maximum burst size allowed for the device in bytes, assuming a sysclk/1 transfer rate.
' It should be configured so that a burst does not exceed the maximum CS low time of 8us.
'
' Delay:
' This nibble comprises two fields which create the input delay needed for memory reads.
' bits
' 15-13 = P2 clock delay cycles when waiting for valid data bus input to be returned
' 12 = set to 0 if bank uses registered data bus inputs for reads, or 1 for unregistered
'
' Flags:
' bit
' 11 = reserved (set to 0)
' 10 = memory type: 0 = RAM, or 1 = ROM (writes get blocked)
' 9-8 = reserved
'
' Size:
' number of valid address bits - 1 used by the device mapped into this bank, e.g.:
' 16MB = 23
' 32MB = 24
' 64MB = 25
' 128MB = 26
' 256MB = 27
'
' NOTE: If the bank is not in use all its parameters must remain zeroed.
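'
' Example (a sketch; "bankParam" is a hypothetical caller variable, and the burst size and
' delay are illustration values rather than tuned settings):
'
'   ' 16MB R/W device: 512 byte maximum burst, 5 clock input delay, registered data bus inputs
'   bankParam := (512 << 16) | (5 << 13) | (0 << 12) | (0 << FLASH_BIT) | 23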
'
' The 16 bank parameters above are then followed by the 16 pin parameter longs, one long per bank.
'
' Pin parameter long format (16 consecutive longs, one per bank):
' bit
'  31  30      24 23          16 15      8 7            0
' --------------------------------------------------------
' | I | Reserved | Upper CE pin | CLK pin | Lower CE pin |
' --------------------------------------------------------
'
' The "I" bit (bit31) is the "invalid" bit, it must be set to 1 if the bank is not in use.
'
' The 3 pin fields define the P2 pins attached to the device control pins.
' In cases of 8MB devices, the upper and lower CE pins can be configured differently to enable
' a contiguous memory space. This will make the pair of devices look somewhat like a single
' 16MB device, although it won't be able to correctly cross the boundary during transfers.
' For 16MB or larger sized memories they should be set to the same values.
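'
' Example (a sketch; "pinParam" is a hypothetical caller variable and the pin numbers are
' illustration values only):
'
'   ' two 8MB devices sharing CLK on P56, with the lower CE on P57 and the upper CE on P58
'   pinParam := (58 << 16) | (56 << 8) | 57
'   ' unused bank: only the invalid bit (bit31) is set
'   pinParam := 1 << 31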
'--------------------------------------------------------------------------------------------------
' Returns the address of this driver's location in hub RAM to start it
PUB getDriverAddr() : r
    return @driver_start
'--------------------------------------------------------------------------------------------------
DAT
orgh
driver_start
org
'..................................................................................................
' Memory layout for COG RAM once operational:
'
' COG RAM address   Usage
' ---------------   -----
' $00-$17           Mailbox data area (3 longs x 8 COGs) (24)
' $18-$1F           Mailbox HUB parameter addresses per COG (8)
' $20-$7F           COG service handlers (8 COGs x 12 longs per COG) (96)
' $80-$FF           EXECF vector storage (8 requests x 16 banks) (128)
' $100-$197         Mailbox poller, error handlers, and all driver management code
' ~$198-$1F3        State and register variables
'
' Also during driver COG startup:
' $00-$17 is initialized as temporary init code - stage 2 (EXECF vector table init)
' $100-$1FF is used as temporary init code - stage 1 (does HW setup & majority of driver init)
'..................................................................................................
' Mailbox storage after vector initialization
req0 call #init 'do HW setup/initialization
data0 rdlut c, b wz 'read bank info
count0 mov a, b 'set COGRAM address low nibble
req1 if_z mov ptra, #(no_vect & $1ff) 'set pointer to invalid vectors
data1 testb c, #FLASH_BIT wc 'check type: R/O PSRAM (1) or R/W PSRAM (0)
count1 if_nc_and_nz mov ptra, #(rw_vect & $1ff) 'set pointer to R/W PSRAM vectors
req2 if_c_and_nz mov ptra, #(ro_vect & $1ff) 'set pointer to R/O PSRAM vectors
data2 mov c, #$8 'setup vector base to $80
count2 setnib a, c, #1 'prepare vector base address for bank
req3 altd a, #0 'prepare COG destination read address
data3 rdlut 0-0, ptra++ 'read vector into table in COG RAM
count3 incmod c, #15 wz 'next vector
req4 if_nz jmp #count2 'repeat
data4 incmod b, #15 wz 'next bank
count4 if_nz jmp #data0 'repeat
req5 mov ptra, #$20 'setup base LUT address to clear
data5 rep #5, #80 'update next 80 longs
count5 cmp ptra, header wc 'check if LUT address range
req6 if_nc cmpr ptra, trailer wc '...falls in/outside control region
data6 if_c wrlut #0, ptra++ 'if outside, clear LUT RAM
count6 if_nc wrlut addr1, ptra++ 'copy control vector table into LUT
req7 if_nc add $-1, const512 'increment source of LUT write data
data7 mov burstwrite, 255-0 'save real bank 15 burst write vector
count7 _ret_ mov 255, #dolist 'setup list address, return to notify
' Mailbox parameter addresses per COG once patched
cog0mboxdata long 0*12+4 'address offset for cog0 mbox data
cog1mboxdata long 1*12+4 'address offset for cog1 mbox data
cog2mboxdata long 2*12+4 '...
cog3mboxdata long 3*12+4
cog4mboxdata long 4*12+4
cog5mboxdata long 5*12+4
cog6mboxdata long 6*12+4
cog7mboxdata long 7*12+4 'address offset for cog7 mbox data
'..................................................................................................
' Per COG request and state setup and service branching
cog0
mov ptra, #$20+0*10 'determine COG0 parameter save address
mov ptrb, cog0mboxdata 'determine COG0 mailbox data address
mov id, id0 'get COG0 state
getword limit, id, #1 'get COG0 burst limit
rdlut resume, ptra[8] wz 'check if we are in the middle of something
if_nz jmp #restore 'if so restore state and resume
mov addr1, req0 'get mailbox request parameter for COG0
mov hubdata, data0 'get COG0 mailbox data parameter
mov count, count0 'get COG0 mailbox count parameter
getbyte request, addr1, #3 'get request + bank info
altd request, #0 'lookup jump vector service table
execf request-0 'jump to service
cog1
mov ptra, #$20+1*10 'determine COG1 parameter save address
mov ptrb, cog1mboxdata 'determine COG1 mailbox data address
mov id, id1 'get COG1 state
getword limit, id, #1 'get COG1 burst limit
rdlut resume, ptra[8] wz 'check if we are in the middle of something
if_nz jmp #restore 'if so restore state and resume
mov addr1, req1 'get mailbox request parameter for COG1
mov hubdata, data1 'get COG1 mailbox data parameter
mov count, count1 'get COG1 mailbox count parameter
getbyte request, addr1, #3 'get request + bank info
altd request, #0 'lookup jump vector service table
execf request-0 'jump to service
cog2
mov ptra, #$20+2*10 'determine COG2 parameter save address
mov ptrb, cog2mboxdata 'determine COG2 mailbox data address
mov id, id2 'get COG2 state
getword limit, id, #1 'get COG2 burst limit
rdlut resume, ptra[8] wz 'check if we are in the middle of something
if_nz jmp #restore 'if so restore state and resume
mov addr1, req2 'get mailbox request parameter for COG2
mov hubdata, data2 'get COG2 mailbox data parameter
mov count, count2 'get COG2 mailbox count parameter
getbyte request, addr1, #3 'get request + bank info
altd request, #0 'lookup jump vector service table
execf request-0 'jump to service
cog3
mov ptra, #$20+3*10 'determine COG3 parameter save address
mov ptrb, cog3mboxdata 'determine COG3 mailbox data address
mov id, id3 'get COG3 state
getword limit, id, #1 'get COG3 burst limit
rdlut resume, ptra[8] wz 'check if we are in the middle of something
if_nz jmp #restore 'if so restore state and resume
mov addr1, req3 'get mailbox request parameter for COG3
mov hubdata, data3 'get COG3 mailbox data parameter
mov count, count3 'get COG3 mailbox count parameter
getbyte request, addr1, #3 'get request + bank info
altd request, #0 'lookup jump vector service table
execf request-0 'jump to service
cog4
mov ptra, #$20+4*10 'determine COG4 parameter save address
mov ptrb, cog4mboxdata 'determine COG4 mailbox data address
mov id, id4 'get COG4 state
getword limit, id, #1 'get COG4 burst limit
rdlut resume, ptra[8] wz 'check if we are in the middle of something
if_nz jmp #restore 'if so restore state and resume
mov addr1, req4 'get mailbox request parameter for COG4
mov hubdata, data4 'get COG4 mailbox data parameter
mov count, count4 'get COG4 mailbox count parameter
getbyte request, addr1, #3 'get request + bank info
altd request, #0 'lookup jump vector service table
execf request-0 'jump to service
cog5
mov ptra, #$20+5*10 'determine COG5 parameter save address
mov ptrb, cog5mboxdata 'determine COG5 mailbox data address
mov id, id5 'get COG5 state
getword limit, id, #1 'get COG5 burst limit
rdlut resume, ptra[8] wz 'check if we are in the middle of something
if_nz jmp #restore 'if so restore state and resume
mov addr1, req5 'get mailbox request parameter for COG5
mov hubdata, data5 'get COG5 mailbox data parameter
mov count, count5 'get COG5 mailbox count parameter
getbyte request, addr1, #3 'get request + bank info
altd request, #0 'lookup jump vector service table
execf request-0 'jump to service
cog6
mov ptra, #$20+6*10 'determine COG6 parameter save address
mov ptrb, cog6mboxdata 'determine COG6 mailbox data address
mov id, id6 'get COG6 state
getword limit, id, #1 'get COG6 burst limit
rdlut resume, ptra[8] wz 'check if we are in the middle of something
if_nz jmp #restore 'if so restore state and resume
mov addr1, req6 'get mailbox request parameter for COG6
mov hubdata, data6 'get COG6 mailbox data parameter
mov count, count6 'get COG6 mailbox count parameter
getbyte request, addr1, #3 'get request + bank info
altd request, #0 'lookup jump vector service table
execf request-0 'jump to service
cog7
mov ptra, #$20+7*10 'determine COG7 parameter save address
mov ptrb, cog7mboxdata 'determine COG7 mailbox data address
mov id, id7 'get COG7 state
getword limit, id, #1 'get COG7 burst limit
rdlut resume, ptra[8] wz 'check if we are in the middle of something
if_nz jmp #restore 'if so restore state and resume
mov addr1, req7 'get mailbox request parameter for COG7
mov hubdata, data7 'get COG7 mailbox data parameter
mov count, count7 'get COG7 mailbox count parameter
getbyte request, addr1, #3 'get request + bank info
altd request, #0 'lookup jump vector service table
execf request-0 'jump to service
fit 128
pad long 0[128-$] 'align init code to $80
'..................................................................................................
' This initialization code ($80-$FF) gets reused as the main service EXECF jump table later (128 longs)
init
' read in the additional LUT RAM code
add lutcodeaddr, ptrb 'determine hub address of LUT code
setq2 #511-(hwinit & $1ff) 'read the remaining instructions
rdlong hwinit & $1ff, lutcodeaddr '...and load into LUT RAM address $240
' read the startup parameters
setq #8-1 'read 8 longs from hub
rdlong startupparams, ptra '.. as the startup parameters
' setup some of the config flag dependent state and patch LUTRAM
testb flags, #EXPANSION_BIT wz'test for graphics expansion enabled
if_z add expansion, ptrb 'compensate for HUB address
if_nz mov expansion, ##donerepeats'disable expansion when flag bit clear
testb flags, #SLOWCLK_BIT wz
if_z shl clkduty, #1 'sysclk/4 or sysclk/3 is enabled
testb flags, #CLKSEL_BIT wc
if_nc_and_z shr xfreq2, #1 'enable sysclk/4 if clksel=0
if_nc_and_z add clkdelay, #13 'enable sysclk/4 if clksel=0
if_c_and_z mov xfreq2, ##$2AAAAAAB 'override streamer frequency for sysclk/3 if clksel=1
if_c_and_z sub clkduty, #1 'sysclk/3 is selected if clksel=1
if_c_and_z add clkdelay, #5 'enable sysclk/3 if clksel=1
' setup data pin modes and data bus pin group in streamer commands
and datapins, #%111100 'compute base pin
or datapins, ##(3<<6) 'configure 4 pins total
mov a, datapins 'get data pin base
wrpin registered, datapins 'prepare data pins for address phase transfer
shr a, #3 wc 'determine data pin group
or a, #8
setnib ximm8, a, #5 'setup bus group in streamer
bitc ximm8, #19
setnib xrecvdata, a, #5
bitc xrecvdata, #19
setnib xsenddata, a, #5
bitc xsenddata, #19
setnib xsendimm, a, #5
bitc xsendimm, #19
' setup device control pin states
setq2 #32-1 'read 32 longs to LUTRAM
rdlong $000, devicelist 'read bank/pin data for all banks
mov const512, ##512 'prepare constant
' generate minimum CE high time before access
qdiv frequency, ##1000000 'convert from Hz to MHz
getqx c 'get P2 clocks per microsecond
mov a, #MIN_CS_DELAY_US 'get time before active delay in microseconds
mul c, a 'convert microseconds to clocks
mov ptrb, #16 'point to bank pin config data
mov a, #16
pinloop
rdlut pinconfig, ptrb++ wc 'invalid if pin config bit 31 is one
if_c jmp #pinlooptest
and pinconfig, pinmask 'save us from invalid bits in args
getbyte cspin, pinconfig, #0 'read CS pin number for low 8MB sub-bank
wrpin #0, cspin 'clear smart pin mode
drvh cspin 'setup pins for all banks
getbyte cspin, pinconfig, #2 'read secondary CS pin number for high sub-bank
wrpin #0, cspin 'clear smart pin mode
drvh cspin 'setup pins for all banks
getbyte clkpin, pinconfig, #1 'read CLK pin number
fltl clkpin 'disable Smartpin clock output mode
wrpin clkconfig, clkpin 'set clk to Smartpin pulse mode output
wxpin #1, clkpin 'configure for 1 clock between updates
drvl clkpin 'set clk state low
waitx c 'delay
call #hwinit 'setup HW into QSPI mode for lower 8MB sub-bank
xor cspin, pinconfig 'compare secondary CS pin # against primary CS pin #
test cspin, #$3f wz 'with only 6 bits of significance
if_ne getbyte cspin, pinconfig, #0 'revert back to different primary CS pin number
if_ne call #hwinit 'setup HW into QSPI mode for upper 8MB sub-bank
pinlooptest
djnz a, #pinloop
bith ximm8, #16 'reverse order of nibbles after init
' setup the COG mailboxes and addresses
rep #2, #8 'setup loop to patch mailbox addresses
alti $+1, #%111_000 'increase D field
add cog0mboxdata, mbox 'apply base offset to mailbox data
setq #24-1
wrlong #0, mbox 'clear out mailboxes
' setup the polling loop for active COGs
cogid id
alts id, #id0 'determine id register of control COG
setd patchid, #0 'patch into destination address
push ptra 'save ptra before we lose it
mov ptra, #10
mul ptra, id
add ptra, #$20 'prep ptra for reloadcogs
alts id, #cog0_handler 'add to handler base
sets ctrlpollinst, #0-0 'patch into jump instruction
mul id, #3
setd ctrlpollinst, id
rdlut id, ptra[9] 'save original value
wrlut initctrl, ptra[9] 'prep LUT data for reloadcogs
call #reloadcogs
wrlut id, ptra[9] 'restore original value
pop ptra 'restore original ptra
' move LUT control vectors into temporary location to avoid clobbering them later
setd d, #addr1
sets d, #(ctrl_vect & $1ff)
rep #2, #8
alti d, #%111_111 'patch & increment d/s fields in next instr.
rdlut addr1-0, #$60-0
'setup control COG service handling, we need to patch 5 instructions
'one existing instruction is moved earlier and four instructions get replaced
cogid id
mov a, #(cog1-cog0) 'get code separation of handlers
mul a, id 'scale ID by separation
add a, #cog0+4 'add to base for COG0 and offset
setd d, a 'set this as the destination
add a, #2 'increment COG address
sets d, a 'set this as the source
alti d, #%111_100
mov 0-0, 0-0 'move instruction
sets d, #controlpatch 'set source of patched instructions
rep #2, #2 'patch two instructions
alti d, #%111_111
mov 0-0, 0-0
add d, const512 'skip two instructions
add d, const512
rep #2, #2 'patch two instructions
alti d, #%111_111
mov 0-0, 0-0
' setup register values for control vector loop setup after we return
mov header, id 'get cog ID
mul header, #10 'multiply by size of state memory per COG
add header, #$20 'add to COG state base address in LUT
mov trailer, header 'determine start/end LUT address
add trailer, #9 '...for control region
or id, initctrl 'set id field for control COG
altd id, #id0
mov 0-0, id 'setup id field for notification
mov ptrb, ptra 'get startup parameter address
add ptrb, #4 'ptrb[-1] will be cleared at notify
mov b, #0 'prepare b for upcoming loop
_ret_ push #notify 'continue init in mailbox area
controlpatch getnib request, addr1, #7 'instructions to patch for control COG
and request, #7
add request, ptra 'add request vector offset
rdlut request, request 'lookup jump vector service table
fit $100 'ensure all init code fits this space
long 0[$100-$] 'pad more if required until table ends
'..................................................................................................
' Error result handling and COG notification of request completion
unsupported callpa #-ERR_UNSUPPORTED, #err 'operation not supported
invalidbank callpa #-ERR_INVALID_BANK, #err'bank accessed has no devices mapped
invalidlist callpa #-ERR_INVALID_LIST, #err'invalid list item request
alignmenterror callpa #-ERR_ALIGNMENT, #err 'flash alignment error
busyerror mov pa, #-ERR_BUSY 'flash busy, falls through...
err altd id, #id0 'adjust for the running COG
bitl 0-0, #LIST_BIT 'cancel any list in progress by this COG
wrlut #0, ptra[8] 'cancel any resume state
skipf #%10 'dont notify with success code 0 below
wrlong pa, ptrb[-1] 'set error code in mailbox response
notify wrlong #0, ptrb[-1] 'if no error, clear mailbox request
testb id, #NOTIFY_BIT wz 'check if COG also wants ATN notification
decod a, id 'convert COG ID to bitmask
if_z cogatn a 'notify COG via ATN
' Poller re-starts here after a COG is serviced
poller testb id, #PRIORITY_BIT wz 'check what type of COG was serviced
if_nz incmod rrcounter, rrlimit 'cycle the round-robin (RR) counter
bmask mask, rrcounter 'generate a RR skip mask from the count
' Main dynamic polling loop repeats until a request arrives
polling_loop rep #0-0, #0 'repeat until we get a request for something
setq #24-1 'read 24 longs
rdlong req0, mbox 'get all mailbox requests and data longs
polling_code tjs req0, cog0_handler ']A control handler executes before skipf &
skipf mask ']after all priority COG handlers if present
tjs req1, cog1_handler ']Initially this is just a dummy placeholder
tjs req2, cog2_handler ']loop taking up the most space assuming
tjs req3, cog3_handler ']a polling loop with all round robin COGs
tjs req4, cog4_handler ']from COG1-7 and one control COG, COG0.
tjs req5, cog5_handler ']This loop is recreated at init time
tjs req6, cog6_handler ']based on the active COGs being polled
tjs req7, cog7_handler ']and whether priority or round robin.
tjs req1, cog1_handler ']Any update of COG parameters would also
tjs req2, cog2_handler ']regenerate this code, in case priorities
tjs req3, cog3_handler ']have changed.
tjs req4, cog4_handler ']A skip pattern that is continually
tjs req5, cog5_handler ']changed selects which RR COG is the
tjs req6, cog6_handler ']first to be polled in the sequence.
pollinst tjs req7, cog7_handler 'instruction template for RR COGs
ctrlpollinst tjs req0, cog0_handler 'instruction template for control
skipfinst skipf mask 'instruction template for skipf
'..................................................................................................
' List handler
dolist tjf addr1, #real_list 'if addr1 is all ones this is a real list
execf burstwrite 'otherwise do a burst write to this bank
real_list setq #8-1 'read 8 longs (largest request size)
rdlong addr1, hubdata '..to update the request state
tjns addr1, #invalidlist 'error if request list item not valid
altd id, #id0 'get COG state
bith 0-0, #LIST_BIT wcz 'retain fact that we are in a list
bith id, #LIST_BIT 'retain fact that we are in a list
if_z jmp #unsupported 'no list recursion is allowed!
getbyte request, addr1, #3 'get upper byte of this request
service_request altd request, #0 'get request address in COG RAM
execf 0-0 'process the request
'..................................................................................................
' Restoring per COG state and resuming where we left off
restore rdlut addr1, ptra[0] 'restore then continue with working state
rdlut hubdata, ptra[1]
rdlut count, ptra[2]
rdlut addr2, ptra[3] wc 'C=1 indicates an extended request size
getbyte request, addr1, #3
if_nc execf resume 'if not extended then resume immediately
rdlut total, ptra[4] 'we need to read the extended parameters
rdlut offset1, ptra[5]
rdlut offset2, ptra[6]
rdlut link, ptra[7]
rdlut orighubsize, ptra[9]
execf resume 'then resume what we were doing last time
'..................................................................................................
' Re-configuration of QoS settings and custom polling loop sequence generator
reconfig push #notify 'setup return addr, then reload
reloadcogs setq #8-1 'reload all per COG QoS params
rdlong addr1, coglist 'use addr1+ as 8 long scratch area
setd a, #id0
sets a, #addr1
setq ##!($FF + (1<<LIST_BIT))'preserve list flag and COG ID state
rep #2, #8 'repeat for 8 COGs
alti a, #%111_111
muxq 0-0, 0-0
patchid rdlut 0-0, ptra[9] 'restore static control COG ID information
cogid c
decod excludedcogs, c 'exclude driver cog initially
mov a, #$8 'a iterates through prio levels 8=lowest
neg pa, #1 'start with all ones
fillprio mov c, #7 'c iterates through cogs
prioloop alts c, #id0
mov b, 0-0
getword d, b, #1 'get burst field
test d wz 'if burst=0
if_z bith excludedcogs, c '...then exclude this COG from polling
if_z jmp #excluded
getnib d, b, #3 'get RR/PRI flag & priority
cmp d, a wz 'compare against current priority level
if_z rolnib pa, c, #0 'if matches include COG at this level
excluded djnf c, #prioloop 'repeat for all 8 COGs
incmod a, #15 wz 'next level
if_nz jmp #fillprio
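' Worked example of the packing done by fillprio/prioloop above (an illustrative
' sketch only; the COG numbers and priority nibbles are made up for the example):
'   pa starts as $FFFF_FFFF, so every nibble has bit 3 set (tested as "invalid" below).
'   Suppose COG 5 has priority nibble $9 and COG 2 has priority nibble $F, and every
'   other polled COG has a nibble below 8 (round robin, never matched by the 8..15 scan).
'     level a=9  : rolnib pa, c, #0 with c=5  ->  pa = $FFFF_FFF5
'     level a=15 : rolnib pa, c, #0 with c=2  ->  pa = $FFFF_FF52
'   The highest level is inserted last, so it ends up in nibble 0 and is the first
'   slot read by the loop that follows (COG 2, then COG 5, then six invalid slots).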
'determine priority cogs and build instructions for the polling sequence
mov pb, #0 'clear out set of priority COGs
mov a, #3 'start with no COGs being polled + 3 instructions
setd d, #polling_code 'initialize COGRAM write position
rep @endprioloop, #8 'test all 8 priority slots
testb pa, #3 wc 'test validity bit, c=1 if invalid
getnib c, pa, #0 'get COG ID at this priority level
if_nc testb pb, c wz 'check if already exists as priority COG
if_nc_and_nz bith pb, c 'and only add if it doesn't
if_nc_and_nz add a, #1 'add another COG to poll
if_nc_and_nz alts c, #cog0_handler 'determine jump address per COG
if_nc_and_nz sets pollinst, #0-0 'patch jump handler in instruction
if_nc_and_nz mul c, #3
if_nc_and_nz setd pollinst, c 'patch REQ slot to poll in instruction
if_nc_and_nz alti d, #%111_000 'generate new COG RAM write address
if_nc_and_nz mov 0-0, pollinst 'move the instruction to COG RAM
ror pa, #4 'advance to next priority
endprioloop
xor pb, #$ff 'invert to find all the non-priority COGs
andn pb, excludedcogs 'and remove any other excluded COGs
ones rrlimit, pb wz 'count the number of RR COGs
add a, rrlimit 'account for this number of RR COGs to poll
sub rrlimit, #1 'setup last RR count value for incmod
alti d, #%111_000 'generate the control polling instruction
mov 0-0, ctrlpollinst 'write the instruction
if_nz alti d, #%111_000 'if RR COG count not zero we need a skipf
if_nz mov 0-0, skipfinst 'add the skipf instruction
if_nz add a, #2 'account for the extra skipf overhead instructions
setd polling_loop, a 'save it as the repeat count
if_z ret 'we are done now, if no round robin COGs
' populate the round robin COG polling instructions
mov rrcounter, #2 'fill the RR poll instruction list twice
rrloop mov b, pb 'get the set of RR COGs
mov c, #0 'start at COG ID = 0
mov a, #0 'req mailbox COGRAM address for COG 0
nextrrcog shr b, #1 wcz 'C=1 if this COG ID is in the RR set, Z=1 when none remain
if_c setd pollinst, a 'patch REQ slot to poll in instruction
if_c alts c, #cog0_handler 'determine jump address
if_c sets pollinst, #0-0 'patch jump handler in instruction
if_c alti d, #%111_000 'generate new COG RAM write address
if_c mov 0-0, pollinst 'move the instruction to COG RAM
add c, #1 'increment the COG ID
add a, #3 'increase the request address
if_nz jmp #nextrrcog 'repeat for all COG IDs
_ret_ djnz rrcounter, #rrloop 'repeat twice, leave rrcounter zeroed
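' Illustrative sketch of what reconfig leaves in polling_code (hypothetical setup:
' COG 2 is a priority COG and COGs 1, 3 and 4 are round robin; the control poll is
' shown with the operands of the ctrlpollinst template as assembled):
'       tjs req2, cog2_handler   'priority COGs first, in the order read from pa
'       tjs req0, cog0_handler   'control poll (copy of ctrlpollinst)
'       skipf mask               'only emitted when there is at least one RR COG
'       tjs req1, cog1_handler   'RR list, first copy
'       tjs req3, cog3_handler
'       tjs req4, cog4_handler
'       tjs req1, cog1_handler   'RR list, second copy
'       tjs req3, cog3_handler
'       tjs req4, cog4_handler
' Repeat count stored in polling_loop for this setup: 3 + 1 (priority) + 3 (RR) + 2 = 9.
' rrlimit ends up as 2 here, the wrap value for an incmod over the three RR COGs.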
'..................................................................................................
' Code to get/set driver settings per bank or to dump COG/LUT state
set_latency ' (a) set latency
get_latency ' (b) get latency
set_burst ' (c) set burst size of bank
get_burst ' (d) get burst size of bank
' (e) dump state
getnib b, addr1, #6 ' a b c d get bank address
dump_state setq #511 ' | | | | e prepare burst write
wrlong 0, hubdata ' | | | | e write COG RAM to HUB
' | | | | e account for following AUGS
add hubdata, ##2048 ' | | | | e advance by 2k bytes
setq2 #511 ' | | | | e prepare burst write
wrlong 0, hubdata ' | | | | e write LUT RAM to HUB
add b, #16 ' a b | | | point to latency params
rdlut a, b ' a b c d | read data for bank
setbyte a, hubdata, #3 ' a | | | | patch latency
mov a, hubdata ' | | c | | patch burst/delay etc
wrlut a, b ' a | c | | if setting, save bank data
getbyte a, a, #3 ' | b | | | extract latency field only
wrlong a, ptrb ' | b | d | write result
jmp #notify ' a b c d e return success
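' Reading the EXECF skip masks for this block (a sketch, using the get_latency entry
' in ctrl_vect as the example): EXECF jumps to the address in bits 9..0 and uses bits
' 31..10 as a SKIPF pattern, least significant bit first, where 1 = skip.
'   mask %0000000000111001111110, read from the right:
'     0        getnib b, addr1, #6     executed
'     111111   setq .. wrlong (dump)   skipped (6 slots, the AUGS prefix counts as one)
'     00       add b, #16 / rdlut      executed
'     111      setbyte / mov / wrlut   skipped
'     000      getbyte / wrlong / jmp  executed
' This matches column (b) in the table above.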
'..................................................................................................
' Misc EXECF code
start_read_exec execf newburstr
start_write_exec execf resumewrites
continue_read_exec execf lockedreads
continue_write_exec execf lockedwrites
'..................................................................................................
' Variables
ximm8 long $6000_0008 '8 nibble transfers to pins
xrecvdata long $E000_0000 'arbitrary 4 bit reads from 4 bit bus pins
xsenddata long $A000_0000 'arbitrary 4 bit writes from hub to pins
xsendimm long $6000_0002 'arbitrary nx4 bit immediate writes from imm to pins
xfreq1 long $80000000
xfreq2 long $40000000
delay long 3
lutcodeaddr
startupparams
excludedcogs 'careful: shared register use!
frequency long lut_code - driver_start 'determine offset of LUT code from base
flags long 0
mask 'careful: shared register use!
resetmaskA long 0
limit 'careful: shared register use!
resetmaskB long 0
datapins long 0
const512 'careful: shared register use!
devicelist long 0
coglist long 0
mbox long 0
clkpin 'shared with code patched during init
clockpatch wxpin #1, clkpin 'adjust transition delay to # clocks
cspin 'shared with code patched during init
speedpatch setxfrq xfreq1 'instruction to set read speed to sysclk/1
registered long %100_000_000_00_00000_0 'config pin for clocked input
clkconfig long %100_000_000_01_00100_0 'config for Smartpin pulse output mode
clkdelay long 19
regdatabus long 0
deviceaddr long $10
rrcounter
pinmask long $ff7f7f7f
' jump addresses for the per COG handlers
cog0_handler long cog0
cog1_handler long cog1
cog2_handler long cog2
cog3_handler long cog3
cog4_handler long cog4
cog5_handler long cog5
cog6_handler long cog6
cog7_handler long cog7
expansion long gfxexpansion - driver_start
' EXECF sequences
newburstr long (%1111000000000011111100 << 10) + r_burst
lockedfill long (%0000000011101111100110 << 10) + w_locked_fill
restorefill long (%0001110100000001011100 << 10) + w_fill_cont
lockedreads long (%0000000001111000111100 << 10) + r_locked_burst
lockedwrites long (%0000000111100000111100 << 10) + w_resume_burst
resumewrites long (%0000000111100000000000 << 10) + w_resume_burst
resumereads long (%0000000011110000000000 << 10) + r_resume_burst
' SKIPF sequences
skiptable long %110110000 ' read modify write byte
long %101101000 ' read modify write word
long %011011000 ' read modify write long
read_skip long %11111110000110 ' extended single read skip sequence
write_skip long %1100011100011111110 ' extended single write skip sequence
fill_skip long %11000001000000010 ' extended fill skip sequence
burst_skip long %001111100000000 ' extended burst skip sequence
skipcase_a long %01101111100000000001000011111101
skipcase_b long %00000000010100000001100000010000
skipcase_c long %00000000011000000011111000010001
skipseq_write long %00000000000000000000111100000010
' LUT RAM address values
complete_rw long complete_rw_lut
continue_read long continue_read_lut
continue_write long continue_write_lut
noread long noread_lut
id0 long 0
id1 long 1
id2 long 2
id3 long 3
id4 long 4
id5 long 5
id6 long 6
id7 long 7
'The next 10 request registers are also temporarily reused during init
'and COG updates, and must immediately follow id0-id7
addr1 long 0
hubdata long 0
count long 0
addr2 long 0
total long 0
offset1 long 0
offset2 long 0
link long 0
burstwrite 'note shared register use during init
initctrl long $FEF01000 'round robin, burst=$fef0, no ATN, ERR on busy
id long 0
header long 0
trailer long 0
cmdaddr long 0
request long 0
rrlimit long 0
pinconfig long 0
clks long 0
resume long 0
orighubsize long 0
wrclks long 0
pattern long 0
pagesize long PAGE_SIZE
clkduty long $10002 'default is 50% clock @sysclk/2 rate
' temporary general purpose regs
a long 0
b long 0
c long 0
d long 0
fit 502
'..................................................................................................
orgh
lut_code
'HW init code up to 80 longs
'..................................................................................................
' Memory layout for LUT RAM once operational:
'
' LUT RAM address Usage
' --------------- ----
' $200-$20F Bank parameters: burst + type + size per bank (16)
' $210-$21F Pin parameters : latency + control pins per bank (16)
' $220-$26F COG state storage (8 COGs x 10 longs per COG)
' $270-$3FF Main PSRAM access code in LUTRAM
'
' Also during driver COG startup:
' $230-$24F is used for HW init setup
' $250-$26F is used as temporary vector storage
'..................................................................................................
org $230
' routines to (re-)initialize PSRAM chip into QSPI mode from whatever it was before
hwinit setxfrq xfreq2
pollxfi
mov pa, #$5F '$F5 - exit QSPI mode if we were in this mode
call #sendqspi
mov pa, ##$0FF00FF0 '$66 - reset enable
call #sendspi
mov pa, ##$F00FF00F '$99 - reset
call #sendspi
mov pa, ##$F0F0FF00 '$35 - enter quad spi mode
call #sendspi
ret
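' How the 32 bit SPI constants above map to command bytes (a sketch; it assumes the
' streamer sends the least significant nibble of pa first, as the $5F/$F5 pairing
' for the QSPI exit command suggests, and that only the lowest data pin matters
' to the device while it is still in SPI mode):
'   each command bit becomes one nibble, $F for 1 and $0 for 0, MSB of the byte in nibble 0
'     $66 = %0110_0110  ->  nibbles 0..7 = 0 F F 0 0 F F 0  ->  $0FF00FF0
'     $99 = %1001_1001  ->  nibbles 0..7 = F 0 0 F F 0 0 F  ->  $F00FF00F
'     $35 = %0011_0101  ->  nibbles 0..7 = 0 0 F F 0 F 0 F  ->  $F0F0FF00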
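sendqspi mov clks, #2 '2 clock pulses for the 2 nibble QSPI command
skipf #%110 'run the next instruction, then skip the two sendspi setup moves
mov pb, xsendimm 'streamer command for the 2 nibble immediate transfer
sendspi mov clks, #8 '8 clock pulses for the 8 nibble SPI command
mov pb, ximm8 'streamer command for the 8 nibble immediate transfer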
wxpin #1, clkpin
drvl cspin 'active low chip select
drvl datapins 'enable the DATA bus
wxpin clkduty, clkpin 'start clock
nop 'delay for clock edge alignment
xinit pb, pa 'send 32 bit immediate data
wypin clks, clkpin 'start memory clock output
waitxfi 'wait for the completion
fltl datapins 'float data bus
drvh cspin 'raise chip select
_ret_ waitx #200 'delay before return to ensure CS delay
long 0[$270-32-$]
fit $270-32 ' keep room for 32 vector longs
' EXECF vectors only used during bank initialization at startup time, reclaimed later for COG state
rw_vect ' PSRAM jump vectors
long (%0001101100110100001100 << 10) + r_single
long (%0001011100110100001000 << 10) + r_single
long (%0000111100110100010100 << 10) + r_single
long (%1111000000000011111100 << 10) + r_burst
long (%0001110100000000001100 << 10) + w_single
long (%0001110100000000001000 << 10) + w_single
long (%0001110100000000000100 << 10) + w_single
long (%0001111000000000000110 << 10) + w_burst
ro_vect ' R/O PSRAM jump vectors
long (%0001101100110100001100 << 10) + r_single
long (%0001011100110100001000 << 10) + r_single
long (%0000111100110100010100 << 10) + r_single
long (%1111000000000011111100 << 10) + r_burst
long unsupported
long unsupported
long unsupported
long unsupported
ctrl_vect ' Control jump vectors
long (%0000000000111001111110 << 10) + get_latency
long unsupported
long (%0000000001111011111110 << 10) + get_burst
long (%0000000001111111000000 << 10) + dump_state
long (%0000000011010001111110 << 10) + set_latency
long unsupported
long (%0000000011001011111110 << 10) + set_burst
long reconfig
no_vect ' Invalid bank jump vectors
long invalidbank
long invalidbank
long invalidbank
long invalidbank
long invalidbank
long invalidbank
long invalidbank
long invalidbank
fit $270
'..................................................................................................
' PSRAM READS
' a b c d e f
' B W L B R L (a) byte read
' Y O O U E O (b) word read
' T R N R S C (c) long read
' E D G S U K (d) new burst read
' T M E (e) resumed sub-burst
' E D (f) locked sub-burst
r_burst mov orighubsize, count ' d preserve the original transfer count
tjz count, #noread_lut ' d check for any bytes to send
r_single test count wz ' a b c | test for RMW (z=0 if RMW)
modz 5 wz ' a b c | test for RMW (z=1 if RMW)
andn addr1, #1 ' | b | | align to word boundary to prevent page issues
andn addr1, #3 ' | | c | align to long boundary to prevent page issues
wrlong #0, ptrb ' a b | | clear upper bits of byte/word mailbox result
wrfast xfreq1, ptrb ' a b c | setup streamer hub address for singles
r_resume_burst getnib b, request, #0 ' a b c d e get bank parameter LUT address
rdlut b, b ' a b c d e get bank limit/mask
bmask mask, b ' | | | d e build mask for addr
getbyte delay, b, #1 ' a b c d e get input delay of bank + flags
shr b, #17 ' | | | d e scale burst size based on bus rate
fle limit, b ' | | | d e apply any per bank limit to cog limit