[P8X32A]: wrbyte wz #1

Open
konimaru opened this issue Oct 8, 2015 · 25 comments
@konimaru
Owner

konimaru commented Oct 8, 2015

This used to work, i.e. it always works in test cases. Here it seems to depend on randomly inserted insns.

  • I found the general issue (emulator Z flag not being set)
  • wrbyte value, address wz with value == $xxxxxx00 should set the native Z flag
  • current theory is memory corruption (behaves the same on two different h/w platforms, C3+QS)
@konimaru konimaru added the bug label Oct 8, 2015
@konimaru
Owner Author

I placed a nop before and after a selected insn. The binary image stayed the same otherwise, i.e. no address references were affected. The only affected part was hub timing. Both images showed different behaviour (OK/NG).

This suggests interference from another cog doing hub ops.

@konimaru konimaru self-assigned this Oct 20, 2015
@konimaru
Owner Author

konimaru commented Oct 21, 2015

@konimaru
Owner Author

As a next step I tried to isolate the disturbance. The serial driver (running in parallel) does two rdlongs which fetch a non-zero value in byte lane 0 (last/first index). Creating a stand-alone code sequence shows the same behaviour. The serial driver is now started later (when everything is over) and the custom loop took its place.

By manipulating said loop I can make the emulator wrbyte wz call fail whenever I want (down to one non-zero read per hub window, i.e. always).

  • validate the cogid distance (currently 1): distance doesn't matter
  • figure out what's special about the emulator (standalone wrbyte wz seems to work)

@konimaru
Owner Author

It's safe to say that the behaviour we want (see OP) is not guaranteed. The 6502 emulator code has been rewritten to cope with that (minimal overhead, a single long IIRC). All I can say right now is it's complicated.

@konimaru konimaru changed the title wrbyte wz [6502]: wrbyte wz Apr 18, 2016
@konimaru konimaru changed the title [6502]: wrbyte wz [P8X32A]: wrbyte wz Apr 18, 2016
@konimaru
Owner Author

konimaru commented Feb 18, 2020

Time flies! Had another dig today and it jumped right into my face: if a cognew/coginit is in progress while the wrbyte wz is executed the Z flag remains zero (when it's expected to be set).

I'll chew on this a bit longer but I guess that's it.

@konimaru
Owner Author

konimaru commented Feb 18, 2020

As already pointed out, a (manual) rdlong sequence has the same effect. Which ties in nicely with cognew & Co.

@konimaru
Owner Author

konimaru commented Feb 19, 2020

current test case:

  • archived state is showing a good result (wrbyte wz working)
  • it includes 4 test cases (marked {TEST}) which - when activated - may affect the result
    • eins: cognew while emulator is running: NG
    • zwei: busy wrlong loop: OK
    • drei: busy cogid/rdlong loop: NG
    • vier: busy rdlong/cogid loop: OK

Note: if drei or vier are activated in combination with zwei the test result may be reversed (most likely timing). A busy rdlong loop will always be NG.

A good test result is indicated by the first printed 32bit hex value being of the form $000600??, a bad one will show $000400??.
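
For reading the printed value on the host side, the rule above can be sketched as a tiny classifier. This is Python rather than PASM so it can run anywhere, and the name `classify_result` is made up for this illustration; it is not part of the archived test case.

```python
def classify_result(value: int) -> str:
    """Classify the first printed 32-bit hex value of the test case.

    Hypothetical helper: good results look like $000600??,
    bad ones like $000400?? (the low byte is a don't-care).
    """
    upper = (value >> 8) & 0xFFFFFF  # drop the ?? byte
    if upper == 0x000600:
        return "OK"
    if upper == 0x000400:
        return "NG"
    return "unknown"  # anything else is not described in the thread


if __name__ == "__main__":
    for sample in (0x000600A5, 0x00040012):
        print(f"${sample:08X}: {classify_result(sample)}")
```

Anything that matches neither form is reported as `unknown` rather than guessed at.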

wrbyte.wz - Archive [Date 2020.02.19 Time 07.49].zip

OK
OK
NG
NG

@Wuerfel21

Wuerfel21 commented Feb 19, 2020

Hmm, something is up, can reproduce your test case failing.

Now here's a fun one: a much simpler test case where one can mess up everything at the price of uncommenting a single inconspicuously marked instruction
test_wrbyte_wz.zip

@konimaru
Owner Author

konimaru commented Feb 19, 2020

Brilliant! Pattern %011 is something I haven't seen yet.

@Wuerfel21

I think the more interesting observation to be had here is that hammering a single location never triggers the bug, whereas linear reading does it reliably

@konimaru
Owner Author

Single-location reads trigger it as well: if you remove the cogid insn in my test case you always go south.

DAT             org     0

five            rdlong  par, :src
                jmp     #five
:src            long    $8001

gives me a $000400?? reliably.

@konimaru
Owner Author

konimaru commented Feb 19, 2020

Fact is that one cog can screw up someone else's ALU results, under the assumption that wrbyte wz is not illegal.

I still think we are missing the bigger picture :)

@konimaru
Owner Author

konimaru commented Feb 19, 2020

Additional note re: five, that location holds a non-zero value. When I attach the rdlong to say $7FFC (default 0) the test passes. When I put a non-zero value into byte lane M it subsequently fails (wrbyte to 4n+M). IOW it's not just reading from hub, content is also important.
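
To make the byte-lane wording concrete, here is a minimal Python sketch (not PASM, so it can be run on the host). The helper names `byte_lane` and `lane_value` are hypothetical; the only assumption is the Propeller's little-endian layout, where byte 4n+M of a long occupies bits 8M..8M+7.

```python
def byte_lane(address: int) -> int:
    """Byte lane M of a hub address of the form 4n+M."""
    return address & 3


def lane_value(long_value: int, lane: int) -> int:
    """Byte sitting in the given lane of a 32-bit long (little-endian)."""
    return (long_value >> (8 * lane)) & 0xFF


if __name__ == "__main__":
    write_addr = 0x4003        # example wrbyte target, lane 3
    read_long = 0x00FFFFFF     # example long with a zero in lane 3
    lane = byte_lane(write_addr)
    print(lane, hex(lane_value(read_long, lane)))
```

The addresses and values in the `__main__` part are arbitrary examples, not taken from the test archive.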

@konimaru
Owner Author

I can also generate %011 on demand now. Original value 0, rdlong loop with 0 in byte lane 3 (the wrbyte is to 4n+3).

  • original value gives %001
  • byte-lane-3-zero-read appears to override the wrbyte -> %010
  • dec -> -1 -> %0--

@konimaru
Owner Author

konimaru commented Feb 19, 2020

So it looks like an ongoing rdxxxx's value in the relevant byte lane will influence the Z flag for any wrbyte wz on that same byte lane. How cool is that?

@Wuerfel21

Oh no, now we can do a sidechannel attack to steal data from other cogs! 😉

But that all still doesn't explain why my test doesn't trip when hammering every address individually vs reading memory linearly. The result stays the same if I add a nop to take the place of the commented-out add, so it's not related to the read's waitstates. Speaking of which, do the wrbyte wz's waitstates have any influence?

@konimaru
Owner Author

konimaru commented Feb 19, 2020

If you let the garbo cog read only from $8000 (without add) it goes wrong as well.

@Wuerfel21

Wuerfel21 commented Feb 19, 2020

Oh, my test is just busted, sorry. (garbo cog should be getting ptr into PAR, but is getting a pointer to itself)

(But why would hammering reads of mov testbyte,par not affect the flag?)

EDIT 2: oof, confused the labels. It was hammering the same location that was being wrbyte'd - of course that wouldn't show up 🤦‍♀️

@Wuerfel21

Also, interesting: If a cog that hammers writes is put between the read cog and the test cog, that doesn't change anything at all - the result from the last RD**** seems to linger for a while?

Also, is it just me and my busted tests or can the "false NZ" case only occur when the read is from ROM?

@konimaru
Owner Author

konimaru commented Feb 19, 2020

> Also, is it just me and my busted tests or can the "false NZ" case only occur when the read is from ROM?

long[$7FFC] := -1 and then let the garbo cog read from there (failure %100).

@Wuerfel21

Well, I guess I didn't have any -1's in RAM

@konimaru
Owner Author

konimaru commented Feb 22, 2020

Well, it has been an interesting exercise.

  • wrbyte wz is influenced by any read operation regardless of which cog did the read
  • if the read and write addresses are in the same 4K block (e.g. $4xxx) then everything works as expected, Z is set according to what is written
  • if both addresses live in different 4K blocks then the Z flag is set depending on the content of the read address regardless(!) of what is written to the write address

Note: When the write address is located in ROM the written value is ignored, Z is set according to the destination value.
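
The three observations above can be restated as a small behavioural model. This is only a sketch in Python of what was observed, not a description of the silicon: the function name `predicted_z` and its parameters are invented for illustration, the byte-lane detail is taken from the earlier comments in this thread, and the ROM note is left out because the thread does not say how it combines with a concurrent read.

```python
from typing import Optional

BLOCK_SHIFT = 12  # 4K blocks, e.g. $4xxx


def predicted_z(write_value: int, write_addr: int,
                read_addr: Optional[int] = None,
                read_long: int = 0) -> bool:
    """Model of the Z flag after `wrbyte value, address wz` (hypothetical).

    read_addr/read_long describe a hub read done by any cog at the same
    time; read_addr=None means no interfering read.
    """
    written_byte = write_value & 0xFF
    same_block = (read_addr is not None and
                  (read_addr >> BLOCK_SHIFT) == (write_addr >> BLOCK_SHIFT))
    if read_addr is None or same_block:
        # Expected behaviour: Z reflects the byte that is written.
        return written_byte == 0
    # Different 4K block: Z follows the read data in the written byte's lane.
    lane = write_addr & 3
    return ((read_long >> (8 * lane)) & 0xFF) == 0


if __name__ == "__main__":
    # Writing 0 while another cog reads -1 from $7FFC: Z is not set.
    print(predicted_z(0x00, 0x4003, read_addr=0x7FFC, read_long=0xFFFFFFFF))
    # Writing non-zero while the read returns 0 in lane 3: Z is set.
    print(predicted_z(0x01, 0x4003, read_addr=0x7FFC, read_long=0x00000000))
```

The write address $4003 is an arbitrary example; the $7FFC read with -1 mirrors the `long[$7FFC] := -1` experiment mentioned earlier.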

@Wuerfel21

Random thought now that it came up again somewhere else: Does the behaviour depend on a timing peculiarity? Because I don't remember anything like this in the Verilog. In particular, the 4K block thing, I don't think the Verilog ever divides memory into 4K blocks.... But now look at this die shot:
[image: die shot]
Hmmmmmmmmmmmmmmmm... 🤔

I think it'd be interesting to see if the behaviour changes at PLL1X or RCSLOW. Gotta try that one of these days.

@konimaru
Owner Author

That could be an explanation (seen this image probably too often but never made this connection)! I'll see if I can find some time this evening.

@konimaru
Owner Author

Clock frequency doesn't change the observed behaviour.

  • start with RCSLOW
  • run test
  • change to XTAL1|PLL16X|5M
  • display result
