A closer look at interrupts

byuu · Post by **byuu** » Thu Sep 22, 2005 3:47 am

We've been treating $4210/$4211 bit 7 and actual /NMI raise as the same, but they're not.

In reality... the CPU has its own clock running that tries to simulate the PPU's dot counters (I guess so it doesn't need to poll the PPU for IRQs and such), but these counters don't take long dots into account.

I'll just explain how NMI works since I'm still dreading how I'm going to go about doing IRQ...

There are two variables needed, nmi_line and nmi_read. nmi_line is /NMI, and nmi_read is for $4210 reads.

When the CPU VTIME / HTIME = [225/240] / 0.5 (HC=2), it lowers nmi_read.
When you read from $4210, it returns the inverted status of nmi_read, and then it sets nmi_read back to high. HOWEVER, if VTIME == 225/240 and HTIME == 0.5 (meaning you read $4210 at the exact same instant that nmi_read was lowered), then nmi_read is not raised. Why? Probably a bus conflict or something. The CPU is probably still sending the signal that lowers nmi_read, and so the $4210 read doesn't raise it or something. Bah.
nmi_read is NOT raised at the start of a new frame, but it is upon $4200 write with bit 7 clear.
Next variable, nmi_line, or /NMI as we call it now.
nmi_line is set when CPU VTIME / HTIME = [225/240] / 2.5 (HC=6). 4 cycles later than nmi_read. Very important.
nmi_line is lowered immediately, with no regard to $4200 bit 7. Now when you reach the last bus cycle of the opcode, the CPU checks to see if /NMI is low, and if it is, sets it back to high. Now it sees if NMI interrupts are enabled, and if they are, then on the first bus cycle it gets executed.
nmi_line is probably raised at the start of a new frame, I haven't checked yet. That goes with previous knowledge, however.
But clearly you can see how these two cannot possibly be done with just one variable, /NMI.
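
To make the two-variable idea concrete, here is a minimal sketch in C of the model described in this post. All names other than nmi_line and nmi_read are made up for illustration, the HC=2 / HC=6 positions are simply the ones quoted above (later posts in this thread revise them), and only bit 7 of $4210 is modelled.

```c
#include <stdbool.h>

/* Hypothetical state holder. true = line high (idle), false = line low. */
typedef struct {
    bool nmi_read;   /* backs $4210 bit 7 (bit reads as set while this is low) */
    bool nmi_line;   /* the actual /NMI signal seen by the CPU core */
} NmiState;

/* Call as the CPU's own counters advance. vblank_line is 225 or 240. */
void nmi_tick(NmiState *s, int v, int hc, int vblank_line) {
    if (v != vblank_line) return;
    if (hc == 2) s->nmi_read = false;   /* $4210 flag position */
    if (hc == 6) s->nmi_line = false;   /* /NMI, lowered regardless of $4200 bit 7 */
}

/* $4210 read: returns the inverted nmi_read in bit 7, then raises nmi_read,
   except when the read lands on the exact position where it was lowered. */
unsigned char read_4210(NmiState *s, int v, int hc, int vblank_line) {
    unsigned char value = s->nmi_read ? 0x00 : 0x80;
    bool same_instant = (v == vblank_line && hc == 2);
    if (!same_instant) s->nmi_read = true;
    return value;
}

/* $4200 write: a write with bit 7 clear raises nmi_read. */
void write_4200(NmiState *s, unsigned char value) {
    if ((value & 0x80) == 0) s->nmi_read = true;
}

/* Last bus cycle of an opcode: acknowledge /NMI if it is low, and report
   whether the NMI handler should actually run ($4200 bit 7 enabled). */
bool nmi_last_cycle_test(NmiState *s, bool nmi_enabled) {
    if (s->nmi_line) return false;
    s->nmi_line = true;
    return nmi_enabled;
}
```

The point of the sketch is only that read_4210() and nmi_last_cycle_test() touch different variables, so a $4210 read cannot swallow a pending /NMI in this model.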

This does have some impact on other things, too. Take for example lda $4210 with p.m = 0.
$4210 bit 7 will return set when you're at { 225/240, 0.5 }, even though it isn't the last cycle. With the previous method, it wouldn't be set until the last cycle of the opcode (for the $4211 read).

And just to explain the SNES pipeline for the sake of completeness, take lda $4210 with P.M = 1. The pipeline has two stages, the bus stage and work stage.
The work stage is always 1 cycle behind the bus stage for obvious reasons (how do you manipulate data WHILE it's being read?).
So we get:
<Note that each B/W block happens at the exact same time...>
[B1] opfetch
[W?] <last work cycle of previous opcode>

[B2] opread
[W1] <nothing to do>

[B3] opread
[W2] <nothing to do>

/NMI and /IRQ lines are tested here...

[B4] memread
[W3] <nothing to do>

/NMI and /IRQ are invoked here if necessary, /NMI takes priority.

[B1] opfetch
[W4] <$4210 result is now available>

So, even the real SNES has to know when it's on the last bus cycle of an opcode in order to test /NMI, /IRQ at the right time. It can't do so any later or the results won't match what we see.
So then there's no real reason to emulate the pipeline. Just add a last_cycle() call to the final cycle of every single opcode. It's easier than it sounds: there's never more than two possible last cycles per opcode.
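
A rough sketch of what such a last_cycle() hook could look like, in C. The Cpu struct, its field names, and bus_cycle() are placeholders invented for this example, not code from any emulator; the only things taken from the post are that the lines are tested before the final bus cycle, that /NMI is set back to high when it is seen low, and that /NMI takes priority over /IRQ.

```c
#include <stdbool.h>

/* Hypothetical CPU state; field names are illustrative only. */
typedef struct {
    bool nmi_line_low;   /* /NMI currently asserted */
    bool irq_line_low;   /* /IRQ currently asserted */
    bool nmi_enabled;    /* $4200 bit 7 */
    bool irq_masked;     /* P.I flag set (after sei) */
    bool run_nmi;        /* enter the NMI handler instead of the next opfetch */
    bool run_irq;        /* enter the IRQ handler instead of the next opfetch */
    int  cycles;         /* bus cycles executed so far */
} Cpu;

/* Stand-in for a real bus access; here it only counts cycles. */
static void bus_cycle(Cpu *c) { c->cycles++; }

/* Sample the interrupt lines once per opcode, right before its last bus cycle. */
void last_cycle(Cpu *c) {
    c->run_nmi = false;
    c->run_irq = false;
    if (c->nmi_line_low) {
        c->nmi_line_low = false;            /* line goes back high once seen */
        if (c->nmi_enabled) { c->run_nmi = true; return; }  /* NMI has priority */
    }
    if (c->irq_line_low && !c->irq_masked) c->run_irq = true;
}

/* lda $4210 with P.M = 1: four bus cycles, matching the B1..B4 diagram above. */
void op_lda_addr_8bit(Cpu *c) {
    bus_cycle(c);   /* B1 opfetch */
    bus_cycle(c);   /* B2 opread */
    bus_cycle(c);   /* B3 opread */
    last_cycle(c);  /* /NMI and /IRQ tested here */
    bus_cycle(c);   /* B4 memread */
}
```

After an opcode function returns, the dispatcher would look at run_nmi / run_irq before doing the next opfetch.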
By doing so, you instantly support anomie's chaos IRQ tests. Like:

Code:

phk
ldx.w #next : phx
cli
php
sei
;set IRQ
wai
cli
rti
next:
;no IRQ occurred!

Madness. I'm really beginning to hate this system...

I'm quite certain the same holds true for /IRQ, but I guess I'll have to emulate that behavior first, won't I?

What I can say of it right now is that this explains the difference in results for me and anomie:

yx trigger point
00 => Never
01 => H-IRQ: every scanline, H=HTIME+~3.5
10 => V-IRQ: V=VTIME, H=~2.5
11 => HV-IRQ: V=VTIME, H=HTIME+~3.5

Whereas I was using HTIME+4.5 (18) and 3.5 (14).
That's when the /IRQ pin is raised, but reading $4211, as anomie was obviously doing, will be HTIME+3.5 (14), and 2.5 (10).

EDIT: Ok, finished implementing this. It would appear that $4211 read is set at the same time as /IRQ goes low... unlike NMI. So then:
H = ($4200 & 0x10) ? (HTIME) : 0;
H = (H != 0) ? ((H << 2) + 18) : 14;
At least, the results are a lot closer this way... still working on it though...

byuu · Post by **byuu** » Fri Sep 23, 2005 8:42 am

Damnit, I rescind my earlier information, thanks to a lot of help from TRAC.

As it stands, I can't get perfect timing for NMI or IRQ, but I'm extremely close.

I wrote 3 test ROMs that any emulator author might want to look into.
They are available here: http://byuu.cinnamonpirate.com/temp/nmi_irq.zip
In it:

[demo_nmi.smc, demo_nmi.asm]
This tests various edge cases in NMI interrupts. Not all of them, but quite a lot. The source code explains what each test is, and the test number that fails is written to SRAM (in case your emulator lacks a debugger to check).

[demo_irq.smc, demo_irq.asm]
This tests a lot more edge cases for IRQ. Don't even think about trying to get this one to pass unless you support testing to see if IRQs should trigger on the last bus cycle of each opcode, as opposed to the first.

[nmi.smc, nmi.srm, ufo.srm]
This is a really bruteforce NMI timing test. It just keeps trying to read $4210 and then when bit 7 is set, writes the latch position to a file. The NMI interrupt does the same. It shows off the phenomenon where when you read $4210 at V=225/240, HC=2 (H=0.5), /NMI stays low. You can see it because sometimes the Y latch position in the SRAM file is $e2 instead of $e1, like you might expect. $4210 is returning the highest bit set twice.
Basically, with TRAC's new timing information, I believe that the HC=2, HC=6 information was wrong. /NMI probably raises at HC=2 only. But my results still aren't perfect. They were before with the HC=6, though... my test now fails because I lower /NMI when $4210 is read (real hardware does that, verified in demo_nmi.smc).

For what it's worth, no publicly released emulator passes any of these tests. A red screen denotes failure, blue for passing.

PM me for the URL to the latest WIP with the source code for how I emulate NMI/IRQ to pass the first two tests, if you want.

Overload · Post by **Overload** » Sun Sep 25, 2005 6:41 am

byuusan wrote: [demo_nmi.smc, demo_nmi.asm]
This tests various edge cases in NMI interrupts. Not all of them, but quite a lot. The source code explains what each test is, and the test number that fails is written to SRAM (in case your emulator lacks a debugger to check).

Super Sleuth 1.03 and the current 1.04 preview passes this test, or at least gives a blue screen.

Something else that's worth mentioning, in case you want to add support for Super FX or SA-1. Both Super FX and SA-1 can send interrupts to the SCPU via the cartridge irq line. Thus there is the need to add support for an external interrupt flag. In some cases both the CPU Timing circuitry and the cartridge co-processor can issue an interrupt request at the same time. If this happens, reading TIMEUP ($4211) will only clear the internal interrupt and the SCPU will re-interrupt after the handler has returned because the cartridge irq line remains high.

Reading $3031 on the Super FX and writing $80 to $2202 on the SA-1 will clear the cartridge irq line.
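
As a small illustration of the external interrupt flag described above, here is a sketch in C. The struct and function names are invented for the example, and cart_irq_acknowledge() just stands in for whatever access the coprocessor needs (the $3031 read or the $2202 write mentioned above); no real Super FX or SA-1 behaviour is modelled beyond that.

```c
#include <stdbool.h>

/* Two independent interrupt sources feeding the same CPU /IRQ input. */
typedef struct {
    bool timer_irq;  /* internal H/V timer flag, visible in TIMEUP ($4211) bit 7 */
    bool cart_irq;   /* cartridge irq line, driven by the coprocessor */
} IrqSources;

/* The CPU sees an interrupt request while either source is active. */
bool irq_asserted(const IrqSources *s) {
    return s->timer_irq || s->cart_irq;
}

/* Reading TIMEUP returns and clears only the internal flag. */
unsigned char read_4211(IrqSources *s) {
    unsigned char value = s->timer_irq ? 0x80 : 0x00;
    s->timer_irq = false;
    return value;
}

/* Hypothetical hook called when the coprocessor's own acknowledge
   register is accessed; this is what releases the cartridge line. */
void cart_irq_acknowledge(IrqSources *s) {
    s->cart_irq = false;
}
```

With both flags set, a handler that only reads $4211 returns with irq_asserted() still true, so the CPU re-enters the handler, which matches the re-interrupt behaviour described above.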

byuu · Post by **byuu** » Sun Sep 25, 2005 7:40 am

Super Sleuth 1.03 and the current 1.04 preview passes this test, or at least gives a blue screen.

Oh, so it does! Awesome. The IRQ one fails at test 4, meaning you're giving time back to the main routine between each /IRQ call when $4211 is never cleared, if even a single opcode. Still, best of any publicly released emu :D

I'm afraid of the SA-1 and Super FX. Given how perfect I like everything, I'd probably go insane trying to get that stuff working...

I have some more news about NMIs, at least...

I wrote individual tests, rather than trying to figure out $4210 read + /NMI lower behavior all at the exact same time. Interesting new findings...

First, the /NMI line goes low at V=225/240, HC=0. It only goes low on the last cycle of the opcode, not when the HV counter reaches that position. Important when you do something like stz $4200 with p.m = 1 (the NMI enable flag will change in the middle of the opcode).
$4210.d7 is set at V=225/240, HC=2. Reading it at HC=2 -or- HC=4 will result in the bit staying set, so the next read will also have $4210.d7=1.
I can't tell if reading $4210 clears pending NMIs or not. One test suggests it does (leaving $4200.d7 off until V=228, reading $4210, setting $4200.d7 = no NMI), another suggests it doesn't (reading $4210 on V=224, HC=1362, or V=225, HC=0, 2, 4, or 6 = NMI still triggers).
Now, even worse news. stz $4200 with p.m clear.
Writes to $42xx registers require 6 mcycles before the value is taken (e.g. $4201 latch behavior), that means the write to $4200 ends RIGHT as the last bus cycle of the opcode begins, and tests /NMI. The bad part: say NMI is enabled, and that write clears NMI at the same time the last cycle /NMI test occurs, it will use the value BEFORE the $4200 write posts. Good freaking luck emulating this behavior. It makes more sense when you see the opcode broken down into pipelined form:

http://byuu.cinnamonpirate.com/temp/nmi_chaos.txt

Eh, screw it. I think I'll just emulate the pipeline already...

Now to lure anomie here for input... :)

EDIT: Upon further examination, the pipeline thing probably won't help here either... to verify when /NMI goes low exactly (last cycle or anywhere within opcode?) I decided to try using mvn, since its write cycle is cycle 5, and cycles 6 + 7 are i/o cycles... results:

<note: verified that the cycle ordering was correct for the write cycle on hardware first, it is cycle 5 of 7>

Code:

--- stz $4200 ;p.m = 0
0100,2000,4011,3995 : 0
* 4200 write at 225,   4 ;<224,1362 write start>
* /NMI test at 225,   4 = 0
0100,2000,4010,3996 : 1 <snes>
* 4200 write at 225,   6 ;<225,   0 write start>
* /NMI test at 225,   6 = 1
0100,2000,3995,4011 : 1 <bsnes>

--- mvn $00,$00
0100,2000,4004,4000 : 0
* 4200 write at 225,   4 ;<224,1362 write start>
* /NMI test  at 225,  10
0100,2000,4003,4001 : 1
* 4200 write at 225,   6 ;<225,   0 write start>
* /NMI test  at 225,  12

The above 4200 write positions are using the mcycle delay of 6 for writes, like 4201. However, if I remove the 2mcycle read delay and the 6mcycle write delay, my results now show that $4210.d7 is set and /NMI is lowered at the same position, V=225/240, HC=0.
Could $4200 and $4210 be exempt from the delays, since the NMI circuitry is directly inside the CPU? Given the pipeline, it would still be waiting the full 6mcycles anyway before touching the data... but if it were just an internal variable, it'd be almost just like touching a CPU register: instantaneous. Whereas $4201 has to go all the way out to the PPU to latch the counters.

Now, if I assume that the write to $4200 and read from $4210 are instantaneous, then by setting /NMI low immediately when that position is reached, and that the value of $4200.d7 -AT- V=225/240,HC=0 is what is used... I am able to match the hardware results in the above two tests.
Any thoughts?

Overload · Post by **Overload** » Mon Sep 26, 2005 12:17 am

byuusan wrote:
Super Sleuth 1.03 and the current 1.04 preview passes this test, or at least gives a blue screen.
Oh, so it does! Awesome. The IRQ one fails at test 4, meaning you're giving time back to the main routine between each /IRQ call when $4211 is never cleared, if even a single opcode. Still, best of any publicly released emu

Test 6 of demo_irq fails on my snes for obvious reasons, maybe the test would be more complete if you added a pal/ntsc check and case.

byuusan wrote: I have some more news about NMIs, at least...

The code below is taken directly from Wildguns. The DMA transfer starts just before the vertical blank period and ends during it. I can understand the next opcode being executed before the jump to the NMI handler, but how are two opcodes being executed?

lda #$01
sta mdmaen
lda #$02
tsb $66
-> nmi handler

byuu · Post by **byuu** » Mon Sep 26, 2005 12:49 am

Test 6 of demo_irq fails on my snes for obvious reasons, maybe the test would be more complete if you added a pal/ntsc check and case.

I don't have a PAL SNES, copier, or TV... nor the power supply conversion stuff I'd need even if I did have all of that, so I can't test to see which dots never fire. You bring up a good point though, I use the same invalid dot values for NTSC -and- PAL at the moment...
I will however modify the code to skip test 6 on a PAL SNES, at least...
BTW, I take it PAL scanlines are also 1364/scanline? The only thing I changed for PAL timing was adding the extra scanlines at the bottom, and chopping the framerate down, and lowering the CPU speed slightly for APU synchronization. Does it still have the long dots, the missing dot on scanline 240 non-interlace frame 1, etc?

The code below is taken directly from Wildguns

Wild Guns is an evil, evil game... I've no idea how zsnes9x can run that thing properly without a hack... so I take it Wild Guns flickers like that because $66 bit 1 isn't set during the NMI routine?

anomie mentions that when a DMA completes, there's a 24-36 or so mcycle delay before the NMI interrupt is triggered ("time enough for an opcode or two")... I haven't gotten far enough to test DMA<>NMI interaction myself, but it also doesn't make any sense to me why more than one opcode after the DMA would be necessary... even the pipelining would only throw things off by one cycle, and not by one full opcode.

As it stands, the only thing I have left before DMA interaction is a problem where on real hardware when HC=4, the NMI shouldn't go low until the next opcode. That makes no sense to me because every other test indicates it goes low at HC=0... maybe I'll just skip it for now :/

byuu · Post by **byuu** » Mon Sep 26, 2005 11:38 am

Oh, man... this is fun. I went back and reconfirmed all my previous tests, adding in a ton of debugging output from my emu to track things, and I found a lot of new things in the process. As of now, I pass the NMI test, the IRQ test, get 1,022 of 1,024 latch positions perfect in the NMI timing test, and get at least the first couple IRQ latch positions correct in the IRQ timing test (not included in the above ZIP).

Here we go. First of all, I was saying that all reads happened 2 clock cycles into the bus cycles, and writes happened 6 clock cycles into the bus cycle. That just doesn't work with all of the new NMI / IRQ findings, so from now on: all of my results assume that the read /and/ writes occur 0 clock cycles into the bus cycle. I special case $2137 and $4201 now. You'll see why in a minute.
Now, I'm sure there's still delays in reading and writing to various memory addresses. My theory is that the delays to $2137/$4201 are because the CPU has to communicate with the PPU. Whereas with $4200,$4210, and $4211, the variables are right there with the CPU, and much like the CPU registers, they don't require a bus fetch to get their values.

NMI
NMI triggers at V=225/240,HC=0.
If you read $4210 at V=225/240,HC=[0,2] then d7 stays set.
If you read $4210 at V=225/240,HC=[0,2,4] then if an NMI was scheduled to fire at last_cycle, it is delayed until the next opcode. It still fires, but not until one opcode later.
Reading $4210 will never prevent an NMI from firing.
Clearing $4200.d7 at V=224,HC=1362 will not stop the NMI from firing. Clearing $4200.d7 at V=225,HC=0 will stop it from firing, however.
This is why I had to remove my previous timing based on writes occurring 6 clock cycles into the bus cycles... if the write occurs 6 clock cycles in, then this behavior here wouldn't make any sense.
Oh, and I test clearing $4200 with mvn and stz (in 16-bit mode), so there's always at least 1 or 2 cycles before the last opcode cycle, so it isn't a matter of $4200.d7 being lowered in the last bus cycle that's causing this behavior...
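
The $4210 read rules in the list above can be written down as a small lookup, sketched here in C. The type and function names are invented for the example; it only encodes the two read windows given above (HC=0,2 and HC=0,2,4 on the vblank line) and deliberately leaves the $4200 write behaviour out.

```c
#include <stdbool.h>

/* What a $4210 read does besides returning a value (hypothetical type). */
typedef struct {
    bool keeps_flag_set;  /* d7 is not cleared by this read */
    bool delays_nmi;      /* a pending NMI fires one opcode later */
} Read4210Effect;

/* vblank_line is 225 or 240; hc is the clock position of the read. */
Read4210Effect classify_4210_read(int v, int hc, int vblank_line) {
    Read4210Effect e = { false, false };
    if (v == vblank_line) {
        e.keeps_flag_set = (hc == 0 || hc == 2);
        e.delays_nmi     = (hc == 0 || hc == 2 || hc == 4);
    }
    /* There is no "cancelled" outcome: a read never prevents the NMI. */
    return e;
}
```

For example, a read at V=225, HC=4 clears d7 as usual but still pushes the NMI back by one opcode.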

IRQ
V=VTIME
H=(HTIME != 0) ? ((HTIME << 2) + 12) : (8);
This is the same as anomie has, except minus 2 for not counting 2 clock cycles into the bus cycle anymore.
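
The same formula as a C function, with a few worked values. The function name is made up for the example; the arithmetic is exactly the expression above, where HTIME is in dots and the result is in clock cycles (4 clocks per dot, so anomie's HTIME+3.5 dots is 14 clocks, and 14 minus 2 gives the 12 used here).

```c
/* Clock position (HC) on scanline V=VTIME at which /IRQ goes low,
   per the formula above. htime is the HTIME register value in dots. */
int irq_trigger_hc(int htime) {
    return (htime != 0) ? ((htime << 2) + 12) : 8;
}

/* Worked values:
 *   htime = 0   ->  8
 *   htime = 1   ->  (1 << 2) + 12  = 16
 *   htime = 32  ->  (32 << 2) + 12 = 140
 */
```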

Let's use the example of V=225,H=0. So /IRQ goes low immediately when the CPU reaches V=225,HC=8.
If you read $4211 at V=225,HC=8 or 10, then $4211.d7 will stay set, and the IRQ won't trigger until the next opcode, just like NMI.
Different though, is if you read $4211 at V=225,HC=12. In this case, $4211.d7 will lower, and the IRQ will not fire at all.
The following code is thus extremely dangerous and if used in any game, could result in preventing the IRQ from occurring:
- lda $4211 : bpl -
Reading at V=225,HC=14 or higher will make the IRQ act normally.
Writing to $4200 and clearing d4+d5 at HC=12 or lower will prevent the IRQ from occurring. Writing at HC=14 or higher will allow the IRQ to fire before IRQs stop occurring (because V+HIRQs just got disabled...)
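
A sketch in C of the read and write windows just described. The names are invented, and expressing the windows relative to the trigger position is an assumption made for the example: the numbers above were only given for the H=0 case, where /IRQ goes low at HC=8.

```c
#include <stdbool.h>

typedef enum {
    IRQ_NORMAL,              /* IRQ fires as usual */
    IRQ_DELAYED_ONE_OPCODE,  /* $4211.d7 stays set, IRQ fires one opcode later */
    IRQ_CANCELLED            /* $4211.d7 is cleared and the IRQ never fires */
} IrqReadEffect;

/* Effect of a $4211 read at read_hc on the scanline where /IRQ goes low
   at trigger_hc (8 in the V=225,H=0 example). */
IrqReadEffect classify_4211_read(int read_hc, int trigger_hc) {
    int delta = read_hc - trigger_hc;
    if (delta == 0 || delta == 2) return IRQ_DELAYED_ONE_OPCODE;  /* HC=8, 10 */
    if (delta == 4) return IRQ_CANCELLED;                         /* HC=12 */
    return IRQ_NORMAL;                                            /* HC=14 and up */
}

/* A $4200 write clearing d4+d5 prevents the IRQ at HC=12 or lower
   (trigger + 4 in the example), and is too late from HC=14 on. */
bool irq_disable_write_prevents(int write_hc, int trigger_hc) {
    return write_hc <= trigger_hc + 4;
}
```

The polling loop shown above is dangerous precisely because any of its reads can land on the IRQ_CANCELLED position.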

To elaborate on the effects of reading $4211 some more... take the following:
V=225,H=0
$47,0 <225, 6>
$48,0 <225, 8> <$44>
$48,0 <225, 10> <$44>
$00,1 <225, 12>
$45,0 <225, 14>

The first value is the H position that the IRQ handler sets. $00 means the IRQ handler never got called, and thus the value was never set.
The next value indicates whether $4211.d7 was set in the read in the main routine.
The first bracket gives the y, x exact cycle position where the $4211 read bus cycle began.
The last bracket is what the latch value would be without emulating the effect where reading $4211 at HC=8,10 causes IRQ to fire one opcode later.

Now, what I can't yet figure out, and why my IRQ timing is so poor:

Code:

008034 lda $4211     [$004211] A:0042 X:0010 Y:0000 S:01ff D:0000 DB:00 nvMxdizc
* $4211 read at  31,1352                                                        
* -> raised read <1>                                                            
008037 bpl $8034     [$008034] A:0042 X:0010 Y:0000 S:01ff D:0000 DB:00 nvMxdizc
* /IRQ lowered at  32,   2 -- 2 < 8 <= 10                                       
* irq_test passed at   32, 10                                                   
008034 lda $4211     [$004211] A:0042 X:0010 Y:0000 S:01ff D:0000 DB:00 nvMxdizc
008800 pha                     A:0042 X:0010 Y:0000 S:01fb D:0000 DB:00 nvMxdIzc
008801 lda $4211     [$004211] A:0042 X:0010 Y:0000 S:01fa D:0000 DB:00 nvMxdIzc
* $4211 read at  32, 124                                                        
* -> raised read <0>                                                            
008804 lda $2137     [$002137] A:00c2 X:0010 Y:0000 S:01fa D:0000 DB:00 NvMxdIzc
008807 lda $213c     [$00213c] A:0021 X:0010 Y:0000 S:01fa D:0000 DB:00 nvMxdIzc
00880a xba                     A:0027 X:0010 Y:0000 S:01fa D:0000 DB:00 nvMxdIzc

No idea what's happening here... the SNES will proceed one more opcode before entering the IRQ routine... OPHCT = $0027 in the above code, OPHCT = $002e on hardware. $0027+30(lda $4211) would give us ~$002e...

byuu · Post by **byuu** » Sat Oct 01, 2005 12:04 pm

Ok, I figured it all out now. The biggest problem is that /NMI and /IRQ go low 4 clock cycles after $4210 / $4211 bit 7 get set. I'll probably just type it up in a text file or something and stick it on my website instead of typing it all up here though... I also have test programs that verify everything this time.
As usual, ignore all my old notes x.x