Latch timing

byuu

Post by byuu »

Wow. My previous assumptions were way off.

It turns out my old test was flawed regarding line 240/241. It actually is line 240 (counting 0-240), not line 241.
Counting from 0, the correct line is 240 ($f0).
Sorry for the misinformation. I failed to catch it because I didn't understand the differing dot lengths that I describe below.

I also verified the longer dots, and what happens on scanline 240, during interlace and non-interlace.

First off, I realized that we were both wrong about which dots have 6 master cycles. The correct dots are 322 and 326, sort of.
It's really hard to explain, so take a look at the following table:

Code:

Legend: { nop count, lda $00 count, lda $2100 count, lda $0000 count, frame count, I/NI (interlace status) } = { x result (what it would be if all dots were 4 cycles, not what is really returned), y result } = cycles (actual returned dot position, i.e. the real x result)

{ 0, 0, 22, 10, 0, NI } = { 321, 0 } =  980 (321)
{ 0, 0, 21, 11, 0, NI } = { 321, 0 } =  982 (321)
{ 0, 0, 20, 12, 0, NI } = { 322, 0 } =  984 (322/321) --\
{ 0, 0, 19, 13, 0, NI } = { 322, 0 } =  986 (322)       |
{ 0, 0, 18, 14, 0, NI } = { 323, 0 } =  988 (322/323) --/
{ 0, 0, 17, 15, 0, NI } = { 323, 0 } =  990 (323)
{ 0, 0, 16, 16, 0, NI } = { 324, 0 } =  992 (323)
{ 0, 0, 15, 17, 0, NI } = { 324, 0 } =  994 (324)
{ 0, 0, 14, 18, 0, NI } = { 325, 0 } =  996 (324)
{ 0, 0, 13, 19, 0, NI } = { 325, 0 } =  998 (325)
{ 0, 0, 12, 20, 0, NI } = { 326, 0 } = 1000 (325)
{ 0, 0, 11, 21, 0, NI } = { 326, 0 } = 1002 (326/325) --\
{ 0, 0, 10, 22, 0, NI } = { 327, 0 } = 1004 (326)       |
{ 0, 0,  9, 23, 0, NI } = { 327, 0 } = 1006 (326/327) --/
{ 0, 0,  8, 24, 0, NI } = { 328, 0 } = 1008 (327)
{ 0, 0,  7, 25, 0, NI } = { 328, 0 } = 1010 (327)
Verify these with my above cycle demo ROM.

As you can see, dots 321-323 can have anywhere from 4-6 cycles each. Only one of them actually gets 6 cycles per scanline; the other two get 4. Which one gets the 6 cycles changes all the time. Try { 0, 0, 20, 12, 0, NI } on my test program and just keep hitting reset: the x result will keep changing between 321 and 322. I say that 322 is the long dot only because it's in the middle. There does not seem to be one dot that gets hit more than the other two, nor is there any discernible pattern to which one will be hit. It could just alternate between the three every frame, and this counter may not be reset when the SNES is reset (like the latch counters and frame counter (frame number) are). My test program lacks the ability to determine this. Care to come up with a better one?
Dots 325-327 act the same as 321-323.
This happens exactly the same in interlace and non-interlace mode.
This does not happen on odd frame/scanline 240 in non-interlace mode. All dots are 4 cycles long, and the highest latchable value is 339. The entire scanline is 1360(-40 DRAM refresh) cycles long. You were correct in your assumptions of this line, anomie.
The longer dots do occur in interlace mode line 240 odd frames.
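To make the dot-length bookkeeping concrete, here is a minimal C sketch of a scanline with two 6-cycle dots. It assumes 340 dots per line (0-339, as above) and takes the two long dots as parameters, since which dots are long (321-323 and 325-327) is exactly what is still in dispute. The function names are made up for illustration.

```c
#include <assert.h>

enum { DOT_CYCLES = 4, LONG_DOT_CYCLES = 6, DOTS_PER_LINE = 340 };

/* Length in master cycles of one dot. long_a and long_b are the two 6-cycle
   dots; pass -1 for both on the short line, where every dot is 4 cycles. */
static int dot_length(int dot, int long_a, int long_b) {
    return (dot == long_a || dot == long_b) ? LONG_DOT_CYCLES : DOT_CYCLES;
}

/* Total master cycles in a scanline with the given long dots. */
int line_length(int long_a, int long_b) {
    int total = 0;
    for (int dot = 0; dot < DOTS_PER_LINE; dot++)
        total += dot_length(dot, long_a, long_b);
    return total;
}

/* Dot counter value that would be latched at a given master cycle. */
int dot_at_cycle(int cycle, int long_a, int long_b) {
    assert(cycle >= 0 && cycle < line_length(long_a, long_b));
    int start = 0;
    for (int dot = 0; dot < DOTS_PER_LINE; dot++) {
        int next = start + dot_length(dot, long_a, long_b);
        if (cycle < next)
            return dot;
        start = next;
    }
    return DOTS_PER_LINE - 1; /* not reached for a valid cycle */
}
```

With long dots at 322 and 326 the line comes out at 1364 cycles, and with no long dots at 1360, matching the two line lengths discussed here. Swapping in 321/325 or 323/327 only moves where the extra cycles land.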

DRAM refresh: Your notes were confirmed. Every other scanline the SNES alternates between starting DRAM refresh at cycle 534 and 538.

Code:

non-interlace/even frame/scanline 0:
(1324*0)+532=532-264=268
(2*14) + (8*30) = 268
{ 2, 0, 8, 0, 0, NI } = { 133, 0 } = 532
{ 2, 0, 7, 1, 0, NI } = { 143, 0 } = 534

non-interlace/even frame/scanline 1:
(1324*1)+532=1856-264=1592
(20*30) + (31*32) = 1592
{ 0, 0, 20, 31, 0, NI } = { 133, 1 } = 532
{ 0, 0, 19, 32, 0, NI } = { 133, 1 } = 534
{ 0, 0, 18, 33, 0, NI } = { 134, 1 } = 536
{ 0, 0, 17, 34, 0, NI } = { 144, 1 } = 538
View all of my notes from today here, noting that I didn't thoroughly test interlace mode, and thus didn't record my tests for that:
http://setsuna.the2d.com/files/dot_and_dram_timing.txt
anomie
Lurker
Posts: 151
Joined: Tue Dec 07, 2004 1:40 am

Post by anomie »

byuusan wrote:First off, I realized that we were both wrong about which dots have 6 master cycles. The correct dots are 322 and 326, sort of.
It's really hard to explain, so take a look at the following table:
Hrm.... on my SNES, i still get 323 and 327, constant. What results do you get from the test I sent you?
byuusan wrote:DRAM refresh: Your notes were confirmed. Every other scanline the SNES alternates between starting DRAM refresh at cycle 534 and 538.
As i suspected, I seem to detect in the NI odd frame lines 240 and 241 begin refresh in the same place. Otherwise, it does seem to alternate every scanline.
byuu

Post by byuu »

anomie wrote:Hrm.... on my SNES, i still get 323 and 327, constant. What results do you get from the test I sent you?
Wow, very interesting indeed... I actually didn't run it, since I figured I had the cause of our testing differences solved. I'll run it and post the resulting SRAM file here, then.
anomie wrote:As i suspected, I seem to detect in the NI odd frame lines 240 and 241 begin refresh in the same place. Otherwise, it does seem to alternate every scanline.
Why would you suspect that? It doesn't make sense to me why 240+241 would be the same. Dots 323/327 are way after the refresh. Unless it's because line 240 starts at 538 and line 241 would be 534, but since the line was 4 cycles short, line 241 starts at 538... hmmm.

I'm going to try running to a bunch of different scanlines with and without interlace enabled, and see if the start position for each scanline is a constant.
anomie

Post by anomie »

byuusan wrote:Why would you suspect that? Doesn't seem to make sense to me why 240+241 would be the same. Dots 323/327 are way after the refresh, unless because line 240 starts at 538 and line 241 would be 534, but since the line was 4 cycles short, line 241 starts at 538... hmmm.
During most scanlines, it's either 1360 or 1368 cycles between the starts of successive refreshes (recall each scanline is 1364 cycles, right in between the two). Perhaps the internal 'refresh counter' works in multiples of 8? Anyway, the period between refresh on 240 and refresh on 241 those frames is 4 cycles short, and to prevent everything from getting totally out of whack it would either need to make the refresh period shorter (1356 or 1364, but then why not just always 1364?) or always use 1360 and repeat the previous.
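The spacing argument can be checked with a few lines of C. This is only a sketch of the arithmetic, with a made-up function name: the gap between two refreshes is the rest of the current line plus the start offset on the next line.

```c
#include <assert.h>

/* Master cycles between the refresh on one scanline and the refresh on the
   next: the rest of the current line plus the offset into the next one. */
int refresh_interval(int line_cycles, int start_this, int start_next) {
    assert(start_this >= 0 && start_this < line_cycles);
    return (line_cycles - start_this) + start_next;
}
```

On normal 1364-cycle lines, 534 to 538 gives 1368 and 538 to 534 gives 1360, both multiples of 8. On the 1360-cycle line, alternating as usual would give 1356 or 1364, neither a multiple of 8, while repeating the previous position gives 1360.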
anomie

Post by anomie »

Tonight's results: i've written a version of Byuu's test so I can do some more experimentation. In particular, I've created a version that does a LDA $4212 right after the LDA $2137 and displays that value in addition to the latch position. It runs these LDAs in FastROM, so we can 'take out' the LDA $2137 simply by adding 24 cycles (one LDA $00 in the test) to the displayed latched position (i've actually done it, it works). I've only run the tests in non-interlace mode so far.

Results:
$4212 bit 7 gets set at [0,225] and cleared at [0,0]. It seems stable across frames. Interlace will probably not change this. Overscan mode will change the start position, and who knows what'll happen playing with overscan during $e1-$f0 (my guess: it'll change instantly).

$4212 bit 6 gets set at dot 274 of the scanline, and cleared at dot 1. It seems stable across scanlines, but testing across 240-241 on the short frame would be a good idea. Interlace will probably not affect it.

$4212 bit 1 is weird. It gets set on line 241, anywhere from dots 32.5 to 95.5. The first (even?) frame gets it at dot 76.5, the next frame gets 79.5, then 81.5, 84.5, 86.5, 89.5, and so on. I haven't tested when the bit clears yet. And then, we still have interlace and overscan.

edit: Hmmm... much like the refresh timer seems to 'check' every 8 master cycles, the auto joypad read timer could be checking every 256 cycles to give that pattern... A prediction: interlace will advance the pattern by 2 or 45 dots, rather than 2 or 3 dots.
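The 256-cycle guess can be sanity-checked with a short C sketch. It assumes NTSC frame lengths built from the figures in this thread: 262 lines of 1364 cycles, 4 cycles fewer on the short non-interlace frame, and one extra 1364-cycle line on the long interlace frame. The function name is hypothetical.

```c
#include <assert.h>

enum { POLL_PERIOD = 256, CYCLES_PER_DOT = 4 };

/* How far (in master cycles) a free-running counter with the given period
   slips forward relative to the frame start over one frame. */
int phase_advance_cycles(long frame_cycles) {
    assert(frame_cycles > 0);
    return (int)((POLL_PERIOD - frame_cycles % POLL_PERIOD) % POLL_PERIOD);
}
```

Under those assumptions a full 262-line frame advances the pattern by 8 cycles (2 dots), the short frame by 12 cycles (3 dots), and a 263-line frame by 180 cycles (45 dots). That is consistent with the 76.5, 79.5, 81.5, 84.5 sequence and with the "2 or 45 dots" prediction, though it does not prove the counter really works this way.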


Status report:
Completed Tests:
* WRAM Refresh
* Scanline length
* Frame length

In-progress Tests:
* $4212 bit 7 (needs interlace, overscan tested)
* $4212 bit 6 (needs interlace tested)
* $4212 bit 1 (needs clear timing and interlace/overscan)
* Dot lengths (my SNES is stable at $143/$147, but byuu's is odd)
* CPU/APU clock ratio (needs more SNESs tested)

Future tests:
* $213f bit 7
* $4210 bit 7
* $4211 bit 7 (check V, H, and VH)
* HDMA start
* NMI trigger
* IRQ trigger (check V, H, and VH)
* VRAM/CGRAM/OAM accessibility?
* any others?

[I think you pressed the wrong button... - grinvader]
byuu

Post by byuu »

Ok, I got my PPU1/2/CPU version numbers:

Code:

$213e: 5c77=1 [11]
$213f: 5c78=3 [23]
$4210: 5a22=2 [02]
I get the same results as you. That leaves three possible causes for our differences with dot timings:
I) Our copiers are affecting our results.
II) The version numbers weren't changed, even though minor changes exist
III) Other factors are affecting the timing differences (NTSC-J vs. NTSC, perhaps?)

It's going to be a month or so because I don't have any money and just started working again, but when I get some money, I'll drive by some third-rate video game stores and track down a US SNES and try that out.
I also have another Japanese SNES around here somewhere.

I tried to run your test on my UFO, but it didn't work. It works on my emulator, and the screen turns white when it's done. But on my UFO, the screen says 'Testing in progress...' and that never goes away, even after half an hour.
I finally gave up at the 40 minute mark and shut the system off. Here is the resulting SRAM file, although it's practically useless.
http://setsuna.the2d.com/files/ppu.ram
Ignore the 512-byte header, that's a UFO thing. The following 32k is the actual SRAM.
I notice that you write to $700000,x even though you're using LoROM. Could this be the problem? LoROM was supposed to be 3x6000-3x7fff, right? It works in my emu because I mirror both locations for both LoROM and HiROM. Either way, it should work on the copier, but fails. I am able to load/save SRAM from official SNES games, both LoROM and HiROM, just fine.

As far as future tests, my first priority is to get proper cycle/latch timing so that we get the same exact results through emulation as we do through copiers. Then we can easily implement all of our findings, and test them through emulation to make sure they're correct. I just have to hope that I have all this stuff implemented correctly, at present.
We also need to go through all PPU registers and figure out which ones can be read/written at which times (during/outside v/hblank)
Overload
Hazed
Posts: 70
Joined: Sat Sep 18, 2004 12:47 am
Location: Australia
Contact:

Post by Overload »

byuusan wrote: I notice that you write to $700000,x even though you're using LoROM. Could this be the problem? LoROM was supposed to be 3x6000-3x7fff, right? It works in my emu because I mirror both locations for both LoROM and HiROM. Either way, it should work on the copier, but fails. I am able to load/save SRAM from official SNES games, both LoROM and HiROM, just fine.
SRAM Mapping for LoROM (Mode 20) is Bank at $70, HiROM (Mode 21) is Bank $30-$33.
byuu

Post by byuu »

Overload wrote:HiROM (Mode 21) is Bank $30-$33.
Oops. Thank you. Have to wait for anomie to look at the SRAM file to see what's wrong, then.
BTW, wouldn't that limit SRAM in HiROM to 32k max (and LoROM to 64k)? I hear there are 128k SRAM games, but I suppose they use other memory mapping chips.
anomie

Post by anomie »

byuusan wrote:I tried to run your test on my UFO, but it didn't work. It works on my emulator, and the screen turns white when its done. But on my UFO, the screen says 'Testing in progress...' and that never goes away, even after half an hour.
Hrm... looking at the SRAM, the test finished, but must have gotten stuck on the WAI or something. Or else your UFO spotted the STP opcode and did something weird?

BTW, i've checked those registers $21c2 and $21c3, they're open bus on my system. I've been wondering if it's your copier that uses those regs and makes the test ROM fail?
byuusan wrote:Here is the resulting SRAM file, although its practically useless.
Actually, that's the actual output of this program. You can see the WRAM refresh around byte offset 0x30a in your file, and now it's even easily explainable why dot $85 gets 3/4 the number of the rest (the first half gets the full blast, but the second half of the dot only gets every other scanline) and such. Then you're showing pretty constantly $141 and $145 in this test, no indication of the long dot bouncing around like in your other test. Strange...
byuusan wrote:We also need to go through all PPU registers and figure out which ones can be read/written at which times (during/outside v/hblank)
Yes, ick...
byuu

Post by byuu »

anomie wrote:Hrm... looking at the SRAM, the test finished, but must have gotten stuck on the WAI or something. Or else your UFO spotted the STP opcode and did something weird?
It could be that you wrote to CGRAM/whatever when the SNES wasn't in v/hblank or whatever and then STP locked the processor. The UFO starts games initially in a pretty poor condition (weird defaults and such). It's best to reset it after starting a game to get things accurate again, but I didn't want to risk messing up the test.
anomie wrote:Actually, that's the actual output of this program. You can see the WRAM refresh around byte offset 0x30a in your file, and now it's even easily explainable why dot $85 gets 3/4 the number of the rest (the first half gets the full blast, but the second half of the dot only gets every other scanline) and such. Then you're showing pretty constantly $141 and $145 in this test, no indication of the long dot bouncing around like in your other test. Strange...
Ah, so it is. Neat. So I get 321/325, and you get 323/327. So unless we can think of more tests for this... I think 322/326 is a very nice compromise, don't you? :)

Perhaps something changes whether it happens on 321/325 or 323/327. That's 8 cycles, which is the same as the deviance in DRAM refresh starts.
It's obviously not changed every line, as your test proves. In fact, I only noticed it changing because I was resetting the SNES repeatedly.

Here is the SRM file I get from my implementation of the above, btw:
http://setsuna.the2d.com/files/ppu.srm

It's wrong only because I always start DRAM refresh at cycle 536. Once I move it to 534/538, I think it'll match the SNES nearly 1:1
Overload

Post by Overload »

byuusan wrote:
HiROM (Mode 21) is Bank $30-$33.
Oops. Thank you. Have to wait for anomie to look at the SRAM file to see what's wrong, then.
BTW, wouldn't that limit SRAM in HiROM to 32k max (and LoROM to 64k)? I hear there are 128k SRAM games, but I suppose they use other memory mapping chips.
The biggest LoROM cart available has 128K, which is 4 banks of 32K mapped from $70-$73 using a mad-1. The biggest HiROM cart available has 32K but I think they can also be configured to hold 128K ($30-$3f).

I was speaking in terms of copiers though, which are only fitted with 32K.
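As a rough illustration of the two SRAM windows described in this exchange, here is a C sketch of an address decoder. It assumes LoROM SRAM at banks $70-$73, $0000-$7fff (32K per bank) and HiROM SRAM at banks $30-$3f, $6000-$7fff (8K per bank). Mirroring a smaller chip across the window is an assumption of the sketch, and real carts and copiers may decode differently. All names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

typedef enum { MAP_LOROM, MAP_HIROM } MapMode;

/* Returns the byte offset into SRAM for a 24-bit CPU address, or -1 if the
   address is outside the SRAM window for the given mapping.
   sram_size must be a power of two (e.g. 0x8000 for a 32K copier). */
long sram_offset(MapMode mode, uint32_t addr, uint32_t sram_size) {
    assert(sram_size != 0 && (sram_size & (sram_size - 1)) == 0);
    uint32_t bank = (addr >> 16) & 0xff;
    uint32_t offs = addr & 0xffff;
    uint32_t linear;

    if (mode == MAP_LOROM) {
        /* banks $70-$73, 32K per bank at $0000-$7fff */
        if (bank < 0x70 || bank > 0x73 || offs > 0x7fff)
            return -1;
        linear = (bank - 0x70) * 0x8000 + offs;
    } else {
        /* banks $30-$3f, 8K per bank at $6000-$7fff */
        if (bank < 0x30 || bank > 0x3f || offs < 0x6000 || offs > 0x7fff)
            return -1;
        linear = (bank - 0x30) * 0x2000 + (offs - 0x6000);
    }
    return (long)(linear & (sram_size - 1)); /* smaller chips mirror */
}
```

With a 32K chip, HiROM banks $30-$33 cover the whole SRAM (4 x 8K), and with 128K it takes all of $30-$3f, which lines up with the sizes quoted above.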
anomie

Post by anomie »

byuusan wrote:It could be that you wrote to CGRAM/whatever when the SNES wasn't in v/hblank or whatever and then STP locked the processor.
That's what the WAI was supposed to be for... Anyway, the actual test doesn't take terribly long, and even if you do manage to cut it off early the numbers should still be good enough to be recognizable.
byuusan wrote:The UFO starts games initially in a pretty poor condition (weird defaults and such). It's best to reset it after starting a game to get things accurate again, but I didn't want to risk messing up the test.
I wonder how it manages to screw things up so badly? I should see if TRAC will run the thing, get another data point or two.
byuusan wrote:It's obviously not changed every line, as your test proves. In fact, I only noticed it changing because I was resetting the SNES repeatedly.
Hrm... Anyway, resetting shouldn't hurt the test, it'll just start over with wiping the important part of SRAM and everything. So if you feel like running it again, you can see whether your result changes much.


edit: Ok, i've tested auto joypad read end: it ends 4224 cycles after the start. Interlace doesn't affect VBlank or HBlank positions. Auto Joypad Read is affected just as i predicted.
Last edited by anomie on Sun Jan 09, 2005 11:44 pm, edited 1 time in total.
byuu

Post by byuu »

Are you sure we're getting accurate positions here? I was under the impression that we don't know how much the lda $2137 interferes with the resulting latch positions yet.
Once I have that info, I'll try and work on the NMI tests.

Neat info on auto-joypad read, btw. Can you figure out what is returned from $4218/$4219 when read during joypad polling? I'm just returning 0's for now, but I doubt that's right.
anomie wrote:BTW, i've checked those registers $21c2 and $21c3, they're open bus on my system. I've been wondering if it's your copier that uses those regs and makes the test ROM fail?
I'll play around with these some more, I need to add Open Bus support first before I can understand what I'm looking for with $21c2/$21c3 results.
As far as I know, no one has found a backdoor with the UFO like they have with the Game Doctors, but it's possible one exists.
Overload wrote:The biggest LoROM cart available has 128K, which is 4 banks of 32K mapped from $70-$73 using a mad-1. The biggest HiROM cart available has 32K but I think they can also be configured to hold 128K ($30-$3f).
Thanks Overload, that helps me out. I was also mapping $700000 at 64k granularity, which was incorrect.
anomie

Post by anomie »

byuusan wrote:Are you sure we're getting accurate positions here? I was under the impression that we don't know how much the lda $2137 interferes with the resulting latch positions yet.
You mean WRT an LDA $2137 LDA $4212 pair? As i said, i tested it directly by removing the LDA $2137, and the $4212 bits changed at the same position plus one LDA $00 (=24 cycles slowrom, LDA $2137 = 24 cycles fastrom). So whether the latch happens during the first or the last cycle of the LDA $2137, the bits get read at the same cycle of LDA $4212.
byuusan wrote:Neat info on auto-joypad read, btw. Can you figure out what is returned from $4218/$4219 when read during joypad polling? I'm just returning 0's for now, but I doubt that's right.
A bug in one of my tests recently partially probed that. So far, it's known that the values read are oddly shifted. For example, if you're pressing B, you might read that bit set at different positions in the word depending on just when you read. Later perhaps I'll see if there's a real pattern.
byuusan wrote:I'll play around with these some more, I need to add Open Bus support first before I can understand what I'm looking for with $21c2/$21c3 results.
Try this code for starters:

Code:

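PEA $21c2      ; D = $21c2 (16-bit index registers assumed)
PLD
LDX $00        ; will read $21c2; last byte on the bus is the operand $00
STX wherever
PEA $21c2-$ff  ; D = $20c3
PLD
LDX $ff        ; will read $21c2; last byte on the bus is the operand $ff
STX wherever
PEA $0         ; restore D = $0000
PLD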
If it is Open Bus, it'll read $0000 for the first read and $ffff for the second.
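On the emulator side, the usual way to model this is to remember the last byte driven onto the data bus and return it for unmapped reads. The C sketch below shows that idea applied to the two LDX reads. It is an assumption about how one might structure it, not anyone's actual core: `bus_read`, `ldx_direct`, and the choice of treating everything from $2000 up as unmapped are placeholders for illustration.

```c
#include <assert.h>
#include <stdint.h>

static uint8_t mdr;              /* last value seen on the data bus */
static uint8_t wram[0x2000];     /* stand-in for mapped memory */

/* Hypothetical bus read: mapped addresses drive the bus, unmapped ones
   (here, anything from $2000 up) return whatever was driven last. */
uint8_t bus_read(uint16_t addr) {
    if (addr < 0x2000)
        mdr = wram[addr];
    return mdr;
}

/* Models "LDX dp" with 16-bit X: the operand fetch puts dp_operand on the
   bus, then two data reads follow at D + dp_operand. */
uint16_t ldx_direct(uint8_t dp_operand, uint16_t d_register) {
    /* keep the example away from bank-0 wraparound */
    assert((unsigned)d_register + dp_operand < 0xffffu);
    mdr = dp_operand;  /* operand byte fetched from ROM */
    uint16_t ea = (uint16_t)(d_register + dp_operand);
    uint8_t lo = bus_read(ea);
    uint8_t hi = bus_read((uint16_t)(ea + 1));
    return (uint16_t)(lo | (hi << 8));
}
```

In this model `ldx_direct(0x00, 0x21c2)` yields $0000 and `ldx_direct(0xff, 0x20c3)` yields $ffff, the same signature the test above looks for.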
byuu

Post by byuu »

I finished the DRAM refresh tests. You were dead-on, as usual, anomie. Updated notes:
http://setsuna.the2d.com/files/dot_and_dram_timing.txt

Basically, it changes between 534 and 538 every scanline (successive refreshes are alternately 1360 and 1368 cycles apart), except on scanline 240 of a non-interlace odd frame, where the position stays the same for scanline 241 (and is thus always 1360 cycles away from the previous DRAM refresh spot; otherwise it could end up being 1364 cycles apart, and I guess it really does work in multiples of 8).
This causes the start positions to invert every full frame. The same thing happens with interlace because of the extra scanline.

Non-interlace:
Frame 0 scanline 0: 534
Frame 1 scanline 0: 534
Frame 2 scanline 0: 538
Frame 3 scanline 0: 538
Frame 1 scanline 240: 534
Frame 1 scanline 241: 534
Frame 3 scanline 240: 538
Frame 3 scanline 241: 538

Interlace:
Frame 0 scanline 0: 534
Frame 1 scanline 0: 538
Frame 2 scanline 0: 538
Frame 3 scanline 0: 534

Interlace odd frame scanline 240 lacks the longer dots [321-323]/[325-327], so it acts normally, as expected.

Here is pseudo-code to implement this, noting it doesn't take into account what happens when you change between interlace and non-interlace mid-frame:

Code:

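// Returns the cycle where DRAM refresh starts on the current scanline, then
// advances the 534/538 toggle for the next scanline.
// interlace, frame and scanline are emulator state defined elsewhere.
ulong get_dram_refresh_pos(void) {
  static const word dram_table[2] = { 534, 538 };
  static int dram_refresh_pos = 0;
  ulong dram_refresh = dram_table[dram_refresh_pos];
  // toggle on every scanline except non-interlace/odd frame/scanline 240,
  // whose position is repeated on scanline 241
  if(!(interlace == false && frame == FRAME_ODD && scanline == 240)) {
    dram_refresh_pos ^= 1;
  }
  return dram_refresh;
}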
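For anyone who wants to check the rule against the frame tables above without an emulator around it, here is a self-contained C sketch that replays the toggle from frame 0, scanline 0. It assumes NTSC line counts (262 per frame, 263 on even frames with interlace on) and starts at 534 as in the tables. Those line counts and the function names are assumptions of the sketch.

```c
#include <assert.h>
#include <stdbool.h>

static const int refresh_start[2] = { 534, 538 };

/* Assumed NTSC line counts: 262 per frame, with one extra line on even
   frames when interlace is on. */
static int lines_in_frame(bool interlace, int frame) {
    return (interlace && frame % 2 == 0) ? 263 : 262;
}

/* Refresh start cycle for a scanline, counting from frame 0, scanline 0,
   with the toggle initially selecting 534. */
int refresh_start_at(bool interlace, int frame, int scanline) {
    assert(frame >= 0 && scanline >= 0 &&
           scanline < lines_in_frame(interlace, frame));
    int index = 0;
    for (int f = 0; f <= frame; f++) {
        int last = (f == frame) ? scanline : lines_in_frame(interlace, f);
        for (int line = 0; line < last; line++) {
            bool short_line = !interlace && f % 2 == 1 && line == 240;
            if (!short_line)
                index ^= 1;  /* normal lines alternate 534/538 */
        }
    }
    return refresh_start[index];
}
```

Replaying it reproduces both tables: 534, 534, 538, 538 at scanline 0 of non-interlace frames 0-3, and 534, 538, 538, 534 with interlace on.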
anomie

Post by anomie »

Another quick test: If i LDA $2137 as the very first instruction, I latch dot 53.0 (cycle 212) after reset. Inserting NOPs and such to get to WRAM Refresh, the very first refresh is at 538.

If we want to make the assumption that the WRAM Refresh 'counter' starts at the same time the reset pulse clears, then latch happens at the end of the read cycle and the PPU starts at dot 32.5 (or more likely it starts at 0 and runs for 130 master cycles before it releases /RESET on the 5A22 (remember, PPU2 gets the reset signal from the CIC as input, and outputs one reset signal to PPU1 and another to the rest of the chips)). The reset handling takes 52 cycles before we start executing the first opcode, and then 30 cycles to latch = 212. Meanwhile, the counter starts at 130 and 51*8 cycles later is the first refresh.

OTOH, i can't seem to get the auto joypad read 'counter' to add up... But it is interesting that the earliest position is cycle 130 on the scanline.

And anyway, that's all assumption.
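Spelled out as arithmetic, the model looks like the C sketch below. Every constant is one of the assumed figures from the paragraphs above, nothing here is measured independently, and the names are made up. The 30 cycles for LDA $2137 are taken as three 8-cycle SlowROM fetches plus a 6-cycle read of $2137.

```c
#include <assert.h>

enum {
    PPU_HEAD_START = 130,  /* assumed: cycles the PPU runs before /RESET is released */
    RESET_SEQUENCE = 52,   /* cycles before the first opcode executes */
    LDA_2137_SLOW  = 30,   /* 8+8+8 fetch cycles plus a 6-cycle read of $2137 */
    REFRESH_STEP   = 8     /* assumed granularity of the refresh counter */
};

/* Master cycle (from the start of the scanline) at which the first
   LDA $2137 after reset finishes, under the assumptions above. */
int first_latch_cycle(void) {
    return PPU_HEAD_START + RESET_SEQUENCE + LDA_2137_SLOW;
}

/* Cycle of the n-th tick of a counter that starts when /RESET is released. */
int refresh_tick_cycle(int n) {
    assert(n >= 0);
    return PPU_HEAD_START + n * REFRESH_STEP;
}
```

This gives 212 (dot 53.0) for the first latch and 538 for tick 51, which is why the numbers look tempting even though none of them is verified.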
byuu

Post by byuu »

anomie wrote:Another quick test: If i LDA $2137 as the very first instruction, I latch dot 53.0 (cycle 212) after reset. Inserting NOPs and such to get to WRAM Refresh, the very first refresh is at 538.
Nice catch. Looks like I had that backwards.
anomie wrote:If we want to make the assumption that the WRAM Refresh 'counter' starts at the same time the reset pulse clears, then latch happens at the end of the read cycle and the PPU starts at dot 32.5 (or more likely it starts at 0 and runs for 130 master cycles before it releases /RESET on the 5A22 (remember, PPU2 gets the reset signal from the CIC as input, and outputs one reset signal to PPU1 and another to the rest of the chips)). The reset handling takes 52 cycles before we start executing the first opcode, and then 30 cycles to latch = 212. Meanwhile, the counter starts at 130 and 51*8 cycles later is the first refresh.
Are any of those numbers verified at all? Like /RESET remaining high for 130 cycles, or the reset vector requiring 52 cycles?
When the SNES resets, it gets pushed into emulation mode, so we assume that an emulation mode interrupt consists of:
1 ROM cycle (8 master cycles) + 3 stack cycles (24 master cycles) + 2 memory cycles (16 master cycles) = 48 cycles.
Or if it were native (which it's not): 1p + 1i/o + 4s + 2m = 62 cycles.

You know, if we can get this right and get the same initial latching positions as a real SNES, we can easily determine the other information like exact start positions for NMI, VBlank, etc.
byuu

Post by byuu »

Verified or not, I was feeling adventurous. I went ahead and implemented the /RESET + reset IRQ timings.
I've also somewhat noticed that the latch value is calculated after the opcode has completed, but I only tested it with lda $2137/lda $002137. It'd be a good idea to use some other opcodes, like asl $2137 (will it latch after the read of $2137, or after the entire opcode?).
Anyway, this is what I get for $213c at SNES reset:
$35 (53.0) -> lda $2137
$38 (56.5) -> nop : lda $2137
$37 (55.0) -> lda $002137
$3a (58.5) -> nop : lda $002137

I made the master cycle counter start at cycle 182 to simulate the /RESET + reset IRQ cycles. I also had to rewrite my entire damned CPU core to get it to update the counters before actually performing the opcode, which really, really sucked.
It works like this now:

Code:

void g65816_op_lda_addrb() {
  gx816->read_op(MEMMODE_ADDR);
  add_cpu_cycles(3, 1, 0, 0); //program cycles, memory cycles, indirect memory cycles, i/o cycles; addresses set by read_op()
  g65816_op_lda_b(); //this is where $2137 is read, hence why the counter needs to be updated before the opcode
  g65816_incpc(3);
}
No idea if that's right, but I'm going to be really mad if it isn't :P

With the above stuff, I'm able to exactly match a real SNES now. So next I'm going to start trying to get the values to match after HBlanking once, then a few times, then after VBlanking once, then a few times, etc.
anomie

Post by anomie »

byuusan wrote:Are any of those numbers verified at all? Like /RESET remaining high for 130 cycles, or the reset vector requiring 52 cycles?
According to the datasheet, /RESET is 2 IO cycles, 3 stack access cycles, and 2 vector read cycles: 6+6+8+8+8+8+8 = 52. And it really isn't so much "/RESET stays high for 130 cycles" as "/RESET remains high for 130 cycles after PPU2 starts running", which is still completely unknown (another oscilloscope test).

(later)

Ok, some NMI results. $4210 bit 7 seems to be set at [0.5,225] and cleared at [0,0]. Simple enough. Interlace doesn't seem to affect it.

The actual NMI is a bit odder, though. As near as I can figure, it depends on the ROM speed(s) of the previous instruction. Pure FastROM can have NMI as early as [2,225]. Mixing FastROM and SlowROM (LDA $0000, RTL, JML) gets [2.5,225]. Pure SlowROM gives [3,225]. Mixing FastROM and XSlowROM (LDA $4000) gives [3.5,225], and mixing SlowROM and XSlowROM gives [4,225]. I haven't extensively tested the odder combinations that can happen with indirect addressing.
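One compact way to hold these five data points in C is sketched below. It assumes 6, 8 and 12 master cycles per access for FastROM, SlowROM and XSlowROM. The formula is only a fit that happens to match the five measurements quoted above. It is not a verified hardware rule, and the indirect-addressing cases are untested. The function name is hypothetical.

```c
#include <assert.h>

enum { FAST = 6, SLOW = 8, XSLOW = 12 };  /* master cycles per access */

/* Earliest observed NMI position on line 225, in half-dots, for an
   instruction mixing accesses of speeds a and b. This is only a curve fit
   to the five measurements quoted above, not a verified hardware rule. */
int nmi_half_dots(int a, int b) {
    assert((a + b) % 2 == 0);
    return (a + b) / 2 - 2;
}
```

For example, `nmi_half_dots(FAST, SLOW)` gives 5 half-dots, i.e. dot 2.5.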
byuu

Post by byuu »

Well, after another night of tests, I'm afraid to say that I have good news, and really bad news.

The good news is that I figured out how the opcode cycles factor into the latch positions returned by $2137. The bad news is that the latch actually happens in the middle of the opcode.

To try and explain it better, let me show you lda $2137's cycle mnemonics:

Code:

  [1 ] pbr,pc   ; opcode
  [2 ] pbr,pc+1 ; aal
  [3 ] pbr,pc+2 ; aah
  [4 ] dbr,aa   ; data low
  [4a] dbr,aa+1 ; data high [A=16]
The left is the address bus, right is the data bus contents.
Basically, cycles 1, 2, and 3 read in the opcode and the absolute address (word). The fourth cycle is where $2137 is actually read. The counter is latched right here. So you have cycle pos + 3 memory cycles (at pbr,pc (+1, +2)).
This changes your theory for the initial starting cycle of the SNES.
Instead of 182, we now get 188.
If lda $2137 latches at 212, then 212 - 24 = 188

(Note that I'm somewhat guessing that the latch occurs before the memory read, and not after, due to common sense / logic. How could the latch know where it would be after the latch register itself was read? I doubt the SNES would update the cycle position before actually reading $2137.)

This works exactly like you'd expect for indirect, R-M-W opcodes, etc.
The actual cycle that reads from (not writes to) $2137 (whether it be direct, indirect, or whatever) is where the latch occurs.
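In code, "latch at the read cycle" just means summing the cost of the bus cycles that come before the read. A minimal C sketch follows, with an assumed per-cycle cost table for lda $2137 running from SlowROM (three 8-cycle fetches, then a 6-cycle register read). The names are made up for illustration.

```c
#include <assert.h>

/* Assumed costs of the four bus cycles of "lda $2137" in SlowROM:
   opcode, aal, aah (8 each), then the 6-cycle read of $2137. */
static const int LDA_ABS_SLOWROM[4] = { 8, 8, 8, 6 };

/* Master cycles that elapse before the cycle at index read_index begins.
   cycle_cost[i] is the length of the i-th bus cycle of the opcode. */
int cycles_before(const int *cycle_cost, int read_index) {
    assert(read_index >= 0);
    int elapsed = 0;
    for (int i = 0; i < read_index; i++)
        elapsed += cycle_cost[i];
    return elapsed;
}
```

`cycles_before(LDA_ABS_SLOWROM, 3)` is 24, so a latch seen at cycle 212 puts the start of the opcode at 188. Summing all four entries gives the 30 cycles used in the earlier end-of-opcode estimate.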

The really bad news is that in order to accurately emulate the latching counter, you would have to write a CPU core that actually works cycle-by-cycle, instead of opcode-by-opcode. It should be obvious why this would be a lot slower. Sure, you could cheat with opcodes that can't read $2137 (like branches and jumps), but you're still talking about at least 60% of opcodes that would need this support. This adds at least 200-300% overhead to the CPU emulation speed in the very best case, and is quite a bit more difficult to write. Keep in mind that you would have to recalculate the x/y dot positions, along with all of their quirks (longer/shorter dots, missing scanlines, etc.) after each cycle. Or at the very least, after each opcode and before each MMIO read/write.

I really don't think we'll ever see any emulators actually implementing this, because it just isn't worth it. With that said, I'm trying to do it anyway. I know it'll kill speed which I'm already really bad at, but I think it will be invaluable for determining odd behaviours like the NMI $4210 bit 7 read thing I was talking to you about. It'll probably take me another 3 weeks to finish rewriting all the opcodes to go cycle-by-cycle.

As an example, this is what the opcodes would need to look like to latch properly:

Code:

void g65816_op_asl_addrw(void) {
word m;
//1-3 [op fetch]
  gx816->op.aa.w = gx816->read_operand(2);
//4 [read low]
  m = gx816->op_read(MEMMODE_ADDR, gx816->op.aa.w);
  gx816->op.aa.w++;
//4a [read high]
  m |= gx816->op_read(MEMMODE_ADDR, gx816->op.aa.w) << 8;
//5 [i/o]
  g65816_testc(m & 0x8000);
  m <<= 1;
//6a [write high]
  gx816->op_write(MEMMODE_ADDR, gx816->op.aa.w, m >> 8);
  gx816->op.aa.w--;
//6 [write low]
  gx816->op_write(MEMMODE_ADDR, gx816->op.aa.w, m);
//pc
  g65816_testn(m & 0x8000);
  g65816_testz(m == 0);
  g65816_incpc(3);
}
gx816->read_operand(n) will increment the cycle counter by (n + 1) memory cycles at (gx816->regs.pc), and each gx816->op_read() call will add one memory cycle based on the address specified in the second argument. Notice how I can't even perform a word read/write without breaking the timing. I've added this to about 6-8 opcodes now, and tested them all against a real SNES. Here are those results:

Code:

  lda #$37 : sta $00
  lda #$21 : sta $01
  lda #$00 : sta $02
  lda $2137
$53 emu/$53 snes

  lda #$37 : sta $00
  lda #$21 : sta $01
  lda #$00 : sta $02
  lda [$00]
$57 emu/$57 snes

  clc : xce
  rep #$20
  lda #$2100 : pha : pld
  lda $37
$56 emu/$56 snes

  lda #$37 : sta $00
  lda #$21 : sta $01
  lda ($00)
$4b emu/$4b snes

  clc : xce
  rep #$20
  lda $2136
$43 emu/$43 snes
Note that this still doesn't verify the start position of the SNES, nor any offsets in the actual latch value returned (I remember someone saying that the latch position is off by 5.5 dots or so?), I don't know how to prove/disprove that right now. My goal here is to just keep rewriting code to make it match real SNES tests.

---

With all that said, I'm mildly curious. Does anyone plan to implement any/all of these findings into their emulators? Or is this just strictly going to be for research/documentation purposes? I realize how unfeasible a lot of these quirks are to implement.
Dmog
Lurker
Posts: 192
Joined: Tue Aug 31, 2004 6:03 pm

Post by Dmog »

byuusan wrote:With all that said, I'm mildly curious. Does anyone plan to implement any/all of these findings into their emulators? Or is this just strictly going to be for research/documentation purposes? I realize how unfeasible a lot of these quirks are to implement.
Can't speak for the devs, but probably not. Not in ZSNES anyway. I believe the bug you found for the color add/sub thing was implemented though.

Btw, please tell me you have a backup of these threads somewhere... It would be a shame if all the research you and anomie have done so far just disappeared. You know what happened to the other forums.

You could start a SNES knowledge/research site or something. Better than just posting your findings here.
creaothceann
Seen it all
Posts: 2302
Joined: Mon Jan 03, 2005 5:04 pm
Location: Germany
Contact:

Post by creaothceann »

byuusan wrote:in order to accurately emulate the latching counter, you would have to write a CPU core that actually works cycle-by-cycle, instead of opcode-by-opcode. It should be obvious why this would be a lot slower. Sure, you could cheat with opcodes that can't read $2137 (like branches and jumps), but you're still talking at least 60% of opcodes that would need this support. This adds at least 200-300% overhead at the very best case scenario to the CPU emulation speed
I wouldn't worry about it. So what if the users of your emulator have to disable all the fancy effects (triple buffering... :D ) to make it playable? At least it will be playable, and not crash somewhere, or have gfx errors!
byuusan wrote:I know it'll kill speed which I'm already really bad at
Make it work first, then make it work fast. :)
zidanax
Rookie
Posts: 49
Joined: Thu Jul 29, 2004 5:17 am
Location: USA

Post by zidanax »

Personally, I think it would be nice to have an emulator that's extremely accurate, even if it's so slow it won't play at full speed on my computer.
byuu

Post by byuu »

The problem is that it likely won't play at full speed on any computers for a long time.
Dr. Dobb's recently published a nice article about how processor speeds are starting to reach their theoretical limits. Anyone notice how the 2GHz processor was released in 2001 or so? We're still not at 4GHz.
They're only working on multicore processors now, which won't do emulators (at least not for the SNES) any good at all.

Also, this won't really fix any games. Maybe one or two stubborn ones. I would wager quite a bit that virtually no games in existence care about most of this stuff we're finding out.

I have all of my findings saved, and I remember them in my mind as well, so don't worry about losing anything. I want to complete my understanding of something before trying to document it.
Noxious Ninja
Dark Wind
Posts: 1271
Joined: Thu Jul 29, 2004 8:58 pm
Location: Texas
Contact:

Post by Noxious Ninja »

creaothceann wrote:
byuusan wrote:in order to accurately emulate the latching counter, you would have to write a CPU core that actually works cycle-by-cycle, instead of opcode-by-opcode. It should be obvious why this would be a lot slower. Sure, you could cheat with opcodes that can't read $2137 (like branches and jumps), but you're still talking at least 60% of opcodes that would need this support. This adds at least 200-300% overhead at the very best case scenario to the CPU emulation speed
I wouldn't worry about it. So what if the users of your emulator have to disable all the fancy effects (triple buffering... :D ) to make it playable? At least it will be playable, and not crash somewhere, or have gfx errors!
byuusan wrote:I know it'll kill speed which I'm already really bad at
Make it work first, then make it work fast. :)
And people in the future will be grateful. Look at C64 emulation. Back when people realized they had to write cycle-exact emulators to make the last few things work right, they didn't have PCs fast enough to run them. Now, we do, and we don't have to worry about accuracy.