Latch timing
Moderator: ZSNES Mods
Overload wrote: I totally agree, university is a total waste of time and money. I didn't get straight A's in high school either but I can still write better code than anyone that did.
byuusan wrote: I don't agree with college. For the specific fields I'm interested in, it's overrated and overvalued. The computer science stuff I saw from a roommate who was in college for it was a joke.
Would you rather hire a worker who spent his whole life programming and loved doing it, or someone who got into the profession at 20 looking to earn big money, but who doesn't really care about it?
As far as pursuing college anyway: I didn't have straight A's in high school, and since I'm a white male, that means I can't get a free scholarship. I can't afford to take a few years off work while I go to college, and I don't want to pay back student loans the rest of my life. Even if I did go to college, comp. sci. would be a waste of my time (what could they really teach me at this point?), and incredibly stupid (since most jobs in this field are being outsourced anyway).
I can attest to that as I can't write code at all. Ahahaha!
Why did you post a pic in a dev thread ?
byuusan wrote: I figured HDMA would wait until the current instruction had completed, like NMIs and IRQs.
Probably... OTOH, i have a bit of a suspicion that for a "STA $420B ; NOP" the processor might just be paused for DMA after the opcode load but before the IO cycle, based on the open bus value.
Quote: A long shot here, but I'm curious. Does the SNES ever raise the /ABORT line on its own, or is that only doable via external device? Perhaps some things that act weird with certain opcodes (like PHP/RTI, CLI/RTI, etc. that you mentioned above) are a result of the SNES aborting the opcode part-way through?
I haven't seen the extra cycles that would eat, with jumping to the abort vector and everything...
Quote: Probably... OTOH, i have a bit of a suspicion that for a "STA $420B ; NOP" the processor might just be paused for DMA after the opcode load but before the IO cycle, based on the open bus value.
I can see this being very plausible, except I'm not sure why the open bus value would matter.
Quote: I haven't seen the extra cycles that would eat, with jumping to the abort vector and everything...
Oh, it has its own vector? Nevermind, then.
byuusan wrote: I can see this being very plausible, except I'm not sure why the open bus value would matter.
The open bus value is just the proof that the CPU is paused mid opcode.
anomie wrote: Probably... OTOH, i have a bit of a suspicion that for a "STA $420B ; NOP" the processor might just be paused for DMA after the opcode load but before the IO cycle, based on the open bus value.
The 65816 always does an opcode load, even if an NMI/IRQ occurs. The datasheet shows the interrupt vector as having 2 IO's, but the first has VDA=VPA=1, which means that it is actually an opcode fetch. (H)DMA probably acts just like an NMI.
Quote: I haven't seen the extra cycles that would eat, with jumping to the abort vector and everything...
The CLI/RTI issue is most likely due to the pipelining done by the CPU. The flags register is probably updated during the opcode fetch for the next instruction. This means that it will be too late for it to affect IRQ handling until the next instruction is finished. RTI/BRK/COP won't have this problem because they update the flags before the instruction is finished. The opcode pipelining example given on page 38 of the 65816 programming manual can be applied to CLI/SEI. SEP and REP will need an extra step to load an extra byte.
DMV27 wrote: The CLI/RTI issue is most likely due to the pipelining done by the cpu. The flags register is probably updated during the opcode fetch for the next instruction. This means that it will be too late for it to affect IRQ handling until the next instruction is finished.
Why would CLI/SEI allow an IRQ but CLI/RTI not, due to pipelining?
Quote: The opcode pipelining example given on page 38 of the 65816 programming manual can be applied to CLI/SEI. SEP and REP will need an extra step to load an extra byte.
Which manual is this now?
Quote: Which manual is this now?
The manual is on the WDC website at the bottom right corner of the page (Programmanual.pdf).
Quote: Why would CLI/SEI allow an IRQ but CLI/RTI not, due to pipelining?
If an IRQ is pending and the I flag is set, then the instruction sequence CLI / SEI / NOP will cause an interrupt to occur between the SEI and NOP. The CPU cycles for this sequence:
0.0 - Fetch opcode
1.0 - Interpret opcode as CLI
1.9 - Interrupt sampling: result - interrupts disabled
2.0 - Fetch opcode; Clear I flag
3.0 - Interpret opcode as SEI
3.9 - Interrupt sampling: result - interrupts enabled
4.0 - Fetch opcode; Set I flag; Start IRQ sequence
CLI / RTI is basically the same, except that the RTI will restore the flags register long before the next interrupt sampling time.
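The sampling order in the cycle list above can be sketched in Python. This is only a toy model of the hypothesis in this thread (CLI/SEI commit their I flag change during the next opcode fetch, and interrupts are sampled just before that fetch). The function name, the instruction encoding and the `("RTI", value)` tuple are made up for illustration, and none of this is a verified description of the 65816.

```python
def run(program, i_flag=True, irq_pending=True):
    """Return the index of the instruction that gets replaced by the IRQ
    sequence, or None if no IRQ is taken.

    Hypothesis being modelled: CLI/SEI only commit their change to the
    I flag during the *next* opcode fetch, while interrupts are sampled
    at the end of each instruction, just before that fetch.
    """
    pending_write = None  # I flag value waiting to be committed at the next fetch
    take_irq = False      # result of the most recent interrupt sampling
    for index, op in enumerate(program):
        # Opcode fetch: commit the delayed flag write, then start the IRQ
        # sequence if the previous sampling said interrupts were enabled.
        if pending_write is not None:
            i_flag = pending_write
            pending_write = None
        if take_irq:
            return index
        # Execute.
        if op == "CLI":
            pending_write = False
        elif op == "SEI":
            pending_write = True
        elif isinstance(op, tuple) and op[0] == "RTI":
            # RTI restores the flags long before the sampling point;
            # op[1] is the (hypothetical) restored I flag value.
            i_flag = op[1]
        # Interrupt sampling at the end of the instruction.
        take_irq = irq_pending and not i_flag
    return None


print(run(["CLI", "SEI", "NOP"]))          # IRQ taken in place of the NOP fetch
print(run(["CLI", ("RTI", True), "NOP"]))  # no IRQ taken
```

With CLI / SEI / NOP the model takes the IRQ between the SEI and the NOP, and with CLI / RTI (restoring I=1) it takes none, which is the behaviour described above.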
Ok, interesting news: it seems HDMA will interrupt an instruction mid-cycle to execute, no waiting until the end of the instruction. Ick.
I set up a Byuu test ROM to HDMA from open bus into $2180, and then I manually disable HDMA after the first scanline (when HDMA always runs). The code in the general vicinity is:
Code: Select all
6B RTL
...
A9 20 LDA #$20
9C 81 21 STZ $2181
8D 82 21 STA $2182
9C 83 21 STZ $2183
EA NOP
AD 37 21 LDA $2137
AD 3F 21 LDA $213F
If the latch occurs at $112.5, HDMA's open bus value is $3F. $113.0-$114.0 give $AD. $114.5-$115.5 give $21. After this, HDMA is occurring before the latch. "$116.0"-"$117.0" give $21, but of course the actual latched values are screwy. "$117.5"-"$118.5" give $37. And so on.
There does seem to be a pattern in the HDMA delay, too. But it's nothing clear-cut. A series of SlowROM NOPs gives a delay pattern of 40-32-32-36-36-30-30, while a series of SlowROM LDA $0000s gives 32-32-32-40 and a series of various pure-FastROM instructions gives 30-36-36. And there does seem to be some pipelining going on or something; at least some values don't stay on open bus as long as it seems they should.
Ick.
Quote: The manual is on the WDC website at the bottom right corner of the page (Programmanual.pdf).
Wow, excellent. Don't know how I missed that document. Thanks.
Quote: Ok, interesting news: it seems HDMA will interrupt an instruction mid-cycle to execute, no waiting until the end of the instruction. Ick.
Ick, indeed. I think it's doable, though. With a CPU core that goes cycle-by-cycle, you'd just have to make it update the open bus value for each cycle, and have the update_cycle_counter() routine test and trigger HDMA. This should allow you to resume the rest of the opcode after completing the HDMA mid-opcode.
The only delay pattern I notice in your results is that if you remove the first result, it seems to always repeat the same result twice. Rather obvious, though.
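A rough Python sketch of that idea, assuming a core that performs one bus access at a time: every access updates the open bus value and then calls update_cycle_counter(), which is where HDMA gets a chance to run before the opcode resumes. The `Bus` class, the `hdma_at` trigger position and the `HDMA_COST` value are placeholders invented for the example, not measured hardware behaviour.

```python
class Bus:
    """Tracks the master clock, the open bus value, and a one-shot HDMA event."""

    HDMA_COST = 8  # placeholder cost in master cycles, NOT a measured value

    def __init__(self, hdma_at):
        self.clock = 0          # master cycle position
        self.open_bus = 0x00    # last byte seen on the data bus
        self.hdma_at = hdma_at  # hypothetical position at which HDMA triggers
        self.hdma_saw = None    # open bus value HDMA observed, once it has run

    def update_cycle_counter(self, cycles):
        self.clock += cycles
        # Test for HDMA after every CPU cycle, not only at opcode boundaries.
        if self.hdma_saw is None and self.clock >= self.hdma_at:
            self.hdma_saw = self.open_bus
            self.clock += self.HDMA_COST

    def read(self, value, speed):
        self.open_bus = value
        self.update_cycle_counter(speed)
        return value


def lda_2137(bus, data=0x00):
    """AD 37 21: three SlowROM fetches (8 cycles each), then a 6-cycle MMIO read."""
    for byte in (0xAD, 0x37, 0x21):
        bus.read(byte, 8)
    return bus.read(data, 6)


bus = Bus(hdma_at=8)
lda_2137(bus)
print(hex(bus.hdma_saw), bus.clock)
```

Because the check happens per cycle, an HDMA that triggers right after the opcode fetch sees $AD on open bus, and the remaining three cycles of the LDA still execute afterwards.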
Quote: Try this code for starters:
Code: Select all
PEA $21c2
PLD
LDX $00 ; will read $21c2
STX wherever
PEA $21c2-$ff
PLD
LDX $ff ; will read $21c2
STX wherever
PEA $0
PLD
If it is Open Bus, it'll read $0000 for the first read and $ffff for the second.
Damn, my UFO must be interfering with Open Bus. I get 00 no matter what open bus register I read. I tried $21c2, $21c3, $21ff, $2200, $22ff, and $4200.
I tried this:
Code: Select all
sep #$30
pea $21c2 : pld
lda $00 : sta $7ec000 ;result 1
pea $21c2-$ff : pld
lda $ff : sta $7ec002 ;result 2
Code: Select all
sep #$30
lda $21c2 : sta $7ec000 ;result 1
lda $21c3 : sta $7ec002 ;result 2
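For reference, here is a small Python sketch of what the suggested direct-page test expects to see on working hardware. It assumes the simplest open bus model, where an unmapped read returns the last byte that was on the data bus. The `OpenBus` class and `ldx_direct` helper are illustrative names, and treating $21c2/$21c3 as unmapped is taken from the posts above.

```python
class OpenBus:
    def __init__(self):
        self.mdr = 0x00  # last byte driven onto the data bus

    def fetch(self, byte):
        """Opcode/operand fetch from ROM: the byte ends up on the bus."""
        self.mdr = byte
        return byte

    def read_unmapped(self, addr):
        """Nothing drives the bus, so the previous value is read back."""
        return self.mdr


def ldx_direct(bus, d, operand):
    """16-bit LDX dp with direct page register d; returns the loaded word."""
    bus.fetch(0xA6)      # LDX dp opcode
    bus.fetch(operand)   # the operand is the last byte fetched before the read
    addr = (d + operand) & 0xFFFF
    lo = bus.read_unmapped(addr)
    hi = bus.read_unmapped((addr + 1) & 0xFFFF)
    return (hi << 8) | lo


print(hex(ldx_direct(OpenBus(), 0x21C2, 0x00)))
print(hex(ldx_direct(OpenBus(), 0x21C2 - 0xFF, 0xFF)))
```

Both reads hit $21c2, but the value returned follows the operand byte ($00, then $ff), so a copier that returns 00 in both cases is not passing open bus through.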
My SNES+UFO pass the SNES Test Program electronics test, too; but I have no idea what all that tests.
I don't want to implement something I can't test :(
I'll have to put this off a few months until I can afford a GD7 I suppose.
This also means I'll be unable to help you with the HDMA tests you're performing currently.
The test ROM I used, with source, is here. Perhaps you could try it and see what you get? I reused the print code from an older test of mine, so just ignore that part.
Sorry for the triple post now, but this is a long one:
I was trying to match $2137 latch timings for when the SNES is first powered on, and at hblank.
Code for this (no code is executed before it):
Code: Select all
start {
  lda $2137
}
hblank {
- lda $4212 : bit #$40 : beq -
  lda $2137
}
Results:
Code: Select all
SNES:
$0035 ( 53.0 - 212) start
$012b (299.5 - 1198) hblank
246.5 dots (986 cycles) difference
Emulation:
$0035 ( 53.0 - 212) start
$012c (300.5 - 1202) hblank
247.5 dots (990 cycles) difference
There's a 4 cycle difference.
If we believe:
Code: Select all
Initial latch values for $213c/$213d
0035:0000 (53.0 -> 212) [lda $2137]
0038:0000 (56.5 -> 226) [nop : lda $2137]
And that the actual latch occurs immediately before the actual read cycle, then the initial cycle position would be 188.
Assuming the SNES starts at 188, then the current CPU state is:
Scanline: 0, HCycle Pos: 1178, HDot Pos: 294, Master Cycle Pos: 1178
NMI: Disabled, VIRQ: Disabled, HIRQ: Disabled, VIRQ Pos: 0, HIRQ Pos: 0
CPU Status: { PC:008007 A:0040 X:0000 Y:0000 S:01ff D:0000 DB:00 }
APU Status: { PC:ffc7 A:00 X:ec Y:00 SP:ef YA:0000 }
The logic of lda addr:
Code: Select all
/**********************
 *** 0xad: lda addr ***
 **********************
cycles:
  [1 ] pbr,pc   ; opcode
  [2 ] pbr,pc+1 ; aal
  [3 ] pbr,pc+2 ; aah
  [4 ] dbr,aa   ; data low
  [4a] dbr,aa+1 ; data high [1]
*/
So then that gives us:
Code: Select all
1178: read opcode [+8] -> 1186
1186: read operand [+8] -> 1194
1194: read operand [+8] -> 1202
1202, which is immediately before the actual read occurs (opcode cycle 4). But the SNES says the latch occurs at 1198, 4 master cycles sooner.
This presents a problem for me to implement. Basically, this suggests that the latch occurs half-way through cycle 3 [pbr,pc+2 ; aah]. I guess it could be possible that cycle 3 is actually reading the address ($2137), and just contains the address/data bus values from the previous cycle, which could be explained by pipelining. Perhaps the counter is latched mid-cpu-cycle. However, if I move this back by 4, then I would also have to move back the start position by 4. The results would still be off by 4 if I were to do this.
If I were to move each result back by 4, then add 4 to the initial latch value, the result would still be incorrect.
The only theory I can think of is that the 21MHz clock cycle is latched between the CPU cycle (that is 6-12 master clock cycles) that reads the byte, and that position can vary, even between identical opcodes.
Maybe a diagram will be more clear. The latch range can occur anywhere between the brackets.
Code: Select all
o = tick, lr = latch range
                               [..lr..]
21mhz clock: oooooooo oooooooo [oooooo] oooooo
65816 clock: o------- o------- [o-----] o-----
             cycle1   cycle2    cycle3  cycle4
...Are we really going to have to reverse-engineer the 65816's pipelining to figure this all out? I don't think I've ever even heard of an emulator that emulated the low-level logic of a CPU like this before.
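The arithmetic in this post can be checked with a few lines of Python. The cycle costs (8 master cycles per SlowROM fetch, 6 for the NOP's internal cycle) and the 188 start position are the ones stated or implied above. The helper names are made up, and the model simply assumes the counter is latched right before the final read cycle.

```python
SLOW = 8  # master cycles for a SlowROM fetch
IO = 6    # master cycles for an internal cycle (implied by the nop result above)

RESET_START = 188  # assumed position at which the reset vector starts executing

LDA_ADDR = [SLOW, SLOW, SLOW]  # opcode, aal, aah; the latch happens before cycle 4
NOP = [SLOW, IO]


def latch_pos(start, *instruction_cycles):
    """Master cycle position at which the counter would be latched."""
    return start + sum(sum(cycles) for cycles in instruction_cycles)


def dots(master_cycles):
    """Convert a master cycle position to a dot position (4 cycles per dot)."""
    return master_cycles / 4


print(latch_pos(RESET_START, LDA_ADDR), dots(latch_pos(RESET_START, LDA_ADDR)))
print(latch_pos(RESET_START, NOP, LDA_ADDR))

emulated = latch_pos(1178, LDA_ADDR)  # hblank case from the CPU state dump
measured = 1198                       # what the real SNES reported
print(emulated, emulated - measured)
```

The start and nop cases reproduce 212 (dot 53.0) and 226 (dot 56.5), and the hblank case lands on 1202, 4 master cycles later than the 1198 measured on hardware.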
byuusan wrote: My SNES+UFO pass the SNES Test Program electronics test, too; but I have no idea what all that tests.
Ummm... i wrote this all down at one point. Let's see...
The testing begins at ~$00:84b3 with something about scratchpad ram, then $2180-3 to read/write all RAM, then see if $00:1000 mirrors $20:0000, then test read/write to $7e:2000-$7f:ffff, then something i didn't understand at $00:8a5e, then test VRAM writes (low byte and high byte increment, i think), test read/write to all DMA $43xx registers, test read/write to OAM, test read/write CGRAM, test multiplication via $4202/3, test multiplication via $211b/c, test $4204-6 division, DMA into and out of VRAM, V-blank and H-blank positions using $4201 bit 7 latch. Then i stopped paying attention from $00:9373 to $00:959a (snes9x passed all this and i was getting sick of tracing) where it tests whether $213f bit 7 works in interlace mode, whether $4212 bits 6 and 7 latch with $2137 in the correct ranges, whether RTO flags work, whether the BIT instruction sets the Z flag, whether LDA [$00] works right, and ("maybe") whether ROM can be written to.
Even though there's that area I never looked at, there's no indication they did anything like testing open bus values, it's all straightforward stuff that should be working anyway.
Neat, I don't have quite a bit of that stuff in place yet. Thanks for the info. BTW, what's scratchpad ram?
Quote: ...then see if $00:1000 mirrors $20:0000
I have $20:0000-1fff mapped to $00:0000-1fff.
So that would mean $00:1000 = $20:1000, right?
No thoughts on why Open Bus doesn't work for me? I would prefer not to purchase another copier if I don't have to :/
I'm going to go over the hblank latch timing stuff some more on my next day off. Maybe I can figure something out there.
byuusan wrote: BTW, what's scratchpad ram?
Probably $0000-$1fff in banks $00-$3f and $80-$bf...
Quote: I have $20:0000-1fff mapped to $00:0000-1fff. So that would mean $00:1000 = $20:1000, right?
Err... yeah, typo. Sorry.
Quote: No thoughts on why Open Bus doesn't work for me? I would prefer not to purchase another copier if I don't have to :/
Unfortunately, it probably is the copier. BTW, if your copier is returning 0s for everything that's supposed to be open bus, i wonder if it breaks the rotating barrels in DKC2 so they won't stop like snes9x used to?
Ok, I finally got a few hours free to work on timing some more. This is going to be a long one, so you may wanna get some coffee now.
First up, the previous post I made about the 4-cycle hblank difference was my own fault. The problem was that I was checking for the high byte of the old program counter to be different from the high byte of the new program counter before incrementing the counter by 2 after branch instructions. So it was thinking the jump went back to $7ffe and adding cycle 2b to the counter when it shouldn't have been. My results now match the SNES. I also fixed a small bug where I was adding the HDMA channel init. cycles on the very first scanline. With those two issues fixed, I was able to *perfectly* match cycles with my demo_optiming.smc file to a real SNES. I modified that rom and made demo_hvblank.smc by reading $4212 where $2137 was read before. With that, I was able to get the exact positions where bits 6 and 7 are set/cleared in $4212. The results were interesting.
First, here are the two ROMs and their source code:
http://setsuna.the2d.com/files/hvtiming.zip
And here is the latest version of bsnes, which matches the above two ROMs perfectly:
http://setsuna.the2d.com/files/bsnes_v004_wip10.rar
You can use these to verify my results below.
Code: Select all
demo_hvblank.smc {
hblank set at: 0: 0: 55: 56:0:NI
hblank cleared at: 0: 0: 63: 57:0:NI
vblank set at: 4095: 550:3294:4009:0:NI
vblank cleared at: 4002:4095:2881:3309:0:NI
}
I obviously could not read $4212 and $2137 at the exact same time, so I modified my emulator until the results matched the real SNES.
The results are as follows:
hblank is set at dot 274.5 (cycle 1098), and cleared at dot 1.5 (cycle 6)
vblank is set at dot 0.5 (cycle 2) of scanline 225/240 (depending on overscan), and cleared at dot 0.5 (cycle 2) of scanline 0.
I know these don't match your results, this is probably because I'm now actually latching the counters at the exact cycle position they should be (as far as we know). There's still the chance that the counter is really latched at a different time within the cycle that reads from $2137, but using the above logic, and assuming the SNES starts execution of the reset vector at cycle 188, then the above is correct. If we find out later on that this is incorrect, we can adjust the h/v/blank set/clear positions by hand, with no need to retest the results.
This is the test I used on demo_optiming.smc to verify that the cycle timing (for the code that was executed) matches the real SNES:
Code: Select all
demo_optiming.smc {
4095:4095:4095:4095:0:NI = 172x,47y
4094:4095:4095:4095:0:NI = 168x,47y
}
I tried many other values, and obviously was able to match up the h/v/blank counters as well, so I'm fairly confident that there are no timing flaws (that are exposed) in the opcodes that are used in my demo programs.
Now on to some new information. I saw that your registers.txt file was lacking some information on $4201 bit 7, so I decided to fill that in.
Upon SNES power on/reset, this value is reset to $ff. In order to latch the counters by reading $2137, bit 7 of $4201 has to be set. If you write a 0 to bit 7 of $4201, it will latch the counters, and then disable the ability to latch the counters again. Writing a 0 again to $4201 or reading from $2137 during this time has no effect, the latches stay the same as when the 0 was first written to $4201. You can re-enable the latches by writing a 1 to bit 7 of $4201 again. This does *not* update the latches, but you can now either read from $2137, or write 0 to bit 7 of $4201 to latch the counters again.
Examples (+ = counter latched, - = counter not latched):
Code: Select all
$00->$4201+
$80->$4201-
$00->$4201+ $00->$4201- $80->$4201- $00->$4201+
$2137->a+ $00->$4201+ $2137->a-
As far as the exact cycle the counter is latched... it is latched 4 master cycles into the write cycle that writes the 0 to $4201 bit 7. Or at least, it is latched 4 master cycles later than reading from $2137 would be. I believe this is because with reading $2137, the PPU is alerted about the read before the CPU gets the result. With writing to $4201, the CPU knows the value before the PPU has it. So it is taking an additional 4 cycles before the PPU realizes that $4201 bit 7 should be cleared. Sound reasonable enough to you?
I haven't implemented the +4 master cycle delay into bsnes, so my results are still off by 4 master cycles, as shown below:
Code: Select all
$4201 bit 7 latch position
[code used: lda #$00 : sta $4201 -> nop : lda #$00 : sta $4201] {
snes: 3a -> 3d [58.0]
bsnes: 39 -> 3c [57.0]
}
The nop was used to see that both started on .0 of the dot, instead of .5
The delay should be trivial to add, though. Just add 4 cycles to the x/y latch position.
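The $4201 bit 7 rules described above can be written down as a small state machine. This is a sketch in Python rather than anything taken from bsnes: the class and method names are invented, and the +4 offset on the write path is the tentative figure from this post, applied to whatever position the caller passes in.

```python
class CounterLatch:
    """Models H/V counter latching via $2137 reads and $4201 bit 7 writes."""

    WRITE_DELAY = 4  # master cycles; tentative value from the measurements above

    def __init__(self):
        self.io_port = 0xFF     # $4201 is $ff after power on/reset
        self.latched_at = None  # master cycle position of the last latch

    def _enabled(self):
        return bool(self.io_port & 0x80)

    def write_4201(self, value, position=0):
        """Return True if this write latched the counters (bit 7 going 1 -> 0)."""
        falling = self._enabled() and not (value & 0x80)
        self.io_port = value & 0xFF
        if falling:
            self.latched_at = position + self.WRITE_DELAY
        return falling

    def read_2137(self, position=0):
        """Return True if this read latched the counters (only while bit 7 is set)."""
        if self._enabled():
            self.latched_at = position
            return True
        return False


latch = CounterLatch()
print([latch.write_4201(v) for v in (0x00, 0x00, 0x80, 0x00)])
```

Each line of the examples above is treated as a separate run starting from the power-on state, so the third line maps to latched, not latched, not latched, latched.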
byuusan wrote: I know these don't match your results, this is probably because I'm now actually latching the counters at the exact cycle position they should be (as far as we know).
Hrm... What position during the 6-cycle read are you latching, and what position during the 6-cycle read are you 'latching' the bits of $4212? My numbers are based on the two being the same, your numbers seem to have the two different by 2 master cycles.
Given that, your numbers are consistent with mine: they're all .5 dots greater.
Quote: Now on to some new information. I saw that your registers.txt file was lacking some information on $4201 bit 7, so I decided to fill that in.
Was it? Hrm... Come to think of it, i probably haven't updated it with the timing findings, just timing.txt.
Quote: If you write a 0 to bit 7 of $4201, it will latch the counters, and then disable the ability to latch the counters again. [...] You can re-enable the latches by writing a 1 to bit 7 of $4201 again. This does *not* update the latches, but you can now either read from $2137, or write 0 to bit 7 of $4201 to latch the counters again.
I've known that writing 0 disables latching, and writing 0 then 1 will latch, but i've never gotten around to testing whether it latches on the writing 0 or the writing 1 to the bit. Writing 0 also disables latching by the attached controller (e.g. the Super Scope), BTW.
Quote: As far as the exact cycle the counter is latched... it is latched 4 master cycles into the write cycle that writes the 0 to $4201 bit 7. Or at least, it is latched 4 master cycles later than reading from $2137 would be. I believe this is because with reading $2137, the PPU is alerted about the read before the CPU gets the result. With writing to $4201, the CPU knows the value before the PPU has it. So it is taking an additional 4 cycles before the PPU realizes that $4201 bit 7 should be cleared. Sound reasonable enough to you?
I should double-check this one, i think. My guess would just be that $2137 goes "Address Bus B -> PPU", while $4201 goes "internal CPU address bus -> IO port -> PPU latch pin".
I haven't gotten to test, but i've been thinking about the interrupt delay and how it seems to change with the speed of the previous instruction. I'm thinking now that it may well be more simple than that: the delay depends on only the final cycle of the previous instruction and the opcode-load of the next (interrupted) instruction. And not that it matters (the interrupt processing first pushes PB which alters open bus), but Open Bus would be said opcode of the interrupted instruction (loaded during that first "IO" cycle of the interrupt processing as DMV27 said).
As for the HDMA delay... i still have no idea how that is determined. Any ideas anyone?
Quote: Hrm... What position during the 6-cycle read are you latching, and what position during the 6-cycle read are you 'latching' the bits of $4212? My numbers are based on the two being the same, your numbers seem to have the two different by 2 master cycles.
Unfortunately, they're definitely the same. Both are MMIO reads where the counter is updated right before any read occurs between $00:$2000-$5fff, and both use the same exact opcode (lda addr). Memory regions $00:$2000-$3fff and $00:$4200-$5fff are both mapped as being FastROM speed (6 master cycles).
The exact position within the opcode where the latching occurs is at master cycle 0 of that cpu cycle.
Example:
master cycle position = 0
opcode -> lda $2137 [a=8]
[1 ] pbr,pc ; opcode
[2 ] pbr,pc+1 ; aal
[3 ] pbr,pc+2 ; aah
[4 ] dbr,aa ; data low
read the opcode in, add cycle 1, master cycle pos = 8
read aal, add cycle 2, master cycle pos = 16
read aah, add cycle 3, master cycle pos = 24
read data low (read from $2137), counter is latched at master cycle pos = 24, now add cycle 4, making master cycle pos = 30
I do the same thing with writes, but by adding 4 to the cycle pos (24+4->28), it would match the SNES results.
If it turns out that it's not the read vs. write that's causing that 4 cycle difference, and it's because it's crossing CPU->IO->PPU instead, then that would mean that $4212 should also be slower than $2137 at updating values. Maybe it is...
Quote: As for the HDMA delay... i still have no idea how that is determined. Any ideas anyone?
I'm still struggling to catch up to you as it is :P
For some reason, I think my DRAM refresh is messed up again with the new cycle-by-cycle timing, so I need to fix that first. Then look at NMI/IRQ real quick, and then I'll try and give you a hand with HDMA.
byuusan wrote: Now on to some new information. I saw that your registers.txt file was lacking some information on $4201 bit 7, so I decided to fill that in.
Upon SNES power on/reset, this value is reset to $ff. In order to latch the counters by reading $2137, bit 7 of $4201 has to be set. If you write a 0 to bit 7 of $4201, it will latch the counters, and then disable the ability to latch the counters again. Writing a 0 again to $4201 or reading from $2137 during this time has no effect, the latches stay the same as when the 0 was first written to $4201. You can re-enable the latches by writing a 1 to bit 7 of $4201 again. This does *not* update the latches, but you can now either read from $2137, or write 0 to bit 7 of $4201 to latch the counters again.
Information looks good. We did cover this a couple of years ago when we were doing open bus tests.
http://www.snes9x.com/forum/topic.asp?TOPIC_ID=7293
Something else that might be worth mentioning. Clearing bit 7 of $4201 also disables the ability to clear the latch signal when you read $213f. If you set bit 7 of $4201, then clear it, the PPU2 will latch and bit 6 of $213f is set. Because latching is now disabled, no matter how many times you read $213f the latch signal will not clear until you set bit 7 of $4201.
Quote: We did cover this a couple of years ago when we were doing open bus tests.
Ah, darn. Wasn't in the docs I read, no big deal though. Confirmation is good :)
Quote: Something else that might be worth mentioning. Clearing bit 7 of $4201 also disables the ability to clear the latch signal when you read $213f. If you set bit 7 of $4201, then clear it, the PPU2 will latch and bit 6 of $213f is set. Because latching is now disabled, no matter how many times you read $213f the latch signal will not clear until you set bit 7 of $4201.
Does this apply to $2137, too? Like say $4201 bit 7 is set, and $213f bit 6 is clear. When I read $2137, I take it the next read from $213f will have bit 6 set, and all subsequent reads will have $213f bit 6 cleared?
By the way Overload, do you have any knowledge on the Super UFO 8 line of copiers, or ideas of how/why open bus would not be mapped for me? I'd really need to get that working on my copier in order for me to emulate it properly, and I really can't wait to get started on this open bus thing.
Another day, another step closer to perfect timing emulation.
In order to get DRAM refresh matching my SNES with the new cycle-by-cycle core, I actually had to make it occur mid-opcode. I also noticed that I was using the pre-adjusted address in determining memory timing, so if D=$2100 and you used lda $37, it was counting that as a read from $000037 instead of $002137 >_<
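A small Python sketch of why the pre-adjusted address gives the wrong timing. The function names are hypothetical, and the bank $00 speed table is the commonly documented SNES layout (8 master cycles for low RAM, 6 for the $21xx registers, 12 for $4000-$41ff), which is an assumption on my part and not something measured in this thread.

```python
def direct_page_address(d, operand):
    """Effective bank $00 address for a direct page access (native mode sketch)."""
    return (d + operand) & 0xFFFF

def bank0_access_cycles(addr):
    """Master cycles per access in bank $00 (assumed, commonly documented map)."""
    if addr < 0x2000:
        return 8    # low RAM mirror
    if addr < 0x4000:
        return 6    # PPU/APU registers ($21xx) and surrounding region
    if addr < 0x4200:
        return 12   # old-style joypad registers
    if addr < 0x6000:
        return 6    # CPU registers, DMA
    return 8        # expansion area and slow ROM

operand = 0x37
wrong = bank0_access_cycles(operand)                               # pre-adjusted
right = bank0_access_cycles(direct_page_address(0x2100, operand))  # D + operand
```

Here wrong is the 8-cycle RAM speed for $000037, while right is the 6-cycle register speed for $002137, so every such read was off by 2 master cycles.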
With that, I was able to get nearly identical results to my old SRAM file from your PPU Speed Test.smc file. I already know why it's different: I didn't reset the SNES before running the test. I tried to do the test again tonight, but unfortunately, it does not work after resetting the SNES. I just get a blank screen. I tried it ~10 times, and tried resetting immediately when the program started and whatnot. Nothing seems to work :/ The program works fine when I reset it in my emulator.
Here's 0x010a in the UFO RAM file, noting the initial latch positions were wrong:
FF7FAB2A00000000000000000000000000000000AB2AFE7F
And here's 0x010a in ppu.srm:
FE7FAA2A00000000000000000000000000000000AA2A0080
And here's UFO RAM 0x0282:
FDFFAAAAA8AAA8AAFFFFA8AA
ppu.srm 0x0282:
A8AAFDFFAAAAA8AAA8AAFFFF
Keep in mind that I forced the longer dots to 322/326 as a compromise: my SNES changes every reset (it was 321/325 in that test), and yours uses 323/327 every time.
The rest are all 0xaaa8-0xaaaa, as they should be. So my conclusion is that DRAM refresh probably does occur mid-opcode. The reason we thought it didn't was because we were only updating the latches at the end of the opcode.
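Here is a rough Python sketch of why the point at which refresh is applied changes the latched position. All of the numbers are assumptions for illustration: a refresh stall of 40 master cycles starting 538 master cycles into the scanline, and an lda $2137 made of three 8-cycle fetches plus a 6-cycle register read. The function name run_opcode is hypothetical, and the sketch stays within a single scanline.

```python
REFRESH_START = 538    # assumed master-cycle position of DRAM refresh in a scanline
REFRESH_LENGTH = 40    # assumed length of the refresh stall in master cycles

def run_opcode(start, bus_cycles, refresh_mid_opcode):
    """Return (clock at the start of each bus cycle, clock after the opcode)."""
    clock = start
    refreshed = start >= REFRESH_START
    seen = []
    for length in bus_cycles:
        seen.append(clock)
        clock += length
        # Cycle-by-cycle core: apply the stall as soon as the refresh point is reached.
        if refresh_mid_opcode and not refreshed and clock >= REFRESH_START:
            clock += REFRESH_LENGTH
            refreshed = True
    # Opcode-based core: the stall is only noticed once the opcode has finished.
    if not refreshed and clock >= REFRESH_START:
        clock += REFRESH_LENGTH
    return seen, clock

lda_2137 = [8, 8, 8, 6]   # opcode, aal, aah, read of $2137 (assumed speeds)
mid_seen, mid_end = run_opcode(530, lda_2137, refresh_mid_opcode=True)
end_seen, end_end = run_opcode(530, lda_2137, refresh_mid_opcode=False)
```

Both variants finish the opcode at the same clock, but the final read cycle (the one that latches) starts 40 master cycles later when refresh is applied mid-opcode, which is the kind of difference that shows up in the latch values.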
While I was trying to get your PPU Speed Test to give correct results again, I tried to fill in the blanks with cycle timings for some odd opcodes.
First is rep/sep #const.
[1] pbr,pc ; opcode
[2] pbr,pc+1 ; idl
[2a] pbr,pc+2 ; idh [1][6]
1 - add 1 byte (for immediate only) for m=0 or x=0, add 1 cycle for m=0 or x=0. rep, sep are always 3 cycle instructions and vpa is low during the third cycle. the address bus is pbr,pc+1 during the third cycle.
6 - add 1 cycle if branch is taken in across page boundaries in 6502 emulation mode (e=1).
First of all, 6 doesn't even apply here. The first half of 1 doesn't apply to rep/sep (it was intended for adc/sbc #const, etc.), and the second half is vague.
If VDA=0 and VPA=0, then it's an I/O cycle (6 master cycles). However, no such cycle exists. I made a test program that executed:
clc : xce
rep #$20 : sep #$20 ;... repeat these two 32 times for a total of 64 opcodes
lda $2137
And I got the latch value 0001:0051. To match this through emulation, I had to remove cycle 2a from rep/sep. Therefore, either the document is wrong, or this is a quirk specific to the SNES version of the CPU. Either way, in this test rep/sep comes out as only two cycles: one opcode fetch and one operand fetch.
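For a sense of scale, a quick Python calculation of how far the latch would move if cycle 2a were present. It uses the 6 master cycles per I/O cycle mentioned above and assumes a nominal 4 master cycles per dot (ignoring the occasional longer dots); the function name is hypothetical.

```python
IO_CYCLE = 6          # master cycles per I/O cycle, as stated above
MASTER_PER_DOT = 4    # assumed nominal dot length in master cycles

def latch_shift_in_dots(opcode_count, extra_cycles_per_opcode=1):
    """Dots the latched H position moves if each opcode gains the extra I/O cycle."""
    master_cycles = opcode_count * extra_cycles_per_opcode * IO_CYCLE
    return master_cycles / MASTER_PER_DOT

shift = latch_shift_in_dots(64)   # 64 rep/sep opcodes in the test program
```

Under those assumptions, 64 opcodes with one extra I/O cycle each would move the latch by 384 master cycles, or 96 dots, so a single $2137 read after the run is more than enough to tell two-cycle and three-cycle rep/sep apart.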
Next up, I figured out how condition 4 works.
Example opcode: lda addr,x
4 - add 1 cycle for indexing across page boundaries, or write, or x=0. when in emulation mode, this cycle contains invalid addresses.
[1 ] pbr,pc ; opcode
[2 ] pbr,pc+1 ; aal
[3 ] pbr,pc+2 ; aah
[3a] dbr,aah,aal+xl ; io [4]
[4 ] dbr,aa+x ; data low
[4a] dbr,aa+x+1 ; data high [1]
It means exactly as it says, but there are some weird quirks with it as well.
Assume: m=1/x=1 {
ldy #$00 : lda $20ff,y
} -> This will not trigger cycle 3a. The boundary was not crossed.
Assume: m=1/x=1 {
ldy #$01 : lda $20ff,y
} -> This will trigger cycle 3a. The boundary was crossed.
Assume: m=1/x=0 {
ldy #$0000 : lda $2000,y
} -> This will trigger cycle 3a. X=0
Assume: m=1/x=0 {
ldy #$0001 : lda $20ff,y
} -> This will trigger cycle 3a. X=0. Even though it also crosses the page boundary, it only gets one additional cycle.
Assume: m=1/x=1 {
ldy #$00 : sta $2000,y
} -> This will trigger cycle 3a. A write was performed.
Assume: m=0/x=1 {
ldy #$01 : lda $20ff,y
} -> This will trigger cycle 3a. The page boundary was crossed.
Now for the weird one.
Assume: m=0/x=1 {
ldy #$00 : lda $20ff,y
} -> This will *not* trigger cycle 3a. The first read does not cross the page boundary, and even though the second read does (reads from $2100), it doesn't count. This should be expected, given cycle 3a couldn't occur after cycle 4, but it's still weird. Why is cycle 3a even needed, then? If it's able to cross the page boundary on the second read with no overhead, why is there overhead during the first cycle?
Oh, and the above condition 4 notes hold true regardless of whether you are in emulation mode or native mode. I tried both for all but the m/x=0 ones (for obvious reasons).
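The cases above boil down to a single predicate. This is only a Python sketch of the observed behavior, with a hypothetical function name and parameters; note that m does not appear at all, since the second (high byte) read never adds the cycle.

```python
def needs_cycle_3a(base, index, x_flag, is_write):
    """True if the extra I/O cycle 3a occurs for an absolute indexed access.

    base     -- 16-bit absolute address from the operand (e.g. 0x20ff)
    index    -- value of the index register
    x_flag   -- True when x=1 (8-bit index registers)
    is_write -- True for stores such as sta addr,y
    """
    page_crossed = (base & 0xFF00) != ((base + index) & 0xFF00)
    # Any one of the three conditions adds the cycle, and it is added only once.
    return is_write or not x_flag or page_crossed
```

For example, needs_cycle_3a(0x20ff, 0x00, x_flag=True, is_write=False) is False regardless of m, which is the "weird" case above, while needs_cycle_3a(0x20ff, 0x0001, x_flag=False, is_write=False) is True but still only means one extra cycle.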