Synchronizing multiple clocks of varying frequencies


byuu

Post by byuu »

I'm trying to come up with a way to synchronize the SNES main clock (315/88*6 MHz) with the APU clock (~1.024 MHz) down to individual cycles of each clock -- I plan to emulate each at the cycle level for fun. The problem is that I can't seem to do that with a 32-bit number.

To show the problem on a small scale: let's say the CPU is 6 ticks/second, the PPU is then 4 ticks/second, and the APU is 1.5 ticks/second. You could just use 6*4*1.5=36 ticks/second for your main system-wide clock speed, and call one CPU cycle every 36/6 ticks, the PPU every 36/4 ticks, the APU every 36/1.5 ticks.
The problem is that multiplying 21.477 million by 1.024 million creates an astronomical number somewhere in the trillions. And I still have the DSP clock to worry about (~24 MHz).
I also tried converting the clock position into seconds using 1/21.477million*current_pos, but as you can guess, even 64-bit floating point doesn't have nearly enough accuracy for that to be effective.

So how do other emulators manage to perfectly (or close to perfectly) sync MHz+ clocks running at different speeds? The code needs to be as perfect as possible, so that it's accurate down to each cycle being executed in the correct order. I can't use something that synchronizes every nth of a second. I'd prefer to keep the decimal place accuracy on the main CPU clock (21477272.72727272...) across seconds if possible, but it's probably not too important.

Edit: Ok, the DSP is 24576000 Hz, and since 24576000/1024000 = 24, and the PPU/CPU can run off the same clock, I just need to synchronize two clocks: Clock A: 24.576 MHz, Clock B: 21.477272... MHz
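
One way to avoid both the huge product and floating-point drift, sketched here under the assumption that the main clock is exactly 315/88*6 MHz (236250000/11 Hz) and the DSP clock is exactly 24576000 Hz: the two rates reduce to the integer ratio 39375:45056, so a single small signed counter can decide which clock ticks next. The names `ClockSync`, `step`, `skew`, and the tick counters are made up for illustration and are not taken from any emulator.

```cpp
#include <cstdint>

// Exact ratio of the two clocks, assuming 236250000/11 Hz and 24576000 Hz:
//   (236250000/11) / 24576000 = 39375 / 45056
// In a shared time unit, one main-clock tick lasts 45056 units and one
// DSP-clock tick lasts 39375 units.
constexpr int32_t kMainPeriod = 45056;
constexpr int32_t kDspPeriod  = 39375;

struct ClockSync {
    int32_t  skew = 0;        // main-clock time minus DSP-clock time, in shared units
    uint64_t main_ticks = 0;
    uint64_t dsp_ticks = 0;

    // Advance whichever clock is behind by one tick (the main clock wins ties).
    void step() {
        if (skew <= 0) {
            ++main_ticks;     // a real emulator would run one main-clock cycle here
            skew += kMainPeriod;
        } else {
            ++dsp_ticks;      // ...and one DSP-clock cycle here
            skew -= kDspPeriod;
        }
    }
};
```

The counter never leaves the range (-39375, 45056], so it fits comfortably in 32 bits, and after 39375 + 45056 steps it returns to zero with no accumulated error.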
creaothceann
Seen it all
Posts: 2302
Joined: Mon Jan 03, 2005 5:04 pm
Location: Germany

Post by creaothceann »

Can you use several variables? (Like we do with seconds, minutes and hours, because expressing everything in seconds would make the numbers too large.)
byuu

Post by byuu »

No.

Test code
This is a stab at trying to do this with doubles. I'm pretty sure this is the best I'll ever get, but it's hopelessly slow. It's still not perfect, either: a lot of precision is lost in the decimal places. Any help speeding up the core part of the loop would be greatly appreciated.
Two clocks should suffice: the 65816 can be derived via cpu / {6, 8, 12}, the PPU via cpu / 4, and the SPC700 via apu / 24.
blackmyst
Zealot
Posts: 1161
Joined: Sun Sep 26, 2004 8:36 pm
Location: Place.

Post by blackmyst »

I suppose at the processing power we can get these days, accuracy would take precedence over speed - even if the speed is very low...

Would variable clock cycles work? Or would that just break things all to hell?
Nach
ZSNES Developer
Posts: 3904
Joined: Tue Jul 27, 2004 10:54 pm
Location: Solar powered park bench

Post by Nach »

You're syncing the two by finding a common denominator using the multiply trick.

While that works, it's not necessarily the lowest common denominator.

Let's say the CPU is 6 ticks/second, the PPU is then 4 ticks/second, and the APU is 1.5 ticks/second. You could use 12 ticks/second for your main system-wide clock speed, and call one CPU cycle every 12/6 (2) ticks, the PPU every 12/4 (3) ticks, the APU every 12/1.5 (8) ticks.
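
The smallest shared rate in this toy example can be computed rather than guessed. The sketch below treats each rate as a fraction in lowest terms and uses the identity lcm(a/b, c/d) = lcm(a, c) / gcd(b, d). The `Rate` struct and the `common_rate` and `period` helpers are hypothetical names; `std::gcd` and `std::lcm` come from the C++17 `<numeric>` header.

```cpp
#include <cstdint>
#include <numeric>   // std::gcd, std::lcm (C++17)

// A clock rate in ticks per second, as a fraction in lowest terms.
struct Rate { int64_t num; int64_t den; };

// Smallest rate that is a whole multiple of every input rate.
Rate common_rate(const Rate* rates, int count) {
    Rate r = rates[0];
    for (int i = 1; i < count; ++i) {
        r.num = std::lcm(r.num, rates[i].num);
        r.den = std::gcd(r.den, rates[i].den);
    }
    return r;
}

// How many ticks of `base` make up one tick of `rate` (base / rate).
int64_t period(Rate base, Rate rate) {
    return (base.num * rate.den) / (base.den * rate.num);
}
```

With the rates 6/1, 4/1 and 3/2 this yields a base of 12 ticks/second and periods of 2, 3 and 8, matching the numbers above.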

Edit: I just gave a quick glance at the code. This isn't related but:

inline void run_cpu(void) {
is not a valid C or C++ prototype.
You can only inline in C++, which is fine since that's what you seem to be defining the file as.
Setting void as the parameter, however, while fine in C, is not allowed according to the C++ standard.
I doubt some compilers would care, but you should leave the () empty.
Last edited by Nach on Fri Apr 15, 2005 2:02 pm, edited 2 times in total.
anomie
Lurker
Posts: 151
Joined: Tue Dec 07, 2004 1:40 am

Post by anomie »

Yay, more dev topics! (:
byuusan wrote:I'm trying to come up with a way to synchronize the SNES main clock (315/88*6mhz) with the APU clock (~1.024mhz) down to individual cycles of each clock -- I plan to emulate each at the cycle level for fun.
I'm thinking we'll have to do the 5A22 at the cycle level to get DMA/HDMA timing right. :( But will it work out better to code it as a state machine with a state for each cycle of the opcode, or to keep the opcodes as-is except test for events (DMA, HDMA, bit twiddling, etc) between each cycle? For SPC700 there's only the CPU<->APU communication to worry about, i wonder whether it would emulate faster running cycle by cycle or buffering writes to the comm registers?

As for the clocks... It gets worse when you remember that PAL has a different CPU clock speed from NTSC (best number we have is 21281370); it gets better when you remember that the SPC700 clock doesn't actually run at 1024000, IIRC the closest number we've gotten is ~1024800 and the highest is ~1026900. But whatever speed this is, at least it's still 1/24 the 24MHz APU clock. And any external chips will probably be clocked relative to one of the above two clocks, via CC1 (that's "Cart Connector"), CC33, CC57, EP21 ("Expansion Port"), EP22, and maybe CC56, all of which are based off of one of the two clocks above.

Why is the variable speed on the APU better? Since the APU clock is so variable, we can pick a convenient value to emulate that gives us pleasant numbers. Choose anywhere from 0xdf8c/0x10000 to 0xdf17/0x10000 as your CPU/APU ratio and you should fall into the observed range there, and you can use fixed point math.
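
A minimal sketch of the fixed-point idea, assuming the ratio is read as "main-clock ticks per 24 MHz APU-clock tick" with 16 fractional bits. `FixedSync` and its members are illustrative names only, and 0xdf8c is just one endpoint of the range quoted above.

```cpp
#include <cstdint>

// Assumed reading of the ratio: CPU (main clock) ticks per APU clock tick,
// stored with 16 fractional bits. 0xdf8c / 0x10000 is roughly 0.8732.
constexpr uint32_t kRatio = 0xdf8c;

struct FixedSync {
    uint32_t frac = 0;       // fraction of a CPU tick accumulated so far
    uint64_t cpu_ticks = 0;
    uint64_t apu_ticks = 0;

    // Run one APU clock tick, plus a CPU tick whenever one has come due.
    void apu_tick() {
        ++apu_ticks;
        frac += kRatio;
        if (frac >= 0x10000) {   // kRatio < 0x10000, so at most one carry
            frac -= 0x10000;
            ++cpu_ticks;
        }
    }
};
```

Every 0x10000 APU ticks produce exactly 0xdf8c CPU ticks, and only integer adds and compares are needed in the inner loop.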

As for converting CPU cycles into seconds (to run the emulation at the right speed), if you want to be really accurate for NTSC i suppose you could run 1ms = 21477 or 21478 cycles in a pattern something like 87778777877 87778777877 87778777877... PAL with the best number we have at least is a whole number of cycles per second.
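
The 21477/21478 pattern can be generated with a small remainder accumulator instead of a table. This sketch assumes the NTSC clock is exactly 236250000/11 Hz, which is 21477 + 3/11 cycles per millisecond; `MsPacer` is a hypothetical name, and the order of 7s and 8s it produces may differ from the pattern written above while keeping the same 8:3 mix.

```cpp
#include <cstdint>

// 236250000/11 Hz is 236250/11 cycles per millisecond, i.e. 21477 + 3/11.
struct MsPacer {
    uint32_t rem = 0;   // accumulated elevenths of a cycle

    // Number of main-clock cycles to run for the next millisecond.
    uint32_t next() {
        uint32_t cycles = 21477;
        rem += 3;
        if (rem >= 11) {   // a whole extra cycle has accumulated
            rem -= 11;
            ++cycles;
        }
        return cycles;
    }
};
```

Over any 11 consecutive milliseconds this hands out 236250 cycles, so no error builds up across seconds.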
So how do other emulators manage to perfectly (or close to perfectly) sync mhz+ clocks running at different speeds?
Do any emulators perfectly sync the two? snes9x doesn't, at least until i finish rewriting the CPU core again.
anomie
Lurker
Posts: 151
Joined: Tue Dec 07, 2004 1:40 am

Post by anomie »

Nach wrote:You can only inline in C++ which is fine since that's what you seem to be defining the file as.
Hrm, the copy of the C (not C++) draft standard I'm looking at right now tells me that inline is valid in C, and suggests that the function be inlined...
Setting void however as the parameter while fine in C is not allowed according to the C++ standard.
I doubt some compilers would care, but you should leave the () empty.
It is allowed, for compatibility with C. But it's considered "less desirable".
byuu

Post by byuu »

You're syncing the two by finding a common denominator using the multiply trick.

While that works, it's not necessarily the lowest common denominator.
It is when one of your numbers isn't a whole number. However, I wrote a brute-force factoring routine (it's in the code above, commented out) to find the LCD of rounded numbers.
There are two main problems with this. The SNES 21.477mhz clock is a prime number. Simply because it has decimal places means it can't be factorable. And I realize you can "factor" anyway, e.g. 3 = 2 * 1.5, but a) no such factor would exist between the cpu/apu anyway, and b) it would require insane amounts of processing power to solve for.
If I round down to 21477272, then the LCD between that and the APU (24576000) is 8. I can turn the clock speed to 21477280 and the LCD becomes 160. I would need abs(cpu_speed / 1600.0) to get below the 4 billion (32-bit) barrier and use integers, though. That has a loss of 472hz/second. While I wouldn't mind making it a speed-hack option, there is one other problem:

The highest LCD I can use for the main clock would be 4 (one PPU tick), and the APU can't go below 6 (expansion port tick), or 24 (if I ignore the probably unused expansion port). The LCD between 4 and 6 is only 2, so the best I could do would be to split both numbers in half and round the result to a whole number. The reason is that if I made the LCD even 8, how would I add one PPU cycle? I'd have to add 0.5/tick, a float. There's no performance gain possible if I'm still using float addition/subtraction for the entirety of the loop.
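
The figures quoted for the rounded clocks (8 and 160) match the greatest common divisor of each rounded main clock with 24576000, which can be checked in a couple of lines. `common_divisor` and `combined_rate` are hypothetical helper names; `std::gcd` and `std::lcm` are the C++17 `<numeric>` functions.

```cpp
#include <cstdint>
#include <numeric>   // std::gcd, std::lcm (C++17)

constexpr int64_t kApuClock = 24576000;

// Largest value that divides both the (rounded) main clock and the APU clock.
constexpr int64_t common_divisor(int64_t main_clock) {
    return std::gcd(main_clock, kApuClock);
}

// Smallest tick rate that both clocks divide evenly.
constexpr int64_t combined_rate(int64_t main_clock) {
    return std::lcm(main_clock, kApuClock);
}
```

Even with the friendlier 21477280, the combined rate is far above what a 32-bit counter can hold, which is the barrier described above.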

As for the (void) thing... that's just a habit of mine. I realize specifying the args as void does nothing, but it looks more like a function declaration to me that way. It annoys me greatly that class constructors/destructors don't work if you specify the args as void.

I've never actually seen inline do what it's supposed to. Every time I examine the assembly generated, it's always still a function call. But I compared the routine with and without inline, and the former seemed a tad bit faster, so just for the purpose of the demonstration app, I left it in.
I can barely get 240fps in my loop, slower than ZSNES already -- and it doesn't emulate anything at all! So I was trying anything possible to help speed besides manually inlining the function (not an option, I don't write spaghetti code).
I'm thinking we'll have to do the 5A22 at the cycle level to get DMA/HDMA timing right.
My thoughts exactly. I'm planning to actually rewrite bsnes as a result of all of the changes I'll need (don't worry, I plan to copy/paste a lot), and I'm trying to lay out the framework now so that when I start, it's as modular and clean as possible.
But will it work out better to code it as a state machine with a state for each cycle of the opcode, or to keep the opcodes as-is except test for events
The former will be much slower, but it will allow better synchronization between events, especially $2140-$2143<>$f4-$f7 communication. I plan to code each chip as their own entities. I'm actually planning to go dot by dot in the PPU, so that even mid-scanline writes to certain registers can be handled correctly. I realize the end result will run at ~3fps even on this. Hence I'm trying to think of a neat modular coding method so that I can make multiple versions of each that you can toggle based on the accuracy vs. speed level you desire. Wanna help me out here, as well? :) Be kinda cool if we could come up with standard APIs for each 'processor' and share code between snes9x. Otherwise, we're unable to use/test each others' work since I don't know how to code for *nix.
i wonder whether it would emulate faster running cycle by cycle or buffering writes to the comm registers?
The latter would be much faster, but far harder to get perfect.
As for the clocks... It gets worse when you remember that PAL has a different CPU clock speed from NTSC
The nice thing about my above example is that this is no problem whatsoever, just change the base clock speeds and you're done.
I kind of figured the APU clock speed was a made-up number (much like the CPU used to be), but since we can't get the true clock speed, we have to use something. I suppose it'd be best to let you specify the speed in a text file, and try and find a number that works for the most SNES games.
And any external chips will probably be clocked relative to one of the above two clocks
Yeah, definitely. CPU + PPU = Clock A, DSP + SPC700 + EP22 = Clock B.
We should come up with names for these. Calling the 21mhz clock the CPU clock is inaccurate, since the CPU clock varies between 1.7-3.7mhz.
Choose anywhere from 0xdf8c/0x10000 to 0xdf17/0x10000 as your CPU/APU ratio and you should fall into the observed range there, and you can use fixed point math.
Interesting, thanks. 0xdf8c * 0x10000 will work. I could make that an option for speed.
As for converting CPU cycles into seconds (to run the emulation at the right speed)
The nightmare I didn't get to yet. I plan to create n number of sleep loops inside the main timing loop to sync the two clocks to milliseconds. Let's say n = 10, then I'll make 10 step values for the (cpuclock*apuclock) range, and wait for 100ms to elapse between each step. Or I could make it more accurate if necessary -- I'll basically need it to sync enough times to not break up sound. I really need to learn how sound works already ... Probably will use QueryPerformanceCounter on Windows, and whatever on Linux for this timing.
Do any emulators perfectly sync the two?
I just assumed you were relatively close :/

By the way anomie, if you're interested in coming up with some sort of 'standard' API like I was talking about, send me an e-mail with some contact info and we can try and work something out. I've made a few neat diagrams of what I have in mind you might like :)
Nach
ZSNES Developer
Posts: 3904
Joined: Tue Jul 27, 2004 10:54 pm
Location: Solar powered park bench

Post by Nach »

anomie wrote:
Nach wrote:You can only inline in C++ which is fine since that's what you seem to be defining the file as.
Hrm, the copy of the C (not C++) draft standard I'm looking at right now tells me that inline is valid in C, and suggests that the function be inlined...
I have before me a spec for standard C written when C++ was standardized in 98 which says no inline for C. MSVC I believe also does not support inline in C.
anomie wrote:
Setting void however as the parameter while fine in C is not allowed according to the C++ standard.
I doubt some compilers would care, but you should leave the () empty.
It is allowed, for compatibility with C. But it's considered "less desirable".
You sure all compilers support it?
I know GCC removed the old type func(var1,var2,var3) type var1, type var2, type var3 { thing when compiling C++.
anomie
Lurker
Posts: 151
Joined: Tue Dec 07, 2004 1:40 am

Post by anomie »

Nach wrote:I have before me a spec for standard C written when C++ was standardized in 98 which says no inline for C. MSVC I believe also does not support inline in C.
I don't care what MSVC does. I'm looking at this draft, referenced from here.
You sure all compilers support it?
I know GCC removed the old type func(var1,var2,var3) type var1, type var2, type var3 { thing when compiling C++.
Whether or not all compilers support it, I suspect this site has accurate information or it would be corrected by the comp.lang.c++ folks.
anomie
Lurker
Posts: 151
Joined: Tue Dec 07, 2004 1:40 am

Post by anomie »

byuusan wrote:I've never actually seen inline do what it's supposed to. Every time I examine the assembly generated, it's always still a function call. But I compared the routine with and without inline, and the former seemed a tad bit faster, so just for the purpose of the demonstration app, I left it in.
I've seen it work, at least in simple cases. Recall that in C, a non-inline version will still be generated unless the function is also static, and depending on your compiler you may have to push the optimization to make it do any inlining at all.
The former will be much slower, but it will allow better synchronization between events, especially $2140-$2143<>$f4-$f7 communication.
I'm not so sure of that, depending on just how the whole mess is done. For example, if S-CPU leads, and syncs the APU on access to $2140-3, it's enough to have CPUSetByte() call SyncAPUtoCPU() before making the write effective/performing the read. The SPC700 doesn't need to SyncCPUtoAPU() at all, since it knows if there was anything interesting the CPU would have synced already. OTOH, this means you can't easily use the soundcard to sync the emulator to wall time.
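
A rough sketch of that "CPU leads, APU catches up on port access" arrangement. `CPUSetByte` and `SyncAPUtoCPU` are the names used above; everything else (`LazyApuSync`, `RunCPU`, the `port` array) is a stand-in invented for illustration, and the 39375:45056 ratio assumes the exact 236250000/11 Hz and 24576000 Hz clocks rather than measured values.

```cpp
#include <cstdint>

struct LazyApuSync {
    uint64_t cpu_cycles = 0;   // main-clock cycles the CPU side has executed
    uint64_t apu_cycles = 0;   // APU-clock cycles the APU side has executed
    uint8_t  port[4] = {};     // stand-in for $2140-$2143

    // Stand-in for running CPU code that touches no shared state.
    void RunCPU(uint64_t cycles) { cpu_cycles += cycles; }

    // Run the APU until it has caught up with the CPU's current time.
    void SyncAPUtoCPU() {
        // 45056 APU cycles elapse per 39375 main-clock cycles (assumed ratio).
        // The product overflows only after hundreds of emulated days.
        uint64_t target = cpu_cycles * 45056 / 39375;
        while (apu_cycles < target) {
            ++apu_cycles;      // a real core would execute APU work here
        }
    }

    // Write path: sync first so the APU sees the write at the right time.
    void CPUSetByte(uint16_t addr, uint8_t value) {
        if (addr >= 0x2140 && addr <= 0x2143) {
            SyncAPUtoCPU();
            port[addr - 0x2140] = value;
        }
    }
};
```

Writes to other addresses never trigger a sync, so the APU only runs in bursts when the CPU actually touches the communication ports.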

Same could be done with the PPU, except there are more registers involved. In some cases, it might well be easier to buffer the values rather than sync, and do all the syncing in one fell swoop later (for example, snes9x records the M7 matrix each line rather than calling the renderer for every individual scanline for most games). We still need to determine which registers can take effect mid-scanline and which are sampled per-line.
Hence I'm trying to think of a neat modular coding method so that I can make multiple versions of each that you can toggle based on the accuracy vs. speed level you desire. Wanna help me out here, as well? :) Be kinda cool if we could come up with standard APIs for each 'processor' and share code between snes9x.
That would be interesting. IOW, basically we'd draw up something resembling a C++ class specification for each chip. But i suspect if we went that route we'd end up with bsnes9x with two GUIs rather than two emulators...

Along these lines, one thing i would really like would be a standard serialization for the various components even if underlying emulation is completely different... But whether even that is possible, i'm not sure.
The latter would be much faster, but far harder to get perfect.
Not really, i think. Use a FIFO with the ability to peek at the next-from-top element. Writes would push a pair (timestamp,value), and reads would pop as long as next-from-top's timestamp<=current_time before choosing the value. And periodically run a cleanup in case the reader hasn't read in a while to prevent eating all memory.
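
The timestamped FIFO could look something like the following sketch. `PortFifo` and its method names are hypothetical, timestamps are assumed to be pushed in non-decreasing order, and the periodic cleanup mentioned above is left out.

```cpp
#include <cstddef>
#include <cstdint>
#include <deque>
#include <utility>

class PortFifo {
public:
    // The queue always holds the currently visible value at the front.
    explicit PortFifo(uint8_t initial) { q_.push_back({0, initial}); }

    // Writer side: record that `value` becomes visible at `timestamp`.
    void write(uint64_t timestamp, uint8_t value) {
        q_.push_back({timestamp, value});
    }

    // Reader side: pop while the next-from-top entry is already due,
    // then return the value that is visible at time `now`.
    uint8_t read(uint64_t now) {
        while (q_.size() > 1 && q_[1].first <= now) q_.pop_front();
        return q_.front().second;
    }

    std::size_t pending() const { return q_.size(); }

private:
    std::deque<std::pair<uint64_t, uint8_t>> q_;
};
```

A reader that is "in the past" relative to the writer still sees the old value, which is what lets the two sides run out of lockstep.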
I kind of figured the APU clock speed was a made-up number (much like the CPU used to be), but since we can't get the true clock speed, we have to use something. I suppose it'd be best to let you specify the speed in a text file, and try and find a number that works for the most SNES games.
Remember all my timing experiments towards finding the SPC700 clock speed relative to the 5A22 clock speed? Since the ratio is the only thing that matters for sync anyway, those results should be good enough even if they don't necessarily give the 'true' clock speed. And if we assume EP21 uses an integer scaling factor off the APU clock, we could even measure that more or less directly.
We should come up with names for these.
Ok... CPU Clock should be the variable-speed clock presumably seen on CC57. CC1 Clock is as good a name as any for the 21.477 MHz clock, "master clock" is inaccurate since the APU clock isn't a slave. Dot Clock works for the PPU's output, and DMA Clock works for the CC1/8 clock used by the DMA/HDMA circuitry. APU Clock can be the 24 MHz clock used by the various parts of the APU, with SPC700 Clock and EP21 Clock divided off of it. Sample Clock would be APU/768, the nominally 32000 Hz sample frequency output by the S-DSP. And we could even talk about the Timer Clocks used as Stage 1 input to the SPC700's timer registers. Any others?
The nightmare I didn't get to yet. I plan to create n number of sleep loops inside the main timing loop to sync the two clocks to milliseconds. Let's say n = 10, then I'll make 10 step values for the (cpuclock*apuclock) range, and wait for 100ms to elapse between each step. Or I could make it more accurate if necessary -- I'll basically need it to sync enough times to not break up sound. I really need to learn how sound works already ... Probably will use QueryPerformanceCounter on Windows, and whatever on Linux for this timing.
snes9x syncs the time at the end of each frame, either delaying if we're running fast or skipping a frame if we're running behind. Sound on the SNES outputs one 16-bit stereo sample every 768 APU Clock cycles, snes9x (in my rewrite) buffers these until the soundcard requests more data (either by an interrupt from the card, or by polling the soundcard periodically). And i've even written some code to try to handle fast forward/slow motion without pitch distortion, but it's only been tested to the extent of "mpg123 -s song.mp3 | sox -r 44100 -c 2 -t .sw - -r 32000 -c 2 -t .sw - | ./my_filter outrate inrate invariance | play -c 2 -r 32000 -t .sw" rather than being integrated into the emulator... Sound quality suffers when 'corrections' must be made at a decent frequency, but it's not all that horrible even if we're running at rates approaching 1:100 or 100:1. (in short, the idea is based on chopping the audio stream into ~1ms 'grains' and crossfading them to drop or duplicate samples as needed).

Current snes9x, BTW, just runs S-DSP completely out of sync with the SPC700 to generate the requested number of samples whenever the soundcard demands it. Surprisingly, it actually produces mostly decent sound with fast forward/slow motion.
Do any emulators perfectly sync the two?
I just assumed you were relatively close :/
snes9x runs the SPC700 at ~1022727.27 Hz for the most part (OTOH, there's no WRAM refresh either), and doesn't sync the S-DSP with the SPC700 at all. I'm not sure when CPU-SPC700 sync happens. Not even close. I don't know what other emulators might do.
By the way anomie, if you're interested in coming up with some sort of 'standard' API like I was talking about, send me an e-mail with some contact info and we can try and work something out. I've made a few neat diagrams of what I have in mind you might like :)
Just reply to any of the emails i've sent you in the past, unless you want to try your luck looking for me in #zsnes (or #bzsneese9x, mwa ha ha).
byuu

Post by byuu »

You make interesting points about buffering things to handle the delays. I'm sure that method would work, and much faster at that... I guess I just want to try breaking everything down as small as possible :)
That would be interesting. IOW, basically we'd draw up something resembling a C++ class specification for each chip. But i suspect if we went that route we'd end up with bsnes9x with two GUIs rather than two emulators...
Yes, exactly. I wanted the main routine to create the base classes, like
class ppu {
public:
    void update(); //called once for every PPU tick
    ...
};
and then each implementation would derive from the class.
e.g.
class anomie_ppu : public ppu {
};
and at that point you could add your own functions and variables and what have you.
The main loop would just shell the new class and use a reinterpret_cast to call the base functions from the main ppu class.
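
For comparison, here is a sketch of the same layout using virtual functions. This swaps in ordinary virtual dispatch for the reinterpret_cast approach described above, so the main loop needs no cast at all. `run_ticks` and the `ticks` member are made up for the example; only `ppu`, `anomie_ppu`, and `update` come from the post.

```cpp
// Hypothetical shared interface that every PPU core would implement.
class ppu {
public:
    virtual ~ppu() = default;
    virtual void update() = 0;   // called once for every PPU tick
};

// One implementation; a real core would render a dot here.
class anomie_ppu : public ppu {
public:
    void update() override { ++ticks; }
    unsigned long ticks = 0;     // implementation-specific state
};

// The main loop only ever sees the base interface.
void run_ticks(ppu& chip, int count) {
    for (int i = 0; i < count; ++i) chip.update();
}
```

The cost is one indirect call per tick, which may matter at dot-level granularity; that trade-off is not measured here.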

The main drawback is that linux doesn't support DLLs (or does it?), so all of our classes would have to be statically linked. Which would be fine as long as each 'chip class' implementation didn't touch the global namespace except for its own class.

As far as bsnes9x... you make a good point. I don't mind there being two versions, I could just make the default 'chips' my own implementations in mine, or whatever; likewise for snes9x, or if you wanted to just start on a 'new' snes9x base, I wouldn't mind joining the 9x team, or we could always just make absnes :)
The main advantage would be that it would be easy for anyone to improve/replace one of these chips in one emulator, and the change would be instantly available to all emulators with no porting necessary (each chip will be ansi-c++/assembler)... I'm sure we could differentiate them all in their own ways... 'sides, I kinda like the idea of joining our skills into one main working group :D
Along these lines, one thing i would really like would be a standard serialization for the various components even if underlying emulation is completely different... But whether even that is possible, i'm not sure.
I've thought about it a lot, I suspect that it very likely is. We just need a very clear communication system so that all chips can talk to each other. The overhead of having to communicate through messages will no doubt slow things down, but it will make it a lot easier to, say, just totally rewrite the PPU, or have multiple versions of the CPU emulator that have different speed vs accuracy trade-offs.

I think you have all the clocks down. So the CC1 clock is actually stored somewhere around the cart connector circuitry? I agree 'master clock' is inaccurate, don't even know where I got that from.
APU clock is fine, but doesn't the DSP run at the same speed as the APU clock?
snes9x syncs the time at the end of each frame
Neat idea, but since the system doesn't run at 60fps, that could get tricky, too.
Just reply to any of the emails i've sent you in the past, unless you want to try your luck looking for me in #zsnes (or #bzsneese9x, mwa ha ha).
You mean #super_bzsneesleuth9x :P
#zsnes on...? I was just meaning like via AIM/ICQ/whatever, to chat. IRC is fine, though.
anomie
Lurker
Posts: 151
Joined: Tue Dec 07, 2004 1:40 am

Post by anomie »

byuusan wrote: void update(); //called once for every PPU tick
I'd rather (or additionally) want a function to run cycles up to an internal counter. So I could do something like APU.SyncTo(CPU.Cycles())...
The main drawback is that linux doesn't support DLLs (or does it?), so all of our classes would have to be statically linked. Which would be fine as long as each 'chip class' implementation didn't touch the global namespace except for its own class.
Linux tends to call them "shared libraries", and of course they must be recompiled. Using them with C++ will probably be painful, at least with the standard dynamic loader library. Name mangling, you know.
I've thought about it a lot, I suspect that it very likely is. We just need a very clear communication system so that all chips can talk to each other.
Huh? Serialization means "writing the object to disk" (or network, etc). So modulo any emulator-specific objects, the savestate format could be completely compatible.
So the CC1 clock is actually stored somewhere around the cart connector circuitry? I agree 'master clock' is inaccurate, don't even know where I got that from.
APU clock is fine, but doesn't the DSP run at the same speed as the APU clock?
"master clock" comes from "master cycles" used in measuring the 5A22/PPU timings, as opposed to CPU cycles. I couldn't say where the actual oscillator lives, but it connects all over the place, including CC1. The S-DSP is really a black box, it takes in the 24.576 MHz clock and outputs audio samples, the SPC700 clock, and a bunch of other stuff. But whether the DSP core runs at full speed, or APU/3, or whatever other speed we can only guess. IIRC, an analysis of data requirements for the whole BRR decoding and echo requires at least APU/3...
Neat idea, but since the system doesn't run at 60fps, that could get tricky, too.
Yeah, snes9x currently just assumes 60 Hz. What exactly i'll do in my rewrite, I don't know yet.
You mean #super_bzsneesleuth9x :P
#zsnes on...? I was just meaning like via AIM/ICQ/whatever, to chat. IRC is fine, though.
I never run ICQ anymore. #zsnes is on freenode.net, i'm there occasionally.
Reznor007
Lurker
Posts: 118
Joined: Fri Jul 30, 2004 8:11 am

Post by Reznor007 »

You might try looking at the MAME timing code. Here's what the dev that wrote that part said:
The second core change I just submitted last night was a change to make the timer system use integers internally instead of floating point values. This has been on my to-do list for years, and I finally got the courage up to potentially break everything once again with a timer change. Fortunately, my limited testing indicates that things aren't really broken for the most part (except Hard Drivin', which always breaks anytime I change anything, and which I've already fixed). The main reason for using integers is accuracy. With floating point, the more time that accumulated, the more error we would see in the low bits. This was leading to some timing drift and other subtle errors. The new code keeps track of everything down to the nearest attosecond, which, by my calculations, only loses 1 cycle on a 96MHz CPU once every 5 minutes. And that's a case I manufactured to deliberately expose an error.
MAME is able to keep Race Drivin' synced up quite well, which is even more demanding about timing accuracy than Hard Drivin' (gotta love that early 3D hardware), and here are its specs:
68010 8MHz
TMS34010 6MHz
ADSP2100 8MHz
DSP32C 40MHz
68000 8MHz

Or the Sega STV system:
2x SH2 28.6364MHz
68000 11.45456MHz

On both of these, the specs don't include sound chips either (my lists are actual system CPUs only), but those are also kept under the same timer system.
byuu

Post by byuu »

I'd rather (or additionally) want a function to run cycles up to an internal counter. So I could do something like APU.SyncTo(CPU.Cycles())...
Don't understand how to do this exactly right now, but I've been up for 16 hours straight... anyway, the format is definitely open for debate.
Linux tends to call them "shared libraries", and of course they must be recompiled. Using them with C++ will probably be painful, at least with the standard dynamic loader library. Name mangling, you know.
I'm well acquainted with name mangling. Luckily, I can use extern "C" or declare my imports with __stdcall. I like the former more because it works with assembler code. I never did understand that gibberish it adds to function prototypes (function@YXXRT), nor do I want to.
Linux really needs to take care of that 'everything has to be compiled' thing already :/
Huh? Serialization means "writing the object to disk" (or network, etc). So modulo any emulator-specific objects, the savestate format could be completely compatible.
Oops, sorry. That could be extremely difficult. At least, if cores are written as differently as they are now. Like for my save state format, I would have to export the DMA clock time position and I'd have to write out a lot more than just the values written to the $21xx registers (like whether the next read from $213c is low or high).
The only way I can see that working is to give everyone's 'chip' emulations their own signatures. For each 'chip', there is a base list of information. Say for the PPU, this would be the value written to all of the PPU registers, and basic info like that. And add a header that indicates a signature for whatever PPU core generated the input, and how many extra bytes of information that chip stored. Obviously, this wouldn't be 100% compatible between different PPU cores (the extra chip-specific information would be lost), but the missing data could be filled in with default values resulting in a ~99.9% recovered state. It's either that, or any potential PPU core would have to output the entire range of possible values to the save state (including the variables needed for perfect DMA timing, even if most cores didn't bother to emulate that correctly). The problem there is that we would have to keep expanding and breaking old formats as I'm sure we'll never know all the variables we'd need to perfectly emulate the entire SNES.
The new code keeps track of everything down to the nearest attosecond
...attosecond?! I realize it can be done using carries, but jesus man. One quintillionth of a second?! Just wow...
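
For a sense of how the carry approach might look, here is a minimal sketch of a seconds-plus-attoseconds timestamp. It is not MAME's actual code; `AttoTime`, `add`, and `cycle_period` are invented names, and the only hard fact relied on is that 10^18 fits in a signed 64-bit integer.

```cpp
#include <cstdint>

constexpr int64_t kAttoPerSecond = 1000000000000000000LL;   // 10^18

struct AttoTime {
    int64_t seconds = 0;
    int64_t attoseconds = 0;   // kept in the range [0, 10^18)

    // Add a duration shorter than one second, carrying into `seconds`.
    void add(int64_t atto) {
        attoseconds += atto;
        if (attoseconds >= kAttoPerSecond) {
            attoseconds -= kAttoPerSecond;
            ++seconds;
        }
    }
};

// Length of one cycle of a clock, truncated to whole attoseconds.
constexpr int64_t cycle_period(int64_t hz) { return kAttoPerSecond / hz; }
```

Any rounding error comes only from truncating the cycle period to a whole number of attoseconds, so it stays fixed per cycle instead of growing with elapsed time the way floating-point error does.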
anomie
Lurker
Posts: 151
Joined: Tue Dec 07, 2004 1:40 am

Post by anomie »

byuusan wrote:Linux really needs to take care of that 'everything has to be compiled' thing already :/
What, you write your C++ code on Windows and you don't have to compile it before running it? ;)
Oops, sorry. That could be extremely difficult. At least, if cores are written as differently as they are now. Like for my save state format, I would have to export the DMA clock time position and I'd have to write out a lot more than just the values written to the $21xx registers (like whether the next read from $213c is low or high).
The idea is that there are certain things that must be written out, and certain things that can be figured out from what is written. For the 5A22, we'd need to write all the CPU registers, the current cycle counter, pending IRQ/NMI flags, the DMA counter delta (CPU.Cycles%8==DMAdelta at the edge of a DMA Clock cycle, it can change depending on how we wrap CPU.Cycles), and such. But we probably don't need to write all the pending event list, we can refigure NMI based on $4200, IRQ based on $4200 and the [HV]TIMER registers, and so on. Savestate loading isn't speed critical, so all that recalculation is fine as long as it doesn't take more than a second or so.
(including the variables needed for perfect DMA timing, even if most cores didn't bother to emulate that correctly).
I'd rather have the "common" savestate be as accurate and complete as possible. Sure, if some emu doesn't do correct DMA timing this value won't be accurate or used, but it doesn't hurt anything to include it either.
The problem there is that we would have to keep expanding and breaking old formats as I'm sure we'll never know all the variables we'd need to perfectly emulate the entire SNES.
My solution for this: Each 'chip' stores a block to the savestate stream, consisting of a signature (like PNG or current snes9x savestates), a length, a version, and data. When loading, the main loader would dispatch to the chip based on the sig. The chip would check the version and abort if it's unsupported, and otherwise load the data. We can append new ('optional') fields just by incrementing length (chip should ignore extra bytes on the theory they're not important, and guess a default for missing bytes), and delete/reinterpret fields by upping the version.

Any implementation-specific data would go in an extra block, so for example "S-CPU" could be for the standard info, you could additionally use "b-CPU", ZSNES could use "z-CPU", and so on.
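The block scheme described above could be sketched roughly as follows. Everything concrete here is an assumption made for illustration, not an agreed format: the 8-byte zero-padded signature, the little-endian 32-bit length and version fields, and the two example "S-CPU" fields (`pc`, `dma_delta`).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed layout, not an agreed format: 8-byte zero-padded signature,
   32-bit little-endian data length, 32-bit little-endian version, data. */
enum { SIG_LEN = 8, HEADER_LEN = SIG_LEN + 4 + 4, CPU_VERSION = 1 };

typedef struct {
  char sig[SIG_LEN + 1];
  uint32_t length;
  uint32_t version;
  const uint8_t *data;
} chunk;

/* Hypothetical subset of the standard "S-CPU" fields. */
typedef struct {
  uint16_t pc;
  uint8_t dma_delta;
} cpu_state;

static uint32_t read_le32(const uint8_t *p)
{
  return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
         ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Read the chunk starting at *pos and step past it; returns 0 if truncated. */
static int next_chunk(const uint8_t *buf, size_t size, size_t *pos, chunk *out)
{
  if (*pos > size || size - *pos < (size_t)HEADER_LEN) return 0;
  const uint8_t *p = buf + *pos;
  memcpy(out->sig, p, SIG_LEN);
  out->sig[SIG_LEN] = '\0';
  out->length = read_le32(p + SIG_LEN);
  out->version = read_le32(p + SIG_LEN + 4);
  if (size - *pos - (size_t)HEADER_LEN < out->length) return 0;
  out->data = p + HEADER_LEN;
  *pos += (size_t)HEADER_LEN + out->length;
  return 1;
}

/* Missing trailing fields keep their defaults; extra trailing bytes are ignored. */
static int load_cpu(const chunk *c, cpu_state *cpu)
{
  if (c->version != CPU_VERSION) return 0; /* unsupported version: abort */
  cpu->pc = 0;
  cpu->dma_delta = 0;
  if (c->length >= 2) cpu->pc = (uint16_t)(c->data[0] | (c->data[1] << 8));
  if (c->length >= 3) cpu->dma_delta = c->data[2];
  return 1;
}

/* Main loader: dispatch on the signature, skip blocks nobody claims. */
static int load_state(const uint8_t *buf, size_t size, cpu_state *cpu)
{
  assert(buf != NULL && cpu != NULL);
  size_t pos = 0;
  chunk c;
  while (pos < size) {
    if (!next_chunk(buf, size, &pos, &c)) return 0;
    if (strcmp(c.sig, "S-CPU") == 0 && !load_cpu(&c, cpu)) return 0;
    /* "b-CPU", "z-CPU" and other unknown signatures fall through here */
  }
  return 1;
}
```

In this sketch an emulator that doesn't understand "z-CPU" simply steps over it using the length field, which is what lets implementation-specific blocks coexist with the standard ones.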
Reznor007 wrote:You might try looking at the MAME timing code. Here's what the dev that wrote that part said:
So what, he changed from floating point to fixed point with 64 bits in the fraction? I'm surprised they were ever using floating point in the timing, really.
Reznor007
Lurker
Posts: 118
Joined: Fri Jul 30, 2004 8:11 am

Post by Reznor007 »

anomie wrote:So what, he changed from floating point to fixed point with 64 bits in the fraction? I'm surprised they were ever using floating point in the timing, really.
Early MAME code was written back in 1997, so back then it wasn't really considered. Over the past 2-3 years nearly every major core system in MAME(CPU interface, timing, memory system, input system, sound system) has been rewritten(all by the same guy amazingly) to be far more accurate and useful.
Noxious Ninja
Dark Wind
Posts: 1271
Joined: Thu Jul 29, 2004 8:58 pm
Location: Texas

Post by Noxious Ninja »

The MAME guys are psycho freaks.


The world needs more people like them.
#577451 (http://bash.org/?577451)
anomie
Lurker
Posts: 151
Joined: Tue Dec 07, 2004 1:40 am

Post by anomie »

Reznor007 wrote:Early MAME code was written back in 1997, so back then it wasn't really considered.
It's not like fixed point is something new though...
Over the past 2-3 years nearly every major core system in MAME(CPU interface, timing, memory system, input system, sound system) has been rewritten(all by the same guy amazingly) to be far more accurate and useful.
Nifty.


In other news, anyone have any real info on registers $4202-6 and $4214-7?
byuu

Post by byuu »

In other news, anyone have any real info on registers $4202-6 and $4214-7?
Nothing you don't already have.

Multiplication/Division set/read. I guess it'd be pretty easy to verify the 8/16 master cycle delay that we have currently now that we can measure in 2 CC1 clock cycle intervals.
anomie
Lurker
Posts: 151
Joined: Tue Dec 07, 2004 1:40 am

Post by anomie »

byuusan wrote:I guess it'd be pretty easy to verify the 8/16 master cycle delay that we have currently now that we can measure in 2 CC1 clock cycle intervals.
Remember though that we have to write $4203/6, then delay, then read $4214-7, so we can't really test every 2 CC1 Cycles in this case... Oh, and we can't forget $211b/c multiplication into $2134-6. Particularly, does the 16-bit value written to $211b use the M7* latch, or is there a special latch just for the multiplication ($211b and $211c?).

Preliminary tests look like ~48 and ~96 cycles (IOW, the "machine cycles" in the doc are either 8 master cycles or are CPU Clock cycles), and during the waiting period you read back the partially-calculated value. So for 100% correct emulation we'll not only need to figure the delay but the method used to calculate the result as well. The division is probably similar to the SPC700 division, at least. For $211b/c, the multiplication value from $211b is the same as the M7A register value, nothing odd like $210d being both M7HOFS and BG1HOFS. No info on the timing for this yet.
bztunk
Hazed
Posts: 84
Joined: Mon Dec 27, 2004 9:08 pm
Location: In A.D. 2101, war was beginning.

Post by bztunk »

inline was added to C99. With other nice things like variables that can be declared anywhere and // comments.

C++ doesn't support compatibility with C99 (it tries to, but it can't perfectly do so, thanks to contradicting standards).

Never tried dlls under linux, but there's that nice dlopen function that I think can do it: http://www.die.net/doc/linux/man/man3/dlopen.3.html
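A minimal sketch of how that might look, assuming a dynamically linked program on a system that provides `<dlfcn.h>` (older glibc versions also need `-ldl` at link time). `find_symbol` is a made-up helper name; `dlopen`, `dlsym` and `dlerror` are the real calls.

```c
#include <assert.h>
#include <dlfcn.h>
#include <stddef.h>
#include <stdio.h>

/* Look up `name` in the shared object at `path`. Passing path == NULL asks
   dlopen() for the main program plus the libraries already loaded with it. */
static void *find_symbol(const char *path, const char *name)
{
  assert(name != NULL);
  void *handle = dlopen(path, RTLD_NOW);
  if (handle == NULL) {
    const char *err = dlerror();
    fprintf(stderr, "dlopen failed: %s\n", err ? err : "unknown error");
    return NULL;
  }
  dlerror(); /* clear any stale error before the lookup */
  void *sym = dlsym(handle, name);
  if (sym == NULL) {
    const char *err = dlerror();
    fprintf(stderr, "dlsym failed: %s\n", err ? err : "symbol is NULL");
  }
  /* The handle is deliberately left open: dlclose() could unload the code. */
  return sym;
}
```

A plugin-style core would then be loaded with something like `find_symbol("./ppu_core.so", "ppu_init")` (both names hypothetical) and the result cast to the matching function pointer type before calling it.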
Nach
ZSNES Developer
Posts: 3904
Joined: Tue Jul 27, 2004 10:54 pm
Location: Solar powered park bench

Post by Nach »

bztunk wrote:With other nice things like variables that can be declared anywhere.
You sure about that?

I know C99 forbids "for (size_t i = 0; i < x; i++){}"
May 9 2007 - NSRT 3.4, now with lots of hashing and even more accurate information! Go download it.
_____________
Insane Coding
anomie
Lurker
Posts: 151
Joined: Tue Dec 07, 2004 1:40 am

Post by anomie »

Nach wrote:I know C99 forbids "for (size_t i = 0; i < x; i++){}"
Does it? I can't find anything that claims that construct is forbidden.
Nach
ZSNES Developer
Posts: 3904
Joined: Tue Jul 27, 2004 10:54 pm
Location: Solar powered park bench

Post by Nach »

anomie wrote:
Nach wrote:I know C99 forbids "for (size_t i = 0; i < x; i++){}"
Does it? I can't find anything that claims that construct is forbidden.

Code:

#include <stdlib.h>

int main()
{
  for (size_t i = 0; i < ~0; i++){}
  return(0);
}

Code:

/home/nach> gcc test.c
test.c: In function `main':
test.c:5: error: `for' loop initial declaration used outside C99 mode
/home/nach>  
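Note that the gcc message says the declaration was used outside C99 mode, which points at the compiler's default dialect rather than at C99 itself: C99 does permit a declaration in the first clause of a for statement. A minimal sketch, assuming a gcc that defaults to a pre-C99 dialect and accepts `-std=c99`; `sum_below` is a made-up name:

```c
#include <stddef.h>

/* Sum the integers 0..n-1 with the loop counter declared inside the
   for statement, which is valid C99 (and C++). */
static size_t sum_below(size_t n)
{
  size_t total = 0;
  for (size_t i = 0; i < n; i++) {
    total += i;
  }
  return total;
}
```

Built with `gcc -std=c99`, a file containing this function should compile without the error shown above; `sum_below(10)` returns 45.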