byuu wrote: I'm going to try and start posting WIP news in version release threads, instead of the never-ending "bsnes thread". That said, new WIP posted.
I was interested in the WIP releases in the other thread, so where do I go to grab these versions?
bsnes v030 released
[i]"Change is inevitable; progress is optional"[/i]
I did some performance testing. For each bsnes version listed below, I ran the CT intro twice at 3x scale with no filters. The FPS figures were taken during the Black Omen scene.
Code: Select all
v28 wip20 70-72, 71-73
v30 70-71, 69-71
v30 wip2 84-86, 87
Caching makes a nice difference here. I really hope you know what you're doing, byuu. You have said many times that one of your goals is to document the SNES's internal behavior. I understand that caching doesn't affect accuracy, but the code itself no longer documents true SNES behavior, does it? I really don't know in how much detail you want to document the SNES.
Something that really annoyed me was how some people tested the Black Omen scene (in the bsnes thread): each one used a different scale and/or filter. I noticed during my own tests that I had used 3x scale. This probably doesn't matter much, since image scaling and the like are offloaded to the GPU, but the filters do matter. To make our results comparable, which settings should we use? 1x scale and no filters, perhaps? Maybe official "how to test performance" instructions should be written.
byuu wrote: I'm going to try and start posting WIP news in version release threads, instead of the never-ending "bsnes thread". That said, new WIP posted.
The HQ2X filter still causes a segfault here, and I already posted a detailed backtrace from 0.030 in the other thread. Do you want a new backtrace for the WIP?
[url=http://aur.archlinux.org/packages.php?ID=11576]Arch Linux bsnes package[/url]
"I understand that caching doesn't affect accuracy, but the code itself no longer documents true SNES behavior, does it?"
The scanline PPU was a hack to begin with, so this makes little difference. But in this case, yes: the SNES PPU is almost certainly caching tiles. It doesn't have the bandwidth to read 16-bit words every single pixel just for offset-per-tile mode alone.
"HQ2X filter still causes a segfault here and I already posted a detailed backtrace from 0.030 in the other thread. Do you want a new backtrace for the WIP?"
No thanks. I can't reproduce it on my side, so I can't fix it. It'll have to stay broken for now.
byuu wrote: The controller graphic looks great. I'll wait until you finish it (or are you done now?) before merging it in, though. Takes quite a while to encode it on my side.
Thanks. I corrected the d-pad shadow, which was off as you guys realized. Should be okay to add now. It sucks, though: I loaded the PSD yesterday and all the layers were merged. I must have accidentally saved it that way when making the last BMP.

......
Anyone who's serious about trying to fix the texture issue: post the system specs of your problem machine in the following format:
1. OS and service pack level
2. Video Card model and driver version
3. Motherboard chipset or model
-
- Regular
- Posts: 347
- Joined: Tue Mar 07, 2006 10:32 am
- Location: The Netherlands
Using MK II with 0.028, 0.030, and 0.030c, I get these frame rates at the character select screen:
95/60 0.028
106/0 0.030c and 0.030. o_O
My specs are Vista X64 SP1, my motherboard is a Gigabyte GA-P31-S3G with the latest BIOS, my video card is an nVidia GeForce 8500 GT with the 163.75 version drivers.
I.S.T. wrote: 106/0 0.030c and 0.030. o_O
byuu wrote: Wow, 106 with 30c as well. That means that the two combined would give you 117fps or so. That's a fairly nice machine.
He could've been comparing 0.030 to my version and noticing no difference; it doesn't say whether he was testing the WIP. I take it this bit of MK II uses OPT? (If so, what you say is still valid, just not in the way I think you meant it.)
By the way, I think this question of mine got lost in a flurry of posts:
Verdauga Greeneyes wrote:byuu, I did have a quick look at the scanline filter, but what do 'pitch' and 'outpitch' stand for?
Geez, I've sent out that WIP link like 30 times now. There's only a dozen or so posters here ...
Okay, "pitch" is a very common term for video surfaces. It's the width in bytes of one scanline, which can be greater than the horizontal resolution times the bytes per pixel. I use it as a trick to store hires and lores in the same surface.
pitch is for the input surface, outpitch for the output surface.
Scanline filter uses 50% blend, so all pixels need to be drawn. Sure you can speed it up by skipping the odd lines (and losing the 50% blend), but then when you get an interlaced image, that data won't ever clear out. The fastest way to do it is to rig it to clear whenever interlace is first disabled, so it only needs to happen once.
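The pitch/outpitch idea and the 50% blend can be sketched in code. This is only an illustration, assuming XRGB8888 pixels; the names and signature are hypothetical, not bsnes's actual filter API:

```cpp
#include <cstdint>

// Hypothetical sketch of a 50%-blend scanline filter (not bsnes's actual
// code). 'pitch' and 'outpitch' are the widths in BYTES of one input/output
// scanline; they may exceed width * bytes-per-pixel, which is what lets
// hires and lores frames share one surface.
void scanline_filter(const uint32_t* in, unsigned pitch,
                     uint32_t* out, unsigned outpitch,
                     unsigned width, unsigned height) {
  for (unsigned y = 0; y < height; y++) {
    // pitch is in bytes, so step through rows via a byte pointer
    const uint32_t* src = (const uint32_t*)((const uint8_t*)in + y * pitch);
    uint32_t* even = (uint32_t*)((uint8_t*)out + (y * 2 + 0) * outpitch);
    uint32_t* odd  = (uint32_t*)((uint8_t*)out + (y * 2 + 1) * outpitch);
    for (unsigned x = 0; x < width; x++) {
      even[x] = src[x];                    // even line: copy as-is
      odd[x]  = (src[x] & 0xfefefe) >> 1;  // odd line: 50% blend toward black
    }
  }
}
```

Halving each channel with a mask-and-shift is the usual trick for the 50% darken; note that every output pixel is written, matching the point above that skipping odd lines would leave stale interlace data behind.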
Some things you can try that you probably haven't: delete the bsnes cfg file and then try v030. bsnes currently shares a cfg file which was originally generated by God knows which version; I don't know if this could cause any issues. Also, find the latest chipset drivers for your motherboard and install them. Most people neglect to do this, and it can result in certain oddities.
FitzRoy wrote: Some things you can try that you probably haven't: delete the bsnes cfg file and then try v030. bsnes currently shares a cfg file which was originally generated by God knows which version; I don't know if this could cause any issues. Also, find the latest chipset drivers for your motherboard and install them. Most people neglect to do this, and it can result in certain oddities.
I'm kind of an updating-everything freak. I like to keep my software updated to the latest versions whenever I can. I also deleted the .cfg file several times without any change resulting.
ShadowFX wrote: I'm kind of an updating-everything freak. I like to keep my software updated to the latest versions whenever I can. I also deleted the .cfg file several times without resulting in any change.
Haha, so am I. Maybe that's our problem?

And thank you for your answer, byuu. Since the input is RGB555, would I be correct in assuming pitch is 2 * 256 for low res and 2 * 512 for high res? And outpitch is 4 * 256 * [scale] for low res and 4 * 512 * [scale] / 2 for high res? (assuming XRGB888 as the colourspace)
-
- Trooper
- Posts: 394
- Joined: Mon Feb 20, 2006 3:11 am
- Location: Space
OK, I tested the CT Black Omen scene with all settings at default, speed regulation on normal, and no filters.
v0.28 47~fps
v0.30 50-53fps
v0.30.02 60fps
Pretty good, if I say so myself.
Will be testing SMAS: SM2 and SM3 in a bit. 
EDIT: SMAS:SM2 results on the character selection.
v0.28 & v0.30 both have around 57-58fps
v0.30.02 has perfect 60fps.
EDIT2: SMAS:SM3 battle mode results.
v0.28 & v0.30 both have between 56 to 58fps.
v0.30.02 has perfect 60fps.
Now, I'll have to disable speed regulation and see how fast they go.
[url=http://www.eidolons-inn.net/tiki-index.php?page=Kega]Kega Fusion Supporter[/url] | [url=http://byuu.cinnamonpirate.com/]bsnes Supporter[/url] | [url=http://aamirm.hacking-cult.org/]Regen Supporter[/url]
Damn, this OPT optimization is awesome. It really helps me stay above 60 no matter what. Consequently, the foremost bsnes FPS crusher is now "Liberty or Death," after the final "aye" of initial game setup. The game initiates some kind of clockwise transitional vortex that makes the Black Omen seem snappy in comparison. Got any tricks for this one, byuu?

v030 wip3 posted.
This one adds krom's ruby changes, meaning Windows OpenGL support.
For consistency, I changed the Windows system.video setting to "wgl", and Linux OpenGL to "glx". Linux users should be sure to update that to avoid SDL video output.
I get ~119fps with OpenGL, and ~120fps with Direct3D. I'd appreciate if everyone else would test OpenGL support. If it works everywhere that D3D works, and avoids that texture size slowdown issue, then we should make it the default driver.
The only issue I see with the driver now is that vsync is enabled no matter what. You can turn it off by overriding the setting in, e.g., the nVidia control panel. I also recommend enabling triple buffering. With that, video is perfectly smooth and audio is ~99.5% perfect. So, so close. A slight cpu.freq change and you can probably get it perfect.
God, it's so nice having perfect video and audio. I really wish that worked across the board. It's absolute euphoria playing games like that.
Booya. For ease of comparison with my earlier results, I tested the SMAS title screen: 0.030 WIP 3 with 'wgl' gives 97 fps!
Note that's higher than even the most crippled Direct3D build I made, which topped out at 92. It's also faster than my desktop PC. Mind you there does seem to be something strange going on with vsync. Before I disabled it, I got a constant 50 fps, which makes no sense since this laptop's screen is locked to 60Hz.
-
- Trooper
- Posts: 376
- Joined: Tue Apr 19, 2005 11:08 pm
- Location: DFW area, TX USA
- Contact:
wgl seems to be significantly slower on my system than the default "" from the config settings. I'm using 30-wip3; here are the MK II character select screen results:
Default "" for system.video: 118 fps
"wgl" for system.video: 84 fps
I'm on Vista 64-bit Ultimate with a Maximus Formula mobo and Q6600 quad. Graphics card is a Radeon 3850 HD or something like that.
Last edited by FirebrandX on Wed Mar 26, 2008 8:58 am, edited 1 time in total.
I get similar results to FirebrandX on my ATI card (latest drivers): around 30% slower performance in OpenGL mode. Okay, so that's one nVidia card with identical performance and two ATI cards with poor performance. ATI's OpenGL drivers always were craptastic...
The good news about the OpenGL driver is that it doesn't have the scaling bug the d3d one does, and vsync performance is probably better at 60Hz, as you suggest. My driver-level vsync on/off controls work just as well as with d3d, too. Makes me think bsnes should just remove those internal options and make vsync driver-controlled for simplicity's sake.
One thing we can always do is add some platform-specific profiling code. Have bsnes try and determine what the fastest driver is upon first run. As if I don't have enough to do already, heh.
New WIP, which converts the S-DSP ring buffers to an internal class object. Surprisingly, it actually does make the code a bit nicer to look at, although it's kind of unfortunate I can't hijack operator[]=, heh. I'd be forced to use modulus for that.
Even more surprising, it's about ~2% faster than before. Even though it's technically even more complex now with three writes instead of two. Makes no sense at all, but I won't complain. Getting 122fps now on Zelda 3 load screen.
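A ring buffer along those lines might look like this; RingBuffer is an illustrative name, not the actual S-DSP class. Since C++ has no separate "operator[]=" to overload, operator[] returns a reference that covers both reads and writes, at the cost of a modulus on every access:

```cpp
// Illustrative ring buffer class (not bsnes's actual S-DSP code). operator[]
// wraps the index with modulus, so callers never track wraparound themselves.
// With a power-of-two size the compiler reduces '% size' to a bitwise AND.
template<typename T, unsigned size> struct RingBuffer {
  T data[size];
  T&       operator[](unsigned index)       { return data[index % size]; }
  const T& operator[](unsigned index) const { return data[index % size]; }
};
```

Indexing past the end simply wraps back to the start of the buffer.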
---
ATI Radeon X300LS:
Direct3D = 64fps
OpenGL = 24(!!)fps
... as if we needed another reason not to buy ATI products. What the hell was AMD thinking, buying them?
Better yet, why do people buy ATI products? Laptops, I can understand. But for desktops? Seriously. That performance is so terrible you couldn't even play OpenGL games with it. We really need more OGL titles to hammer ATI in benchmark tests, so that they'll get their heads out of their asses.