
PS3 Cell made Sony's first party what they are today.

PaintTinJr

Member
FYI, ATI Xenos has 8 concurrent contexts and 64 (hyper) threads for 48 unified shaders. The lowest-latency data storage known to man is an SRAM register file, hence why GPGPUs have thousands of them.

For ATI Xenos


CPU jobs can be kicked off from the Xenos GPU using MEMEXPORT.

AMD's GCN origins come from ATI Xenos.

----

Note that Amiga's Copper (short for co-processor) can run independently from the CPU since it can modify registers via DMA.
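To illustrate (my own sketch, not from the original post): a Copper list is just pairs of 16-bit words in chip RAM that the Copper fetches by DMA and executes on its own every frame; the specific register offsets and colour values below are assumptions for illustration.

```c
/* Sketch of an Amiga Copper list expressed as a C array (illustrative only).
 * Each instruction is two 16-bit words: MOVE writes a value to a custom-chip
 * register (offset from $DFF000), WAIT stalls until the video beam reaches a
 * given raster position. COLOR00 lives at offset $180. */
#include <stdint.h>

static const uint16_t copper_list[] = {
    0x0180, 0x0FFF,  /* MOVE: COLOR00 = white at the top of the screen       */
    0x640F, 0xFFFE,  /* WAIT: hold until the beam reaches scanline 0x64      */
    0x0180, 0x000F,  /* MOVE: COLOR00 = blue from that scanline downwards    */
    0xFFFF, 0xFFFE   /* WAIT for an impossible beam position = end of list   */
};
/* The CPU's only job is to point COP1LC at this list and strobe COPJMP1;
 * after that the Copper re-runs the list itself every vertical blank. */
```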

The 1992 Macintosh IIfx can support the Apple 8•24 GC NuBus video card, which includes an AMD Am29000 RISC processor running at 30 MHz at roughly 22 MIPS - about 68040 @ 28 MHz level performance. The Am29000 was used as a video accelerator processor.
You've missed the point entirely, sadly, and quoted lots of irrelevant stuff to drown out my original point, it seems.

None of what you refer to is the same situation. The other groups of SPUs within the Cell BE can crash their internal, black-box, hypervisored programs and the remaining SPU groups can continue completely unaffected, in isolation - if the software design intends that type of operation, say for NASA/Air Force or bank use. The SPUs are more than co-processors running as slave devices to their primary CPU core thread, as in the examples you provided. The EIB being a ring bus where a token is shared is the reason why the SPUs can operate in isolated, arbitrary groups in this way, and in theory even the PPE could crash and recover its program - from a fresh bootstrap sequence - without killing the existing SPU groups working away, AFAIK.

I also used the words "reasonably unique", rather than just "unique", on the off chance that other weak hardware from the past - not used like the Cell BE was in consoles and Roadrunner - had a similar independent satellite processor setup.
 

Fafalada

Fafracer forever
PS3's 3.2 GHz CPU/SPE (5 SPEs were used) shows a real-world IPC disadvantage when compared to PS4's Jaguar @ 1.6 GHz/1.75 GHz (six cores were used).
That's not a measure of IPC though.
At 'most' you can view it as a measure of FP throughput (we don't really know how much FP math is done in the benchmark, but it can be reasonably argued it's an FP-'heavy' workload), but it says nothing about IPC.

Also even as a benchmark, it's a sample of '1'.
Using this type of methodology, MLAA workload runs at 4ms on 5 SPEs, and a bit over 4ms on an 8-thread SandyBridge I7 @~3Ghz.
Still says nothing about IPC, but it offers a wildly different relative performance lens (I never saw the same code on a 1.6Ghz Jaguar but I'd expect it to be at least 2x slower).
 

rnlval

Member
That's not a measure of IPC though.
At 'most' you can view it as a measure of FP throughput (we don't really know how much FP math is done in the benchmark, but it can be reasonably argued it's an FP-'heavy' workload), but it says nothing about IPC.

Also even as a benchmark, it's a sample of '1'.
Using this type of methodology, MLAA workload runs at 4ms on 5 SPEs, and a bit over 4ms on an 8-thread SandyBridge I7 @~3Ghz.
Still says nothing about IPC, but it offers a wildly different relative performance lens (I never saw the same code on a 1.6Ghz Jaguar but I'd expect it to be at least 2x slower).
Problems with your argument
1. MLAA is a needed workaround for aging G7X's MSAA performance issue. PCs with DX10 class or ATI X1xxx DX9c GPU don't have this issue. AMD added the MLAA option for its GPU drivers.

2. There's no visibility into AVX usage on Sandy Bridge.

3. There's no visibility into an MLAA codebase from the same developer. You didn't provide proper sourcing for your argument.

4. Depending on scene complexity and frame buffer resolution, MLAA's render time can change. Intel's MLAA demo was processing 1600x1200 resolution, not PS3's 1280x720-level resolution targets. Cited reference: https://hothardware.com/news/intel-details-option-to-move-antialiasing-to-the-cpu

My Ubisoft argument deals with the same content.

-----
A look back at the PS3 vs PC (and its Xbox proxy) arguments of the time

In reference to eurogamer.net/articles/digitalfoundry-saboteur-aa-blog-entry, I quote:

"In the meantime, what we have is something that's new and genuinely exciting from a technical standpoint. We're seeing PS3 attacking a visual problem using a method that not even the most high-end GPUs are using."
Eurogamer didn't factor in AMD's developer.amd.com/gpu_assets/AA-HPG09.pdf

It was later corrected by Christer Ericson, director of tools and technology at Sony Santa Monica, and I quote:

"The screenshots may not be showing MLAA, and it's almost certainly not a technique as experimental as we thought it was, but it's certainly the case that this is the most impressive form of this type of anti-aliasing we've seen to date in a console game. Certainly, as we alluded to originally, the concept of using an edge-filter/blur combination isn't new, and continues to be refined.This document by Isshiki and Kunieda published in 1999 suggested a similar technique, and, more recently, AMD's Iourcha, Yang and Pomianowski suggested a more advanced version of the same basic idea".
AMD's Iourcha, Yang and Pomianowski paper refers to developer.amd.com/gpu_assets/AA-HPG09.pdf

To quote AMD's paper

"This filter is the basis for the Edge-Detect Custom Filter AA driver feature on ATI Radeon HD GPUs".
Eurogamer's "not even the most high-end GPU are using" assertion IS wrong.
 

rnlval

Member
I absolutely fear responding to you and going down this rabbit hole, but can I have several sources for the 0.5 "real-world" IPC on SPU code?

Also, I find it interesting that you talk about Cell being not 100% efficient in its SPU given its memory architecture, yet automatically peg the XGPU at the 100% optimal 8 contexts, in all situations.

You're forgetting the DAMMIT team's Xenos has 64 hyper-threads (aka ATI's Ultra-Threads) that enable the host CPU to front-load GPU commands onto the GPU. CELL's SPUs and NVIDIA's RSX do not have hyper-threading features.

NVIDIA's GeForce 8800 GTX (G80, CUDA, DX10) introduced the GigaThread engine to NVIDIA's lineup.

AMD RDNA 2/RDNA 3, NVIDIA Turing/Ampere/Ada Lovelace, and Intel Arc GPUs include hardware-accelerated BVH raytracing. IBM is way behind in the game hardware rendering race.

Remember, modern GPUs are not DSPs.

CELL failed for a reason, and it wasn't political.
 

rnlval

Member
The Cell also offered vastly superior audio quality compared to the PS4.
I think the PS5 Tempest audio engine is somewhat similar to a Cell SPU.


AMD's 1st-gen TrueAudio is based on the Cadence Tensilica HiFi EP DSP with Tensilica Xtensa SP float support. For the Xbox One, MS doubled the number of Tensilica DSPs for Kinect.

For Polaris GCN, TrueAudio Next moved DSP work onto the GPGPU. This development path evolved into the PS5's AMD CU-based DSP, hence one less IP vendor to pay.

AMD TrueAudio is found on-die of select AMD graphics cards and APUs. A die can house multiple AMD TrueAudio DSP cores, each having 32KiB instruction and data caches and 8KiB of scratchpad memory for local operation.

AMD TrueAudio SIP blocks are found on the dies of some GPUs of the AMD Radeon Rx 200 Series; namely the Radeon R7 260, Radeon R7 260X, Radeon R9 285, Radeon R9 290, Radeon R9 290X and the Radeon R9 295X2, and in Kaveri and Carrizo-based APUs. TrueAudio is also supported by the PlayStation 4 hardware. Xbox One shares GPU design with Bonaire GCNs e.g. Radeon R7 260/R7 260X.

The PS5 has 36 CUs for graphics, with 1 CU-based DSP (~100 GFLOPS) for audio.
 

rnlval

Member
You've missed the point entirely, sadly, and quoted lots of irrelevant stuff to drown out my original point, it seems.

None of what you refer to is the same situation. The other groups of SPUs within the Cell BE can crash their internal, black-box, hypervisored programs and the remaining SPU groups can continue completely unaffected, in isolation - if the software design intends that type of operation, say for NASA/Air Force or bank use. The SPUs are more than co-processors running as slave devices to their primary CPU core thread, as in the examples you provided. The EIB being a ring bus where a token is shared is the reason why the SPUs can operate in isolated, arbitrary groups in this way, and in theory even the PPE could crash and recover its program - from a fresh bootstrap sequence - without killing the existing SPU groups working away, AFAIK.

I also used the words "reasonably unique", rather than just "unique", on the off chance that other weak hardware from the past - not used like the Cell BE was in consoles and Roadrunner - had a similar independent satellite processor setup.
FYI, the "DSP-Like" wording is from IBM themselves (Cite reference: Cell Broadband Engine Architecture and its first implementation, https://www.researchgate.net/public...d_its_first_implementation-A_performance_view )
----

"The SPUs are less complex computational units than PPEs because they do not perform any system management functions" - Introduction to Cell Broadband Engine Architecture, Version 1.02, October 11, 2007

There is a vast swath of tasks the SPE
1) cannot do (software permissions, various interrupts, running the privileged software that handles memory mapping)
2) cannot do very well (anything needing a context switch, tons of branches, a lot of synchronization)

One of the SPEs is reserved for the use of the OS to provide certain services but an SPE is not capable of running a modern OS on its own. SPEs do not support different privilege modes for code (on a modern OS the kernel runs at a higher privilege level than user code), they do not support any kind of virtual memory or memory protection, they have very limited support for interrupts and they do not have full access to hardware for I/O.

The SPU has the following restrictions:
• No direct access to main storage (access to main storage using MFC facilities only)
• No distinction between user mode and privileged state
• No access to critical system control such as page-table entries (this restriction should be enforced by PPE privileged software).
• No synchronization facilities for shared local storage access
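To make the first restriction above concrete, here is a minimal sketch of the usual SPU-side pattern: data is pulled into local store by explicit MFC DMA and completion is awaited by tag, so the SPU never dereferences a main-storage pointer directly. The intrinsic names follow the IBM Cell SDK's spu_mfcio.h as best I recall; treat the exact signatures, and the hypothetical process_chunk() helper, as assumptions rather than verified code.

```c
/* Hedged sketch, not verified against a Cell toolchain: double-buffered SPU
 * streaming through the MFC. Every access to main storage goes through an
 * explicit DMA (mfc_get) into local store, and completion is waited on by tag. */
#include <spu_mfcio.h>   /* assumed IBM Cell SDK header providing mfc_get() etc. */
#include <stdint.h>

#define CHUNK 16384  /* 16 KiB per transfer; DMA-friendly size and alignment */

static uint8_t buf[2][CHUNK] __attribute__((aligned(128)));

extern void process_chunk(uint8_t *data, unsigned n);  /* hypothetical work function */

/* ea: 64-bit effective address of the input in main storage,
 * total: number of bytes to stream (assumed to be a multiple of CHUNK). */
void stream_from_main_storage(uint64_t ea, uint32_t total)
{
    unsigned cur = 0;
    mfc_get(buf[cur], ea, CHUNK, cur, 0, 0);           /* kick the first DMA, tag = buffer index */

    for (uint32_t off = CHUNK; off < total; off += CHUNK) {
        unsigned nxt = cur ^ 1u;
        mfc_get(buf[nxt], ea + off, CHUNK, nxt, 0, 0); /* prefetch the next chunk */

        mfc_write_tag_mask(1u << cur);                 /* wait only for the current buffer */
        mfc_read_tag_status_all();

        process_chunk(buf[cur], CHUNK);                /* compute overlaps the in-flight DMA */
        cur = nxt;
    }

    mfc_write_tag_mask(1u << cur);                     /* drain the last transfer */
    mfc_read_tag_status_all();
    process_chunk(buf[cur], CHUNK);
}
```

The double buffering is what lets SPE code hide main-memory latency despite having no cache: the next DMA is in flight while the current chunk is being processed.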

ARM has big.LITTLE CPU cores with the same instruction set across the CPUs.
 
The Cell was a big reason why Sony went from the utter dominance they had during the PS2 to being neck and neck with the 360.

The only difference between the PS3 and the Saturn was the insane tailwind the prior gen gave Sony.
 

PaintTinJr

Member
FYI, the "DSP-Like" wording is from IBM themselves (Cite reference: Cell Broadband Engine Architecture and its first implementation, https://www.researchgate.net/public...d_its_first_implementation-A_performance_view )
----
...
This again is just gibberish you are responding with - as are your other responses to others, as far as I can tell - and these responses are ruining what should be a good nostalgic thread, IMHO, because you are consistently posting appendices/bibliographies at a ratio of (at least) 10:1 to your own words - the words that are supposed to make a non-strawman counter-argument to the posts you are responding to.

I've purposely not quoted your last post as-is, so if you wish to have a do-over, you don't need to provide the factoids at all - just your own unique take. If you make a statement of fact, unless I know it to be false, I'll happily take you at your word, so you can focus entirely on making a credible counter-argument to match the argument you are responding to.
 

PaintTinJr

Member
The Cell was a big reason why Sony went from the utter dominance they had during the PS2 to being neck and neck with the 360.

The only difference between the PS3 and the Saturn was the insane tailwind the prior gen gave Sony.
If you take the PS3 and PS4 vs the 360 and X1 as a joint comparison, as the thread alludes to, the software groundwork done on the PS3 allowed PlayStation to dominate with the PS4 quite easily.

The "PS4 sold over double the X1" flatters the X1 sales IMO by how many OG X1 consoles (and slims) ended up in trade-in stores or unused (IMO) after the active base moved to the X1X (or slim) because unlike the OG PS4 which still had high demand throughout - and identical performance to revised ps4 - as the market leading device - to play the gaming lineup eg Spiderman, deathstranding, Ghost of tshusima, FF7 remake, etc - the lack of constant hits from the middle of the 360 lifetime led to an unbelievable run of "wait until next E3" years for Xbox, and that IMO is because the 360 generation did make Xbox first parties what they are today, which is developers that spent years in a kinect focused wilderness, and are now being moved to a service game model by gamepass - going by the new Xbox gdk name being Xbox & PC Game Pass GDK in their own recent video.
 

Fafalada

Fafracer forever
Problems with your argument
Let's separate the discussion here - you are arguing about things I didn't even bring up, so to 'performance' points:

3. There's no visibility into an MLAA codebase from the same developer. You didn't provide proper sourcing for your argument.
I didn't bother with sourcing because Intel's solution was open-sourced and cited the MLAA work that was done on PS3. And more importantly, because it was an illustration of the point (the statistical irrelevance of both data points), not a debate about what the numbers meant (other than the fact they don't mean anything for IPC - which you apparently agree with, so why is this discussion still going?).
The implementation not being in the same source base is irrelevant to the entire discussion, as none of this (including the Ubi example) is a traditional benchmark. These are custom-tailored codebases for each individual platform on display (Ubi's GPU codebase had, like, nothing in common with the x86 one, which in turn has almost no similarity to the Cell SPE one). The purpose of their tests was to max out hw utilization in every scenario, not measure how 'code-x' runs on each hw like a commercial benchmark would.

4. Depending on scene complexity and frame buffer resolution, MLAA's render time can change.
Very true. The Intel number I cited was running the demo app at 1280x720 on a Sandy Bridge development kit platform. That is admittedly not 1:1 comparable to running it on a God of War image sample, which I probably could have tried out back then (the nice thing about MLAA is that you can feed it any image) - but it never crossed my mind that I'd need to defend the figure in an online forum 11 years later.



Now - going off-topic to other points you raised because I'm morbidly curious what that was all about.
1. MLAA is a needed workaround for aging G7X's MSAA performance issue. PCs with DX10 class or ATI X1xxx DX9c GPU don't have this issue. AMD added the MLAA option for its GPU drivers.
When did anyone mention use-cases, and why would that matter to the IPC (or other performance utilization) of a random piece of hardware? We weren't discussing usability, or practical value - just performance?
Btw - while we're on topic - if the PS3 were still actively developed for today, titles would all be running one of the 1001 variants of TAA/TSSAA on SPEs. The reason they didn't back then is that the algorithms hadn't matured yet - not because it wasn't practical (in fact, it would have been more practical than MLAA was - for obvious reasons).

2. There's no visibility into AVX usage on Sandy Bridge.
I really don't recall - but again, why would that matter? Intel made a point specifically of running MLAA on SB because it highlighted how fast those CPUs were, and in laptops there was a really bizarre amount of CPU power weighed down by a really slow integrated GPU (if you think PS3 was skewed, this was way worse), so the CPU could pick up a lot of slack for the GPU. The reason the tech didn't take off was mainly down to how badly DirectX limited such workloads from being used more generally, but that's another story.
 

rnlval

Member
Let's separate the discussion here - you are arguing about things I didn't even bring up, so to 'performance' points:


I didn't bother with sourcing because Intel's solution was open-sourced and cited the MLAA work that was done on PS3. And more importantly, because it was an illustration of the point (the statistical irrelevance of both data points), not a debate about what the numbers meant (other than the fact they don't mean anything for IPC - which you apparently agree with, so why is this discussion still going?).
The implementation not being in the same source base is irrelevant to the entire discussion, as none of this (including the Ubi example) is a traditional benchmark. These are custom-tailored codebases for each individual platform on display (Ubi's GPU codebase had, like, nothing in common with the x86 one, which in turn has almost no similarity to the Cell SPE one). The purpose of their tests was to max out hw utilization in every scenario, not measure how 'code-x' runs on each hw like a commercial benchmark would.


Very true. The Intel number I cited was running the demo app at 1280x720 on a Sandy Bridge development kit platform. That is admittedly not 1:1 comparable to running it on a God of War image sample, which I probably could have tried out back then (the nice thing about MLAA is that you can feed it any image) - but it never crossed my mind that I'd need to defend the figure in an online forum 11 years later.



Now - going off-topic to other points you raised because I'm morbidly curious what that was all about.

When did anyone mention use-cases, and why would that matter to the IPC (or other performance utilization) of a random piece of hardware? We weren't discussing usability, or practical value - just performance?
Btw - while we're on topic - if the PS3 were still actively developed for today, titles would all be running one of the 1001 variants of TAA/TSSAA on SPEs. The reason they didn't back then is that the algorithms hadn't matured yet - not because it wasn't practical (in fact, it would have been more practical than MLAA was - for obvious reasons).


I really don't recall - but again, why would that matter? Intel made a point specifically of running MLAA on SB because it highlighted how fast those CPUs were, and in laptops there was a really bizarre amount of CPU power weighed down by a really slow integrated GPU (if you think PS3 was skewed, this was way worse), so the CPU could pick up a lot of slack for the GPU. The reason the tech didn't take off was mainly down to how badly DirectX limited such workloads from being used more generally, but that's another story.
1. Your argument didn't have the same artwork content. I have searched and verified your claim and found out it's an apples-to-pears comparison!

You haven't disclosed your cited source.

From https://www.codeproject.com/Articles/229353/MLAA-Efficiently-Moving-Antialiasing-from-the-GPU which cites Intel's MLAA example, it uses Intel SSE, not AVX.

From http://www.iryoku.com/mlaa/



MLAA render time on GeForce 9800 GTX is tiny. The GeForce 9800 GTX (G92) is a 65 nm refined GeForce 8800 GTX (G80). NVIDIA released the GeForce 8800 GTX a few weeks before PS3's release.

Xbox 360's Xenos GPU can handle MLAA around 3.18 ms to 4 ms range.

Using CPUs for MLAA when a competent DX10-class GPGPU is available that can deliver tighter render times is wasteful.

A 33 ms render time target is 30 Hz.

A 16 ms render time target is 60 Hz.
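As a back-of-envelope check on those budgets (my arithmetic, using the ~4 ms SPE figure and the low end of the 3.18-4 ms Xenos range quoted earlier in the thread):

```c
/* Frame-budget arithmetic only, not a benchmark. Converts a target refresh
 * rate into a per-frame budget and shows what share an MLAA pass consumes,
 * using the render times quoted in this thread. */
#include <stdio.h>

static double frame_budget_ms(double hz) { return 1000.0 / hz; }

int main(void)
{
    const double mlaa_spe_ms   = 4.00;   /* ~4 ms on 5 SPEs (figure quoted above)      */
    const double mlaa_xenos_ms = 3.18;   /* low end of the 3.18-4 ms Xenos range above */
    const double targets[] = { 30.0, 60.0 };

    for (int i = 0; i < 2; i++) {
        double budget = frame_budget_ms(targets[i]);   /* 33.3 ms at 30 Hz, 16.7 ms at 60 Hz */
        printf("%2.0f Hz -> %.1f ms budget: SPE MLAA %.0f%%, Xenos MLAA %.0f%%\n",
               targets[i], budget,
               100.0 * mlaa_spe_ms / budget,
               100.0 * mlaa_xenos_ms / budget);
    }
    return 0;
}
```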

I'm currently rebuilding my Intel Ivy Bridge Core i7-3770K (8 threads/4 cores at >4 GHz) PC with a GeForce 8600 GTS (I'm waiting for a $22 MSI Big Bang Z77 MPower motherboard to replace my bricked ASUS P8P67 motherboard that needs a BIOS re-flash; I have DDR3-2400 for 38.3 GB/s memory bandwidth), and I re-activated a decommissioned 12-core/24-thread Xeon X5690 workstation with 64 GB RAM (64 GB/s memory bandwidth) for this purpose. I'm attempting to find Intel's MLAA sample download.

I don't believe you.

---

Ubisoft's PPE vs SPE vs Jaguar vs XBO Bonaire GCN vs PS4 Liverpool GCN example is processing the same art assets. The workload is physics.

---

The PC has since gained low-overhead DirectX 12/Vulkan APIs and Resizable BAR, which enables PC CPUs to access the entire GPU VRAM without a 256 MB access window.


----

MLAA on the RX Vega 7 integrated GPU of a Ryzen 7 PRO 4750U (15 watts) ThinkPad L14 business laptop:


At 1019 fps, the MLAA render time is about 0.94 ms for the RX Vega 7 (~1.433 TFLOPS at 1.6 GHz) integrated GPU of the Ryzen 7 PRO 4750U. This test is with the default TDP configuration.

My Thinkpad L14 business laptop is used to access the corporate network and it can play Genshin Impact.

Intel Xe Iris IGPs are about 7 nm RX Vega 7/8 level performance. Mobile Ryzen 6000U's RDNA 2-based 680M almost doubles the 7 nm RX Vega 8's performance. The RX 680M delivers ~30-50% higher framerates versus the Iris Xe chip in the Zephyrus M16.
Cited: https://www.ultrabookreview.com/54099-amd-radeon-680m-rnda2-benchmarks/

With APU ultrabooks in the 15 to 25 watts range, the priority is to allocate available TDP to the GPU, not the CPU. The limitation is the TDP design envelope.

I can set the TDP limit to 35 watts via the Ryzen Controller tool and the mobile GPU will use the available TDP headroom. I use Windows Power management to limit the CPU's TDP allocation. Allocating higher TDP to the CPU is counterproductive to the modern game's frame rates. Heavy CPU AVX usage is not recommended when the iGPU's performance is a priority. Similar guidelines are applicable for Intel APUs with Xe Iris.
 

DonkeyPunchJr

World’s Biggest Weeb
Are there any devs who have commented to this effect? That Cell programming forced them to become master computer wizards and that their elite skills paid off in the PS4 and PS5 era?

Gabe Newell infamously said something like “it’s pointless to learn Cell programming because all that makes you is good at Cell programming” I.e. it’s a very specialized architecture and the skills you develop aren’t transferable to anything else.

And it’s not like Cell invented multi-core CPUs. Xbox 360 also had a multi-core CPU that is conceptually a lot more like the ones we have today. Is there any reason to believe that Cell somehow prepared Sony devs for the multi-core future any better than Xenos did?
 

Fafalada

Fafracer forever
1. Your argument didn't have the same artwork content.
I acknowledged that - there's no easy way to go back in time to re-benchmark against different art, but the 4ms ballpark was consistent enough that I don't see it mattering much.

Using CPUs for MLAA when competent DX10 class GpGPU is available that can deliver tighter render times is wasteful.
Not if you end up with a tighter render time, freeing that GPU time up completely.
The whole point of CPU MLAA on SB/PS3 was that you had a large amount of CPU compute that was on par with the GPU (or better). So in most cases, AA becomes 'free', which is always better/more efficient than 'X milliseconds'. It would be even better if TAA had been available at the time for the visual output, but the compute costs/freed-up GPU cycles would have been the same.
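A small timing model of that 'free' claim (my framing, not code from the thread): if the CPU/SPE post-process of frame N overlaps with the GPU rendering frame N+1, the steady-state frame time is the larger of the two stages rather than their sum, at the cost of one extra frame of latency.

```c
/* Two-stage pipeline model: the GPU renders frame N+1 while the CPU/SPEs
 * post-process frame N. Illustrative numbers only. */
#include <stdio.h>

static double serial_ms(double gpu, double post)    { return gpu + post; }
static double pipelined_ms(double gpu, double post) { return gpu > post ? gpu : post; }

int main(void)
{
    const double gpu_ms  = 30.0;  /* assumed GPU frame time for a 30 Hz title */
    const double post_ms = 4.0;   /* ~4 ms MLAA on SPEs, as quoted earlier    */

    printf("serial (GPU then CPU AA): %.1f ms/frame\n", serial_ms(gpu_ms, post_ms));
    printf("pipelined (overlapped):   %.1f ms/frame (AA is effectively free)\n",
           pipelined_ms(gpu_ms, post_ms));
    return 0;
}
```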

Ubisoft's PPE vs SPE vs Jaguar vs XBO Bonaire GCN vs PS4 Liverpool GCN example is processing the same art assets. The workload is physics.
It's a custom dynamics solver - there's no 'generic' workloads, the algorithm had to be ported to different architectures. Ubi even described how they struggled getting performance out of GPU compute (there were numerous optimizations described that are completely unique to GPGPU/not applicable to any other port). Same can be said for SPE version of course.

I can set the TDP limit to 35 watts via the Ryzen Controller tool and the mobile GPU will use the available TDP headroom and I use Windows Power management to limit the CPU's TDP allocation. Allocating higher TDP to the CPU is counterproductive to the modern game's frame rates. Heavy CPU AVX usage is not recommended when the iGPU's performance is a priority. Similar guidelines are applicable for Intel APUs with Xe Iris.
This is a topic discussing how 12-17 year old hardware ran compute-heavy workloads.
I have no idea what you're bringing present-day hw into this for - does it come with a time-traveling machine?


which cites Intel's MLAA example, it uses Intel SSE, not AVX.
I asked in previous post - I'll be more direct now: So what?
MLAA render time on GeForce 9800 GTX is tiny.
See above.
Xbox 360's Xenos GPU can handle MLAA around 3.18 ms to 4 ms range.
Again, see above.
 
The Cell was a big reason why Sony went from the utter dominance they had during the PS2 to being neck and neck with the 360.

The only difference between the PS3 and the Saturn was the insane tailwind the prior gen gave Sony.

Actually, they were not close at all originally; the financial crisis delayed the gen, and it would have been a bloodbath otherwise.
 

rnlval

Member
I acknowledged that - there's no easy way to go back in time to re-benchmark against different art, but the 4ms ballpark was consistent enough that I don't see it mattering much.


Not if you end up with a tighter render time, freeing that GPU time up completely.
The whole point of CPU MLAA on SB/PS3 was that you had a large amount of CPU compute that was on par with the GPU (or better). So in most cases, AA becomes 'free', which is always better/more efficient than 'X milliseconds'. It would be even better if TAA had been available at the time for the visual output, but the compute costs/freed-up GPU cycles would have been the same.


It's a custom dynamics solver - there's no 'generic' workloads, the algorithm had to be ported to different architectures. Ubi even described how they struggled getting performance out of GPU compute (there were numerous optimizations described that are completely unique to GPGPU/not applicable to any other port). Same can be said for SPE version of course.


This is a topic discussing how 12-17 year old hardware ran compute-heavy workloads.
I have no idea what you're bringing present-day hw into this for - does it come with a time-traveling machine?



I asked in previous post - I'll be more direct now: So what?

See above.

Again, see above.
On any device, whether it is a PC, netbook, or smartphone, you can always perform anti-aliasing an order of magnitude faster (or more) on the GPU than on the CPU. This is assuming the machine has a competent GpGPU feature set.

If MSAA were impractical for any given hardware, there are post-processing anti-aliasing techniques similar to "MLAA" that can be performed on the GPU, e.g. Doom (2016) with Vulkan and async compute has TSSAA (temporal anti-aliasing), since the compute-shader path, which uses texture units as read/write units, doesn't have the ROPs' MSAA hardware.

For graphics, GpGPUs are always much, much faster than a CPU.

The process of sending an image from the GPU to the CPU and back requires that the CPU and GPU sync up. CELL's SPEs and the RSX are not integrated on the same chip package, so render time is already being consumed there.

The low-powered graphics chips this is being touted for are going to have even more problems with texture bandwidth.

https://www.gamesindustry.biz/digitalfoundry-tech-focus-mlaa-heads-for-360-pc?page=2
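To make the GPU-to-CPU round-trip cost mentioned above concrete, here is a hedged desktop OpenGL sketch (mine, not from the article): a plain glReadPixels stalls the CPU until the GPU has finished, whereas reading into a pixel buffer object and mapping it a frame later hides most of that sync. It assumes an OpenGL 2.1+ context with GLEW already initialised.

```c
/* Illustrative only: synchronous vs. asynchronous GPU->CPU readback.
 * Assumes a current OpenGL >= 2.1 context and GLEW already initialised;
 * error handling omitted. */
#include <GL/glew.h>

/* Blocking path: the call cannot return data until the GPU has finished
 * producing it, so the CPU and GPU serialise here. */
void readback_blocking(int w, int h, void *dst)
{
    glReadPixels(0, 0, w, h, GL_RGBA, GL_UNSIGNED_BYTE, dst);
}

/* Asynchronous path, step 1: direct the readback into a pixel buffer object.
 * With a PBO bound to GL_PIXEL_PACK_BUFFER the call returns immediately. */
GLuint readback_start(int w, int h)
{
    GLuint pbo;
    glGenBuffers(1, &pbo);
    glBindBuffer(GL_PIXEL_PACK_BUFFER, pbo);
    glBufferData(GL_PIXEL_PACK_BUFFER, (GLsizeiptr)w * h * 4, NULL, GL_STREAM_READ);
    glReadPixels(0, 0, w, h, GL_RGBA, GL_UNSIGNED_BYTE, (void *)0);
    glBindBuffer(GL_PIXEL_PACK_BUFFER, 0);
    return pbo;
}

/* Asynchronous path, step 2 (ideally a frame later): map the PBO; by then the
 * copy has usually completed, so the map does not stall the pipeline. */
const void *readback_finish(GLuint pbo)
{
    glBindBuffer(GL_PIXEL_PACK_BUFFER, pbo);
    return glMapBuffer(GL_PIXEL_PACK_BUFFER, GL_READ_ONLY);
}
```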

MLAA was further optimized for Xbox 360's GPU, resulting in 2.47 ms:

Jorge Jimenez
It's difficult to say given that we are running on different platforms and configurations. On PC, we're quite fast. In fact, we're almost free on the mid-high GPU range (around 0.4ms on a GeForce GTX 295) and, as far as we know, the fastest approach given the maximum line length we are able to handle. On the Xbox 360 we run at 2.47ms, with still a lot of possible optimisations to try.
---


No modern GPU solution would advocate two graphics processors on two separate chip packages, since this incurs higher latency. The superior solution is to keep the render tile within the GPU's cache from geometry to ROPs.



From NVIDIA's point of view, exploiting GPU L2 coherency is part of NVIDIA's tiled caching enhancements.

With Intel Arc's and Xe Iris's existence, Intel has withdrawn the MLAA CPU sample from its website.
 

rnlval

Member
The PS3 is by far the worst console Sony has ever made. Over-engineered, expensive crap. Its only redeeming quality was that it did have some killer games.
PS3's CELL is a dead-end architecture. In the embedded markets, PowerPC is getting smashed by ARM and RISC-V.
 
PS3's CELL is a dead-end architecture. In the embedded markets, PowerPC is getting smashed by ARM and RISC-V.

Might I suggest this is the wrong way to think of it.

It's a 2005-era architecture that achieved extremely high computational density, data locality and bandwidth that were superior to any other product at the time. In fact, it was so good it was competitive with the next generation's offerings.

Times are different, and just passing judgement without looking at it in the paradigm of its time is so wrong. OpenCL and CUDA didn't even exist when Cell was around! Of course, if Sony had chosen to stick with it, it would have evolved with the field and we'd be looking at a drastically different offering, but that's what happens over time -- things evolve. GPUs have evolved immensely. So could a hypothetical Cell derivative that retained the high computational density and data locality, with tighter integration than what we see today in these off-the-shelf commercial SoCs, exist? Of course, you just need the desire, money and talent.

And PPC is and always has been arbitrary. You fixating on it is just silly.
 

ByWatterson

Member
Xbox is very attractive to me today because of the combination of back-compat, resolution/FPS boost, and Gamepass.

Sony cannot get all the way there because of Cell.

Fuck Cell.

(still love Sony output, but I'm preferring the Xbox ecosystem these days)
 

Yoboman

Member
Cell was a beast

It's interesting to wonder what games would be like if consoles continued being super CPU centric like envisioned in that gen
 

dotnotbot

Member
I actually hooked up my PS3 again and I gotta say, it's a pretty good console. Multimedia capabilities are solid, full BC with PS1, Blu ray support and a great library of games. Don't pay any attention to my post, I have a bad temper sometimes which makes me say stupid shit.

It's like an ugly girlfriend with amazing personality. Still fun. Amazing library is all that matters in the end.
 

AngelMuffin

Member

While I understand why both Sony and MS moved to PC parts for their new consoles, I really miss the days of proprietary processors from Sony, Sega etc.
Why, exactly? The Saturn and PS2/PS3 were all hard to develop for, with the Saturn never really coming close to its full potential, and the PS2/PS3 didn't hit their stride until well into their respective console generations.
 

Fafalada

Fafracer forever
Why, exactly?
Unique hw produced actual differentiation in the market. It made life harder for cross-platform devs, but for consumers it was an undeniably more interesting choice.
In 2022 we have 9 console hw profiles on the market that are all basically indistinguishable from one another aside for performance.
Sure, there are still exclusives (but that's rapidly diminishing as almost everything is coming to PC) and service differences that are, again, rapidly disappearing.
 

SkylineRKR

Member
The PS3 is by far the worst console Sony has ever made. Over-engineered, expensive crap. Its only redeeming quality was that it did have some killer games.

I agree. I have had a PS3 again since late summer and it's the only way to play certain games like Ridge Racer 7 etc., outside of emulation. But it's slow when you have to update something, and when you have to unpack an already slow download... The console can be very annoying. Back then I would just eject a game if an update screen came up.

The original sixaxis was a step down from the DS2. The software wasn't stellar during the first year. Performance and IQ of lots of games were horrible across the board. The 360 also suffered from this, mind you, but it was still a quicker, more accessible console with usually better perf and IQ.

But PS3 had some more redeeming qualities, like upgrade off the shelf HDD, USB wasn't picky, it could eat almost anything. Online play was free. And the games are the most important, and to be frank, they were good during the latter half of its run.
 

dave_d

Member
The original sixaxis was a step down from the DS2.

I got to disagree with you on the sixaxis vs. the DS2. I mean the DS2 was wired and the sixaxis was wireless, and oddly enough I had connectivity issues with the DS2. After having the PS2 for a year or so the DS2 would constantly switch out of analog mode. (Cleaning the contacts helped a little but once it started it just kept happening.) The sixaxis never did that, which was a huge improvement. Also I'm not a big fan of rumble, so I preferred the fact the sixaxis was so much lighter than the DS2. The built-in battery was a nice feature, so no searching for batteries like the Xbox controller. I'm just annoyed over the fact they had a controller that could have made FPS far more playable and they never figured out how to do gyro aiming. (Playing BotW and Splatoon shows how much better aiming can be with gyro aiming, and yet Sony had it years earlier.)
 

SkylineRKR

Member
I got to disagree with you on the sixaxis vs. the DS2. I mean the DS2 was wired and the sixaxis was wireless, and oddly enough I had connectivity issues with the DS2. After having the PS2 for a year or so the DS2 would constantly switch out of analog mode. (Cleaning the contacts helped a little but once it started it just kept happening.) The sixaxis never did that, which was a huge improvement. Also I'm not a big fan of rumble, so I preferred the fact the sixaxis was so much lighter than the DS2. The built-in battery was a nice feature, so no searching for batteries like the Xbox controller. I'm just annoyed over the fact they had a controller that could have made FPS far more playable and they never figured out how to do gyro aiming. (Playing BotW and Splatoon shows how much better aiming can be with gyro aiming, and yet Sony had it years earlier.)

Wireless is a huge boon of course, but built-in batteries aren't that great. My sixaxis eventually died. With Xbox you could just take the battery out if you wanted to store the console. My PS2 with DS2 still works of course, as it's wired. Personally I never had issues with DS2's.

Gyro aiming just didn't work as well as it did later on. I felt it worked well on Vita.
 

dave_d

Member
Wireless is a huge boon of course, but built-in batteries aren't that great. My sixaxis eventually died. With Xbox you could just take the battery out if you wanted to store the console. My PS2 with DS2 still works of course, as it's wired. Personally I never had issues with DS2's.

Gyro aiming just didn't work as well as it did later on. I felt it worked well on Vita.
You got a point there. I probably should replace the batteries in my sixaxis since it barely holds a charge, and that's going to be a pain. Actually I think I agree with you to a point with the Xbox way of handling it. You can use AAs or a battery pack that the system can recharge. Also I had tons of connectivity problems with both the DS1 and DS2, so wireless was a godsend. No idea why gyro aiming on the sixaxis didn't work so well. I mean they had it in Heavenly Sword for the bow sections but that didn't work so well. (And the sixaxis has the 3 gyros and 3 accelerometers, so no idea why it's junk.)
 

rnlval

Member
Might I suggest this is the wrong way to think of it.

It's a 2005-era architecture that achieved extremely high computational density, data locality and bandwidth that were superior to any other product at the time. In fact, it was so good it was competitive with the next generation's offerings.

Times are different, and just passing judgement without looking at it in the paradigm of its time is so wrong. OpenCL and CUDA didn't even exist when Cell was around! Of course, if Sony had chosen to stick with it, it would have evolved with the field and we'd be looking at a drastically different offering, but that's what happens over time -- things evolve. GPUs have evolved immensely. So could a hypothetical Cell derivative that retained the high computational density and data locality, with tighter integration than what we see today in these off-the-shelf commercial SoCs, exist? Of course, you just need the desire, money and talent.

And PPC is and always has been arbitrary. You fixating on it is just silly.
The first CUDA GPGPU family is the GeForce 8800 GTX (G80, November 8th, 2006) and GeForce 8800 GTS 640 (G80, November 8th, 2006), which were released a few days before the PS3's launch.

https://en.wikipedia.org/wiki/Timeline_of_PlayStation_3_SKUs
  • On November 11, 2006, the PlayStation 3 was launched in Japan with both 20 GB and 60 GB models available.
  • On November 17, 2006, the PlayStation 3 was launched in the United States, with both 20 GB and 60 GB models available.

The SPE's 128 registers are tiny when compared to the monstrous register file in the G80.

--------------------
There is a vast swath of tasks the SPE

1) cannot do (software permissions, various interrupts, running the privileged software that handles memory mapping)

2) cannot do very well (anything needing a context switch, tons of branches, a lot of synchronization)

One of the SPEs is reserved for the use of the OS to provide certain services but an SPE is not capable of running a modern OS on its own. SPEs do not support different privilege modes for code (on a modern OS the kernel runs at a higher privilege level than user code), they do not support any kind of virtual memory or memory protection, they have very limited support for interrupts and they do not have full access to hardware for I/O.

The SPU has the following restrictions:
• No direct access to main storage (access to main storage using MFC facilities only)
• No distinction between user mode and privileged state
• No access to critical system control such as page-table entries (this restriction should be enforced by PPE privileged software).
• No synchronization facilities for shared local storage access

------------
CELL's SPU does NOT share PowerPC pointers with the host PPE!

AMD GCN can share x86-64 pointers with the AMD64 CPU!



CELL's SPU can't do AMD GCN's x86-64-lite CPU behavior.

IBM's PPE and SPU fusion is a con job!
 

rnlval

Member
This again is just gibberish you are responding with - as are your other responses to others, as far as I can tell - and these responses are ruining what should be a good nostalgic thread, IMHO, because you are consistently posting appendices/bibliographies at a ratio of (at least) 10:1 to your own words - the words that are supposed to make a non-strawman counter-argument to the posts you are responding to.

I've purposely not quoted your last post as-is, so if you wish to have a do-over, you don't need to provide the factoids at all - just your own unique take. If you make a statement of fact, unless I know it to be false, I'll happily take you at your word, so you can focus entirely on making a credible counter-argument to match the argument you are responding to.
Your post is irrelevant; this thread should not be about me.
 