I posted this speculation in the other thread before it got locked for 24hrs, and in light of what you mentioned above I would be interested to know if you agree with my speculation of a 3rd asymmetric speed of 280GB/s.
Additionally, the IO complex is a co-processor setup (so it should be tightly coupled to the CPU), working hand in glove with it. But that does raise another issue.
General-purpose CPU cores max out at about 40-50% utilisation because of branch prediction; a stream processor like the SPU, and some AMD CPU cores like those in Jaguar (IIRC), can hit 70% utilisation; whereas an ASIC/vector processor like a co-processor (remembering my old Intel 286's co-processor from decades ago) can hit GPU-like utilisation of 90%+.
So, comparing the achievable real-world performance of the two IO decompression systems: if the Zen 2 cores are stream-processor efficient, then the 6GB/s max XSX use of a Zen 2 core is losing, in a best-case scenario, 30% of its theoretical performance, and similarly the PS5's 22GB/s IO is losing about 10% of its theoretical maximum in a best-case scenario (AFAIK). I believe that would make a possible co-processor contention issue for RAM a small performance loss in comparison.
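To put rough numbers on that, here is a minimal sketch of the arithmetic. The peak figures (6 GB/s and 22 GB/s) and the utilisation rates (70% and 90%) are the speculative ones from this post, not measurements, and `effective_throughput` is just a hypothetical helper name.

```python
def effective_throughput(peak_gb_s, utilisation):
    """Scale a peak throughput by an assumed best-case utilisation (0..1)."""
    if not 0.0 <= utilisation <= 1.0:
        raise ValueError("utilisation must be between 0 and 1")
    return peak_gb_s * utilisation

# Assumed figures taken from the post above; none of these are measured.
xsx = effective_throughput(6.0, 0.70)   # Zen 2 core at stream-processor efficiency
ps5 = effective_throughput(22.0, 0.90)  # dedicated I/O co-processor block

print(f"XSX: {xsx:.1f} GB/s effective, {6.0 - xsx:.1f} GB/s lost")
print(f"PS5: {ps5:.1f} GB/s effective, {22.0 - ps5:.1f} GB/s lost")
```

Under those assumptions the absolute loss on the PS5 side is actually slightly larger in GB/s, even though it is smaller as a percentage of peak.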
This is really interesting actually. I am wondering if the utilization rate on Zen 2 will have improved over Jaguar. I would certainly think so, so we could be looking at maybe 80% utilization, maybe even a tad higher depending on how some of the rumors floating around shake out (like a unified L3 cache between the CCXs). I remember NXGamer doing a video on the next-gen GPUs where he threw out a random GPU utilization number for the PS4 and XBO that seemed a bit low. I dunno if that was just a gut-feeling number or if there's research (by whomever) to back it up, but in that vid he also suggested that GPU throughput utilization for the next-gen systems should be much higher; his number was 65%, which was maybe conservative IMHO.
Custom ASICs like DSPs, as you say, tend to have even higher utilization than those, the trade-off tho being that they are limited in the tasks they can focus on. So getting back to the SSD stuff you bring up: yes, I figure when you put it that way the contention with the memory bus could be negated a bit on PS5 due to the co-processor setup they're going with, but there are two big variables we don't know yet that impact things greatly IMO:
1): What processors/architectures are serving as the basis of those co-processors in the I/O block? Sony has compared their capabilities to Zen 2 cores, so if we take what they've done with Tempest as an example, I'm going to guess they have - literally - taken Zen 2 cores and customized them for the co-processors in the I/O block by stripping them of extraneous features, maybe also reducing their cache, etc. (maybe that influenced them choosing SRAM for the cache on the flash memory controller, besides the obvious benefits?).
That being the case, it would mean that the co-processors of the PS5 I/O block would be hitting closer to the utilization rates of the CPU; maybe a little bit higher depending on what features they felt were extraneous and got cut. Then again, there's the off-chance it could be a little bit lower if specific things like the cache were slashed because even if they have SRAM cache on the controller block that's still not exactly the same as having a larger pool of cache on the processor die itself. But I guess we'll have to wait and see.
2): We don't necessarily know what specific aspects of the I/O stack the Series systems are relegating to the CPU. We know MS have said "1/10th of a single CPU core", which I would say is the same core as the OS core itself, but that still leaves a couple of questions of its own. Mainly, is ALL of the I/O access being done in that reserved CPU space, or is that "1/10th of a core" figure referring to specific parts of the stack, with other parts of the I/O access using further CPU cores?
Now, I think that's actually a ridiculous question to ask TBH, because since both MS and Sony have to guarantee developers a certain amount of resources, they can't go and say "alright devs, you have seven cores now! Oh wait, except when you need your game to do I/O on the Series systems. In that case you better maybe limit that game logic to like two cores." I don't think devs would like that at all, and if it were the case with the Series systems we'd have been hearing stuff alluding to that from devs on the DL, or getting leaked...rather persistently.
By and large though that is not happening at all; devs seem to be very pleased with the I/O in both Sony's and Microsoft's setups, so I'm inclined to believe that yes, MS has their own (seemingly) co-processor elements in their XvA setup (the hardware side, anyway), and they probably have either some smaller MPU block, some DSP, or maybe just repurposed Zen 2 cores. In fact, if you remember, there was a LinkedIn listing months ago from an Indian AMD engineer who worked on the MS team mentioning ARM cores in the APU. I personally think those ARM cores are something for the GPU (maybe extending executeIndirect capabilities?), but it's very possible they could be for XvA as well.
I know people'll think "But ARM? That's not Zen 2!?". But again, Zen 2 has a wide range of performance capabilities, and I'm more than sure some modern ARM processors can compare with (if not beat) at least some of the lower-end Zen 2 CPU configurations in terms of raw capability (never minding that they have a different type of architecture, RISC vs. x86's CISC; there's something to this I'm forgetting, but IIRC modern x86/x86-64 CPUs expose a CISC instruction set and decode it into RISC-like micro-ops internally...or something like that?).
So, if those two big factors play out the way described, you're still right that bus contention would be reduced in terms of stall times for the CPU/GPU etc. getting privilege back from the I/O block on PS5, and since Sony's solution just has more physical hardware dedicated to the task, that would also help with cutting down stall times. I'm also not saying that even with the Series setup you'd get a game going full-tilt with CPU logic simultaneously with I/O bus access (i.e. game logic might have a slight reduction in throughput during that type of operation, plus the game would still need to make sure it's not trying to access data that's actively being replaced, otherwise you still end up with misses).
However, I guess what I am saying there is that Microsoft's approach, despite its own "limitations" (if you want to compare them apples-to-apples, which I personally don't), still has some transient benefits that games can utilize if they're aware of them, analogous to the benefits available to games with Sony's solution. There are still strengths that can be played upon even in this type of situation.
The bus isn't restricted to one op at a time tho. It's the bandwidth that's the limiting factor. So ofc if you're loading data off the SSD into memory, the bus will be filled, but other than that the bus is happy to give you a ticket to ride.
This is the same for XSX and PS5 (or all computer systems in general).
Sony added more priority levels for their SSD too, so that they "always" have a ticket ready for a VIP passenger as well.
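The "ticket to ride" idea can be sketched as a toy arbiter: a fixed bandwidth budget per tick, handed out to pending requests in priority order. This is purely illustrative and assumes a simple strict-priority scheme; it is not how either console's memory controller or Sony's SSD priority levels actually work, and `arbitrate`, the request names and the sizes are all made up.

```python
import heapq

def arbitrate(requests, bytes_per_tick):
    """Serve (priority, name, size) requests under a per-tick bandwidth budget.

    Lower priority number is served first; ties keep submission order.
    Returns a list of (tick, name) in completion order.
    """
    if bytes_per_tick <= 0:
        raise ValueError("bytes_per_tick must be positive")
    heap = [(prio, order, name, size)
            for order, (prio, name, size) in enumerate(requests)]
    heapq.heapify(heap)
    done, tick = [], 0
    while heap:
        tick += 1
        budget = bytes_per_tick
        while heap and budget > 0:
            prio, order, name, size = heapq.heappop(heap)
            used = min(size, budget)
            budget -= used
            if size > used:
                # Not finished: put the remainder back for the next tick.
                heapq.heappush(heap, (prio, order, name, size - used))
            else:
                done.append((tick, name))
    return done

# Hypothetical workload: a big SSD load plus two small, more urgent requests.
result = arbitrate([(2, "bulk_load", 300), (0, "vip", 50), (1, "audio", 100)], 200)
print(result)
```

The point of the sketch is that the big transfer only eats whatever bandwidth is left over, so the small high-priority requests still finish in the first tick.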
Oh I understand xD; I wasn't trying to imply any scenario where the I/O block, in accessing the bus, suddenly means other processor components have to reset their operations on the bus or anything like that.
I understand that with most memory buses, access is done in parallel across however many chips make up the bus, so the data is evenly "striped" (the same thing is done with NAND on SSDs too, of course).
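For anyone unfamiliar with striping, a minimal sketch of the idea: consecutive chunks of data are dealt out round-robin across the chips so they can all be read or written in parallel. The chunk size and chip count here are arbitrary assumptions for illustration, and `stripe` is a hypothetical helper, not any real controller's layout.

```python
def stripe(data: bytes, n_chips: int, chunk: int = 4):
    """Deal fixed-size chunks of data round-robin across n_chips lanes."""
    if n_chips <= 0 or chunk <= 0:
        raise ValueError("n_chips and chunk must be positive")
    lanes = [bytearray() for _ in range(n_chips)]
    for i in range(0, len(data), chunk):
        # Chunk number (i // chunk) decides which chip gets this piece.
        lanes[(i // chunk) % n_chips] += data[i:i + chunk]
    return [bytes(lane) for lane in lanes]

# Hypothetical example: 16 bytes over 4 chips in 2-byte chunks.
lanes = stripe(b"ABCDEFGHIJKLMNOP", n_chips=4, chunk=2)
print(lanes)
```

Each chip ends up holding an equal share, which is why a single large access can use the full width of the bus.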