Let's separate the discussion here - you're arguing about things I didn't even bring up. So, to the 'performance' points:
I didn't bother with sourcing because Intel's solution was open-sourced and cited the MLAA work that was done on the PS3. More importantly, it was an illustration of the point (the statistical irrelevance of both data points), not a debate over what the numbers meant - other than the fact that they don't mean anything for IPC, which you apparently agreed with me on, so why is this discussion still going?
The implementations not being in the same source base is irrelevant to the entire discussion, as none of this (including the Ubi example) is a traditional benchmark. These are custom-tailored codebases for each platform on display (Ubi's GPU codebase had practically nothing in common with the x86 one, which in turn has almost no similarity to the Cell SPE one). The purpose of their tests was to max out hardware utilization in every scenario, not to measure how 'code X' runs on each piece of hardware the way a commercial benchmark would.
Very true. The Intel number I cited came from running the demo app at 1280x720 on a Sandy Bridge development-kit platform. That is admittedly not 1:1 comparable to running it on a God of War image sample, which I probably could have tried back then (the nice thing about MLAA is that you can feed it any image) - but it never crossed my mind that I'd need to defend the figure on an online forum 11 years later.
Now - going off-topic to the other points you raised, because I'm morbidly curious what that was all about.
When did anyone mention use cases, and why would that matter to the IPC (or other performance utilization) of a random piece of hardware? We weren't discussing usability or practical value - just performance.
Btw - while we're on the topic - if the PS3 were still actively developed for today, titles would all be running one of the 1001 variants of TAA/TSSAA on the SPEs. The reason they didn't back then is that the algorithms hadn't matured yet - not because it wasn't practical (in fact, it would have been more practical than MLAA was, for obvious reasons).
I really don't recall - but again, why would that matter? Intel made a point of running MLAA on Sandy Bridge specifically because it highlighted how fast those CPUs were, and laptops of that era paired a really bizarre amount of CPU power with a really slow integrated GPU (if you think the PS3 was skewed, this was far worse), so the CPU could pick up a lot of slack for the GPU. The reason the tech didn't take off was mainly how badly DirectX limited such workloads from being used more generally, but that's another story.
1. Your argument didn't have the same artwork content. I searched and checked your claim, and it turns out to be an apples-to-oranges comparison!
You haven't disclosed your cited source.
From
https://www.codeproject.com/Articles/229353/MLAA-Efficiently-Moving-Antialiasing-from-the-GPU (which cites Intel's MLAA example): the implementation uses Intel SSE, not AVX.
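To illustrate the SSE vs. AVX distinction (with a hypothetical blend kernel - this is not Intel's actual MLAA code): SSE works on 128-bit registers, 4 floats at a time, while AVX works on 256-bit registers, 8 floats at a time.

```cpp
// Minimal sketch of SSE vs. AVX width (hypothetical linear blend,
// out = a + w * (b - a), as used conceptually in MLAA edge blending).
#include <immintrin.h>

// SSE: 4 floats per iteration. Assumes n is a multiple of 4.
void blend_sse(const float* a, const float* b, const float* w, float* out, int n) {
    for (int i = 0; i < n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);
        __m128 vb = _mm_loadu_ps(b + i);
        __m128 vw = _mm_loadu_ps(w + i);
        __m128 vo = _mm_add_ps(va, _mm_mul_ps(vw, _mm_sub_ps(vb, va)));
        _mm_storeu_ps(out + i, vo);
    }
}

// AVX: same math, 8 floats per iteration. Assumes n is a multiple of 8.
void blend_avx(const float* a, const float* b, const float* w, float* out, int n) {
    for (int i = 0; i < n; i += 8) {
        __m256 va = _mm256_loadu_ps(a + i);
        __m256 vb = _mm256_loadu_ps(b + i);
        __m256 vw = _mm256_loadu_ps(w + i);
        __m256 vo = _mm256_add_ps(va, _mm256_mul_ps(vw, _mm256_sub_ps(vb, va)));
        _mm256_storeu_ps(out + i, vo);
    }
}
```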
From
http://www.iryoku.com/mlaa/
MLAA render time on a GeForce 9800 GTX is tiny. The GeForce 9800 GTX (G92) is a 65 nm refresh of the GeForce 8800 GTX (G80), and NVIDIA released the GeForce 8800 GTX a few weeks before the PS3's release.
Xbox 360's Xenos GPU can handle MLAA in roughly the 3.18 ms to 4 ms range.
Using CPUs for MLAA when a competent DX10-class GPGPU is available that can deliver tighter render times is wasteful.
A 33 ms render-time target is 30 Hz. A 16 ms render-time target is 60 Hz.
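That's just the reciprocal of the refresh rate - a quick sketch:

```cpp
#include <cstdio>

// Frame-time budget in milliseconds for a given refresh target: t = 1000 / Hz.
double frame_budget_ms(double hz) { return 1000.0 / hz; }

int main() {
    std::printf("30 Hz -> %.1f ms\n", frame_budget_ms(30.0)); // ~33.3 ms
    std::printf("60 Hz -> %.1f ms\n", frame_budget_ms(60.0)); // ~16.7 ms
    // So Xenos' 3.18-4 ms MLAA pass eats roughly 10-12% of a 33 ms frame.
    return 0;
}
```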
I'm currently rebuilding my Intel Ivy Bridge Core i7-3770K (4 cores/8 threads at >4 GHz) PC with a GeForce 8600 GTS (I'm waiting for a $22 MSI Big Bang Z77 MPower motherboard to replace my bricked ASUS P8P67 motherboard, which needs a BIOS re-flash; I have DDR3-2400 for 38.3 GB/s of memory bandwidth), and I've re-activated a decommissioned Xeon X5690 workstation (12 cores/24 threads, 64 GB RAM, 64 GB/s memory bandwidth) for this purpose. I'm attempting to find Intel's MLAA sample download.
I don't believe you.
---
Ubisoft's PPE vs. SPE vs. Jaguar vs. XBO Bonaire GCN vs. PS4 Liverpool GCN example processes the same art assets. The workload is physics.
---
The PC has since gained low-overhead DirectX 12/Vulkan APIs and Resizable BAR (ReBAR), which enables PC CPUs to access the entire GPU VRAM instead of going through a 256 MB access window.
---
MLAA on the RX Vega 7 integrated GPU of a Ryzen 7 PRO 4750U (15 watts) ThinkPad L14 business laptop: at 1019 fps, the MLAA render time is about 0.94 ms for the RX Vega 7 (~1.433 TFLOPS at 1.6 GHz) integrated GPU. This test is with the default TDP configuration.
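The ~1.433 TFLOPS figure checks out against Vega 7's shader count (assuming the usual 7 CUs x 64 lanes and 2 FLOPs per cycle for FMA):

```cpp
#include <cstdio>

int main() {
    // RX Vega 7: 7 CUs x 64 stream processors = 448 lanes.
    const double lanes = 7 * 64;
    const double clock_ghz = 1.6;       // boost clock cited above
    const double flops_per_cycle = 2.0; // one FMA = 2 FLOPs
    double tflops = lanes * clock_ghz * flops_per_cycle / 1000.0;
    std::printf("%.4f TFLOPS\n", tflops); // 1.4336, matching the ~1.433 figure
    return 0;
}
```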
My ThinkPad L14 business laptop is used to access the corporate network, and it can play Genshin Impact.
Intel Iris Xe iGPUs are at about the 7 nm RX Vega 7/8 performance level. The mobile Ryzen 6000U's RDNA 2-based Radeon 680M almost doubles the 7 nm RX Vega 8's performance; the RX 680M delivers ~30-50% higher framerates than the Iris Xe chip in the Zephyrus M16.
Cited
https://www.ultrabookreview.com/54099-amd-radeon-680m-rnda2-benchmarks/
With APU ultrabooks in the 15 to 25 watt range, the priority is to allocate the available TDP to the GPU, not the CPU. The limitation is the TDP design envelope.
I can set the TDP limit to 35 watts via the Ryzen Controller tool, and the mobile GPU will use the available TDP headroom. I use Windows power management to limit the CPU's TDP allocation.
Allocating a higher TDP share to the CPU is counterproductive for a modern game's frame rates. Heavy CPU AVX usage is not recommended when the iGPU's performance is the priority. Similar guidelines apply to Intel APUs with Iris Xe.
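A toy model of why this matters under a fixed envelope (the watt figures are illustrative, not measurements from my L14):

```cpp
#include <algorithm>
#include <cstdio>
#include <initializer_list>

// Toy model of a shared TDP envelope: whatever the CPU burns is headroom
// the iGPU cannot use. All numbers here are hypothetical.
int main() {
    const double envelope_w = 15.0; // default ultrabook TDP
    const double soc_base_w = 3.0;  // hypothetical uncore/fabric cost
    for (double cpu_w : {4.0, 6.0, 9.0}) { // e.g. heavy AVX pushes CPU draw up
        double gpu_w = std::max(0.0, envelope_w - soc_base_w - cpu_w);
        std::printf("CPU %4.1f W -> GPU budget %4.1f W\n", cpu_w, gpu_w);
    }
    return 0;
}
```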