
RDNA 3 To Offer More Than 50% Performance Per Watt Over RDNA 2, Confirmed To Be Chiplet Based.

Dream-Knife

Banned
4K is not even the standard yet. What kind of console hardware are we looking at to run 8K content? PS5 won't hit native 4K on next-gen exclusives. We're looking at 1440p for games, at best.
PS4 Pro was advertised as 4K. I imagine this will likely be the same. PS5 already has 8K on the box though, so who knows.
 

Boss Mog

Member
7nm to 5nm yields 40% better performance alone

So basically 10% performance gains from RDNA3 over 2.

50% increase in power is not enough to justify new Pro consoles. That’s 15 TF vs 10 TF.

PS4 Pro doubled the TF
+50% in performance/watt is completely different than +50% performance and thus it has nothing to do with determining the TF increase of a potential PS5 Pro. The first is about efficiency, squeezing 50% more performance from the same amount of electricity. Nothing says the amount of electricity needed for a PS5 Pro would be the same that is currently needed for PS5. The PS4 Slim has a 165W rated power supply and the PS4 Pro has a 310W rated power supply; so almost double. This is basically because the PS4 Pro had double the number of CUs and raised the clock rates.
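To make the efficiency-versus-performance distinction concrete, here is a rough sketch. It assumes performance is simply efficiency multiplied by power, and it uses the PSU ratings quoted above as a stand-in for actual power draw, which they are not. The function name is made up for illustration.

```python
def relative_performance(perf_per_watt_gain: float, power_ratio: float) -> float:
    """Performance relative to the old chip, assuming perf = (perf/watt) * watts."""
    return (1.0 + perf_per_watt_gain) * power_ratio


# Same power budget: +50% perf/watt is exactly +50% performance.
same_power = relative_performance(0.50, 1.0)

# PSU ratings from the post (rated capacity, NOT measured draw).
psu_ratio = 310 / 165  # PS4 Pro vs PS4 Slim

# Hypothetical: the same efficiency gain plus a Pro-style power increase.
more_power = relative_performance(0.50, psu_ratio)

print(f"same power budget:        {same_power:.2f}x")
print(f"PSU rating ratio:         {psu_ratio:.2f}x")
print(f"with the larger budget:   {more_power:.2f}x")
```

The point is only that the performance multiple depends on both factors, so the perf/watt figure alone does not pin down a TF number.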
 

01011001

Banned
AMD avoiding any talk about RT makes me believe it's gonna be yet again pretty much non-existent...

yeah, doesn't seem like they're all too happy about their RT performance still.

wanna see something sad?
this is a 6600XT playing Watch Dogs Legion at 1080p with Raytracing





and this is a GTX1080ti doing the same thing 😬 without having any RT acceleration





and here's a 3060

 

DenchDeckard

Moderated wildly
RDNA 2 didn't even get a chance to flex its muscles...

Bye-bye RDNA 2, welcome RDNA 3...

Looking forward to seeing these. Chiplets were amazing for Ryzen so I'm interested to see how this pans out for GPUs.

AMD needs to sort their ray tracing out though, or it's a non-starter for me.
 

Boss Mog

Member
PS5 is already near the very top end of power consumption for a console
What does that have to do with what I said about not being able to determine the TF of a potential PS5 Pro based on efficiency gains though?

But I'll play along... Notice how I chose to compare the PS4 Pro to the PS4 Slim and not the OG model because the Pro and the Slim came out around the same time on the same reduced node. The OG PS4's power supply was 250W. So the PS4 Pro only had a 24% increase over the OG PS4. Which would put a PS5 Pro on a reduced node at 434W (based on the PS5's 350W) which is still completely doable. There have been more powerful PSUs in console sized PCs before. Of course this is assuming similar gains in terms of extra power versus wattage which can't really be determined either way at this point.
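The PSU arithmetic in this post can be checked in a few lines. This is just a sketch using the rated wattages quoted above; it assumes a hypothetical PS5 Pro would scale its PSU rating by the same factor as the OG PS4 to PS4 Pro step, which is the post's own assumption.

```python
# PSU ratings as quoted in the post (rated capacity, not measured draw).
OG_PS4_PSU_W = 250
PS4_PRO_PSU_W = 310
PS5_PSU_W = 350


def scaled_psu(base_w: float, old_w: float, new_w: float) -> float:
    """Scale base_w by the same factor that took old_w to new_w."""
    return base_w * new_w / old_w


increase = PS4_PRO_PSU_W / OG_PS4_PSU_W - 1
ps5_pro_w = scaled_psu(PS5_PSU_W, OG_PS4_PSU_W, PS4_PRO_PSU_W)

print(f"OG PS4 -> PS4 Pro PSU increase: {increase:.0%}")
print(f"Hypothetical PS5 Pro PSU rating: {ps5_pro_w:.0f} W")
```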
 

winjer

Gold Member
AMD avoiding any talk about RT makes me believe it's gonna be yet again pretty much non-existent...

This is a presentation for investors. Not an architecture deep dive.
In these types of presentations it's never about hardware details. It's about reassuring shareholders about projected growth.

Shareholders care about ROI, not RT.
 
What does that have to do with what I said about not being able to determine the TF of a potential PS5 Pro based on efficiency gains though?

Because those gains are due to a smaller node. Your power drops per transistor, but not power per area of wafer
 

winjer

Gold Member
Because those gains are due to a smaller node. Your power drops per transistor, but not power per area of wafer

You have a knack for making up stats on the fly, constantly with the wrong numbers.

A few things to consider.
AMD is probably going to pick increasing clocks. N5 vs N7 can provide 20% higher clocks at the same power envelope.
N5 is 80% more dense in transistors than N7.
Process node is not the only thing that accounts for transistor count. The libraries can make a significant difference in space efficiency.
Considering that AMD is revamping its CUs, library density will be somewhat different from RDNA2.
AMD might add new units that will use up space, such as matrix units and more complex RT units. These do not count towards the TFLOPs figure.
There are also tweaks to the architecture that can increase IPC. And IPC does not show in the TFLOPs figure.
 

FireFly

Member
You have a knack for making up stats on the fly, constantly with the wrong numbers.

A few things to consider.
AMD is probably going to pick increasing clocks. N5 vs N7 can provide 20% higher clocks at the same power envelope.
N5 is 80% more dense in transistors than N7.
Process node is not the only thing that accounts for transistor count. The libraries can make a significant difference in space efficiency.
Considering that AMD is revamping its CUs, library density will be somewhat different from RDNA2.
AMD might add new units that will use up space, such as matrix units and more complex RT units. These do not count towards the TFLOPs figure.
There are also tweaks to the architecture that can increase IPC. And IPC does not show in the TFLOPs figure.
Well, the IPC + clock improvements would already be included in the performance per watt figures. But yes, saying >50% doesn't tell us where it will land exactly, and there is also 4nm to consider.
 
You have a knack for making up stats on the fly, constantly with the wrong numbers.

A few things to consider.
AMD is probably going to pick increasing clocks. N5 vs N7 can provide 20% higher clocks at the same power envelope.
N5 is 80% more dense in transistors than N7.
Process node is not the only thing that accounts for transistor count. The libraries can make a significant difference in space efficiency.
Considering that AMD is revamping its CUs, library density will be somewhat different from RDNA2.
AMD might add new units that will use up space, such as matrix units and more complex RT units. These do not count towards the TFLOPs figure.
There are also tweaks to the architecture that can increase IPC. And IPC does not show in the TFLOPs figure.

I am not claiming all the gains are due to transistor size, but the architecture gains are small

We are looking at around 15TF at the same power envelope. I do not see your post disputing any of this.
 
You have no data to support either the 15TF figure or the claim of small architecture gains.

There isn't going to be any official data saying that the PS5 Pro will be 15TF given this latest RDNA 3 info. But I do agree with him that it will most likely be around 15TF at most, and that's not a big enough jump to justify a Pro version. Hopefully we'll see a Slim version based on a 6nm SoC.
 

winjer

Gold Member
There isn't going to be any official data saying that the PS5 Pro will be 15TF given this latest RDNA 3 info. But I do agree with him that it will most likely be around 15TF at most, and that's not a big enough jump to justify a Pro version. Hopefully we'll see a Slim version based on a 6nm SoC.

Considering the current state of the semiconductor business, I doubt we'll see a Pro version anytime soon.
Maybe in time for RDNA4, 2 years from now.
 

winjer

Gold Member
It’s a ballpark figure based on the power data we have, and the size of the shrink

Is it?
Because just from the node reduction, it would be 80% more transistors. So that would make it 18.5 TFLOPs.
Add 20% clock speed and that makes it 22.2 TFLOPs.

And then there might be architectural improvements. These don't show in TFLOPs, just in performance.
And then there is the new WGP, that replaces CU. This is bound to change TFLOP count even more.
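For anyone following the numbers, this is a quick sketch of that scaling. It assumes TFLOPs scale linearly with transistor count and with clock speed, and it takes 10.28 TFLOPs as the PS5 baseline (an assumption, since the post does not state the starting figure, but it is the value that reproduces 18.5 and 22.2).

```python
PS5_TFLOPS = 10.28     # assumed baseline: PS5's commonly cited peak FP32 figure
DENSITY_GAIN = 0.80    # N5 vs N7 transistor density, as claimed in the post
CLOCK_GAIN = 0.20      # N5 vs N7 clocks at the same power, as claimed in the post


def scale(tflops: float, gain: float) -> float:
    """Apply a fractional gain, assuming TFLOPs scale linearly with it."""
    return tflops * (1.0 + gain)


after_density = scale(PS5_TFLOPS, DENSITY_GAIN)
after_clocks = scale(after_density, CLOCK_GAIN)

print(f"density only:     {after_density:.1f} TFLOPs")
print(f"density + clocks: {after_clocks:.1f} TFLOPs")
```

This is an upper-bound style estimate: it ignores die size, power and cost limits.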
 

FireFly

Member
Is it?
Because just from the node reduction, it would be 80% more transistors. So that would make it 18.5 TFLOPs.
Add 20% clock speed and that makes it 22.2 TFLOPs.

And then there might be architectural improvements. These don't show in TFLOPs, just in performance.
And then there is the new WGP, that replaces CU. This is bound to change TFLOP count even more.
All of these things add power though. The question is what could be achieved in the ~200W power budget of the PS5, and if it is possible to scale power consumption further. That's where the performance per watt improvements come in.
 
Is it?
Because just from the node reduction, it would be 80% more transistors. So that would make it 18.5 TFLOPs.
Add 20% clock speed and that makes it 22.2 TFLOPs.

And then there might be architectural improvements. These don't show in TFLOPs, just in performance.
And then there is the new WGP, that replaces CU. This is bound to change TFLOP count even more.

PS4 (28nm) -> PS4 Pro (16nm): 43% reduction in node size. 1.84 to 4.2 TF (228% of the original).

PS5 (7nm) -> PS5 Pro (5nm): 29% reduction in node size. 10TF to ?? If it scales like the PS4 situation, the smaller shrink (29% vs 43%) gives about 67% of those gains, which is around 15TF.
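One reading of this arithmetic that reproduces the "around 15TF" figure is sketched below. It assumes node names can be treated as literal feature sizes and that the TF multiple scales in proportion to the size of the shrink; both are assumptions of the estimate, not established facts.

```python
def shrink(old_nm: float, new_nm: float) -> float:
    """Fractional reduction in the nominal node size."""
    return 1.0 - new_nm / old_nm


ps4_shrink = shrink(28, 16)   # PS4 -> PS4 Pro
ps5_shrink = shrink(7, 5)     # PS5 -> hypothetical PS5 Pro

ps4_tf_multiple = 4.2 / 1.84  # PS4 Pro TF as a multiple of PS4 TF

# Assumption: the TF multiple scales with the relative size of the shrink.
shrink_ratio = ps5_shrink / ps4_shrink
estimate_tf = 10 * ps4_tf_multiple * shrink_ratio

print(f"PS4 shrink: {ps4_shrink:.0%}, PS5 shrink: {ps5_shrink:.0%}")
print(f"shrink ratio: {shrink_ratio:.0%}")
print(f"estimated PS5 Pro: {estimate_tf:.1f} TF")
```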
 

winjer

Gold Member
PS4 (28nm) -> PS4 Pro (16nm): 43% reduction in node size. 1.84 to 4.2 TF (228% of the original).

PS5 (7nm) -> PS5 Pro (5nm): 29% reduction in node size. 10TF to ?? If it scales like the PS4 situation, the smaller shrink (29% vs 43%) gives about 67% of those gains, which is around 15TF.

Now you're making up even more numbers...
 

Akuji

Member
Tower PCs are gonna have a harder and harder time.
My last tower was a 2080 Ti with a 3900X in a custom water loop.
Sold everything and bought a laptop with a 6800M + 5900HX.
It's not as powerful, but I can use it to play games and have a nice device to program my home theatre setup.

With a 50% performance/watt gain it will be harder to get me back to a tower PC...
I still have the whole watercooling system stored safely away. I was intending to build a monster PC whenever Ashes of Creation launches...
Not sure if I will do that. USB-C is so versatile. It won't be long until I can use a 4K 120Hz docking station and not even have the fan noise...
 

winjer

Gold Member

AMD Radeon 7000 “RDNA3” series rumored to launch between late October and mid November


The date is not set in stone, but the information, which came from Greymon55’s “reliable source”, suggests Radeon 7000 cards based on the RDNA3 architecture should appear between late October and mid-November. A bit sooner, AMD will be unveiling its Ryzen 7000 CPU series based on the Zen4 architecture. Therefore, within two months' time, AMD will shift its entire desktop platform to next-gen hardware.

Should rumors be considered, the deployment of AMD RDNA3 architecture should start with the flagship Navi 31 GPU for Radeon 7900 series, then followed by Navi 33 and Navi 32.

AMD competitors will not let AMD steal all the attention though. Intel will probably be the first to release its highly anticipated Arc desktop graphics cards globally, except no one really knows when. Officially it’s late summer, so just weeks before AMD/NVIDIA switch to even faster GPU architecture. Furthermore, Intel will be launching its 13th Gen Core desktop CPU series codenamed Raptor Lake-S, an updated Intel 7 architecture with increased total CPU core count to 24.

And speaking of NVIDIA, the company is now expected to launch its GeForce RTX 40 series codenamed “Ada Lovelace” starting this September or October. There is still a question of whether board partners will manage to sell their excess RTX 30 inventory before the new series deploys. This may affect NVIDIA’s decision either to push back or bring forward the release of Ada GPUs.
 

Rudius

Member
Wait, just perf per watt? No perf per clock increase? That's disappointing.

Still, perf per watt means they can run the PS5 much cooler if they ever decide to make a slim model. I really don't know how they can get a 20 TFLOPs PS5 Pro without going over 250 watts, even with this 50% perf per watt increase.

For PCs, this is going to be fun because the 6900 XT is already only 260 watts or so, while the 3080 12 GB models can go up to 400 watts. Even the 10 GB ones can hit 320 watts. So if their 80 CU card comes in around 170 watts, they can potentially aim for 100 - 160 CUs and still come in under 400 watts. That's probably how they get to 50 TFLOPs.
They can make PS5 smaller. It is very quiet already, unlike the super noisy Pro, but the size bothered me when I had to move to another city.
 
The following article was published today by Angstronomics, who (apparently) have a very good track record of leaking AMD's RDNA specifications. Here's the link: https://www.angstronomics.com/p/amds-rdna-3-graphics



Hopefully the more tech-savvy folks on here can give us some insights.

What AMD has officially detailed so far about RDNA 3 is yet another significant increase in performance per watt over RDNA 2, with contributions from process node and microarchitectural design choices. However, the design philosophy of gfx11 is all about area, area, area. What is the best way to achieve the performance target with minimal area? The rearchitected Compute Unit and Optimized Graphics Pipeline changes are mostly about trimming the fat in pursuit of the lowest area and cost (example: halving relative FP64 rate to 1/32). As a result of this focus, PPA is significantly increased. In fact, at the same node, an RDNA 3 WGP is slightly smaller in area than an RDNA 2 WGP, despite packing double the ALUs.

OREO
One of the features in the RDNA 3 graphics pipeline is OREO: Opaque Random Export Order, which is just one of the many area saving techniques. With gfx10, the pixel shaders run out-of-order, where the outputs go into a Re-Order Buffer before moving to the rest of the pipeline in-order. With OREO, the next step (blend) can now receive and execute operations in any order and export to the next stage in-order. Thus, the ROB can be replaced with a much smaller skid buffer, saving area.

Infinity Cache Updates
The Memory Attached Last Level (MALL) Cache blocks are each halved in size, doubling the number of banks for the same cache amount. There are also changes and additions that increase graphics to MALL bandwidth and reduce the penalty of going out to VRAM.


Navi3x dGPU Configurations
Now we will go through the specifications of each die configuration of discrete RDNA 3 GPU. To be abundantly clear, these configurations for Navi3x were done in 2019 and finalized sometime in 2020, with no changes since.

Navi31
  • gfx1100 (Plum Bonito)
  • Chiplet - 1x GCD + 6x MCD (0-hi or 1-hi)
  • 48 WGP (96 legacy CUs, 12288 ALUs)
  • 6 Shader Engines / 12 Shader Arrays
  • Infinity Cache 96MB (0-hi), 192MB (1-hi)
  • 384-bit GDDR6
  • GCD on TSMC N5, ~308 mm²
  • MCD on TSMC N6, ~37.5 mm²
The world’s first chiplet GPU, Navi31 makes use of TSMC’s fanout technology (InFo_OS) to lower costs, surrounding a central 48 WGP Graphics Chiplet Die (GCD) with 6 Memory Chiplet Dies (MCD), each containing 16MB of Infinity Cache and the GDDR6 controllers with 64-bit wide PHYs. The organic fanout layer has a 35-micron bump pitch, the densest available in the industry. There is a 3D stacked MCD also being productized (1-hi) using TSMC’s SoIC. While this doubles the Infinity Cache available, the performance benefit is limited given the cost increase. Thus, the main Navi31 SKU will have 96MB of Infinity Cache (0-hi). This is lower than the 128MB in Navi21. A cut-down SKU will offer 42 WGP and 5x MCD (80MB Cache, 320-bit GDDR6).
The reference card appears to have an updated 3-fan design that is slightly taller than the previous generation, with a distinctive 3 red stripe accent on a section of the heatsink fins near the dual 8-pin connectors.
There were early plans for a version with 288MB of Infinity Cache (2-hi), but this was shelved as the cost-benefit was not worth it.

Navi32
  • gfx1101 (Wheat Nas)
  • Chiplet - 1x GCD + 4x MCD (0-hi)
  • 30 WGP (60 legacy CUs, 7680 ALUs)
  • 3 Shader Engines / 6 Shader Arrays
  • Infinity Cache 64MB (0-hi)
  • 256-bit GDDR6
  • GCD on TSMC N5, ~200 mm²
  • MCD on TSMC N6, ~37.5 mm²
Coming in 2023, Navi32 is a smaller version of Navi31, reusing the same MCDs. Navi32 will also be coming to mobile as a high-end GPU offering in AMD Advantage laptops. There were plans for a 128MB (1-hi) version, however it might not be productized due to the aforementioned costs. Thus Navi32’s 64MB is also smaller than Navi22’s 96MB.

Navi33
  • gfx1102 (Hotpink Bonefish)
  • Monolithic
  • 16 WGP (32 legacy CUs, 4096 ALUs)
  • 2 Shader Engines / 4 Shader Arrays
  • Infinity Cache 32MB
  • 128-bit GDDR6
  • TSMC N6, ~203 mm²
Navi33 is the mobile-first push for AMD. They expect robust sales of AMD Advantage laptops with it, as the design is drop-in compatible with Navi23 PCBs, minimizing OEM board re-spin headaches. They aim to ship more Navi33 silicon for mobile than to desktop AIB cards. The first concepts showed Navi33 as a chiplet design with 18 WGP and 2x MCD, but this could not meet the volume and cost structure of this class of GPU vs a monolithic design.
As an aside, Navi33 outperforms Intel’s top end Alchemist GPU while being less than half the cost to make and pulling less power.
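The chiplet configurations listed above are internally consistent, which can be checked with a short sketch. The per-MCD figures (16MB of Infinity Cache and a 64-bit GDDR6 interface, 0-hi) come from the article; the 256 ALUs per WGP figure is inferred from 12288 ALUs / 48 WGP. The class and field names are hypothetical, and the monolithic Navi33 is left out because it has no MCDs.

```python
from dataclasses import dataclass

ALUS_PER_WGP = 256        # inferred: 12288 ALUs / 48 WGP
CUS_PER_WGP = 2           # "legacy CUs" per WGP, per the listed figures
CACHE_MB_PER_MCD = 16     # 0-hi (unstacked) MCD, per the article
BUS_BITS_PER_MCD = 64     # GDDR6 PHY width per MCD, per the article


@dataclass(frozen=True)
class ChipletConfig:
    name: str
    wgp: int
    mcd: int

    @property
    def legacy_cus(self) -> int:
        return self.wgp * CUS_PER_WGP

    @property
    def alus(self) -> int:
        return self.wgp * ALUS_PER_WGP

    @property
    def cache_mb(self) -> int:
        return self.mcd * CACHE_MB_PER_MCD

    @property
    def bus_bits(self) -> int:
        return self.mcd * BUS_BITS_PER_MCD


configs = [
    ChipletConfig("Navi31", wgp=48, mcd=6),
    ChipletConfig("Navi31 cut-down", wgp=42, mcd=5),
    ChipletConfig("Navi32", wgp=30, mcd=4),
]

for c in configs:
    print(f"{c.name}: {c.legacy_cus} CUs, {c.alus} ALUs, "
          f"{c.cache_mb}MB cache, {c.bus_bits}-bit GDDR6")
```

The ALU count for the cut-down Navi31 is derived here rather than stated in the article, so treat it as an extrapolation.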

 
I think RDNA 3's main advantage will be in entry-level gaming PCs in APU format: a Zen 4/RDNA 3 combo, finally replacing garbage iGPUs from Intel and establishing a new standard for entry-level gaming.

NVIDIA will brute force everything with big power usage, but they don't have the iGPU and APU form factor. This, I feel, is 'blue ocean' potential for AMD. I hope AMD succeeds really well. Amazing company with fewer resources and less money, yet tackling the server and consumer sides of PCs at the same time.
 

kikkis

Member
I think RDNA 3's main advantage will be in entry-level gaming PCs in APU format: a Zen 4/RDNA 3 combo, finally replacing garbage iGPUs from Intel and establishing a new standard for entry-level gaming.

NVIDIA will brute force everything with big power usage, but they don't have the iGPU and APU form factor. This, I feel, is 'blue ocean' potential for AMD. I hope AMD succeeds really well. Amazing company with fewer resources and less money, yet tackling the server and consumer sides of PCs at the same time.
A 9.2 TFLOPs Phoenix APU might be a sweet deal on laptops, though I still expect it to be kind of expensive.
 
A 9.2 TFLOPs Phoenix APU might be a sweet deal on laptops, though I still expect it to be kind of expensive.
Remember, RDNA 3 TFLOPs are not the same as RDNA 2 TFLOPs.

Zen 4/RDNA 3 combo along with PCIE 5.0, USB 4.0, Wifi6E, NVME SSD read/write speeds greater than 5GB/sec with Directstorage API, Smartshift, Infinity Fabric, Infinity Cache, FSR 2.0, ML cores and better RT performance, DDR5 RAM, you got a lot of jam packed goodness!
 
not sure what you wanna say with that but yeah, RDNA 2 cards at the same TFLOPs level perform worse than RDNA 1

so maybe RDNA 3 will be even worse

I made the assumption that performance per TFLOP would be better because it is newer hardware. Plus, with discrete RDNA 3 hitting 3 GHz, why would it not be better, with more gains? Is it really AMD's objective to make performance per TFLOP worse with newer hardware?
 

01011001

Banned
I made the assumption that performance per TFLOP would be better because it is newer hardware. Plus, with discrete RDNA 3 hitting 3 GHz, why would it not be better, with more gains? Is it really AMD's objective to make performance per TFLOP worse with newer hardware?

well all I know is that's exactly what happened going from RDNA1 to RDNA2, so who knows.

also on the Nvidia side it happened as well. I think going from RTX20 to RTX30 the performance per TFLOP went down by almost 40%

the RTX2080ti has 13.5 TF
the at best 10% faster RTX3070 has 20 TF
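As a rough check, the change in performance per TFLOP implied by those two figures can be worked out directly. This sketch takes the TF numbers and the "at best 10% faster" claim from the post at face value; the function name is just for illustration.

```python
def perf_per_tflop_change(old_tf: float, new_tf: float, perf_ratio: float) -> float:
    """Fractional change in performance per TFLOP going from the old card to the new.

    perf_ratio is new-card performance divided by old-card performance.
    """
    return perf_ratio * old_tf / new_tf - 1.0


RTX_2080_TI_TF = 13.5   # as quoted in the post
RTX_3070_TF = 20.0      # as quoted in the post

equal_perf = perf_per_tflop_change(RTX_2080_TI_TF, RTX_3070_TF, 1.00)
ten_pct_faster = perf_per_tflop_change(RTX_2080_TI_TF, RTX_3070_TF, 1.10)

print(f"if performance is equal:     {equal_perf:.1%}")
print(f"if the 3070 is 10% faster:   {ten_pct_faster:.1%}")
```

Either way, the direction matches the point being made: more TFLOPs on paper did not translate one-to-one into performance.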
 

Crayon

Member
If these are good, I'm in for an N32 card when they come around. I want to see ray tracing performance in particular. I like effects. If Nvidia makes an RT improvement that maintains such a large gap, I will think twice.

I don't play anything particular on PC that needs more than the card I have, but there are going to be some great games that won't come out on ps so I probably can't avoid an upgrade for all that long.

Torn though, because that might also be the time to upgrade the whole machine, and waiting as long as I can will save me money since I won't overshoot the spec I need.
 

jaysius

Banned
Doesn’t take long in a tech thread for some obsessive Sony Jingos(OSJ’s) to make it all about PS5 and then start wildly speculating further on things they clearly have no clue about.

It’s entertaining though.



Sounds like now is not the time to buy a graphics card.
 

Sosokrates

Report me if I continue to console war
It needs to be a 2x jump

What’s the point of releasing a Pro that goes from 30 fps to 45 fps?

They could do the sort of visual improvements we have seen in Spider-Man on PC: higher res and more detailed reflections, better draw distance, etc.

But for me it ain't worth it. I would stick with PS5/XSX.

jaysius, with my emoji I was not laughing at you. I agree with what you say, and if people are doing that it's pretty funny.
 