
Inside Unreal: In-depth look at PS5's Lumen in the Land of Nanite demo (only 6.14 GB of geometry) and a deep dive into Nanite


Vognerful

Member
I'm not sure if you've taken any interest in the topic of the thread, but both Nanite and the SW RT of Lumen - which is the major advancement - are pixel-rate limited, so your comparison only holds for older PC and last-gen games - like The Medium - where using HW RT on those cards will be better than a PS5 running them.

For UE5, where GPU clock and total ROPs matter, even an RTX 3060 Ti with 80 ROPs and a 1500 MHz clock will be 30 Gpixels/s short of the PS5 for running those UE5 algorithms - even before factoring in optimization for cache scrubbing, the PS5's new constant-power boost clock paradigm, or its IO. So I'm not sure that's safe advice looking ahead.


DLSS is ML guesswork, requires extra developer effort (akin to offline lightmap baking in the work required), and isn't a silver bullet, so it is a bit of a fudge to claim superiority from scaling up native 600p (or something similarly low) to sort of look correct...
Thanks for the clarification, but I never mentioned The Medium in my comparison because I know it doesn't run well on console or PC. I made the point based on games like Spider-Man, Watch Dogs: Legion and others.

Going back to your comparison, the 3060 Ti is 133 Gpixels/s compared to the PS5's 142 (a very slight difference). This also assumes all next-gen games will utilize UE5. Looking ahead, on the subject of my comment, I am considering games in general; we know, for example, that PlayStation studios historically do not use UE, and there is no news on future projects. But I will keep this point in mind when looking for my next GPU.
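
For reference, both figures above come from the same simple formula: peak pixel fill rate = ROPs × clock. A quick back-of-the-envelope check in Python (the 1.665 GHz value is the 3060 Ti's advertised boost clock and 64 ROPs is the commonly cited PS5 figure; both are assumptions used purely for illustration, and these are paper specs, not sustained numbers):

```python
# Peak pixel fill rate = ROPs x clock; a paper-spec comparison only.
def fill_rate_gpixels(rops: int, clock_ghz: float) -> float:
    return rops * clock_ghz  # result is in Gpixels/s when the clock is in GHz

print(fill_rate_gpixels(80, 1.665))  # RTX 3060 Ti at its advertised boost clock: ~133.2
print(fill_rate_gpixels(64, 2.23))   # PS5 (commonly cited 64 ROPs at 2.23 GHz): ~142.7
print(fill_rate_gpixels(80, 1.5))    # 3060 Ti at the 1500 MHz figure quoted earlier: 120.0
```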

Second, I don't get why you took my comparison to the old generation negatively when the PS5 GPU is constantly compared to the AMD RX 5700 or RX 5700 XT.
DLSS is ML guesswork, requires extra developer effort (akin to offline lightmap baking in the work required), and isn't a silver bullet, so it is a bit of a fudge to claim superiority from scaling up native 600p (or something similarly low) to sort of look correct...

And regarding your comment on DLSS: whatever effort is needed from the developer to enhance the experience for the consumer is worth requiring. Your comment strikes me as odd, because I believe that, as a PS5 user, you would not use the same tone if a third-party developer made a similar comment about utilizing the PS5's IO to reduce load times or adding features for the DualSense.
 

PaintTinJr

Member
Thanks for the clarification, but I never mentioned The Medium in my comparison because I know it doesn't run well on console or PC. I made the point based on games like Spider-Man, Watch Dogs: Legion and others.

Going back to your comparison, the 3060 Ti is 133 Gpixels/s compared to the PS5's 142 (a very slight difference). This also assumes all next-gen games will utilize UE5. Looking ahead, on the subject of my comment, I am considering games in general; we know, for example, that PlayStation studios historically do not use UE, and there is no news on future projects. But I will keep this point in mind when looking for my next GPU.

Second, I don't get why you took my comparison to the old generation negatively when the PS5 GPU is constantly compared to the AMD RX 5700 or RX 5700 XT.


And regarding your comment on DLSS: whatever effort is needed from the developer to enhance the experience for the consumer is worth requiring. Your comment strikes me as odd, because I believe that, as a PS5 user, you would not use the same tone if a third-party developer made a similar comment about utilizing the PS5's IO to reduce load times or adding features for the DualSense.
You are using the wrong clock data for the ROPs comparison, as the boost clock on an MSRP reference Ti isn't sustained because of power and thermals, whereas the paradigm shift on PS5 - excluding the delta benefits of SmartShift - is constant boosting based on the power-draw optimisation of the code, which will definitely be realised over the next 6 years.

You are talking about DLSS for a face-off comparison. The scientific method surely stipulates that things are a match, not an 80%-good-enough guess at native, before claiming superiority, no? DLSS also wouldn't work for a game like No Man's Sky, because you can't train the model on new, unknown, procedurally generated data. And anywhere native resolution determines gameplay, you aren't providing coherent gameplay data at the DLSS output resolution.
 


Oh, is it different than the other eight-ish Ratchet games I've played? From watching gameplay it sure as fuck looks like every other Ratchet game. Or does this one change the formula of break shit and shoot shit with super easy platforming?
 

Vognerful

Member
You are using the wrong clock data for the ROPs comparison, as the boost clock on an MSRP reference Ti isn't sustained because of power and thermals, whereas the paradigm shift on PS5 - excluding the delta benefits of SmartShift - is constant boosting based on the power-draw optimisation of the code, which will definitely be realised over the next 6 years.
You can't be serious here. Are you talking about something that can only be realised in 6 years? Do you have any actual data on what clock the PS5 is currently running at? Do you know there are already cards on the market from EVGA with boost clocks of 1710 MHz?

You are talking about DLSS for a face-off comparison. The scientific method surely stipulates that things are a match, not an 80%-good-enough guess at native, before claiming superiority, no? DLSS also wouldn't work for a game like No Man's Sky, because you can't train the model on new, unknown, procedurally generated data. And anywhere native resolution determines gameplay, you aren't providing coherent gameplay data at the DLSS output resolution.
OK, I am not sure we are speaking the same language here. Where did I say that checkerboarding or normal upscaling is worse than DLSS? My comment was about having it as an additional feature. If it were not worth it, AMD wouldn't have bothered coming up with their own solution.

Here, let me make it simple for you. I find playing Control on PC using DLSS a better experience than playing the Ultimate Edition on my XSX (and the PS5 version - even though I don't have one, I believe the performance is similar). It is the same thing with Cyberpunk. I don't get your comment. Really, I have no idea how we jumped from having DLSS or AMD's super resolution as a feature on PC (also expected on console, per AMD) to comparing it to checkerboarding and other upscaling techniques.
 

PaintTinJr

Member
You can't be serious here. Are you talking about something that can only be realised in 6 years? Do you have any actual data on what clock the PS5 is currently running at? Do you know there are already cards on the market from EVGA with boost clocks of 1710 MHz?
What, 20M users (10% of Steam users) will have that clock or better? At non-reference clocks they aren't at MSRP, and aren't really reference Ti GPUs anymore; it is constant goalpost moving, unlike the PS5, where Cerny has claimed - and many have been banned for suggesting 9 TF - that it will maintain the 2.23 GHz clock with constant boosting.
But even then, ROPs multiplied by a clock that is ~500 MHz higher still makes a difference, even factoring out the gains of the cache scrubbers.
OK, I am not sure we are speaking the same language here. Where did I say that checkerboarding or normal upscaling is worse than DLSS? My comment was about having it as an additional feature. If it were not worth it, AMD wouldn't have bothered coming up with their own solution.

Here, let me make it simple for you. I find playing Control on PC using DLSS a better experience than playing the Ultimate Edition on my XSX (and the PS5 version - even though I don't have one, I believe the performance is similar). It is the same thing with Cyberpunk. I don't get your comment. Really, I have no idea how we jumped from having DLSS or AMD's super resolution as a feature on PC (also expected on console, per AMD) to comparing it to checkerboarding and other upscaling techniques.
This thread is about Unreal Engine's Nanite and Lumen technology - and sadly, since the beginning, warriors like DonJuanSchlong have made it about PC troll warring.
Unless the discussion point is native resolutions of the UE5 technologies - which need to be low-noise to begin with for any ML or non-ML upscaling to be useful - it isn't about those technologies. You were the one who brought DLSS into it. I think it is fine as an additional technology once native techniques have pushed the resolution, but not as a 600p or 900p > 1400p or 4K moved-goalposts solution.
 

Vognerful

Member
What, 20M users (10% of Steam users) will have that clock or better?
The number of people who will have this clock or GPU has absolutely nothing to do with the discussion. We are not having a popularity contest.
it is constant goalpost moving,
Unless you explain how it is, you are the only one shifting goalposts.
At non-reference clocks they aren't at MSRP, and aren't really reference Ti GPUs anymore,
What kind of fucking goalpost shifting is that?
It is constant goalpost moving, unlike the PS5, where Cerny has claimed - and many have been banned for suggesting 9 TF - that it will maintain the 2.23 GHz clock with constant boosting.
But even then, ROPs multiplied by a clock that is ~500 MHz higher still makes a difference, even factoring out the gains of the cache scrubbers.
You have not shown why we should not accept the GPU manufacturers' claims regarding their boost clock frequencies.
This thread is about Unreal Engine's Nanite and Lumen technology - and sadly, since the beginning, warriors like DonJuanSchlong have made it about PC troll warring.
Unless the discussion point is native resolutions of the UE5 technologies - which need to be low-noise to begin with for any ML or non-ML upscaling to be useful - it isn't about those technologies. You were the one who brought DLSS into it. I think it is fine as an additional technology once native techniques have pushed the resolution, but not as a 600p or 900p > 1400p or 4K moved-goalposts solution.
So, other than you repeatedly accusing me of something you are doing yourself, we both know that I was not talking about DLSS with regard to UE5, and that the discussion I was responding to was not exactly about UE5 performance on consoles or PC. You can read it! It was not about using it with UE5! Why are we back on this?

Do you want to discuss it further with regard to UE5, or do you want to go into another side discussion? You can always decide not to engage with people you feel are arguing in bad faith.
 
Did PaintTinJr accuse me of trolling? I've been on the topic of PC the whole thread. You, on the other hand, and your console warrior friends have shown us y'all have absolutely no clue what you are talking about. You can't even get simple things like boost clock speeds right, as I can hold over 2.1 GHz all the time, any day, rain or snow.

Instead of trying to appear smart about things you have no idea about, try doing a five-second Google search instead of making up your own hypotheses.

Let's go further back and look at the thread title. This thread is about PC, if you haven't noticed yet, but then again, look at posts of yours like this:

Obviously there could be upsides for PlayStation customers - as well as massive negatives - but my main reason for being against this is that it makes Cerny look like a liar - and in fairness, if he moved to consult on another console system for someone else, given the state of meta-driven policy on PlayStation, I'd probably look at that system seriously as my main gaming option.

Pathetic console warrior doesn't want Sony games to come to PC, as he doesn't want Cerny to look bad. No wonder he has to try and turn this thread into a Sony thread.
 
Last edited:

PaintTinJr

Member
Pathetic console warrior doesn't want Sony games to come to PC, as he doesn't want Cerny to look bad. No wonder he has to try and turn this thread into a Sony thread.
I want a reason to buy all platforms - including a new CPU and GPU for my PC - driven by hardware design leading to bespoke software design.

Freeloading on software libraries paid for by others' investment in a platform is akin to drinking in rounds and failing to pay your way. Port begging from successful platforms you can buy is pathetic IMHO - but each to their own - as it just makes game development less ambitious and more generic at the design stage.
 

PaintTinJr

Member
Did PaintTinJr accuse me of trolling? I've been on the topic of PC the whole thread. You, on the other hand, and your console warrior friends have shown us y'all have absolutely no clue what you are talking about. You can't even get simple things like boost clock speeds right, as I can hold over 2.1 GHz all the time, any day, rain or snow.
Good for you. But is that at reference TDP and out-of-the-box cooling on any other PC too, or are we back to projecting your one config as the minimum experience anyone buying a 2060 Super - oh no, the goalposts have moved to a 3060, or is it a Ti with a further OC now - will get?
 
I want a reason to buy all platforms - including a new CPU and GPU for my PC - driven by hardware design leading to bespoke software design.

Freeloading on software libraries paid for by others' investment in a platform is akin to drinking in rounds and failing to pay your way. Port begging from successful platforms you can buy is pathetic IMHO - but each to their own - as it just makes game development less ambitious and more generic at the design stage.
Here's a fun fact for you that is really going to blow your limited console-warrior mindset. First of all, no one besides console warriors uses the phrase "port begging". You can't make yourself look any more biased, even by trying to play the innocent card. We have all seen your posts on here; there's no hiding it.

Secondly, "freeloading"? Another reason it's easy to label you a biased console warrior. How is it freeloading when Sony can sell software and services to either A) its own user base of 100+ million, or B) its own user base of 100+ million plus well over 100 million more on PC? Did the console players collectively help develop the game or something? That's a weird take you've got there...

Let's try to leave emotions out of the conversation and stick strictly to the facts. Gamers don't want to prohibit other gamers from playing games. That's a really shitty and petty personality trait to possess.
Good for you. But is that at reference TDP and out-of-the-box cooling on any other PC too, or are we back to projecting your one config as the minimum experience anyone buying a 2060 Super - oh no, the goalposts have moved to a 3060, or is it a Ti with a further OC now - will get?
Nvidia lists their GPUs with the absolute minimum clocks they can run at under any circumstances. In literally any scenario, the actual clocks are much higher than what is listed on the box. You can ask literally anyone in here who has a PC, whether with an Nvidia or AMD GPU. You can go on YouTube to see this. Even people who don't know much about PCs know this. But here you are, trying to persuade others otherwise. Shame on you.

If you are unsure of how things work, simply ask us and you'll get answers. But don't go spreading false, made-up narratives. It's disingenuous to the topic at hand, and to people like yourself who are unaware of how clock speeds have worked on GPUs for the past decade.
 

PaintTinJr

Member
Here's a fun fact for you that is really going to blow your limited console-warrior mindset. First of all, no one besides console warriors uses the phrase "port begging". You can't make yourself look any more biased, even by trying to play the innocent card. We have all seen your posts on here; there's no hiding it.

Secondly, "freeloading"? Another reason it's easy to label you a biased console warrior. How is it freeloading when Sony can sell software and services to either A) its own user base of 100+ million, or B) its own user base of 100+ million plus well over 100 million more on PC? Did the console players collectively help develop the game or something? That's a weird take you've got there...

Let's try to leave emotions out of the conversation and stick strictly to the facts. Gamers don't want to prohibit other gamers from playing games. That's a really shitty and petty personality trait to possess.
PlayStation's studios are only in a position to make any software because they have platform-specific success that pays their wages. That's the business model. Not PC gamers wanting AAA game development while spending that console money on overpriced GPUs that fund zero AAA software development outright. Without console buyers, who will fund software at that production level? No AAA PC-exclusive segment exists as a viable market.
Nvidia lists their GPUs with the absolute minimum clocks they can run at under any circumstances. In literally any scenario, the actual clocks are much higher than what is listed on the box. You can ask literally anyone in here who has a PC, whether with an Nvidia or AMD GPU. You can go on YouTube to see this. Even people who don't know much about PCs know this. But here you are, trying to persuade others otherwise. Shame on you.

If you are unsure of how things work, simply ask us and you'll get answers. But don't go spreading false, made-up narratives. It's disingenuous to the topic at hand, and to people like yourself who are unaware of how clock speeds have worked on GPUs for the past decade.
And overclocking shortens lifespan and potentially invalidates warranties. It is hardly equivalent to fixed-cost, fixed-spec consoles - hardly grounds to claim any valid victory, is it?

In 6 years, do you expect a regular Alienware PC with today's 8-core Ryzen CPU, a standard SSD and an RTX 3060 Ti to outperform a PS5 in UE5 games? I don't. I suspect the difference, at a minimum, will be 8K textures on PS5 and 4K on that PC.
 
PlayStation's studios are only in a position to make any software because they have platform-specific success that pays their wages. That's the business model. Not PC gamers wanting AAA game development while spending that console money on overpriced GPUs that fund zero AAA software development outright. Without console buyers, who will fund software at that production level? No AAA PC-exclusive segment exists as a viable market.

And overclocking shortens lifespan and potentially invalidates warranties. It is hardly equivalent to fixed-cost, fixed-spec consoles - hardly grounds to claim any valid victory, is it?

In 6 years, do you expect a regular Alienware PC with today's 8-core Ryzen CPU, a standard SSD and an RTX 3060 Ti to outperform a PS5 in UE5 games? I don't. I suspect the difference, at a minimum, will be 8K textures on PS5 and 4K on that PC.
Did you know that Sony loses money on hardware? They aren't like Apple, which is profitable on hardware, software, and services. Let's debunk that right out of the gate.

From my perspective, a PS5 is overpriced when there are no games on it that I'm interested in at the moment. Bloodborne didn't run any better than on PS4, and it's still limited to 30 fps. And a PS4 is overpriced for that matter as well, to me.

Plus, the entire gaming market is shifting towards software and services. That's why you see Sony jumping on board, late to the party - they want extra money to do what? FUND THEIR GAMING STUDIOS.

Boost clocking isn't overclocking per se. It's running within the operating range. You can overclock your card if you choose to do so, but automatic, standard boost clocking isn't the same: it happens without the user touching anything. Just as the PS5 isn't always at its maximum clocks, and when it does reach them, that's not considered overclocking either.


You can hypothesize all you want, but right now that 3060 runs games better than a PS5. And with DLSS, I'm not sure your guesswork is correct. It hasn't been in about 90% of your posts in this thread, so we'll see.
 

PaintTinJr

Member
Did you know that Sony loses money on hardware? They aren't like Apple, which is profitable on hardware, software, and services. Let's debunk that right out of the gate.

From my perspective, a PS5 is overpriced when there are no games on it that I'm interested in at the moment. Bloodborne didn't run any better than on PS4, and it's still limited to 30 fps. And a PS4 is overpriced for that matter as well, to me.
Clearly, you maining PC means you don't really get discrete generations. But that is a faithful BC last-gen experience. It costs what it costs.
Boost clocking isn't overclocking per se. It's running within the operating range. You can overclock your card if you choose to do so, but automatic, standard boost clocking isn't the same: it happens without the user touching anything. Just as the PS5 isn't always at its maximum clocks, and when it does reach them, that's not considered overclocking either...
PS5 is a paradigm shift; go watch DF's post-Road to PS5 Cerny interview, where Richard doesn't understand the paradigm shift either. Optimisation of code for power will keep the PS5 clock constantly boosting.
 
Clearly, you maining PC means you don't really get discrete generations. But that is a faithful BC last-gen experience. It costs what it costs.

PS5 is a paradigm shift; go watch DF's post-Road to PS5 Cerny interview, where Richard doesn't understand the paradigm shift either. Optimisation of code for power will keep the PS5 clock constantly boosting.
Look man, everyone has preferences. That's why you have warriors on both sides, Xbox and PlayStation, which bring vastly different games to the table. You might prefer one and not the other, and vice versa. All of the games that I care to play are on my platform of choice, as PC is hardware agnostic as well as an open platform.

Those who are dying to play TLOU2, Bloodborne, and the rest of the Sony catalogue will soon be able to on PC. That's the beauty of the platform. And no matter how much you are against it, it's gonna happen, and it'll fund more games in the future for PC and PS5/6 releases.



So you are OK with PS5 boosting, but fail to see how Nvidia and AMD do it consistently? This is where you are being biased, yet again. Wake up and open your eyes to reality.
 

PaintTinJr

Member
Look man, everyone has preferences. That's why you have warriors on both sides, Xbox and PlayStation, which bring vastly different games to the table. You might prefer one and not the other, and vice versa. All of the games that I care to play are on my platform of choice, as PC is hardware agnostic as well as an open platform.

Those who are dying to play TLOU2, Bloodborne, and the rest of the Sony catalogue will soon be able to on PC. That's the beauty of the platform. And no matter how much you are against it, it's gonna happen, and it'll fund more games in the future for PC and PS5/6 releases.
It does feel like the move is already a done deal, but IMO the impact on PlayStation will be more about the reason why, and whether they are abandoning their unique selling point for the PS5 like Xbox has done.

I sort of get it with most of the PS4 games, because IMO the PS4 was the least interesting hardware of any PlayStation and produced the least interesting catalogue of PlayStation games to justify the console hardware - even if I did enjoy quite a few, and some, like Dreams, were a revelation. Porting slow, HDD-based PS4 games to PC as-is sort of makes sense, when the PS4 - because of losses in the PS3 gen - was, in all fairness, a poor PC that punched above its weight. But for me, if they go all-in like Xbox, completely waste the esoteric hardware design features of the PS5, and design PS5 games around generic PC porting, then I'll just go back to PC as my main platform if Jim and Hulst survive in their jobs with that anti-PlayStation-console strategy.

So you are OK with PS5 boosting, but fail to see how Nvidia and AMD do it consistently? This is where you are being biased, yet again. Wake up and open your eyes to reality.
You mentioned earlier that you can overclock the GPU to 2.1 GHz, but how did you arrive at that figure, and what was the exact frequency? Did you benchmark each change to arrive at it? Or is it possible that a lower frequency would have yielded a higher benchmark score, and that despite your clock being stable it may have increased GPU error rates that are being corrected silently? DF did an article, IIRC, about degraded performance from higher clocks on lower-end cards, to try and suggest Cerny was wrong.

The reason I ask is because PS5 boosting is a paradigm shift: it deterministically provides higher performance when fully boosted at 2.23 GHz, and lower performance when the GPU deterministically drops its clock to operate at near-constant power draw - when occupancy is greater, or when the clock controller determines a job will exceed the available power.

Your GPU is reacting to compute demand, and error rates in various areas - with such a boosted clock - will be unpredictable across the full array of work it will do in a game, as it is a hobbyist-style overclock with little more insight than anecdotal fps numbers or benchmark scores from rudimentary tests, at best. By comparison, the PS5 GPU has been fully simulated at the design stage for all valid workloads, and a specific optimal boost clock has been chosen for each workload based on maintaining static power use and not ageing the silicon prematurely - so when unoptimized code causes the clock controller to pre-emptively drop the clock, the same re-optimised code - probably because of lower occupancy or reduced instruction use via code factorisation - can then be fully boosted, yielding better performance. The same re-optimised code running on an 80-ROP 3060 Ti gains no performance; it just uses less power.

So yes, I'm okay with paradigm shifting PS5 boosting.
 
Last edited:
[PaintTinJr's post above, quoted in full]
So the PS5 overclocks up to 2.23 GHz, is what you are saying?
 
No, the system constantly boosts to that level unless a workload on a given cycle would exceed the constant power available, in which case it drops the boost clock.

The clock changes something stupid like 15 times per second IIRC. The paradigm shift is interesting and almost certainly coming to AMD GPUs in the future.
So why are you trying to imply that Nvidia's and AMD's boost clocks are considered overclocking? I get that you don't want to make Cerny look bad, as you've mentioned earlier. But I don't think Cerny would even want one of his students spouting false premises. Overclocking a GPU would be manually going past the boost clock ranges.

In essence, if the PS5 reaching 2.23 GHz is not considered overclocking, the same applies to PC boost clocks.

My most important question: do you ever get tired of being wrong, trolling threads you don't really understand, and spreading false information? I just don't get what benefit you get from being wrong all the time. Is Cerny paying you to shill with these anti-PC campaigns?


Lastly, can we get back on topic? If you are just gonna shill for Sony, there are thousands of other threads for that.
 
Last edited:

PaintTinJr

Member
So why are you trying to imply that Nvidia's and AMD's boost clocks are considered overclocking? I get that you don't want to make Cerny look bad, as you've mentioned earlier. But I don't think Cerny would even want one of his students spouting false premises. Overclocking a GPU would be manually going past the boost clock ranges.

In essence, if the PS5 reaching 2.23 GHz is not considered overclocking, the same applies to PC boost clocks.
I don't know if that's just wordplay for your narrative, like Richard at DF, or whether you really don't get it.
My most important question: do you ever get tired of being wrong, trolling threads you don't really understand, and spreading false information? I just don't get what benefit you get from being wrong all the time. Is Cerny paying you to shill with these anti-PC campaigns?


Lastly, can we get back on topic? If you are just gonna shill for Sony, there are thousands of other threads for that.
You are the one making outlandish claims about pixel rate and warrior-ing for a PC - without bringing receipts, I might add, about how you got to your convenient 2.1 GHz fixed-clock overclock. The pixel rate is on topic because Nanite's and Lumen's SW RT performance are tied to it, and the PS5's constant boost clock has been officially benchmarked with ms breakdowns by Epic at 1404p@30 (not a typo), while mesh distance field GI is configured for 1 km rather than the default 200 m.
 

FireFly

Member
The clock changes something stupid like 15 times per second IIRC. The paradigm shift is interesting and almost certainly coming to AMD GPUs in the future.
AMD already throttles based on power budgets. The PS5 will have code specially written for the boosting algorithm, but in the PC space power limits are higher anyway. To illustrate, the 6700 XT averages 2489 MHz, according to TechPowerUp, across their suite of 23 games.


(Not that this has anything to do with the original discussion)
 
I don't know if that's just wordplay for your narrative, like Richard at DF, or whether you really don't get it.

You are the one making outlandish claims about pixel rate and warrior-ing for a PC - without bringing receipts, I might add, about how you got to your convenient 2.1 GHz fixed-clock overclock. The pixel rate is on topic because Nanite's and Lumen's SW RT performance are tied to it, and the PS5's constant boost clock has been officially benchmarked with ms breakdowns by Epic at 1404p@30 (not a typo), while mesh distance field GI is configured for 1 km rather than the default 200 m.
Again, wtf are you talking about? What receipts? It's common knowledge that GPUs can hold their boost clocks. Who needs receipts for something that's common knowledge 😂.

You are the one trying to say there are more errors and all this bullshit, with no receipts. You can't even prove the PS5 isn't "overclocking" to 2.23 GHz, or that it maintains it, yet you consider boost clock = overclocking only when it comes to PC.

The difference between boost clock and overclock:

A boost clock is an automatic increase in processing speed performed by the GPU itself to provide higher performance under heavy load, while overclocking is a manual increase in speed that pushes the GPU towards its ultimate limits.

The PC ran a more complex version of the demo, given that it's in the editor. With all of the updates, I'm sure the resolution would be ~1080p on PS5 today. Earlier in this thread, someone quoted YOU and broke down why the PC version is so much more complex, yet here you are spreading false narratives, yet again. Stop trolling this thread, please. You can praise the PS5 elsewhere, but instead you try to turn every PC thread into one about the PS5. You did this in all of the other Unreal discussions.
 

PaintTinJr

Member
AMD already throttles based on power budgets. The PS5 will have code specially written for the boosting algorithm, but in the PC space power limits are higher anyway. To illustrate, the 6700 XT averages 2489 MHz, according to TechPowerUp, across their suite of 23 games.


(Not that this has anything to do with the original discussion)
The previous paradigm of throttling is reactive. The PS5's constant boosting is proactive clock selection for a workload - so there is no massive latency when switching clocks.

As an example, the Raspberry Pi 4 can be used as a Minecraft server, and a Minecraft server has a tick rate that indicates the speed of world updates, largely tied to the clock rate (of the CPU cores in this case). The RPi4 is effectively an 800 MHz ARM processor with an in-built firmware boost mode to 1400 MHz - which a cut-down Minecraft server needs in order to run - but under normal serving load the system will thermally throttle enough for the tick rate to exceed the threshold at which the server assumes it has crashed, and even though it would eventually come back into thermal/power range and return to 1400 MHz, it shuts down - unless you use an ample power source and active cooling with a heatsink and fan.

PC GPUs still work like the RPi4 with reactive clocking; the PS5 doesn't. This was a topic discussed heavily in the next-gen thread, and it surprisingly still needs to be explained.
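
To make the reactive-vs-proactive distinction concrete, here is a minimal toy sketch in Python. It is purely illustrative: the linear power model, the budget, and every number in it are assumptions, not Sony's or AMD's actual logic. The reactive path only responds after the part has already run hot, while the activity-based path picks the clock up front from a predicted power cost of the submitted workload.

```python
# Toy model only - not Sony's or AMD's real algorithm.

# Reactive throttling: the clock only drops after temperature has already spiked.
def reactive_clock(temp_c: float, max_clock_mhz: float, throttle_temp_c: float = 85.0) -> float:
    return max_clock_mhz if temp_c < throttle_temp_c else max_clock_mhz * 0.8

# Activity-based selection: choose the highest clock whose *predicted* power for the
# submitted workload fits a fixed power budget, before the work runs.
def proactive_clock(watts_per_mhz: float, power_budget_w: float,
                    max_clock_mhz: float = 2230.0) -> float:
    affordable_mhz = power_budget_w / watts_per_mhz  # clock the budget can sustain
    return min(max_clock_mhz, affordable_mhz)

# A light workload boosts fully; a heavier (higher-activity) one is pre-emptively lowered.
print(proactive_clock(watts_per_mhz=0.08, power_budget_w=180))  # 2230.0 -> full boost
print(proactive_clock(watts_per_mhz=0.10, power_budget_w=180))  # 1800.0 -> clock dropped
```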
 
Last edited:

FireFly

Member
The previous paradigm of throttling is reactive. The PS5's constant boosting is proactive clock selection for a workload - so there is no massive latency when switching clocks.
So what information is the clock selection based on? And where can I find the breakdown of how the boost algorithm works?
 

PaintTinJr

Member
So what information is the clock selection based on? And where can I find the breakdown of how the boost algorithm works?
Sadly, there is limited info: just what's in the Road to PS5 (RtPS5) 2020 GDC talk by Mark Cerny, and then a bit more in the post-RtPS5 interview he gave DF in 2020, in which he explains the paradigm shift - an interview in which Richard idiotically tries to insinuate he's wrong and lying, otherwise we might have got more info.

edit: obviously anyone with access to a PS5 devkit will have access to all that NDA info.
 
Last edited:

FireFly

Member
Sadly, there is limited info: just what's in the Road to PS5 (RtPS5) 2020 GDC talk by Mark Cerny, and then a bit more in the post-RtPS5 interview he gave DF in 2020, in which he explains the paradigm shift - an interview in which Richard idiotically tries to insinuate he's wrong and lying, otherwise we might have got more info.

edit: obviously anyone with access to a PS5 devkit will have access to all that NDA info.
Cerny said:

"So instead of using the temperature of the die, we use an algorithm in which the frequency depends on CPU and GPU activity information. That keeps behaviour between PS5s consistent."


How do we know AMD is not already using the same activity information in their boosting algorithm?
 

PaintTinJr

Member
Cerny said:

"So instead of using the temperature of the die, we use an algorithm in which the frequency depends on CPU and GPU activity information. That keeps behaviour between PS5s consistent."


How do we know AMD is not already using the same activity information in their boosting algorithm?
Probably because the graph from the 6700 XT article you linked shows it doesn't use near-constant power, and spikes for 20 ms durations that far exceed 1/15th of a frame or second (I forget how fast it was changing).

power-spikes.png
 
Last edited:

clintar

Member
Um, big issue I'm seeing with that example there. I've never played Doom Eternal, so I don't know what initial loads are like, but how is the GPU memory already pretty much full before loading, unless everything is already in GPU memory? I would really like to see initial game load times.
Really, can someone show me initial load times for this game? Hoddi ? I have looked all over YouTube, but my search-fu must be pretty bad these days, because I think I found one video that was like 24 seconds, but with no idea what storage the guy had...
 

onesvenus

Member
Brian Karis presented at SIGGRAPH yesterday with the title "A Deep Dive into Nanite Virtualized Geometry"
You can find the slides here

I've done a quick summary here. It's a long but really interesting talk. They combine a lot of existing techniques to get something that has never been done before.

The dream: Using film quality source art
  • Voxels and implicit surfaces have a lot of potential advantages and are the most discussed direction to solve the problem
    • Data size problem: A 2M poly mesh gets converted to 13M sparse SDF voxels and it looks blobby
    • UV seams problems
    • Features vanishing when they are thinner than a voxel
    • Might be possible but need many years of research
  • Subdivision surfaces
    • Great for up close but still connects artist authoring choices with rendering cost
  • Displacement maps
    • Could capture displacement like normal maps now and the low poly could be even lower
    • There are geometries which can't be displaced (i.e. a sphere into a torus)
    • Great for up close but not good enough for general purpose simplification
  • Point based rendering
    • Either massive amounts of overdraw or hole filling required (how do we know if a hole should be there or it's a hole that should be filled?)

Nanite
  • Triangles are the core of Nanite
  • GPU driven pipeline
  • Triangle cluster culling: group 128 triangles into clusters and build a bounding box for each cluster. Cull clusters based on their bounds.
    • No need to draw more triangles than pixels. Draw the same number of clusters every frame regardless of how many objects there are or how dense they are
    • Cost of rendering the geometry should scale with screen resolution, not scene complexity.
    • LOD using a hierarchy of clusters where parents are simplified versions of their children
    • At run time, find a cut of the tree that matches the desired cost. Different parts of the same mesh can be at different LODs based on what's needed. A parent will be drawn instead of its children if you can't tell the difference (in pixels) from a given POV (a minimal sketch of this cut selection follows after this list)
    • No need to have all the clusters in memory, but need to have the whole tree structure in memory. Requests data on demand -> if children are needed and not in RAM, request them from disk. If they are in RAM but haven't been drawn in a while, evict them.
    • LOD cracks when boundaries don't match -> group clusters that need to have the same LOD during build
    • Only change LODs that have less than 1 pixel of error -> TAA smooths out the difference, making the LOD pop-in imperceptible
    • Meshes with less than 128 triangles (i.e. a wall section of a building seen from very far away) can't be culled because it could mean entire buildings vanishing
      • Use static imposters -> Can produce pop when many instances of the same mesh are next to one another (like repeating wall sections)
  • Occlusion culling: both frustum culling and occlusion culling against a hierarchical z-buffer (HZB) -> calculate a screen rect from the bounds, find the lowest mip where the rect is 4x4 pixels and test if it's occluded.
    • Reproject previous frame's z-buffer into current frame -> approximate
    • 2-pass occlusion culling (sketched below, after this list):
      • Draw what was visible last frame
      • Build HZB from that
      • Test the HZB to determine what's visible now but wasn't the last frame and draw anything that's new
  • Decouples geometry from materials
    • REYES overshades 4x or more
    • Deferred materials via visibility buffer
      • Write the smallest amount of geometry data to the screen in form of depth, instance id and triangle id
      • Per-pixel material shader (sketched after the notes at the end of this post):
        • Loads the visibility buffer
        • Loads the triangle data
        • Transforms the 3 vertex positions to the screen
        • Derive the barycentric coordinates for this pixel
        • Load and interpolate the vertex attributes
      • Lots of cache hits and no overdraw
    • Can draw all opaque geometry with a single draw call
    • CPU cost is independent from number of objects in the scene or in view
    • Triangles are only rasterized once
  • Pixel scale detail
    • Want zero perceptual loss of detail
    • Software rasterization is 3x faster on average than their fastest hardware primitive shader implementation
      • HW rasterizers are optimized for triangles that cover many pixels
      • HW could be built to do the same thing they do, but it's questionable whether that's the best use of transistors vs more CUs
      • Need to z-buffer in software vs using ROPs and depth-test hardware
    • Use the HW rasterizer for triangles larger than 32 pixels
    • Decide on SW or HW rasterizer per cluster based on which will be faster
  • Shadows
    • Can't ray trace them because there are more shadow rays than primary rays, since there is more than 1 light per pixel on average
    • 16K virtual shadow maps -> Mip mapped to 1 texel per pixel
  • Streaming
    • Stream entire groups of clusters to guarantee no cracks in the geometry
    • Fixed size memory pages with groups minimizing the number of pages that are likely to be needed at runtime
    • The root page of a cluster group (i.e. the most simplified LOD) is always in memory to always have something to render
    • During the cluster hierarchy traversal determine if child clusters would have been rendered if they were resident.
    • An object can request data from all the levels it needs for target quality
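
As promised above, here is a minimal, illustrative sketch of the "cut of the tree" idea in Python. The error values, the distance-based projection, and the 1-pixel threshold are assumptions chosen purely for illustration, not Epic's actual screen-space error metric.

```python
# Pick which clusters of a Nanite-style LOD tree to draw: take a parent when its
# simplification error projects to less than a pixel, otherwise recurse into its children.
from dataclasses import dataclass, field

@dataclass
class Cluster:
    error: float                                   # object-space simplification error
    children: list = field(default_factory=list)   # finer clusters this one replaces

def projected_error_px(c: Cluster, distance: float, px_per_unit_at_1m: float) -> float:
    # Toy projection: error shrinks linearly with distance (a real engine projects a bounding sphere).
    return c.error * px_per_unit_at_1m / max(distance, 1e-3)

def select_cut(c: Cluster, distance: float, px_per_unit_at_1m: float, out: list) -> None:
    if projected_error_px(c, distance, px_per_unit_at_1m) < 1.0 or not c.children:
        out.append(c)                              # good enough (or a leaf): draw this cluster
    else:
        for child in c.children:                   # otherwise descend to finer detail
            select_cut(child, distance, px_per_unit_at_1m, out)

# Far away the coarse root suffices; up close the cut descends to the two finer clusters.
root = Cluster(error=0.5, children=[Cluster(error=0.02), Cluster(error=0.02)])
for d in (100.0, 2.0):
    drawn: list = []
    select_cut(root, d, px_per_unit_at_1m=50.0, out=drawn)
    print(f"distance {d}: drawing {len(drawn)} cluster(s)")
```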

There's a lot more information on how to group clusters, how the geometry is simplified, and how the materials are culled in the presentation.
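
And a hedged sketch of the two-pass occlusion culling bullet above (Python, illustrative only; the callbacks stand in for the real HZB build and test, which on the GPU run massively in parallel rather than in a loop):

```python
# Two-pass occlusion culling: draw last frame's visible set first, build an HZB from it,
# then test everything else against that HZB and draw only what turns out to be newly visible.
def two_pass_cull(objects, visible_last_frame, build_hzb, is_occluded, draw):
    for obj in objects:                      # pass 1: what was visible last frame
        if obj in visible_last_frame:
            draw(obj)
    hzb = build_hzb()                        # hierarchical z-buffer from pass 1's depth
    newly_visible = set()
    for obj in objects:                      # pass 2: test the rest against the HZB
        if obj not in visible_last_frame and not is_occluded(obj, hzb):
            draw(obj)
            newly_visible.add(obj)
    return set(visible_last_frame) | newly_visible   # seeds next frame's pass 1

# Toy usage with stub callbacks (nothing is treated as occluded here).
vis = two_pass_cull(["rock", "tree", "wall"], visible_last_frame={"rock"},
                    build_hzb=lambda: None,
                    is_occluded=lambda obj, hzb: False,
                    draw=lambda obj: None)
print(sorted(vis))
```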

Some things worth noting:
- The streaming section, where they discuss how they don't stream individual polygons, and the compression section, where they say disk access doesn't support random access into the Nanite geometry. I hope the discussion about reading single polygons from disk on demand ends here :messenger_winking_tongue:
- Brian explicitly says they are not rendering 20 billion triangles but 25M. The 20 billion triangles figure being thrown around here multiple times is the source geometry before importing it into Nanite. Across the demo, that 25M triangle count is fairly constant.
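
Finally, a sketch of the per-pixel deferred-material step listed above (Python, illustrative only; the lookup of triangle data through the stored instance/triangle IDs is elided, and the triangle's screen positions and UVs are passed in directly purely to keep the example self-contained):

```python
# Visibility-buffer material resolve: each pixel stores only (depth, instance_id, triangle_id),
# so the material pass re-derives barycentrics and attributes itself and geometry is
# rasterized exactly once, with no overdraw.

def barycentrics(p, a, b, c):
    # Standard 2D barycentric coordinates of point p in triangle (a, b, c).
    def edge(u, v, q):
        return (q[0] - u[0]) * (v[1] - u[1]) - (q[1] - u[1]) * (v[0] - u[0])
    area = edge(a, b, c)
    return (edge(b, c, p) / area, edge(c, a, p) / area, edge(a, b, p) / area)

def shade_pixel(px, vis_pixel, screen_verts, vert_uvs, sample_material):
    depth, instance_id, triangle_id = vis_pixel      # load the visibility buffer
    if instance_id is None:
        return (0.0, 0.0, 0.0)                       # background pixel
    a, b, c = screen_verts                           # the triangle's 3 screen-space positions
    w0, w1, w2 = barycentrics(px, a, b, c)           # derive barycentrics for this pixel
    uv = tuple(w0 * vert_uvs[0][i] + w1 * vert_uvs[1][i] + w2 * vert_uvs[2][i]
               for i in range(2))                    # interpolate the vertex attributes
    return sample_material(uv)                       # evaluate the material at that UV

# Usage with made-up data: one triangle covering the pixel at (2, 1).
color = shade_pixel((2.0, 1.0), (0.5, 0, 0),
                    screen_verts=[(0, 0), (8, 0), (0, 8)],
                    vert_uvs=[(0, 0), (1, 0), (0, 1)],
                    sample_material=lambda uv: (uv[0], uv[1], 0.0))
print(color)
```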


And a couple of images with data:
3ffb6HO.jpg

Jarmwwo.png
 
Last edited:

sinnergy

Member
Of course it's closer to 25 million; that's about the render budget of polygons you can show in a frame on today's hardware at 30 frames(?), like I said months ago.

But it's still pretty cool. This is what I could draw in a project offline on a workstation in 2012, but at 20 frames :messenger_tears_of_joy:, so look how far we have come in real-time.
 
Last edited:

Corndog

Banned
[onesvenus' Nanite deep-dive summary, quoted in full above]
Hope this is the end of the billions-of-triangles FUD.
 

Lethal01

Member
Briank Karis presented at Siggraph yesterday with the title "A Deep Dive into Nanite Virtualized Geometry"
You can find the slides here

I've done a quick summary here. It's a long long but really interesting talk. They combine a lot of existing techniques to get something that has never been done before.

The dream: Using film quality source art
  • Voxels and implicit surfaces have a lot of potential advantages and are the most discussed direction to solve the problem
    • Data size problem: A 2M poly mesh gets converted to 13M sparse SDF voxels and it looks blobby
    • UV seams problems
    • Features vanishing when they are thinner than a voxel
    • Might be possible but need many years of research
  • Subdivision surfaces
    • Great for up close but still connects artist authoring choices with rendering cost
  • Displacement maps
    • Could capture displacement like normal maps now and the low poly could be even lower
    • There are geometries which can't be displaced (i.e. a sphere into a torus)
    • Great for up close but not good enough for general purpose simplification
  • Point based rendering
    • Either massive amounts of overdraw or hole filling required (how do we know if a hole should be there or it's a hole that should be filled?)

Nanite
  • Triangles are the core of Nanite
  • GPU driven pipeline
  • Triangle cluster culling:Group 128 triangles into clusters and build a bounding box for each cluster. Cull cluster based on their bounds.
    • No need to draw more triangles than pixels. Draw the same number of clusters every frame regardless of how many objects or how dense they are
    • Cost of rendering the geometry should scale with screen resolution, not screen complexity.
    • LOD using a hierarchy of clusters where parents are simplified versions of their children
    • At run time find a cut of the tree that matches the desired cost. Different parts of the same mesh can be at different LODs based on what's needed. A parent will be drawn instead of its children if you can't tell the difference (in pixels) from a given POV
    • No need to have all the clusters in memory but need to have all the tree structure in memory. Requests data based on demand -> If need children and not in RAM, request them from disk. If they are in RAM but haven't been drawn in a while, evict them.
    • LOD cracks when boundaries not match -> Group clusters that need to have the same LOD during build
    • Only change LODs that have less than 1 pixel of error -> TAA smoothes out the difference making the LOD pop-in imperceptible
    • Meshes with less than 128 triangles (i.e. a wall section of a building seen from very far away) can't be culled because it could mean entire buildings vanishing
      • Use static imposters -> Can produce pop when many instances of the same mesh are next to one another (like repeating wall sections)
  • Occlusion culling:Both against the frustum as well as occlusion cull done against a hierarchical z-buffer (HZB) -> Calculate a screen rect from the bounds, find the lowest mip where the rect is 4x4 pixels and test if it's ocluded.
    • Reproject previous frame's z-buffer into current frame -> Approximate
    • 2 pass occlusion culling:
      • Draw what was visible last frame
      • Build HZB from that
      • Test the HZB to determine what's visible now but wasn't the last frame and draw anything that's new
  • Decouples geometry from materials
    • REYES overshades 4x or more
    • Deferred materials via visibility buffer
      • Write the smallest amount of geometry data to the screen in form of depth, instance id and triangle id
      • Per-pixel material shader:
        • Loads the visibility buffer
        • Loads the triangle data
        • Transforms the 3vertex positions to the screen
        • Derive the barycentric coordinates for this pixel
        • Load and interpolate the vertex attributes
      • Lots of cache hits and no overdraw
    • Can draw all opaque geometry with a single draw call
    • CPU cost is independent from number of objects in the scene or in view
    • Triangles are only rasterized once
  • Pixel scale detail
    • Want zero perceptual loss of detail
    • Software rasterization 3x faster than hardware one on average compared to their fastest primitive shader implementation
      • Hw rasterizers are optimized for triangles that cover many pixels
      • Hw can be built to do the same thing they do but it's questionable if that's the best use of transistors vs giving more CU
      • Need to z-buffer in software vs using ROPs and depth-test hardware
    • Use the HW rasterizer for triangles more than 32 pixels long
    • Decide on SW of HW rasterizer per cluster based on which will be faster
  • Shadows
    • Can't ray trace them because there are more shadow rays than primary rays since there are more than 1 light per pixel on average
    • 16K virtual shadow maps -> Mip mapped to 1 texel per pixel
  • Streaming
    • Stream entire groups of clusters to guarantee no cracks in the geometry
    • Fixed size memory pages with groups minimizing the number of pages that are likely to be needed at runtime
    • The root page of a cluster group (i.e. the most simplified LOD) is always in memory to always have something to render
    • During the cluster hierarchy traversal determine if child clusters would have been rendered if they were resident.
    • An object can request data from all the levels it needs for target quality

There's a lot more information on how to group clusters, how the geometry is simplified, and how the materials are culled in the presentation.

Some things worth noting:
- The streaming section, where they discuss how they don't stream individual polygons, and the compression section, where they say the on-disk format doesn't support random access into the Nanite geometry. I hope the discussion about reading single polygons from disk on demand ends here :messenger_winking_tongue:
- Brian explicitly says they are not rendering 20 billion triangles but around 25M. The 20 billion figure that gets thrown around here multiple times is the source geometry before it is imported into Nanite. Across the demo that ~25M triangles per frame stays fairly constant.
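And since the streaming rules are the part that gets argued about the most, here's a last toy sketch of the page-residency idea: fixed-size pages, the root page pinned in memory, and least-recently-used pages evicted. The LRU policy and all the names are my own illustration, not Epic's implementation.

Code:
#include <cstddef>
#include <cstdio>
#include <iterator>
#include <list>
#include <unordered_map>

// Toy residency manager for fixed-size geometry pages. Page 0 stands in for a mesh's
// root page (the most simplified LOD) and is never evicted, so there's always something to render.
class PagePool {
public:
    explicit PagePool(std::size_t capacity) : capacity_(capacity) {}

    // Called during cluster-hierarchy traversal when a page's clusters would be rendered
    // this frame. Returns true if the data is already resident.
    bool touch(int page_id) {
        auto it = resident_.find(page_id);
        if (it != resident_.end()) {
            lru_.erase(it->second);              // move to most-recently-used
            lru_.push_front(page_id);
            it->second = lru_.begin();
            return true;
        }
        request(page_id);                        // miss -> ask the streamer for it
        return false;
    }

private:
    void request(int page_id) {
        std::printf("requesting page %d from disk\n", page_id);
        if (resident_.size() >= capacity_) {
            // Evict the least recently used page, but never the root page (0).
            for (auto it = std::prev(lru_.end()); ; --it) {
                if (*it != 0) {
                    std::printf("evicting page %d\n", *it);
                    resident_.erase(*it);
                    lru_.erase(it);
                    break;
                }
                if (it == lru_.begin()) break;
            }
        }
        lru_.push_front(page_id);
        resident_[page_id] = lru_.begin();
    }

    std::size_t capacity_;
    std::list<int> lru_;                                        // front = most recently used
    std::unordered_map<int, std::list<int>::iterator> resident_;
};

int main() {
    PagePool pool(2);
    pool.touch(0);   // root page: streamed in once, then always kept
    pool.touch(5);   // a detailed cluster group comes into view
    pool.touch(9);   // another group -> pool is full, page 5 gets evicted
    pool.touch(5);   // camera moves back -> page 5 has to be requested again
}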


And a couple of images with data.

This is okay but a bit too short for my tastes :messenger_sunglasses:
 

PaintTinJr

Member
That's a really great post you've done there, onesvenus - rounding up Brian's SIGGRAPH Nanite talk, which I'll have to read in full later - to go with the video Lethal01 recently uploaded with more detail on Lumen.
Lots of missing pieces that have been argued about are finally filled in, like the fact that the HZB doesn't use a hw z-buffer, and the comment regarding heavy cache hits in the GPU - meaning the cache scrubbers and Infinity Cache on the AMD hardware seem very well aligned to Nanite's needs.

As for the comment regarding billions of triangles, that's surely a moot point, no?

GPUs typically specify polygons per second - not per frame, because frame-rate is application specific - and a constant 25M polygons per frame is around the billion-per-second mark at ~40fps (25M x 40 = 1B). And by Brian's numbers that is outperforming hw rasterization by 3x on this kind of geometry, even though the GPUs' technical specs quote billions per second.

But the main reason I would consider it a moot point is that the illusion of billions of polygons on screen at once was convincing IMHO because of how Nanite works - when the camera moves, the new detail in the next frame moves accordingly, like a REYES system - so locking in on the 25M polys/frame number that's been discussed for months feels like underselling Nanite, say in comparison to any modern GPU just doing 25M polys/frame (from 75M polys) panning a character model in isolation.
 

onesvenus

Member
the comment regarding heavy cache hits in the GPU - meaning the cache scrubbers and Infinity Cache on the AMD hardware seem very well aligned to Nanite's needs
Yup, looks like it. I suppose Nvidia cards will brute force it, but I hope one day we can see a fair comparison between UE5 running on PS5 and XSX to see what those cache scrubbers are used for.
But the main reason I would consider it a moot point is that the illusion of billions of polygons on screen at once was convincing IMHO because of how Nanite works - when the camera moves, the new detail in the next frame moves accordingly, like a REYES system - so locking in on the 25M polys/frame number that's been discussed for months feels like underselling Nanite, say in comparison to any modern GPU just doing 25M polys/frame (from 75M polys) panning a character model in isolation.
I didn't want to undersell Nanite at all, sorry if it seems like that. My comment about the billions was in reference to some Playstation fans claiming that the Lumen in the land of Nanite video was rendering billions while The Coalition UE5 demo was only rendering hundreds of millions. Neither is true, those numbers correspond to the source material and nothing else.
 

PaintTinJr

Member
Yup, looks like it. I suppose Nvidia cards will brute force it, but I hope one day we can see a fair comparison between UE5 running on PS5 and XSX to see what those cache scrubbers are used for.
Yeah, the AMD approach compared to the Nvidia approach to handling Nanite in 2ms - following the extra Nanite details from your post - certainly gets more interesting IMO. I suspect AMD has the Nanite advantage, and Nvidia recovers it with Lumen's foreground HW RT render path.

Going by the info in The Coalition UE5 video I don't think UE5 is optimized enough for XSX yet to allow a fair comparison, but the cache hits might very well mean that Geordiemp's shader array assessment for PS5 vs XSX will be proved correct, even after optimization on XSX.
I didn't want to undersell Nanite at all, sorry if it seems like that. My comment about the billions was in reference to some Playstation fans claiming that the Lumen in the land of Nanite video was rendering billions while The Coalition UE5 demo was only rendering hundreds of millions. Neither is true, those numbers correspond to the source material and nothing else.
Yeah, those Twitter posts etc. weren't helpful at all, and out of order; especially when The Coalition stuff was just to give developers and Epic insight into the state of UE5 on XSX in early access, and "current" best practices for workflows - and the results of those workflows.
 

ZywyPL

Banned
No need to draw more triangles than pixels. Draw the same number of clusters every frame regardless of how many objects or how dense they are

I really hope they'll sort out the algorithm to actually achieve this, because ideally you really only want/need to draw ~8.3M polygons at native 4K, whereas as of now the engine pushes something like 3x as many polygons, if not more, and at 1440p at best, so it's wasting a huge amount of resources - especially on Lumen, which is already heavy to compute and on top of that has to do 3x more work than necessary.


Hope this is the end of the billions of triangles fud.

Nah, it was already demonstrated some time ago by Epic themselves that the engine always draws more or less the same number of clusters, to keep the polygon count on screen roughly constant. But some people simply don't want to see/hear anything beyond the SSD narrative, the no-LODs and trillions-of-triangles stuff, etc. Even once actual games hit the market you'll see excuses like "lazy/incompetent devs", "downgrade for the sake of parity" etc., and 10-20 years from now you'll still be reading how the initial UE5 reveal was the biggest graphical jump in video game history but was wasted because other, slower devices held it back. While everyone else has known forever that SSDs don't render the graphics.
 

PaintTinJr

Member
I really hope they'll sort out the algorithm to actually achieve this, because ideally you really only want/need to draw ~8.3M polygons at native 4K, whereas as of now the engine pushes something like 3x as many polygons, if not more, and at 1440p at best, so it's wasting a huge amount of resources - especially on Lumen, which is already heavy to compute and on top of that has to do 3x more work than necessary.
Actually, I would expect it to go the other way and increase to 4x 8.3M, because you need to sample at twice the bandwidth of the signal - Nyquist theory IIRC - in both the x and y directions, and maybe even z when things are in motion, so you would maybe need either ~34M polys/frame or ~51M polys/frame to ensure you aren't undersampling in any way for perfectly tessellated geometry at 4K.
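Spelling that arithmetic out (just the 2x-per-axis part - the extra motion/z factor I mentioned is what would push it toward the ~51M figure):

Code:
#include <cstdio>

int main() {
    // Native 4K pixel count, and the budget if you sample at twice the signal in x and y.
    const long long pixels_4k  = 3840LL * 2160LL;   // 8,294,400 -> the ~8.3M figure above
    const long long nyquist_xy = pixels_4k * 2 * 2; // 2x in x times 2x in y
    std::printf("4K pixels:          %lld (~%.1fM)\n", pixels_4k, pixels_4k / 1e6);
    std::printf("2x-per-axis budget: %lld (~%.1fM polys/frame)\n", nyquist_xy, nyquist_xy / 1e6);
}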
 

PaintTinJr

Member
we can put some misconceptions to bed like Lumen doesn't use RT hardware.
No one said that IIRC, but it doesn't do the bulk of the lifting at present, because AFAIK it doesn't work with Nanite geometry using SDFs, but has to use the proxy mesh geometry using traditional primitives. So the choice is traditional rendering with HW RT, or Nanite geometry with SW RT from Lumen, hence why the first ~10 metres is typically HW RT and the remainder of the scene is Nanite with Lumen's SW RT.
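If my reading of that split is roughly right - and this is just how I understand it, not confirmed Lumen behaviour - the idea in spirit is something like the sketch below, where every type and function name is made up for illustration.

Code:
#include <cstdio>
#include <functional>

// Minimal stand-ins; nothing here is Lumen's actual API.
struct Ray { float ox, oy, oz, dx, dy, dz; };
struct Hit { bool found; float distance; };

// Hypothetical hybrid trace matching the split described above: spend HW RT (against
// proxy meshes) on the first few metres, then continue the same ray with a software
// SDF march for the rest of the scene.
Hit hybrid_trace(const Ray& ray,
                 const std::function<Hit(const Ray&, float)>& trace_hw_proxies, // ray, max distance
                 const std::function<Hit(const Ray&, float)>& march_sdf,        // ray, start distance
                 float near_field_metres = 10.0f) {
    Hit near_hit = trace_hw_proxies(ray, near_field_metres);
    if (near_hit.found) return near_hit;
    return march_sdf(ray, near_field_metres);   // pick up where the near field ended
}

int main() {
    // Fake tracers: nothing within 10 m, an SDF hit further out.
    auto hw  = [](const Ray&, float) { return Hit{false, 0.0f}; };
    auto sdf = [](const Ray&, float start) { return Hit{true, start + 32.0f}; };
    Hit h = hybrid_trace(Ray{0, 0, 0, 0, 0, 1}, hw, sdf);
    std::printf("hit: %d at %.1f m\n", h.found, h.distance);
}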
 

Darius87

Member
No one said that IIRC, but it doesn't do the bulk of the lifting at present, because AFAIK it doesn't work with Nanite geometry using SDFs, but has to use the proxy mesh geometry using traditional primitives. So the choice is traditional rendering with HW RT, or Nanite geometry with SW RT from Lumen, hence why the first ~10 metres is typically HW RT and the remainder of the scene is Nanite with Lumen's SW RT.
Many people believed Lumen wasn't using RT HW. Also, Lumen uses SW RT because older GPUs don't support HW RT.
 

PaintTinJr

Member
Many people believed Lumen wasn't using RT HW. Also, Lumen uses SW RT because older GPUs don't support HW RT.
That is what they said, but they've also said that Lumen needs a GTX 1080 or RX 5700, so unless Lumen is going to be used on more than just those - say the X1X, PS4 Pro, GTX 1050 Ti, RX 570 - then that justification sounds wrong, as most of the old HW without BVH/tensor cores already didn't make the cut.

Hardware RT implies traditional primitives, but I suspect Epic now means using the BVH/tensor cores with Nanite going forward.
 