
Xbox Velocity Architecture - 100 GB is instantly accessible by the developer through a custom hardware decompression block

Ascend

Member
Just wanted to add... Even though I did not make the statement that with the SSD you can load data mid-frame from the GPU... There's this:

There are loads of things for us to still explore in the new hardware but I’m most intrigued to see what we can do with the new SSD drive and the hardware decompression capabilities in the Xbox Velocity Architecture.
The drive is so fast that I can load data mid-frame, use it, consume it, unload and replace it with something else in the middle of a frame, treating GPU memory like a virtual disk. How much texture data can I now load?
I’m looking forward to pushing this as far as I can to see what kind of beautiful experiences we can deliver.


 
Just wanted to add... Even though I did not make the statement that with the SSD you can load data mid-frame from the GPU... There's this:

There are loads of things for us to still explore in the new hardware but I’m most intrigued to see what we can do with the new SSD drive and the hardware decompression capabilities in the Xbox Velocity Architecture.
The drive is so fast that I can load data mid-frame, use it, consume it, unload and replace it with something else in the middle of a frame, treating GPU memory like a virtual disk. How much texture data can I now load?
I’m looking forward to pushing this as far as I can to see what kind of beautiful experiences we can deliver.



You made the calculation; the amount of data that can be retrieved is too small and probably not worth it. It's good that they experiment to see how far they can go, but it's not worth the trouble for a handful of MB, because you can interfere with streaming, or the streaming may introduce lag (assuming it's super important data), IMO, especially in a racing game where you may be streaming the whole scene as you advance. The same goes for PS5; as impressive as the UE5 demo was, no matter that it can retrieve 2x as much, it's still not worth it to retrieve and use data within a frame.

Just wanted to add... Even though I did not make the statement that with the SSD you can load data mid-frame from the GPU...

😉
 
Last edited:
We really will need to see exactly how it's implemented before we start claiming it's on the same level as Ratchet & Clank.

"no discernible load times or impact to game performance and graphics," could still include the new world taking 10 seconds to fade in. What they have demonstrated so far just isn't as impressive.

OK, but if R&C did it in “2 seconds”, then the XSX would do it in 4 seconds if we take the SSD speeds at face value and don't factor in things we don't fully understand yet, like the Velocity Architecture.

So at worst, 4 seconds compared to 2 seconds. That will be hardly noticeable, AND at higher fidelity on the XSX.
 
You made the calculation; the amount of data that can be retrieved is too small and probably not worth it. It's good that they experiment to see how far they can go, but it's not worth the trouble for a handful of MB, IMO. The same with PS5; no matter that it can retrieve 2x as much, it's still not worth it for mid-frame data.

It is if the system is constrained by small RAM, a classification both PS5 and XSX fall into, since they only have a 2x increase (PS5) or, in XSX's case, only 4 GB more than the XBO X. So it makes sense that both systems brought their SSDs into the equation to see if they could be leveraged for, essentially, a multiplier effect on the RAM by making it more useful. They both have their own solutions in that regard.

With XSX, a dev could stream in 40 MB per frame in a 60 FPS game (double that to 80 MB for a 30 FPS title; halve it to 20 MB for a 120 FPS game) if it's just raw texture data. However, you could double that to 80 MB per frame in a 60 FPS title if the data is compressed. That sounds small in isolation, but it actually isn't; if the old mention of ARM cores in the APU design on that AMD engineer's LinkedIn profile is true, and knowing ARM cores could easily serve as a replacement for the FPGA cores Nvidia uses with their GPUs if MS and AMD wanted to do something similar, then it's not impossible the GPU is reading those MBs of data into its caches, and using data that is expected to stay in whatever cache it's in for a certain amount of time, so that frequent read accesses to replace it with more data from the SSD aren't even necessary.
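To make those budgets concrete, here's a minimal back-of-the-envelope sketch (assuming the publicly quoted 2.4 GB/s raw and 4.8 GB/s compressed throughput figures for the XSX drive; real sustained rates may differ):

```python
# Per-frame streaming budget at a given sustained SSD throughput.
def mb_per_frame(throughput_gb_s: float, fps: int) -> float:
    """MB of data the drive can deliver within a single frame."""
    return throughput_gb_s * 1000 / fps  # 1 GB/s = 1000 MB/s (decimal units)

XSX_RAW = 2.4         # GB/s, raw
XSX_COMPRESSED = 4.8  # GB/s, effective rate after hardware decompression

for fps in (30, 60, 120):
    print(f"{fps:>3} FPS: {mb_per_frame(XSX_RAW, fps):5.1f} MB raw, "
          f"{mb_per_frame(XSX_COMPRESSED, fps):5.1f} MB compressed")
#  30 FPS:  80.0 MB raw, 160.0 MB compressed
#  60 FPS:  40.0 MB raw,  80.0 MB compressed
# 120 FPS:  20.0 MB raw,  40.0 MB compressed
```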

Keep in mind the GPU is streaming in specified textures and/or other assets from the SSD in very specific use-cases, so of course it has its limitations. However, I think there are other reasons MS would choose this approach for their system, but it'd take a while to walk through right now and it's already kinda late, so... :sleep:
 
It is if the system is constrained by small RAM, a classification both PS5 and XSX fall into, since they only have a 2x increase (PS5) or, in XSX's case, only 4 GB more than the XBO X. So it makes sense that both systems brought their SSDs into the equation to see if they could be leveraged for, essentially, a multiplier effect on the RAM by making it more useful. They both have their own solutions in that regard.

With XSX, a dev could stream in 40 MB per frame in a 60 FPS game (double that to 80 MB for a 30 FPS title; halve it to 20 MB for a 120 FPS game) if it's just raw texture data. However, you could double that to 80 MB per frame in a 60 FPS title if the data is compressed. That sounds small in isolation, but it actually isn't;

Yes, you can retrieve 40-80-120 MB (or 80-160-240 in PS5's case) per frame and put it together; it's called streaming :p. I didn't say that what you can retrieve per frame was small; in fact it's a lot of data compared to the current gen. I said it's not worth the trouble for data to be retrieved, used, and discarded mid-frame instead of loaded in advance. Fast SSD drives allow many things precisely because you can load a lot of data very fast.

If you realize you need a texture mid-frame and then go retrieve it and discard it as mentioned, you have a fraction of the frametime to retrieve that data, not the whole 16 ms (at 60 FPS); the GPU is not going to wait for the data for the whole frametime, as there are other parts of the frame to render. If you start streaming at the very beginning of the frame, sure, you can retrieve the 40-80-120 MB, but the whole of that data is only available by the end of the frame; for that you have to know what you need to retrieve the frame before, or at the very beginning, and you will have all of the 40-80-120 MB for the next frame, not the current one.
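Putting rough numbers on that objection (a sketch assuming the 2.4 GB/s raw figure; the access-latency value is a placeholder, not a measured spec):

```python
FRAME_MS = 1000 / 60   # ~16.7 ms frame budget at 60 FPS
MB_PER_MS = 2.4        # 2.4 GB/s expressed as MB per millisecond

def mid_frame_fetch_mb(request_at_ms: float, access_latency_ms: float = 0.1) -> float:
    """MB that can still arrive before the frame ends, for a request issued mid-frame."""
    time_left_ms = FRAME_MS - request_at_ms - access_latency_ms
    return max(0.0, time_left_ms * MB_PER_MS)

print(round(mid_frame_fetch_mb(0.0), 1))   # 39.8 MB if requested at the start of the frame
print(round(mid_frame_fetch_mb(8.0), 1))   # 20.6 MB if requested halfway through
print(round(mid_frame_fetch_mb(15.0), 1))  # 3.8 MB with only ~1.7 ms of frame left
```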

It makes sense to have more space if RAM is insufficient, but it doesn't mean you can use the SSD like RAM without its problems; as fast as the SSD is, it's not the same as RAM, and I think you are underestimating the technical problems. Allowing for the differences, there is a similar example in the past: the GameCube had a pool of slower RAM (I think 16 MB) for sound and I/O that wasn't used directly for textures or geometry like the normal RAM, and even the normal RAM was used to stream textures to the texture cache many times per frame. If textures were to be streamed from the slow RAM, they were supposed to be streamed to a cache in normal RAM in advance to improve texture availability, not directly to the texture cache, as the slow RAM has its own access time and the texture cache can be filled much faster from normal RAM. That was then, but I think it's a good example to give us an idea of the use cases; 16 MB is nothing by current standards, but back then it was a very good amount of space.

About the RAM space: the PS4 Pro and XBX were given more RAM than the base consoles because they target resolutions as high as 4K, and the next consoles improve on that, so it's a bit more than you think. It's clear both systems are pointing to streaming. RAM usage can be improved in many ways; for example, you can stream other kinds of data like sound and keep a very small cache in RAM. There are many ways an SSD improves RAM usage, but let's use more realistic cases instead of burning a forest just to stream a 4K texture directly to the GPU from the SSD mid-frame, just because we didn't want to do it some frames earlier using 100-500 MB of RAM that runs at worst at 336 GB/s, reading instead at 2-5 GB/s from the SSD for data to be used directly mid-frame :pie_eyeroll:

If the old mention of ARM cores in the APU design on that AMD engineer's LinkedIn profile is true, and knowing ARM cores could easily serve as a replacement for the FPGA cores Nvidia uses with their GPUs if MS and AMD wanted to do something similar, then it's not impossible the GPU is reading those MBs of data into its caches, and using data that is expected to stay in whatever cache it's in for a certain amount of time, so that frequent read accesses to replace it with more data from the SSD aren't even necessary.


I don't know what you're trying to say here; you're basically saying that the cache is doing what a cache does. Why would that be impossible?



Keep in mind the GPU is streaming in specified textures and/or other assets from the SSD in very specific use-cases, so of course it has its limitations. However, I think there are other reasons MS would choose this approach for their system, but it'd take a while to walk through right now and it's already kinda late, so... :sleep:

Maybe in specific cases, but those are supposed to be avoided, and will probably result in a stall even on PS5, which has double the speed.
 
Last edited:

Ascend

Member
You made the calculation; the amount of data that can be retrieved is too small and probably not worth it. It's good that they experiment to see how far they can go, but it's not worth the trouble for a handful of MB, because you can interfere with streaming, or the streaming may introduce lag (assuming it's super important data), IMO, especially in a racing game where you may be streaming the whole scene as you advance. The same goes for PS5; as impressive as the UE5 demo was, no matter that it can retrieve 2x as much, it's still not worth it to retrieve and use data within a frame.
😉
I don't know if the amount of data is really too small. How much data actually needs to be transferred to RAM per frame in current games? Taking a figure of 100 MB/s, which is still a bit high, you'd end up with 1.7 MB per frame. Let's say 2 MB. Compared with the current consoles, we're literally more than 20 times faster. Not to mention the wasteful loading of resources that are never used, which is also addressed with the XSX.

From another perspective, what currently takes a whole second (60 frames, or 1000 ms) to load from an HDD would now take only two and a half frames, or 42 ms, if loaded from the SSD. If you leave the lower-quality mip in RAM as you move closer to whichever higher-quality texture requires loading, there is little reason to transfer it to RAM first if the GPU sees the SSD as RAM too. Yes, you worry about pop-in, but that is exactly what the texture filters are there for. I doubt MS would design something redundant.

Additionally, if you need a low-quality mip, you don't need that maximum amount at all. Taking the info we have on Sampler Feedback Streaming, with the smallest tiles likely being 64 KB, you could literally transfer around 35 such tiles per frame even at that ~2 MB figure.
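As a sanity check on the tile math (64 KB is the standard tiled-resource tile size in Direct3D; the budgets reuse the figures above):

```python
TILE_KB = 64  # standard Direct3D tiled-resource tile size

def tiles_per_frame(budget_mb: float) -> int:
    """How many 64 KB tiles fit in a given per-frame transfer budget."""
    return int(budget_mb * 1024 / TILE_KB)

print(tiles_per_frame(2.2))   # 35 tiles, even at the ~2 MB/frame HDD-era budget
print(tiles_per_frame(40.0))  # 640 tiles at the XSX's 40 MB/frame raw budget
```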


And remember that it's not only about the amount of data; it's also about the latency. Reading from the SSD, then writing to RAM, then reading RAM with the GPU will inevitably have more latency than the GPU reading directly from the SSD, even if the amount of data transferred is smaller. There is an optimum somewhere between how much and what to load into RAM versus what to read directly from the SSD, and over time developers will find it.
 

cormack12

Gold Member
Just wanted to add... Even though I did not make the statement that with the SSD you can load data mid-frame from the GPU... There's this:

There are loads of things for us to still explore in the new hardware but I’m most intrigued to see what we can do with the new SSD drive and the hardware decompression capabilities in the Xbox Velocity Architecture.
The drive is so fast that I can load data mid-frame, use it, consume it, unload and replace it with something else in the middle of a frame, treating GPU memory like a virtual disk. How much texture data can I now load?
I’m looking forward to pushing this as far as I can to see what kind of beautiful experiences we can deliver.




I think you're misreading this because of the way it's quoted. I think the point you're meant to take is

The drive is so fast that I can [...] treat GPU memory like a virtual disk.

As can both consoles. It means the I/O is so fast you can page on demand rather than having a block of assigned memory that you pull an asset from (i.e. a static datastore). Also, to touch on earlier general points in the thread: Ratchet and Clank isn't impressive because it loads a level in a couple of seconds; it's impressive because of the reduction in time between those sections. It's the 'negative' time you need to focus on to realise the benefits.

The first rift jump could be done now by caching all of that secondary level, but then it would have to refill its cache again to load the next rift, which would probably take 3+ minutes. The demo showed it was ready to do subsequent rift jumps after 7 seconds each time, at

1:05
1:12
1:19

It's the 7 seconds versus literal minutes that should be the talking point. In those 7 seconds, an entirely new level is loaded, ready to go, while you're still in the old environment. Then the discussion can move on to how beneficial this is in most cases. I can see alternate realities, maybe some fantasy RPGs, taking advantage of this, but over and above the initial load, how many titles are going to be refreshing the entire world that quickly? Sure, you can squeeze higher-quality textures or more unique assets down the I/O, but unique assets have a cost, a high cost, so all but the AAA studios are probably not going to be doing this. Also, it can ruin scene focus/continuity if you kinda throw loads of shit up that looks unique but inconsistently so.
 
Last edited:
And remember that it's not only about the amount of data; it's also about the latency. Reading from the SSD, then writing to RAM, then reading RAM with the GPU will inevitably have more latency than the GPU reading directly from the SSD, even if the amount of data transferred is smaller.

RAM operates in nanoseconds.
SSDs operate in microseconds.


Both latency and transfer speed are important, but this is in the context of the GPU. Let's assume, for argument's sake, that the SSD has lower latency; so what? GPUs have much more tolerance for high latency. For a GPU, what is important is to receive a large amount of data to process and then take more; whether the order to start retrieving data arrives faster is not important, as there is plenty of time while the GPU is busy working.

It's basically the same discussion as after the Xbox One reveal: people tried to portray DDR3 as the better choice, arguing lower latency vs GDDR5. That was a flawed argument when it comes to graphics, and actually the difference wasn't so big for the CPU either, yet people kept repeating it over and over, until the secret chip and other stupid arguments, and then pretended they had said nothing about it :pie_eyeroll:

You just keep repeating yourself over and over; that is not going to make your argument true. But I am not interested in repeating the same quotes you don't read/understand over and over.


good day
 
Last edited:

Jigsaah

Gold Member
99% of these posts are over my head. 55 pages in, is there some form of consensus yet, with all the calculus going on here?

It seems like both consoles use different technical philosophies to reach a similar goal. I suppose it's easier to understand how the PS5's SSD should be faster, but the nuances of the Xbox Velocity Architecture could realistically close the gap presented in the raw numbers.

Am I in the ballpark?
 
Last edited:

Jigsaah

Gold Member
Yes. Pineapple doesn't go on pizza.
 

Ascend

Member
RAM operates in nanoseconds.
SSDs operate in microseconds.
Yeah. I said that a few posts ago. Here is the permalink

both latency and transfer speed are important,
Yes. My previous post implied that...

but this is in the context of the GPU. Let's assume, for argument's sake, that the SSD has lower latency; so what? GPUs have much more tolerance for high latency. For a GPU, what is important is to receive a large amount of data to process and then take more; whether the order to start retrieving data arrives faster is not important, as there is plenty of time while the GPU is busy working.
The GPU might be able to handle high latency relatively better than a CPU, but that does not mean lower latency isn't still better for the GPU than high latency.

It's basically the same discussion as after the Xbox One reveal: people tried to portray DDR3 as the better choice, arguing lower latency vs GDDR5. That was a flawed argument when it comes to graphics, and actually the difference wasn't so big for the CPU either, yet people kept repeating it over and over, until the secret chip and other stupid arguments, and then pretended they had said nothing about it :pie_eyeroll:
Higher bandwidth is better than lower latency for as long as the bandwidth gives benefits. Again, that does not mean that lower latency does not have its own benefits. If you have 1000 GB/s of bandwidth but the latency is 100 ms, it still won't get you anywhere for the GPU, since nothing can be loaded and processed in time. You'd literally be better off with 10 GB/s at 1 ms latency.
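That trade-off is easy to make concrete: the time to deliver one request is roughly the fixed latency plus size divided by bandwidth, so for small per-frame requests the latency term dominates. A minimal sketch using the hypothetical numbers above:

```python
def delivery_ms(size_mb: float, bandwidth_gb_s: float, latency_ms: float) -> float:
    """Approximate delivery time for one request: fixed latency + transfer time."""
    transfer_ms = size_mb / (bandwidth_gb_s * 1000) * 1000  # MB over (MB per ms)
    return latency_ms + transfer_ms

# A 40 MB per-frame request over the two hypothetical links:
print(delivery_ms(40, 1000, 100))  # 100.04 ms: blows through a 16.7 ms frame
print(delivery_ms(40, 10, 1))      # 5.0 ms: comfortably inside the frame
```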

You just keep repeating yourself over and over; that is not going to make your argument true. But I am not interested in repeating the same quotes you don't read/understand over and over.


good day
Maybe that's because you are not really addressing anything I am saying. You completely ignore the fact that a longer pathway from the SSD to the GPU through RAM is inevitably slower than a shorter pathway. I have to repeat myself in different ways because clearly you are missing the whole point of what I am saying in every single post.

You have not said anything I do not already know. It's quite laughable that you claim I don't read or understand your statements. Even more laughable is the accusation that I repeat things, while you've been repeating things I had already said in prior posts. Take a look in the mirror.
But you keep shifting the goalposts, and when I bring things back, you say I'm repeating myself as a way to dismiss it, because it doesn't conform to your beliefs. Everything you've thrown at me I have addressed, some of it with statements by MS, and some of it through simple logical deduction. In many cases, your 'concerns' had already been addressed in a post you are directly quoting. So I don't get how you can claim these absurdities about me not reading or understanding. Go troll somewhere else.
 
Last edited:

oldergamer

Member
Yeah. I said that a few posts ago. Here is the permalink

Yes. My previous post implied that...

The GPU might be able to handle high latency relatively better than a CPU, but that does not mean lower latency isn't still better for the GPU than high latency.


Higher bandwidth is better than lower latency for as long as the bandwidth gives benefits. Again, that does not mean that lower latency does not have its own benefits. If you have 1000 GB/s of bandwidth but the latency is 100 ms, it still won't get you anywhere for the GPU, since nothing can be loaded and processed in time. You'd literally be better off with 10 GB/s at 1 ms latency.


Maybe that's because you are not really addressing anything I am saying. You completely ignore the fact that a longer pathway from the SSD to the GPU through RAM is inevitably slower than a shorter pathway. I have to repeat myself in different ways because clearly you are missing the whole point of what I am saying in every single post.

You have not said anything I do not already know. It's quite laughable that you claim I don't read or understand your statements. Even more laughable is the accusation that I repeat things, while you've been repeating things I had already said in prior posts. Take a look in the mirror.
But you keep shifting the goalposts, and when I bring things back, you say I'm repeating myself as a way to dismiss it, because it doesn't conform to your beliefs. Everything you've thrown at me I have addressed, some of it with statements by MS, and some of it through simple logical deduction. In many cases, your 'concerns' had already been addressed in a post you are directly quoting. So I don't get how you can claim these absurdities about me not reading or understanding. Go troll somewhere else.
He is trolling. Just ignore him, man.
 
Just wanted to add... Even though I did not make the statement that with the SSD you can load data mid-frame from the GPU... There's this:

There are loads of things for us to still explore in the new hardware but I’m most intrigued to see what we can do with the new SSD drive and the hardware decompression capabilities in the Xbox Velocity Architecture.
The drive is so fast that I can load data mid-frame, use it, consume it, unload and replace it with something else in the middle of a frame, treating GPU memory like a virtual disk. How much texture data can I now load?
I’m looking forward to pushing this as far as I can to see what kind of beautiful experiences we can deliver.


Of course it can. But the GPU has to stop working and flush all the caches in order to get notified of the new data, and then restart working on the frame. Doing so will impact performance.
 
Last edited:

Lethal01

Member
OK, but if R&C did it in “2 seconds”, then the XSX would do it in 4 seconds if we take the SSD speeds at face value and don't factor in things we don't fully understand yet, like the Velocity Architecture.

So at worst, 4 seconds compared to 2 seconds. That will be hardly noticeable, AND at higher fidelity on the XSX.

"We really will need to see exactly how it's implemented before we start claiming it's on the same level as ratchet and clank."

My whole point is that it's far too early to claim we know. If the difference is small, great.
We just don't have anything right now to prove the XSX is on the same level as the PS5 when it comes to transfers from storage.
 
D

Deleted member 775630

Unconfirmed Member
"We really will need to see exactly how it's implemented before we start claiming it's on the same level as ratchet and clank."

My whole point is that it's far too early to claim we know. If the difference is small, great.
We just don't have anything right now to prove the XSX is on the same level as the PS5 when it comes to transfers from storage.
That's true, but we also have no proof that the PS5's level is needed for next-gen games. For all we know, it's unnecessarily fast.
 
"We really will need to see exactly how it's implemented before we start claiming it's on the same level as ratchet and clank."

My whole point is that it's far too early to claim we know. If the difference is small, great.
We just don't have anything right now to prove the XSX is on the same level as the PS5 when it comes to transfers from storage.

This is at 120 FPS, not 60, right? So at 8 ms per frame...
 

Ascend

Member
I think you're misreading this because of the way it's quoted. I think the point you're meant to take is

As can both consoles. It means the I/O is so fast you can page on demand rather than having a block of assigned memory that you pull an asset from (i.e. a static datastore).
Nah... I don't think I'm misreading it. You are indeed correct that you can page on demand. He's saying you can do it mid-frame, which means you can do all those operations while the GPU is doing other stuff. It does not say that you can do all those things within the same frame, which I was accused of saying.

Which brings up the question: what does it mean to use GPU memory as a virtual disk? Doesn't this again reinforce the concept that the GPU does not see a difference between the RAM and the SSD?
 

oldergamer

Member
So I came across an interesting theory on what the Velocity Architecture is. I think I like this one...


Latest theory posted by a reddit user:

So MSFT has stated that they will be able to use the SSD as virtual RAM. This has led to a lot of speculation about what exactly this is and what implications it has for performance. I've read several posts and papers related to the topic across several platforms, and I've reached a fairly solid conclusion, IMHO, about what it is, which I'd like to share here. And just as a spoiler: I don't think it is persistent memory.

I think the virtual RAM on the XSX is simply implemented using an HBCC (High Bandwidth Cache Controller). If true, this would be a game-changing feature not present on other next-gen hardware systems.



With this controller, the developer simply has to request data on the SSD without having to worry too much about how it's going to be placed in RAM, so less time is spent on memory management. The HBCC determines what part of the data on the SSD is sent to RAM, at a granularity not possible using traditional methods. It does this by using segmented pages. This means that despite not being byte-addressable, data sent in from the SSD will be broken down into the smallest possible sizes, and will be just the data that is needed by the CPU.

Here is an illustration using HBCC on a PC. (Unlike the next-gen consoles, PCs usually don't have unified memory, and the data needs to be sent to system RAM before being sent to video RAM.)



This image represents data being sent from system RAM to VRAM with and without HBCC. Notice the finer granularity with HBCC: smaller data sizes, in the form of pages, are being sent. This results in better RAM utilization, reduced wastage on unused data, and an effective increase in the addressable space. For example, look at the wastage that existed when sending the red data into VRAM; with paging, you can grab just those pieces that will be used. Look at the dark blue block: we couldn't send any of that data into VRAM, but now we have 4 pages with HBCC on. Same for the dark black block: we couldn't send any of its data into VRAM, but now have 3 pages. So the HBCC converts/treats the VRAM as a "last-level cache"; on the consoles this will be all of the RAM.

This is how it will look on the XSX:



So this effectively converts the game install on the SSD into virtual RAM. Due to the low latency and near-instant access time of the SSD, any page of the game install can be made available in RAM almost instantly, ready for the CPU/GPU to access. And this is where the 100 GB of virtual RAM comes from. I'll just say that the HBCC is rated to handle up to 512 TB of data, so it's surprising that MSFT has stated only up to 100 GB, and not the whole game install if it goes beyond this figure.
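For what it's worth, the granularity argument in that theory is simple to illustrate: whole-resource residency pays for every byte of an asset even if a frame samples only a sliver of it, while page-granular residency pays only for the touched pages. A toy sketch (the 4 KB page size is an assumption; the real HBCC page sizes and policies aren't public):

```python
PAGE_KB = 4  # assumed page size; the real HBCC granularity isn't public

def resident_kb_whole(asset_kb: int, touched_kb: int) -> int:
    """Whole-resource residency: the full asset occupies RAM regardless of use."""
    return asset_kb

def resident_kb_paged(asset_kb: int, touched_kb: int) -> int:
    """Page-granular residency: only the touched pages occupy RAM."""
    pages = -(-touched_kb // PAGE_KB)  # ceiling division
    return pages * PAGE_KB

# An 8 MB 4K texture of which a frame actually samples ~1.5 MB:
print(resident_kb_whole(8192, 1536))  # 8192 KB resident
print(resident_kb_paged(8192, 1536))  # 1536 KB resident, >5x less RAM for this asset
```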
 
Last edited:

Ascend

Member
So I came across an interesting theory on what the Velocity Architecture is. I think I like this one...


Latest theory posted by a reddit user:

So MSFT has stated that they will be able to use the SSD as virtual RAM. This has led to a lot of speculation about what exactly this is and what implications it has for performance. I've read several posts and papers related to the topic across several platforms, and I've reached a fairly solid conclusion, IMHO, about what it is, which I'd like to share here. And just as a spoiler: I don't think it is persistent memory.

I think the virtual RAM on the XSX is simply implemented using an HBCC (High Bandwidth Cache Controller). If true, this would be a game-changing feature not present on other next-gen hardware systems.



With this controller, the developer simply has to request data on the SSD without having to worry too much about how it's going to be placed in RAM, so less time is spent on memory management. The HBCC determines what part of the data on the SSD is sent to RAM, at a granularity not possible using traditional methods. It does this by using segmented pages. This means that despite not being byte-addressable, data sent in from the SSD will be broken down into the smallest possible sizes, and will be just the data that is needed by the CPU.

Here is an illustration using HBCC on a PC. (Unlike the next-gen consoles, PCs usually don't have unified memory, and the data needs to be sent to system RAM before being sent to video RAM.)



This image represents data being sent from system RAM to VRAM with and without HBCC. Notice the finer granularity with HBCC: smaller data sizes, in the form of pages, are being sent. This results in better RAM utilization, reduced wastage on unused data, and an effective increase in the addressable space. For example, look at the wastage that existed when sending the red data into VRAM; with paging, you can grab just those pieces that will be used. Look at the dark blue block: we couldn't send any of that data into VRAM, but now we have 4 pages with HBCC on. Same for the dark black block: we couldn't send any of its data into VRAM, but now have 3 pages. So the HBCC converts/treats the VRAM as a "last-level cache"; on the consoles this will be all of the RAM.

This is how it will look on the XSX:



So this effectively converts the game install on the SSD into virtual RAM. Due to the low latency and near-instant access time of the SSD, any page of the game install can be made available in RAM almost instantly, ready for the CPU/GPU to access. And this is where the 100 GB of virtual RAM comes from. I'll just say that the HBCC is rated to handle up to 512 TB of data, so it's surprising that MSFT has stated only up to 100 GB, and not the whole game install if it goes beyond this figure.
That is actually interesting, and really quite plausible... Listen to what Raja says here... Timestamped... Does what he says at around 1:42 sound like what MS is saying...? ;)




Edit:
Yeah... I'm becoming convinced that this is it... Listen to this... Timestamped again...

 
Last edited:

MCplayer

Member
Guys... forget HBCC; this is sounding as ridiculous as the "second secret GPU" from the beginning of this current gen.

Microsoft would have already pointed it out if that were the case.
 

cormack12

Gold Member
Nah... I don't think I'm misreading it. You are indeed correct that you can page on demand. He's saying you can do it mid-frame, which means you can do all those operations while the GPU is doing other stuff. It does not say that you can do all those things within the same frame, which I was accused of saying.

Which brings up the question: what does it mean to use GPU memory as a virtual disk? Doesn't this again reinforce the concept that the GPU does not see a difference between the RAM and the SSD?

I'm not sure of the history; I'm just speaking to the effect of still needing to get from the SSD into memory, rather than using the SSD as the memory.

So this effectively converts the game install on the SSD into virtual RAM. Due to the low latency and near-instant access time of the SSD, any page of the game install can be made available in RAM almost instantly, ready for the CPU/GPU to access.

This is how both next-gen machines are going to work, hence the streaming debate. The technical approaches may be different, but the aim of streaming data from the SSD into proper memory for on-demand or just-in-time use is what's going to change gaming as a whole, and why we have the discussion about cross-gen and PC mechanical disks being the lowest target component. All the custom terminology is largely irrelevant except for 'features/labelling'.

The unknowns for us appear to be performance versus efficiency; essentially, the bandwidth available versus making the payload smaller. Sony have a lot of bandwidth: if they need that 5 GB of data on screen, they can feed all of it down at one latency. Microsoft have less bandwidth but expect higher efficiencies; what they can send down their bandwidth might be more relevant or more intelligently packaged data. This is all conjecture and theory, by the way. As I understand it, some of the data sent through the 5 GB might carry overhead (in terms of data size), whereas MS are aiming to remove that overhead. It's quite an interesting battle, to be honest. In all likelihood it probably won't matter; look how good Uncharted and Gears looked this gen.
 

oldergamer

Member
I'm not sure of the history; I'm just speaking to the effect of still needing to get from the SSD into memory, rather than using the SSD as the memory.
This is how both next-gen machines are going to work, hence the streaming debate. The technical approaches may be different, but the aim of streaming data from the SSD into proper memory for on-demand or just-in-time use is what's going to change gaming as a whole, and why we have the discussion about cross-gen and PC mechanical disks being the lowest target component. All the custom terminology is largely irrelevant except for 'features/labelling'.

I wouldn't go out on a limb and say that this is how both consoles are going to work. They are approaching similar things quite differently: Sony with the more brute-force speed approach (a super fast SSD and decompression), and MS with the more efficient approach (getting the most out of the hardware they use).

Also, AMD just stated: "New gaming experiences with seamless content paging from the SSD to the GPU based on the revolutionary Xbox Velocity Architecture".
 

Bernkastel

Ask me about my fanboy energy!
Use of Machine Learning in texture compression
Patents US20200105030A1 and US20190304138A1 describe the use of machine learning in texture compression or upscaling, and in reducing the search space for real-time texture compression.
Video games are experiencing problems with textures taking up too much storage. Having a relatively large storage footprint affects the speed with which games can load textures. The block compression used by games at runtime to save memory, bandwidth, and cache pressure has a fixed compression ratio. Other schemes offer far better compression ratios but are not in a format directly usable by the GPU. One method is to use a machine learning model to convert graphics-hardware-incompatible compressed textures (e.g., machine-learning image compression, JPEG compression, wavelet compression, etc.) into hardware-compatible compressed textures usable by the GPU at runtime of the application. Another method relates to a computer-readable medium storing instructions executable by a computing device, causing the device to access, at runtime of an application, compressed textures in a format incompatible with the GPU and to convert them into hardware-compatible compressed textures at runtime. This will help in reducing input/output bandwidth and the actual size of game data.
Gwertzman: You were talking about machine learning and content generation. I think that’s going to be interesting. One of the studios inside Microsoft has been experimenting with using ML models for asset generation. It’s working scarily well. To the point where we’re looking at shipping really low-res textures and having ML models uprez the textures in real time. You can’t tell the difference between the hand-authored high-res texture and the machine-scaled-up low-res texture, to the point that you may as well ship the low-res texture and let the machine do it.
Journalist: Can you do that on the hardware without install time?
Gwertzman:
Not even install time. Run time.
Journalist: To clarify, you’re talking about real time, moving around the 3D space, level of detail style?
Gwertzman:
Like literally not having to ship massive 2K by 2K textures. You can ship tiny textures.
Journalist: Are you saying they’re generated on the fly as you move around the scene, or they’re generated ahead of time?
Gwertzman:
The textures are being uprezzed in real time.
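In spirit, the pipeline Gwertzman describes would look something like the sketch below. Everything here is hypothetical (there is no public API for this, and nearest-neighbour upsampling merely stands in for a trained super-resolution model); it only illustrates the "ship low-res, uprez at runtime" idea:

```python
import numpy as np

def load_low_res_texture(path: str) -> np.ndarray:
    """Stand-in for loading a small shipped texture, e.g. 512x512 RGB."""
    return np.zeros((512, 512, 3), dtype=np.uint8)

def ml_uprez(texture: np.ndarray, scale: int = 4) -> np.ndarray:
    """Hypothetical ML super-resolution step; a real implementation would run
    a trained model here. Nearest-neighbour upsampling stands in for it."""
    return texture.repeat(scale, axis=0).repeat(scale, axis=1)

low = load_low_res_texture("rock_albedo_512.tex")  # hypothetical asset name
high = ml_uprez(low)                               # 2048x2048 produced at runtime
print(f"{low.nbytes / 1e6:.2f} MB shipped -> {high.nbytes / 1e6:.2f} MB in memory")
# 0.79 MB shipped -> 12.58 MB in memory: a 16x saving on disk per texture
```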
 

MCplayer

Member
That's quite a weak argument.
I literally don't know what to say to that... but if you prefer to believe in something that wasn't officially revealed (if it had been, MS would have already milked it, since it's revolutionary for consoles and even PC), that's your problem. And not even AMD has said anything about it.
Anything.
 
I literally don't know what to say to that... but if you prefer to believe in something that wasn't officially revealed (if it had been, MS would have already milked it, since it's revolutionary for consoles and even PC), that's your problem. And not even AMD has said anything about it.
Anything.

.........

  • New gaming experiences with seamless content paging from the SSD to the GPU based on the revolutionary Xbox Velocity Architecture
From AMD https://community.amd.com/community...rcharge-console-gaming-with-the-xbox-series-x

I mean, that's more or less a way to speak about it without MS outright confirming it from their own mouths. AMD have basically done the same with some PS5 features, too.

Wondering if the HBCC involves the ARM processors that one AMD engineer's LinkedIn profile explicitly mentioned in their dev experience on the XSX?

I'm not sure of the history; I'm just speaking to the effect of still needing to get from the SSD into memory, rather than using the SSD as the memory.



This is how both next-gen machines are going to work, hence the streaming debate. The technical approaches may be different, but the aim of streaming data from the SSD into proper memory for on-demand or just-in-time use is what's going to change gaming as a whole, and why we have the discussion about cross-gen and PC mechanical disks being the lowest target component. All the custom terminology is largely irrelevant except for 'features/labelling'.

The unknowns for us appear to be performance versus efficiency; essentially, the bandwidth available versus making the payload smaller. Sony have a lot of bandwidth: if they need that 5 GB of data on screen, they can feed all of it down at one latency. Microsoft have less bandwidth but expect higher efficiencies; what they can send down their bandwidth might be more relevant or more intelligently packaged data. This is all conjecture and theory, by the way. As I understand it, some of the data sent through the 5 GB might carry overhead (in terms of data size), whereas MS are aiming to remove that overhead. It's quite an interesting battle, to be honest. In all likelihood it probably won't matter; look how good Uncharted and Gears looked this gen.

It's more than just terminology; in actual practice there's enough evidence to indicate they are taking very different approaches to resolving common I/O problems, as oldergamer suggested. PS5's I/O block, if you think about it, is like a pseudo-DPU (Data Processing Unit); I think that is where Sony got the inspiration when designing it. Per Wikipedia:

The data is transmitted to and from the component as multiplexed packets of information. DPUs have the generality and the programmability of central processing units but are specialized[2] to operate efficiently on networking packets, storage requests or analytics requests.[3][4]

A DPU differentiates itself from a CPU by a larger degree of parallelism (required to process lots of requests) and from a GPU by a MIMD architecture rather than a SIMD architecture (required as each request needs to make different decisions and follow a different path through the chip).

PS5's approach is conceptually similar, just smaller in scale, applied to the data in the local hardware itself, and targeted at streamlining the path between storage and RAM. It's trying to make sure old data in RAM can be replaced with new data from the SSD as quickly as possible, and therefore to maximize use of RAM as a framebuffer. But that's just it: it's focused on boosting speed and throughput as much as possible, so it's relying on beefier hardware to do it. Another reason they need that speed is that when the dedicated processor in the I/O block is addressing RAM, other processor components have to wait their turn; it's still a hUMA architecture, after all.

MS's approach is a bit different; there's too much pointing now towards some type of asset streaming through the GPU directly from the 100 GB NAND partition on the storage: the mention of ARM cores in the APU by the AMD engineer on their LinkedIn profile, the Dirt 5 dev interview comments, pre-existing implementations of similar things like GPUDirect Storage on Nvidia cards, HBCC and AMD's SSG card line, and quotes from certain Xbox engineers and Phil Spencer himself as Major_Key linked here in the thread, etc.

I'm not saying any of these things irrefutably confirms this approach, but together they point very strongly towards it. There are also other aspects of MS's approach that differ from Sony's; for example, since they still have a fraction of the CPU handling some of the transfer between storage and RAM (rather than offloading that 100% to a dedicated processor as on PS5), it technically means any CPU-bound game logic can still access RAM in the 6 GB pool while the I/O is doing storage/RAM transfer operations. That means CPU-bound game logic doesn't have to "wait its turn" to get back access to the bus during these instances, unlike on PS5.

So yeah, they are both effectively going after solving the same problems, but have some big differences in how they approach them.

"We really will need to see exactly how it's implemented before we start claiming it's on the same level as ratchet and clank."

My whole point is that it's far too early to claim we know. If the difference is small, great.
We just don't have anything right now to prove the XSX is on the same level as the PS5 when it comes to transfers from storage.

The truth is it really might just come down to too many outside factors, like specific use-cases, engines, programming, etc. I think trying to compare the systems' I/O solutions directly against each other is fruitless, because it's becoming apparent that they're taking a somewhat apples-to-oranges approach to addressing similar problems.

Actually, the only thing some of us are looking to do is indicate that XvA will very likely punch above what the paper specs we know so far suggest; seeing some of its parts described by engineers on the team, seeing good analysis of it from people who understand the tech and how it may or may not be applied, exploring use-cases and limitations, previous precedents of similar implementations in other products on the market, interestingly worded statements like the Spencer quote linked on this page of the thread, etc., it's fair to say that will be the case.

It doesn't mean XvA will close the gap with PS5's I/O implementation in obvious areas like raw bandwidth or peak compressed transfer rates, but it's also important to keep in mind that MS aren't trying to go about certain things in the I/O stack the same way Sony are, and vice-versa. So you can't even directly compare everything between the two, as they aren't in the same ballpark WRT how they want to tackle certain I/O bottlenecks.
 
Last edited:

oldergamer

Member
I literally don't know what to say to that... but if you prefer to believe in something that wasn't officially revealed (if it had been, MS would have already milked it, since it's revolutionary for consoles and even PC), that's your problem. And not even AMD has said anything about it.
Anything.
That's not a good argument, because this entire time MS has said they were not going into detail on certain features until a later date. The Hot Chips conference is where they are going to talk about the Velocity Architecture, so saying they would have told us already is a pretty weak argument when you contrast it with what MS has already stated.
 

THE:MILKMAN

Member
That's not a good argument, because this entire time MS has said they were not going into detail on certain features until a later date. The Hot Chips conference is where they are going to talk about the Velocity Architecture, so saying they would have told us already is a pretty weak argument when you contrast it with what MS has already stated.

Have Microsoft confirmed they'll be giving more detailed info about XVA at Hot Chips? I watched the One X Hot Chips talk recently and have to say it only had a fairly basic hardware overview of the SoC and a slightly more in-depth, but short, Q&A.
 

oldergamer

Member
Have Microsoft confirmed they'll be giving more detailed info about XVA at Hot Chips? I watched the One X Hot Chips talk recently and have to say it only had a fairly basic hardware overview of the SoC and a slightly more in-depth, but short, Q&A.
I think it was listed in the agenda for that meeting.
 

Ascend

Member
I literally don't know what to say to that... but if you prefer to believe in something that wasn't officially revealed (if it had been, MS would have already milked it, since it's revolutionary for consoles and even PC), that's your problem. And not even AMD has said anything about it.
Anything.
Why though? Because what AMD has been saying about HBCC falls directly in line with what MS is saying about the XSX. So really, it's basically a matter of putting the puzzle pieces together. The HBCC is likely the controller that is going to take charge of feeding the GPU, and it would do so directly from the SSD and RAM. Combine it with SFS, and I'm starting to see why it can be a more efficient solution.

I mean, look at this:
Raja Koduri, Chief Architect, Radeon Technologies Group, AMD, with regards to the High Bandwidth Cache from a gaming perspective:
We looked at all the modern games, the big games that push memory hard, and one of the things we noticed is the VRAM (graphics memory) utilization. We looked at how much of the VRAM the game allocates. So if the game, say, needs 4GB of memory, when we looked at how much of that memory is actually used to render pixels, we found that many games, actually most games, don't use more than 50% of what they allocate. That's because the current/old GPU architecture doesn't give you the flexibility to move memory at fine granularity. So with Vega, with the High Bandwidth Cache and the HBC controller, games will utilize the amount of frame-buffer you have much more efficiently. So effectively you can think of it as Vega doubling your memory capacity for games.




And compare it to these statements about the XSX:

As textures have ballooned in size to match 4K displays, efficiency in memory utilisation has got progressively worse - something Microsoft was able to confirm by building in special monitoring hardware into Xbox One X's Scorpio Engine SoC. "From this, we found a game typically accessed at best only one-half to one-third of their allocated pages over long windows of time," says Goossen. "So if a game never had to load pages that are ultimately never actually used, that means a 2-3x multiplier on the effective amount of physical memory, and a 2-3x multiplier on our effective IO performance."

A component of the Xbox Velocity Architecture, SFS is a feature of the Xbox Series X hardware that allows games to load into memory, with fine granularity, only the portions of textures that the GPU needs for a scene, as it needs it. This enables far better memory utilization for textures, which is important given that every 4K texture consumes 8MB of memory. Because it avoids the wastage of loading into memory the portions of textures that are never needed, it is an effective 2x or 3x (or higher) multiplier on both amount of physical memory and SSD performance.

Those statements are basically copies of each other. So basically, they are using sampler feedback combined with the High Bandwidth Cache Controller to make the transfer of data more efficient. HBCC didn't do much in games previously because it didn't have the sampler feedback part. With sampler feedback, the controller can be much more accurate. At least, that's what I'm suspecting.
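Taking Goossen's one-half to one-third figure at face value, the "effective multiplier" is just the reciprocal of the fraction of loaded texture data a game actually samples. A minimal worked example:

```python
def effective_multiplier(fraction_used: float) -> float:
    """If only this fraction of loaded texture data is ever sampled, skipping
    the rest multiplies effective RAM and I/O by the reciprocal."""
    return 1.0 / fraction_used

print(effective_multiplier(1 / 2))  # 2.0x when half the allocated pages are used
print(effective_multiplier(1 / 3))  # 3.0x when a third are used (Goossen's range)

# Applied to the quoted 8 MB cost of a 4K texture in a 10 GB GPU-optimal pool:
naive = 10_000 / 8         # ~1250 textures resident without SFS
print(naive, naive * 2.5)  # ~3125 with a midpoint 2.5x SFS multiplier
```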
 

Dodkrake

Banned
99% of these posts are over my head. 55 pages in, is there some form of consensus yet, with all the calculus going on here?

It seems like both consoles use different technical philosophies to reach a similar goal. I suppose it's easier to understand how the PS5's SSD should be faster, but the nuances of the Xbox Velocity Architecture could realistically close the gap presented in the raw numbers.

Am I in the ballpark?

Kinda. It's more akin to "The PS5 has an architecture that's at least twice as fast; however, we are trying to come up with reasons to break the laws of physics to give a fictitious edge to MS".
 

THE:MILKMAN

Member
I think it was listed in the agenda for that meeting.

It just says 'Xbox Series X system architecture', by Jeff Andrews and Mark Grossman. It is also the same 30-minute slot the One X had, so I'm not expecting much, if any, new info.

Even AnandTech seems sceptical but hopeful:

The final talk in a very long day is from Microsoft, about the Xbox Series X system architecture. This is likely going to focus on the infrastructure design on the console, the collaboration with AMD on the processor, perhaps some insight into new features we’re going to find in the console, and how the chip is going to drive the next 4-8 years of console gaming. I’m actually a bit 50-50 on this talk, as we’ve had presentations like this at events before (e.g. Qualcomm’s XR) which didn’t actually say anything that wasn’t already announced. There’s the potential here for Microsoft to not say anything new, but I hope that they will go into more detail.
 

oldergamer

Member
Kinda. It's more akin to "The PS5 has an architecture that's at least twice as fast; however, we are trying to come up with reasons to break the laws of physics to give a fictitious edge to MS".
Fanboy reasoning. If, for example, the PS5 is only actually using half of the visible textures it's loading every frame, the data still needs to travel over the main bus, which is faster on the Xbox. That SSD advantage is lessened by a more optimized architecture.

Not a single person in this fucking thread has "tried to give an advantage to MS", but I think it's safe to say the advantage Sony has likely isn't as big as you think.
 
Last edited:

Ascend

Member
Kinda. It's more akin to "The PS5 has an architecture that's at least twice as fast; however, we are trying to come up with reasons to break the laws of physics to give a fictitious edge to MS".
We are trying to understand what the XSX actually has and what it is doing.

Xbox Velocity Architecture is clearly a marketing name, so that means it doesn't say much at all about what it's doing.
Sampler feedback is a GPU feature, but it doesn't necessarily enable effective SFS on its own; it needs something else to make the streaming part efficient.
DirectStorage is an API, but we don't (or didn't) know for which component.

That is all we have to go on right now. Based on all the talks we've had here, we've come quite close to potentially understanding what is going on.
Let me throw out my conclusions as of now... of what is likely. They are still speculation and can still be off. And this is a very short recap.

We have two different compression techniques on the console because one portion of the data will be usable by the GPU directly, while the other will not, and will need to be decompressed by the decompression block and reside in RAM. The GPU can process compressed data; I conclude that based on this slide, which reinforces the idea that the SSD can transfer data directly to the GPU. Which data is transferred/loaded is controlled by the HBCC, which is programmable by MS but will not need any programming from game developers (RedGamingTech explanation video). DirectStorage is the new API created by MS to properly take advantage of the HBCC. The HBCC combined with sampler feedback allows efficient streaming of the data that will actually be used, based on how the processing of data is scheduled, fetching only the data that is scheduled to be processed. This will save bandwidth and RAM and make things easier on the GPU, because guessing which data might be used in the future is no longer necessary. MS gives a figure of an effective 2x-3x bandwidth and RAM size increase from this feature.


And notice that I didn't have to mention the PS5 once in this post (except now), nor in many of my previous posts, because I am not comparing for the purpose of beating out the competition (and neither are most of us here), but simply trying to understand what the XSX is doing.
 
Last edited:

oldergamer

Member
We are trying to understand what the XSX actually has and what it is doing.

Xbox Velocity Architecture is clearly a marketing name, so that means it doesn't say much at all about what it's doing.
Sampler feedback is a GPU feature, but it doesn't necessarily enable effective SFS on its own; it needs something else to make the streaming part efficient.
DirectStorage is an API, but we don't (or didn't) know for which component.

That is all we have to go on right now. Based on all the talks we've had here, we've come quite close to potentially understanding what is going on.
Let me throw out my conclusions as of now... of what is likely. They are still speculation and can still be off. And this is a very short recap.

We have two different compression techniques on the console because one portion of the data will be usable by the GPU directly, while the other will not, and will need to be decompressed by the decompression block and reside in RAM. The GPU can process compressed data; I conclude that based on this slide, which reinforces the idea that the SSD can transfer data directly to the GPU. Which data is transferred/loaded is controlled by the HBCC, which is programmable by MS but will not need any programming from game developers (RedGamingTech explanation video). DirectStorage is the new API created by MS to properly take advantage of the HBCC. The HBCC combined with sampler feedback allows efficient streaming of the data that will actually be used, based on how the processing of data is scheduled, fetching only the data that is scheduled to be processed. This will save bandwidth and RAM and make things easier on the GPU, because guessing which data might be used in the future is no longer necessary. MS gives a figure of an effective 2x-3x bandwidth and RAM size increase from this feature.


And notice that I didn't have to mention the PS5 once in this post (except now), nor in many of my previous posts, because I am not comparing for the purpose of beating out the competition (and neither are most of us here), but simply trying to understand what the XSX is doing.
You are too nice; please add 36% more swearing next time.
 

Mod of War

Ω
Staff Member
Kinda. It's more akin to "The PS5 has an architecture that's at least twice as fast; however, we are trying to come up with reasons to break the laws of physics to give a fictitious edge to MS".

Posts like these in technical threads are unnecessary and only serve to bait users into juvenile console-war theatrics.

Let's keep the posts more like Ascend's and less antagonizing bait.

Thank you.
 

Bernkastel

Ask me about my fanboy energy!
When asked about the Velocity Architecture and how it would make development easier, he said, "Xbox has said that this technology will unlock new, never-seen-before capabilities in game development for consoles. I am aware that this is marketing jargon, but I am honestly excited to see the Velocity Architecture in action. Together with a fast SSD, it can really make a difference."
"This will greatly help large games, especially open-world ones, because streaming is always an issue to deal with," he said. "It's not only about reading from the SSD, but also providing the assets for the game. So yes, having hardware-level decompression and asset preprocessing might bring in a very interesting point for overall smoothness."
 
Some interesting speculation on B3D, felt worth sharing.

User function:

Now I'm assuming that the "virtual memory" is storing data as if it were already in, well, memory. So the setup, initialisation and all that is already done, and that saves you some time and overhead when accessing from storage compared to, say, loading assets from an SSD on PC. But this virtual memory will need to be accessed via a page table, which then has to go through a Flash Translation Layer. Normally this FTL is handled by the flash controller on the SSD, accessing, if I've got this right, an FTL stored either in an area of flash memory, in DRAM on the SSD, or on the host system.

The XSX has a middling flash controller and no DRAM on the SSD. So that should be relatively slow. But apparently it's not (if we optimistically run with the comments so far).

My hypothesis is that for the "100 GB of virtual RAM" the main SoC is handling the FTL, doing so more quickly than the middling flash controller with no DRAM of its own, and storing a 100 GB snapshot of the FTL for the current game in an area of system-reserved/protected memory, to make the process secure for the system and transparent to the game. Because this is a proprietary drive with custom firmware, MS can access the drive in a "raw mode"-like way, bypassing all kinds of checks and driver overhead that simply couldn't be bypassed on PC; and because it's mostly or totally read access, other than during install/patching, data coherency shouldn't be a worry either.

My thought is that this map of physical addresses for the system-managed FTL would be created at install time, updated when wear-levelling operations or patching take place, and stored perhaps in some kind of metadata file for the install. So you just load it in with the game.

And as for the "100 GB" number, well, the amount of reserved memory allocated to the task might be responsible for that arbitrary-seeming figure too.

The best I could find on Google, in an MS research paper from 2012 (https://static.usenix.org/events/fast12/tech/full_papers/Grupp.pdf), was an estimate that the FTL might be costing about 30 microseconds of latency. Which wouldn't be insignificant if you could improve on it somewhat.

So the plus side of this arrangement would be, by my thinking:
- Greatly reduced read latency
- Greatly improved QoS guarantees compared to PC
- No penalty for a DRAM-less SSD
- A lower-cost SSD controller being just as good as a fast one, because it's doing a lot less
- Simplified testing for, and lower requirements from, external add-on SSDs

The downsides would be:
- You can only support the SSDs that you specifically make for the system, with your custom driver and custom controller firmware
- Probably some additional use of system-reserved DRAM required (someone else will probably know more!)
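A toy illustration of the host-managed FTL idea above: the console keeps the logical-to-physical page map for the installed game in reserved system memory, so a read skips the on-drive lookup that the 2012 paper put at roughly 30 microseconds. Entirely speculative; every name here is invented:

```python
# Hypothetical host-side FTL snapshot: logical page -> physical NAND address.
# In this theory it is built at install time, updated on patching or wear
# levelling, and reloaded with the game.
ftl_snapshot: dict[int, int] = {0: 0x1A2000, 1: 0x0F4000, 2: 0x7C8000}

def read_page(logical_page: int) -> int:
    """Translate on the host and issue a raw read, bypassing the drive's own
    FTL lookup. A real system would DMA from the returned address."""
    return ftl_snapshot[logical_page]  # lookup held in reserved system RAM

print(hex(read_page(1)))  # 0xf4000
```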

and user DSoup:

I can't offer much insight because, as you said, these are thoughts based on a number of vague comments, and much of my commentary was about the Windows I/O stack, which is likely very different from the Xbox Series X's. But it would indeed be truly amazing if Sony have prioritised raw bandwidth and Microsoft have prioritised latency.

My gut tells me that if this is what has happened, they'll largely cancel each other out, except in cases where one scenario favours bandwidth over latency and another favours latency over bandwidth. Next-gen consoles have 16 GB of GDDR6, so raw bandwidth is likely to be preferable in cases where you want to start/load a game quicker, e.g. loading 10 GB in 1.7 seconds at 100 ms latency compared to 3.6 seconds at 10 ms latency. Where the latency could make a critical difference is frame-to-frame rendering, pulling data off the SSD for the next frame, or the frame after.

The SSDs in both next-gen consoles hugely improve both latency and bandwidth over the current gen (and PC) today, but it really feels like, no matter what decisions Microsoft and Sony have made for the next generation, there will actually be only marginal differences between the actual games themselves. Look at launch PlayStation 4 vs Xbox One and the clear disparity in GPU/compute (18 CUs vs 12 CUs), and how that really didn't make much difference to actual games.

MS have been routinely talking about latency for a while now, in almost all aspects of the system design, so it would make sense if addressing latency were also the big goal of XvA and their SSD I/O. It also matches up very well with various statements from MS engineers on Twitter, a few devs (the DiRT 5 dev, Louise Kirby, etc.), some uncovered patents, etc. Also, earlier I speculated that part of the 2.5 GB of reserved RAM on Series X would probably be used as a cache for SSD I/O-related data tasks, such as the things these two posters speculate on.

Which would just go to show that while MS's and Sony's approaches are ultimately attempting to address the same thing, they are very apples-to-oranges in a lot of key ways, which makes comparing the paper specs of their SSD I/O approaches a case of missing the bigger picture. Basically, assuming these ideas all shake out, the I/O capabilities of both systems will be very similar, much closer than they'll be distant, except in very niche, fringe use-cases that favor bandwidth over latency, or latency over bandwidth.
 

oldergamer

Member
It would be pretty funny if people in this thread were automatically assuming that the solution MS had for the SSD was at a latency disadvantage to what Sony created, when it's possible it's the other way around. Seems par for the course if that turns out to be what it is.
 
Last edited:

Ascend

Member
Some interesting speculation on B3D, felt worth sharing.

User function:



and user DSoup:



MS have been routinely talking about latency for a while now, in almost all aspects of the system design, so it would make sense if addressing latency were also the big goal of XvA and their SSD I/O. It also matches up very well with various statements from MS engineers on Twitter, a few devs (the DiRT 5 dev, Louise Kirby, etc.), some uncovered patents, etc. Also, earlier I speculated that part of the 2.5 GB of reserved RAM on Series X would probably be used as a cache for SSD I/O-related data tasks, such as the things these two posters speculate on.

Which would just go to show that while MS's and Sony's approaches are ultimately attempting to address the same thing, they are very apples-to-oranges in a lot of key ways, which makes comparing the paper specs of their SSD I/O approaches a case of missing the bigger picture. Basically, assuming these ideas all shake out, the I/O capabilities of both systems will be very similar, much closer than they'll be distant, except in very niche, fringe use-cases that favor bandwidth over latency, or latency over bandwidth.
That's really interesting... especially because we had that interview a while back saying that it's hard to advertise/market smoothness and low input latency. It would make sense that the XSX was indeed designed to reduce latency as much as possible rather than to go for raw bandwidth. It also makes sense with the texture filters, since they can use a 'placeholder' and allow the processes to keep moving, avoiding unnecessary stalls...

It would still make me wonder about GDDR6, though, because DDR4 has lower latency all-around. But maybe the drawbacks in DDR4's bandwidth were still too large, so they opted for GDDR6 anyway...

Another core difference between DDR4 and GDDR5/6 memory involves the I/O cycles. Just like SATA, DDR4 can only perform one operation (read or write) in one cycle. GDDR5 and GDDR6 can handle input (read) as well as output (write) on the same cycle, essentially doubling the bus width.

 