
Detailed overview of PS3 development station vs. PS3 console

Kleegamefan

K. LEE GAIDEN
Originally posted by Gofreak elsewhere, but I thought it deserved its own thread:

Translated from PC Watch (weekly column):

http://pc.watch.impress.co.jp/docs/...2/kaigai199.htm

SCEI and PS3 development kit schedule announced

Sony Computer Entertainment held its yearly PlayStation summary meeting, PlayStation Meeting 2005. At that meeting, new information regarding the PlayStation 3 was released.

First of all, the scheduled launch of the PS3 in the spring of 2006 was reconfirmed, and a pre-launch event called the "PlayStation Conference" will be held immediately before it.

At first, SCEI used the Cell Evaluation System for software stack validation. This machine was supposedly used as a debugging machine in internal company labs, and it was also provided to a select group of vendors for evaluation. It has a 2.4 GHz Cell processor, 256MB of XDR DRAM, and an nVidia graphics board.

Next, SCEI developed the much-anticipated “PS3 Evaluation System” for customer evaluation. The machine number is CEB-2030 and the codename is “Cytology.” SCEI has been distributing these machines to software vendors since this spring. The specs of the PS3 Evaluation System will be explained later, but basically it has a 2.4GHz Cell, 512MB of XDR DRAM, and a GeForce 7800 (G70).

In December 2005, SCEI is scheduled to release the “PS3 Reference Tool”, which has nearly the same architecture as the actual PS3. It will have a 3.2 GHz Cell, the RSX, 512MB of XDR DRAM, and a BD drive. Currently, it is set to be a 2U rack mount unit, but vertical configurations are being considered.

SCEI will continue to provide PS3 Evaluation Systems through November. Currently, 450 units have been sent out, and shipments will continue to increase according to the supply figures shown below, to meet the intense demand for the machine.

August – 200 units
Sept – 300 units
October – 3000 units
November – 3000+ units

CELL and XDR DRAM are 75% of PS3’s capability

The PS3 Evaluation System differs from the final PS3 specs in various ways.


First of all, the Cell operating frequency is 2.4 GHz, which is 75% of the production version. In the case of CPUs, it is not uncommon to hold down the clock speed until validation is completed. While this machine cannot perform at the PS3's final spec, knowing that it runs at 75% [and then compensating for it] should be enough to get by.

The memory is XDR DRAM, and the Cell chip is connected to it by the XDR DRAM interface (XIO). This is also not full-spec. At least as of June, the XDR DRAM data transfer rate in the PS3 Evaluation System was held to 2.4 Gbps. The PS3's XDR DRAM data rate will be 3.2 Gbps, so this is also at 75% capability.

The XDR DRAM data rate drop can be seen as in sync with the CPU clock speed drop. This suggests that the Cell CPU core and the XDR DRAM interface were developed at the same time. Simultaneous development is easier and has other advantages; in particular, since latency is a very important factor in the CPU-memory connection, developing both sides together has many advantages.

Most importantly, the XDR DRAM rate may have been dropped to compensate for the yield rate of the new XDR DRAM. It might be difficult to produce 3.2 Gbps XDR DRAM samples at this early stage; considering the DRAM cell core clock (internal column frequency), 3.2 Gbps XDR DRAM is rather difficult. When XDR DRAM mass production for the PS3 begins, it will move to a 90nm process, but for now it is being built on 100-110nm processes, which is bad for yield rates. Additionally, the PS3 Evaluation System uses RIMMs (Rambus memory modules), and these modules might eat into the timing margins.

The PS3 Evaluation System introduced at this conference has 512 MB of XDR DRAM, twice the 256 MB of the PS3. This increase might be due to the RIMM modules; in June, it was explained that the PS3 Evaluation System was designed to also be able to use RIMMs. This large amount of memory is meant for verification [appraisal, testing] purposes.

The XDR DRAM interface is configurable, so it has high flexibility. It is an x16 interface, but it is also capable of being configured as x8 or x4. XDR DRAM has a point-to-point connection with the Cell chip. For example, by changing from x16 to x8, one channel can support connections with twice as many DRAM chips. The RIMM module takes advantage of this property, allowing one channel to support 2 RIMMs while maintaining a point-to-point connection. (Trans. note: by context it is clear that "point-to-point" means a direct connection between two ICs, with no intervening chip in between.) On the other hand, the final PS3 design has the XDR DRAM directly integrated into the motherboard.
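(Ed. note: a rough sketch of the device-width arithmetic described above, not from the original article. The 32-bit channel width and the per-pin rates are assumptions for illustration.)

# Python sketch: XDR's configurable device width trades chips-per-channel
# against width-per-chip while total channel bandwidth stays constant.
def xdr_channel(per_pin_gbps, channel_width_bits, device_width_bits):
    chips = channel_width_bits // device_width_bits         # point-to-point devices per channel
    bandwidth_gbs = per_pin_gbps * channel_width_bits / 8   # GB/s for the whole channel
    return chips, bandwidth_gbs

for dev_width in (16, 8, 4):                     # x16, x8, x4 configurations
    chips, bw = xdr_channel(3.2, 32, dev_width)  # assumed 32-bit channel
    print(f"x{dev_width}: {chips} chips/channel, {bw:.1f} GB/s")

# Dev kit (2.4 Gbps) vs. final PS3 (3.2 Gbps): the 75% figure
print(xdr_channel(2.4, 32, 16)[1] / xdr_channel(3.2, 32, 16)[1])  # 0.75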


Currently, the graphics are connected by PCI-Express x4

In the PS3 Evaluation System, the PC-centric GeForce 7800 GTX (G70) is used as a substitute for the RSX. The RSX and G70 are built from roughly the same shader units, and their internal shader architectures are predicted to be quite similar. Because of that, as far as graphics are concerned, using the G70 as a base for software development should not create many problems. Shader programs should run as if the two chips were the same.

However, the G70 has a lower clock speed than the RSX, so there will certainly be some performance difference. An even greater difference than internal GPU performance, though, is the interface.

In the PS3, the Cell and RSX are connected by a parallel interface developed by Rambus called FlexIO (Redwood), which has a wide 35GB/sec bandwidth (20GB/sec down, 15GB/sec up). However, the G70, which has a PCI Express x16 interface, cannot be directly connected to the Cell’s FlexIO interface.

Therefore, in the PS3 Evaluation System the G70 is connected to the south bridge by PCI Express. In the June PS3 Evaluation System, they were connected by PCI Express x4. The south bridge used by the PS3 Evaluation System is basically the same as the south bridge developed by IBM for the Cell Workstation. Because of that, the chip has a PCI Express x4 peripheral I/O interface meant for server applications. In the final version, PCI Express will disappear from the south bridge, but currently the G70 is connected through it.

For that reason, currently the PCI Express x16 interface in the G70 cannot realize its full potential. According to the spec of the south bridge, Cell has only a 5 GB/sec FlexIO interface to the south bridge. If we assume the same is true for the PS3 Evaluation System, it will have drastically less bandwidth than the actual machine. Furthermore, the G70 is connected to the south bridge by PCI Express x4, which, at 2GB/sec, is even less. If we compare Cell->GPU bandwidth, we see that the PS3 Evaluation System is only 1/20 of the PS3.
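(Ed. note: a back-of-the-envelope check of the "1/20" figure, not from the article. The PCI Express 1.x lane rate of 250 MB/s per direction is my assumption.)

# Python: devkit Cell->GPU path vs. final PS3 Cell->RSX path
pcie_lane_gbs = 0.25                      # GB/s per lane, per direction (PCIe 1.x)
devkit_cell_to_gpu = 4 * pcie_lane_gbs    # PCI Express x4, one direction = 1 GB/s
ps3_cell_to_rsx = 20.0                    # GB/s FlexIO downstream, per the article
print(devkit_cell_to_gpu / ps3_cell_to_rsx)  # 0.05, i.e. 1/20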

According to SCEI, in the PS3 Evaluation System the graphics-side memory has been increased to 512MB of GDDR3. In the actual PS3, there will be 256MB of GDDR3. The reason for this increase in video-side memory is to allow data to be buffered on the graphics side when the bus is idle. However, it will be difficult to use the PS3 Evaluation System to effectively evaluate the wide connection between the Cell and RSX in the PS3.

Additionally, the GDDR3 interface of the RSX is 128bits wide, whereas the G70 is 256bits wide, which means if both use x32 512Mbit DRAM chips, the G70 can support twice as much memory.
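(Ed. note: the chip-count arithmetic behind that last point, spelled out as a sketch.)

# Python: same x32 512Mbit GDDR3 chips, different bus widths
chip_width_bits, chip_mbit = 32, 512
for name, bus_bits in (("RSX", 128), ("G70", 256)):
    chips = bus_bits // chip_width_bits
    total_mb = chips * chip_mbit // 8   # Mbit -> MB
    print(name, chips, "chips,", total_mb, "MB")  # RSX: 4 chips, 256 MB; G70: 8 chips, 512 MB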

The special characteristic of the PS3 is the connection between Cell and RSX

The big special characteristic of PS3 graphics is the connection between Cell and RSX. The RSX itself has a similar architecture to the G70, but the G70's host interface is meant for the PC and is completely different. The G70 uses PCI Express x16 to connect to the chipset at 8GB/sec (4GB/sec one-way), and it cannot directly access main memory. In contrast, the RSX has a 35GB/sec (20GB/sec down, 15GB/sec up) direct connection to the Cell, and can directly render from the main memory on the Cell side.

This is a big difference, because it allows a completely different way of using the GPU from PC architectures, SCEI explained. First of all, because the bus is wide, the Cell can perform a great number of geometry operations and then send the vertex data [to the RSX]. Conversely, the RSX side can easily send data back to the Cell.

“The Cell processor can do both pre-processing and post-processing. For example, tessellation, dot filling, etc… Cell can perform physics processing like collision and motion calculations, and transform the vertex array.” said David B. Kirk, Chief Scientist of nVidia.

SCEI basically expects higher abstraction levels to be processed by Cell, and the details (like vertices and pixels) to be processed by the GPU. This is reasonable. For example, when the CPU side handles geometry transformation, collision detection, which is important in games, is not a problem. When the GPU handles geometry transformation, if the data is not sent back to the CPU, clipping issues may occur. In the case of the PS3, the Cell side can perform transformations, and even if the GPU is used for transformation, it is comparatively easy to send the data back to the CPU side.

In architectures up to now, either the CPU or the GPU has been the bottleneck. If this is not resolved, we cannot go any further. To address this, in the PS3 architecture, if the GPU becomes the bottleneck it can shift work to the Cell, and if the Cell becomes the bottleneck it can send work to the GPU, shifting the workload. For example, depending on the software, the Cell side can take on more graphics processing, or conversely, it is easy to adjust things to leave the graphics work to the GPU, it was explained. In summary, a flexible balance can be struck between the CPU and the GPU's programmable processors.
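(Ed. note: a conceptual sketch of what developer-side load shifting could look like. None of this is a real PS3 API; all names and timings are invented for illustration.)

# Python: shift a share of geometry work toward whichever processor is idle
def cpu_time(batch):   # stand-in for measured Cell/SPE transform time
    return len(batch) * 0.8

def gpu_time(batch):   # stand-in for measured RSX vertex time
    return len(batch) * 1.2

def render_frame(scene, cell_share):
    split = int(len(scene) * cell_share)   # this fraction is transformed on Cell
    return cpu_time(scene[:split]), gpu_time(scene[split:])

scene, cell_share = list(range(1000)), 0.5
for frame in range(60):
    t_cpu, t_gpu = render_frame(scene, cell_share)
    if t_gpu > t_cpu * 1.1:                    # GPU-bound: move geometry to Cell
        cell_share = min(0.9, cell_share + 0.05)
    elif t_cpu > t_gpu * 1.1:                  # CPU-bound: leave more to the GPU
        cell_share = max(0.1, cell_share - 0.05)
print(f"converged cell_share = {cell_share:.2f}")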

In previous PC architectures, because they were limited by the CPU<->GPU pipe, geometry operations were held to a certain limit, and how rich an environment you could create within that limit became the main technical challenge. In contrast, PlayStation 2-type game consoles pushed out large amounts of polygons, but beyond that did not have the expressiveness of PCs. (Trans. note: probably means that PS2 is less capable in applying different effects to polygons than the PC, despite pumping out more polygons.) In the case of the PS3, both are possible, with the flexibility to balance the two.

However, in the case of the currently available PS3 Evaluation System, because of restrictions in the architecture, it is not possible to evaluate this balancing [of Cell and RSX]. This is a difficulty and a weakness but, stated differently, it means software demos on the current systems still do not demonstrate the full potential of the PS3. It is possible that the actual PS3 will have performance greater than current demos.

Significantly, when it comes to bus bandwidth, the Xbox 360's CPU-GPU connection is 21.6GB/sec, which is much wider than PCs. A wide-bandwidth CPU-GPU connection is not just a characteristic of the PS3 among the next-generation consoles.

PS2’s simple boot-up started with firmware, and it loaded the OS and libraries from the disk. In comparison the PS3 starts from “Haipaabaiza” (Hyper-visor?) firmware. Haipaabaiza is a type of VMM (Virtual Machine Manager) software, which runs not on top but under the OS, providing machine virtualization. Even, when using only the Cell OS for gameplay, Haipaabaiza will always start first, and on top of that runs the pre-defined OS (guest OS). The OS, along with Haipaabaiza, creates a two-layer image. This basic OS layering is the same in the PS3 Evaluation System.
 

llTll

Banned
For that reason, currently the PCI Express x16 interface in the G70 cannot realize its full potential. According to the spec of the south bridge, Cell has only a 5 GB/sec FlexIO interface to the south bridge. If we assume the same is true for the PS3 Evaluation System, it will have drastically less bandwidth than the actual machine. Furthermore, the G70 is connected to the south bridge by PCI Express x4, which, at 2GB/sec, is even less


i didn't understand this... is this a bad thing? is he saying that the PS3 won't be that powerful?? can someone explain this to me in an easy way so a n00b like me can understand???
 

kikonawa

Member
the g70 is connected at 2GB/sec on the devkit (the current one) instead of the final 22+GB/sec. so it's way, way slower; the final ps3 will be much faster.
 

BirdySky

Member
He's saying the PS3 will have 5x to 10x more bandwidth between CPU/GPU than the current evaluation kits developers are using.

I think.
 

Doube D

Member
llTll said:
For that reason, currently the PCI Express x16 interface in the G70 cannot realize its full potential. According to the spec of the south bridge, Cell has only a 5 GB/sec FlexIO interface to the south bridge. If we assume the same is true for the PS3 Evaluation System, it will have drastically less bandwidth than the actual machine. Furthermore, the G70 is connected to the south bridge by PCI Express x4, which, at 2GB/sec, is even less


i didn't understand this... is this a bad thing? is he saying that the PS3 won't be that powerful?? can someone explain this to me in an easy way so a n00b like me can understand???

umm, no? He is saying that the G70's use of the PCI Express interface prevents the alpha/beta dev kits from reaching optimal performance (which you would get with the final PS3 dev kit and console).
 

llTll

Banned
if this is true, then what does "If we assume the same is true for the PS3 Evaluation System, it will have drastically less bandwidth than the actual machine" mean?

doesn't less bandwidth mean a bad thing [or a negative thing for the meaning]??


or does less bandwidth = a good thing??

i don't understand
 

Kleegamefan

K. LEE GAIDEN
It means that the current PS3 dev stations use a small fraction (5%) of the bandwidth of the actual PS3 and of the PS3 Reference Tool dev kit that will arrive in December...

The PS3 Reference Tool and the PS3 have a 35GB/sec bi-directional pipe between RSX and CELL.....it is set up in a way that if CELL is idle, it can give graphics assistance to RSX, and if RSX is idle it can give assistance to CELL....CELL can do some polygon/vertex work (along with RSX) and RSX can feed CELL with frame buffer data for post-processing work....these are just a few examples of what they intend to do with the CELL<=>RSX connection..

All the demos you have seen so far (compared to the actual PS3):

Do not use the full power of the GPU

Run on a CPU that is only 75% as fast

Have Ram that is 75% as fast



Also, these kits only have PCI Express, not FlexIO, so the nVidia GF 7800 GTX used in these kits can't use CELL for polygon/pixel assistance, because the bus is 1/20th as fast as FlexIO.....the 7800 also can't feed the current CELL with much post-processing data either..

Keep all this in mind when viewing PS3 demos made before December....
 

gofreak

GAF's Bob Woodward
Kleegamefan said:
it is set up in a way that if CELL is idle, it can give graphics assistance to RSX, and if RSX is idle it can give assistance to CELL....CELL can do some polygon/vertex work (along with RSX) and RSX can feed CELL with frame buffer data for post-processing work....these are just a few examples of what they intend to do with the CELL<=>RSX connection..

It should be noted for clarity that the "load balancing" between CPU and GPU, or shifting around of work, is not automatic. We'll have to see if they do anything further to accommodate that beyond a large pipe (well, we also know they allow the CPU to use the same rounding and cutoff modes as the GPU, in order to better facilitate sharing of data between the two). The balance between CPU and GPU will be up to the developer to implement, which is really the only way you could do it, since it's up to the developer where a game's priorities lie.
 

llTll

Banned
Kleegamefan said:
It means that the current PS3 dev stations use a small fraction (5%) of the bandwidth of the actual PS3 and of the PS3 Reference Tool dev kit that will arrive in December...

[snip]

Keep all this in mind when viewing PS3 demos made before December....


that's interesting. so all these games like Lair and Gundam are nothing compared to the final ps3 specs???


shit... i don't want to think about it
 

Suikoguy

I whinny my fervor lowly, for his length is not as great as those of the Hylian war stallions
well, if the base specs are about 3/4 of the way "there"
and bandwidth is fuxored, then I think running at 2/3 of what it's capable of is a reasonable estimate.
 

Wunderchu

Member
llTll said:
thats intresting. so all these games like Lair and gundam are nothing compared to the final ps3 specs???
well ... I wouldn't say "nothing" .. but the final hardware will definitely be more powerful than these pre-final dev kits, probably in every way (as has been mentioned, CELL will be clocked higher, RSX is more powerful than the GeForce 7800, there will be more bandwidth in the final hardware, etc.) :D
 
great read.

I almost feel overwhelmed with how good graphics/AI/physics/etc. will be in a few years. I'm replaying God of War and it's not unreasonable to expect that the CG scenes will be in-game in GoW 2 or 3.

I'm not a 2D elitist or anything, but damn, we've come a long way since Pitfall (the first game I remember playing)
 
Suikoguy said:
well, if the base specs are about 3/4 of the way "there"
and bandwidth is fuxored, then I think running at 2/3 of what it's capable of is a reasonable estimate.

When you've got less than 1/10th of the bandwidth, I think it cuts performance by a good bit more than that, assuming you don't compensate and code differently than you would on the target system.

Of course, that's assuming any game being developed right now is anywhere approaching heavy bandwidth utilization.

It's funny that with all of the cutting-edge CPUs & GPUs, the interconnects end up being the last parts to be finalized.
 

llTll

Banned
Wunderchu said:
well ... I wouldn't say "nothing" .. but the final hardware will definitely be more powerful than these pre-final dev kits, probably in every way (as has been mentioned, CELL will be clocked higher, RSX is more powerful than the GeForce 7800, there will be more bandwidth in the final hardware, etc.) :D


man.... killzone E3 trailer here i come...


well.. not to sound like fanboy for sony or anything but... is it like xbox 360 i am cry?
[ in terms of graphics that is ] ???
 

Wunderchu

Member
llTll said:
man.... killzone E3 trailer here i come...


well.. not to sound like fanboy for sony or anything but... is it like xbox 360 i am cry?
[ in terms of graphics that is ] ???
I think gofreak may have made a good point regarding PS3 vs. Xbox 360 graphics:
gofreak said:
There is a general perception that PS3 is more powerful, and I guess some would like to see that difference manifest itself from the start. And from what's been shown thus far, they haven't been particularly disappointed, and so people start thinking these things. Though I expect that should change pre-X360's launch, when MS starts showing finished goods on final hardware. I would be surprised if MS didn't start closing the gap visually from Monday, but with that said, in the longer term differences may start to manifest themselves. If a developer was heavily focused on visuals, throwing PS3 as a system into graphics (almost entirely) vs. throwing X360 as a system into graphics (almost entirely) could yield large differences (assuming both accommodate decent CPU-GPU communication). I don't expect devs to start doing that with early games though, so looking for differences now may not pay dividends (or at least once MS shows the expected pre-launch).

Differences in other areas may show up a lot quicker..
[source: http://www.ga-forum.com/showthread.php?p=1675757#post1675757 ]
 

Cuth

Member
Additionally, the GDDR3 interface of the RSX is 128bits wide, whereas the G70 is 256bits wide
What's the meaning of this? The GDDR3 bandwidth is double (with the same clock) in the dev kit or what?
 
Wunderchu said:
I think gofreak may have made a good point regarding PS3 vs. Xbox 360 graphics:[source: http://www.ga-forum.com/showthread.php?p=1675757#post1675757 ]
It basically means that it's a sure-fire guarantee that the high-budget exclusive games for each respective system will look phenomenal. Using some of the balancing and whatnot outlined in this article, I'm sure Konami will make MGS4 look just breathtaking. Same with Bungie and Halo 3, and the special tricks that the X360 can do.
 

ourumov

Member
I was beginning to think about the possibilities of the SPE+GPU configuration... Actually, let's just imagine we want to use EVERYTHING to render our objects.
We have 44 GFLOPS of vertex shading power... this could be useful to more or less set the bar when deciding how to use the SPEs in order to establish a rendering chain. Knowing this, one idea could be the following.
We use 4 SPEs to do the following (each one gives 25.6 GFLOPS; 2 = 51.2 ~ 44!):
SPE1 and SPE2 perform parallel skinning of a known mesh, meaning that they each handle different chunks of 3D data.
When they finish, they output their results to SPE3 and SPE4, which do transform operations and write the result to the RSX VRAM. There we decide to apply some effect that looks good enough on a per-vertex basis... for instance cel shading... and we use the normal and light to calculate texture coordinates...

This is a scenario I came up with, and it shows how curious the VS [vertex shaders] are in this design... Not that they are not useful, but they are so slow compared to the SPEs that in the end, if you want to use them, the SPEs will always be waiting. Of course, there are 256 MB of VRAM to do buffering and things like that.

Not all the operations need the same GFLOPS; actually, some of the operations mentioned in this rendering pipeline need more FP power than others... but it's pretty illustrative :)
I think the best approach would be to put the VS operations in the final part of the rendering pipeline and not at the beginning... meaning that I think it would be better to use the VS for effects than, for instance, skinning...
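(A sketch of the SPE chain ourumov describes, modelled with threads. This is not Cell SDK code, and every name in it is invented for illustration.)

# Python: "SPE1"/"SPE2" skin disjoint mesh halves in parallel; "SPE3"/"SPE4"
# transform their outputs; results land in a "VRAM" list.
from concurrent.futures import ThreadPoolExecutor

def skin(chunk, weight):        # stand-in for SPE1/SPE2 skinning
    return [(x * weight, y * weight, z) for (x, y, z) in chunk]

def transform(chunk, scale):    # stand-in for SPE3/SPE4 transforms
    return [(x * scale, y * scale, z * scale) for (x, y, z) in chunk]

mesh = [(float(i), float(i), 1.0) for i in range(8)]
half = len(mesh) // 2
with ThreadPoolExecutor(max_workers=4) as pool:
    s1 = pool.submit(skin, mesh[:half], 2.0)        # "SPE1"
    s2 = pool.submit(skin, mesh[half:], 2.0)        # "SPE2"
    t1 = pool.submit(transform, s1.result(), 0.5)   # "SPE3" consumes SPE1's output
    t2 = pool.submit(transform, s2.result(), 0.5)   # "SPE4" consumes SPE2's output
    vram = t1.result() + t2.result()                # "write to RSX VRAM"
print(vram)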
 

xexex

Banned
Cuth said:
What's the meaning of this? The GDDR3 bandwidth is double (with the same clock) in the dev kit or what?

basically it could be, almost double. unless they purposely limited the bandwidth of the dev kits to simulate the bandwidth of the PS3.

but this much is for certain: on the normal PC side, the G70 GPU (GeForce 7800 GTX) has a 256-bit bus for the GDDR3 memory. this results in bandwidth of about 38 GB/sec. the GPU core speed of the G70 is 430 MHz. I do not know what the memory clock speed is, which would determine how much bandwidth GDDR3 over a 256-bit bus would get.

on PlayStation 3, the RSX, which is based on the G70 but faster, is getting a 128-bit bus to GDDR3. I do not know the clock speed offhand for the PS3's GDDR3 memory (the RSX core itself is 550 MHz). but we do know the bandwidth of the GDDR3 memory in PS3: it is 22.4 GB/sec, the same as the Xbox 360's system memory bandwidth (because both have 128-bit buses to GDDR3 and both are obviously using GDDR3 clocked at the same speed)


edit: the GDDR3 memory is clocked at 700 MHz in both the Xbox 360 (512 MB of it) and the PlayStation 3 (256 MB of it). note that the PS3 also has 256 MB of Rambus XDR memory, therefore the total amount of system memory in the two consoles is equal: 512 MB

(not counting the 10 MB eDRAM that Xbox 360 has attached to the GPU shader core)


p.s. in the final PS3 Dev-Kits and the retail PS3 console, the Cell and RSX are going to have a really tight and interesting relationship, helping each other out :)
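(The arithmetic behind the bandwidth figures above, as a sketch. GDDR3 is double data rate, so bandwidth = clock x 2 x bus width in bytes; the 600 MHz memory clock for the PC 7800 GTX is my assumption, not from the post.)

# Python: GDDR3 bandwidth = clock_mhz * 2 (DDR) * bus width in bytes
def gddr3_bw_gbs(clock_mhz, bus_bits):
    return clock_mhz * 2 * (bus_bits // 8) / 1000

print(gddr3_bw_gbs(600, 256))  # PC GeForce 7800 GTX: 38.4 GB/s ("about 38")
print(gddr3_bw_gbs(700, 128))  # PS3 RSX / Xbox 360 system RAM: 22.4 GB/s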
 

Kleegamefan

K. LEE GAIDEN
Cuth said:
What's the meaning of this? The GDDR3 bandwidth is double (with the same clock) in the dev kit or what?


Apparently they are doing this to partially offset the performance disadvantage of the GF 7800 GTX vs. RSX...

The GF 7800 GTX has a 256-bit bus *but* it is slower than RSX (by over 100MHz) and probably has fewer features too (both the GF 7800 and RSX have ~300M transistors, but RSX has no need for transistors dedicated to NVIDIA PureVideo technology, so that could net 30 or 40M more transistors for something RSX *could* use instead....more AA or HDR logic, for example)...not to mention it can't use FlexIO to its full potential either.....the ability to use XDR DRAM as VRAM (via FlexIO) in PS3 more than offsets the 256-bit bus in the GF 7800 GTX IMO.....

Sure, some developers will simply use GDDR3 for VRAM and not touch XDR DRAM or Cell for graphics work, but these probably won't be the best-looking games anyway, so the lack of a 256-bit bus won't matter in the end....

Remember, XDR DRAM in PS3 will run at 3.2GHz, which is more than 4 1/2 times faster than GDDR3, which runs at 700MHz....

Also note that thanks to FlexIO, it is possible to reverse the RAM configuration....that is... you can have CELL use GDDR3 as main RAM and have RSX use ultra-fast XDR DRAM as VRAM.........Now, since those pools of RAM are logically further away from each other, I am not sure how much latency this would introduce, but the pros might outweigh the cons....PS3 developers might not even know the answer to this question for sure until the PS3 Reference Tools arrive in December, so who knows..

The point is there is a lot of inherent flexibility in the PS3 hardware.....people focus on the raw power of the hardware, but there are other things about it that are very interesting too...


llTll said:
man.... killzone E3 trailer here i come...


well.. not to sound like fanboy for sony or anything but... is it like xbox 360 i am cry?
[ in terms of graphics that is ] ???


Nah....X360 will have some image quality that'll knock your sox off, and XeCPU and Xenos/C1 are also set up for some bi-directional action, a la CELL<=>RSX...

The "problem" with XeCPU<=>Xenos/C1 is that it looks like it will not have the performance of CELL or RSX or CELL<=>RSX...

First of all, the XeCPU<=>Xenos/C1 pipe is only 2/3rds as fast as the same pipe on PS3, plus the fastest non-embedded RAM is ~20% as fast as XDR DRAM....

Also, when Xenos is accessing XeCPU, it is doing so via the L2 cache, vs. RSX, which has a dedicated pipe to XDR DRAM and can also communicate with each SPU in CELL + the 256KB SRAM of each...

Xenos can "lock" 1/3rd of the L2 cache in XeCPU, giving it a dedicated pipe back to the CPU, but again, this cuts off 2/3rds of the cache, meaning that when this is going on, the XeCPU can only use a max of 4 hardware threads instead of 6.....

So you are saying to yourself, "but if I am using 3 SPEs for RSX assistance instead of all 7 SPEs for normal CPU-type stuff, isn't that the same thing?"

Well.....it is, BUT, CELL has an overall performance advantage over XeCPU....so if you need (just throwing a number out there) say 50 GFLOPS of calculation power for graphics assistance on both machines, that would be ~1/2 of XeCPU's total output, while 50 GFLOPS is ~25% of CELL's total.....in other words....since you have more total available hardware power on CELL, this could allow you to make fewer overall sacrifices in game-related calculations vs. XeCPU...
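(The "~1/2 vs. ~25%" arithmetic above, spelled out as a sketch. The peak figures plugged in are commonly cited numbers, not from the post.)

# Python: what fraction of each CPU's peak a fixed graphics budget eats
xecpu_peak = 115.2   # GFLOPS, commonly cited peak for 3 cores @ 3.2 GHz
cell_peak  = 204.8   # GFLOPS, commonly cited peak for 8 SPEs @ 3.2 GHz
budget = 50.0        # GFLOPS spent assisting the GPU
print(f"XeCPU: {budget / xecpu_peak:.0%}")   # ~43%, roughly half
print(f"CELL:  {budget / cell_peak:.0%}")    # ~24%, roughly a quarter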

Also, Xenos/C1 can do, it seems, 500M pps (theoretical max)....RSX is estimated to do 1.18B vertices (theoretical max)....As I understand it, 2 verts = 1 poly, so we are similar in poly counts on both GPUs....or are we??

In RSX, the 8 vertex shaders produce that 1.18B number, so that means you also have 20+ pixel shaders available for pixel work......with Xenos/C1, that 500M number is using *ALL* 48 ALUs for vertex work.......no pixel work at all....I am not sure how well XeCPU is set up for pixel assistance to Xenos/C1, but even so, it probably cannot do as much of it as CELL...the point is, you are unlikely to approach that 500M number in Xenos/C1 because you are not going to use 100% poly resources and no pixel resources in any given frame.....with RSX, you can use the performance of all 8 vertex shaders without sacrificing pixel output for any given frame...besides, PS3 is probably not GPU-limited anyway, not to mention the RSX vertex theoretical max is not the absolute ceiling for the PS3 either...you can also do additional vertex work on CELL at the same time...

I think that, overall, the PS3 hardware is quite a bit more powerful than X360...keep in mind though, that PS3 is more "brute force power" than the X360 hardware, which is beautifully efficient, so that will balance things out somewhat....

How big the gap is, ultimately depends on how much of a brute PS3 turns out to be...


*disclaimer* THIS IS ONLY MY TAKE ON THINGS.....I AM ONLY STATING THINGS AS I UNDERSTAND IT AND HOW IT WAS PRESENTED/EXPLAINED TO ME....I AM NOT A CE ENGINEER BUT AM HAPPY TO BE CORRECTED WHERE I AM WRONG AS IT ALLOWS ME TO LEARN MORE */disclaimer*
 

Pimpwerx

Member
ourumov said:
I was beginning to think in the possibilities of the SPE+ GPU configuration...Actually let's just imagine we want to use EVERYTHING to render our objects.
We have 44 GFLOPS of Vertex Shading power...this could be useful to more or less set the bar when deciding how to use the SPEs in order to stablish a chain of rendering. Knowing this, one idea for instance could be the following.
We use 4 SPEs to do the following (each one gives 25.6 GFLOPS, 2= 51.2~44 !):
SPE1 and SPE2 perform paralel skinning of a known mesh, meaning that they treat differents chunks of 3D data.
When they finish they output their result to SPE3 and SPE4 which do Transform Operations and write the result to the RSX VRAM. There we decide to apply some effect that looks enough good in a per-vertex basis...for instance CelShading...And we use the normal and light to calculate texture coordinates...

This is an scenario I thought and that puts to the table how curious are the VS in the design...Not that they are not useful but they are so slow compared to the SPEs that at the end if you want to use them, SPEs will always be waiting. Of course there are 256 MB of VRAM to do buffering and things like this.

Not that all the operations need the same GFLOPS, actually some of the operations mentioned in this rendering pipeline need more FP power than others...but it's pretty ilustrative :)
I think the best would be to set the VS operations in the final part of the rendering pipeline and not in the beginning...meaning that I think it would be better to use VS for effects than for instance skinning...
I always thought VS had to be used at the start of the rendering pipeline. Could you elaborate on what you mean by "effects" at the end of the rendering chain? I think your suggestion is good, but I don't know what you mean with your use of the VS for almost post-processing. I wish I did graphics coding, b/c this is a major weakness of mine, trying to understand shader ops and how they work through a graphics pipe. But what you suggest would keep those VS active, and would explain better why NVidia/Sony went with that level of redundancy.

Oh yeah, I think the dev kit has a 256-bit bus to help better approximate the total bandwidth in the PS3. Since the G70 is connected to the southbridge with a paltry 2GB/s bus (according to the Goto diagram), you're losing 33GB/s of bandwidth. A 256-bit bus won't match that, but it will at least come closer. And with 512MB of VRAM, it's like addressing XDR+GDDR, so Cell can dump work to GDDR, and the G70 can churn through it. A decent solution given what they have for hardware at present. PEACE.
 

Baron Aloha

A Shining Example
KLeeGamefan said:
Nah....X360 will have some image quality that'll knock your sox off, and XeCPU and Xenos/C1 are also set up for some bi-directional action, a la CELL<=>RSX...

[snip]

How big the gap is, ultimately depends on how much of a brute PS3 turns out to be...

Wow. Great post. Very informative. Thanks.
 

ourumov

Member
Pimpwerx said:
I always thought VS had to be used at the start of the rendering pipeline. Could you elaborate on what you mean by "effects" at the end of the rendering chain? I think your suggestion is good, but I don't know what you mean with your use of the VS for almost post-processing. I wish I did graphics coding, b/c this is a major weakness of mine, trying to understand shader ops and how they work through a graphics pipe. But what you suggest would keep those VS active, and would explain better why NVidia/Sony went with that level of redundancy.

[snip]


Well, in a PC-centric architecture you'd ideally start doing VS operations first and then you would jump to PS ones, yes...
I was referring to the end of the VS pipeline. In my mind, the SPEs' power will mainly be used for per-vertex operations when doing graphical tasks... And although the CELL design allows the SPEs to operate on data that the GPU has already processed, I see it as much more logical to start those vertex operations on them, and later use the VS of the RSX.
Just my personal vision...
 

Wunderchu

Member
Kleegamefan said:
Apparently they are doing this to partially offset the performance disadvantage of the GF 7800 GTX vs. RSX...

[snip]

*disclaimer* THIS IS ONLY MY TAKE ON THINGS.....I AM ONLY STATING THINGS AS I UNDERSTAND IT AND HOW IT WAS PRESENTED/EXPLAINED TO ME....I AM NOT A CE ENGINEER BUT AM HAPPY TO BE CORRECTED WHERE I AM WRONG AS IT ALLOWS ME TO LEARN MORE */disclaimer*
thanx for that post, Kleegamefan .. IMO, it's always nice to read why a person feels the way they do :) .. although I believe some of the info you present is not entirely correct .. for example, from the way I am reading your post, it seems to me that you are stating that RSX's 8 vertex shaders will offer more vertex shading power than Xenos, even if Xenos dedicates all 48 ALUs to vertex shading.... there is no way this is true, AFAIK.... if all of Xenos' ALUs are dedicated to vertex shading, it will most definitely be capable of more vertex shading processing than RSX.. However, as you state, if this is done, Xenos has no ALUs left for pixel shading ... and it does indeed seem that CELL has quite a bit of potential for vertex shading, significantly more so than XeCPU.. so, in the end, RSX+CELL likely has more vertex shading capability than Xenos, and as you say, this still leaves RSX with all of its pixel shader pipes to do pixel shading...
 

sangreal

Member
The 500M polygons figure for xenos is a hard limit of the triangle setup engine and is unrelated to the theoretical peak vertex processing of the 48 ALUs, from what I understand. Also, the 500M polygons is meant to be achievable with "non-trivial" shaders. I've read an estimate (by ERP @ b3d, I believe) that it would be achievable with xbox1 level shaders.
 

ourumov

Member
The 500M polygons figure for xenos is a hard limit of the triangle setup engine

AFAIK setup performance refers to the maximum number of polys that the GPU as a rasterizer is able to represent... 500M doesn't sound too hot... Are you sure about this?
 

sangreal

Member
ourumov said:
AFAIK Setup performance refers to the maximum number of polys that the GPU as a rasterizer is able to represent...500M doesn't sound too hot...Are you sure of this ?

I'm definitely no expert when it comes to 3D hardware, that's just the understanding I got from reading a bit about it today.

So no, I'm not sure of anything :)
 

Cuth

Member
Kleegamefan said:
Apperently they are doing this to partially offset the performance disadvantage of the GF 7800 GTX vs. RSX...

The GF 7800 GTX has a 256-bit bus *but* it is slower than RSX (by over 100Mhz) and probably has less features too(both the GF 7800 and RSX have ~300M transistors, but RSX has no need for trasistors dedicated to NVIDIA PureVideo technology, so that could net us 30 or 40M more transistors for something RSX *could* use instead....more AA or HDR logic, for example) ...not to mention it cant use FlexIO to its full potential either.....the ability to use XRDRAM as VRAM (via FlexIO) in PS3 more than offsets the 256-bit bus in the GF 7800 GTX IMO.....

Sure, some developers will simply use GDDR3 for VRAM and not touch XRDRAM or Cell for graphic work, but these probably won't be be the best looking games anyway, so the lack of a 256-bus wont matter in the end....
I think the same. :)

Remember, XRDRAM in PS3 will run at 3.2Ghz, which is more than 4 1/2 times faster than GDDR3, which runs at 700Mhz....
[...]
First of all, XeCPU<=>Xenos/C1 pipe is only 2/3rds as fast as the same pipe on PS3, plus the fastest non-embedded ram is ~20% as fast as XRDRAM....
You should use both the clock and the bus width when comparing memory performances. Better yet, bandwidth (clock * bus) and latency.
Your words seem to imply that GDDR3 is waaay underpowered compared to XDR, because you just talk about the clock... I don't know exactly about latency (AFAIK XDR should be better than GDDR there), but the bandwidth is almost the same: GDDR3 = 22.4GB/s, XDR = 25.6GB/s
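(Cuth's point in numbers, as a sketch; the bus widths are assumptions based on commonly cited specs.)

# Python: bandwidth = per-pin data rate x bus width; clock alone misleads
gddr3_gbs = 1.4 * 128 / 8   # 700 MHz DDR = 1.4 Gbps/pin, 128-bit bus -> 22.4
xdr_gbs   = 3.2 * 64 / 8    # 3.2 Gbps/pin, 64-bit interface -> 25.6
print(gddr3_gbs, xdr_gbs)   # 22.4 vs 25.6 GB/s: almost the same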

Xenos can "lock" 1/3rd of the L2 cache in XeCPU, giving it a dedicated pipe back to the CPU but again, this cuts off 2/3rd of the cache, meaning that when this is going on, the XeCPU can only use a max of 4 hardware threads instead of 6.....
Are you sure about this?

Also, Xenos/C1 can do, it seems, 500M pps (theoretical max)....RSX is estimated to do 1.18B vertices (theoretical max)....As I understand it, 2 verts = 1 poly, so we are similar in poly counts on both GPUs....or are we??

[snip]
Again... :D
Like the previous message, you're using the triangle setup limit of Xenos as if it were the vertex shader limit... They're two different things.

To create a 3D image, the vertex coordinates first need to be calculated in 3D space; that's done by the vertex shaders and the CPU. Then the 3D coordinates are transformed into a 2D space (since the frame buffer and monitors are 2D), and the triangle setup hardware takes care of that.

The Xenos shaders can process something like 6 billion vertices (that's using all the shaders for it; obviously we won't see that in a real application, but even with 1/3 or 1/2 of the shaders used for vertices, I'd say the resulting value (2 or 3 billion) is not bad at all).

Now, the triangle setup limit of Xenos is 500 mpps, so the Xbox 360 will never be able to display more than 500 mpps on screen, but the vertex shader power is higher than that, so the bottom line is: it can get near the 500 mpps limit not only theoretically, but even in a real application, using several vertex operations per vertex (and using 1/2 or 2/3 of the shaders for pixels).

Do you know if the 1.18 Bpps "theoretical max" of RSX is related to the triangle setup or to the vertex shaders?
In the latter case, the real-world performance (I mean in a game) is surely way lower.
(EDIT: just noticed you said it's the vertex shader limit :) )
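(A sketch of the setup-vs-shader distinction, using only figures from this thread: the deliverable polygon rate is capped by whichever stage is slower.)

# Python: min() of shader throughput and setup limit
def deliverable(shader_verts_per_sec, setup_tris_per_sec, verts_per_tri=1.0):
    return min(shader_verts_per_sec / verts_per_tri, setup_tris_per_sec)

xenos_shader_peak = 6.0e9   # Cuth's "something like 6 billion" all-ALU figure
xenos_setup_limit = 0.5e9   # the 500M triangle setup limit
# Even a third of the ALUs (2 billion verts/sec) saturates triangle setup:
print(deliverable(xenos_shader_peak / 3, xenos_setup_limit))  # 5e8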
 

j^aws

Member
500 million vertices/sec is Xenos triangle setup limit.

860 million vertices/sec is G70 triangle setup limit.

*IF* G70 @ 430 MHz is extrapolated to 550 MHz for RSX, then,

1100 million vertices/sec will be the triangle setup limit for RSX.

Xenos' ALUs can help it approach the 500 Mvert/sec limit. CELL's SPUs *should* help approach the 1100 Mvert/sec. However, available system memory will likely be the limiting factor...
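(The clock extrapolation above, spelled out as a one-line sketch.)

# Python: scale G70's setup rate by the RSX/G70 clock ratio
g70_setup, g70_clock, rsx_clock = 860e6, 430e6, 550e6
print(g70_setup * rsx_clock / g70_clock / 1e6, "Mvert/sec")  # 1100.0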
 

Kleegamefan

K. LEE GAIDEN
Thanks for the feedback, guys :)

I was stating what I knew to the best of my knowledge, but it seems I have learned a few things today....thanx :D


BTW j^aws....can you help me get on the B3D console forum??

I only want to lurk but my password doesn't work anymore and *ray charles* I CANT SEE SHIT*/ray charles*

Thanx :)
 

j^aws

Member
Kleegamefan said:
...
BTW j^aws....can you help me get on the B3D console forum??

I only want to lurk but my password doesn't work anymore and *ray charles* I CANT SEE SHIT*/ray charles*

Thanx :)

[image: aga.jpg]


That's right. There is no forum.

Upgraded visors will not suffice.

Anyway, I've asked...if I hear anything, check your PM but you'll probably need to re-register... ;)
 

gofreak

GAF's Bob Woodward
Kleegamefan said:
Also note that thanks to FlexIO, it is possible to reverse the RAM configuration....that is... you can have CELL use GDDR3 as main RAM and have RSX use ultra-fast XDR DRAM as VRAM.........Now, since those pools of RAM are logically further away from each other, I am not sure how much latency this would introduce, but the pros might outweigh the cons....PS3 developers might not even know the answer to this question for sure until the PS3 Reference Tools arrive in December, so who knows..

Can't say for sure, of course, but the GPU may be less latency-sensitive than the CPU. On the other hand, I think you'll want to limit Cell's access to its own XDR memory for most things.

Kleegamefan said:
Also, when Xenos is accessing XeCPU, it is doing so via the L2 cache, vs. RSX, which has a dedicated pipe to XDR DRAM and can also communicate with each SPU in CELL + the 256KB SRAM of each...

RSX doesn't have a dedicated pipe to XDR - it goes through Cell (or to be more accurate, the Element Interconnect Bus) to get to XDR. RSX->FlexIO->Cell (EIB)->XDR. When RSX is accessing the SPUs' local sram or XDR, it's through FlexIO.
 

xexex

Banned
j^aws said:
500 million vertices/sec is Xenos triangle setup limit.

860 million vertices/sec is G70 triangle setup limit.

*IF* G70 @ 430 MHz is extrapolated to 550 MHz for RSX, then,

1100 million vertices/sec will be the triangle setup limit for RSX.

Xenos' ALUs can help it approach the 500 Mvert/sec limit. CELL's SPUs *should* help approach the 1100 Mvert/sec. However, available system memory will likely be the limiting factor...


correction: Xenos triangle setup limit is 500 million triangles / polygons per second, or 1500 million vertices per second


if RSX's triangle setup limit is 1100 million vertices per second @ 550 MHz, then it's 366 million triangles/sec.

now I realize that I could be totally wrong in what I just posted. but that's how I *think* it is, or thereabouts. corrections to my post are welcome :)

in both cases, you can actually have an almost 1:1 ratio of polygons to vertices, so both Xenos and RSX could be over 1 billion. but still somewhat more for Xenos (1.5B vs 1.1B)
 

gofreak

GAF's Bob Woodward
xexex said:
correction: Xenos triangle setup limit is 500 million triangles / polygons per second, or 1.5 billion vertices per second

Triangles can = vertices. You could have 500m triangles out of ~500m vertices. It's likely that's how they're counting it.

The ~1bn figure for RSX is likely the peak transform rate for vertices, not the setup rate. Akin to the multi-billion peak figure for Xenos if you spent all your resources transforming vertices.

Usually the triangle setup figure is synched with the clock rate - usually 1 per clock (which is where 500m for Xenos comes from). Assuming the same for RSX, it'd be 550m. Although I don't think Nvidia disclosed the triangle setup rate for the G70, except to say they had improved its efficiency a lot over the NV40.

edit - actually, I can't remember if typically gpus used to transform 1 vertex per clock or setup one triangle per clock. So take the last paragraph above with a grain of salt for now.
 

Vince

Banned
xexex said:
correction: Xenos triangle setup limit is 500 million triangles / polygons per second, or 1500 million vertices per second


[snip]

Regardless, you're hyperinflating ATI's PR number in a way which is incorrect. As Gofreak correctly stated, [1 triangle] = [1 vertex] in these comparisons. For example, ATI routinely states triangle rates, but they always are in a 1:1 correspondence with the clock rate, which is indicative of having the logic to set up one vertex per cycle. There is no 3X inflation, that's just wrong dude.

The R500 can set up one per clock, the G70 can do 2. This yields 500 MVertices/sec for the R500 and 1.1 BVertices/sec for the RSX. Correct me if wrong...

edit - actually, I can't remember if typically gpus used to transform 1 vertex per clock or setup one triangle per clock. So take the last paragraph above with a grain of salt for now.

Shit, now I'm doubting myself, so removed my paragraph :) I'm not sure, the DDA works on triangles... Where's Faf or nAo? heh.

I'm still thinking the more proper metric is vertices per unit time, as the hardwired functionality can transform/output a fixed N number of vertices per clock, whereas triangle rates are variable and situational, dependent on the connectivity of the mesh. Although, if you're evaluating triangles -- which are variable: are they independently listed, in a mesh, or in a fan -- I suppose additional things factor in, such as caching the connectivity data to exploit spatial or temporal locality. But I don't know, so I default to the above two...
 

j^aws

Member
xexex said:
correction: Xenos triangle setup limit is 500 million triangles / polygons per second, or 1500 million vertices per second

Nope. They're counting 1 poly = 1 vertex. So 500 MPolys/sec = 500 MVertices/sec.

Check the 'leaked text/diagram' from last year or the 'recent leak'...

xexex said:
if RSX's triangle setup limit is 1100 million vertices per second @ 550 MHz, then it's 366 million triangles/sec.

now I realize that I could be totally wrong in what I just posted. but that's how I *think* it is, or thereabouts. corrections to my post are welcome :)

Nope. You keep dividing by '3', which doesn't represent a scene. What you're effectively doing is saying that every poly/triangle is an 'orphan' and it doesn't represent a mesh...

xexex said:
in both cases, you can actually have an almost 1:1 ratio of polygons to vertices, so both Xenos and RSX could be over 1 billion. but still somewhat more for Xenos (1.5B vs 1.1B)

Nope. My original statement still stands.

500 MVertices/sec = Xenos
860 MVertices/sec = G70
1100 MVertices/sec = RSX
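(A sketch of why shared meshes approach one vertex per triangle while "orphan" triangles cost three, using a regular grid as the example mesh.)

# Python: vertices-per-triangle for an n x n quad grid vs. a triangle list
def grid_verts_per_tri(n):
    verts = (n + 1) ** 2   # grid vertices
    tris = 2 * n * n       # two triangles per quad
    return verts / tris

print(grid_verts_per_tri(1))    # 2.0 (tiny mesh: little sharing)
print(grid_verts_per_tri(100))  # ~0.51 (big mesh: heavy vertex sharing)
print(3.0)                      # independent triangle list: 3 verts/tri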
 

Pimpwerx

Member
Triangle setup rate for the G70 is 2 tris/clock or 860Mpps for the 430MHz part according to NVidia's spec sheet.

Extrapolate that for a 550MHz RSX, and that's 1.1Bpps. Don't bet on seeing that in a game...EVER. ;) PEACE.

EDIT: Why does that spec sheet say 24 pixels/clock and a fillrate of 10.32GP? I thought the G70 only had 16 ROPs. Is the RSX looking at a fillrate of 13.2GP? I know some said the number of ROPs would be reduced due to bandwidth, but that's not really the case if you take the aggregate. That spec sheet clearly shows 24 ROPs. Thoughts?
 

Wunderchu

Member
Pimpwerx said:
Triangle setup rate for the G70 is 2 tris/clock or 860Mpps for the 430MHz part according to NVidia's spec sheet.

[snip]
hm.. interesting
 

j^aws

Member
Pimpwerx said:
[snip]

EDIT: Why does that spec sheet say 24 pixels/clock and a fillrate of 10.32GP? I thought the G70 only had 16 ROPs. Is the RSX looking at a fillrate of 13.2GP? I know some said the number of ROPs would be reduced due to bandwidth, but that's not really the case if you take the aggregate. That spec sheet clearly shows 24 ROPs. Thoughts?

G70 has 16 ROPs not 24 ROPs.

http://www.beyond3d.com/misc/chipco...d=106&orderby=release_date&order=Order&cname=

Its fillrate is 6.88 GPixels/sec, not 10.32 GPixels/sec. I presume the figure above is a typo, because it assumes the same number of ROPs as fragment pipes (24)...
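(The fillrate arithmetic in question, as a sketch: fillrate = ROPs x core clock.)

# Python: reproduce both figures from the spec-sheet confusion
def fillrate_gpix(rops, clock_mhz):
    return rops * clock_mhz / 1000

print(fillrate_gpix(16, 430))   # 6.88 GPix/s  (actual G70: 16 ROPs)
print(fillrate_gpix(24, 430))   # 10.32 GPix/s (the spec sheet's 24-ROP assumption)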
 