
360 geek stuff. Someone please explain...

cyberheater

PS4 PS4 PS4 PS4 PS4 PS4 PS4 PS4 PS4 PS4 PS4 PS4 PS4 PS4 PS4 PS4 PS4 Xbone PS4 PS4
From someone at IGN forums.

conference attendee posted:
Hardware Specs

Triple-Core 3.2 GHz custom CPU
- shared 1MB L2 Cache
- customized vector floating point unit per core
- 5.4 Gbps FSB: 10.8 GB/s read and 10.8 GB/s write
** GPU can read from L2
500 MHz custom GPU
- 48 parallel unified shaders
- 10 MB embedded DRAM for frame buffer: 256 GB/sec
512 MB unified memory (700 MHz GDDR3: 22.4 GB/s)
12x Dual layer DVD
20 GB Hard drive
High Def video out.
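
As a rough cross-check of the main memory figure above, here is a minimal sketch of the arithmetic. The 128-bit bus width and the two transfers per clock (double data rate) are assumptions that the post does not state; with them, the quoted 22.4 GB/s follows from the 700 MHz clock.

```python
def gddr3_bandwidth_gb_s(clock_mhz, bus_bits=128, transfers_per_clock=2):
    """Peak bandwidth in GB/s (1 GB = 1e9 bytes).

    bus_bits and transfers_per_clock are assumed values, not from the post.
    """
    bytes_per_transfer = bus_bits / 8
    return clock_mhz * 1e6 * transfers_per_clock * bytes_per_transfer / 1e9


main_memory = gddr3_bandwidth_gb_s(700)   # 700 MHz from the post
fsb_total = 10.8 + 10.8                   # read + write figures from the post

print(f"main memory: {main_memory:.1f} GB/s")
print(f"FSB, both directions combined: {fsb_total:.1f} GB/s")
```

Under those assumptions the combined FSB traffic stays just below what main memory can supply.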

System Block Diagram (I'm just going to list the stuff hanging off the IO chip)
DVD (SATA)
HDD (SATA)
Front Controllers (2 USB)
Wireless Controllers
MU ports (2 USB)
Rear Panel USB
Ethernet
IR
Audio Out
FLASH
System Control
Video Out (hanging off of a separate Analog chip)

CPU: PPC Core Specs
* 3 3.2 GHz PowerPC cores
* Shared 1 MB L2 cache, 8-way associative
* Per-Core features
- 2 issue per cycle, in-order, decoupled vector/scalar issue queue
- 2 symmetric fine grain hardware threads
- L1 Caches: 32K 2-way I$ / 32K 4-way D$
- Execution Pipelines
-- Branch Unit, Integer Unit, Load/Store Unit
-- VMX 128 Units: Floating Point Unit, Permute Unit, Simple Unit
-- Scalar FPU
* VMX128 enhanced for game and graphics workloads
-- all execution units 4-way SIMD
-- 128 128-bit vector registers per thread
-- custom dot-product instruction
-- native D3D compressed data formats

CPU Data Streaming Specs
* High bandwidth data streaming support with minimal cache thrashing
- 128B cache line size (all cache)
- Flexible set locking in L2
- Write streaming:
* L1s are write through, writes do not allocate in L1
* 4 uncacheable write gathering buffers per core
* 8 cacheable, non-sequential write gathering buffers per core
- Read Streaming:
* xDCBT data prefetch around L2, directly into L1
* 8 outstanding load/prefetches per core
- Tight GPU data streaming integration (XPS)
* XPS -- "Xbox Procedural Synthesis"
* GPU 128B read from L2
* GPU low latency cacheable writebacks to CPU
* GPU shares D3D compressed data formats with CPU => at least 2x effective bus bandwidth for typical graphics data.

GPU Specs
* 500 MHz graphics processor
- 48 parallel shader cores (ALUs); dynamically scheduled, 32-bit IEEE floating point
- 24 billion shader instructions per second
* (super scalar design; scalar and texture ops per instruction)
- Pixel fillrate: 4 billion pixels/sec (8 per cycle); 2x for depth / stencil only
* AA: 16 billion samples/sec; 2x for depth / stencil only
- Geometry rate: 500 million triangles/sec
- Texture rate: 8 billion bilinear samples / sec
* 10 MB EDRAM -> 256 GB/s fill
* Direct3D 9.0 compatible
- High Level Shader Language (HLSL) 3.0+ support
* Custom features
- Memory export; Particle physics, subdivision surfaces
- Tiling acceleration: full resolution Hi-Z, Predicated Primitives
- XPS:
* CPU cores can be slaved to GPU processing
* GPU reads geometry data directly from L2
- Hardware scaling for display resolution matching
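
The throughput numbers in the GPU list can be recovered from the 500 MHz clock. This is a minimal sketch: only the 48 ALUs and the 8 pixels per cycle are stated in the post, while the other per-clock counts (4 AA samples per pixel, 16 bilinear samples per clock, 1 triangle per clock) are assumptions obtained by dividing the quoted rates by the clock.

```python
GPU_CLOCK_HZ = 500e6  # 500 MHz, from the post


def per_second(units_per_clock, clock_hz=GPU_CLOCK_HZ):
    """Peak rate if the given number of units completes every clock."""
    return units_per_clock * clock_hz


shader_rate = per_second(48)       # 48 ALUs (from the post)
pixel_rate = per_second(8)         # 8 pixels per cycle (from the post)
sample_rate = per_second(8 * 4)    # assumes 4 AA samples per pixel
texture_rate = per_second(16)      # assumes 16 bilinear samples per clock
triangle_rate = per_second(1)      # assumes 1 triangle per clock

for name, rate in [("shader instructions", shader_rate),
                   ("pixels", pixel_rate),
                   ("AA samples", sample_rate),
                   ("bilinear samples", texture_rate),
                   ("triangles", triangle_rate)]:
    print(f"{name}: {rate / 1e9:g} billion/sec")
```

These are peak figures; nothing here says how close a real game gets to them.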


Architectural Choices
* FSAA, alpha and z place heavy load on memory BW
* Post-process effects require large depth complexity
* Enable flexible UMA solution
* Main Memory FB/ZB => unpredictable performance
* Solution: take FB/ZB fill-rate out of the equation
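
To see why a 10 MB EDRAM frame buffer goes together with the tiling acceleration listed under the GPU features, here is a minimal sketch of the size arithmetic for a hypothetical 1280x720 target. The 4 bytes of colour and 4 bytes of depth/stencil per sample, and reading "10 MB" as 10 MiB, are assumptions; the post gives neither.

```python
import math

EDRAM_BYTES = 10 * 1024 * 1024  # 10 MB from the post, taken as MiB (assumption)


def framebuffer_bytes(width, height, aa_samples, bytes_color=4, bytes_depth=4):
    """Colour + depth storage for every AA sample (byte sizes are assumed)."""
    return width * height * aa_samples * (bytes_color + bytes_depth)


def tiles_needed(width, height, aa_samples):
    """How many EDRAM-sized passes it takes to cover the whole frame."""
    return math.ceil(framebuffer_bytes(width, height, aa_samples) / EDRAM_BYTES)


for samples in (1, 2, 4):
    size_mib = framebuffer_bytes(1280, 720, samples) / (1024 * 1024)
    print(f"{samples}x AA: {size_mib:.1f} MiB, {tiles_needed(1280, 720, samples)} tile(s)")
```

With these assumed sizes the frame fits without AA, but has to be split into tiles once multisampling is enabled.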

Software
* SMP/SMT
- Mainstream techniques
- Everything is simplified by being symmetric
* UMA
- No partitioning headaches
* OS
- All 3 cores available for game developers
* Standard APIs
- Win32, OpenMP
- Direct3D, HLSL
- Assembly (CPU & Shader) supported - direct hardware access
* Standard tools
- XNA; PIX, XACT
- Visual C++, works with multiple threads

I didn't know that the 360 can slave geometry across the CPU. Is that a big deal?

And.

The Xbox 360 CPU (Xenon) and Xbox 360 GPU (Xenos) both natively understand Direct3D Compressed data formats, just like they understand typical "double" and "floats", etc. There's zero performance hit in using these compressed formats due to the custom circuitry employed, and it effectively halves the bandwidth requirements for sending graphics data back/forth between the CPU and GPU. The official claim is the compressed D3D formats provide the equivalent of 20GB/s more bandwidth for "free".
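
A minimal sketch of where the "halves the bandwidth" idea comes from. The post does not list the actual D3D compressed formats, so 16-bit half floats are used here purely as a stand-in (an assumption): the same four-component vertex takes half the bytes on the bus.

```python
import struct

# A hypothetical 4-component vertex; values chosen to be exact in 16-bit floats.
vertex = (0.5, -1.25, 2.0, 1.0)

full = struct.pack("<4f", *vertex)   # four 32-bit floats
half = struct.pack("<4e", *vertex)   # four 16-bit half floats (stand-in format)

unpacked = struct.unpack("<4e", half)

print(len(full), "bytes as 32-bit floats")
print(len(half), "bytes as 16-bit floats")
print("round trip:", unpacked)
```

On ordinary hardware the conversion costs instructions; the claim in the post is that the custom circuitry makes that step free.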

Sounds good to me, but I'm not hot on hardware...

Was this known or is this new? And what are the implications?
 

cyberheater

PS4 PS4 PS4 PS4 PS4 PS4 PS4 PS4 PS4 PS4 PS4 PS4 PS4 PS4 PS4 PS4 PS4 Xbone PS4 PS4
And

conference attendee posted:
Xbox 360 talk, explaining design decisions and System Architecture

by Jeff Andrews and Nick Baker

(I'll post slides in a week if they are not published on the net by then)


* All games are required to support 720p (I think I heard that right)

-----
CPU
-----

* Most of the time was spent enhancing the VMX-128 units for graphics purposes.

* XPS -- small amount of read data to generate lots and lots of geometry. (used as a "decompression" algorithm)

* GPU write back to CPU is to indicate that the GPU is done reading data.

* D3D compressed data formats were customized into both the VMX units and into the GPU.

* prefetching reads can go into L1 and skip L2. Writes can skip L1 and go to L2 (this is to avoid thrashing)

* Claim: the compressed D3D effectively adds an extra 20GB/s bandwidth.
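
The XPS bullet above ("small amount of read data to generate lots and lots of geometry") can be illustrated with a toy sketch. Everything here is hypothetical: the `grow_grass` function, the blade shape and the counts are made up for illustration and are not the Xbox 360 API.

```python
import random


def grow_grass(seeds, blades_per_seed=100, rng_seed=0):
    """Expand a few (x, z) seed points into many blade triangles."""
    rng = random.Random(rng_seed)  # fixed seed so the output is repeatable
    vertices = []
    for x, z in seeds:
        for _ in range(blades_per_seed):
            bx = x + rng.uniform(-0.5, 0.5)
            bz = z + rng.uniform(-0.5, 0.5)
            height = rng.uniform(0.2, 0.4)
            # One thin triangle per blade: two base points and a tip.
            vertices.extend([(bx - 0.01, 0.0, bz),
                             (bx + 0.01, 0.0, bz),
                             (bx, height, bz)])
    return vertices


seeds = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
geometry = grow_grass(seeds)
print(len(seeds), "seeds in,", len(geometry), "vertices out")
```

The input is tiny and the generated vertices never need to be stored in main memory if the consumer reads them as they are produced.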

-----
GPU
-----

* The added EDRAM allows main memory to be dedicated to texture and vertex data (read only). This makes things easier for main memory.
 

Blimblim

The Inside Track
Nothing really new. IMHO it means that developers will take a long time to really get the maximum out of the Xbox 360 architecture. If they use it like Xbox and just code for Direct 3D then the games certainly won't be able to use some of the finest points of the architecture.
 

gofreak

GAF's Bob Woodward
cyberheater said:
I didn't know that the 360 can slave geometry across the CPU. Is that a big deal?

Nah, CPUs have been doing geometry in consoles foreva. Any CPU can do it really, it's a better use for most CPUs on the graphics front compared to the other end of the pipeline (rasterisation). Even if you do TnL on the GPU, CPUs still do a good bit of geometric work (with collision detection etc). The cache synching is if you're going to use the CPU for vertex processing, but we've known about that for some time now.
 

cyberheater

PS4 PS4 PS4 PS4 PS4 PS4 PS4 PS4 PS4 PS4 PS4 PS4 PS4 PS4 PS4 PS4 PS4 Xbone PS4 PS4
Yes, but isn't the big deal that the GPU can pick up geometry data straight from the L2 cache rather than slower main memory?
 

COCKLES

being watched
It's certainly a step up from the 'plonk a cut down PC into a black box' of the Xbox.

But MS could take a tip from Sony - while the hardware is impressive, you gotta get the buzzwords down.

3D-Ynamic Engine!
Triple Whammy Processing!
Roots Engine - We got dat processor slaved out!
 

gofreak

GAF's Bob Woodward
cyberheater said:
Yes, but isn't the big deal that the GPU can pick up geometry data straight from the L2 cache rather than slower main memory?

No, the L2 cache would be as far away from the GPU as main memory really, if not even a little more. What the GPU can do is basically tell the CPU when it's ready for more, and data can be sent then directly from the cache, which is about synchronisation more than read-speed.
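
A toy model of that synchronisation point, as a sketch only: a small fixed number of slots stands in for the locked part of the cache, the "CPU" side refills a slot only after the "GPU" side has finished reading one. The `stream` function and the slot count are hypothetical, not how the hardware is actually programmed.

```python
from collections import deque


def stream(chunks, slots=2):
    """Pass chunks through a fixed-size buffer; return (consumed, peak occupancy)."""
    pending = deque(chunks)   # data the CPU still has to generate
    locked = deque()          # stands in for the locked cache region
    consumed = []
    peak = 0
    while pending or locked:
        # CPU side: fill whatever slots the GPU has released.
        while pending and len(locked) < slots:
            locked.append(pending.popleft())
        peak = max(peak, len(locked))
        # GPU side: read one chunk, which frees its slot (the "done reading" signal).
        consumed.append(locked.popleft())
    return consumed, peak


consumed, peak = stream(list(range(5)), slots=2)
print("order seen by GPU:", consumed)
print("most slots in use at once:", peak)
```

The point of the handshake is that the buffer never grows past its fixed size, however much data is streamed through it.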
 

Kleegamefan

K. LEE GAIDEN
I didn't know that the 360 can slave geometry across the CPU. Is that a big deal?


Apparently, Xenos/C1 can lock 1/3 of the 1MB L2 cache for main ram-GPU streaming via a process known as MEMEXPORT (pushing/pulling vectorised data directly to and from system RAM)...this bi-directional CPU<=>GPU action is one of the things that define the next generation of consoles, IMO (and since they have so much more Memory BW than PCs, which are limited to the PCI-Express bus, this sets them apart from them too)...

When doing this, you get a bi-directional pipe between the XeCPU and Xenos/C1 (10.8 GB/sec in each direction). Here is a little more about it:

http://www.beyond3d.com/articles/xenos/index.php?p=03

As the CPU is going to be using Xenos to handle all its memory transfers, the connection between the two has 10.8GB/s of bandwidth both upstream and downstream simultaneously. Additionally the Xenos graphics processor is able to directly lock the cache of the CPU in order to retrieve data directly from it without it having to go to system memory beforehand. The purpose of this is that one (or more, if wanted) of the three CPU cores could be generating very high levels of geometry that the developer doesn't want to, or can't, preserve in the memory footprints available on the system when in use. High-resolution dynamic geometry such as grass, leaves, hair, particles, water droplets and explosion effects are all examples of one type of scenario that the cache locking may be used in.

[Image: bandwidths.gif]


It seems this will bring about a lot of flexibility in the X360 hardware....it seems to be really awesome in this regard....

The downside is that when you lock 1/3 of the L2 for MEMEXPORT functionality, you only have access to 2 of the 3 PPC cores (4 instead of 6 hardware threads), so that is one possible cost of having both XeCPU and Xenos/C1 fighting over just 1 MB of L2 cache...


The PLAYSTATION 3 architecture is also designed around CPU<=>GPU functionality but there are a few key differences:


[Image: kaigai02l.gif]


Here is a translated article:

The big special characteristic of PS3 Graphics is the connection between Cell and RSX. The RSX itself has a similar architecture to the G70, but the host interface for the G70 is meant for the PC and is completely different. The G70 uses PCI Express x16 to connect to the chipset as 8GB/sec (4GB/sec one-way), and it cannot directly access main memory. In contrast, the RSX has a 35GB/sec (20GB/sec down, 15GB/sec up) direct connection to the Cell, and can directly render from the main memory on the Cell side.
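
To put those link speeds in per-frame terms, here is a minimal sketch. The 60 frames per second target is an assumption, and 1 GB is taken as 1000 MB; the bandwidth figures are the ones quoted in the paragraph above.

```python
def mb_per_frame(gb_per_s, fps=60):
    """Peak data that can cross a link during one frame, in MB (fps is assumed)."""
    return gb_per_s * 1000 / fps


links = {
    "PCI Express x16, one way": 4,
    "Cell/RSX link, down": 20,
    "Cell/RSX link, up": 15,
}

for name, gb_per_s in links.items():
    print(f"{name}: {mb_per_frame(gb_per_s):.1f} MB per frame")
```

At peak rates the wider direction of the Cell/RSX link carries five times the per-frame data of one direction of PCI Express x16.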

This is a big difference, because it allows a completely different way of using the GPU from PC architectures, SCEI explained. First of all, because the bus is wider, the Cell can perform a great amount of geometry operations, then send the vertex data [to the RSX]. Conversely, the RSX side can easily send data back to the Cell.

“The Cell processor can do both pre-processing and post-processing. For example, tessellation, dot filling, etc… Cell can perform physics processing like collision and motion calculations, and transform the vertex array.” said David B. Kirk, Chief Scientist of nVidia.

SCEI basically expects higher abstraction levels to be processed by Cell, and the details (like vertices and pixels) to be processed by the GPU. This is reasonable: for example, when the CPU side handles geometry transformation, collision detection, which is important in games, is not a problem. When the GPU handles geometry transformation, clipping issues may occur if the data is not sent back to the CPU. In the case of the PS3, the Cell side can perform transformations, and even if the GPU is used for transformation, it is comparatively easy to send the data back to the CPU side.

In architectures up to now, either the CPU or the GPU has been the bottleneck. If this is not resolved, we cannot go any further. To address this, in the PS3 architecture, if the GPU becomes the bottleneck it can shift work to the Cell, and if the Cell becomes the bottleneck it can send work to the GPU. For example, depending on the software, the Cell side can perform more graphics processing or, conversely, easily leave the graphics work to the GPU, it was explained. In summary, a flexible balance can be struck between the two programmable processors, the CPU and the GPU.

In previous PC architectures, because they were limited by the CPU<->GPU pipe, geometry operations were held to a certain limit, and how rich an environment you can create within that limit became the main technical challenge. In contrast, the PlayStation2-type game consoles created large amounts of polygons, but after that it did not have the expressiveness of PCs. (Trans. note: probably means that PS2 is less capable in applying different effects to polygons than the PC, despite pumping out more polygons.) In the case of the PS3, both are possible, with the flexibility to balance the two.


http://pc.watch.impress.co.jp/docs/2005/0722/kaigai199.htm


Basically, this is similar to the MEMEXPORT function in X360 except the pipe between CELL and RSX is faster (20GB/sec downstream, 15GB/sec upstream, 35GB/sec total) and it can be divvyed up per SPE on CELL..you don't have to worry about thrashing in and out of the L2 cache when doing this since each SPE has its own local 256KB SRAM and L2 doesn't have to be touched at all, it seems...

It is for these reasons (and a few others) that I don't see PCs catching up with these consoles anytime soon....they don't have enough Memory BW to do what the X360 and PS3 can do (CPU<=>GPU synergy-wise, at least)

That is my take on it...
 

gofreak

GAF's Bob Woodward
Kleegamefan said:
Apparently, Xenos/C1 can lock 1/3 of the 1MB L2 cache for main ram-GPU streaming via a process known as MEMEXPORT (pushing/pulling vectorised data directly to and from system RAM)...

AFAIK, Memexport is for writing to RAM, it has nothing to do with cache. The GPU can read directly from L2, but can only write back to it to indicate that it's finished reading (people hypothesised before about memexport straight into L2 cache but this would not appear to be the case).

The RSX<->Cell relationship isn't akin to memexport btw.
 

Kleegamefan

K. LEE GAIDEN
gofreak said:
AFAIK, Memexport is for writing to RAM, it has nothing to do with cache. The GPU can read directly from L2, but can only write back to it to indicate that it's finished reading (people hypothesised before about memexport straight into L2 cache but this would not appear to be the case).

The RSX<->Cell relationship isn't akin to memexport btw.


I stand corrected :)
 
Can someone elaborate on the Xbox Memory card slots? We know it's USB, but is it the USB form factor, or a custom shell like the Xbox controllers were?

BTW: Anyone have the size of the Memcards?
 

cyberheater

PS4 PS4 PS4 PS4 PS4 PS4 PS4 PS4 PS4 PS4 PS4 PS4 PS4 PS4 PS4 PS4 PS4 Xbone PS4 PS4
Standard 360 memcard is 64 MB (from the leaked brochure)...
 
cyberheater said:
Standard 360 memcard is 64 MB (from the leaked brochure)...
I should have been more elaborate, sorry.

What is the physical size of the Memory Unit, not storage :p

I remember a picture of a MU compared to 2 quarters. Does anyone have the pic?
 