
The official GAF parallelism demystification thread

fart

Savant
I've seen a lot of these Deadmeat-esque threads about estimated performance of the newfangled game-playing hoodads, and it saddens me to see so much bunk reasoning being thrown around. If we're going to do this on the boards, we should all sit down and make sure that everyone understands the fundamentals of architecture, and above that, the whos, whats, and whys of parallelism and performance.

Also, I just drank a crapload more caffeine than I should have, so I'm not getting any work done at all.

SO, if you have a question about how math gets done on a computer these days, ask away, and if you have an answer, feel free to jump in! (Please be educated if you plan on chiming in, though; this is about information, not misinformation. If you have some notion that you think is right but aren't sure about, or something you've heard, feel free to mention it, but be explicit about how confident you are in it.)

What this should be:

An informative resource for those in and outside of the field to learn about high and low level concepts and implementation-issues inherent in parallel computing (of all degrees). Examples of great topics of discussion are: thread-abstractions in different languages, historical, modern, and future ILP, multi-core CPUs from the VLSI to the systems level, performance modeling in parallel vs full sequential environments, etc.

What this should not be:

A debate thread. I don't want to talk about whether Ps3.14.. or XBOX about-face is faster than all/some/any competitor products. I especially don't want this to devolve into a fanboy bitchfest or a pseudo-science drivel arena.

I also don't want it to turn into an engineering chest puffing contest. This should be about explaining computer science concepts and making sure everyone is on the same page. If you do have useful anecdotes, please share them, but make sure they're relevant, and try not to portray yourself as some kind of engineering god/dess because you can switch on thread-safe std libs in studio.NET.

Finally: PLEASE try to refrain from the deadmeat math. If you absolutely have to throw numbers around, back everything up, with references if possible.
 

Takuan

Member
fart said:
An informative resource for those in and outside of the field to learn about high and low level concepts and implementation-issues inherent in parallel computing (of all degrees). Examples of great topics of discussion are: thread-abstractions in different languages, historical, modern, and future ILP, multi-core CPUs from the VLSI to the systems level, performance modeling in parallel vs full sequential environments, etc.
For starters, what is parallel computing, ILP, and VLSI?
 

fart

Savant
ILP stands for Instruction Level Parallelism. It refers to design trends in contemporary architectures that attempt to take a single stream of instructions and execute as many of the next n instructions as possible in parallel (or, more accurately, as fast as possible). Examples of techniques that are considered ILP: pipelining, superscalar execution, out-of-order execution, etc.

The basic idea is that you have a machine that takes "one-word" instructions; consider a stream of instructions to a human standing in front of four boxes: A, B, C and D. You tell the human:
put a number into box A
put a number into box B
put the sum of the numbers in A and B into C
put the difference of the numbers in A and B into D
Now, let's say we have as many humans as we want working on this, but only one set of boxes. Naively, we could take one human and execute these instructions in order, sequentially. That would take 4 steps. Now, what if we had two humans? We could execute like so:
human 1 puts a number into box A while human 2 puts a number into box B
human 1 adds A and B and puts the result into C while human 2 subtracts B from A and puts the result into D
That will only take 2 steps. Simple, right? Abstractly, this is multiple-issue/superscalar execution, and one example of ILP. Where it gets complicated is taking an arbitrary stream and determining, based on dependencies (what if the fourth instruction read from box C?), what ordering needs to be maintained to protect semantic correctness.

The key to understanding all these techniques as a group is that they assume a single sequential stream of instructions, and simply try to execute the instructions as quickly as possible while maintaining the same semantics (meaning they produce the exact same results) as would occur if the instructions were executed sequentially.
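
To put the box analogy in code terms, here's a rough C sketch I just threw together (my own illustration, not from any real ISA): the first two assignments are independent of each other, so a dual-issue superscalar core can do them in the same step, while the last two both have to wait on A and B.

/* rough sketch of the box example: a dual-issue machine can do the
   first two assignments in one step and the last two in a second step */
#include <stdio.h>

int main(void) {
    int a = 3;      /* put a number into box A: independent            */
    int b = 5;      /* put a number into box B: independent            */
    int c = a + b;  /* sum into box C: depends on both a and b         */
    int d = a - b;  /* difference into box D: also depends on a and b,
                       but not on c, so it can pair up with the add    */
    printf("C = %d, D = %d\n", c, d);
    return 0;
}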

Parallel computing is kind of a catch-all term right now for any computing model or application that differs from traditional sequentialism. That is, any computation or computing machine which exposes to the user (and user in this case can refer to an end-user or a software developer) multiple computing devices that can be utilized in parallel to execute a single computation (possibly a very complex one) is considered a parallel machine or a parallel application.
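
To make that concrete with a toy example (my own sketch, nothing specific to either console, and the struct/function names are just made up for illustration): one computation, summing an array, handed to two workers with POSIX threads in C.

#include <pthread.h>
#include <stdio.h>

#define N 1000000
static long data[N];

struct chunk { long *start; long count; long sum; };

/* each worker sums its own slice of the array */
static void *sum_chunk(void *arg) {
    struct chunk *c = arg;
    c->sum = 0;
    for (long i = 0; i < c->count; i++)
        c->sum += c->start[i];
    return NULL;
}

int main(void) {
    for (long i = 0; i < N; i++) data[i] = i;

    /* one set of data, two workers: each gets half */
    struct chunk halves[2] = {
        { data,         N / 2,     0 },
        { data + N / 2, N - N / 2, 0 }
    };
    pthread_t t[2];
    for (int i = 0; i < 2; i++)
        pthread_create(&t[i], NULL, sum_chunk, &halves[i]);
    for (int i = 0; i < 2; i++)
        pthread_join(t[i], NULL);

    printf("total = %ld\n", halves[0].sum + halves[1].sum);
    return 0;
}

Same answer as one worker doing all of it, but (ideally) in about half the time, which is the whole point.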

VLSI (Very Large Scale Integration) just refers to the structural level on down in terms of machine issues. That is, from the physical/materials level all the way up to the block level, where chips are portrayed as a set of functional computing units. Hmm.. to be more explicit, computing machines (read: chips) can be looked at at a number of levels of abstraction. At the very bottom-most level, they're just silicon and electricity. At the top-most level they're more akin to input/output black boxes that implement what amounts to a fancy programmable pocket calculator (this is how we're most used to seeing them).
 

fart

Savant
To make this more explicitly relevant to the gaming board, the multi-core chips in the PS3 and Xbox 2 are both relatively highly parallel machines, and so these concepts are fundamental to understanding both performance and software on the new machines.
 

fart

Savant
I read (here, I think) that they didn't have enough space on each core. Microprocessor design is always a game of performance vs. environmental concerns (namely: area and power consumption). I don't think it's a very big issue, honestly. Really, OoO is just another "black box" ILP trick that gives a percentage execution increase on average over any given single instruction stream. Same with the branch prediction units that aren't on the PS3 processing units. Both of these deficiencies can be partially made up in some cases by compiler tricks (which is half of modern ILP, really).
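
As a toy illustration of the kind of compiler trick I mean (my own sketch in C, not actual output from either console's toolchain): an in-order core with no OoO stalls on a long chain of dependent adds, but a scheduling compiler, or a careful programmer, can split the chain into independent ones the hardware can overlap.

/* naive version: every add depends on the previous one, so an
   in-order core can't overlap anything */
long sum_naive(const long *a, long n) {
    long s = 0;
    for (long i = 0; i < n; i++)
        s += a[i];
    return s;
}

/* scheduled version: two independent chains that can run side by side */
long sum_scheduled(const long *a, long n) {
    long s0 = 0, s1 = 0;
    long i;
    for (i = 0; i + 1 < n; i += 2) {
        s0 += a[i];       /* this chain and the next one        */
        s1 += a[i + 1];   /* don't depend on each other at all  */
    }
    if (i < n) s0 += a[i];  /* leftover element when n is odd */
    return s0 + s1;
}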

Furthermore, it is natural for more highly parallel machines to use simpler individual execution units, partially because traditional highly parallel machines have been almost application-specific (SIMD units, for example), and partially because the maximum realizable speedup from a very large number of well-utilized execution units is sky-high (linear, meaning 1/Xth the runtime for X processing units, or even superlinear with some applications) compared to ILP techniques, which were more of a response to the high chip cost and low densities of the recent past (the '80s and '90s).
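
Just to put numbers on "linear speedup" (my own toy figures, not measurements from any real machine): ideal linear scaling simply means the runtime divides by the number of well-utilized units.

#include <stdio.h>

int main(void) {
    double t1 = 16.0;   /* assumed runtime on one unit, in seconds */
    for (int x = 1; x <= 8; x *= 2)
        printf("%d unit(s): %.1f s (speedup %dx)\n", x, t1 / x, x);
    return 0;
}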
 
I don't know about the rest of you, but I feel demystified!

...I think I prefer proper explanations to analogies. Anandtech's stuff, for example, may be harder to follow than "guy puts stuff in box", but at least I can wade through it and see the relevance.

And if you're not going to talk about PS3 or X360 specifically, or even how any of this relates to/improves games, then what's the point? Maybe this should be in the Off-Topic area?
 

fart

Savant
IMO it's important to isolate concepts in the abstract not only because it makes the individual concepts easier to understand, but because at some point you have to start ignoring the levels of abstraction under what you're directly talking about.

The nice thing about using humans and boxes in an ILP example is that there is a very, very thin layer between the analogy and how these concepts are discussed at a glossy architectural level. There is also a fair amount of background required to understand implementation and algorithmic details (although if someone wants to chime in on anything up to DFGs (data flow graphs), be my guest!). What I dislike about, e.g., Anandtech articles is that the goal of the articles seems to be to sound fancy rather than to make things clearer to the reader.

However, you're right in that it can be hard to glean the relevance from an explanation that is so high-level. What makes all this discussion relevant to gaming is that gaming consoles are turning into devices that are even more advanced than general-purpose microcomputers. The Xbox 360, when it comes out, will probably be the largest consumer deployment of multi-core processors. This is a classic Big Deal, and it changes the way the power user, the developer, and the gaming forum debater have to think about personal computing.
 