
DF Retro: The Story of Nvidia GeForce 256 - The Original 'GPU'

Sosokrates

Report me if I continue to console war
ATI were just too far ahead of their time; VPU is actually used these days, but for Vision Processing Unit, i.e. accelerators for machine-learning vision workloads.



Rather than dedicated accelerators (since there'd be a possible bandwidth bottleneck), we'll just keep getting integrated acceleration units for offloading those types of tasks. I think Apple already has this with small dedicated cores, the Neural Engine.

More stuff like that with integrated chiplet designs, most likely.

How do you think it will play out in the PC space?

Will it just continue being included in GPUs?
 

samjaza

Member
When I think of my GPU history, it makes me seem like a fanboy...
TNT 2 32MB
Geforce 4 ti 4200
Geforce FX 5600
Geforce 8800 GTS 320MB
Geforce 9800 GTX 512MB
Geforce GTX 270 2GB
Geforce GTX 480 4GB
Geforce GTX 670 4GB
Geforce GTX 970
Geforce GTX 1070
Geforce RTX 3080
 

DaGwaphics

Member
When i think of my GPU history, it makes me seem like a fanboy...
TNT 2 32MB
Geforce 4 ti 4200
Geforce FX 5600
Geforce 8800 GTS 320MB
Geforce 9800 GTX 512MB
Geforce GTX 270 2GB
Geforce GTX 480 4GB
Geforce GTX 670 4GB
Geforce GTX 970
Geforce GTX 1070
Geforce RTX 3080

Do you dream in Gsync?
 

JackMcGunns

Member
I remember running that T&L water demo and being super impressed. Good times.

I eventually bought a GeForce 2 GTS, which was a beast at the time, definitely better than the Voodoo 2 I was considering. They had a demo called X-Isle with dinosaurs showing off some pretty awesome textures, vegetation and cubemaps. I think it was by Crytek, and it was the precursor to Far Cry and then Crysis.
 

Three

Member
At the time, these cards were just called graphics accelerators. We were still at a time when most games could run in software mode, but if you had a card like these, certain graphics functions would be accelerated. Hence the name.
nVidia was the first to market the term GPU. Mind you, this was not a technical term, just a marketing gimmick. But it stuck, and now all graphics cards are called GPUs.
And nVidia defined that only cards that had T&L could be called GPUs. Once again, this was just a marketing gimmick to differentiate the GeForce from the competition.
ATI tried to counter with the term VPU, Visual Processing Unit. But it never caught on.
I still remember the first T&L demos I ran on the early Geforce cards. Anybody else remember the shiny sphere and the boat in the lake?
 

Three

Member
Found the vid:
Ran that shiny sphere on a GeForce 256 and was blown away by how realistic it looked 😄. You could poke it and it would wobble about.



Nobody remembers the boat in the lake one though. I can't find it anywhere. There was also a particle fountain one.
 

yewles1

Member
It's what Nvidia called it at the time of its launch. "Nvidia hailed the GeForce 256 as "the world's first GPU," a claim made possible by being the first to integrate a geometry transform engine, a dynamic lighting engine, a 4-pixel rendering pipeline, and DirectX 7 features onto the graphics chip."

I assume the term "GPU" wasn't used before the release of the GeForce 256.
PS1, as an example...
 

cyberheater

PS4 PS4 PS4 PS4 PS4 PS4 PS4 PS4 PS4 PS4 PS4 PS4 PS4 PS4 PS4 PS4 PS4 Xbone PS4 PS4
Playing Quake when I first got my Voodoo card. Changed gaming forever. Amazing times.
 

winjer

Member
When i think of my GPU history, it makes me seem like a fanboy...
TNT 2 32MB
Geforce 4 ti 4200
Geforce FX 5600
Geforce 8800 GTS 320MB
Geforce 9800 GTX 512MB
Geforce GTX 270 2GB
Geforce GTX 480 4GB
Geforce GTX 670 4GB
Geforce GTX 970
Geforce GTX 1070
Geforce RTX 3080

My list is a little more varied, as it includes one more GPU manufacturer.

TNT2 Ultra
GeForce 2 MX DDR
Radeon 9700
GeForce 6800 GT
GeForce 7800GT
Radeon X1900
GeForce 8800 GT
Radeon 5770
GeForce 470
Radeon 7950
GeForce 680
GeForce 970
Radeon 390
GeForce 1070
GeForce 2070 Super
 

SF Kosmo

Al Jazeera Special Reporter
When i think of my GPU history, it makes me seem like a fanboy...
TNT 2 32MB
Geforce 4 ti 4200
Geforce FX 5600
Geforce 8800 GTS 320MB
Geforce 9800 GTX 512MB
Geforce GTX 270 2GB
Geforce GTX 480 4GB
Geforce GTX 670 4GB
Geforce GTX 970
Geforce GTX 1070
Geforce RTX 3080

I went:

3dfx Voodoo
3dfx Banshee
GeForce 256
GeForce 3
GeForce FX (one of the mid-range ones, because I didn't get the numbering scheme)
Some Radeon I can't remember
GeForce 8800 (my Crysis rig)
Radeon 5770 (or something, not 100% sure)
GeForce 770
GeForce 1060 (my Oculus upgrade)
GeForce RTX 2070

I jumped over to Radeon twice, once because I hated the GeForce FX (easily their worst gen) and once because I couldn't afford new hardware and took a hand-me-down. I was never really happy with either, mostly on account of the drivers. So I have been pretty loyal to the GeForce line.
 

winjer

Member
Ok, then the N64 had programmable T&L, and so the RCP was therefore The Original GPU instead.

Not a fan of the video title.

There are a few things to understand here, and I have to repeat myself again and again: GPU was just a marketing term made up by nVidia to differentiate itself from the competition.
And the fact that this video was sponsored by nVidia just reinforces this.
It was not a technical term, just marketing, targeted at the PC market.

If we want to be precise, several devices already had T&L before the 256, some in software, others in hardware.
Even the SVP, the Sega Virtua Processor, had T&L. And before that, arcade boards, and even professional graphics systems.

Regarding the N64, it was basically software T&L, in the sense that it was a coprocessor running vector instructions.
The GeForce 256's T&L was a fixed-function unit, and it was replaced with programmable vertex shaders from the GeForce 3 onwards.
So in a real sense, the N64's RSP was more advanced than the GeForce 256's fixed-function T&L, especially considering it could also do some shader and vertex operations.
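
For anyone who never touched the old APIs, here is a minimal sketch of the difference being described, in legacy OpenGL C code (not from the video; it assumes a GL context already exists and the names are illustrative). With fixed-function T&L you only set state and the hardware runs its built-in transform and lighting math; with vertex shaders you write that math yourself.

```c
/* Minimal sketch: fixed-function T&L (what the GeForce 256 accelerated in
 * hardware) vs. a programmable vertex shader (what replaced it from the
 * GeForce 3 onwards). Assumes a legacy OpenGL context already exists. */
#include <GL/gl.h>

/* Fixed-function path: you only set state; the transform and lighting math
 * is baked into the hardware/driver and cannot be changed. */
void fixed_function_tnl(void)
{
    GLfloat light_pos[] = { 0.0f, 1.0f, 1.0f, 0.0f };

    glMatrixMode(GL_PROJECTION);   /* "T": projection transform */
    glLoadIdentity();
    glFrustum(-1.0, 1.0, -1.0, 1.0, 1.0, 100.0);

    glMatrixMode(GL_MODELVIEW);    /* "T": modelview transform */
    glLoadIdentity();

    glEnable(GL_LIGHTING);         /* "L": fixed lighting equations */
    glEnable(GL_LIGHT0);
    glLightfv(GL_LIGHT0, GL_POSITION, light_pos);
}

/* Programmable path: the same per-vertex work is expressed as a small program
 * you write yourself (GLSL shown here as a string, compatibility profile). */
static const char *vertex_shader_src =
    "#version 120\n"
    "void main() {\n"
    "    /* the 'T' is now code you control... */\n"
    "    gl_Position = gl_ModelViewProjectionMatrix * gl_Vertex;\n"
    "    /* ...and the 'L' would be your own lighting math here */\n"
    "    gl_FrontColor = gl_Color;\n"
    "}\n";
```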
 
How do you think it will play out in the PC space?

Will it just continue being included in GPUs?

Well, if I had the means to engineer a PC myself, I'd like to see something like OMI-XSR & CXL take over: preferably the former for processor-to-processor (and compute device-to-compute device) interconnects, and the latter for storage and memory devices. That way you get very low latency and high bandwidth with the former (each link/lane is 64 GB/s), while the latter, with CXL 3.0 in particular, could allow for reversed memory buffer as well as dynamic partitioning of different memory segments of a peripheral device so they can be accessed by multiple hosts simultaneously (CXL 2.0 already allows the latter, up to 16 hosts for a single peripheral device, "hosts" being processors or processor cores of what some refer to as "master" processors).
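
To put those numbers in perspective, here is a back-of-the-envelope sketch in C. The 64 GB/s per link and the 16-host limit come from the paragraph above; the link count is a made-up illustrative value, not a real product configuration.

```c
/* Back-of-the-envelope sketch of the aggregate bandwidth argument above.
 * The 64 GB/s per-link figure and the 16-host CXL 2.0 limit come from the
 * post; the number of links below is a hypothetical illustrative value. */
#include <stdio.h>

int main(void)
{
    const double gb_per_link = 64.0;   /* per-link bandwidth quoted above  */
    const int    links       = 8;      /* hypothetical number of links     */
    const int    hosts       = 16;     /* CXL 2.0 host limit quoted above  */

    double aggregate = gb_per_link * links;
    printf("aggregate fabric bandwidth: %.0f GB/s\n", aggregate);
    printf("per-host share if split evenly across %d hosts: %.0f GB/s\n",
           hosts, aggregate / hosts);
    return 0;
}
```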

The former is best for dynamic random-access memories (OMI has lower latency than HBM and DDR, let alone GDDR), and the logic can be built into the logic layers of active interposers; the latter is potentially the best solution for storage-class devices. Both are used in enterprise and cluster computing environments, but nothing on the consumer electronics side of things so far. I do think memories like GDDR are heading towards EOL, and the necessity of NUMA designs on PC, due to the limitations of PCIe as an interconnect in terms of cache coherency (also meaning devices on the bus can't share the same TLBs or virtual addressing tables; a lot of that has to be duplicated), holds back potential performance (things like AMD's SAM leveraging PCIe's resizable BAR feature are a good help, though).

I'd like to think a centralized, upgradable memory pool that can serve both CPU and GPU is the future, but you'd have to move to HBM memories for that, and from there design a socketed interposer of some kind, sort of like how CPU sockets function, but for memory modules. I don't know enough about that from an engineering POV, though, clearly. As you can see, most of this was focused on memory because, well, dedicated AI-acceleration hardware units can be implemented any number of ways in the future, and will probably be integrated into CPUs and GPUs, but arithmetic performance is "free", relatively speaking, compared to the orders of magnitude more energy required to access data in memory over the bus.
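
A rough worked example of that last point, in C. The energy figures are commonly cited ballpark numbers for older process nodes (on the order of ~1 pJ for a 32-bit floating-point op vs. hundreds of pJ for an off-chip DRAM access) and are assumptions for the sake of argument, not measurements.

```c
/* Rough illustration of why "arithmetic is free" relative to memory traffic.
 * Both energy constants are assumed ballpark figures, not measured values. */
#include <stdio.h>

int main(void)
{
    const double pj_per_flop      = 1.0;    /* assumed: one 32-bit FP op      */
    const double pj_per_dram_word = 640.0;  /* assumed: one 32-bit DRAM read  */

    /* A memory-bound kernel: one FLOP per word fetched from off-chip DRAM. */
    double flop_energy = pj_per_flop;
    double mem_energy  = pj_per_dram_word;

    printf("energy spent on math:   %.0f pJ per element\n", flop_energy);
    printf("energy spent on memory: %.0f pJ per element\n", mem_energy);
    printf("memory/math ratio:      %.0fx\n", mem_energy / flop_energy);
    return 0;
}
```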

Personally I'm a lot more interested in future Processing-In-Memory or Processing-Near-Memory accelerated logic built into memory controllers, as close to the memory itself as possible, and what that can bring in performance gains. Since node shrinks might not come as quickly as planned (and probably won't realistically get under 2nm) and the gains per shrink are getting smaller, performance gains will mainly have to come from how future designs handle data locality and memory-access schemes, how they store and organize data, and how that data is distributed across the system. And that's combined with a larger shift toward chiplet-based designs, new features of interconnect technologies enabling new things, etc.
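
To make the data-locality point concrete without assuming anything about PIM hardware, here is a small C sketch that sums the same array twice, once sequentially and once with a cache-hostile stride. The arithmetic is identical; only the access pattern changes, and on a typical machine the timings differ substantially, which is exactly the kind of gap PIM/PNM designs are chasing.

```c
/* Data-locality sketch: identical work, different access pattern.
 * Timings vary by machine; the sizes below are illustrative. */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define N (1 << 26)          /* ~64M ints (~256 MB), larger than typical caches */
#define STRIDE 4096          /* stride in elements, defeats hardware prefetching */

static double elapsed(struct timespec a, struct timespec b)
{
    return (b.tv_sec - a.tv_sec) + (b.tv_nsec - a.tv_nsec) / 1e9;
}

int main(void)
{
    int *data = malloc((size_t)N * sizeof *data);
    if (!data) return 1;
    for (long i = 0; i < N; i++) data[i] = 1;

    struct timespec t0, t1, t2;
    long long sum_seq = 0, sum_str = 0;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (long i = 0; i < N; i++)              /* cache-friendly order */
        sum_seq += data[i];
    clock_gettime(CLOCK_MONOTONIC, &t1);
    for (long s = 0; s < STRIDE; s++)         /* cache-hostile order */
        for (long i = s; i < N; i += STRIDE)
            sum_str += data[i];
    clock_gettime(CLOCK_MONOTONIC, &t2);

    printf("sequential: %lld in %.3f s\n", sum_seq, elapsed(t0, t1));
    printf("strided:    %lld in %.3f s\n", sum_str, elapsed(t1, t2));
    free(data);
    return 0;
}
```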
 