
Apple's M1 Max GPU is as powerful as an Nvidia RTX 2080 desktop GPU and the Sony PS5 gaming console

rnlval

Member
Had absolutely no idea. It used to always be 75w. In the future when we experience brown-outs we'll know it's just due to the kids playing Fortnite. LOL

I can see HP and Dell shipping those systems out with 180w PSUs.
My MSI-branded RTX 3080 Ti GX Trio is fine. Some AIB RTX GA102 cards are poorly made due to VRM-related misconfiguration.

I have 5.5 KW solar panels on my roof.
 
Last edited:

rnlval

Member


Very impressive imo.
No raytracing i.e. last-gen game.

Didn't show 4K results.

Anandtech didn't show Blender 3D. PCs have RT hardware acceleration.
 
Last edited:
Are they gonna make macOS and future M1 or M2 chips touch-screen friendly, utilizing the Apple Pencil? I want a standalone Surface Studio-type macOS desktop using the Apple Pencil. An artist's dream come true 🤤
 

rnlval

Member

Keep cherry-picking and quoting partial numbers; multiple reviewers are testing, and the overall power consumption of the Mac unit vs the laptops they test against is the strong suit of the MBPs (I wonder if all of them are setting the new macOS High Power mode, which is not the default, btw)…

Even from your link: “Apple never really intended these chips to be used for gaming, and to be honest, the energy efficiency of these chips is something that is unmatched by any notebook touting an RTX 3060 or RTX 3080.”

I would also give all vendors some time to update their software (especially Adobe) to take advantage of the new ARM code paths. Interesting that you keep comparing an integrated GPU (M1) vs systems with dedicated GPUs. We will see the best performance : battery life ratio at the end of the day too, I guess ;). Anyways, this is not the attitude Intel needs to win these kinds of contracts back, and do not expect Apple to lose money or sleep over this.
The big challenge will be the Mac Pro, where their own designs allow for multiple dedicated GPUs… I wonder if they will still allow them, or how they plan to scale this design (they did not go from mobile to tablet to 13'' MBP to larger MBPs for no reason).
Apple used TSMC's 5 nm process node while NVIDIA used Samsung's older 8 nm process node.

Game benchmarks at 4K hammer the GPU's memory bandwidth.



 
Last edited:

rnlval

Member
Still impressive.
M1X's 400 GB/s BW was defeated by RX 6800M's 384.0 GB/s BW + 96 MB L3 cache with DCC (Delta Color Compression).

Borderlands 3's Unreal Engine 4 is a deferred-shading 3D engine. There are many games and non-gaming use cases (e.g. CAD real-time visualization, Hollywood's real-time backdrop stages, marketing 3D visualization) that use Unreal Engine 4.

Hollywood's real-time wide backdrop stage, e.g. Star Wars: The Mandalorian.
 
Last edited:

rnlval

Member
Never thought I would see an ARM chip delivering results like these though. Factor in thermals and power usage and it is very impressive.
The game's heavy vector math workloads are on the GPU.

Again, Apple used TSMC's 5 nm process node while NVIDIA used Samsung's older 8 nm process node.
 
Last edited:

Leo9

Member
M1X's 400 GB/s BW was defeated by RX 6800M's 384.0 GB/s BW + 96 MB L3 cache with DCC (Delta Color Compression).

Borderlands 3's Unreal Engine 4 is a deferred-shading 3D engine. There are many games and non-gaming use cases (CAD real-time visualization, Hollywood's real-time backdrop stages, marketing 3D visualization) that use Unreal Engine 4.
"Given the use of x86 binary translation and macOS’s status as a traditional second-class citizen for gaming, these aren’t apple-to-apple comparisons."

x86 code translated to ARM.
Metal vs DirectX (even on the same hardware DX provides better performance).
Completely new GPU architecture.

These GPUs can clearly do much better than that.
 

Topher

Gold Member
"Given the use of x86 binary translation and macOS’s status as a traditional second-class citizen for gaming, these aren’t apple-to-apple comparisons."

x86 code translated to ARM.
Metal vs DirectX (even on the same hardware DX provides better performance).
Completely new GPU architecture.

These GPUs can clearly do much better than that.

Keep forgetting that Shadow of the Tomb Raider is actually running in Rosetta 2. Maybe we will see a native port at some point.
 

rnlval

Member
"Given the use of x86 binary translation and macOS’s status as a traditional second-class citizen for gaming, these aren’t apple-to-apple comparisons."

x86 code translated to ARM.
Metal vs DirectX (even on the same hardware DX provides better performance).
Completely new GPU architecture.

These GPUs can clearly do much better than that.
Apple's Rosetta 2 does static (ahead-of-time) translation, which is advanced compared to JIT-only emulators.

Certain PC game DRM uses a virtual CPU instruction set with a customized LLVM-based translator, e.g. Denuvo/VMProtect, which runs encrypted game code on a customized virtual CPU instruction set.

For Direct3D, there are AMD- or NVIDIA-specific per-game shader optimizations, e.g. the AMD/NVIDIA monthly driver downloads. Windows PC GPU drivers from AMD and NVIDIA have cutting-edge LLVM-based Direct3D shader ASM to GPU ISA translators. Selecting a GPU vendor is also selecting post-sale monthly driver support.

True native games with native GPU code are on the game consoles.



Unified memory's downside: the CPU disrupts the GPU's burst I/O mode, and there are higher context-switch overheads.
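To illustrate the interpreter-vs-static-translation distinction mentioned above, here is a toy sketch. The "virtual ISA" (PUSH/ADD/MUL) is invented for the example and has nothing to do with Rosetta's or Denuvo's actual internals.

```python
# Toy contrast between (a) interpreting a virtual instruction set at
# run time and (b) statically translating the whole program up front.
PROGRAM = [("PUSH", 2), ("PUSH", 3), ("ADD", None), ("PUSH", 4), ("MUL", None)]

def interpret(program):
    """Dispatch each virtual op at run time (emulator style)."""
    stack = []
    for op, arg in program:
        if op == "PUSH": stack.append(arg)
        elif op == "ADD": stack.append(stack.pop() + stack.pop())
        elif op == "MUL": stack.append(stack.pop() * stack.pop())
    return stack.pop()

def translate(program):
    """Translate the whole program to native (here: Python) code once."""
    lines = ["def _native():", "    stack = []"]
    for op, arg in program:
        if op == "PUSH": lines.append(f"    stack.append({arg})")
        elif op == "ADD": lines.append("    stack.append(stack.pop() + stack.pop())")
        elif op == "MUL": lines.append("    stack.append(stack.pop() * stack.pop())")
    lines.append("    return stack.pop()")
    ns = {}
    exec("\n".join(lines), ns)
    return ns["_native"]

print(interpret(PROGRAM))    # 20 — per-op dispatch every run
print(translate(PROGRAM)())  # 20 — dispatch cost paid once, up front
```

Both give the same answer; the translated version just pays the dispatch cost once instead of on every execution, which is the advantage the post attributes to static translation.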
 
Last edited:

twilo99

Member
Just remember, Apple are just getting started with this. If the CPU upgrades in the iPhone are any indication, by the time they get to the "A3 Max" or whatever it will be called in a few years, they will be blowing the competition even further away...

Apple’s MacBook Pro is a GPU-shaped warning to Nvidia and AMD - The Verge

"What could Apple do in an iMac or Mac Pro, with plenty of time and plenty of room to build an even bigger and better GPU with fewer thermal and power constraints?"

Nightmare fuel for the competition..
 
Last edited:

twilo99

Member
People still doubting the impressiveness? Oh, but it barely beats a Surface Pro in Adobe Premiere. I'm surprised Dave didn't point out that it hasn't been updated yet. Most software sees a 5x increase once updated for M-series chips.

Because most people can't grasp the depth and where Apple's knowhow comes from.. I think most think that they just buy off the shelf chips from TSMC or something.
 
Still impressive.
It's impressive in the sense that it's an absolutely massive 430 mm² chip, built on a leading-edge node.
57 billion transistors, more than the gargantuan … The CPU cores are absolutely massive, bigger than Zen 3 cores in terms of absolute area, yet on a node with 80% higher density. Huge. The performance of the CPU is very impressive indeed.
GPU....not so much.

It's got a whopping 4096 shaders, compared to the 2560 in the 6800M, and it's considerably slower in games. It's impressive that it uses so little power, but then again it's clocked at just 1.27GHz. Would its performance and power consumption scale just as well to higher clocks? That remains to be seen, but I doubt it. All that performance per watt has to come at a cost. They probably used TSMC's highest-density transistor libraries; high density means higher thermal density, which makes clock scaling much more difficult. This is evidenced by the fact that TSMC's N7 is capable of 90 million transistors per square millimetre (the density used by Apple's N7 SoCs), but Navi 21, 22 and 23 are all around 50 million transistors per square mm. A loss in density, but it means RDNA2 GPUs can clock close to 3GHz. There's also the fact that Apple is likely using low-power cells to ensure the chip is as efficient as possible.

What would be really impressive is if you could run 250W through the chip and its CPU and GPU performance scaled up with it. Can they put 250-300W through the GPU, get its clocks up to around 2GHz, and have it perform at the level of a desktop 3070? We have no idea if it can scale that well; that remains to be seen. Given Apple's design goals and their priority lying squarely at low power, I don't expect that it will scale up amazingly well.
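The density figures quoted above can be sanity-checked in a few lines. The M1 Max numbers (57 billion transistors, ~430 mm²) come from this thread; the Navi 21 figures (26.8 billion transistors, ~520 mm²) are my added assumptions based on commonly reported specs, so treat the result as approximate.

```python
# Rough transistor-density check for the figures quoted above.
chips = {
    "M1 Max (TSMC N5)": (57.0e9, 430.0),   # (transistors, die area in mm^2)
    "Navi 21 (TSMC N7)": (26.8e9, 520.0),  # assumed public figures
}
for name, (transistors, area_mm2) in chips.items():
    density = transistors / area_mm2 / 1e6  # million transistors per mm^2
    print(f"{name}: {density:.0f} MTr/mm^2")
```

M1 Max lands around 133 MTr/mm² and Navi 21 around 52 MTr/mm², consistent with the post's point that Apple is using far denser (and therefore likely lower-clocking) libraries than RDNA2.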

Apple’s MacBook Pro is a GPU-shaped warning to Nvidia and AMD - The Verge

"What could Apple do in an iMac or Mac Pro, with plenty of time and plenty of room to build an even bigger and better GPU with fewer thermal and power constraints?"

Nightmare fuel for the competition..
I wouldn't trust Tom Warren's understanding of silicon engineering at all.

As I've mentioned above, there are trade-offs to silicon design. If you go with high density, low power transistor libraries you sacrifice on high-power performance scaling. You can throw juice at it, but it won't go much faster.
Equally, if you go for the lower density, high-performance transistor libraries, you can get very high top-end performance. But when you lower the voltages, your performance doesn't scale down particularly well.

M1 Pro and Max are impressive chips to be sure. But there is no guarantee that if you shunt 3 times as much power into the chip, you'll get 3 times as much performance.
 
Last edited:

ethomaz

Banned
Apple’s MacBook Pro is a GPU-shaped warning to Nvidia and AMD - The Verge

"What could Apple do in an iMac or Mac Pro, with plenty of time and plenty of room to build an even bigger and better GPU with fewer thermal and power constraints?"

Nightmare fuel for the competition..
Tom Warrior lol

Just ignore.

This silicon probably can’t compete with nVidia/AMD ones due to all the trade-offs… it is not Apple’s goal, btw… it is focused on production apps while nVidia/AMD are more focused on gaming.
 
Last edited:

Dream-Knife

Member
Just remember, Apple are just getting started with this. If the CPU upgrades in the iPhone are any indication, by the time they get to the "A3 Max" or whatever it will be called in a few years, they will be blowing the competition even further away...
Unrelated; why do people need powerful phones? What are you doing on it?

I know casuals like to throw around the power thing as a reason to choose a product, but most just browse the web, take photos, and consume media on their phones. I don't see how slapping an M1 in it would help it. Of course it would be more impressive for marketing, yet if they can sell their current phones to people for 1k... why bother?
$6000 due to Apple tax. It's a $1000-1500 (ignoring current craziness) 6900XT with double the VRAM. It would be $1500 MSRP on a factory card if they made such a thing on PC.

I hope these become so good at mining that all the Nvidia and AMD GPUs go back to regular pricing. This would be a true wish of mine, so that PC gamers can finally upgrade without removing their kidneys or third legs.
Yes this.
 
This benchmark is I/O bandwidth limited.
The fact that M1 Max is a SoC and not a CPU and GPU connected via a PCI-E slot represents a huge advantage. I imagine you'd find a Renoir or Cezanne APU would also appear to be insanely fast in this benchmark for similar reasons.
10.6TF GPU is otherwise never going to be faster than a 22TF GPU at compute.
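The 10.6 TF figure above follows from the shader count and clock quoted elsewhere in the thread (4096 shaders at roughly 1.27-1.3 GHz; the exact 1.296 GHz clock used here is an assumption). A quick sketch of the standard FP32 throughput formula:

```python
# FP32 throughput = shaders * 2 ops per clock (fused multiply-add) * clock.
def fp32_tflops(shader_count: int, clock_ghz: float) -> float:
    return shader_count * 2 * clock_ghz / 1000.0  # TFLOPS

# M1 Max figures quoted in this thread; 1.296 GHz is an assumed clock.
print(f"{fp32_tflops(4096, 1.296):.1f} TFLOPS")  # ~10.6
```

Note that TFLOPS is a peak-compute figure only; as the post says, benchmarks that are I/O-bound won't reflect it.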
 

rnlval

Member
"Given the use of x86 binary translation and macOS’s status as a traditional second-class citizen for gaming, these aren’t apple-to-apple comparisons."

x86 code translated to ARM.
Metal vs DirectX (even on the same hardware DX provides better performance).
Completely new GPU architecture.

These GPUs can clearly do much better than that.

At 4K, the bottleneck shifts towards the GPU and video memory bandwidth side.

For Shadow of the Tomb Raider at 1080p resolution with Denuvo vs non-Denuvo. Lower resolution shows CPU being gimped by Denuvo.

Denuvo is not a major issue at 4K.
 

rnlval

Member
This benchmark is I/O bandwidth limited.
The fact that M1 Max is a SoC and not a CPU and GPU connected via a PCI-E slot represents a huge advantage. I imagine you'd find a Renoir or Cezanne APU would also appear to be insanely fast in this benchmark for similar reasons.
10.6TF GPU is otherwise never going to be faster than a 22TF GPU at compute.
FYI, Apple's Xeon W Mac Pro desktop is also limited by the aging PCIe v3.0 IO.

Intel Alder lake has PCIe 5.0
Intel Rocket Lake has PCIe 4.0
AMD Zen 2 and Zen 3 have PCIe 4.0
IBM Power 9 has PCIe 4.0
 

rnlval

Member
Indeed, and like I said earlier, they are just getting started...

Anyone underestimating what Apple's chip design team can do is a fool, there is no one better in the consumer space at the moment, and it has been the case for a while now.
FYI, NVIDIA and AMD are also moving towards TSMC's 6 nm and 5 nm process nodes. AMD's partnership with Samsung targets a 4 nm process node.
 

docbot

Banned
If Sony is smart they cooperate with Apple on the PS6 and in return bring PS Games to Apple Hardware. Would be a good way to stick it to Microsoft.
 

FStubbs

Member
If Sony is smart they cooperate with Apple on the PS6 and in return bring PS Games to Apple Hardware. Would be a good way to stick it to Microsoft.
And Microsoft would say "we've had those for years now, so what". Though given that Xbox has never outsold PlayStation, it's not as if Sony needs to "stick it" to them.
 
Last edited:

rnlval

Member
If Sony is smart they cooperate with Apple on the PS6 and in return bring PS Games to Apple Hardware. Would be a good way to stick it to Microsoft.
XSX already has a 560 GB/s memory bandwidth design via the unified 320-bit bus with GDDR6-14000.
PS5 already has a 448 GB/s memory bandwidth design via the unified 256-bit bus with GDDR6-14000.

The fastest GDDR6 variant is GDDR6X-21000.

Based on Sony's PlayStation workload targets, Sony reduced the CPU's AVX resources and shifted the majority of vector and scalar workloads onto the GPGPU and CU-based DSPs. PS5's SSD I/O access is hardware accelerated, like RTX IO (NVIDIA's hardware backing for MS DirectStorage on the PC).
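The bandwidth figures above follow directly from bus width and per-pin data rate; a quick sketch:

```python
# Peak GDDR bandwidth = bus width (bits) * per-pin data rate (Gb/s) / 8 bits per byte.
def gddr_bandwidth_gbs(bus_bits: int, gbps_per_pin: float) -> float:
    return bus_bits * gbps_per_pin / 8.0

print(gddr_bandwidth_gbs(320, 14.0))  # XSX: 560.0 GB/s (320-bit, GDDR6-14000)
print(gddr_bandwidth_gbs(256, 14.0))  # PS5: 448.0 GB/s (256-bit, GDDR6-14000)
```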
 
Last edited:

rnlval

Member
Apple’s MacBook Pro is a GPU-shaped warning to Nvidia and AMD - The Verge

"What could Apple do in an iMac or Mac Pro, with plenty of time and plenty of room to build an even bigger and better GPU with fewer thermal and power constraints?"

Nightmare fuel for the competition..

M1X didn't win in Adobe Premiere. Adobe Premiere has access to newer PC hardware with PCIe 4.0 instead of Apple's current Xeon W with PCIe 3.0.

Andy's benchmarks show an application whose vector math workload is not fully transferred to the GPGPU, unlike a typical PC 3D game, e.g. loading an 8K photo as a texture and applying compute shading with raytracing.
 
Last edited:

Md Ray

Gold Member
XSX already has a 560 GB/s memory bandwidth design via the unified 320-bit bus with GDDR6-14000.
PS5 already has a 448 GB/s memory bandwidth design via the unified 256-bit bus with GDDR6-14000.

The fastest GDDR6 variant is GDDR6X-21000.

Based on Sony's Playstation workload targets, Sony has reduced CPU's AVX resource and has shifted the majority of vector and scalar workloads on the GpGPU and CU-based DSP. PS5's IO SSD access is hardware accelerated like RTX IO (NVIDIA's hardware backing MS DirectStorage on the PC).
FYI, XSX has split memory banks. It's not 560 GB/s via 320-bit bus for the entirety of 16GB of memory. 10GB of 16GB [runs at] 560 GB/s via 320-bit bus with GDDR6-14000, and 6GB [runs at] 336GB/s via 192-bit bus with GDDR6-14000.
 
Last edited:
I think 2022 will be the start of new standards for all ARM- and x86-based computing for smartphones, laptops, desktops:

DDR5
PCIe 5.0
USB4 / Thunderbolt 4
NVMe SSD drives with read/write up to 16 GB/sec utilizing PCIe 5.0
SD Express 8.0 Gen4x2 = ~4 GB/sec read/write (waiting on the 9.0 announcement utilizing PCIe 5.0)
Wi-Fi 6E
5G
5 nm/5 nm+ (with the exception of Intel's 10 nm)
HDMI 2.1 with full 48 Gbps

2020-2021 were transition years....
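The "up to 16 GB/sec" NVMe figure in the list above is consistent with PCIe 5.0 link math: 32 GT/s per lane with 128b/130b encoding over a typical x4 drive. A quick check:

```python
# PCIe effective bandwidth = lanes * transfer rate (GT/s) * encoding efficiency / 8.
# PCIe 5.0: 32 GT/s per lane, 128b/130b line encoding.
def pcie_gbs(lanes: int, gt_per_s: float = 32.0) -> float:
    return lanes * gt_per_s * (128.0 / 130.0) / 8.0

print(f"{pcie_gbs(4):.2f} GB/s")  # ~15.75 GB/s for an x4 NVMe drive
```

So "up to 16 GB/sec" is the rounded theoretical ceiling before protocol overhead.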
 

rnlval

Member
FYI, XSX has split memory banks. It's not 560 GB/s via 320-bit bus for the entirety of 16GB of memory. 10GB of 16GB [runs at] 560 GB/s via 320-bit bus with GDDR6-14000, and 6GB [runs at] 336GB/s via 192-bit bus with GDDR6-14000.
False.

XSX has a unified memory model: a 6 GB address range with 336 GB/s bandwidth, and a 10 GB address range with 560 GB/s bandwidth.



CPU and GPU x86-64 pointers (implied memory read/write) can point anywhere in the 16 GB memory address range. The programmer has to be aware of the memory bandwidth difference within a given memory address range.
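A sketch of how the two bandwidth tiers fall out of the chip layout. The ten-chip mix (six 2 GB chips plus four 1 GB chips, each on a 32-bit channel) is the commonly reported XSX configuration; treat it as an assumption here.

```python
# XSX layout sketch: ten 32-bit GDDR6-14000 chips. The first 10 GB
# interleaves across all ten chips (320-bit); the remaining 6 GB lives
# only on the upper half of the six 2 GB chips (192-bit).
GBPS_PER_PIN = 14.0

def bw(bus_bits: int) -> float:
    return bus_bits * GBPS_PER_PIN / 8.0  # GB/s

print(bw(10 * 32))  # fast 10 GB range: 560.0 GB/s
print(bw(6 * 32))   # slow  6 GB range: 336.0 GB/s
```

The pools are address ranges within one unified memory space, not physically separate banks, which is the point of contention in the posts above.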
 
Last edited:

rnlval

Member
That was already proved false.
AMD 4700S's AVX was benchmarked component by component. The 4700S's AVX has lower Cinebench scores when compared to the Ryzen 7 4750G APU!

Testing shows that the PS5 CPU is about on par with a Ryzen 2700X, although in some tests it falls 5% behind the 2700X.

AMD 4700S's CPU has up to a 4 GHz clock speed.
 
Last edited:

ethomaz

Banned
AMD 4700S was benchmarked.
Yeah, that is why it has a full-resource AVX unit (FPU)… at least compared with Zen 2 it is the same.
The issue in that implementation is the really bad cache/memory latency, which kills any chance of good benchmark scores.
 
Last edited:

rnlval

Member
Yeah, that is why it has a full-resource AVX unit (FPU)… at least compared with Zen 2 it is the same.
The issue in that implementation is the really bad cache/memory latency, which kills any chance of good benchmark scores.
For example


Extended instruction set
Ryzen 4700S: 12,184 Million Matrices/Sec
AMD Ryzen 7 4800HS: 16,669 Million Matrices/Sec

Both Ryzen 7 4800HS and Ryzen 4700S have up to a 4 GHz clock speed.

Ryzen mobiles don't support Intel's XMP (AMD has AMP/DOCP/EOCP); hence, JEDEC-defined DDR4-3200 SODIMMs have inferior latency numbers compared to desktop Ryzens with XMP DDR4-3200 DIMM memory.

Ryzen APUs don't have the Ryzen desktop's chiplet latency penalty.
 
Last edited:

ethomaz

Banned
For example


Extended instruction set
Ryzen 4700S: 12,184 Million Matrices/Sec
AMD Ryzen 7 4800HS: 16,669 Million Matrices/Sec

Both Ryzen 7 4800HS and Ryzen 4700S have up to a 4 GHz clock speed.

Ryzen mobiles don't support Intel's XMP (AMD has AMP/DOCP/EOCP); hence, JEDEC-defined DDR4-3200 SODIMMs have inferior latency numbers compared to desktop Ryzens with XMP DDR4-3200 DIMM memory.

Ryzen APUs don't have the Ryzen desktop's chiplet latency penalty.
The 4700S uses GDDR6 and so has a huge latency penalty for any operation, as shown in benchmarks.

That is the reason it scores less than a similar Zen 2 APU.

And another correction: the 4800HS turbos up to 4.2 GHz, while the 4700S peaks at 4 GHz like you said.
 
Last edited:

rnlval

Member
The 4700S uses GDDR6 and so has a huge latency penalty for any operation, as shown in benchmarks.

That is the reason it scores less than a similar Zen 2 APU.
Nope, certain benchmarks fit inside the CPU's cache. I purposely selected Ryzen mobile due to the inferior JEDEC DDR4-3200 latency, e.g. CL22.
 
Last edited:

Spukc

Member
Man, I am not even sure if I should get the base 8-core MBP 14-inch or the 10-core 16-inch.

I returned my M1 last year because the port situation was a complete joke.

Now they've kinda fixed my issue with it.
 
Last edited:

ethomaz

Banned
Nope, certain benchmarks fit inside CPU's cache. I purposely selected Ryzen mobile due to the inferior JEDEC DDR4-3200 latency e.g. CL22.
That is higher than the 4700S's latency.
And yes, latency affects it.

It is the reason why the 4700S performs below what is expected on PC… memory latency, or in simple terms, the use of GDDR6.

It doesn't have a DDR4 controller, so AMD could only accept that limitation.
 
Last edited:

rnlval

Member
That is higher than the 4700S's latency.
And yes, latency affects it.

It is the reason why the 4700S performs below what is expected on PC… memory latency, or in simple terms, the use of GDDR6.

It doesn't have a DDR4 controller, so AMD could only accept that limitation.


Intel's Open Image Denoise benchmark.

Ryzen 4700S's 4.4 score is about half of Ryzen 7 4750G's 8.1 score. Ryzen 4700S's 4.4 score is in line with Ryzen 2700X's 4.4 score.

PS: Intel Core i5 11400 (Rocket Lake) has AVX-512. In real-world gaming workloads, PS5's raytracing denoise pass is done on the GPU's shaders. Sony's semi-custom Zen 2 request is deliberate.

Intel Core i5 11400's (Rocket Lake with AVX-512) score of 11.5 is nearly double Ryzen 5 5600X's 7.0.
 
Last edited: