
Apple's M1 Max GPU is as powerful as an Nvidia RTX 2080 desktop GPU and the Sony PS5 gaming console

smbu2000

Member
Do you have a link?

Wasn't able to find the specs.

Although I'm curious how you get 400 GB/s with LPDDR5-6400?
The Anandtech site goes into a bit of speculation along with the die shots.
https://www.anandtech.com/show/1701...m1-max-giant-new-socs-with-allout-performance

For M1 Pro:
The company divulges that they’ve doubled up on the memory bus for the M1 Pro compared to the M1, moving from a 128-bit LPDDR4X interface to a new much wider and faster 256-bit LPDDR5 interface, promising system bandwidth of up to 200GB/s. We don’t know if that figure is exact or rounded, but an LPDDR5-6400 interface of that width would achieve 204.8GB/s.

For the M1 Max:
The packaging for the M1 Max changes slightly in that it’s bigger – the most obvious change is the increase of DRAM chips from 2 to 4, which also corresponds to the increase in memory interface width from 256-bit to 512-bit. Apple is advertising a massive 400GB/s of bandwidth, which if it’s LPDDR5-6400, would possibly be more exact at 409.6GB/s. This kind of bandwidth is unheard of in an SoC, but quite the norm in very high-end GPUs.
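
If you take the bus widths at face value, the arithmetic does work out; a quick sanity check in Python (assuming plain LPDDR5-6400 with no protocol overhead):

    # Peak bandwidth = bus width in bytes x transfer rate (MT/s)
    def peak_bandwidth_gbs(bus_width_bits, mt_per_s=6400):
        return (bus_width_bits / 8) * mt_per_s / 1000  # GB/s

    print(peak_bandwidth_gbs(256))  # 204.8 -> M1 Pro's "up to 200GB/s"
    print(peak_bandwidth_gbs(512))  # 409.6 -> M1 Max's "400GB/s"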
 

Panajev2001a

GAF's Pleasant Genius
Apple wrote the book on spin…Sony just copied it

just chiming in to say M1 chip support on most mainstream apps is pretty shitty outside of the marketing deals Apple has

u buy an M1 mac and u will quickly hate your decision
I am seeing tons of apps with M1 versions according to my Activity Monitor, and I would have to be forced to go back to an Intel MBP… they throttle like crazy and sound worse than a PS4 Pro… a real jet engine.

I am not sure what macOS itself does, or what co-processors they have dedicated, to make the SSD that fast in real-world scenarios (still, you can see that the SSD in the M1 is a lot faster than previous Mac SSDs)… there is a reason MS and Sony invested in new I/O software solutions like DirectStorage (and whatever Sony's equivalent is) and in dedicated co-processors for data decompression and I/O handling, so as not to saturate the main CPU.
 

Bluntman

Member
The Anandtech site goes into a bit of speculation along with the die shots.
https://www.anandtech.com/show/1701...m1-max-giant-new-socs-with-allout-performance

For M1 Pro:
The company divulges that they’ve doubled up on the memory bus for the M1 Pro compared to the M1, moving from a 128-bit LPDDR4X interface to a new much wider and faster 256-bit LPDDR5 interface, promising system bandwidth of up to 200GB/s. We don’t know if that figure is exact or rounded, but an LPDDR5-6400 interface of that width would achieve 204.8GB/s.

For the M1 Max:
The packaging for the M1 Max changes slightly in that it’s bigger – the most obvious change is the increase of DRAM chips from 2 to 4, which also corresponds to the increase in memory interface width from 256-bit to 512-bit. Apple is advertising a massive 400GB/s of bandwidth, which if it’s LPDDR5-6400, would possibly be more exact at 409.6GB/s. This kind of bandwidth is unheard of in an SoC, but quite the norm in very high-end GPUs.

This is correct.
 

Panajev2001a

GAF's Pleasant Genius
Something Apple is leaving out of their console TFLOPS comparison (beyond technologies like Direct Storage and the BCPACK + zlib HW decompressors, for example) is RT.
They have the ML hardware to run something like nVIDIA DLSS 2.x, but I do not see any dedicated RT acceleration HW in the form of discrete RT cores or even additional ray-triangle intersection and BVH traversal acceleration HW near the texture units like on RDNA2.

RDNA2 might not be your RT cup of tea, but it is a lot better than software emulation.
 

Panajev2001a

GAF's Pleasant Genius
yeah sure if you want to play games then your $3500 is better put into a PC.

these laptops are not aimed at people who want to play games. just like you might go spend $3500 on a PC for gaming there are people out there who would pay $3500 for a computer that they can use to improve their coding, rendering, or music production performance. i'm in the UK and the 16" macbook pro tops out at £5,900. for someone who makes £200-250 a day it quickly pays for itself. earning £200/day in a 5 day week it'd pay itself off in about a month and a half. when you need a tool for your job you want something high quality that will get the job done easier/faster, right?

It is insane how quickly productivity gains add up when you count productivity loss cost for employees. A few minutes a day pay for a new laptop every year… easily.
 

rnlval

Member
We are talking about TFLOPS which is the theoretical maximum compute capability of the vector ALUs, nothing else.
Reminder: the math array co-processors for raster and raytracing-based games are not just compute shaders, i.e. GPUs are not DSPs.
 

rnlval

Member
Something Apple is leaving out of their console TFLOPS comparison (beyond technologies like Direct Storage and the BCPACK + zlib HW decompressors, for example) is RT.
They have the ML hardware to run something like nVIDIA DLSS 2.x, but I do not see any dedicated RT acceleration HW in the form of discrete RT cores or even additional ray-triangle intersection and BVH traversal acceleration HW near the texture units like on RDNA2.

RDNA2 might not be your RT cup of tea, but it is a lot better than software emulation.
For BVH raytracing workloads,

RDNA 2 accelerates the following areas
1. Bounding box intersection test
2. Triangle intersection test
BVH traversal workloads are processed on shader units.


RTX Turing accelerates the following areas
1. Bounding box intersection test
2. Triangle intersection test
3. BVH traversal


RTX Ampere accelerates the following areas
1. Bounding box intersection test
2. Triangle intersection test
3. Interpolated triangle position (RT motion blur)
4. BVH traversal

Raytracing denoise pass is done on compute shaders.
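
To make the split concrete, here's a rough sketch of the traversal loop in Python (intersect_aabb, intersect_tri, and closer are hypothetical helpers): on RDNA 2 only the two intersection tests map to hardware instructions while the loop itself runs as shader code, whereas on Turing/Ampere the whole loop lives inside the RT core.

    # Sketch of BVH traversal; comments mark what each vendor accelerates.
    def trace(ray, bvh):
        stack, best_hit = [bvh.root], None
        while stack:                                   # traversal loop: shader code on
            node = stack.pop()                         # RDNA 2, fixed-function on RTX
            if not intersect_aabb(ray, node.bounds):   # bounding box test: HW on all three
                continue
            if node.is_leaf:
                for tri in node.triangles:
                    # triangle test: HW on all three architectures
                    best_hit = closer(best_hit, intersect_tri(ray, tri))
            else:
                stack.extend(node.children)
        return best_hit  # denoising of the result happens later, on compute shaders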
 
Last edited:

rnlval

Member
https://www.notebookcheck.net/Apple...and-the-Sony-PS5-gaming-console.573846.0.html

TL;DR: 10.4 TF, only 54W of power usage, 7.4 GB/s.

Caveat: it's assumed there are no changes to clock speeds compared to the M1, and as always bench for waitmarks.
If the PS5 GPU has 10.28 TFLOPS of compute, then it has about 11 TFLOPS from its RT cores. My argument is based on the XSX GPU's 12.147 TFLOPS (FP32) compute with 13 TFLOPS RT cores.

For the intended gaming workload with RT in this generation, the PS5 GPU has about 21 TFLOPS, which is equivalent to two RX 5700 XTs, with the 1st GPU processing raster workloads while the 2nd processes raytracing workloads.

Raytracing in XSX/PS5-era games needs plenty of TFLOPS. RT cores are purpose-designed raytracing math co-processors.
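
For reference, the headline shader numbers fall out of the standard CU arithmetic; a quick check in Python (the RT-core TFLOPS figures above are an extrapolation, not a published spec):

    # Shader TFLOPS = CUs x 64 FP32 lanes x 2 FLOPs per FMA x clock (GHz) / 1000
    def shader_tflops(cus, clock_ghz):
        return cus * 64 * 2 * clock_ghz / 1000

    print(shader_tflops(36, 2.230))  # PS5: ~10.28 TFLOPS
    print(shader_tflops(52, 1.825))  # XSX: ~12.15 TFLOPS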

The Anandtech site goes into a bit of speculation along with the die shots.
https://www.anandtech.com/show/1701...m1-max-giant-new-socs-with-allout-performance

For M1 Pro:
The company divulges that they’ve doubled up on the memory bus for the M1 Pro compared to the M1, moving from a 128-bit LPDDR4X interface to a new much wider and faster 256-bit LPDDR5 interface, promising system bandwidth of up to 200GB/s. We don’t know if that figure is exact or rounded, but an LPDDR5-6400 interface of that width would achieve 204.8GB/s.

For the M1 Max:
The packaging for the M1 Max changes slightly in that it’s bigger – the most obvious change is the increase of DRAM chips from 2 to 4, which also corresponds to the increase in memory interface width from 256-bit to 512-bit. Apple is advertising a massive 400GB/s of bandwidth, which if it’s LPDDR5-6400, would possibly be more exact at 409.6GB/s. This kind of bandwidth is unheard of in an SoC, but quite the norm in very high-end GPUs.

256-bit is equivalent to four 64-bit channels on Threadripper.

512-bit is equivalent to eight 64-bit channels on Epyc.

Epyc can operate without an external southbridge, as on some ITX motherboards. An ITX Epyc without AMD's southbridge is effectively acting as an SoC; that particular ITX board is configured for four 64-bit memory channels.

For performance desktops, Intel and AMD need to evolve beyond dual 64-bit memory channels: the layout has been stuck at dual DIMM channels since the AMD K7 era, while beyond-128-bit bus technology is already available in the server SKUs.
 
Last edited:

ethomaz

Banned
RTX 3080 has TFLOPS coming from CUDA cores, RT cores, and Tensor cores, plus the TMUs' floating-point texture filtering hardware, PolyMorph units (geometry is a floating-point data format), and ROPs (floating-point-capable blending hardware).

Most TFLOPS debates between AMD vs NVIDIA only cover the shader TFLOPS.
RTX 3080 is a 29.77 TF card:
8704 units × 2 FLOPs (FMA) × 1710 MHz ≈ 29.77 TFLOPS

It doesn't include anything except the compute shader TFs.
 
Last edited:

rnlval

Member
RTX 3080 is a 29.77 TF card:
8704 units × 2 FLOPs (FMA) × 1710 MHz ≈ 29.77 TFLOPS

It doesn't include anything except the compute shader TFs.
FYI, a real-life RTX 3080 can surpass 29.77 TFLOPS FP32.



----------------

TU102 has separate TIOPS (integer) and TFLOPS.




Integer workloads didn't disappear from real-world PC games. Unlike Ampere, Turing can't reuse its integer CUDA cores as FP CUDA cores.

A pure-TFLOPS argument hides the Turing RTX 2080's TIOPS capability, which real-world PC games do use.
 
Last edited:

ethomaz

Banned
FYI, a real-life RTX 3080 can surpass 29.77 TFLOPS FP32.
Yes, it can, because nVidia GPU clocks run most of the time above the Boost Clock of 1710 MHz.

[chart: clocks and thermals]


The Boost Clock on nVidia cards seems to be more like the floor of the clock range the card actually runs at.
 
Last edited:

Panajev2001a

GAF's Pleasant Genius
For BVH raytracing workloads,

RDNA 2 accelerates the following areas
1. Bounding box intersection test
2. Triangle intersection test
BVH traversal workloads are processed on shader units.


RTX Turing accelerates the following areas
1. Bounding box intersection test
2. Triangle intersection test
3. BVH traversal


RTX Ampere accelerates the following areas
1. Bounding box intersection test
2. Triangle intersection test
3. Interpolated triangle position (RT motion blur)
4. BVH traversal

Raytracing denoise pass is done on compute shaders.

The RDNA2 work still does not come cheap.

I find the fill rate and bandwidth staggering considering this is still a TBDR. Dreamcast liveth ;).
 

twilo99

Member
For content creation it'll be fantastic hardware, particularly if you're heavily invested in Apple's software that benefits from their in house optimization.

It'll be interesting to see how Intel/AMD respond to M1. Apple's vertical integration makes an SoC approach like M1 a lot more feasible. PC OEMs don't operate with that kind of business model.

They can't really respond to M1 in any meaningful way.. Apple is coming to the game from the efficiency end of the spectrum, where Intel/AMD are coming at it from the other side. They will eventually meet in the middle I guess lol

Where Apple is with ARM is a few generations ahead of the competition. Qualcomm and Samsung are their closest rivals there.
 

Redneckerz

Those long posts don't cover that red neck boy
So not that good unless it is offscreen lol

I really want to see real benchmarks because right now it seems to be only PR marketing.
That's bold, because the M1 even in Rosetta yields very playable games from stuff like Rise of the Tomb Raider, at 1080p.

I am no Apple fan, but their tech is pretty much Area 51 levels of customization - and performance. For all the BS they put out, their performance figures are largely to be believed.
 

ethomaz

Banned
That's bold, because the M1 even in Rosetta yields very playable games from stuff like Rise of the Tomb Raider, at 1080p.

I am no Apple fan, but their tech is pretty much Area 51 levels of customization - and performance. For all the BS they put out, their performance figures are largely to be believed.
I'm an Apple fan and I do love their products, but even so I know their PR is overblown.
 
Last edited:

Redneckerz

Those long posts don't cover that red neck boy
I'm an Apple fan and I do love their products, but even so I know their PR is overblown.
Perhaps. The M1 does deliver pretty well on what they said though.

I don't know. I am not a hardcore Apple follower. It just seems that with their turn to their own silicon, what they claimed versus what they delivered is pretty much on par.

I mean, it has to. They claim figures that are absolutely unearthly from a power/perf perspective... and yet they are doing it. It's just that no software actually makes use of it, but it's there. *


* It should be noted that these claims go back to the iPhone, but there are indeed some PR caveats in there: their iOS chips could reach the kind of single-thread/multi-thread performance claimed, but they omitted saying this isn't sustained performance, leading to heavy throttling within minutes.

With M1, they are now attached to power bricks and large batteries, meaning continuous performance is likely far better. With M1 Pro and Max, they are really stating insane things here - but seeing M1, what isn't there to believe?

A socketed machine may well hold up on that note. I mean, 60 watts for RTX 3080-equivalent perf?
 

IntentionalPun

Ask me about my wife's perfect butthole
* It should be noted that these claims go back to the iPhone, but there are indeed some PR caveats in there: their iOS chips could reach the kind of single-thread/multi-thread performance claimed, but they omitted saying this isn't sustained performance, leading to heavy throttling within minutes.

Which sucks for games, but iOS has pretty much always killed it w/ basic perf of menus, the browser, etc. When Android got away from its earlier Java implementation they caught up a bit, but even the most high-end phones would feel sluggish compared to iOS.

I imagine it might be moot now though; the last high-end Android device I owned was like 2 years ago.
 

ethomaz

Banned
Which sucks for games, but iOS has pretty much always killed it w/ basic perf of menus, the browser, etc. When Android got away from its earlier Java implementation they caught up a bit, but even the most high-end phones would feel sluggish compared to iOS.

I imagine it might be moot now though; the last high-end Android device I owned was like 2 years ago.
I don't think there are differences nowadays, to be fair, with high-end devices running a good Android customization (of course there are customizations that are still trash: OneUI, MIUI, etc.).

For example, OxygenOS is pretty much as responsive as iOS.

PS. I never had an Android phone, but I have to use one sometimes because my wife never liked iOS, so when she asks me to do something on her phone (happens a lot) I get to taste Android… I still don't like some Android decisions, but it doesn't feel at all sluggish compared to iOS.
 
Last edited:

Polygonal_Sprite

Gold Member
yeah sure if you want to play games then your $3500 is better put into a PC.

these laptops are not aimed at people who want to play games. just like you might go spend $3500 on a PC for gaming there are people out there who would pay $3500 for a computer that they can use to improve their coding, rendering, or music production performance. i'm in the UK and the 16" macbook pro tops out at £5,900. for someone who makes £200-250 a day it quickly pays for itself. earning £200/day in a 5 day week it'd pay itself off in about a month and a half. when you need a tool for your job you want something high quality that will get the job done easier/faster, right?
What do you do that earns £200 a day if you don’t mind me asking? Cheers.
 

rnlval

Member
1. The RDNA2 work still does not come cheap.

2. I find the fill rate and bandwidth staggering considering this is still a TBDR. Dreamcast liveth ;).
1. With raytracing** added on top of the existing raster gaming workload, AMD Navi 21 doesn't have the extra TFLOPS compute power and extra register storage that GA102 has (RTX 3080, RTX 3080 Ti, RTX 3090).

**The real-time raytracing denoise stage is processed on shaders. The raytracing BVH traversal workload is processed on shaders with RDNA 2.

2. NVIDIA's Maxwell and Pascal GPUs have tiled-caching immediate-mode rendering backed by the L2 cache.





GTX 1080 Ti has 3 MB L2 cache with 484.4 GB/s external memory bandwidth.
RTX 2080 Ti has 5.5 MB L2 cache with 616.0 GB/s external memory bandwidth.
RTX 3080 Ti has 6 MB L2 cache with 912.4 GB/s external memory bandwidth.


Since AMD's tiled caching with L2 is less competitive than NVIDIA's, AMD included a 128 MB L3 cache with Navi 21.

 
Last edited:

rnlval

Member
Perhaps. The M1 does deliver pretty well on what they said though.

I don't know. I am not a hardcore Apple follower. It just seems that with their turn to their own silicon, what they claimed versus what they delivered is pretty much on par.

I mean, it has to. They claim figures that are absolutely unearthly from a power/perf perspective... and yet they are doing it. It's just that no software actually makes use of it, but it's there. *

* It should be noted that these claims go back to the iPhone, but there are indeed some PR caveats in there: their iOS chips could reach the kind of single-thread/multi-thread performance claimed, but they omitted saying this isn't sustained performance, leading to heavy throttling within minutes.

With M1, they are now attached to power bricks and large batteries, meaning continuous performance is likely far better. With M1 Pro and Max, they are really stating insane things here - but seeing M1, what isn't there to believe?

A socketed machine may well hold up on that note. I mean, 60 watts for RTX 3080-equivalent perf?
FYI, the Mac Mini M1 draws 39 watts at full load.

RTX 3080 is fabricated on Samsung's 8 nm node, which is based on 10 nm. The Apple M1 is fabricated on TSMC's 5 nm node. NVIDIA's move to TSMC 5 nm would come with the Ada Lovelace generation.

AMD's RDNA 2 with 6 CUs reached mobile handsets on Samsung's 4 nm node.
 
Last edited:

Panajev2001a

GAF's Pleasant Genius
1. With raytracing** added on top of the existing raster gaming workload, AMD Navi 21 doesn't have the extra TFLOPS compute power and extra register storage that GA102 has (RTX 3080, RTX 3080 Ti, RTX 3090).

**The real-time raytracing denoise stage is processed on shaders. The raytracing BVH traversal workload is processed on shaders with RDNA 2.

2. NVIDIA's Maxwell and Pascal GPUs have tiled-caching immediate-mode rendering backed by the L2 cache.





GTX 1080 Ti has 3 MB L2 cache with 484.4 GB/s external memory bandwidth.
RTX 2080 Ti has 5.5 MB L2 cache with 616.0 GB/s external memory bandwidth.
RTX 3080 Ti has 6 MB L2 cache with 912.4 GB/s external memory bandwidth.


Since AMD's tiled caching with L2 is less competitive than NVIDIA's, AMD included a 128 MB L3 cache with Navi 21.

Besides using it as a chance to flex in your nVIDIA vs RDNA2 e-peen contest, what does it have to do with what you quoted?
 

CRAIG667

Member
Apple wrote the book on spin…Sony just copied it

just chiming in to say M1 chip support on most mainstream apps is pretty shitty outside of the marketing deals Apple has

u buy an M1 mac and u will quickly hate your decision
I am a photographer, and my M1 MacBook Air is the best decision I could have possibly made. This is coming from someone who hates the whole Apple ecosystem, but for what I do the M1 blows EVERYTHING Intel out of the water.
 

BadBurger

Is 'That Pure Potato'
AMD is already selling defective-yield PS5 APUs into the PC market as the AMD 4700S APU with 256-bit GDDR6-14000. The 4700S APU has its 40 CU iGPU disabled.

AMD could have sold the XSX APU through AMD's graphics card product channels. Most AIB PC graphics cards are missing a CPU, southbridge**, ACPI HAL, and UEFI boot loader.

**AMD Zen already has southbridge functions and can operate in single-chip SoC mode. Some low-cost A520 motherboards don't include AMD's southbridge chipset.

The AMD 4700S/PS5 APU is effectively an AMD graphics card with a Zen 2 SoC attached. A major issue with GDDR6 is the supply.

You quoted the wrong person but you're absolutely right. AMD and NVidia are deep in this SoC game and have been for years.
 

Redneckerz

Those long posts don't cover that red neck boy
FYI, the Mac Mini M1 draws 39 watts at full load.
Quite insane for the kind of perf it delivers.
AMD's RDNA 2 with 6 CUs reached mobile handsets on Samsung's 4 nm node.
I suspect the context here is the 4nm node (Exynos 2300).
Besides using it as a chance to flex in your nVIDIA vs RDNA2 e-peen contest, what does it have to do with what you quoted?
I often find it difficult to understand his postings, not so much what they are saying but how they relate to the topic. In the above, I have to surmise that the production node is what rnlval is referencing when mentioning that AMD RDNA2 with 6 CUs is on a 4 nm node.
 

Dream-Knife

Banned
FYI, a real-life RTX 3080 can surpass 29.77 TFLOPS FP32.



----------------

TU102 has separate TIOPS (integer) and TFLOPS.




Integer workloads didn't disappear from real-world PC games. Unlike Ampere, Turing can't reuse its integer CUDA cores as FP CUDA cores.

A pure-TFLOPS argument hides the Turing RTX 2080's TIOPS capability, which real-world PC games do use.
Where is this 29.77 figure coming from?
 

rnlval

Member
Beside using it as a chance to flex it into your nVIDIA vs RDNA2 e-peen contest, how does it have anything to do with what you quoted?
You posted

2. I find the fill rate and bandwidth staggering considering this is still a TBDR. Dreamcast liveth.

I posted a counter-argument against your TBDR point.
 

Panajev2001a

GAF's Pleasant Genius
You posted

2. I find the fill rate and bandwidth staggering considering this is still a TBDR. Dreamcast liveth.

I posted a counter-argument against your TBDR point.
Maybe you could have made it more concise than blasting a wall of marketing material without highlighting what you were quoting. The main point was about work the M1 Max must do that RDNA2 does not.

The other point was that, sure, the difference between TBDR and the rest is getting smaller than it used to be, but there is still some for a while longer: you already have your geometry binned per tile (no extra work to try to make best use of that cache you mentioned), HSR is computed, and for each screen tile no wasted work burns external bandwidth (MSAA can be resolved fully in the on-chip tile memory). For 10-ish TFLOPS, having 400 GB/s of bandwidth and that fillrate (especially at that wattage) is impressive.
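
To illustrate, a toy software model of the TBDR flow in Python (tiles_covered, rasterize, and shade are hypothetical helpers; in real hardware the binning and the tile buffer are fixed-function and on-chip):

    # Toy TBDR: bin geometry per tile, resolve visibility on-chip, shade survivors.
    TILE = 32  # illustrative tile size in pixels

    def render_tbdr(triangles):
        bins = {}
        for tri in triangles:                          # binning: geometry sorted per tile
            for tile in tiles_covered(tri, TILE):      # hypothetical helper
                bins.setdefault(tile, []).append(tri)
        for tile, tris in bins.items():
            tile_buffer = {}                           # lives in on-chip tile memory
            for tri in tris:                           # HSR: keep only the front-most hit
                for pixel, depth in rasterize(tri, tile, TILE):  # hypothetical helper
                    if pixel not in tile_buffer or depth < tile_buffer[pixel][0]:
                        tile_buffer[pixel] = (depth, tri)
            for pixel, (depth, tri) in tile_buffer.items():
                shade(pixel, tri)                      # only visible fragments cost shading
            # MSAA resolve would also happen here, before anything touches DRAM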
 
Last edited:

rnlval

Member
Maybe you could have made it more concise than blasting a wall of marketing material when the post you were actually quoting was talking about work the M1 Max must do that RDNA2 does not.

The other post had the point which I still stand by: the difference between TBDR and the rest is getting smaller than it used to be, but there is still some for a while longer: you already have your geometry binned per tile (no extra work to try to make best use of that cache you mentioned), HSR is computed, and for each screen tile no wasted work burns external bandwidth (MSAA can be resolved fully in the on-chip tile memory).
That is the purpose of Mesh Shaders' early geometry culling, and of the PS5's primitive shaders' early geometry culling.



You're quoting PowerVR's old marketing BS.

Maxwell's and Pascal's tiled-caching renderer doesn't have RTX's DirectX12U mesh shader features.
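
What that culling amounts to is a per-cluster test run before any vertex/raster work is committed; a minimal sketch in Python (the meshlet fields and the sphere_in_frustum / cone_faces_away helpers are hypothetical):

    # Per-meshlet culling: reject whole clusters of triangles up front.
    def cull_meshlets(meshlets, frustum, camera_pos):
        survivors = []
        for m in meshlets:
            if not sphere_in_frustum(m.bounding_sphere, frustum):  # frustum cull
                continue
            if cone_faces_away(m.normal_cone, camera_pos):         # whole-cluster backface cull
                continue
            survivors.append(m)
        return survivors  # only survivors generate vertex fetch + raster work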
 
Last edited:

Panajev2001a

GAF's Pleasant Genius
That is the purpose of Mesh Shaders' early geometry culling, and of the PS5's primitive shaders' early geometry culling.
Great, which is another step you need to take and make use of. It does not come for free for developers… nor can it bring other features like HW transparency sorting for free (like the PVR2DC did).
 

Panajev2001a

GAF's Pleasant Genius
That is the purpose of Mesh Shaders' early geometry culling, and of the PS5's primitive shaders' early geometry culling.



You're quoting PowerVR's old marketing BS.
Valiantly fought with AMD or NVIDIA slides. I see you appreciate irony (… enable primitive culling in shaders… guess shaders are auto-generated too).
 
Last edited:

rnlval

Member
Great, which is another step you need to take and make use of. It does not come for free for developers… nor can it bring other features like HW transparency sorting for free (like the PVR2DC did).
DirectX12_1 has ROV (Rasterizer Ordered Views). The Xbox 360 has ROV-like features, since the Xenia emulator mapped a certain Xbox 360 ROPs feature to DirectX12_1's ROV.

Well, thanks for playing ;)… and for the double standards.
Reciprocal treatment is an easy concept to understand, hence I returned the same serve back to you.
 
Last edited:
So this chip, the M1 Max, is supposedly a 435 mm² die, fabbed on TSMC's 5nm node.

Considering it's only able to match the PS5 GPU's theoretical peak, but with a much bigger die on a much smaller process node, it seems a little less impressive.

I guess that 512-bit memory interface is going to cost you in die area, plus the M1 packs in dedicated ML silicon, but the size of the die is still surprising.
 

rnlval

Member
So this chip, the M1 Max, is supposedly a 435 mm² die, fabbed on TSMC's 5nm node.

Considering it's only able to match the PS5 GPU's theoretical peak, but with a much bigger die on a much smaller process node, it seems a little less impressive.

I guess that 512-bit memory interface is going to cost you in die area, plus the M1 packs in dedicated ML silicon, but the size of the die is still surprising.
435 mm² is larger than the XSX's 360.4 mm² and the PS5's 308 mm². Low-cost entry points are important for game consoles.

The Ryzen 5000 "Cezanne" APU is 175 mm² on a 7 nm node.
 

Panajev2001a

GAF's Pleasant Genius
So this chip, the M1 Max, is supposedly a 435 mm² die, fabbed on TSMC's 5nm node.

Considering it's only able to match the PS5 GPU's theoretical peak, but with a much bigger die on a much smaller process node, it seems a little less impressive.

I guess that 512-bit memory interface is going to cost you in die area, plus the M1 packs in dedicated ML silicon, but the size of the die is still surprising.

A lot of the die size is SRAM (trying to keep the CPU very well fed without increased power consumption): they halved the number of efficiency cores (yet kept the same 4 MB L2), doubled the number of wide Performance cores (with 24 MB of L2 cache and plenty of L1 cache per core too), and added 32-64 MB of System Level Cache (64 MB in the M1 Max) shared by all processors in the SoC.

As you noted, they do pack a neat punch in terms of on-chip ML acceleration, as well as new extra silicon for the camera / ISP HW block, which consoles do not have. I wonder how they are addressing the bandwidth they targeted for the SSD (7.8 GB/s raw).
 