- cross-posted to:
- technology@beehaw.org
8GB RAM in M3 MacBook Pro Proves the Bottleneck in Real-World Tests::Apple’s new MacBook Pro models are powered by cutting-edge M3 Apple silicon, but the base configuration 14-inch model starting at $1,599…
Apple had to know these reviews were coming. A new iteration of their custom SoC was obviously going to make every tech site go bananas benchmarking it, and their claim that 8GB = 16GB was only going to make them punish the machine even harder.
It’s like they decided a few bad reviews would cost them less than cutting their markup on RAM to make a 16GB entry level Pro machine for less than $2k.
The worst part is that in many retail chains, like Costco, you can only get the 8GB version. I suspect the review-reading segment of the population is smaller than we’d expect for such an expensive purchase. Previously they crippled M1 machines with 256GB storage by including only one controller instead of the two found in 512GB+ machines. It’s a shame for a MacBook Air, but totally unacceptable for a computer marketed as “Pro”.
Not a pro cpu, just a pro chassis.
What’s worse is that their “8GB = 16GB” claim has a tiny bit of truth in it. Many GPU-accelerated apps load or generate data in host RAM and then transfer it to GPU RAM to run shaders/kernels on it, and they do this repeatedly. The idea with Apple (and also AMD, if you consider APUs) is that since the RAM is “unified” there is just one pool, so apps built with that in mind no longer have that redundancy: where previously a 1GB buffer had to live in both CPU and GPU RAM, it now lives as a single 1GB buffer in Apple’s “unified” RAM. That’s still very different from Apple’s deceptive “8GB = 16GB” marketing.
You don’t have to put unified in quotes, it’s the proper term for an SoC that shares the same memory between the CPU and GPU.
The major advantage of unified memory is that it doesn’t have the copy overhead. When using a discrete GPU you need to load data onto the host and then copy it over to the GPU. And then if data on the GPU needs to be processed separately by the CPU (saved to a file, sent over the network, etc) you incur more overhead again. And let’s ignore more specific technologies like Direct I/O and io_uring for this discussion.
On an SoC with unified memory you don’t have this overhead. The CPU can (in theory) access the same memory space as the GPU with zero overhead, and it makes the performance hit from shuttling the data back and forth non-existent.
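To make the contrast concrete, here's a toy Python model of the two data paths. It isn't real GPU code (no actual driver API is involved); "copying" just counts bytes moved, and the function names are made up for illustration:

```python
# Toy model: each explicit copy between host RAM and VRAM is counted.
def discrete_gpu_round_trip(buf: bytes) -> int:
    """Discrete GPU: data crosses the PCIe bus twice for one round trip."""
    bytes_copied = 0
    # Host -> VRAM so the GPU kernel can read the input.
    vram = bytes(buf)
    bytes_copied += len(buf)
    # VRAM -> host so the CPU can save/send the result.
    result = bytes(vram)
    bytes_copied += len(vram)
    return bytes_copied

def unified_memory_round_trip(buf: bytes) -> int:
    """Unified memory: CPU and GPU address the same pool, so no copies."""
    return 0

data = b"x" * (1024 * 1024)  # a 1 MiB buffer
assert discrete_gpu_round_trip(data) == 2 * len(data)
assert unified_memory_round_trip(data) == 0
```

The point the toy model captures is that the discrete path pays the transfer cost on every round trip, while the unified path pays it zero times.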
But there’s a massive downside: it drastically cuts down your available memory, because the CPU and GPU now have only a single 8GB pool to share. A system without unified memory and with a discrete GPU has 8GB for the CPU in addition to whatever VRAM the GPU has, so they don’t step on each other’s toes.
For example, if I use a system with 8GB of host RAM and a GPU with 6GB of VRAM to run a model of some kind (let’s say stable diffusion), it will load the model into the VRAM and not clog up the host RAM. Yes, the host will initially use system RAM to load the file descriptors and then shuttle the data to the GPU, but once that’s done the model isn’t kept on the host.
On a Mac it would load it onto the only memory available and the CPU would not have the full 8GB available to it the way an x86 system would have.
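The arithmetic can be sketched with toy numbers (the 6GB model size is an assumption for illustration, matching the VRAM figure above):

```python
# Toy accounting: host RAM left for the CPU once a 6 GB model is resident.
HOST_RAM_GB = 8
MODEL_GB = 6  # assumed model size, e.g. a large stable diffusion checkpoint

# Discrete GPU: the model lives in VRAM; host RAM is only used
# transiently during loading, then freed.
cpu_left_discrete = HOST_RAM_GB  # ~8 GB still available to the CPU

# Unified memory: the model occupies the single shared 8 GB pool.
cpu_left_unified = HOST_RAM_GB - MODEL_GB  # ~2 GB left for everything else

assert cpu_left_discrete > cpu_left_unified
```

Same nominal 8GB, but in the unified case everything else on the system has to squeeze into what the GPU workload leaves behind.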
The point I’m making is that because of the unified architecture the 8GB is effectively even less than 8GB in a discrete GPU system. It’s worse.