M4 Mac mini #57

geerlingguy · 2024-11-11T02:51:49Z

Basic information

Board URL (official): https://www.apple.com/mac-mini/
Board purchased from: Apple (direct)
Board purchase date: October 29, 2024 (arrived Nov 11, 2024)
Board specs (as tested): M4 10/10/16-core, 32GB RAM, 1TB SSD, 10 GbE
Board price (as tested): $1499.00

Linux/system information

# output of `screenfetch`
                 -/+:.          jgeerling@jeff-mini
                :++++.          OS: 64bit macOS  
               /+++/.           Kernel: arm64 Darwin 24.1.0
       .:-::- .+/:-``.::-       Uptime: 5h 39m
    .:/++++++/::::/++++++/:`    Packages: 183
  .:///////////////////////:`   Shell: zsh 5.9
  ////////////////////////`     Resolution: 3840x2160 
 -+++++++++++++++++++++++`      DE: Aqua
 /++++++++++++++++++++++/       WM: Quartz Compositor
 /sssssssssssssssssssssss.      WM Theme: Blue (Dark)
 :ssssssssssssssssssssssss-     Font: FMonoMedium
  osssssssssssssssssssssssso/`  Disk: 190G / 995G (20%)
  `syyyyyyyyyyyyyyyyyyyyyyyy+`  CPU: Apple M4
   `ossssssssssssssssssssss/    GPU: Apple M4 
     :ooooooooooooooooooo+.     RAM: 3974MiB / 32768MiB
      `:+oo+/:-..-:/+o+/-      

# output of `uname -a`
Darwin jeff-mini.local 24.1.0 Darwin Kernel Version 24.1.0: Thu Oct 10 21:06:23 PDT 2024; root:xnu-11215.41.3~3/RELEASE_ARM64_T8132 arm64

Benchmark results

CPU

Geekbench 6: (3678 single / 14678 multi - https://browser.geekbench.com/v6/cpu/8791920)
299.93 Gflops (7.57 Gflops/W) (geerlingguy/top500-benchmark HPL result)
Cinebench 2024 (169 single / 893 multi / 3787 GPU)

Power

Idle power draw (at wall): 4.1 W
Maximum simulated power draw (stress-ng --matrix 0): 31.2 W
During Geekbench multicore benchmark: 36 W
During top500 HPL benchmark: 39.6 W
During Cinebench 2024: 38 W

Disk

Internal Apple Storage

| Benchmark | Result |
| --- | --- |
| AmorphousDiskMark 4K random read QD64 | 1113.00 MB/s |
| AmorphousDiskMark 4K random write QD64 | 121.97 MB/s |
| AmorphousDiskMark 1M sequential read | 3017.64 MB/s |
| AmorphousDiskMark 1M sequential write | 3196.68 MB/s |
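
For readers who think in IOPS rather than MB/s, the 4K figures above can be converted with a rough sketch like the following. It assumes 1 MB = 1,000,000 bytes and 4096-byte requests; neither is stated in the thread, so treat the output as an approximation.

```python
# Convert the AmorphousDiskMark 4K throughput figures into approximate IOPS.
# Assumptions (not stated in the thread): 1 MB = 1,000,000 bytes and each
# "4K" request transfers 4096 bytes.
BLOCK_BYTES = 4096
MB = 1_000_000


def approx_iops(mb_per_s: float, block_bytes: int = BLOCK_BYTES) -> float:
    """Return the approximate number of requests per second."""
    return mb_per_s * MB / block_bytes


# Values copied from the disk results above.
results_mb_s = {
    "4K random read QD64": 1113.00,
    "4K random write QD64": 121.97,
}

for name, mb_s in results_mb_s.items():
    print(f"{name}: ~{approx_iops(mb_s):,.0f} IOPS")
```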

Network

iperf3 results:

iperf3 -c $SERVER_IP: 9.40 Gbps
iperf3 -c $SERVER_IP --reverse: 9.38 Gbps
iperf3 -c $SERVER_IP --bidir: 9.37 Gbps up, 7.73 Gbps down
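
To collect numbers like these without reading them off the terminal, iperf3 can emit JSON (`iperf3 -c $SERVER_IP --json`). The sketch below assumes the usual TCP report layout, where the receiver-side total is found under `end.sum_received.bits_per_second`. The helper name `gbps` and the sample string are illustrative, not part of the original test run.

```python
import json


def gbps(iperf3_json: str, direction: str = "sum_received") -> float:
    """Return throughput in Gbps from `iperf3 --json` output (TCP test)."""
    report = json.loads(iperf3_json)
    return report["end"][direction]["bits_per_second"] / 1e9


# Hand-written sample trimmed to the one field used here; the value mirrors
# the 9.40 Gbps result listed above and is not a captured log.
sample = '{"end": {"sum_received": {"bits_per_second": 9.40e9}}}'

print(f"{gbps(sample):.2f} Gbps")
```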

The 10 GbE connection adds about 2W to total system power draw.

(Be sure to test all interfaces, noting any that are non-functional.)

GPU

Cinebench 2024: 3787
Geekbench (Metal): 56652
Geekbench (OpenCL): 37773

Memory

tinymembench results:

tinymembench v0.4.10 (simple benchmark for memory throughput and latency)

==========================================================================
== Memory bandwidth tests                                               ==
==                                                                      ==
== Note 1: 1MB = 1000000 bytes                                          ==
== Note 2: Results for 'copy' tests show how many bytes can be          ==
==         copied per second (adding together read and writen           ==
==         bytes would have provided twice higher numbers)              ==
== Note 3: 2-pass copy means that we are using a small temporary buffer ==
==         to first fetch data into it, and only then write it to the   ==
==         destination (source -> L1 cache, L1 cache -> destination)    ==
== Note 4: If sample standard deviation exceeds 0.1%, it is shown in    ==
==         brackets                                                     ==
==========================================================================

 C copy backwards                                     :  30582.1 MB/s (2.4%)
 C copy backwards (32 byte blocks)                    :  30488.7 MB/s (2.7%)
 C copy backwards (64 byte blocks)                    :  30756.7 MB/s (0.8%)
 C copy                                               :  31050.0 MB/s (0.7%)
 C copy prefetched (32 bytes step)                    :  31217.9 MB/s (0.3%)
 C copy prefetched (64 bytes step)                    :  31255.8 MB/s (1.8%)
 C 2-pass copy                                        :  25266.3 MB/s (1.3%)
 C 2-pass copy prefetched (32 bytes step)             :  25340.9 MB/s (1.5%)
 C 2-pass copy prefetched (64 bytes step)             :  25332.9 MB/s (1.5%)
 C fill                                               :  45000.0 MB/s (10.3%)
 C fill (shuffle within 16 byte blocks)               :  35503.4 MB/s (2.5%)
 C fill (shuffle within 32 byte blocks)               :  37420.2 MB/s (4.0%)
 C fill (shuffle within 64 byte blocks)               :  41411.4 MB/s (6.9%)
 NEON 64x2 COPY                                       :  44108.2 MB/s (1.9%)
 NEON 64x2x4 COPY                                     :  44995.9 MB/s (3.8%)
 NEON 64x1x4_x2 COPY                                  :  43933.6 MB/s (3.5%)
 NEON 64x2 COPY prefetch x2                           :  38081.9 MB/s (4.2%)
 NEON 64x2x4 COPY prefetch x1                         :  37652.8 MB/s (0.8%)
 NEON 64x2 COPY prefetch x1                           :  38499.6 MB/s (1.7%)
 NEON 64x2x4 COPY prefetch x1                         :  36585.0 MB/s (1.6%)
 ---
 standard memcpy                                      :  44986.9 MB/s (2.0%)
 standard memset                                      :  69795.4 MB/s (1.2%)
 ---
 NEON LDP/STP copy                                    :  44254.6 MB/s (4.7%)
 NEON LDP/STP copy pldl2strm (32 bytes step)          :  45326.7 MB/s (4.9%)
 NEON LDP/STP copy pldl2strm (64 bytes step)          :  43931.5 MB/s (3.8%)
 NEON LDP/STP copy pldl1keep (32 bytes step)          :  44670.6 MB/s (2.6%)
 NEON LDP/STP copy pldl1keep (64 bytes step)          :  44082.8 MB/s (1.3%)
 NEON LD1/ST1 copy                                    :  42881.2 MB/s (1.6%)
 NEON STP fill                                        :  80754.5 MB/s (5.4%)
 NEON STNP fill                                       :  68623.4 MB/s (0.5%)
 ARM LDP/STP copy                                     :  43418.8 MB/s (0.3%)
 ARM STP fill                                         :  82462.2 MB/s (5.2%)
 ARM STNP fill                                        :  68986.8 MB/s (1.3%)

==========================================================================
== Memory latency test                                                  ==
==                                                                      ==
== Average time is measured for random memory accesses in the buffers   ==
== of different sizes. The larger is the buffer, the more significant   ==
== are relative contributions of TLB, L1/L2 cache misses and SDRAM      ==
== accesses. For extremely large buffer sizes we are expecting to see   ==
== page table walk with several requests to SDRAM for almost every      ==
== memory access (though 64MiB is not nearly large enough to experience ==
== this effect to its fullest).                                         ==
==                                                                      ==
== Note 1: All the numbers are representing extra time, which needs to  ==
==         be added to L1 cache latency. The cycle timings for L1 cache ==
==         latency can be usually found in the processor documentation. ==
== Note 2: Dual random read means that we are simultaneously performing ==
==         two independent memory accesses at a time. In the case if    ==
==         the memory subsystem can't handle multiple outstanding       ==
==         requests, dual random read has the same timings as two       ==
==         single reads performed one after another.                    ==
==========================================================================

block size : single random read / dual random read
      1024 :    0.0 ns          /     0.0 ns 
      2048 :    0.0 ns          /     0.0 ns 
      4096 :    0.0 ns          /     0.0 ns 
      8192 :    0.0 ns          /     0.0 ns 
     16384 :    0.0 ns          /     0.0 ns 
     32768 :    0.0 ns          /     0.0 ns 
     65536 :    0.0 ns          /     0.1 ns 
    131072 :    0.0 ns          /     0.0 ns 
    262144 :    2.0 ns          /     3.0 ns 
    524288 :    2.9 ns          /     3.8 ns 
   1048576 :    3.4 ns          /     4.1 ns 
   2097152 :    3.7 ns          /     4.1 ns 
   4194304 :    5.1 ns          /     5.6 ns 
   8388608 :    6.1 ns          /     6.3 ns 
  16777216 :   12.7 ns          /    17.6 ns 
  33554432 :   49.1 ns          /    71.5 ns 
  67108864 :   71.1 ns          /    91.0 ns

`sbc-bench` results

The script doesn't run on macOS.

Phoronix Test Suite

Results from pi-general-benchmark.sh:

pts/encode-mp3: DNF (doesn't install on macOS)
pts/x264 4K: 12.82 fps
pts/x264 1080p: 55.53 fps
pts/phpbench: 1125967
pts/build-linux-kernel (defconfig): DNF (doesn't run on macOS)

Run inside a Docker container:

pts/encode-mp3: 4.250 s
pts/x264 4K: 25.27 fps
pts/x264 1080p: 108.50 fps
pts/phpbench: 932720
pts/build-linux-kernel (defconfig): 383.776 s
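
The gap between the native and Docker runs is easier to see as a ratio. A minimal sketch using only the three tests that completed in both environments (higher is better for all three):

```python
# Results copied from the two Phoronix lists above.
native = {
    "pts/x264 4K (fps)": 12.82,
    "pts/x264 1080p (fps)": 55.53,
    "pts/phpbench (score)": 1125967,
}
docker = {
    "pts/x264 4K (fps)": 25.27,
    "pts/x264 1080p (fps)": 108.50,
    "pts/phpbench (score)": 932720,
}


def docker_vs_native(test: str) -> float:
    """Ratio above 1.0 means the Docker run scored higher than native macOS."""
    return docker[test] / native[test]


for test in native:
    print(f"{test}: Docker is {docker_vs_native(test):.2f}x native")
```

The thread does not say why x264 roughly doubles inside the container, so the ratio is a description of the numbers, not an explanation.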

Additional Benchmarks

Ollama (LLMs)

See: https://github.com/geerlingguy/ollama-benchmark?tab=readme-ov-file#findings and geerlingguy/ollama-benchmark#2

| System | CPU/GPU | Model | Eval Rate | Power (Peak) |
| --- | --- | --- | --- | --- |
| M4 Mac mini (10 core CPU) / 32GB | GPU | llama3.2:3b | 41.31 Tokens/s | 30.1 W |
| M4 Mac mini (10 core CPU) / 32GB | GPU | llama3.1:8b | 20.95 Tokens/s | 29.4 W |
| M4 Mac mini (10 core CPU) / 32GB | GPU | llama2:13b | 13.60 Tokens/s | 29.8 W |
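
One way to compare these rows is energy per generated token. The sketch below divides peak power by eval rate; since the table reports peak rather than average power, the result is an upper-bound estimate, not a measured figure.

```python
# (model, eval rate in tokens/s, peak power in W), copied from the table above.
runs = [
    ("llama3.2:3b", 41.31, 30.1),
    ("llama3.1:8b", 20.95, 29.4),
    ("llama2:13b", 13.60, 29.8),
]


def joules_per_token(peak_watts: float, tokens_per_s: float) -> float:
    """W / (tokens/s) = J/token; uses peak power, so this overestimates."""
    return peak_watts / tokens_per_s


for model, rate, watts in runs:
    print(f"{model}: at most ~{joules_per_token(watts, rate):.2f} J/token")
```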

geerlingguy · 2024-11-11T02:56:05Z

This system is much more of a 'Single Board Computer' than a couple of the Ampere systems I'm also testing in this repo; and in a nice advancement over some SBCs, it has a power supply integrated in its diminutive body.

For a nice video showing how to upgrade the storage (lol why Apple still doesn't just go to M.2 is crazy): https://www.youtube.com/watch?v=cJPXLE9uPr8

And Snazzy Labs has a good teardown: https://www.youtube.com/watch?v=OYlF0NVXS70

I don't feel obligated to gut mine, since plenty of other people already have done so to theirs. I do feel obligated to test it like crazy before putting it into service. My current plan is to replace my M2 MacBook Air at home with this machine, and then the Air might come to the office and replace my old 2013 Air for a 'bench' laptop.

geerlingguy · 2024-11-12T06:09:23Z

I think my setup at my home desk leaves something to be desired. I noticed the M4 Pro CPU spikes to 100°C during multi-core tests like Cinebench. Geekbench 6 doesn't hit the cores as hard for as long, but it may have brief throttling as well, affecting the overall score.

I'm going to re-run all the tests in a better test environment at the studio tomorrow.

ThomasKaiser · 2024-11-12T09:16:56Z

> why Apple still doesn't just go to M.2 is crazy

M.2 as in 'standard NVMe SSD with own controller'? Well, two reasons against:

1. Lower profits.
2. Power efficiency is ruined. At least with every consumer SSD so far, I have seen much higher consumption figures than without one (SBC use case: eMMC vs. OS on NVMe), and I guess a lot of what Apple is doing with the I/O subsystem is optimizing for both performance and battery life (or low consumption in the Mini's case). Unfortunately I haven't seen any tests so far that emphasize the 'race to idle' concept including I/O, but from any fine-grained measurement on Apple Silicon hardware it's obvious that the I/O subsystem is most of the time in a 'deep sleep' state or something lower (I'm not that familiar with the terminology in this area).

geerlingguy · 2024-11-12T16:09:38Z

At the office, where the Mac mini is in open air (ambient temp 23°C / 73°F), Cinebench is still pushing the SoC to 100°C pretty quickly. Maybe Apple's fan curve isn't aggressive enough?

I can hear the fan, but it's certainly very quiet. Only barely audible in the studio. Fan is at 1920 rpm

geerlingguy · 2024-11-12T16:17:03Z

After installing Macs Fan Control and setting the fan to 4900 rpm (max), the fan is audible (sounds about like the Qualcomm Dev Kit, lol), and temps are now down to 80°C after a minute:

Going to re-run Cinebench multi at max fan speed to see how it fares, and compare the two runs:

| Test | Default fan settings (1900 rpm) | High fan speed (4900 rpm) |
| --- | --- | --- |
| Cinebench 2024 Single | 169 | 175 |
| Cinebench 2024 Multi | 893 | 801 |

The full system power draw averages 37.1W while running Cinebench 2024 continuously with default fan curve. Maxed out, I was getting up to 40.8W.

Odd result, the multi score is actually lower with the CPU not hitting max temps throughout! Not sure why, but color me surprised. Maybe silence is bliss.
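
To put the two fan settings side by side, here is a small sketch of the percentage change for each score, using only the numbers from the table above:

```python
# (test, score with default fan curve, score with high fan speed)
scores = [
    ("Cinebench 2024 Single", 169, 175),
    ("Cinebench 2024 Multi", 893, 801),
]


def pct_change(before: float, after: float) -> float:
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100


for test, default_fan, high_fan in scores:
    print(f"{test}: {pct_change(default_fan, high_fan):+.1f}%")
```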

geerlingguy · 2024-11-12T16:40:23Z

Cinebench results with Apple's default fan curve:

Cinebench results with 3000 rpm fan speed:

geerlingguy · 2024-11-12T17:55:38Z

Heh... I thought I had ordered an M4 Pro. Little did I know, I ordered an M4. Oopsie! It's still way faster than the M2 Air I'm replacing, and it still has 32 GB of RAM (double my Air), a 1 TB SSD (double my Air), and 10 GbE (which my Air had to use a TB3 dongle for), so I'm happy with it, but disappointed I didn't look more closely when I ordered it.

Though looking at the prices now, I'm okay with not paying the like $500 premium for the Pro + RAM upgrade.

geerlingguy · 2024-11-12T17:58:58Z

Here's the power graph through the Cinebench runs:

It goes multi (default fan curve), GPU (default fan curve), single (3000 rpm), multi (3000 rpm), GPU (3000 rpm).

geerlingguy · 2024-11-12T20:03:10Z

@ThomasKaiser - Since the sbc-bench.sh script doesn't run on macOS/Darwin, would you like any tests in particular since I have my machine up and ready for some benchmarks? (Outside of what I've already run.)

geerlingguy · 2024-11-12T20:42:52Z

Disk results with AmorphousDiskMark 4.0.1:

geerlingguy · 2024-11-12T23:42:47Z

I ran my https://github.com/geerlingguy/ollama-benchmark a couple times, even tried loading up the llama3.1:70b model but that tried consuming almost 40 GB of RAM—which meant 12 GB of swap, and slaughtered the performance. Need more RAM to run larger models.

andrewginns · 2024-11-13T00:16:42Z

Would be great to run this standardised benchmark of llama models!

ggerganov/llama.cpp#4167

The pinned commit is very old (but required for obvious reasons) so newer versions of the repo might also be interesting to explore to see the impact and efficiency of the GPU/Neural engine.

film42 · 2024-11-13T00:19:18Z

> I can hear the fan, but it's certainly very quiet. Only barely audible in the studio. Fan is at 1920 rpm

That's good news, but ouch! That seems like a really slow fan. Curious to know how well it handles high-res video transcoding. I'm sure I'm not the only one who thinks this screams "my next media server" for only $550 (on Amazon).

nreilly · 2024-11-13T03:35:50Z

> At the office, where the Mac mini is in open air (ambient temp 23°C / 73°F), Cinebench is still pushing the SoC to 100°C pretty quickly. Maybe Apple's fan curve isn't aggressive enough?
>
> I can hear the fan, but it's certainly very quiet. Only barely audible in the studio. Fan is at 1920 rpm

Please consider testing in High Power Mode.

geerlingguy · 2024-11-13T05:44:03Z

@nreilly - Low Power mode has been switched off in all my testing:

nreilly · 2024-11-13T08:03:40Z

> @nreilly - Low Power mode has been switched off in all my testing

The Mac mini (2024) should have 3 options for the energy level. Low, Automatic and High. The Apple article indicates it's available for the Mac mini (2024), and doesn't clarify it's only for the M4 Pro versions, so I would expect it to be an option for you, but your screenshot obviously says no.

imadcat · 2024-11-13T10:09:23Z

Great thx for sharing! Is there also a computer vision (non LLM) AI benchmark result?

JayBrown · 2024-11-13T12:08:11Z

If only APFS and Disk Utility offered RAID5 functionality… I'd use this as a home server in a heartbeat, with two external M.2 thunderbolt enclosures, each sporting an M.2 gen3 x2 to 6xSATA adapter… using three SATA SSDs on each adapter for a combined storage pool. Afaik the only way to implement RAID5 is by using OWC's SoftRAID, and that's a subscription model, and I don't know if we should trust their software for an application as fundamental as storage. OpenZFS seems to be buggy as hell on macOS, so RAIDz1 is probably out of the question. Come on, Apple, give us more RAID levels already. 🙏

mibosshard · 2024-11-13T21:04:21Z

The M4 (non-Pro) only has Low Power Mode; only the M4 Pro Mac mini has High Power Mode. In Low Power Mode, the M4 SoC consumes less than 7 W under full load (HandBrake H.265 encoding on the CPUs).
https://browser.geekbench.com/v6/cpu/compare/8799758?baseline=8802915

geerlingguy · 2024-11-20T04:20:40Z

I unplugged 10 GbE and just used WiFi 6, and idle power consumption goes from 6W to 4.1W.

The HPL efficiency score increased from 6.74 Gflops/W to 7.57 Gflops/W (much wow).
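
As a sanity check, the 7.57 Gflops/W figure follows directly from the HPL result and wall power reported in the opening post (299.93 Gflops at 39.6 W). The idle saving is simple subtraction too. A minimal sketch:

```python
def gflops_per_watt(gflops: float, watts: float) -> float:
    """HPL efficiency: sustained Gflops divided by wall power."""
    return gflops / watts


# Figures from the opening post: HPL result and power during the HPL run.
efficiency = gflops_per_watt(299.93, 39.6)
print(f"HPL efficiency: {efficiency:.2f} Gflops/W")

# Idle power with 10 GbE connected vs. Wi-Fi only, from the comment above.
idle_saving_w = 6.0 - 4.1
print(f"Idle saving without 10 GbE: {idle_saving_w:.1f} W")
```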

film42 · 2024-11-20T04:51:29Z

Wow! Does Apple power down the radio completely when WiFi and Bluetooth are disabled? Can’t help but wonder if radios are using power at idle when 10 GbE is connected.

ThomasKaiser · 2024-11-20T07:55:03Z

> I unplugged 10 GbE and just used WiFi 6

Have you also tried to 'downgrade' the Ethernet link by connecting to a GbE or 2.5GbE switch port and measured idle consumption?

geerlingguy · 2024-11-20T16:25:51Z

@ThomasKaiser - I haven't tried that yet. May do so soon, left my power cord at home so I can't test right now :( (at least unless I have a compatible AC cord laying around in one of my boxes lol)

geerlingguy · 2024-11-20T20:49:50Z

For some comparisons:

akarabach · 2024-11-21T12:15:11Z

Looks like a good candidate to run as a home server. With 2 Thunderbolt 5 ports it's not a problem anymore to connect even 990 Pro SSDs. Would love to see that kind of video on the channel!

ThomasKaiser · 2024-11-21T14:06:49Z

> Since the sbc-bench.sh script doesn't run on macOS/Darwin, would you like any tests in particular

@geerlingguy in case the Mini is still in 'testing territory' with Xcode installed, it should take just a few minutes of your time to get 7-zip 16.02 scores (16.02 to be comparable with sbc-bench scores; versions from 17.04 on up may perform differently):

git clone https://github.com/ThomasKaiser/p7zip
cd p7zip
make -j$(sysctl -n hw.ncpu) INSTALL=install CC=gcc CXX=g++ OPTFLAGS='-O3'
for i in 1 2 3 4 5 ; do bin/7za b -mmt=1 | grep "Tot:" ; done | sort -n | tail -n1 ; for i in 1 2 3 4 5 ; do bin/7za b -mmt=4 | grep "Tot:" ; done | sort -n | tail -n1

This will download the 16.02 sources (with all known vulnerabilities patched), build from source, and run the 7-zip benchmark single-threaded and on all 4 P-cores in parallel (5 consecutive runs each, then displaying the best score of each).

Numbers to be compared with (counting/measuring only P-cores on the Macs):

192-core AmpereOne A192-32X (3200 MHz): single: 4783, multi: 745720 (155.91 ratio)
8-core Apple M1 Pro (3228 MHz): single: 5540, multi: 46540 (8.40 ratio)
8-core Apple M1 Max (3228 MHz): single: 5845, multi: 52170 (8.93 ratio)
8-core Apple M2 Pro (3504 MHz): single: 6830, multi: 57120 (8.36 ratio)
4-core Apple M3 (4054 MHz): single: 7570, multi: 31380 (4.15 ratio)
4-core Apple M4 (? MHz): single: ?, multi: ? (? ratio)

(the ratio for all the Apple results is obviously strange since should be lower than count of threads)
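
The ratio in parentheses is simply the multi-threaded score divided by the single-threaded score. A short sketch that recomputes it from the scores listed above and flags results that scale past the thread count, which is the oddity noted for the Apple chips:

```python
# (system, P-cores used, single-thread score, multi-thread score),
# copied from the list above.
results = [
    ("AmpereOne A192-32X", 192, 4783, 745720),
    ("Apple M1 Pro", 8, 5540, 46540),
    ("Apple M1 Max", 8, 5845, 52170),
    ("Apple M2 Pro", 8, 6830, 57120),
    ("Apple M3", 4, 7570, 31380),
]


def scaling_ratio(single: int, multi: int) -> float:
    """Multi-threaded score expressed as a multiple of the single-threaded score."""
    return multi / single


for name, cores, single, multi in results:
    ratio = scaling_ratio(single, multi)
    flag = "above thread count" if ratio > cores else "below thread count"
    print(f"{name}: {ratio:.2f}x on {cores} threads ({flag})")
```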

In a follow-up run I checked in parallel with `sudo powermetrics | grep frequency` whether the CPU frequencies were consistent between single-threaded and multi-threaded runs, and found no difference on the systems tested (unlike M1 and M2 in the past, which were clocked lower when all P-cores were active than with only 1 or 2).

DaveHarwoodNZ · 2024-12-06T17:29:49Z

> > why Apple still doesn't just go to M.2 is crazy
>
> M.2 as in 'standard NVMe SSD with own controller'? Well, two reasons against:
>
> 1. lower profits
> 2. power efficiency ruined.

The integrated controller appears to offer low-overhead encryption which shows almost no performance degradation. Not that many people will compare devices with encryption enabled.

geerlingguy changed the title ~~M4 Pro Mac Mini~~ M4 Pro Mac mini Nov 12, 2024

geerlingguy changed the title ~~M4 Pro Mac mini~~ M4 Mac mini Nov 12, 2024
