In the two previous articles (links at the end), I explored some of the features and properties of Performance (P) cores in Apple’s latest M4 chips. This article looks at their Efficiency (E) cores by comparison.
M4 family
In the three current M4 designs, there are only two variations in terms of E cores:
- Base M4, with 6 E cores, except for a cheaper variant with only 4 active E cores.
- M4 Pro and Max, with 4 E cores, including ‘binned’ variants.
Apple is expected to release an Ultra variant in 2025, with two M4 Max chips in tandem, providing a total of 8 E cores. Apart from the number of cores, all E cores are the same, and different from P cores.
E core architecture
All E cores are arranged in a single cluster of 4 or 6, sharing common L2 cache, and running at the same frequency (clock speed). Analysis of M1 cores suggests that each E core has roughly half the number of processing units, where there is more than one such unit in the P core, giving an M1 E core roughly half the compute capacity of the P core. I haven’t seen any comparable analysis of cores in later M families, although differences in power consumption suggest there remain substantial differences in processing units and compute capacity.
Frequency
Like P cores, E cores can be set to run at any of 5 values between the minimum of 1,020 MHz and maximum of 2,592 MHz (1.0-2.6 GHz). When running macOS, cluster frequency is set by macOS at a kernel level; other operating systems may provide more direct control. This range of frequencies is significantly narrower than that of E cores in the M3, which range between 744-2,748 MHz.
E cores idle at 1,020 MHz, and although they can be shut down altogether, that’s exceptional given the constant demand for macOS background threads to be run on them. Nevertheless, powermetrics still reports their ‘down’ residencies separately from idle residencies.
Instruction set
This is believed to be identical to that supported by M4 P cores, ARMv9.2-A without its Scalable Vector Extension (SVE), enabling the same threads to be run on either core type.
Single thread comparisons
One way to appreciate the contrasts between core types is to examine a single intensive in-core thread run in each. For this purpose, I used a tight loop of floating point calculations, running at two different Quality of Service (QoS) settings, in macOS 15.1.
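As a rough illustration of what a tight floating point loop looks like, here is a minimal Python sketch. It is not the test used for these measurements: the function name `float_loop` and the arithmetic in it are hypothetical, it doesn’t set any QoS, and interpreter overhead means its timings say little about in-core performance. It only shows the shape of such a test, a loop that stays in registers and is timed from outside.

```python
import time

def float_loop(iterations: int) -> float:
    """Run a simple chain of dependent floating point operations."""
    x = 1.0
    for _ in range(iterations):
        # Each step depends on the previous result, so the work can't be skipped.
        x = x * 1.000001 + 0.000001
    return x

# Time a short run; the loop count here is far smaller than in the article's tests.
start = time.perf_counter()
result = float_loop(1_000_000)
elapsed = time.perf_counter() - start
print(f"result {result:.6f} in {elapsed:.3f} s")
```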
Single thread at high QoS on P cores
This thread was initially loaded onto P13 (red) in the second (P1) cluster, and after 3.7 seconds was moved to P5 (blue) in the first (P0) cluster. After a further 4.6 seconds running on that, it was moved back to the second (P1) cluster, to run on P11 (purple). During this run, there was almost no other activity on the two P clusters, and the inactive cluster was therefore shut down while this thread was running on the other.
The active cluster was run at the maximum frequency of 4,511 MHz throughout. Just before the thread was moved to a different cluster, that cluster was brought up and run up to maximum frequency, ready to run the thread.
Total CPU power remained similar throughout the period the thread was being executed, but there is a small and consistent difference according to which cluster was active: with the first (P0), power use was about 2,520 mW, 50 mW higher than with the second (P1) at about 2,470 mW. This matches the difference reported previously, and merits examination in other M4 Pro chips to determine whether this is a general feature.
Single thread at high QoS on E cores
There are two methods of running code, such as the in-core floating point loop test used here, on E cores: threads can be run with a low QoS (Background), so that macOS allocates them to run on only E cores, or they can be spilt over from high QoS threads when there are more threads than available P cores. On an M4 Pro chip, that needs 11 threads, which results in one of those being allocated to the E cluster, as illustrated next.
This chart shows active residency on the four E cores with a single high QoS thread spilt onto them. While cores E1, E2 and E3 appear to handle other threads over this period of more than six seconds, core E0 appears to run at 90-100% active residency executing the spilt thread. Note that this thread wasn’t moved between cores over that period.
E cluster frequency remained constant throughout at its maximum of 2,592 MHz. CPU power use was inevitably dominated by the ten P cores running at 100% active residency and maximum frequency, remaining at just under 14,000 mW. Unfortunately, using powermetrics it’s not possible to assess the power use of the E cluster directly.
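The spill-over described in this section can be sketched as a toy model. This is an assumption-laden simplification, not how macOS actually schedules threads: the function `allocate_high_qos` and its `waiting` count are hypothetical, and it ignores all other system activity. It only captures the observation that high QoS threads fill the P cores first, and any excess lands on the E cluster.

```python
def allocate_high_qos(threads: int, p_cores: int, e_cores: int) -> dict:
    """Toy model: fill P cores first, then spill the remainder onto E cores."""
    on_p = min(threads, p_cores)
    on_e = min(threads - on_p, e_cores)
    # Anything left over would have to share cores (an assumption of this model).
    waiting = threads - on_p - on_e
    return {"P": on_p, "E": on_e, "waiting": waiting}

# M4 Pro as described in the text: ten P cores and four E cores, 11 threads.
print(allocate_high_qos(11, p_cores=10, e_cores=4))
# → {'P': 10, 'E': 1, 'waiting': 0}
```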
Single thread at low QoS on E cores
This is very different from the spilt thread at high QoS.
There’s no evidence here that any single core in the E cluster ran a thread at 100% active residency. Instead the thread appears to have been moved rapidly and freely around the cores, with many 0.1 second sampling intervals spanning its execution in more than one core over that period.
Cluster frequency was a steady minimum of 1,050-1,060 MHz, with superimposed spikes when it rose briefly to the maximum of 2,592 MHz. This suggests that the single thread would most probably have been run at close to core minimum frequency, had there not been additional threads to run.
A similar picture is seen in power use, with spikes from a low background of about 40-45 mW required by the single thread alone.
Single thread behaviours
These can be summarised as:
- P core (high QoS) runs at 100% active residency on a single P core at maximum frequency, and is switched between clusters irregularly (about every 3.7-4.6 seconds). Total power use is about 2,500 mW.
- High QoS spilt over to E cores runs at 90-100% active residency on a single E core at maximum frequency, and is either not switched between cores at all, or only infrequently.
- E core (low QoS) runs at about 100% and is moved regularly between all E cores in the cluster, at close to minimum frequency. Total power use is about 40-45 mW.
Performance, power and efficiency
Although I’ll be returning to more detailed comparisons of performance and power use between P and E cores, I provide a single illustration here, for the in-core floating point task used above.
Running 2 x 10^9 loops in each thread, P cores at maximum frequency take 9.2-9.7 seconds per thread, and use about 2,500 mW per thread. E cores running low QoS threads at close to minimum frequency take about four times as long, 38.5 seconds, but use less than 45 mW power per thread. Total energy used to complete one thread is therefore over 23 J when run on P cores, and less than 1.7 J when run on E cores. E cores therefore use only 7% of the energy that P cores do performing the same task.
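The arithmetic behind those figures is simply energy = power x time. The short sketch below reworks it from the numbers quoted above; the helper name `energy_joules` is hypothetical, and using 45 mW (an upper bound) and 9.2 seconds (the shortest P core time) are choices made here for illustration, so the results are approximate.

```python
def energy_joules(power_mw: float, seconds: float) -> float:
    """Energy in joules from mean power in milliwatts and duration in seconds."""
    return power_mw / 1000.0 * seconds

# Figures quoted in the text: P core at about 2,500 mW for 9.2 s,
# E core at up to 45 mW for 38.5 s.
p_energy = energy_joules(2500, 9.2)
e_energy = energy_joules(45, 38.5)

print(f"P core: {p_energy:.1f} J")
print(f"E core: {e_energy:.2f} J")
print(f"E/P energy ratio: {e_energy / p_energy:.1%}")
print(f"E/P time ratio: {38.5 / 9.7:.1f}x to {38.5 / 9.2:.1f}x")
```

With the upper-bound power of 45 mW, the E core energy comes out slightly above 1.7 J and the ratio at about 7.5%; any lower power figure in the 40-45 mW range brings both down accordingly.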
Key information
- Current M4 chips feature 4-6 CPU E cores.
- M4 E cores are arranged in a single cluster of 4 or 6, sharing L2 cache and running at a common frequency.
- The E core cluster can be shut down (exceptionally), can idle at its minimum frequency of 1,020 MHz, or can run at one of 6 set frequencies up to a maximum of 2,592 MHz, as controlled by macOS.
- Their instruction set is the same as M4 P cores, ARMv9.2-A without its Scalable Vector Extension (SVE).
- They use 40-45 mW when at low frequencies, but it’s not currently feasible to measure directly their maximum power use at high frequencies.
- macOS allocates threads to E cores when their QoS is 9 (Background), and when a thread with higher QoS can’t be allocated to a P core because they are all busy. Management of frequencies and core allocation differs between those two cases.
- High QoS threads on E cores are run at maximum frequency and appear not to move between cores.
- Low QoS threads on E cores are run at close to minimum frequency and are highly mobile between cores.
- Low QoS threads running on E cores run more slowly than higher QoS threads running on P cores, but E core power use is much lower, resulting in considerable saving in total energy use for the same computational task.
Previous articles
Inside M4 chips: P cores
Inside M4 chips: P cores hosting a VM
Explainer
Residency is the percentage of time a core is in a particular state. Idle residency is thus the percentage of time that core is idle and not processing instructions. Active residency is the percentage of time it isn’t idle, but is actively processing instructions. Down residency is the percentage of time the core is shut down. All these are independent of the core’s frequency or clock speed.
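To make those definitions concrete, here is a minimal sketch that derives the three residencies from a series of equally spaced observations of one core. The state labels and the trace are hypothetical, made up for illustration; powermetrics reports residencies directly, and this isn’t how it calculates them.

```python
from collections import Counter

STATES = ("active", "idle", "down")

def residencies(samples: list[str]) -> dict[str, float]:
    """Percentage of equally spaced samples spent in each state."""
    if not samples:
        raise ValueError("need at least one sample")
    counts = Counter(samples)
    return {state: 100.0 * counts[state] / len(samples) for state in STATES}

# Hypothetical trace: ten observations of a single core.
trace = ["active"] * 6 + ["idle"] * 3 + ["down"] * 1
print(residencies(trace))
# → {'active': 60.0, 'idle': 30.0, 'down': 10.0}
```

Note that frequency doesn’t appear anywhere in the calculation: a core at 100% active residency could be running at either its minimum or its maximum frequency.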