The Eclectic Light Company
M4 Pro full on: when CPU and GPU draw over 50 W, and how Low Power mode changes that
Most tests and benchmarks avoid putting heavy loads on the CPU and GPU at the same time, so they seldom run an Apple silicon chip ‘full on’. This article explores what happens in the CPU and GPU of an M4 Pro when they’re drawing a total of over 50 W, and how that changes in Low Power mode. It concludes my investigations of power modes, for the time being.
Methods
Three test runs were performed on a Mac mini M4 Pro with 10 P and 4 E cores, and a 20-core GPU. In each run, Blender Benchmarks were run using Metal, and shortly after the start of the first of those, monster, 3 billion tight loops of NEON code were run in 10 threads on CPU cores at maximum Quality of Service. In previous separate runs, the monster test ran the GPU at its maximum frequency of 1,578 MHz and 100% active residency, using about 20 W, and the NEON code ran all 10 P cores at a high frequency of about 3,852 MHz and 100% active residency, using about 32 W. This combined testing was performed in each of the three power modes: Low Power, Automatic, and High Power.
In addition to recording test performance, powermetrics was run during the start of each NEON test at its shortest sampling period, with both cpu_power and gpu_power samplers active.
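As a rough illustration of that sampling setup, the sketch below builds a powermetrics command line with both samplers enabled. The function name is hypothetical, and the 100 ms interval and count of 50 samples are assumptions chosen for illustration, as the article doesn't state the exact values used. The command is only assembled and printed, not run, as powermetrics requires root privileges.

```python
import shlex

def powermetrics_command(interval_ms, sample_count,
                         samplers=("cpu_power", "gpu_power")):
    """Build (but don't run) an argument list for powermetrics."""
    return [
        "sudo", "powermetrics",
        "-i", str(interval_ms),            # sampling interval in milliseconds
        "-n", str(sample_count),           # number of samples to collect
        "--samplers", ",".join(samplers),  # comma-separated sampler names
    ]

# Assumed values: 100 ms interval, 50 samples (not taken from the article).
cmd = powermetrics_command(interval_ms=100, sample_count=50)
print(shlex.join(cmd))
```

The resulting list could be passed to subprocess.run() on a Mac, with its output captured for later analysis.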
Performance
There was no difference in performance between High Power and Automatic settings, which completed both tasks with the same performance as when they were run separately:
- NEON time separate 2.12 s, together High Power 2.12 s, Auto 2.12 s
- monster performance separate 1215-1220, together High Power 1221, Auto 1220.
As expected, Low Power performance was greatly reduced. NEON time was 4.33 s (49% performance), even slower than running alone at Low Power (2.87 s), and monster performance 795, slightly lower than running alone at Low Power (837).
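Those relative figures can be checked with a little arithmetic. This is a minimal sketch using only the numbers quoted above; the function names are my own, and it assumes that NEON performance is inversely proportional to elapsed time, while Blender's monster score is directly proportional to performance.

```python
def speed_ratio_from_times(baseline_s, measured_s):
    """Relative performance when a shorter time means better performance."""
    return baseline_s / measured_s

def score_ratio(baseline, measured):
    """Relative performance when a higher score means better performance."""
    return measured / baseline

# NEON test, combined run in Low Power mode (4.33 s)
neon_vs_high = speed_ratio_from_times(2.12, 4.33)       # against High Power
neon_vs_low_alone = speed_ratio_from_times(2.87, 4.33)  # against Low Power alone

# monster test, combined run in Low Power mode (795)
monster_vs_high = score_ratio(1221, 795)                # against High Power
monster_vs_low_alone = score_ratio(837, 795)            # against Low Power alone

for label, ratio in [
    ("NEON vs High Power", neon_vs_high),
    ("NEON vs Low Power alone", neon_vs_low_alone),
    ("monster vs High Power", monster_vs_high),
    ("monster vs Low Power alone", monster_vs_low_alone),
]:
    print(f"{label}: {ratio:.0%}")
```

On these figures the NEON test bore most of the cost of sharing the power budget, at about 66% of its Low Power performance when run alone, while the monster test retained about 95%.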
High Power mode
This first graph shows CPU core cluster frequencies and active residencies for a period of 0.3 seconds when the monster test was already running, and the NEON test was started.
At time 0, the P0 cluster (black) was shut down, and the P1 cluster (red) was running at about 3,900 MHz with one core at 100% active residency and a second at about 60%. As the ten test threads were loaded onto the two clusters, both cluster frequencies were quickly brought to 3,852 MHz, by reducing that of the P1 cluster and rapidly increasing that of the P0 cluster.
By 0.1 seconds, both clusters were at full active residency and running at 3,852 MHz, where they remained until the NEON test threads completed.
Power used by the CPU followed the same pattern, rising rapidly from about 6,000 mW to about 32,000 mW at 0.1 seconds. GPU power varied between 8,600 and 23,000 mW, resulting in a peak total power of slightly less than 52,000 mW, and a dip to 40,600 mW. Typical sustained power with both CPU and GPU tests running was 50-52 W.
Low Power mode
These results are more complicated, and involve significant use of the E cluster.
This graph shows active residency alone, and this time includes the E cluster, shown in blue, and the GPU, in purple. NEON test threads were initially loaded into the two P clusters, filling them at 0.13 seconds. After that, threads were moved from some of those P cores to run on E cores instead, leaving just two test threads running on each of the P clusters by 0.26 seconds. Over much of that time the GPU had full active residency, but as that fell threads were moved from E cores back to P cores. By the end of this period of 0.5 seconds, 4 of 5 cores in each of the two P clusters were at 100%, and the GPU was also at 100% active residency.
This bar chart shows changing cluster total active residency for the E (red) and two P (blues) clusters by sample. With 10 test threads and significant overhead, the total should have reached at least 1,000%, which was only achieved in sample 4, and from sample 13 onwards.
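To make that total concrete, here is a minimal sketch of how cluster total active residency adds up. The per-core values are hypothetical, chosen only to mirror the two situations described in the text, and aren't measured data. It assumes the M4 Pro layout used here, with one cluster of 4 E cores and two clusters of 5 P cores.

```python
def total_active_residency(clusters):
    """Sum per-core active residency (%) over all cores in all clusters."""
    return sum(sum(cores) for cores in clusters.values())

THREADS = 10
REQUIRED = 100 * THREADS  # each fully running thread needs 100% of one core

# Hypothetical sample: two threads left on each P cluster, E cluster full.
mid_period = {
    "E":  [100, 100, 100, 100],
    "P0": [100, 100, 0, 0, 0],
    "P1": [100, 100, 0, 0, 0],
}

# Hypothetical sample: 4 of 5 cores busy in each P cluster, E cores half used.
late_period = {
    "E":  [50, 50, 50, 50],
    "P0": [100, 100, 100, 100, 0],
    "P1": [100, 100, 100, 100, 0],
}

for name, sample in [("mid", mid_period), ("late", late_period)]:
    total = total_active_residency(sample)
    print(f"{name}: {total}% total, enough for 10 threads: {total >= REQUIRED}")
```

With only four E cores, the first arrangement can't exceed 800%, which is one way the total could fall short of 1,000% while threads were being moved to the E cluster.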
Those active residencies are shown in the lower section of this graph (with open circles), together with cluster frequencies (filled circles) above them. As the P clusters were being loaded with test threads, both P clusters (black) were brought to a frequency of only 1,800 MHz, compared with 3,852 MHz in the High Power test. The E cluster (blue) was run throughout at its maximum frequency of 2,592 MHz, except for one sample period. GPU frequency (purple) remained below 1,000 MHz throughout, compared with a steady maximum of 1,578 MHz when at High Power.
Power changed throughout this initial period running the NEON test. Initially, CPU power (red) rose to a peak of 6,660 mW, then fell slowly to 3,500 mW before rising again to about 6,000 mW. GPU power rose to peak at just over 7,000 mW, but at one stage fell to only 26 mW. Total power used by the CPU and GPU ranged between 11 and 13.2 W, apart from a short period when it fell below 5 W. Those are all far lower than the steadier power use in High Power mode.
How macOS limits power
Running these tests in Low Power mode elicited some of the most sophisticated controls I have seen in Apple silicon chips. Compared to being run unfettered in Automatic or High Power mode, macOS used a combination of strategies to keep CPU and GPU total power use below 13.5 W:
- P core frequencies were limited to 1,800 MHz, instead of 3,852 MHz.
- High QoS threads that would normally have been run on P cores were transferred to E cores, which were then run at their maximum frequency of 2,592 MHz.
- Threads continued to be transferred between E and P cores to balance performance against power use.
- GPU frequency was limited to below 1,000 MHz.
Although those strategies reduced total power use to about 25% of that in High Power mode, their effect on performance was far smaller, with NEON performance at about 50% of that attained in High Power mode.
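The trade-off between power and performance can be put into rough numbers. This sketch takes about 52 W as the typical total power in High Power mode, about 13 W as the total in Low Power mode, and the NEON times of 2.12 s and 4.33 s. Treating power as steady throughout each run is an assumption, as the Low Power measurements above show it varying.

```python
# Approximate total CPU + GPU power from the results above, in watts
HIGH_POWER_W = 52.0
LOW_POWER_W = 13.0

# NEON test times when run together with the monster test, in seconds
HIGH_POWER_TIME_S = 2.12
LOW_POWER_TIME_S = 4.33

# Fraction of High Power mode's power used in Low Power mode
power_ratio = LOW_POWER_W / HIGH_POWER_W

# Fraction of High Power mode's NEON performance attained in Low Power mode
performance_ratio = HIGH_POWER_TIME_S / LOW_POWER_TIME_S

# Relative performance per watt: above 1 means Low Power mode is more efficient
efficiency_gain = performance_ratio / power_ratio

print(f"power: {power_ratio:.0%} of High Power")
print(f"performance: {performance_ratio:.0%} of High Power")
print(f"performance per watt: {efficiency_gain:.2f} times High Power")
```

On those assumptions, Low Power mode delivers nearly twice the NEON performance per watt of High Power mode.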
References
How Low Power mode controls CPU cores
Power Modes and Apple silicon CPUs
Last Week on My Mac: Power throttle
Inside M4 chips: CPU power, energy and mystery
Inside M4 chips: Matrix processing and Power Modes
Power Modes and Apple Silicon GPUs
Evaluating M3 Pro CPU cores: 1 General performance
Explainer
Residency is the percentage of time a core is in a specific state. Idle residency is thus the percentage of time that core is idle and not processing instructions. Active residency is the percentage of time it isn’t idle, but is actively processing instructions. Down residency is the percentage of time the core is shut down. All these are independent of the core’s frequency or clock speed.
三重门 (Triple Gate) – 1
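I listened to episode 134 of the podcast 随机波动, ‘Is it possible to hold office and examine your conscience at the same time?’ The guest was Yang Suqiu (杨素秋), author of 《世上为什么要有图书馆》 (Why Should the World Have Libraries). A lecturer at Shaanxi University of Science and Technology, she spent a year as deputy director of the culture and tourism bureau in Beilin District, Xi'an, under some kind of government rotation scheme, and during that year she set up the district's first library. While fitting out the library, and especially while choosing its books, she held to her standards of taste and turned away the various low-quality booksellers whose business ran mainly on kickbacks. A large part of the book is her venting about the whole bureaucratic system as she built the library.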
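While listening to the podcast, my mind kept wandering. What I was thinking about had little to do with the episode itself: what is my attitude towards people who survive inside the system and yet still have a ‘conscience’? How has that attitude changed? And what kind of connection, or distance, is there really between them and me?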
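Entering the system has become an ever more natural choice in terms of self-interest, or even plain survival. Because it is now so common, the sense of guilt and shame that goes with it (at least within my own like-minded circle) is no longer as heavy as it was years ago. Some people whose values are basically sound have also chosen to work inside the system. Perhaps they followed their family's arrangements and drifted with the current, or had some calculating eye for gain, or... had so many troubles on other fronts that this one no longer mattered. In their everyday working environment, these people on the one hand really do suffer the pains of the system; on the other, from their own position and vantage point, they observe and feel more of it. It is like the venting you see on social networks, and like the podcast's verdict on 《世上为什么要有图书馆》: a rare set of field notes depicting the bureaucratic system from a top-down perspective.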
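The author's account of her state of mind during her year attached to the culture and tourism bureau resembles some of my own work experience: knowing you are only passing through, so your mindset and way of life differ from those of people who must depend on the system to survive. In many places I worked with an attitude of ‘looking on while collecting a salary’. I knew I would resign and leave before long, so I would not be forced into deeper changes to myself just to last in that system. I therefore did not mind behaving in ways that were a little flamboyant, or out of line for the environment, and that behaviour earned the appreciation, praise and even sympathy of those who survive inside the system with their values still OK. So our everyday conversations could be more colourful too; even in a state committee office (国委), such people can be found. In a sense, if there were more people like this inside the system, might the system change along with them? Stop right there! That last sentence is pure wishful thinking; it is impossible.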
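Even so, I can still feel a gap between myself and such people. I do not mean a difference in political views, but something along the lines of (life as an adventure vs living safely and steadily). They may have married straight after graduating, may be a mummy's boy, or may have a husband from a wealthy family... Although they will remark out loud that they envy my way of life, I can see that it would clearly never be their own choice. None of this stops us chatting idly in the office, of course, but when small matters come up that do not involve political stance yet reveal (passion vs caution), we choose differently.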
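Twenty or thirty years ago, when there were not so many social incidents exposed by the internet and people did not talk much about politics, the differences between my way of life and theirs were already there, and we drifted steadily apart. In recent years, layers of new filters have simply been added on top: politics, gender awareness and so on. Most people cannot even get through these new filters, so the chances to experience differences in way of life have actually become fewer and fewer. Reflecting on this recently, I realised that I seem to have treated like-mindedness on politics, gender and similar matters as too decisive. These things are indeed important; they are the basic standard for being a friend, no, for being a person. But people who meet those standards will not necessarily enjoy one another's company. The differences that were covered over for decades, that there was little occasion to touch on, are in fact all still there.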
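The thoughts above came to me while I was listening to the podcast. But I held off until I had read 《世上为什么要有图书馆》 before sorting out and confirming those words. Otherwise, to claim an affinity with the author, or hastily to mark out my distance from her, on the strength of a podcast interview alone would feel odd. While listening I could also sense that the author and the hosts of 随机波动 were on subtly different wavelengths: often one side would excitedly raise a topic, and the other, uninterested, would steer away from it. In short, something often felt off.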
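The book is well written. The second half is stuffed with cultural essays that have little to do with the theme, but the earlier parts, venting about bureaucracy and planning the library, are a very good read, and I recommend it. But I realised where the sense of something being off came from. The author often reflects, with self-examination or self-mockery, on whether she might lose her way while occupying a position of power. When following orders from above to inspect various shops, she grumbles while doing as little as she can get away with. But during the COVID-19 pandemic, when checking whether hotels were illegally buying fresh food from overseas, she was exceptionally strict and sharp, and the text carries a faint pride in being able to catch lawbreaking traders. The author is probably someone who has followed the expected path, has a happy family, and so values her own safety; when something came up that she truly cared about, she subconsciously stood straight on the side of power: I can choose not to use the power in my hands, but when I need to, I can pick it up at any time.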