Which cores does Visual Look Up use?
A couple of weeks ago I estimated how much power and energy were used when performing Visual Look Up (VLU) on an Apple silicon Mac, and was surprised to discover how little that was, concluding that “it’s not actually that demanding on the capability of the hardware”. This article returns to those measurements and looks in more detail at what the CPU cores and GPU were doing.
That previous article gives full details of what I did. In brief, this was performed on a Mac mini M4 Pro running macOS Sequoia 15.6.1, using an image of cattle in a field, opened in Preview. powermetrics collected samples in periods of 100 ms throughout, and a full log extract was obtained to relate time to logged events.
Power use by the CPU cores, GPU and neural engine (ANE) is shown in this chart from that article. This tallies with log records for the main work in VLU being performed in samples 10-24, representing a time interval of approximately 1.0-2.4 seconds after the start. There were also briefer periods of activity around 3.2 seconds on the GPU, 4.2 seconds on the CPU, and 6.6-7.1 seconds on the CPU. The latter correlated with online access to Apple’s SMOOT service to populate and display the VLU window.
To gain further detail, powermetrics measurements of CPU core cluster frequencies, active residencies of each core, and GPU frequency and active residency were analysed for the first 80 collection periods.
Frequency and active residency
Cluster frequencies in MHz are shown in the chart above for the one E and two P clusters, and the GPU. These show:
- The E cores (black) ran at a baseline of 1200-1300 MHz for much of the time, reaching their maximum frequency of 2592 MHz during the main VLU period at 1.0-2.4 seconds.
- The first P cluster (blue), P0, was active in short bursts over the first 1.5 seconds, and again between 6.3-7.0 seconds. For the remainder of the period the cluster was shut down.
- The second P cluster (red), P1, was most active during the three periods of high power use, although it didn’t quite reach its maximum frequency of 4512 MHz. When there was little core activity, it was left to idle at 1260 MHz but wasn’t shut down.
- The GPU (yellow) ran at 338 MHz or was shut down for almost all the time, with one brief peak at 927 MHz.
This chart shows the total active residencies for each of the three CPU clusters, obtained by adding their % measurements. Thus the maximum for the E cluster is 400%, and 500% for each of the two P clusters, and 1,400% in all. These are broadly equivalent to the CPU % shown in Activity Monitor, and take no account of frequency. These show:
- The E cores (pale blue) had the highest active residency throughout, ranging from as little as 30% when almost idle around 5 seconds, to just over 300% during the main VLU phase at 1.4 seconds.
- The first P cluster (purple) remained almost inactive throughout.
- The second P cluster (red) was only active during the periods of highest work, particularly between 1.0-2.4 seconds and again at 6.4-7.1 seconds. For much of the rest of the test it had close to zero active residency.
Taken together, these show that a substantial proportion of the processing undertaken in VLU was performed by the E cores, with shorter peaks of activity in some of the cores in the second P cluster. For much of the time, though, all ten P cores were either idle or shut down.
Load
Combining frequency and active residency into a single value is difficult for the two types of CPU core. To provide a rough metric, I have calculated ‘cluster load’ as
cluster load = total cluster active residency x (cluster frequency / maximum core frequency)
where the maximum frequency of these E cores is taken as 2592 MHz, and the P cores as 4512 MHz. For example, in the sample period at 2.2 seconds, the P1 cluster frequency was 4449 MHz, and the total active residency for the five cores was 122%. Thus the P1 cluster load was 122 x (4449/4512) = 120.3%. Maximum load for that cluster would have been 500%.
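As a minimal sketch of that arithmetic in Swift, using the sample figures just quoted (the 4512 MHz maximum P core frequency is the constant assumed above):

// Rough ‘cluster load’ metric: total active residency scaled by how close
// the cluster was running to its maximum frequency.
func clusterLoad(totalActiveResidency: Double, frequency: Double, maxFrequency: Double) -> Double {
    totalActiveResidency * (frequency / maxFrequency)
}

let p1Load = clusterLoad(totalActiveResidency: 122.0, frequency: 4449.0, maxFrequency: 4512.0)
print(p1Load)   // about 120.3%, against a possible 500% for this five-core P cluster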
The chart above shows load values for:
- The E cluster (black) rising to 150-260% during the peak of VLU activity, from a baseline of 20-30%.
- The P0 cluster (blue) never reaching 10% after the initial sample at 0 seconds.
- The P1 cluster (red) spiking at 90-150% during the three most active phases, otherwise remaining below 10%.
Caution is required when comparing E with P cores on this measurement, as not only is E core maximum frequency only 57% that of P cores, but it’s generally assumed that their maximum processing capacity is roughly half that of P cores. Even with that reservation, it’s clear that a substantial proportion of the processing performed in this VLU was on the E cores, with just one cluster of P cores active in short spikes.
Finally, it’s possible to examine the correlation between total P cluster load and total CPU power.
This chart shows calculated total P load and reported total CPU power use. The linear regression shown is
CPU power = 4.1 + (42.2 x total load)
giving a power use of 4,200 mW for a load of 100%, equating to a single P core running at maximum frequency.
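To put numbers on that fit, here’s a small Swift sketch using the regression coefficients above (power in mW, load in %); treat it as illustrative only, as the fit shouldn’t be extrapolated far beyond the measured range:

// Fitted relation between total P cluster load (%) and CPU power (mW),
// using the regression coefficients quoted above.
func estimatedCPUPower(totalLoadPercent: Double) -> Double {
    4.1 + 42.2 * totalLoadPercent
}

print(estimatedCPUPower(totalLoadPercent: 100))   // about 4,224 mW, one P core at maximum frequency
print(estimatedCPUPower(totalLoadPercent: 150))   // about 6,334 mW, close to the largest P1 spikes seen here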
Conclusions
- Cluster frequencies and active residencies measured in CPU cores followed the same phases as seen in CPU power, with most of the processing load of VLU in the early stage, between 1.0-2.4 seconds, a shorter peak at 6.6-7.1 seconds correlating with online lookup, and a small peak at about 4.2 seconds.
- A substantial proportion of the processing performed for VLU was run on E rather than P cores, with P cores only being used for brief periods.
- Visual Look Up used remarkably little of the capability of an M4 Pro chip.
Last Week on My Mac: Coming soon to your Mac’s neural engine
If you’ve read any of my articles here about the inner workings of CPU cores in Apple silicon chips, you’ll know I’m no stranger to using the command tool powermetrics to discover what they’re up to. Last week I attempted something more adventurous when trying to estimate how much power and energy are used in a single Visual Look Up (VLU).
My previous tests have been far simpler: start powermetrics collecting sample periods using Terminal, then run a set number of core-intensive threads in my app AsmAttic, knowing those would complete before that sampling stopped. Analysing dozens of sets of measurements of core active residency, frequency and power use is pedestrian, but there’s no doubt as to when the tests were running, nor which cores they were using.
VLU was more intricate, in that once powermetrics had started sampling, I had to double-click an image to open it in Preview, wait until its Info tool showed stars to indicate that stage was complete, open the Info window, spot the buttons that appeared on recognised objects, select one and click on it to open the Look Up window. All steps had to be completed within the 10 seconds of sampling collections, leaving me with the task of matching nearly 11,000 log entries for that interval against sampling periods in powermetrics' hundred samples.
The first problem is syncing time between the log, which gives each entry down to the microsecond, and the sampling periods. Although the latter are supposed to be 100 ms in duration, in practice powermetrics is slightly slower, and most ranged between about 116 and 129 ms. As the start time of each period is only given to the nearest second, it’s impossible to know exactly when each sample was obtained.
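If each sample’s true start and duration were known, matching a log timestamp to a sample would be a simple bucketing exercise, along the lines of this hypothetical Swift sketch (not the method used here); the difficulty is that the start time is only known to the nearest second, so the buckets themselves are uncertain:

// Assign a log timestamp (seconds after time zero) to a powermetrics sample index,
// given the measured duration of each sample in seconds. Returns nil if it falls outside.
func sampleIndex(for timestamp: Double, sampleDurations: [Double]) -> Int? {
    var start = 0.0
    for (index, duration) in sampleDurations.enumerated() {
        if timestamp >= start && timestamp < start + duration {
            return index
        }
        start += duration
    }
    return nil
}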
Correlating log entries with events apparent in the time-course of power use is also tricky. Some are obvious, and the start of sampling was perhaps the easiest giveaway, as powermetrics has to be run using sudo to obtain elevated privileges, which leaves unmistakeable evidence in the log. Clicks made on Preview’s tools are readily missed, though, even when you have a good estimate of the time they occurred.
Thus, the sequence of events is known with confidence, and it’s not hard to establish when VLU was occurring. As a result, estimating overall power and energy use for the whole VLU also has good confidence, although establishing finer detail is more challenging.
The final caution applies to all power measurements made using powermetrics: those are approximate and uncalibrated. What may be reported as 40 mW could be more like 10 or 100 mW.
In the midst of this abundance of caution, one fact stands clear: VLU hardly stresses any part of an Apple silicon chip. Power used during the peak of CPU core, GPU and neural engine (ANE) activity was a small fraction of the values measured during my previous core-intensive testing. At no time did the ten P cores in my M4 Pro come close to the power used when running more than one thread of intensive floating-point arithmetic, and the GPU and ANE spent much of the time twiddling their thumbs.
Yet when Apple released VLU in macOS Monterey, it hadn’t expected to be able to implement it at all on Intel chips because of its computational demand. What still looks like magic can now be accomplished with ease even in a base M1 model. And when we care to leave our Macs running, mediaanalysisd will plod steadily through recently saved images performing object recognition and classification to add them to Spotlight’s indexes, enabling us to search images by labels describing their contents. Further digging in Apple’s documentation reveals that VLU and indexing of discovered object types is currently limited by language to English, French, German, Italian, Spanish and Japanese.
Some time in the next week or three, when Apple releases macOS Tahoe, we’ll start seeing Apple silicon Macs stretch their wings with the first apps to use its Foundation Models. These are based on the same Large Language Models (LLMs) already used in Writing Tools, and run entirely on-device, unlike ChatGPT. This has unfortunately been eclipsed by Tahoe’s controversial redesign, but as more developers get to grips with these new AI capabilities, you should start to see increasingly novel features appearing.
What developers will do with them is currently less certain. These LLMs are capable of working with text including dialogue, thus are likely to appear early in games, and should provide specialist variants of more generic Writing Tools. They can also return numbers rather than text, and suggest and execute commands and actions that could be used in predictive automation. Unlike previous support for AI techniques such as neural networks, Foundation Models present a simple, high-level interface that can require just a few lines of code.
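As an indication of how little code that high-level interface might need, here’s a minimal sketch assuming the FoundationModels framework and its LanguageModelSession, as Apple has previewed them; exact names and availability may differ in the released SDK:

import FoundationModels

// Minimal sketch: ask the on-device model for a one-sentence summary.
// Assumes an Apple silicon Mac running macOS Tahoe with Apple Intelligence enabled.
func summarise(_ text: String) async throws -> String {
    let session = LanguageModelSession(instructions: "Summarise the user's text in one sentence.")
    let response = try await session.respond(to: text)
    return response.content
}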
If you’ve got an Apple silicon Mac, there’s a lot of potential coming in Tahoe, once you’ve jiggled its settings to accommodate its new style.
How much power does Visual Look Up use?
Look in the log, and Visual Look Up (VLU) on an Apple silicon Mac apparently involves a great deal of work, in both CPU cores and the neural engine (ANE). This article reports a first attempt to estimate power and energy use for a single VLU on an image.
To estimate this, I measured CPU, GPU and ANE power in sampling periods of 100 ms using powermetrics, and correlated events seen there with those recorded in a log extract over the same period, obtained using LogUI. The test was performed on a Mac mini M4 Pro running macOS 15.6.1, using Preview to perform the VLU on a single image showing a small group of cattle in an upland field. Power measurements were collected from a moment immediately before opening the image, and ceased several seconds after VLU was complete.
When used like this, powermetrics imposes remarkably little overhead on the CPU cores, but its sampling periods are neither exact nor identical. This makes it difficult to correlate log entries and their precise timestamps with sampling periods. While powermetrics gives power use in mW, those measurements aren’t calibrated, and making assumptions about their accuracy is hazardous. Nevertheless, they remain the best estimates available.
Log narrative
The first step in log analysis was to identify the starting time of powermetrics sampling periods. Although execution of that command left no trace in its entries, as it has to be run with elevated privileges using sudo, its approval was obvious in entries concluding
30.677182 com.apple.opendirectoryd ODRecordVerifyPassword completed
A subsequent entry at 30.688828 seconds was thus chosen as the start time for sampling periods, and all times given below are in seconds after that time zero.
The following relevant events were identified in the log extract at elapsed times given in seconds:
- 1.3 com.apple.VisionKit Signpost Begin: “VisionKit MAD Parse Request”
- 1.3 com.apple.mediaanalysis Running task VCPMADServiceImageProcessingTask
- 1.4 ANE started and an ObjectDetectionModel run for 0.2 s
- 1.6 ANE activity and a NatureWorldModel run for 0.25 s
- 2.0 ANE activity for 0.15 s
- 2.4 ANE activity for 0.1 s
- 8.1 ANE activity and a UnifiedModel run for 0.01 s
- 8.1 PegasusKit queried Apple’s SMOOT service, the external connection used to populate the VLU window.
Thus, the ANE was run almost continuously from 1.4-2.2 seconds after the start of sampling, otherwise was used little over the total period of about 9 seconds. Over that period of activity, an initial model used to detect objects was succeeded by a later model to identify objects in a ‘nature world’.
Power and energy estimates
From the log record, it was deduced that the VLU was started in powermetrics sample 10 (1.0 seconds elapsed), and essentially complete by sample 75 (7.5 seconds elapsed), a period of approximately 6.5 seconds, following which power use was low until the end of the sampling periods. All subsequent calculations refer to that series of samples and period of time.
Sums, averages and maxima of power measurements for that period of 6.5 seconds are:
- CPU 64,289 mW total, 989 mW average, 7,083 mW maximum (10 P cores)
- GPU 3,151 mW total, 48 mW average, 960 mW maximum (20 cores)
- ANE 1,551 mW total, 24 mW average, 671 mW maximum
- total 68,991 mW total, 1,061 mW average, 7,083 mW maximum.
Thus for the whole VLU, 93% of power was used by the CPU, 4.6% by the GPU, and only 2.2% by the ANE.
For comparison, in the M4 Pro chip running maximal in-core loads, each P core can use 1.3 W running floating point code, and 3 W running NEON code. The chip’s 20-core GPU was previously measured as using a steady maximum power of 20 W, with peaks at 25 W.
As each power sample covers 0.1 seconds, the energy used during each sampling period is the power multiplied by 0.1 s, so the total energy used over the 6.5 second period of VLU is:
- CPU 6.4 J
- GPU 0.3 J
- ANE 0.2 J
- total 6.9 J.
Those are small compared to the test threads used previously, which cost 3-8 J for each P core used.
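As a quick sketch of that conversion in Swift (per-sample powers in mW, each sample taken as 0.1 s; the figures in the final comment are the CPU totals given above):

// Convert powermetrics power samples (mW, one per 0.1 s period) into energy in joules.
func energyInJoules(samplePowersMilliwatts: [Double], samplePeriodSeconds: Double = 0.1) -> Double {
    samplePowersMilliwatts.reduce(0) { $0 + ($1 / 1000) * samplePeriodSeconds }
}

// With the CPU figures above, 64,289 mW summed across the samples gives
// (64,289 / 1000) x 0.1, or about 6.4 J, matching the total quoted for the CPU.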
Power over time
Power used in each 100 ms sampling period varied considerably over the whole 10 seconds. The chart below shows total power for the CPU.
Highest power was recorded between samples 10-25, corresponding to 1.0-2.5 seconds elapsed since the start of measurements, and to most of the events identified in the log. Later bursts of power use occurred at about 4.2 seconds, and between 6.6-7.1 seconds, which most probably corresponded to opening the Info window and performing the selected look-up.
Almost all power use by the neural engine occurred between 1.5-2.1 seconds, correlating well with the period in which substantial models were being run.
Peak GPU power use occurred around 1.0-1.5 seconds when the image was first displayed, at 3.1-3.2 seconds, and between 6.5-7.4 seconds. It’s not known whether any of those were the result of image processing for VLU as GPU-related log entries are unusual.
Composite total power use demonstrates how small and infrequent ANE and GPU use was in comparison to that of the CPU.
Conclusions
Given the limitations of this single set of measurements, I suggest that, on Apple silicon Macs:
- power and energy cost of VLU is remarkably low;
- the great majority of work done in VLU is performed in the CPU;
- although use of the neural engine may result in substantial performance improvements, VLU doesn’t make heavy demands on the neural engine in terms of power or energy use;
- VLU may appear impressive, but it’s not actually that demanding on the capability of the hardware.
三重门 (Triple Door) – 1
I was listening to 随机波动 (Stochastic Volatility) episode 134, “Is it possible to be an official and examine yourself at the same time?”. The guest was Yang Suqiu (杨素秋), author of 《世上为什么要有图书馆》 (Why Should the World Have Libraries). A teacher at Shaanxi University of Science and Technology, she was seconded under a government rotation scheme to serve for a year as deputy director of the Culture and Tourism Bureau of Xi’an’s Beilin District, and during that year she founded the district’s first library. In fitting out the library, and above all in choosing its books, she held to her own taste and turned away a string of shoddy book dealers whose business runs on kickbacks. A large part of the book is her venting about the whole bureaucratic system as she built the library.
While listening I kept drifting off. What I was thinking about had little to do with the episode itself: people who survive inside the system while still keeping a “conscience” — what is my attitude towards them? How has that attitude changed? What connection, or distance, really lies between them and me?
Entering the system has become an ever more matter-of-course choice, whether out of self-interest or sheer survival. It is now so commonplace that the sense of guilt and shame that used to go with it (within my own bubble, at least) weighs less than it did years ago. Some people with basically decent values have also chosen to work inside the system: they follow their family’s arrangements and drift with the current, or harbour a little calculating self-interest, or are simply so worn down by troubles elsewhere that they no longer care much either way. In their daily work these people genuinely suffer the system’s miseries, and at the same time, from their position and vantage point, they observe and register far more of it. It is like the complaints you see on social media, and like the podcast’s verdict on 《世上为什么要有图书馆》: a rare set of field notes describing the bureaucratic machine from a top-down, insider’s perspective.
The author describes her state of mind during her year attached to the Culture and Tourism Bureau, and it resembles some of my own working life: knowing you are only passing through, your mindset and way of living differ from those of people who depend on the system for their survival. In many jobs I worked with an attitude of “watching the show while collecting a salary”. I knew I would resign before long, so I felt no pressure to change myself more deeply just to last in the system. I didn’t mind showing some flamboyance, or behaving a little out of step with the environment, and that sort of behaviour earned appreciation, praise, even a sense of kinship from those inside the system whose values were still sound. Our everyday chat could be more colourful as a result; even in a 国委 office you can find such people. In a sense, if there were more people like that inside the system, perhaps the system itself would change? — Stop. That last thought is pure fantasy; it is not going to happen.
And yet with people like this I can still feel a kind of barrier. I don’t mean differences in political views, but differences along the axis of living adventurously versus living safely. They may have married straight after graduation, may be mama’s boys, or the husband’s family may have money… They may say out loud that they envy my way of life, but I can see it would never actually be their choice. None of that stops us from chatting idly in the office, yet sometimes, over small matters that involve no political stance but do reveal passion versus caution, our choices all diverge.
Twenty or thirty years ago, before the internet had exposed so many public scandals and before people talked much about politics, those differences in how we lived already existed, and we slowly drifted apart. In recent years, layer upon layer of new filters has simply been added on top: politics, gender consciousness, and so on. Most people cannot even pass those new filters, so the chances to feel the older differences in ways of living have actually grown rarer. Reflecting on this recently, I realised I may have treated the political and gender bubbles as too decisive. They do matter greatly; they are the baseline for being a friend — no, for being a person — but people who clear those bars will not necessarily play happily together. The differences that were buried for decades, the ones we rarely had occasion to touch, are all still there.
I had the thoughts above while still listening to the podcast, but I made myself wait until I had read 《世上为什么要有图书馆》 before sorting them out and writing them down. It would feel odd to claim resonance with the author, or to hastily mark out my distance from her, on the strength of an interview alone. While listening I could also sense a subtle mismatch of wavelengths between the author and the hosts of 随机波动: one side would raise a topic with enthusiasm, the other, uninterested, would steer away. Something often felt slightly off.
The book is decent. The second half is padded with cultural essays that have little to do with the theme, but the earlier parts, the grumbling about bureaucracy and the planning of the library, are very readable, and I recommend them. I also worked out where that sense of something being off came from. The author often reflects, half in self-examination and half in self-mockery, on whether occupying a position of power might lead her astray. When sent by her superiors to inspect various shops, she grumbles and goes through the motions. But during the COVID pandemic, checking whether hotels were illegally buying imported fresh food, she was exceptionally strict and sharp, and the text carries a faint pride in having caught out lawbreaking traders. The author, I suspect, lives an orderly life with a happy family and is therefore careful of her own safety; when something she truly cares about is at stake, her subconscious sides directly with power. — I can choose not to use the power in my hand, but if I need it, I can pick it up at any time.