Explainer: Yara rules
Security utilities that detect known malicious software do so using sets of detection rules. Since their introduction by Victor Alvarez of VirusTotal 12 years ago, the most common method of expressing these rules is in a text file with the extension .yara. Apparently YARA stands for either YARA: Another Recursive Acronym, or Yet Another Ridiculous Acronym.
Yara rules are used extensively in macOS by both the original XProtect and its sibling XProtect Remediator. Those of XProtect are found in the XProtect.yara file in XProtect.bundle, in /Library/Apple/System/Library/CoreServices and its additional location in /private/var/protected/xprotect in Sequoia and later. Further Yara rules are also encrypted and embedded in XProtect Remediator’s scanning modules, as detailed by Koh M. Nakagawa in the FFRI Security GitHub.
As used by Apple, each rule can consist of up to three sections:
- meta, containing the rule’s metadata including a description and in more recent cases a UUID for the rule.
- strings, specifying some of the content of the file, typically in the form of hexadecimal strings such as
{ A0 6B }
. - condition, a logical expression that, when satisfied, meets the requirements of that rule, so identifying it as malicious.
I’ll exemplify these using Yara rules used in XProtect version 5310.
Private rules
At the start of the Yara file are any private rules. These are a bit like macros, in that they define properties that are then used in multiple rules later. Laid out in compact form, an example reads:
private rule Shebang
{ meta:
description = "private rule to match shell scripts by shebang (!#)"
condition:
uint16(0) == 0x2123
}
This starts with description metadata for this private rule, then states the condition for satisfying this rule, that the first 16-bit unsigned integer in the file contains the hex 0x2123, the UTF-8 characters 0x21 or ! and 0x23 or #. In the reverse order of #!
they’re known as the shebang, and the opening characters of many shell scripts, but in this rule they are given the other way around because of the byte order used in the 16-bit integer.
This defines what’s required of a Shebang, and can be included in the conditions of other Yara rules. Instead of having to redefine the same feature in every rule, they can simply include that Shebang rule in their condition.
Regular rules
After 3-5 private rules, the XProtect Yara file goes on to enumerate 372 normal rules, including this one:
rule XProtect_MACOS_SOMA_JLEN
{meta:
description = "MACOS.SOMA.JLEN"
uuid = "4215C9D4-57D5-4D30-82E1-96477493E8D5"
strings:
$a0 = { 4c 8d 3d ?? 74 0b 00 4c 8d 25 ?? ?? 0b 00 4c 8d 2d ?? ?? 0b 00 48 8d 5d c0 }
$a1 = { 73 62 06 91 d4 03 00 f0 94 42 21 91 }
condition:
Macho and ( ( $a0 ) or ( $a1 ) ) and filesize < 5MB
}
This rule has its internal code name as its description, and has been assigned the UUID shown. It then defines two binary strings, a0
and a1
, the former containing ‘wild’ values expressed using the question mark ?. The condition for satisfying the rule is that the file must:
- be a Mach-O binary, and
- contain either a0 or a1, and
- its size must be less than 5 MB.
Further details about Yara rules are given here.
Implementation
There are standard implementations that can check files against custom sets of Yara rules. Rules are normally compiled into binary form from their text originals before use. Full details are given on VirusTotal.
Apple’s use of Yara files is mysterious, as for some years all the descriptions used arbitrary code names as obfuscation. When the source of all the rules is given in plain text, it’s hard to see what purpose that served, and it meant that users were told that MACOS.0e32a32 had been detected in an XProtect scan, for instance. Thankfully, Apple has more recently replaced most of those with more meaningful names.
I’m grateful to Duncan for asking me to explain this, and hope I have been successful. I’m also grateful to isometry and an anonymous commenter for straightening out the confusion over the Shebang.