Researchers from the University of Cambridge have disclosed a vulnerability that affects many modern source code compilers. Their paper, titled Trojan Source, describes an insidious attack in which attackers hide targeted malicious code in software source code.
The attack relies on how compilers handle the Unicode control characters that determine text direction (left to right or right to left). The weak point is the Unicode Bidirectional (Bidi) algorithm, which allows text written from right to left and text written from left to right to be combined. Thanks to this algorithm, Arabic and English words, for example, can appear in the same sentence. It also allows text that is stored in one direction to be displayed in the other.
In some cases the default ordering produced by the Bidi algorithm is not enough to arrange groups of characters as intended, and special control characters are used to change it. Bidi override characters even make it possible to display individual characters in an order different from their logical encoding.
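To make the control characters concrete, here is a minimal Python sketch that lists the explicit directional formatting characters defined by Unicode and shows that they remain part of a string's logical content even though they are normally invisible. The names BIDI_CONTROLS, describe_controls, logical and visible are made up for this illustration and do not come from the paper.

```python
import unicodedata

# The nine explicit directional formatting characters:
# embeddings, overrides, isolates, and their terminators.
BIDI_CONTROLS = "\u202a\u202b\u202c\u202d\u202e\u2066\u2067\u2068\u2069"


def describe_controls():
    """Return (code point, Bidi class, Unicode name) for each control character."""
    return [
        (f"U+{ord(ch):04X}", unicodedata.bidirectional(ch), unicodedata.name(ch))
        for ch in BIDI_CONTROLS
    ]


# Logical (stored) text: "abc", a right-to-left override, "def", then a pop.
# A Bidi-aware viewer may draw "def" reversed; the stored order is unchanged.
logical = "abc\u202edef\u202c"

# The same text with the invisible controls removed.
visible = "".join(ch for ch in logical if ch not in BIDI_CONTROLS)

if __name__ == "__main__":
    for code_point, bidi_class, name in describe_controls():
        print(code_point, bidi_class, name)
    print(len(logical), len(visible))
```

The two lengths differ because the override and its terminator are real code points in the stored text, which is what a compiler reads, while a text editor typically renders them only as a change in display order.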
Exploiting the vulnerability makes it possible to insert commands that appear to be part of a comment or string literal when a programmer reviews the code. The paper notes that this type of attack has previously been used to mask the file extensions of malware distributed by email in phishing campaigns. The approach lets attackers plant vulnerabilities in source code that, unless they change the logic significantly, are difficult to detect during code review.
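Because the hidden characters are hard to spot by eye, a reviewer can search for them mechanically. The following Python sketch reports every Bidi control character in a piece of source text along with its position. It illustrates the general idea only and is not the mitigation proposed by the researchers or implemented in any compiler; the function name find_bidi_controls and the sample text are hypothetical.

```python
import unicodedata

# Bidi classes assigned to the explicit directional formatting characters.
BIDI_CLASSES = {"LRE", "RLE", "LRO", "RLO", "PDF", "LRI", "RLI", "FSI", "PDI"}


def find_bidi_controls(source: str):
    """Return (line, column, Unicode name) for each Bidi control in source.

    Lines and columns are 1-based; columns count code points, not bytes.
    """
    findings = []
    for line_no, line in enumerate(source.split("\n"), start=1):
        for col, ch in enumerate(line, start=1):
            if unicodedata.bidirectional(ch) in BIDI_CLASSES:
                findings.append((line_no, col, unicodedata.name(ch)))
    return findings


# Hypothetical sample: a comment line carrying two hidden control characters.
sample = 'access = "user"\n# check \u202e admin only \u2066\nprint(access)\n'

if __name__ == "__main__":
    for line_no, col, name in find_bidi_controls(sample):
        print(f"line {line_no}, column {col}: {name}")
```

A check like this could run in a pre-commit hook or a CI step. It flags every occurrence, so legitimate right-to-left text that uses these characters would need to be reviewed by hand rather than rejected automatically.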
The researchers made their work publicly available a few months after completing it. In the interim, several patches were prepared to fix the problem for developers using the Rust language. Additional recommendations for addressing the problem in other programming languages will be published later.