Explore IFUNC with CVE-2024-3094

CVE-2024-3094

Malicious code was discovered in the upstream tarballs of xz, starting with version 5.6.0. Through a series of complex obfuscations, the liblzma build process extracts a prebuilt object file from a disguised test file existing in the source code, which is then used to modify specific functions in the liblzma code. This results in a modified liblzma library that can be used by any software linked against this library, intercepting and modifying the data interaction with this library.

Purpose statement

In this article, we’ll dive into the technical details of how the malicious code “modifies specific functions in the liblzma code”.

Preparation

IFUNC

IFUNC, which stands for GNU indirect function support, is a feature of the GNU toolchain. This feature enables selecting an implementation for a function either at program startup or when the function is first called. (Without this feature, when multiple implementations exist, the dynamic linker uses the first one it finds.)

For example, glibc uses this feature to select the best implementation of a function based on the CPU features available. In other words, software can adapt its implementation to the hardware it is actually running on. I’d like to explore SIMD-based performance improvements in another article.

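The function that selects the implementation is called a resolver function. The dynamic linker invokes it when the address of the target function is needed. On many modern distributions, binaries are linked with immediate binding (for example with the linker option -z now), so these addresses are resolved at program startup rather than on the first call.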

The following is minimal code that demonstrates IFUNC. Similar code is used in commit [c] to verify IFUNC support.

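/* The implementation that calls should end up in. */
static void func(void) { return; }
typedef void (*fn_t)(void);
/* Resolver: returns the address that func_ifunc will be bound to. */
static fn_t resolve_func(void) { return func; }
/* Declare func_ifunc as an indirect function resolved by resolve_func. */
void func_ifunc (void) __attribute__ ((__ifunc__ ("resolve_func")));
int main(void){ func_ifunc(); return 0; }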

GOT/PLT

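The next step is to understand the GOT (Global Offset Table) and the PLT (Procedure Linkage Table). A call to a function in a shared library goes through a small stub in the PLT, and that stub jumps to whatever address is stored in the function’s GOT entry. The dynamic linker fills in the GOT entry with the real address once it is known.

For an IFUNC symbol, the dynamic linker calls the resolver and stores the address it returns in the GOT entry. In other words, the essence of IFUNC is its ability to choose at run time what a GOT entry points to, and thereby redirect function calls. Let’s examine how these structures are represented in the binary using the following sample program.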

#include <stdio.h>
#include <string.h>

int main(void) {
    char buffer[100];
    
    printf("=== GOT/PLT Demo ===\n");
    
    const char *text = "Hello, GOT/PLT!";
    size_t len = strlen(text);
    printf("String length: %zu\n", len);
    
    strcpy(buffer, text);
    printf("Copied string: %s\n", buffer);
    
    puts("This is from puts()");
    
    return 0;
}

Let’s verify that the binary contains GOT and PLT sections:

readelf -S $BINARY | grep -E "\.got|\.got\.plt"
  [14] .plt.got          PROGBITS         0000000000001070  00001070
  [23] .got              PROGBITS         0000000000003fc0  00002fc0
  [24] .got.plt          PROGBITS         0000000000003fe8  00002fe8
readelf -S $BINARY | grep -E "\.plt|\.plt\.got"
  [11] .rela.plt         RELA             0000000000000660  00000660
  [13] .plt              PROGBITS         0000000000001020  00001020
  [14] .plt.got          PROGBITS         0000000000001070  00001070
  [24] .got.plt          PROGBITS         0000000000003fe8  00002fe8

Next, let’s examine the disassembly of the PLT entries. The output below shows that printf’s PLT stub is at 0x1060 and that its first instruction jumps through the corresponding GOT entry. The displacement bytes b2 2f 00 00 are little-endian for 0x2fb2. Because the instruction uses RIP-relative addressing, the displacement is added to the address of the next instruction, which is 0x1060 + 6 = 0x1066 (the jmp is 6 bytes long). The GOT entry address is therefore 0x1066 + 0x2fb2 = 0x4018.

objdump -d -j .plt $BINARY
(snipped)
0000000000001060 <printf@plt>:
    1060:	ff 25 b2 2f 00 00    	jmp    *0x2fb2(%rip)        # 4018 <printf@GLIBC_2.2.5>
    1066:	68 03 00 00 00       	push   $0x3
    106b:	e9 b0 ff ff ff       	jmp    1020 <_init+0x20>

The relocation table for .got.plt confirms that the slot at 0x4018 belongs to printf. Note that nothing has been resolved yet: the symbol values are all zero, and the slots still hold their initial placeholder values.

readelf -r $BINARY | grep -A 20 "Relocation section '.rela.plt'"
Relocation section '.rela.plt' at offset 0x660 contains 4 entries:
  Offset          Info           Type           Sym. Value    Sym. Name + Addend
000000004000  000300000007 R_X86_64_JUMP_SLO 0000000000000000 strcpy@GLIBC_2.2.5 + 0
000000004008  000400000007 R_X86_64_JUMP_SLO 0000000000000000 puts@GLIBC_2.2.5 + 0
000000004010  000500000007 R_X86_64_JUMP_SLO 0000000000000000 strlen@GLIBC_2.2.5 + 0
000000004018  000600000007 R_X86_64_JUMP_SLO 0000000000000000 printf@GLIBC_2.2.5 + 0

The following output shows that the entry at 0x4018 contains 0x1066. This is the initial GOT value, which points back to the next instruction (the push operation) in the PLT stub, triggering lazy binding on the first call.

readelf -x .got.plt $BINARY
Hex dump of section '.got.plt':
 NOTE: This section has relocations against it, but these have NOT been applied to this dump.
  0x00003fe8 e03d0000 00000000 00000000 00000000 .=..............
  0x00003ff8 00000000 00000000 36100000 00000000 ........6.......
  0x00004008 46100000 00000000 56100000 00000000 F.......V.......
  0x00004018 66100000 00000000                   f.......

In the disassembly of the main function, we can see the following instruction for the printf call. This shows that calling printf actually jumps to its corresponding PLT entry.

    11df:	e8 7c fe ff ff       	call   1060 <printf@plt>

I would have liked to observe the GOT entries being updated at runtime, but unfortunately my experimental setup didn’t allow for this.

The Vulnerability

For a detailed analysis of the vulnerability, I recommend reading [2]. Here is my understanding of how it worked.

The attacker introduced a resolver function named crc64_resolve through their commits. The malicious build script then modified is_arch_extension_supported, which crc64_resolve calls, so that it called an external function named _get_cpuid. That external function was linked in from the malicious object file during compilation. Because resolvers run at program startup, the malicious function got to execute early and overwrote GOT entries, including the one for RSA_public_decrypt.

Closing Thoughts

I’d like to highlight the attack techniques beyond IFUNC itself. In this attack, the build script modified the source code itself and compiled it together with malicious object files. The crucial aspect was that the attacker became a maintainer, gained the ability to host release packages, and made subtle modifications through build scripts that were not included in the repository. The sophistication of these build scripts was remarkable.

Another interesting aspect was the runtime function selection mechanism itself. Having recently learned about the ELF format, examining the binary helped deepen my understanding. I suspect that analyzing the malicious test file binaries would reveal exactly how the external functions overwrote the GOT and what backdoor was implemented in RSA_public_decrypt, but I didn’t have the energy to pursue this further.

This article is written by K.Waki

Software Engineer. English Learner. Opinions expressed here are mine alone.