Introduction
compatmalloc is a drop-in memory-hardening allocator for Linux. It builds as a shared library (libcompatmalloc.so) that you can inject into any dynamically linked program via LD_PRELOAD, replacing the standard C memory allocator with one that actively detects and mitigates common heap exploitation techniques.
Why compatmalloc?
Heap vulnerabilities -- use-after-free, heap buffer overflows, double frees, and metadata corruption -- remain among the most exploited bug classes in native software. Standard allocators such as glibc's ptmalloc2 are optimized for throughput and perform only minimal runtime misuse detection.
compatmalloc exists to close that gap. It provides:
- Detection of use-after-free, buffer overruns (via canaries), and double-free conditions.
- Mitigation through delayed memory reuse (quarantine), out-of-band metadata, and guard pages.
- Compatibility with the full glibc malloc ABI, so existing binaries work without recompilation.
Design goals
- Drop-in replacement. Export every symbol that glibc's malloc provides (malloc, free, realloc, calloc, posix_memalign, aligned_alloc, memalign, valloc, pvalloc, malloc_usable_size, mallopt, mallinfo, mallinfo2). Programs that link against glibc should work unchanged.
- Defense in depth. Each hardening feature targets a different exploitation primitive. Features can be toggled individually through Cargo feature flags.
- No standard library dependency. The allocator is built as a cdylib with #![no_std]-style patterns internally, using libc for system calls and dlsym(RTLD_NEXT) for fallback to the real allocator. This avoids circular dependencies and keeps the binary small.
- Reasonable performance. The allocator is not a benchmark champion, but its overhead should be acceptable for development, testing, and hardened production deployments.
How it works
When loaded via LD_PRELOAD, compatmalloc's exported symbols override glibc's. A library constructor (__attribute__((constructor)) equivalent via .init_array) runs before main(), resolving the real libc functions via dlsym(RTLD_NEXT) and initializing the hardened allocator.
All allocations smaller than 16 KiB go through a slab allocator with per-CPU arenas. Larger allocations get individual mmap regions with optional guard pages on both sides. An out-of-band metadata table (stored in a separate mmap region) tracks each allocation's requested size, canary value, and freed-state flag, preventing attackers from corrupting heap metadata by overflowing adjacent allocations.
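The size-based dispatch described above can be sketched as follows. The names here are illustrative (the crate's real routing function is not named in this text); only the 16 KiB threshold comes from the description.

```rust
/// Illustrative sketch of compatmalloc's size-based routing:
/// small requests go to slab arenas, large ones to individual mmaps.
const SLAB_THRESHOLD: usize = 16 * 1024; // 16 KiB, per the text

#[derive(Debug, PartialEq)]
enum Backend {
    Slab, // per-CPU slab arena
    Mmap, // individual mmap region with optional guard pages
}

fn choose_backend(size: usize) -> Backend {
    if size < SLAB_THRESHOLD {
        Backend::Slab
    } else {
        Backend::Mmap
    }
}
```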
When the allocator is disabled (via COMPATMALLOC_DISABLE=1), all calls pass through to glibc, making it easy to toggle off in production if needed.
Getting Started
Prerequisites
- Rust toolchain (stable channel). Install via rustup.
- Linux x86_64 (the primary supported platform).
- A C compiler toolchain (gcc or clang) for linking the cdylib.
Build
Clone the repository and build the release library:
git clone https://github.com/t-cun/compatmalloc.git
cd compatmalloc
cargo build --release
The output shared library is at:
target/release/libcompatmalloc.so
Basic usage with LD_PRELOAD
Inject the library into any dynamically linked program:
LD_PRELOAD=./target/release/libcompatmalloc.so <your-program>
For example:
# Run bash with compatmalloc
LD_PRELOAD=./target/release/libcompatmalloc.so bash -c 'echo "hello from hardened malloc"'
# Run Python
LD_PRELOAD=./target/release/libcompatmalloc.so python3 -c 'print("works")'
# Run a server
LD_PRELOAD=./target/release/libcompatmalloc.so ./my-server
Verify it works
You can confirm that compatmalloc is intercepting allocations by checking that the library is loaded:
LD_PRELOAD=./target/release/libcompatmalloc.so \
bash -c 'cat /proc/self/maps | grep compatmalloc'
This should show the library mapped into the process address space.
You can also check exported symbols:
nm -D target/release/libcompatmalloc.so | grep -E ' T (malloc|free|calloc|realloc)$'
Expected output:
0000000000xxxxxx T calloc
0000000000xxxxxx T free
0000000000xxxxxx T malloc
0000000000xxxxxx T realloc
Disable at runtime
If you need to bypass the hardened allocator without removing LD_PRELOAD, set the kill-switch environment variable:
COMPATMALLOC_DISABLE=1 LD_PRELOAD=./target/release/libcompatmalloc.so <your-program>
This makes all allocator calls pass through to glibc. See Configuration for all available options.
Run the test suite
cargo test --workspace
This runs unit tests for all internal modules (size classes, bitmap, metadata table, etc.).
Building
Prerequisites
- Rust stable toolchain. Install via rustup: curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
- Linux x86_64. The primary supported platform. The library uses Linux-specific APIs (mmap, mprotect, futex, /proc/self/maps).
- C linker. The cc crate will use gcc or clang for linking the cdylib. On Ubuntu/Debian: apt install build-essential.
Build commands
Debug build
cargo build --workspace
Output: target/debug/libcompatmalloc.so
The debug build includes debug assertions (debug_assert!) and debug symbols. It is suitable for development and testing but not for performance measurement.
Release build
cargo build --workspace --release
Output: target/release/libcompatmalloc.so
The release profile is configured in the workspace Cargo.toml with aggressive optimizations:
[profile.release]
opt-level = 3
lto = "fat"
codegen-units = 1
panic = "abort"
- opt-level = 3 -- maximum optimization.
- lto = "fat" -- full link-time optimization across all crates.
- codegen-units = 1 -- single codegen unit for better optimization (slower compile).
- panic = "abort" -- no unwinding (smaller binary, no landing pads).
Hardened profile
A custom hardened profile is available for deployments that want a balance between debuggability and performance:
cargo build --workspace --profile hardened
[profile.hardened]
inherits = "release"
opt-level = 2
overflow-checks = true
debug = 1
- opt-level = 2 -- slightly less aggressive optimization (faster compile, slightly larger binary).
- overflow-checks = true -- arithmetic overflow panics instead of wrapping.
- debug = 1 -- line-level debug info (for useful backtraces without full debug bloat).
Feature flags
The compatmalloc crate defines the following features:
| Feature | Default | Description |
|---|---|---|
hardened | Yes | Meta-feature that enables all hardening features below |
quarantine | Via hardened | Delay reuse of freed memory |
guard-pages | Via hardened | Place inaccessible pages around allocations |
slot-randomization | Via hardened | Randomize slot selection within size classes |
canaries | Via hardened | Detect buffer overflows via canary bytes |
poison-on-free | Via hardened | Fill freed memory with a poison pattern |
write-after-free-check | Via hardened | Detect writes to freed memory during quarantine eviction |
zero-on-free | Via hardened | Zero memory on free (defense against information leaks) |
Building with specific features
# All hardening (default)
cargo build --release
# No hardening (passthrough-like performance)
cargo build --release --no-default-features
# Only quarantine and guard pages
cargo build --release --no-default-features --features quarantine,guard-pages
# Everything except zero-on-free (reduce free overhead)
cargo build --release --no-default-features \
--features quarantine,guard-pages,slot-randomization,canaries,poison-on-free,write-after-free-check
Linker scripts
The build script (build.rs) configures platform-specific linker behavior:
- Linux: A version script (linker/version_script.lds) controls which symbols are exported. Only the standard C allocator symbols are exported; all internal Rust symbols are hidden.
- macOS: All symbols are exported by default (no special configuration yet).
- Windows: A .def file (linker/exports.def) lists exported symbols.
Running tests
# All tests
cargo test --workspace
# Release mode tests
cargo test --workspace --release
# Tests with no default features
cargo test --workspace --no-default-features
# Tests for a single feature
cargo test --workspace --no-default-features --features canaries
Checking code quality
# Format check
cargo fmt --all -- --check
# Clippy lints
cargo clippy --workspace --all-targets --all-features -- -D warnings
Cross-compilation
The library is primarily designed for Linux x86_64. Cross-compilation to other Linux architectures (aarch64, etc.) should work but is not tested in CI. Non-Linux platforms (macOS, Windows) have stub platform implementations but are not fully supported.
# Example: cross-compile for aarch64-linux
rustup target add aarch64-unknown-linux-gnu
cargo build --release --target aarch64-unknown-linux-gnu
Configuration
compatmalloc reads configuration from environment variables at initialization time (before main() runs). All configuration is optional; the defaults provide a good balance of security and performance.
Environment variables
COMPATMALLOC_DISABLE
Type: presence-based (any value enables it)
Default: not set (allocator is enabled)
When this variable is set to any non-empty value, the hardened allocator is completely bypassed. All malloc/free/realloc/calloc calls are forwarded directly to glibc via dlsym(RTLD_NEXT).
Use this as a kill-switch if you suspect the allocator is causing issues with a specific program:
COMPATMALLOC_DISABLE=1 LD_PRELOAD=./libcompatmalloc.so ./my-program
Implementation: Checked during init via config::is_disabled(). The init state machine transitions to DISABLED instead of READY, and the dispatch macro routes all calls to the passthrough allocator.
COMPATMALLOC_ARENA_COUNT
Type: unsigned integer
Default: number of CPUs (capped at 32)
Sets the number of slab arenas. Each arena has its own set of size-class slabs and its own locks, so more arenas reduce contention in multi-threaded programs.
COMPATMALLOC_ARENA_COUNT=8 LD_PRELOAD=./libcompatmalloc.so ./my-server
Valid range: 1 to 32 (MAX_ARENAS). Values above 32 are clamped. A value of 0 means "use the default" (number of CPUs).
Tradeoff: More arenas reduce lock contention but increase memory usage (each arena independently maps slab regions). For single-threaded programs, COMPATMALLOC_ARENA_COUNT=1 is optimal.
Thread-to-arena mapping: Threads are assigned to arenas by thread_id % num_arenas. This provides a rough approximation of per-CPU arenas without requiring sched_getcpu.
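The clamping and mapping rules above can be sketched as follows; the helper names (effective_arena_count, arena_for_thread) are illustrative, while the 0-means-default rule, the cap of 32, and the modulo mapping come from the text.

```rust
/// Cap on the number of slab arenas, per the text.
const MAX_ARENAS: usize = 32;

/// Resolve COMPATMALLOC_ARENA_COUNT to an effective arena count:
/// 0 means "use the CPU count"; all values are clamped to 1..=32.
fn effective_arena_count(requested: usize, num_cpus: usize) -> usize {
    let n = if requested == 0 { num_cpus } else { requested };
    n.clamp(1, MAX_ARENAS)
}

/// Assign a thread to an arena by simple modulo, as described above.
fn arena_for_thread(thread_id: usize, num_arenas: usize) -> usize {
    thread_id % num_arenas
}
```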
COMPATMALLOC_QUARANTINE_SIZE
Type: unsigned integer (bytes)
Default: 4194304 (4 MiB)
Sets the maximum total bytes held in the quarantine queue. When the quarantine's total byte count would exceed this limit, the oldest entries are evicted (and their slots returned to the free list) until the limit is satisfied.
# Larger quarantine for more thorough use-after-free detection
COMPATMALLOC_QUARANTINE_SIZE=16777216 LD_PRELOAD=./libcompatmalloc.so ./my-program
# Smaller quarantine to reduce memory overhead
COMPATMALLOC_QUARANTINE_SIZE=1048576 LD_PRELOAD=./libcompatmalloc.so ./my-program
Valid range: 0 to usize::MAX. A value of 0 means entries are evicted immediately (effectively disabling quarantine delay, though the quarantine code path is still executed).
Note: This variable only has effect when the quarantine feature is enabled (it is enabled by default in the hardened feature set).
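A simplified sketch of the byte-budgeted eviction described above, using std collections for brevity (the real allocator is no_std and lock-protected; all names here are illustrative):

```rust
use std::collections::VecDeque;

/// Toy model of the quarantine queue: freed blocks are held until the
/// total byte count exceeds the configured limit, then evicted oldest-first.
struct Quarantine {
    limit: usize,                      // COMPATMALLOC_QUARANTINE_SIZE
    total: usize,                      // bytes currently held
    entries: VecDeque<(usize, usize)>, // (ptr, size), oldest at the front
}

impl Quarantine {
    /// Push a freed block; returns the entries evicted to satisfy the limit.
    fn push(&mut self, ptr: usize, size: usize) -> Vec<(usize, usize)> {
        self.entries.push_back((ptr, size));
        self.total += size;
        let mut evicted = Vec::new();
        // Evict oldest entries until the byte budget is satisfied.
        while self.total > self.limit {
            match self.entries.pop_front() {
                Some((p, s)) => {
                    self.total -= s;
                    evicted.push((p, s)); // slot returns to the free list
                }
                None => break,
            }
        }
        evicted
    }
}
```

Note that with limit = 0 every push evicts immediately, matching the documented behavior of COMPATMALLOC_QUARANTINE_SIZE=0.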
Configuration timing
All environment variables are read once during the library constructor (__attribute__((constructor)) equivalent via .init_array). Configuration cannot be changed at runtime. This design avoids the need for synchronization on configuration reads in the hot path.
The read sequence during init:
1. passthrough::resolve_real_functions() -- resolve glibc symbols via dlsym.
2. config::read_config() -- read COMPATMALLOC_ARENA_COUNT and COMPATMALLOC_QUARANTINE_SIZE.
3. config::is_disabled() -- check COMPATMALLOC_DISABLE.
4. HardenedAllocator::init() -- apply configuration values.
Summary table
| Variable | Type | Default | Description |
|---|---|---|---|
COMPATMALLOC_DISABLE | presence | not set | Bypass hardened allocator entirely |
COMPATMALLOC_ARENA_COUNT | uint | CPU count (max 32) | Number of per-thread slab arenas |
COMPATMALLOC_QUARANTINE_SIZE | uint (bytes) | 4194304 (4 MiB) | Maximum quarantine queue size |
ABI Contract
compatmalloc exports every symbol that a glibc-linked program may reference for dynamic memory management. This page documents each exported function, its parameters, return values, and error behavior.
All functions use the C ABI (extern "C").
Standard C allocator
malloc
void *malloc(size_t size);
Allocate size bytes of uninitialized memory.
- Parameters: size -- number of bytes to allocate.
- Returns: Pointer to the allocated memory, aligned to at least 16 bytes. Returns NULL on failure (and errno is set to ENOMEM by the underlying mapping).
- Special case: malloc(0) returns a valid, unique, non-NULL pointer (to a 1-byte internal allocation). This pointer must be passed to free().
free
void free(void *ptr);
Release memory previously returned by malloc, calloc, realloc, or an alignment function.
- Parameters: ptr -- pointer to free. If NULL, the call is a no-op.
- Behavior on double-free: If the write-after-free-check feature is enabled and the metadata table detects the pointer was already freed, the allocator writes a diagnostic to stderr and calls abort().
- Behavior on invalid pointer: Pointers not recognized by any arena or the large allocator are silently ignored for compatibility.
realloc
void *realloc(void *ptr, size_t size);
Change the size of a previously allocated block.
- Parameters:
  - ptr -- pointer to the existing allocation. If NULL, behaves like malloc(size).
  - size -- new size in bytes. If 0, behaves like free(ptr) and returns NULL.
- Returns: Pointer to the resized block (may differ from ptr). Returns NULL on failure; the original block is left unchanged.
- Copy behavior: When a new block is allocated, min(old_size, new_size) bytes are copied from the old block.
calloc
void *calloc(size_t nmemb, size_t size);
Allocate zeroed memory for an array.
- Parameters:
  - nmemb -- number of elements.
  - size -- size of each element.
- Returns: Pointer to zero-initialized memory. Returns NULL if the multiplication nmemb * size overflows or if allocation fails.
- Overflow protection: Uses checked_mul internally. On overflow, sets errno to ENOMEM and returns NULL.
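The overflow guard can be sketched as follows (errno handling omitted; the function name is illustrative, but the checked_mul approach is the one the text describes):

```rust
/// Compute the total calloc size, or None on overflow.
/// None maps to "set errno = ENOMEM and return NULL" in the real allocator.
fn calloc_total_size(nmemb: usize, size: usize) -> Option<usize> {
    nmemb.checked_mul(size)
}
```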
POSIX alignment APIs
posix_memalign
int posix_memalign(void **memptr, size_t alignment, size_t size);
Allocate memory with a specified alignment (POSIX).
- Parameters:
  - memptr -- output pointer. Must not be NULL.
  - alignment -- must be a power of two and at least sizeof(void *) (8 bytes on 64-bit).
  - size -- number of bytes.
- Returns:
  - 0 on success (pointer stored in *memptr).
  - EINVAL if memptr is NULL or alignment is invalid.
  - ENOMEM if allocation fails.
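A sketch of the alignment validation implied above; the helper name and the bare EINVAL constant are assumptions, while the power-of-two and minimum-size rules come from the contract.

```rust
/// POSIX EINVAL on Linux (assumed constant value for this sketch).
const EINVAL: i32 = 22;

/// Validate a posix_memalign alignment: power of two, at least
/// sizeof(void *). Returns Err(EINVAL) on an invalid alignment.
fn validate_alignment(alignment: usize) -> Result<(), i32> {
    let min = core::mem::size_of::<*const u8>(); // 8 on 64-bit targets
    if alignment.is_power_of_two() && alignment >= min {
        Ok(())
    } else {
        Err(EINVAL)
    }
}
```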
aligned_alloc
void *aligned_alloc(size_t alignment, size_t size);
Allocate memory with a specified alignment (C11).
- Parameters:
  - alignment -- must be a power of two.
  - size -- must be a multiple of alignment (unless size is 0).
- Returns: Aligned pointer, or NULL on failure (with errno set to EINVAL or ENOMEM).
memalign
void *memalign(size_t alignment, size_t size);
Allocate memory with a specified alignment (legacy).
- Parameters:
  - alignment -- must be a power of two.
  - size -- number of bytes.
- Returns: Aligned pointer, or NULL on failure.
valloc
void *valloc(size_t size);
Allocate page-aligned memory.
- Parameters: size -- number of bytes.
- Returns: Pointer aligned to the system page size (4096 bytes), or NULL on failure.
pvalloc
void *pvalloc(size_t size);
Allocate page-aligned memory, rounding the size up to a page boundary.
- Parameters: size -- number of bytes (rounded up to the next multiple of the page size).
- Returns: Page-aligned pointer, or NULL on failure.
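The rounding pvalloc performs can be sketched as follows, assuming a 4096-byte page and ignoring overflow near usize::MAX (which real code must check):

```rust
/// System page size assumed for this sketch (4096, per the valloc docs).
const PAGE_SIZE: usize = 4096;

/// Round a size up to the next multiple of the page size.
/// Relies on PAGE_SIZE being a power of two.
fn round_up_to_page(size: usize) -> usize {
    (size + PAGE_SIZE - 1) & !(PAGE_SIZE - 1)
}
```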
GNU extensions
malloc_usable_size
size_t malloc_usable_size(void *ptr);
Return the usable size of an allocation.
- Parameters: ptr -- pointer returned by an allocator function. If NULL, returns 0.
- Returns: The number of usable bytes in the allocation. This may be larger than the originally requested size (due to size-class rounding), but programs must not rely on the excess bytes persisting across realloc.
mallopt
int mallopt(int param, int value);
Set allocator tuning parameters.
- Behavior: Accepts all calls and returns 1 (success) but performs no action. Provided solely for binary compatibility with programs that call mallopt.
mallinfo / mallinfo2
struct mallinfo mallinfo(void);
struct mallinfo2 mallinfo2(void); // Linux only
Return allocator statistics.
- Behavior: Returns a zeroed struct. Provided solely for binary compatibility. mallinfo2 is only exported on Linux targets.
Minimum alignment
All allocations are aligned to at least 16 bytes (MIN_ALIGN), which matches max_align_t on 64-bit Linux. This guarantees correct alignment for any C data type.
Deviations from glibc
compatmalloc aims for full ABI compatibility with glibc's malloc, but makes deliberate behavioral choices that differ in edge cases. This page documents every known deviation.
malloc(0)
| Behavior | glibc | compatmalloc |
|---|---|---|
malloc(0) | Returns a unique non-NULL pointer (implementation-defined minimum size) | Returns a unique non-NULL pointer (internally allocates 1 byte, rounded up to a 16-byte slot) |
Both return a valid pointer that must be freed. The difference is academic; compatmalloc's behavior is conformant with the C standard, which states that malloc(0) may return either NULL or a unique pointer.
Minimum allocation alignment
| Behavior | glibc | compatmalloc |
|---|---|---|
| Minimum alignment | 16 bytes on 64-bit | 16 bytes (MIN_ALIGN) |
No deviation. Both guarantee alignment to max_align_t.
malloc_usable_size
| Behavior | glibc | compatmalloc |
|---|---|---|
| Usable size | Typically the chunk size minus overhead (often significantly larger than requested) | The slab slot size for the allocation's size class |
glibc often returns usable sizes much larger than requested due to its chunk-based design. compatmalloc returns the size-class slot size, which is typically closer to the requested size. Programs that depend on malloc_usable_size returning a value much larger than requested may behave differently.
realloc(ptr, 0)
| Behavior | glibc | compatmalloc |
|---|---|---|
realloc(ptr, 0) | Frees ptr, returns NULL (glibc 2.x behavior; was implementation-defined) | Frees ptr, returns NULL |
No deviation in practice. Note that the C standard makes realloc(ptr, 0) implementation-defined; both implementations choose to free and return NULL.
mallopt / mallinfo / mallinfo2
| Behavior | glibc | compatmalloc |
|---|---|---|
mallopt | Adjusts internal tuning parameters | Accepts the call, returns success, does nothing |
mallinfo / mallinfo2 | Returns live statistics | Returns zeroed structs |
These functions are provided only for binary compatibility. Programs that rely on mallopt to tune allocator behavior (e.g., M_MMAP_THRESHOLD) will find those tunings silently ignored. Programs that display mallinfo statistics will see all zeros.
Freed memory contents
| Behavior | glibc | compatmalloc (default features) |
|---|---|---|
After free(ptr) | Memory contents undefined; typically contains freelist pointers | Memory is poisoned with 0xFE bytes |
When the poison-on-free feature is enabled (included in the default hardened feature set), freed memory is overwritten with a poison byte (0xFE). Programs that access freed memory will read predictable but invalid data rather than stale user content or heap metadata.
If the zero-on-free feature is also enabled, memory is zeroed after poison checking, ensuring no sensitive data persists.
Double free
| Behavior | glibc | compatmalloc |
|---|---|---|
| Double free | May print a diagnostic and abort, or may corrupt the heap silently depending on the tcache state | Detected via metadata flags; aborts with a diagnostic message to stderr |
compatmalloc's out-of-band metadata tracks the freed state of each allocation, providing more reliable double-free detection than glibc's inline freelist checks.
Thread safety during init
| Behavior | glibc | compatmalloc |
|---|---|---|
| Early allocations before full init | Uses a brk-based arena | Uses a static 64 KiB bootstrap buffer; allocations from this buffer cannot be freed or reallocated back to the system |
The bootstrap buffer is a fixed-size bump allocator used only during the brief window when dlsym itself may call malloc before the real libc functions are resolved. Under normal operation, the bootstrap buffer is used for a handful of small allocations during initialization and is never exhausted.
Aligned allocation internals
| Behavior | glibc | compatmalloc |
|---|---|---|
| Over-aligned allocations | Uses dedicated aligned chunk logic | Over-allocates by size + alignment, then returns an aligned offset within the allocation |
This approach is correct but wastes up to alignment - 1 bytes per over-aligned allocation. For alignments of 16 bytes or less, no extra allocation is needed because the slab allocator already guarantees 16-byte alignment.
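The offset computation behind this over-allocate-then-align strategy can be sketched as follows (the helper name is illustrative):

```rust
/// Given the base address of an over-sized allocation, return the first
/// address aligned to `alignment`. Because the region was over-allocated
/// by `alignment` extra bytes, the aligned address always fits inside it.
fn first_aligned(base: usize, alignment: usize) -> usize {
    debug_assert!(alignment.is_power_of_two());
    (base + alignment - 1) & !(alignment - 1)
}
```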
Failure Modes
This page documents what happens when compatmalloc encounters error conditions at runtime. Understanding these failure modes is important for debugging and for setting expectations about how the allocator behaves under stress or attack.
Out of memory (OOM)
Slab allocation failure
When mmap fails while allocating a new slab region, the slab allocator returns a null pointer. This propagates up through malloc, which returns NULL to the caller. The allocator does not abort on OOM; it follows the C standard convention of returning NULL.
Large allocation failure
When mmap fails for a large allocation, LargeAlloc::create returns None, and malloc returns NULL.
Metadata table growth failure
When the metadata table exceeds its 75% load factor and mmap fails for the new table, the grow function returns without growing. Subsequent insertions may degrade to long probe chains but will still function as long as there is at least one empty slot. The allocator does not abort.
Calloc overflow
If nmemb * size overflows usize, calloc sets errno to ENOMEM and returns NULL. No allocation is attempted.
Heap corruption detected
Canary violation
Trigger: free or realloc of a pointer whose canary bytes have been modified (buffer overflow detected).
Behavior: When the canary check fails, the allocator writes a diagnostic message to stderr and calls abort():
compatmalloc: canary check failed -- heap buffer overflow detected
The process is terminated immediately. This is intentional: a corrupted canary means the heap is in an unknown state, and continuing execution could allow exploitation.
Write-after-free detected
Trigger: A quarantine entry's poison bytes have been modified when the entry is evicted (write-after-free detected).
Behavior: The allocator writes a diagnostic to stderr and calls abort():
compatmalloc: write-after-free detected during quarantine eviction
Double free
Trigger: free is called on a pointer whose metadata FLAG_FREED bit is already set.
Behavior: For large allocations, the allocator writes a diagnostic to stderr and calls abort():
compatmalloc: double free detected (large)
For slab allocations, double-free detection relies on the metadata table's freed flag.
Guard page violation
Trigger: A read or write to a guard page (buffer overflow/underflow past the allocation region boundary).
Behavior: The kernel delivers SIGSEGV to the process. The allocator does not handle this signal; it results in the default behavior (core dump and termination). The faulting address will be within a guard page region, which can be identified in the core dump.
Diagnostic output
All diagnostic messages are written directly to file descriptor 2 (stderr) using libc::write, with no heap allocation. This ensures that diagnostics work even when the heap is corrupted. After writing the message, the allocator calls libc::abort(), which generates a SIGABRT and (on most configurations) a core dump.
The diagnostic path is implemented in hardening::abort_with_message:
pub fn abort_with_message(msg: &str) -> ! {
    unsafe {
        libc::write(2, msg.as_ptr() as *const libc::c_void, msg.len());
        libc::abort();
    }
}
Kill-switch behavior
When COMPATMALLOC_DISABLE=1 is set, the allocator enters disabled mode during initialization. All allocation calls pass through to glibc via dlsym(RTLD_NEXT). No hardening features are active, and no hardening-related failures can occur.
Summary table
| Condition | Behavior | Exit? |
|---|---|---|
mmap fails (OOM) | Returns NULL | No |
calloc size overflow | Returns NULL, sets errno = ENOMEM | No |
| Metadata table growth fails | Continues with existing table | No |
| Canary violation | Diagnostic to stderr, abort() | Yes |
| Write-after-free | Diagnostic to stderr, abort() | Yes |
| Double free (large) | Diagnostic to stderr, abort() | Yes |
| Guard page violation | SIGSEGV (kernel-delivered) | Yes |
Unknown pointer to free | Silently ignored | No |
Hardening Overview
compatmalloc implements multiple layers of heap hardening, each targeting a different exploitation primitive. All hardening features are enabled by default through the hardened Cargo feature set and can be toggled individually.
Feature flags
| Feature | Default | Description |
|---|---|---|
quarantine | On | Delay memory reuse to detect use-after-free |
guard-pages | On | Place inaccessible pages around allocations |
slot-randomization | On | Randomize slot selection within size classes |
canaries | On | Write canary bytes after allocations to detect overflows |
poison-on-free | On | Fill freed memory with a poison pattern |
write-after-free-check | On | Verify poison bytes on eviction from quarantine |
zero-on-free | On | Zero memory after free (defense against information leaks) |
To build with all hardening (the default):
cargo build --release
To build with no hardening (passthrough performance baseline):
cargo build --release --no-default-features
To build with specific features:
cargo build --release --no-default-features --features quarantine,guard-pages
Defense-in-depth model
The hardening features form layers that work together:
Allocation request
|
v
[Slab allocator with per-CPU arenas]
|
+-- Slot randomization (unpredictable address)
+-- Canary bytes (detect buffer overruns)
+-- Out-of-band metadata (prevent metadata corruption)
+-- Guard pages (hardware-enforced bounds)
|
On free:
|
+-- Double-free detection (metadata flag check)
+-- Poison fill (detect use-after-free reads)
+-- Quarantine (delay reuse, detect stale writes)
+-- Zero-on-free (clear sensitive data)
Each layer provides value independently, but their combination makes exploitation significantly more difficult. An attacker must simultaneously bypass:
- Canary validation to overflow without detection.
- Poison checking to write after free without detection.
- Quarantine delays to reclaim a specific address.
- Guard pages to overflow beyond the allocation region.
- Out-of-band metadata to corrupt heap management data.
- Slot randomization to predict allocation addresses.
Per-feature documentation
- Use-After-Free Detection -- Quarantine and poison-based detection.
- Heap Metadata Protection -- Out-of-band metadata table.
- Stale Pointer Mitigation -- Delayed reuse through quarantine.
- Guard Pages -- Hardware-enforced memory boundaries.
- ARM Memory Tagging (MTE) -- Hardware memory tagging on ARM64 (replaces canaries, poison, and zero-on-free).
Use-After-Free Detection
Use-after-free (UAF) is one of the most exploited memory safety vulnerabilities. It occurs when a program continues to access memory through a pointer after that memory has been freed. compatmalloc employs two complementary techniques to detect UAF: poison filling and quarantine-based write detection.
Poison on free
Feature flag: poison-on-free
When memory is freed, the entire allocation is overwritten with a poison byte pattern (0xFE). This provides two benefits:
- Deterministic crash on read-after-free. Programs that read freed memory will encounter the poison pattern instead of stale data. Dereferencing a pointer value of 0xFEFEFEFEFEFEFEFE on x86_64 will typically cause a segfault, turning a silent data corruption bug into a crash.
- Information leak prevention. Sensitive data (passwords, keys, session tokens) is overwritten immediately on free, reducing the window during which it can be extracted from the heap.
Implementation
The poison fill is performed by hardening::poison::poison_region, which calls core::ptr::write_bytes with the poison byte (0xFE, defined in util::POISON_BYTE). The operation is a simple memset and adds minimal overhead.
Write-after-free detection
Feature flag: write-after-free-check
When an allocation is evicted from quarantine (see Stale Pointer Mitigation), the allocator checks whether the poison bytes are still intact. If any byte has been modified, it indicates that something wrote to the memory after it was freed -- a write-after-free condition.
Detection flow
free(ptr)
|
+-- Poison fill: memset(ptr, 0xFE, size)
+-- Mark as freed in metadata table
+-- Push into quarantine
|
... time passes, quarantine fills up ...
|
Quarantine eviction:
+-- Check poison: are all bytes still 0xFE?
| |
| +-- YES: no write-after-free, safe to reuse
| +-- NO: write-after-free detected, abort
|
+-- Actually recycle the slot
Poison checking implementation
The poison check (hardening::poison::check_poison) reads memory in 8-byte (u64) chunks for performance, comparing against the expected pattern 0xFEFEFEFEFEFEFEFE. Remaining bytes are checked individually. This makes the check fast even for large allocations.
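A sketch of the chunked check, following the poison pattern given above; the function body is an assumption, not the crate's exact code.

```rust
/// Poison byte and its 8-byte repetition, per the text.
const POISON_BYTE: u8 = 0xFE;
const POISON_WORD: u64 = 0xFEFE_FEFE_FEFE_FEFE;

/// Return true if every byte of `buf` still holds the poison pattern.
/// Reads 8 bytes at a time for speed, then checks the tail individually.
fn check_poison(buf: &[u8]) -> bool {
    let (head, tail) = buf.split_at(buf.len() & !7);
    let words_ok = head
        .chunks_exact(8)
        .all(|c| u64::from_ne_bytes(c.try_into().unwrap()) == POISON_WORD);
    words_ok && tail.iter().all(|&b| b == POISON_BYTE)
}
```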
Zero on free
Feature flag: zero-on-free
When enabled alongside poison-on-free, memory is zeroed after the poison check passes (or unconditionally if poison checking is disabled). This ensures that no sensitive data remains in the allocation even after it leaves quarantine.
The zeroing happens just before the slot is returned to the free pool:
Quarantine eviction:
+-- Check poison (if enabled)
+-- Zero fill: memset(ptr, 0x00, size)
+-- Return slot to slab free list
Double-free detection
The out-of-band metadata table tracks whether each allocation has been freed via a FLAG_FREED bit in the AllocationMeta::flags field. When free is called:
- The metadata for the pointer is looked up.
- If is_freed() returns true, the allocator writes a diagnostic message to stderr and calls abort().
- Otherwise, the freed flag is set via mark_freed().
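The flag-based check can be sketched as follows; the bit position of FLAG_FREED and the error-return shape are assumptions (only is_freed and mark_freed are named in the text, and the real allocator aborts rather than returning an error):

```rust
/// Assumed bit position for the freed-state flag.
const FLAG_FREED: u8 = 1 << 0;

struct AllocationMeta {
    flags: u8,
}

impl AllocationMeta {
    fn is_freed(&self) -> bool {
        self.flags & FLAG_FREED != 0
    }
    fn mark_freed(&mut self) {
        self.flags |= FLAG_FREED;
    }
}

/// Double-free check on free(): error on a second free of the same slot.
fn on_free(meta: &mut AllocationMeta) -> Result<(), &'static str> {
    if meta.is_freed() {
        return Err("double free detected"); // the real allocator aborts here
    }
    meta.mark_freed();
    Ok(())
}
```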
This detection is more reliable than glibc's inline freelist checks because the metadata is stored in a separate memory region that cannot be corrupted by a heap buffer overflow.
Heap Metadata Protection
Traditional allocators like glibc's ptmalloc2 store heap metadata (chunk sizes, freelist pointers) inline, immediately adjacent to user data. This design is efficient but means that a heap buffer overflow can corrupt the allocator's own bookkeeping, enabling powerful exploitation techniques like unlink attacks, fastbin corruption, and tcache poisoning.
compatmalloc eliminates this attack surface by storing all allocation metadata out-of-band in a separate memory region.
Out-of-band metadata table
The metadata table (hardening::metadata::MetadataTable) is a hash table backed by its own mmap region, completely separate from the slab and large allocation regions. It maps pointer addresses to AllocationMeta structs:
pub struct AllocationMeta {
    pub requested_size: usize, // The size the caller asked for
    pub checksum_value: u64,   // Integrity checksum for corruption detection
    pub flags: u8,             // State flags (e.g., FLAG_FREED)
}
Why this matters
With inline metadata, an attacker who can overflow a heap buffer by even a single byte may be able to:
- Modify the size of the next chunk, enabling overlapping allocations.
- Corrupt freelist pointers, redirecting allocations to attacker-controlled addresses.
- Forge fake chunks to confuse the allocator's validation checks.
With out-of-band metadata, none of these attacks work. The metadata lives in a different virtual memory region, so overflowing a user allocation cannot reach it.
Implementation details
Hash table design
The metadata table uses open addressing with linear probing:

- Keys are the pointer address cast to `usize`.
- Initial capacity is 16,384 entries.
- Load factor threshold is 75%. When exceeded, the table grows by 2x via a new `mmap` and a full rehash.
- The hash function is a multiplicative hash (`key * 0x9E3779B97F4A7C15`, the 64-bit golden-ratio constant) with a xor-shift mix for good distribution.
- Deletion uses backward-shift deletion (not tombstones) to maintain probe chain integrity.
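As a sketch, the bucket computation described above might look like this. Only the multiplicative constant comes from the text; the exact mix step and power-of-two masking are assumptions about the implementation.

```rust
const GOLDEN_RATIO: u64 = 0x9E37_79B9_7F4A_7C15;

/// Map a pointer address to a bucket index in a power-of-two-sized table.
/// Linear probing would then scan forward from this index.
fn bucket_for(key: usize, capacity: usize) -> usize {
    debug_assert!(capacity.is_power_of_two());
    let mut h = (key as u64).wrapping_mul(GOLDEN_RATIO);
    h ^= h >> 32; // xor-shift mix: fold the high bits into the low bits
    (h as usize) & (capacity - 1)
}
```

Multiplying by the golden-ratio constant spreads sequential pointer values across the table; the xor-shift brings the well-mixed high bits down into the masked range.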
Concurrency
The table is protected by a raw mutex (sync::RawMutex, implemented via Linux futex). All operations (insert, get, remove, mark_freed) acquire the lock for their duration.
Memory isolation
The table's backing memory is allocated via mmap(MAP_PRIVATE | MAP_ANONYMOUS), placing it at an address chosen by the kernel. This address is independent of the slab and large allocation regions, providing spatial separation.
Growth
When the load factor exceeds 75%, a new region of double the capacity is mapped, all entries are rehashed into it, and the old region is unmapped. This operation is performed under the lock to ensure consistency.
Lookup on every free
Every call to `free` looks up the pointer in the metadata table to:

- Check the `FLAG_FREED` bit for double-free detection.
- Retrieve the `requested_size` for canary checking and poison filling.
- Retrieve the `checksum_value` for integrity validation.
This adds a hash table lookup to every free operation, but the load factor is kept below 75%, so probe chains stay short, and the multiplicative hash provides good cache behavior.
Tradeoffs
| Benefit | Cost |
|---|---|
| Immune to heap metadata corruption attacks | Extra memory for the hash table (~25 bytes per live allocation) |
| Reliable double-free detection | Hash table lookup on malloc, free, and realloc |
| Canary and size tracking without inline headers | Mutex contention under heavy multi-threaded allocation |
Stale Pointer Mitigation
A stale pointer is a pointer that once referred to a valid allocation but now points to memory that has been freed and potentially reallocated for a different purpose. Stale pointers are the root cause of use-after-free vulnerabilities: if the memory is reallocated, the stale pointer now aliases a live object, and reads/writes through it corrupt unrelated data.
compatmalloc mitigates stale pointer exploitation through quarantine -- a bounded queue that delays the reuse of freed memory.
The quarantine
Feature flag: quarantine
When memory is freed, it is not immediately returned to the slab allocator's free list. Instead, it is pushed into a FIFO quarantine queue. The memory remains allocated (from the OS perspective) but is not available for new allocations. When the quarantine is full, the oldest entry is evicted and its slot is finally returned to the free list.
How it helps
Without quarantine, a freed slot can be immediately reused by the next malloc of the same size class. An attacker can trigger this reliably by controlling the timing of allocations and frees. With quarantine:
- **Temporal separation.** Hundreds of frees must occur before a specific slot is reused, making timing-based heap grooming attacks much harder.
- **Write-after-free detection window.** While memory is in quarantine, it remains poisoned. If anything writes to it during this window, the poison check on eviction will detect the corruption.
- **Reduced exploit reliability.** Even if an attacker can trigger a use-after-free, the window during which the freed memory is reused for a useful (to the attacker) object is dramatically reduced.
Implementation
The quarantine (hardening::quarantine::Quarantine) is a fixed-capacity ring buffer with 256 slots per arena, protected by the arena lock.
```text
            head                                tail
             |                                   |
[ evicted ] [ entry ] [ entry ] [ ... ] [ entry ] [ empty ] [ empty ]
            |_____________________________________|
                   queued (not yet reusable)
```
Eviction policy
Entries are evicted when either condition is met:

- **Byte budget exceeded.** The total bytes in quarantine plus the new entry would exceed `max_bytes`. The oldest entries are evicted until the budget is satisfied.
- **Slot count exceeded.** The ring buffer is full (256 entries). The oldest entry is evicted.

The byte budget defaults to 4 MiB (`DEFAULT_QUARANTINE_BYTES`) and can be configured via the `COMPATMALLOC_QUARANTINE_SIZE` environment variable.
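The two eviction conditions can be sketched with a FIFO queue. A `VecDeque` stands in for the fixed-capacity ring buffer, and the names are illustrative rather than compatmalloc's actual types:

```rust
use std::collections::VecDeque;

struct Quarantine {
    entries: VecDeque<(usize /* ptr */, usize /* size */)>,
    bytes: usize,
    max_bytes: usize,
    max_slots: usize,
}

impl Quarantine {
    /// Push a freed slot; returns the pointers evicted to make room,
    /// which the caller hands back to the slab free list.
    fn push(&mut self, ptr: usize, size: usize) -> Vec<usize> {
        let mut evicted = Vec::new();
        // Evict oldest entries until both budgets can admit the new entry.
        // (An entry larger than max_bytes is admitted alone in this sketch.)
        while !self.entries.is_empty()
            && (self.bytes + size > self.max_bytes || self.entries.len() >= self.max_slots)
        {
            let (old_ptr, old_size) = self.entries.pop_front().unwrap();
            self.bytes -= old_size;
            evicted.push(old_ptr);
        }
        self.entries.push_back((ptr, size));
        self.bytes += size;
        evicted
    }
}
```

FIFO order is what produces the temporal separation: the slot freed longest ago is always the next one returned for reuse.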
Eviction processing
When an entry is evicted from quarantine:

- If `write-after-free-check` is enabled, the poison bytes are verified.
- If `zero-on-free` is enabled, the memory is zeroed.
- The slot is returned to the slab allocator's free list for reuse.
Concurrency
The quarantine is embedded in each arena and protected by the arena lock. No separate quarantine lock is needed. A free call pushes one entry and potentially evicts older entries while the arena lock is held.
Configuration
| Environment variable | Default | Description |
|---|---|---|
| `COMPATMALLOC_QUARANTINE_SIZE` | 4194304 (4 MiB) | Maximum bytes held in quarantine |
Setting the quarantine size to 0 effectively disables quarantine (entries are evicted immediately), though the feature flag must also be disabled to eliminate the overhead entirely.
Setting a larger quarantine size increases the delay before memory is reused, improving detection probability at the cost of higher memory usage.
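A minimal sketch of how the variable might be read at startup. The parsing and fallback behavior are assumptions, and the real allocator reads the environment without `std`; this uses `std` for brevity:

```rust
/// Resolve the quarantine byte budget from a raw environment value,
/// falling back to the 4 MiB default on absence or a parse failure.
fn quarantine_budget(raw: Option<&str>) -> usize {
    const DEFAULT_QUARANTINE_BYTES: usize = 4 * 1024 * 1024; // 4194304
    raw.and_then(|v| v.trim().parse::<usize>().ok())
        .unwrap_or(DEFAULT_QUARANTINE_BYTES)
}

// At startup, something like:
// let budget = quarantine_budget(
//     std::env::var("COMPATMALLOC_QUARANTINE_SIZE").ok().as_deref());
```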
Tradeoffs
| Benefit | Cost |
|---|---|
| Delays memory reuse, breaking heap grooming attacks | Increased resident memory (up to quarantine_size bytes held in reserve) |
| Enables write-after-free detection during quarantine window | One mutex acquisition per free call |
| Makes exploit timing unreliable | Slight increase in free latency |
Guard Pages
Guard pages are regions of virtual memory marked as inaccessible (PROT_NONE) that the allocator places around allocation regions. Any read or write that crosses the boundary of an allocation into a guard page triggers an immediate hardware fault (segfault), providing deterministic detection of buffer overflows and underflows.
How guard pages work
Feature flag: guard-pages
When guard pages are enabled, the allocator inserts inaccessible pages at the boundaries of memory regions:
Large allocations
Each large allocation (>16 KiB) gets its own mmap region with the following layout:
```text
+-------------------+---------------------+-------------------+
|    Guard page     |      User data      |    Guard page     |
|   (PROT_NONE)     |   (PROT_READ |      |   (PROT_NONE)     |
|   4096 bytes      |    PROT_WRITE)      |   4096 bytes      |
+-------------------+---------------------+-------------------+
^                   ^                                         ^
|                   |                                         |
base                user_ptr                     base + total_size
```
A buffer overflow past the end of the user data hits the rear guard page and faults. A buffer underflow (writing before the allocation) hits the front guard page.
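The layout arithmetic behind the diagram can be sketched as plain computation. The rounding policy and names here are assumptions; the actual allocator may, for example, right-align user data against the rear guard to catch off-by-one overflows immediately:

```rust
const PAGE_SIZE: usize = 4096;

fn round_up_pages(n: usize) -> usize {
    (n + PAGE_SIZE - 1) / PAGE_SIZE * PAGE_SIZE
}

/// For a large allocation: (total bytes to mmap, offset of user data from base).
/// The first and last page of the mapping are later mprotect'ed to PROT_NONE.
fn large_alloc_layout(requested: usize) -> (usize, usize) {
    let data = round_up_pages(requested);
    (PAGE_SIZE + data + PAGE_SIZE, PAGE_SIZE) // front guard + data + rear guard
}
```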
Slab regions
Slab regions use the same pattern: guard pages are placed before and after the contiguous block of slots. This means that an overflow past the last slot in a slab, or an underflow before the first slot, will hit a guard page. However, overflows between adjacent slots within the same slab will not be caught by guard pages (canaries provide detection for those cases).
Implementation
Guard pages are implemented using platform memory protection primitives:
- Linux: `mprotect(addr, PAGE_SIZE, PROT_NONE)` on the guard regions after mapping the full region with `mmap`.
- The guard pages consume virtual address space but no physical memory (the kernel does not back `PROT_NONE` pages with RAM).
The overhead functions are defined in hardening::guard_pages:
```rust
// Per slab region: one guard page before + one after
pub const fn slab_guard_overhead() -> usize {
    PAGE_SIZE * 2 // 8192 bytes when enabled
}

// Per large allocation: one guard page before + one after
pub const fn large_guard_overhead() -> usize {
    PAGE_SIZE * 2
}
```
When the guard-pages feature is disabled, these functions return 0 and no guard pages are mapped.
What guard pages catch
| Scenario | Detected? |
|---|---|
| Linear buffer overflow past end of large allocation | Yes -- hits rear guard page |
| Linear buffer underflow before large allocation | Yes -- hits front guard page |
| Overflow past the last slot in a slab | Yes -- hits rear guard page |
| Overflow between adjacent slots in same slab | No -- caught by canaries instead |
| Wild pointer write to an arbitrary address | Only if it happens to land on a guard page |
Virtual memory cost
Guard pages consume virtual address space but not physical RAM. On 64-bit Linux, the virtual address space is 128 TiB, so the overhead is negligible. The per-region cost is:
- Large allocations: +8 KiB virtual per allocation (2 pages).
- Slab regions: +8 KiB virtual per slab (2 pages, amortized across all slots in the slab).
For a slab with 64 slots of 1024 bytes each (64 KiB of data), the guard page overhead is 8 KiB / 64 KiB = 12.5% of the region's size, all of it virtual-only. For smaller size classes with more slots per slab, the overhead is proportionally lower.
Interaction with other features
Guard pages complement the other hardening features:
- Canaries detect overflows within a slab (between adjacent slots) that guard pages cannot catch.
- Poison filling detects use-after-free, which guard pages do not address.
- Out-of-band metadata prevents corruption of allocator state, which guard pages alone cannot guarantee for within-slab overflows.
Together, these features provide comprehensive coverage: guard pages handle boundary overflows with hardware enforcement, canaries handle intra-slab overflows with software checks, and metadata isolation prevents allocator state corruption regardless of overflow direction.
ARM Memory Tagging (MTE)
On ARM64 processors with Memory Tagging Extension (ARMv8.5-A+), compatmalloc uses hardware memory tagging to replace several software hardening mechanisms with zero-cost hardware enforcement.
How it works
MTE assigns a 4-bit tag (values 1-15) to each 16-byte memory granule. Every pointer also carries a tag in its top byte. On every memory access, the CPU checks that the pointer tag matches the memory tag — a mismatch triggers a synchronous fault.
compatmalloc uses MTE as follows:
- On `malloc`: the slot is tagged with a random hardware tag via the `IRG` (Insert Random Tag) instruction. The returned pointer carries this tag.
- On `free`: the slot is re-tagged with a different random tag via `tag_freed`. Any dangling pointers still carrying the old tag will fault on access.
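The pointer side of this scheme can be illustrated with plain bit arithmetic: under AArch64's top-byte-ignore, the 4-bit tag occupies bits 59:56 of the pointer value. This is a sketch of the encoding only; real tags come from the `IRG` instruction, not from software:

```rust
const TAG_SHIFT: u32 = 56;
const TAG_MASK: u64 = 0xF << TAG_SHIFT;

/// Place a 4-bit tag into the top byte of a pointer value.
fn with_tag(addr: u64, tag: u64) -> u64 {
    (addr & !TAG_MASK) | ((tag & 0xF) << TAG_SHIFT)
}

/// Extract the 4-bit tag a pointer carries.
fn tag_of(ptr: u64) -> u64 {
    (ptr >> TAG_SHIFT) & 0xF
}
```

After a free re-tags the memory, a dangling pointer still carries the old tag; on access the CPU compares the two and faults on the mismatch.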
Runtime detection
MTE support is always compiled on aarch64 targets. At startup, compatmalloc checks for MTE hardware via getauxval(AT_HWCAP2) and enables it in synchronous mode via prctl(PR_SET_TAGGED_ADDR_CTRL). If MTE is not available, the allocator falls back to software hardening with no overhead from the detection check.
Slab backing memory is mapped with PROT_MTE when MTE is available to enable tag storage.
What MTE replaces
When MTE is active, the following software mechanisms are skipped:
| Software mechanism | What it does | MTE equivalent |
|---|---|---|
| Canary write (malloc) | Fills gap bytes with checksum-derived pattern | Hardware tag covers the entire slot |
| Canary check (free) | Verifies gap bytes are uncorrupted | Tag mismatch faults on any out-of-bounds access |
| Poison fill (free) | Fills freed memory with 0xCD pattern | Re-tagging prevents access to freed memory |
| Zero-on-free | Zeros freed memory to prevent info leak | Re-tagging prevents reads of freed memory |
The following mechanisms are kept with MTE because they are orthogonal:
| Mechanism | Why it stays |
|---|---|
| Quarantine | Delays slot reuse; MTE re-tagging detects access, but quarantine makes exploitation harder even if the 1/15 tag collision occurs |
| Guard pages | Protects against large overflows at page boundaries; MTE operates at 16-byte granularity |
| Slot randomization | Reduces heap spray predictability; orthogonal to tag-based detection |
| Double-free detection | Atomic CAS flag (try_mark_freed) runs before any MTE operations; MTE is not involved |
| Metadata integrity check | Checksum verification on out-of-band metadata; independent of MTE |
Coverage comparison
| Threat | Software hardening | MTE |
|---|---|---|
| Heap buffer overflow | Canary detects on free | Faults immediately on access |
| Heap buffer underflow | Front canary detects on free | Faults immediately on access |
| Use-after-free read | Poison corrupts data; zero-on-free clears it | Faults immediately (freed memory re-tagged) |
| Use-after-free write | Poison check detects on quarantine eviction | Faults immediately |
| Double free | Atomic CAS flag aborts immediately | Atomic CAS flag aborts immediately (same mechanism) |
| Info leak (freed data) | Zero-on-free clears freed slots | Re-tagging prevents reads (data not cleared) |
MTE provides strictly better detection timing for overflow, underflow, and use-after-free: faults occur at the moment of the invalid access rather than on the next free() or quarantine eviction.
Trade-offs
Probabilistic detection: MTE uses 15 possible tag values (4 bits, excluding tag 0). When a slot is freed and re-tagged, there is a 1/15 (~6.7%) chance the new tag matches the old tag, which would not detect a stale access. Software canaries are deterministic but only checked at free time.
No data clearing: MTE prevents access to freed memory but does not zero or poison the contents. If the 1/15 tag collision occurs, stale data could be read. Software zero-on-free eliminates this possibility entirely.
Hardware requirement: MTE requires ARMv8.5-A or later with OS kernel support. Compatible Linux platforms include AWS Graviton 3+ and Android devices with Pixel 8+ (or equivalent Armv9 SoCs). Apple Silicon has the hardware capability but macOS does not currently expose MTE to userspace. On hardware without MTE, the software fallback provides equivalent coverage at higher cost.
Performance impact
MTE eliminates the per-operation cost of canary writes, canary checks, poison fills, and zero-on-free. On MTE-capable hardware, this removes the dominant per-allocation overhead sources while maintaining equivalent or better security coverage.
Benchmarks
compatmalloc prioritizes security over raw performance. This page describes the performance characteristics, overhead sources, and how to run benchmarks to measure the impact on your workloads.
Latest CI Results (x86_64)
Auto-generated by CI on 2026-03-08 04:47 UTC from commit `760fbb2`. Results are from GitHub Actions runners (shared infrastructure) and may vary between runs. Each allocator is run 3 times; the best (lowest latency) result is kept.
Multi-Allocator Comparison
| Allocator | Weighted Overhead | Latency (64B) | Throughput 1T | Ratio | Throughput 4T | Ratio | Peak RSS |
|---|---|---|---|---|---|---|---|
| compatmalloc | +11.53% | 14.5 ns | 65.48 Mops/s | 0.87x | 150.78 Mops/s | 0.87x | 15096 KB |
| glibc | 0% | 12.3 ns | 74.53 Mops/s | 1.00x | 171.84 Mops/s | 1.00x | 10656 KB |
| jemalloc | +58.11% | 9.9 ns | 95.32 Mops/s | 1.27x | 257.40 Mops/s | 1.49x | 37576 KB |
| mimalloc | +16.62% | 8.9 ns | 81.17 Mops/s | 1.08x | 199.16 Mops/s | 1.15x | 25044 KB |
| passthrough | +64.59% | 20.9 ns | 43.21 Mops/s | 0.57x | 19.00 Mops/s | 0.11x | 10996 KB |
| scudo | +310.13% | 53.4 ns | 18.34 Mops/s | 0.24x | 38.73 Mops/s | 0.22x | 14664 KB |
Ratio interpretation: Latency ratio < 1.0 = faster than glibc. Throughput ratio > 1.0 = faster than glibc.
Hardened allocators: compatmalloc, scudo. These have security features (guard pages, quarantine, etc.) that add overhead vs. pure-performance allocators.
Peak RSS measured via `/usr/bin/time -v` during a single benchmark run. Hardening features (quarantine, guard pages) increase memory usage.
malloc/free Latency by Size (glibc)
```text
size=    16: 12.8 ns
size=    32: 12.3 ns
size=    64: 12.3 ns
size=   128: 12.4 ns
size=   256: 12.3 ns
size=   512: 12.3 ns
size=  1024: 12.3 ns
size=  4096: 23.2 ns
size= 16384: 23.5 ns
size= 65536: 24.0 ns
size=262144: 24.4 ns

size=    16: 15.9 ns
size=    64: 16.6 ns
size=   256: 26.8 ns
size=  1024: 28.5 ns
size=  4096: 70.3 ns
size= 65536: 764.3 ns
```
malloc/free Latency by Size (compatmalloc)
```text
size=    16: 14.9 ns
size=    32: 14.5 ns
size=    64: 14.5 ns
size=   128: 14.7 ns
size=   256: 14.5 ns
size=   512: 14.5 ns
size=  1024: 14.5 ns
size=  4096: 14.5 ns
size= 16384: 14.5 ns
size= 65536: 24.2 ns
size=262144: 24.2 ns

size=    16: 13.7 ns
size=    64: 13.8 ns
size=   256: 14.2 ns
size=  1024: 19.8 ns
size=  4096: 60.7 ns
size= 65536: 765.3 ns
```
Multi-threaded Throughput (glibc)
```text
threads=1:  74.53 Mops/sec
threads=2: 144.38 Mops/sec
threads=4: 171.84 Mops/sec
threads=8: 167.43 Mops/sec
```
Multi-threaded Throughput (compatmalloc)
```text
threads=1:  65.48 Mops/sec
threads=2: 125.34 Mops/sec
threads=4: 150.78 Mops/sec
threads=8: 145.36 Mops/sec
```
Real-World Application Overhead
| Application | glibc | compatmalloc | Overhead |
|---|---|---|---|
| python-json | 0.0728s | 0.0857s | 17.00% |
| redis | 3.3375s | 3.3545s | 0% |
| nginx | 5.1043s | 5.1042s | -1.00% |
| sqlite | 0.2104s | 0.1281s | -40.00% |
| git | 0.3197s | 0.1816s | -44.00% |
Application benchmarks measure wall-clock time for real programs (Python, Redis, nginx, SQLite, Git). Overhead = (compatmalloc_time / glibc_time - 1) * 100%.
Performance characteristics
Expected overhead
Compared to glibc's ptmalloc2, compatmalloc adds overhead from several sources:
| Source | Per-malloc cost | Per-free cost |
|---|---|---|
| Metadata table insert | Hash + linear probe + mutex | -- |
| Metadata table lookup | -- | Hash + linear probe + mutex |
| Canary write | memset of gap bytes | Canary check (byte comparison) |
| Poison fill | -- | memset of allocation |
| Quarantine push/evict | -- | Mutex + ring buffer enqueue |
| Zero-on-free | -- | memset of allocation (on eviction) |
| Guard page setup | mprotect (large alloc only) | -- |
For small allocations (16-256 bytes), the dominant costs are the metadata table operations and the canary/poison fills. For large allocations, the mmap/munmap syscalls dominate regardless of hardening.
Size class efficiency
The slab allocator uses 4-per-doubling size classes, which means internal fragmentation is at most 25% for any allocation. Size classes range from 16 bytes to 16,384 bytes (36 classes total).
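A jemalloc-style sketch of 4-per-doubling rounding is shown below. It is illustrative and compatmalloc's actual class table may differ, but it reproduces the ≤25% fragmentation bound and the 112-byte slot for `malloc(100)` described in the CVE case studies:

```rust
/// Round a request up to a 4-per-doubling size class (jemalloc-style spacing).
/// Sketch only: compatmalloc's exact class boundaries are not shown in the docs.
fn size_class(size: usize) -> usize {
    const MIN: usize = 16;
    if size <= MIN {
        return MIN;
    }
    // Power-of-two group the request falls into: floor(log2(size - 1)).
    let group = (usize::BITS - 1 - (size - 1).leading_zeros()) as usize;
    // Classes within a group are spaced a quarter of the group's base apart.
    let step = 1usize << (group - 2);
    (size + step - 1) / step * step
}
```

Worst case: a request just above a group base (e.g., 1025 bytes) lands in a slot one quarter-step larger (1280 bytes), wasting just under 25% of the requested size.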
Arena contention
With the default arena count (one per CPU), contention is low for most workloads. Programs with many threads performing high-frequency allocations may benefit from explicitly setting COMPATMALLOC_ARENA_COUNT to a higher value.
Running benchmarks
Microbenchmark suite
The benchmark suite is a standalone binary that measures allocator performance via LD_PRELOAD:
```sh
# Build the library and benchmark
cargo build --release
rustc -O benches/src/micro.rs -o target/release/micro

# Run with glibc (baseline)
ALLOCATOR_NAME=glibc ./target/release/micro

# Run with compatmalloc
ALLOCATOR_NAME=compatmalloc \
  LD_PRELOAD=./target/release/libcompatmalloc.so \
  ./target/release/micro
```
Full comparison script
To compare against multiple allocators (glibc, jemalloc, mimalloc, scudo):
```sh
./benches/scripts/run_comparison.sh
```
Disabling hardening for comparison
To measure the overhead of hardening features, build with no features:
```sh
cargo build --release --no-default-features
ALLOCATOR_NAME=minimal \
  LD_PRELOAD=./target/release/libcompatmalloc.so \
  ./target/release/micro
```
LD_PRELOAD benchmarks with external programs
For realistic benchmarks, test with real applications:
```sh
# Time a build with and without compatmalloc
time cargo build --release
time LD_PRELOAD=./target/release/libcompatmalloc.so \
  cargo build --release

# Python workload
time python3 -c "
import json
data = [{'key': str(i), 'value': list(range(100))} for i in range(10000)]
result = json.dumps(data)
parsed = json.loads(result)
"
time LD_PRELOAD=./target/release/libcompatmalloc.so python3 -c "
import json
data = [{'key': str(i), 'value': list(range(100))} for i in range(10000)]
result = json.dumps(data)
parsed = json.loads(result)
"
```
Tuning for performance
If the overhead is too high for your use case, you can selectively disable features:
| Configuration | Approximate overhead reduction |
|---|---|
| Disable `zero-on-free` | Removes one memset per free |
| Disable `poison-on-free` | Removes one memset per free (and disables write-after-free check) |
| Reduce quarantine size | Reduces memory pressure and eviction processing |
| Disable `guard-pages` | Removes mprotect calls and reduces virtual address space usage |
| Disable `canaries` | Removes canary write/check per alloc/free |
| `COMPATMALLOC_DISABLE=1` | Bypasses all hardening (passthrough to glibc) |
Weighted composite overhead
The headline "Weighted Overhead" metric computes a single overhead percentage that accounts for real-world allocation size distributions. Instead of reporting only the 64-byte latency, we weight each allocation size by its frequency in typical programs (based on jemalloc/tcmalloc telemetry data):
| Size | Weight | Rationale |
|---|---|---|
| 16B | 20% | Most common (tiny objects, pointers, small structs) |
| 32B | 15% | Second most common |
| 64B | 15% | Common for small structs, string headers |
| 128B | 12% | Medium-small objects |
| 256B | 10% | Strings, small buffers |
| 512B | 8% | Buffers |
| 1K | 5% | Page-ish allocations |
| 4K | 5% | Page-aligned allocations |
| 16K | 4% | Large buffers |
| 64K | 3% | Near mmap threshold |
| 256K | 3% | Very large allocations |
Formula: overhead = (Σ weight_i × (alloc_latency_i / glibc_latency_i) − 1) × 100%
A weighted overhead of +15% means compatmalloc is 15% slower than glibc across a representative workload mix. Negative values indicate compatmalloc is faster.
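The formula can be checked numerically. The weights below come from the table above; the latency ratios are hypothetical inputs:

```rust
/// (Σ wᵢ · ratioᵢ − 1) × 100, where ratioᵢ = alloc_latencyᵢ / glibc_latencyᵢ
/// and the weights sum to 1.
fn weighted_overhead_pct(weights: &[f64], ratios: &[f64]) -> f64 {
    assert_eq!(weights.len(), ratios.len());
    let s: f64 = weights.iter().zip(ratios).map(|(w, r)| w * r).sum();
    (s - 1.0) * 100.0
}
```

With a uniform 15% slowdown at every size (all ratios 1.15), the weighted overhead comes out to +15%, as expected.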
Methodology notes
When benchmarking allocators, keep the following in mind:
- Warm up the allocator. The first few allocations may be slower due to slab initialization and metadata table growth.
- Test with realistic workloads. Microbenchmarks of `malloc`/`free` loops do not represent real application behavior.
- Measure RSS, not just time. Hardening features (quarantine, guard pages) increase resident memory. Use `getrusage` or `/proc/self/status` to measure `VmRSS`.
- Account for variance. Run benchmarks multiple times and report medians. Allocator performance can be sensitive to ASLR and system load.
- Best-of-3 selection. CI results use the minimum latency and maximum throughput from 3 runs. This filters out noise from shared infrastructure while reflecting the allocator's true capability.
- Compare against other allocators. The comparison table includes jemalloc and mimalloc (performance-focused) alongside scudo (hardened, like compatmalloc). This provides context for the overhead of hardening features.
CVE Case Studies
This section demonstrates how compatmalloc's hardening features detect and prevent exploitation techniques used in real-world CVEs affecting glibc's heap allocator.
Methodology
For each CVE, we provide:
- A minimal proof-of-concept (a C program in `tests/cve/`) that demonstrates the exploitation technique
- Side-by-side output showing behavior under glibc vs. compatmalloc
- Analysis of which hardening features provide protection and their limitations
Honest assessment
compatmalloc is an open-source project, and we aim to be candid about what it can and cannot do. Where our hardening has limitations, we document them. Our software canary checks detect overflows on free(), not at the moment of overflow. Our quarantine detects write-after-free on eviction, not at the moment of write. On ARM64 hardware with Memory Tagging Extension (MTE), these limitations are eliminated — overflows and use-after-free are detected immediately at the point of access. On hardware without MTE, the software mechanisms provide equivalent coverage with delayed detection.
Case studies
| CVE | CVSS | Technique | compatmalloc detection |
|---|---|---|---|
| CVE-2024-2961 | 8.8 | Buffer overflow / tcache poisoning | Canary check + out-of-band metadata |
| CVE-2023-6246 | 7.8 | Heap buffer overflow / metadata corruption | Canary check + guard pages |
| Double-free | -- | Double-free / tcache dup | Metadata FLAG_FREED check |
Running the demos
```sh
# Build compatmalloc
cargo build --release

# Run all CVE demos
./tests/cve/run_demos.sh

# Run a specific demo
gcc -o /tmp/demo tests/cve/double_free.c

# With glibc:
/tmp/demo

# With compatmalloc:
LD_PRELOAD=./target/release/libcompatmalloc.so /tmp/demo
```
CVE-2024-2961: iconv Buffer Overflow
Vulnerability Summary
| Field | Value |
|---|---|
| CVE | CVE-2024-2961 |
| CVSS | 8.8 (High) |
| Affected | glibc <= 2.39 |
| Disclosed | 2024-04-17 |
| Type | Out-of-bounds write in iconv() |
The iconv() function in glibc overflows the output buffer by 1-3 bytes when converting strings to the ISO-2022-CN-EXT character set. The overflow occurs because escape sequence writes for SS2 and SS3 designations lack bounds checks.
Exploitation Technique
In real-world exploitation (the CNEXT exploit chain against PHP), attackers use this 1-3 byte overflow to corrupt tcache forward pointers in adjacent freed heap chunks:
1. Groom the heap so a freed chunk sits immediately after the iconv output buffer
2. Trigger the iconv overflow to modify the low byte(s) of the tcache `fd` pointer
3. The corrupted pointer redirects a subsequent `malloc()` to an attacker-controlled address
4. Write to that address to overwrite `__free_hook` or a GOT entry
5. Trigger the hook to achieve remote code execution
This technique -- tcache poisoning via buffer overflow -- works because glibc stores freelist metadata (the fd pointer) inline within freed chunks, directly adjacent to user data.
Proof of Concept
Source: tests/cve/tcache_poison.c
The PoC demonstrates the exploitation technique (1-byte write past the requested allocation size) rather than calling iconv() directly. This keeps it simple and version-independent.
```sh
gcc -o /tmp/tcache_poison tests/cve/tcache_poison.c
```
glibc output
```text
=== Tcache Poisoning via 1-Byte Overflow Demo ===
    (CVE-2024-2961 exploitation technique)

[1] chunk_a = malloc(50) => 0x...
[2] chunk_b = malloc(50) => 0x...
    distance: 64 bytes
[3] free(chunk_b) => chunk_b enters tcache
[4] chunk_b tcache fd = 0x...
[5] Simulating 1-byte overflow from chunk_a into chunk_b...
[6] free(chunk_a) => queued for deferred canary check
[7] Triggering batch flush (70 frees)...
[!] 1-byte overflow was NOT detected.
```
Under glibc, writing 1 byte past the requested 50-byte allocation lands within glibc's usable 56-byte region of the 64-byte chunk. No detection occurs. In the real CVE, larger overflows (1-3 bytes into adjacent chunks) corrupt the tcache fd pointer.
compatmalloc output
```text
=== Tcache Poisoning via 1-Byte Overflow Demo ===
    (CVE-2024-2961 exploitation technique)

[1] chunk_a = malloc(50) => 0x...
[2] chunk_b = malloc(50) => 0x...
    distance: -46784 bytes
[3] free(chunk_b) => chunk_b enters tcache
[4] chunk_b tcache fd = 0x4242424242424242
[5] Simulating 1-byte overflow from chunk_a into chunk_b...
[6] free(chunk_a) => queued for deferred canary check
[7] Triggering batch flush (70 frees)...
compatmalloc: heap buffer overflow detected (canary corrupted)
```
compatmalloc aborts immediately when the canary check detects the overflow.
What compatmalloc catches
Three independent layers of defense apply:

1. **Canary bytes.** compatmalloc places canary bytes in the padding between the requested size and the slot size. For `malloc(50)` in a 64-byte slot, bytes `[50..64)` contain canary values. The 1-byte write at offset 50 corrupts the canary, which is detected on `free()`.
2. **Out-of-band metadata.** Even without canaries, compatmalloc stores all freelist metadata in a separate `mmap` region -- not inline within freed chunks. There are no `fd` pointers adjacent to user data to corrupt. The fundamental prerequisite of tcache poisoning (corruptible inline metadata) does not exist.
3. **Slot randomization.** Allocations are not placed adjacently in predictable order, making heap grooming significantly harder.
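The canary layer can be sketched as follows. The gap bounds match the `malloc(50)`-in-a-64-byte-slot example above; the way the fill pattern is derived from the checksum is an assumption (compatmalloc derives it from a secret):

```rust
/// Fill the gap between the requested size and the slot size with a
/// checksum-derived byte pattern (the derivation here is illustrative).
fn write_canary(slot: &mut [u8], requested: usize, checksum: u64) {
    for (i, b) in slot[requested..].iter_mut().enumerate() {
        *b = (checksum >> ((i % 8) * 8)) as u8;
    }
}

/// Verify the gap bytes on free(); any overflow into the gap is detected.
fn check_canary(slot: &[u8], requested: usize, checksum: u64) -> bool {
    slot[requested..]
        .iter()
        .enumerate()
        .all(|(i, &b)| b == (checksum >> ((i % 8) * 8)) as u8)
}
```

Even a single-byte write at offset 50 lands inside the canary gap and flips a pattern byte, so the check on `free()` fails.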
What compatmalloc does NOT catch
- The overflow is not detected at the moment it happens. Canary checks run on `free()` (specifically during deferred batch verification). If the overflowed buffer is never freed, detection is delayed indefinitely.
- compatmalloc does not fix the `iconv()` bug itself. It prevents the exploitation technique (tcache poisoning) from succeeding, but the overflow still occurs in `iconv()`.
- Intra-slab overflows between adjacent slots in the same slab are caught by canaries, not guard pages. Guard pages only protect slab boundaries.
CVE-2023-6246: syslog Heap Buffer Overflow
Vulnerability Summary
| Field | Value |
|---|---|
| CVE | CVE-2023-6246 |
| CVSS | 7.8 (High) |
| Affected | glibc >= 2.36 |
| Disclosed | 2024-01-30 |
| Type | Heap-based buffer overflow in __vsyslog_internal() |
A heap-based buffer overflow in glibc's __vsyslog_internal() function, called by syslog() and vsyslog(). When openlog() is not called (or called with ident set to NULL) and the program name (argv[0]) exceeds 1024 bytes, the function overflows a heap buffer. Discovered by Qualys.
Exploitation Technique
The exploit targets su, a common SUID-root program:
1. Execute `su` with an extremely long `argv[0]` (> 1024 bytes)
2. PAM calls `syslog()` on authentication failure without calling `openlog()` first
3. `__vsyslog_internal()` copies the program name into a heap buffer without proper bounds checking
4. The overflow corrupts adjacent heap chunk metadata (size, flags, prev_size)
5. Subsequent heap operations trigger controlled writes via techniques like unsafe unlink
6. Achieve local privilege escalation to root
The key enabler: glibc stores malloc metadata (chunk headers) inline, directly adjacent to user data. An overflow from one allocation silently corrupts the metadata of the next chunk.
Proof of Concept
Source: tests/cve/heap_overflow.c
The PoC simulates the syslog overflow pattern: allocate a buffer and write past its end.
```sh
gcc -o /tmp/heap_overflow tests/cve/heap_overflow.c
```
glibc output
```text
=== Heap Buffer Overflow Detection Demo ===
    (CVE-2023-6246 pattern)

[1] malloc(100) => 0x...
[2] memset(0x..., 'X', 120) => overflow by 20 bytes!
[3] free(0x...) => queued for deferred canary check
[4] Triggering batch flush (70 frees)...
[!] Heap overflow was NOT detected on free().
    Under glibc, the adjacent chunk's metadata may be
    silently corrupted, enabling exploitation.
```
glibc does not detect the overflow. The 20 extra bytes silently overwrite whatever follows the allocation in memory.
compatmalloc output
```text
=== Heap Buffer Overflow Detection Demo ===
    (CVE-2023-6246 pattern)

[1] malloc(100) => 0x...
[2] memset(0x..., 'X', 120) => overflow by 20 bytes!
[3] free(0x...) => queued for deferred canary check
[4] Triggering batch flush (70 frees)...
compatmalloc: heap buffer overflow detected (canary corrupted)
```
compatmalloc detects the overflow when the canary bytes are checked during batch verification.
What compatmalloc catches
1. **Canary bytes.** For `malloc(100)`, compatmalloc returns a 112-byte slot. Bytes `[100..112)` contain canary values derived from a cryptographic secret. The 20-byte overflow destroys these canaries. On `free()`, the canary check detects the corruption and aborts.
2. **Out-of-band metadata.** Even if the overflow extends past the slot boundary, it cannot corrupt allocator metadata because metadata is stored in a separate `mmap` region. The fundamental unsafe-unlink exploitation technique (corrupting inline chunk headers) is not possible.
3. **Guard pages.** For overflows that extend past the end of a slab region, guard pages (`PROT_NONE`) trigger a hardware fault (SIGSEGV). This provides immediate detection without waiting for `free()`.
What compatmalloc does NOT catch
- The overflow is detected on `free()`, not at the moment of the write. Between the overflow and the canary check, the program continues executing with corrupted memory. If exploitation completes before `free()` is called, the canary check may come too late.
- compatmalloc does not fix the `syslog()` bug. Loading compatmalloc via `LD_PRELOAD` prevents the heap corruption from being exploitable, but the buffer overflow in `__vsyslog_internal()` still occurs.
- Intra-slab overflows between adjacent slots in the same slab are detected by canaries, not guard pages. An overflow that overwrites the canary gap with the correct canary values would go undetected, but forging those values is impractical without knowing the canary secret.
Double-Free Detection
Overview
Double-free is one of the most fundamental heap exploitation primitives, appearing across dozens of CVEs in components that rely on glibc's allocator. It has been observed in glibc's own regcomp(), in application-level parsers, and in many other components. Rather than tracking a single CVE, this page describes the general class of double-free vulnerabilities and how compatmalloc detects them.
Exploitation Technique: Tcache Dup
When a chunk is freed twice, it appears in the tcache freelist twice, creating a cycle:
```text
tcache[64B]: chunk_A -> chunk_A -> chunk_A -> ... (cycle)
```
Two subsequent `malloc()` calls of the same size return the same pointer:

```c
char *a = malloc(64); // returns chunk_A
char *b = malloc(64); // returns chunk_A again!
// a == b -- both point to the same memory
```
This enables type confusion: the program believes `a` and `b` are separate allocations, but writes through one are visible through the other. An attacker can use this to overwrite function pointers, vtable entries, or other security-sensitive data.
glibc's mitigation history
| glibc version | Detection mechanism | Bypassable? |
|---|---|---|
| < 2.29 | None | N/A -- no detection at all |
| 2.29+ | tcache key (random value stored at offset 8 in freed chunk) | Yes -- the key is stored inline and can be overwritten by a heap write primitive |
| 2.32+ | PROTECT_PTR (pointer mangling via XOR with address) | Harder but still inline -- can be bypassed with an info leak |
All of glibc's mitigations store detection data inline within the freed chunk's user data region. An attacker with any heap write capability can clear or forge these values before triggering the second `free()`.
Proof of Concept
Source: `tests/cve/double_free.c`

```sh
gcc -o /tmp/double_free tests/cve/double_free.c
```
glibc output (>= 2.29)
```text
=== Double-Free Detection Demo ===
[1] malloc(64) => 0x...
[2] free(0x...) => OK
[3] free(0x...) => double free! (should be caught)
free(): double free detected in tcache 2
```
Modern glibc (>= 2.29) does detect this case via the tcache key. However, the key is stored inline at `chunk + 8` and can be overwritten by an attacker with a write-after-free primitive before the second `free()`.
compatmalloc output
```text
=== Double-Free Detection Demo ===
[1] malloc(64) => 0x...
[2] free(0x...) => OK
[3] free(0x...) => double free! (should be caught)
compatmalloc: double free detected
```
compatmalloc aborts immediately on the second `free()`.
What compatmalloc catches
- **Out-of-band `FLAG_FREED` check.** The metadata table stores a `FLAG_FREED` bit for every allocation in a separate `mmap` region. On every `free()`:
  - Look up the pointer in the metadata table.
  - If `FLAG_FREED` is already set, abort with "double free detected".
  - Otherwise, set `FLAG_FREED`.
- **Cannot be bypassed by heap writes.** Because the metadata table is in a separate memory region (not adjacent to user data), an attacker cannot corrupt the `FLAG_FREED` bit via a buffer overflow or use-after-free write. This is the fundamental advantage over glibc's inline tcache key approach.
- **No version-dependent behavior.** The detection works identically regardless of glibc version, allocation size, or tcache state. Every `free()` is checked, every time.
What compatmalloc does NOT catch
- Aliased pointer double-frees are not actually a gap: if a program has two pointers to the same allocation (e.g., `a = malloc(64); b = a;`) and frees both, compatmalloc still detects the double free, because it tracks the allocation address, not the pointer variable. Both `free(a)` and `free(b)` resolve to the same metadata entry.
- Root cause identification. The abort happens at the second `free()` call, not at the point where the bug was introduced. For complex programs, the stack trace at the abort may not directly reveal why the double-free occurred.
- Deliberate double-free patterns. Some (buggy) programs intentionally double-free and rely on glibc silently accepting it. These programs will abort under compatmalloc. This is by design -- double-free is always a bug.