Introduction
compatmalloc is a drop-in memory-hardening allocator for Linux. It builds as a shared library (libcompatmalloc.so) that you can inject into any dynamically linked program via LD_PRELOAD, replacing the standard C memory allocator with one that actively detects and mitigates common heap exploitation techniques.
Why compatmalloc?
Heap vulnerabilities -- use-after-free, heap buffer overflows, double frees, and metadata corruption -- remain among the most exploited bug classes in native software. Standard allocators such as glibc's ptmalloc2 are optimized for throughput and perform only minimal runtime misuse detection.
compatmalloc exists to close that gap. It provides:
- Detection of use-after-free, buffer overruns (via canaries), and double-free conditions.
- Mitigation through delayed memory reuse (quarantine), out-of-band metadata, and guard pages.
- Compatibility with the full glibc malloc ABI, so existing binaries work without recompilation.
Design goals
- Drop-in replacement. Export every symbol that glibc's malloc provides (malloc, free, realloc, calloc, posix_memalign, aligned_alloc, memalign, valloc, pvalloc, malloc_usable_size, mallopt, mallinfo, mallinfo2). Programs that link against glibc should work unchanged.
- Defense in depth. Each hardening feature targets a different exploitation primitive. Features can be toggled individually through Cargo feature flags.
- No standard library dependency. The allocator is built as a cdylib with #![no_std]-style patterns internally, using libc for system calls and dlsym(RTLD_NEXT) for fallback to the real allocator. This avoids circular dependencies and keeps the binary small.
- Reasonable performance. The allocator is not a benchmark champion, but its overhead should be acceptable for development, testing, and hardened production deployments.
How it works
When loaded via LD_PRELOAD, compatmalloc's exported symbols override glibc's. A library constructor (__attribute__((constructor)) equivalent via .init_array) runs before main(), resolving the real libc functions via dlsym(RTLD_NEXT) and initializing the hardened allocator.
All allocations smaller than 16 KiB go through a slab allocator with per-CPU arenas. Larger allocations get individual mmap regions with optional guard pages on both sides. An out-of-band metadata table (stored in a separate mmap region) tracks each allocation's requested size, canary value, and freed-state flag, preventing attackers from corrupting heap metadata by overflowing adjacent allocations.
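The size-based dispatch described above can be sketched as follows. The names here are illustrative (the crate's real routing function is not named in this text); only the 16 KiB threshold comes from the description.

```rust
/// Illustrative sketch of compatmalloc's size-based routing:
/// small requests go to slab arenas, large ones to individual mmaps.
const SLAB_THRESHOLD: usize = 16 * 1024; // 16 KiB, per the text

#[derive(Debug, PartialEq)]
enum Backend {
    Slab, // per-CPU slab arena
    Mmap, // individual mmap region with optional guard pages
}

fn choose_backend(size: usize) -> Backend {
    if size < SLAB_THRESHOLD {
        Backend::Slab
    } else {
        Backend::Mmap
    }
}
```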
When the allocator is disabled (via COMPATMALLOC_DISABLE=1), all calls pass through to glibc, making it easy to toggle off in production if needed.
Getting Started
Prerequisites
- Rust toolchain (stable channel). Install via rustup.
- Linux x86_64 (the primary supported platform).
- A C compiler toolchain (gcc or clang) for linking the cdylib.
Build
Clone the repository and build the release library:
git clone https://github.com/t-cun/compatmalloc.git
cd compatmalloc
cargo build --release
The output shared library is at:
target/release/libcompatmalloc.so
Basic usage with LD_PRELOAD
Inject the library into any dynamically linked program:
LD_PRELOAD=./target/release/libcompatmalloc.so <your-program>
For example:
# Run bash with compatmalloc
LD_PRELOAD=./target/release/libcompatmalloc.so bash -c 'echo "hello from hardened malloc"'
# Run Python
LD_PRELOAD=./target/release/libcompatmalloc.so python3 -c 'print("works")'
# Run a server
LD_PRELOAD=./target/release/libcompatmalloc.so ./my-server
Verify it works
You can confirm that compatmalloc is intercepting allocations by checking that the library is loaded:
LD_PRELOAD=./target/release/libcompatmalloc.so \
bash -c 'cat /proc/self/maps | grep compatmalloc'
This should show the library mapped into the process address space.
You can also check exported symbols:
nm -D target/release/libcompatmalloc.so | grep -E ' T (malloc|free|calloc|realloc)$'
Expected output:
0000000000xxxxxx T calloc
0000000000xxxxxx T free
0000000000xxxxxx T malloc
0000000000xxxxxx T realloc
Disable at runtime
If you need to bypass the hardened allocator without removing LD_PRELOAD, set the kill-switch environment variable:
COMPATMALLOC_DISABLE=1 LD_PRELOAD=./target/release/libcompatmalloc.so <your-program>
This makes all allocator calls pass through to glibc. See Configuration for all available options.
Run the test suite
cargo test --workspace
This runs unit tests for all internal modules (size classes, bitmap, metadata table, etc.).
Building
Prerequisites
- Rust stable toolchain. Install via rustup: curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
- Linux x86_64. The primary supported platform. The library uses Linux-specific APIs (mmap, mprotect, futex, /proc/self/maps).
- C linker. The cc crate will use gcc or clang for linking the cdylib. On Ubuntu/Debian: apt install build-essential.
Build commands
Debug build
cargo build --workspace
Output: target/debug/libcompatmalloc.so
The debug build includes debug assertions (debug_assert!) and debug symbols. It is suitable for development and testing but not for performance measurement.
Release build
cargo build --workspace --release
Output: target/release/libcompatmalloc.so
The release profile is configured in the workspace Cargo.toml with aggressive optimizations:
[profile.release]
opt-level = 3
lto = "fat"
codegen-units = 1
panic = "abort"
- opt-level = 3 -- maximum optimization.
- lto = "fat" -- full link-time optimization across all crates.
- codegen-units = 1 -- single codegen unit for better optimization (slower compile).
- panic = "abort" -- no unwinding (smaller binary, no landing pads).
Hardened profile
A custom hardened profile is available for deployments that want a balance between debuggability and performance:
cargo build --workspace --profile hardened
[profile.hardened]
inherits = "release"
opt-level = 2
overflow-checks = true
debug = 1
- opt-level = 2 -- slightly less aggressive optimization (faster compile, slightly larger binary).
- overflow-checks = true -- arithmetic overflow panics instead of wrapping.
- debug = 1 -- line-level debug info (for useful backtraces without full debug bloat).
Feature flags
The compatmalloc crate defines the following features:
| Feature | Default | Description |
|---|---|---|
hardened | Yes | Meta-feature that enables all hardening features below |
quarantine | Via hardened | Delay reuse of freed memory |
guard-pages | Via hardened | Place inaccessible pages around allocations |
slot-randomization | Via hardened | Randomize slot selection within size classes |
canaries | Via hardened | Detect buffer overflows via canary bytes |
poison-on-free | Via hardened | Fill freed memory with a poison pattern |
write-after-free-check | Via hardened | Detect writes to freed memory during quarantine eviction |
zero-on-free | Via hardened | Zero memory on free (defense against information leaks) |
Building with specific features
# All hardening (default)
cargo build --release
# No hardening (passthrough-like performance)
cargo build --release --no-default-features
# Only quarantine and guard pages
cargo build --release --no-default-features --features quarantine,guard-pages
# Everything except zero-on-free (reduce free overhead)
cargo build --release --no-default-features \
--features quarantine,guard-pages,slot-randomization,canaries,poison-on-free,write-after-free-check
Linker scripts
The build script (build.rs) configures platform-specific linker behavior:
- Linux: A version script (linker/version_script.lds) controls which symbols are exported. Only the standard C allocator symbols are exported; all internal Rust symbols are hidden.
- macOS: All symbols are exported by default (no special configuration yet).
- Windows: A .def file (linker/exports.def) lists exported symbols.
Running tests
# All tests
cargo test --workspace
# Release mode tests
cargo test --workspace --release
# Tests with no default features
cargo test --workspace --no-default-features
# Tests for a single feature
cargo test --workspace --no-default-features --features canaries
Checking code quality
# Format check
cargo fmt --all -- --check
# Clippy lints
cargo clippy --workspace --all-targets --all-features -- -D warnings
Cross-compilation
The library is primarily designed for Linux x86_64. Cross-compilation to other Linux architectures (aarch64, etc.) should work but is not tested in CI. Non-Linux platforms (macOS, Windows) have stub platform implementations but are not fully supported.
# Example: cross-compile for aarch64-linux
rustup target add aarch64-unknown-linux-gnu
cargo build --release --target aarch64-unknown-linux-gnu
Configuration
compatmalloc reads configuration from environment variables at initialization time (before main() runs). All configuration is optional; the defaults provide a good balance of security and performance.
Environment variables
COMPATMALLOC_DISABLE
Type: presence-based (any value enables it)
Default: not set (allocator is enabled)
When this variable is set to any non-empty value, the hardened allocator is completely bypassed. All malloc/free/realloc/calloc calls are forwarded directly to glibc via dlsym(RTLD_NEXT).
Use this as a kill-switch if you suspect the allocator is causing issues with a specific program:
COMPATMALLOC_DISABLE=1 LD_PRELOAD=./libcompatmalloc.so ./my-program
Implementation: Checked during init via config::is_disabled(). The init state machine transitions to DISABLED instead of READY, and the dispatch macro routes all calls to the passthrough allocator.
COMPATMALLOC_ARENA_COUNT
Type: unsigned integer
Default: number of CPUs (capped at 32)
Sets the number of slab arenas. Each arena has its own set of size-class slabs and its own locks, so more arenas reduce contention in multi-threaded programs.
COMPATMALLOC_ARENA_COUNT=8 LD_PRELOAD=./libcompatmalloc.so ./my-server
Valid range: 1 to 32 (MAX_ARENAS). Values above 32 are clamped. A value of 0 means "use the default" (number of CPUs).
Tradeoff: More arenas reduce lock contention but increase memory usage (each arena independently maps slab regions). For single-threaded programs, COMPATMALLOC_ARENA_COUNT=1 is optimal.
Thread-to-arena mapping: Threads are assigned to arenas by thread_id % num_arenas. This provides a rough approximation of per-CPU arenas without requiring sched_getcpu.
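The clamping and mapping rules above can be sketched as follows; the helper names (effective_arena_count, arena_for_thread) are illustrative, while the 0-means-default rule, the cap of 32, and the modulo mapping come from the text.

```rust
/// Cap on the number of slab arenas, per the text.
const MAX_ARENAS: usize = 32;

/// Resolve COMPATMALLOC_ARENA_COUNT to an effective arena count:
/// 0 means "use the CPU count"; all values are clamped to 1..=32.
fn effective_arena_count(requested: usize, num_cpus: usize) -> usize {
    let n = if requested == 0 { num_cpus } else { requested };
    n.clamp(1, MAX_ARENAS)
}

/// Assign a thread to an arena by simple modulo, as described above.
fn arena_for_thread(thread_id: usize, num_arenas: usize) -> usize {
    thread_id % num_arenas
}
```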
COMPATMALLOC_QUARANTINE_SIZE
Type: unsigned integer (bytes)
Default: 4194304 (4 MiB)
Sets the maximum total bytes held in the quarantine queue. When the quarantine's total byte count would exceed this limit, the oldest entries are evicted (and their slots returned to the free list) until the limit is satisfied.
# Larger quarantine for more thorough use-after-free detection
COMPATMALLOC_QUARANTINE_SIZE=16777216 LD_PRELOAD=./libcompatmalloc.so ./my-program
# Smaller quarantine to reduce memory overhead
COMPATMALLOC_QUARANTINE_SIZE=1048576 LD_PRELOAD=./libcompatmalloc.so ./my-program
Valid range: 0 to usize::MAX. A value of 0 means entries are evicted immediately (effectively disabling quarantine delay, though the quarantine code path is still executed).
Note: This variable only has effect when the quarantine feature is enabled (it is enabled by default in the hardened feature set).
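A simplified sketch of the byte-budgeted eviction described above, using std collections for brevity (the real allocator is no_std and lock-protected; all names here are illustrative):

```rust
use std::collections::VecDeque;

/// Toy model of the quarantine queue: freed blocks are held until the
/// total byte count exceeds the configured limit, then evicted oldest-first.
struct Quarantine {
    limit: usize,                      // COMPATMALLOC_QUARANTINE_SIZE
    total: usize,                      // bytes currently held
    entries: VecDeque<(usize, usize)>, // (ptr, size), oldest at the front
}

impl Quarantine {
    /// Push a freed block; returns the entries evicted to satisfy the limit.
    fn push(&mut self, ptr: usize, size: usize) -> Vec<(usize, usize)> {
        self.entries.push_back((ptr, size));
        self.total += size;
        let mut evicted = Vec::new();
        // Evict oldest entries until the byte budget is satisfied.
        while self.total > self.limit {
            match self.entries.pop_front() {
                Some((p, s)) => {
                    self.total -= s;
                    evicted.push((p, s)); // slot returns to the free list
                }
                None => break,
            }
        }
        evicted
    }
}
```

Note that with limit = 0 every push evicts immediately, matching the documented behavior of COMPATMALLOC_QUARANTINE_SIZE=0.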
Configuration timing
All environment variables are read once during the library constructor (__attribute__((constructor)) equivalent via .init_array). Configuration cannot be changed at runtime. This design avoids the need for synchronization on configuration reads in the hot path.
The read sequence during init:
1. passthrough::resolve_real_functions() -- resolve glibc symbols via dlsym.
2. config::read_config() -- read COMPATMALLOC_ARENA_COUNT and COMPATMALLOC_QUARANTINE_SIZE.
3. config::is_disabled() -- check COMPATMALLOC_DISABLE.
4. HardenedAllocator::init() -- apply configuration values.
Summary table
| Variable | Type | Default | Description |
|---|---|---|---|
COMPATMALLOC_DISABLE | presence | not set | Bypass hardened allocator entirely |
COMPATMALLOC_ARENA_COUNT | uint | CPU count (max 32) | Number of per-thread slab arenas |
COMPATMALLOC_QUARANTINE_SIZE | uint (bytes) | 4194304 (4 MiB) | Maximum quarantine queue size |
ABI Contract
compatmalloc exports every symbol that a glibc-linked program may reference for dynamic memory management. This page documents each exported function, its parameters, return values, and error behavior.
All functions use the C ABI (extern "C").
Standard C allocator
malloc
void *malloc(size_t size);
Allocate size bytes of uninitialized memory.
- Parameters: size -- number of bytes to allocate.
- Returns: Pointer to the allocated memory, aligned to at least 16 bytes. Returns NULL on failure (and errno is set to ENOMEM by the underlying mapping).
- Special case: malloc(0) returns a valid, unique, non-NULL pointer (to a 1-byte internal allocation). This pointer must be passed to free().
free
void free(void *ptr);
Release memory previously returned by malloc, calloc, realloc, or an alignment function.
- Parameters: ptr -- pointer to free. If NULL, the call is a no-op.
- Behavior on double-free: If the write-after-free-check feature is enabled and the metadata table detects the pointer was already freed, the allocator writes a diagnostic to stderr and calls abort().
- Behavior on invalid pointer: Pointers not recognized by any arena or the large allocator are silently ignored for compatibility.
realloc
void *realloc(void *ptr, size_t size);
Change the size of a previously allocated block.
- Parameters:
  - ptr -- pointer to the existing allocation. If NULL, behaves like malloc(size).
  - size -- new size in bytes. If 0, behaves like free(ptr) and returns NULL.
- Returns: Pointer to the resized block (may differ from ptr). Returns NULL on failure; the original block is left unchanged.
- Copy behavior: When a new block is allocated, min(old_size, new_size) bytes are copied from the old block.
calloc
void *calloc(size_t nmemb, size_t size);
Allocate zeroed memory for an array.
- Parameters:
  - nmemb -- number of elements.
  - size -- size of each element.
- Returns: Pointer to zero-initialized memory. Returns NULL if the multiplication nmemb * size overflows or if allocation fails.
- Overflow protection: Uses checked_mul internally. On overflow, sets errno to ENOMEM and returns NULL.
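The overflow guard can be sketched as follows (errno handling omitted; the function name is illustrative, but the checked_mul approach is the one the text describes):

```rust
/// Compute the total calloc size, or None on overflow.
/// None maps to "set errno = ENOMEM and return NULL" in the real allocator.
fn calloc_total_size(nmemb: usize, size: usize) -> Option<usize> {
    nmemb.checked_mul(size)
}
```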
POSIX alignment APIs
posix_memalign
int posix_memalign(void **memptr, size_t alignment, size_t size);
Allocate memory with a specified alignment (POSIX).
- Parameters:
  - memptr -- output pointer. Must not be NULL.
  - alignment -- must be a power of two and at least sizeof(void *) (8 bytes on 64-bit).
  - size -- number of bytes.
- Returns:
  - 0 on success (pointer stored in *memptr).
  - EINVAL if memptr is NULL or alignment is invalid.
  - ENOMEM if allocation fails.
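A sketch of the alignment validation implied above; the helper name and the bare EINVAL constant are assumptions, while the power-of-two and minimum-size rules come from the contract.

```rust
/// POSIX EINVAL on Linux (assumed constant value for this sketch).
const EINVAL: i32 = 22;

/// Validate a posix_memalign alignment: power of two, at least
/// sizeof(void *). Returns Err(EINVAL) on an invalid alignment.
fn validate_alignment(alignment: usize) -> Result<(), i32> {
    let min = core::mem::size_of::<*const u8>(); // 8 on 64-bit targets
    if alignment.is_power_of_two() && alignment >= min {
        Ok(())
    } else {
        Err(EINVAL)
    }
}
```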
aligned_alloc
void *aligned_alloc(size_t alignment, size_t size);
Allocate memory with a specified alignment (C11).
- Parameters:
  - alignment -- must be a power of two.
  - size -- must be a multiple of alignment (unless size is 0).
- Returns: Aligned pointer, or NULL on failure (with errno set to EINVAL or ENOMEM).
memalign
void *memalign(size_t alignment, size_t size);
Allocate memory with a specified alignment (legacy).
- Parameters:
  - alignment -- must be a power of two.
  - size -- number of bytes.
- Returns: Aligned pointer, or NULL on failure.
valloc
void *valloc(size_t size);
Allocate page-aligned memory.
- Parameters: size -- number of bytes.
- Returns: Pointer aligned to the system page size (4096 bytes), or NULL on failure.
pvalloc
void *pvalloc(size_t size);
Allocate page-aligned memory, rounding the size up to a page boundary.
- Parameters: size -- number of bytes (rounded up to the next multiple of the page size).
- Returns: Page-aligned pointer, or NULL on failure.
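The rounding pvalloc performs can be sketched as follows, assuming a 4096-byte page and ignoring overflow near usize::MAX (which real code must check):

```rust
/// System page size assumed for this sketch (4096, per the valloc docs).
const PAGE_SIZE: usize = 4096;

/// Round a size up to the next multiple of the page size.
/// Relies on PAGE_SIZE being a power of two.
fn round_up_to_page(size: usize) -> usize {
    (size + PAGE_SIZE - 1) & !(PAGE_SIZE - 1)
}
```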
GNU extensions
malloc_usable_size
size_t malloc_usable_size(void *ptr);
Return the usable size of an allocation.
- Parameters: ptr -- pointer returned by an allocator function. If NULL, returns 0.
- Returns: The number of usable bytes in the allocation. This may be larger than the originally requested size (due to size-class rounding), but programs must not rely on the excess bytes persisting across realloc.
mallopt
int mallopt(int param, int value);
Set allocator tuning parameters.
- Behavior: Accepts all calls and returns 1 (success) but performs no action. Provided solely for binary compatibility with programs that call mallopt.
mallinfo / mallinfo2
struct mallinfo mallinfo(void);
struct mallinfo2 mallinfo2(void); // Linux only
Return allocator statistics.
- Behavior: Returns a zeroed struct. Provided solely for binary compatibility. mallinfo2 is only exported on Linux targets.
Minimum alignment
All allocations are aligned to at least 16 bytes (MIN_ALIGN), which matches max_align_t on 64-bit Linux. This guarantees correct alignment for any C data type.
Deviations from glibc
compatmalloc aims for full ABI compatibility with glibc's malloc, but makes deliberate behavioral choices that differ in edge cases. This page documents every known deviation.
malloc(0)
| Behavior | glibc | compatmalloc |
|---|---|---|
malloc(0) | Returns a unique non-NULL pointer (implementation-defined minimum size) | Returns a unique non-NULL pointer (internally allocates 1 byte, rounded up to a 16-byte slot) |
Both return a valid pointer that must be freed. The difference is academic; compatmalloc's behavior is conformant with the C standard, which states that malloc(0) may return either NULL or a unique pointer.
Minimum allocation alignment
| Behavior | glibc | compatmalloc |
|---|---|---|
| Minimum alignment | 16 bytes on 64-bit | 16 bytes (MIN_ALIGN) |
No deviation. Both guarantee alignment to max_align_t.
malloc_usable_size
| Behavior | glibc | compatmalloc |
|---|---|---|
| Usable size | Typically the chunk size minus overhead (often significantly larger than requested) | The slab slot size for the allocation's size class |
glibc often returns usable sizes much larger than requested due to its chunk-based design. compatmalloc returns the size-class slot size, which is typically closer to the requested size. Programs that depend on malloc_usable_size returning a value much larger than requested may behave differently.
realloc(ptr, 0)
| Behavior | glibc | compatmalloc |
|---|---|---|
realloc(ptr, 0) | Frees ptr, returns NULL (glibc 2.x behavior; was implementation-defined) | Frees ptr, returns NULL |
No deviation in practice. Note that the C standard makes realloc(ptr, 0) implementation-defined; both implementations choose to free and return NULL.
mallopt / mallinfo / mallinfo2
| Behavior | glibc | compatmalloc |
|---|---|---|
mallopt | Adjusts internal tuning parameters | Accepts the call, returns success, does nothing |
mallinfo / mallinfo2 | Returns live statistics | Returns zeroed structs |
These functions are provided only for binary compatibility. Programs that rely on mallopt to tune allocator behavior (e.g., M_MMAP_THRESHOLD) will find those tunings silently ignored. Programs that display mallinfo statistics will see all zeros.
Freed memory contents
| Behavior | glibc | compatmalloc (default features) |
|---|---|---|
After free(ptr) | Memory contents undefined; typically contains freelist pointers | Memory is poisoned with 0xFE bytes |
When the poison-on-free feature is enabled (included in the default hardened feature set), freed memory is overwritten with a poison byte (0xFE). Programs that access freed memory will read predictable but invalid data rather than stale user content or heap metadata.
If the zero-on-free feature is also enabled, memory is zeroed after poison checking, ensuring no sensitive data persists.
Double free
| Behavior | glibc | compatmalloc |
|---|---|---|
| Double free | May print a diagnostic and abort, or may corrupt the heap silently depending on the tcache state | Detected via metadata flags; aborts with a diagnostic message to stderr |
compatmalloc's out-of-band metadata tracks the freed state of each allocation, providing more reliable double-free detection than glibc's inline freelist checks.
Thread safety during init
| Behavior | glibc | compatmalloc |
|---|---|---|
| Early allocations before full init | Uses a brk-based arena | Uses a static 64 KiB bootstrap buffer; allocations from this buffer cannot be freed or reallocated back to the system |
The bootstrap buffer is a fixed-size bump allocator used only during the brief window when dlsym itself may call malloc before the real libc functions are resolved. Under normal operation, the bootstrap buffer is used for a handful of small allocations during initialization and is never exhausted.
Aligned allocation internals
| Behavior | glibc | compatmalloc |
|---|---|---|
| Over-aligned allocations | Uses dedicated aligned chunk logic | Over-allocates by size + alignment, then returns an aligned offset within the allocation |
This approach is correct but wastes up to alignment - 1 bytes per over-aligned allocation. For alignments of 16 bytes or less, no extra allocation is needed because the slab allocator already guarantees 16-byte alignment.
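The offset computation behind this over-allocate-then-align strategy can be sketched as follows (the helper name is illustrative):

```rust
/// Given the base address of an over-sized allocation, return the first
/// address aligned to `alignment`. Because the region was over-allocated
/// by `alignment` extra bytes, the aligned address always fits inside it.
fn first_aligned(base: usize, alignment: usize) -> usize {
    debug_assert!(alignment.is_power_of_two());
    (base + alignment - 1) & !(alignment - 1)
}
```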
Failure Modes
This page documents what happens when compatmalloc encounters error conditions at runtime. Understanding these failure modes is important for debugging and for setting expectations about how the allocator behaves under stress or attack.
Out of memory (OOM)
Slab allocation failure
When mmap fails while allocating a new slab region, the slab allocator returns a null pointer. This propagates up through malloc, which returns NULL to the caller. The allocator does not abort on OOM; it follows the C standard convention of returning NULL.
Large allocation failure
When mmap fails for a large allocation, LargeAlloc::create returns None, and malloc returns NULL.
Metadata table growth failure
When the metadata table exceeds its 75% load factor and mmap fails for the new table, the grow function returns without growing. Subsequent insertions may degrade to long probe chains but will still function as long as there is at least one empty slot. The allocator does not abort.
Calloc overflow
If nmemb * size overflows usize, calloc sets errno to ENOMEM and returns NULL. No allocation is attempted.
Heap corruption detected
Canary violation
Trigger: free or realloc of a pointer whose canary bytes have been modified (buffer overflow detected).
Behavior: When the canary check fails, the allocator writes a diagnostic message to stderr and calls abort():
compatmalloc: canary check failed -- heap buffer overflow detected
The process is terminated immediately. This is intentional: a corrupted canary means the heap is in an unknown state, and continuing execution could allow exploitation.
Write-after-free detected
Trigger: A quarantine entry's poison bytes have been modified when the entry is evicted (write-after-free detected).
Behavior: The allocator writes a diagnostic to stderr and calls abort():
compatmalloc: write-after-free detected during quarantine eviction
Double free
Trigger: free is called on a pointer whose metadata FLAG_FREED bit is already set.
Behavior: For large allocations, the allocator writes a diagnostic to stderr and calls abort():
compatmalloc: double free detected (large)
For slab allocations, double-free detection relies on the metadata table's freed flag.
Guard page violation
Trigger: A read or write to a guard page (buffer overflow/underflow past the allocation region boundary).
Behavior: The kernel delivers SIGSEGV to the process. The allocator does not handle this signal; it results in the default behavior (core dump and termination). The faulting address will be within a guard page region, which can be identified in the core dump.
Diagnostic output
All diagnostic messages are written directly to file descriptor 2 (stderr) using libc::write, with no heap allocation. This ensures that diagnostics work even when the heap is corrupted. After writing the message, the allocator calls libc::abort(), which generates a SIGABRT and (on most configurations) a core dump.
The diagnostic path is implemented in hardening::abort_with_message:
pub fn abort_with_message(msg: &str) -> ! {
    unsafe {
        libc::write(2, msg.as_ptr() as *const libc::c_void, msg.len());
        libc::abort();
    }
}
Kill-switch behavior
When COMPATMALLOC_DISABLE=1 is set, the allocator enters disabled mode during initialization. All allocation calls pass through to glibc via dlsym(RTLD_NEXT). No hardening features are active, and no hardening-related failures can occur.
Summary table
| Condition | Behavior | Exit? |
|---|---|---|
mmap fails (OOM) | Returns NULL | No |
calloc size overflow | Returns NULL, sets errno = ENOMEM | No |
| Metadata table growth fails | Continues with existing table | No |
| Canary violation | Diagnostic to stderr, abort() | Yes |
| Write-after-free | Diagnostic to stderr, abort() | Yes |
| Double free (large) | Diagnostic to stderr, abort() | Yes |
| Guard page violation | SIGSEGV (kernel-delivered) | Yes |
Unknown pointer to free | Silently ignored | No |
Hardening Overview
compatmalloc implements multiple layers of heap hardening, each targeting a different exploitation primitive. All hardening features are enabled by default through the hardened Cargo feature set and can be toggled individually.
Feature flags
| Feature | Default | Description |
|---|---|---|
quarantine | On | Delay memory reuse to detect use-after-free |
guard-pages | On | Place inaccessible pages around allocations |
slot-randomization | On | Randomize slot selection within size classes |
canaries | On | Write canary bytes after allocations to detect overflows |
poison-on-free | On | Fill freed memory with a poison pattern |
write-after-free-check | On | Verify poison bytes on eviction from quarantine |
zero-on-free | On | Zero memory after free (defense against information leaks) |
To build with all hardening (the default):
cargo build --release
To build with no hardening (passthrough performance baseline):
cargo build --release --no-default-features
To build with specific features:
cargo build --release --no-default-features --features quarantine,guard-pages
Defense-in-depth model
The hardening features form layers that work together:
Allocation request
|
v
[Slab allocator with per-CPU arenas]
|
+-- Slot randomization (unpredictable address)
+-- Canary bytes (detect buffer overruns)
+-- Out-of-band metadata (prevent metadata corruption)
+-- Guard pages (hardware-enforced bounds)
|
On free:
|
+-- Double-free detection (metadata flag check)
+-- Poison fill (detect use-after-free reads)
+-- Quarantine (delay reuse, detect stale writes)
+-- Zero-on-free (clear sensitive data)
Each layer provides value independently, but their combination makes exploitation significantly more difficult. An attacker must simultaneously bypass:
- Canary validation to overflow without detection.
- Poison checking to write after free without detection.
- Quarantine delays to reclaim a specific address.
- Guard pages to overflow beyond the allocation region.
- Out-of-band metadata to corrupt heap management data.
- Slot randomization to predict allocation addresses.
Per-feature documentation
- Use-After-Free Detection -- Quarantine and poison-based detection.
- Heap Metadata Protection -- Out-of-band metadata table.
- Stale Pointer Mitigation -- Delayed reuse through quarantine.
- Guard Pages -- Hardware-enforced memory boundaries.
- ARM Memory Tagging (MTE) -- Hardware memory tagging on ARM64 (replaces canaries, poison, and zero-on-free).
Use-After-Free Detection
Use-after-free (UAF) is one of the most exploited memory safety vulnerabilities. It occurs when a program continues to access memory through a pointer after that memory has been freed. compatmalloc employs two complementary techniques to detect UAF: poison filling and quarantine-based write detection.
Poison on free
Feature flag: poison-on-free
When memory is freed, the entire allocation is overwritten with a poison byte pattern (0xFE). This provides two benefits:
- Deterministic crash on read-after-free. Programs that read freed memory will encounter the poison pattern instead of stale data. Dereferencing a pointer value of 0xFEFEFEFEFEFEFEFE on x86_64 will typically cause a segfault, turning a silent data corruption bug into a crash.
- Information leak prevention. Sensitive data (passwords, keys, session tokens) is overwritten immediately on free, reducing the window during which it can be extracted from the heap.
Implementation
The poison fill is performed by hardening::poison::poison_region, which calls core::ptr::write_bytes with the poison byte (0xFE, defined in util::POISON_BYTE). The operation is a simple memset and adds minimal overhead.
Write-after-free detection
Feature flag: write-after-free-check
When an allocation is evicted from quarantine (see Stale Pointer Mitigation), the allocator checks whether the poison bytes are still intact. If any byte has been modified, it indicates that something wrote to the memory after it was freed -- a write-after-free condition.
Detection flow
free(ptr)
|
+-- Poison fill: memset(ptr, 0xFE, size)
+-- Mark as freed in metadata table
+-- Push into quarantine
|
... time passes, quarantine fills up ...
|
Quarantine eviction:
+-- Check poison: are all bytes still 0xFE?
| |
| +-- YES: no write-after-free, safe to reuse
| +-- NO: write-after-free detected, abort
|
+-- Actually recycle the slot
Poison checking implementation
The poison check (hardening::poison::check_poison) reads memory in 8-byte (u64) chunks for performance, comparing against the expected pattern 0xFEFEFEFEFEFEFEFE. Remaining bytes are checked individually. This makes the check fast even for large allocations.
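A sketch of the chunked check, following the poison pattern given above; the function body is an assumption, not the crate's exact code.

```rust
/// Poison byte and its 8-byte repetition, per the text.
const POISON_BYTE: u8 = 0xFE;
const POISON_WORD: u64 = 0xFEFE_FEFE_FEFE_FEFE;

/// Return true if every byte of `buf` still holds the poison pattern.
/// Reads 8 bytes at a time for speed, then checks the tail individually.
fn check_poison(buf: &[u8]) -> bool {
    let (head, tail) = buf.split_at(buf.len() & !7);
    let words_ok = head
        .chunks_exact(8)
        .all(|c| u64::from_ne_bytes(c.try_into().unwrap()) == POISON_WORD);
    words_ok && tail.iter().all(|&b| b == POISON_BYTE)
}
```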
Zero on free
Feature flag: zero-on-free
When enabled alongside poison-on-free, memory is zeroed after the poison check passes (or unconditionally if poison checking is disabled). This ensures that no sensitive data remains in the allocation even after it leaves quarantine.
The zeroing happens just before the slot is returned to the free pool:
Quarantine eviction:
+-- Check poison (if enabled)
+-- Zero fill: memset(ptr, 0x00, size)
+-- Return slot to slab free list
Double-free detection
The out-of-band metadata table tracks whether each allocation has been freed via a FLAG_FREED bit in the AllocationMeta::flags field. When free is called:
- The metadata for the pointer is looked up.
- If is_freed() returns true, the allocator writes a diagnostic message to stderr and calls abort().
- Otherwise, the freed flag is set via mark_freed().
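The flag-based check can be sketched as follows; the bit position of FLAG_FREED and the error-return shape are assumptions (only is_freed and mark_freed are named in the text, and the real allocator aborts rather than returning an error):

```rust
/// Assumed bit position for the freed-state flag.
const FLAG_FREED: u8 = 1 << 0;

struct AllocationMeta {
    flags: u8,
}

impl AllocationMeta {
    fn is_freed(&self) -> bool {
        self.flags & FLAG_FREED != 0
    }
    fn mark_freed(&mut self) {
        self.flags |= FLAG_FREED;
    }
}

/// Double-free check on free(): error on a second free of the same slot.
fn on_free(meta: &mut AllocationMeta) -> Result<(), &'static str> {
    if meta.is_freed() {
        return Err("double free detected"); // the real allocator aborts here
    }
    meta.mark_freed();
    Ok(())
}
```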
This detection is more reliable than glibc's inline freelist checks because the metadata is stored in a separate memory region that cannot be corrupted by a heap buffer overflow.
Heap Metadata Protection
Traditional allocators like glibc's ptmalloc2 store heap metadata (chunk sizes, freelist pointers) inline, immediately adjacent to user data. This design is efficient but means that a heap buffer overflow can corrupt the allocator's own bookkeeping, enabling powerful exploitation techniques like unlink attacks, fastbin corruption, and tcache poisoning.
compatmalloc eliminates this attack surface by storing all allocation metadata out-of-band in a separate memory region.
Out-of-band metadata table
The metadata table (hardening::metadata::MetadataTable) is a hash table backed by its own mmap region, completely separate from the slab and large allocation regions. It maps pointer addresses to AllocationMeta structs:
pub struct AllocationMeta {
    pub requested_size: usize, // The size the caller asked for
    pub checksum_value: u64,   // Integrity checksum for corruption detection
    pub flags: u8,             // State flags (e.g., FLAG_FREED)
}
Why this matters
With inline metadata, an attacker who can overflow a heap buffer by even a single byte may be able to:
- Modify the size of the next chunk, enabling overlapping allocations.
- Corrupt freelist pointers, redirecting allocations to attacker-controlled addresses.
- Forge fake chunks to confuse the allocator's validation checks.
With out-of-band metadata, none of these attacks work. The metadata lives in a different virtual memory region, so overflowing a user allocation cannot reach it.
Implementation details
Hash table design
The metadata table uses open addressing with linear probing:

- Keys are the pointer address cast to `usize`.
- Initial capacity is 16,384 entries.
- Load factor threshold is 75%. When exceeded, the table grows by 2x via a new `mmap` and a full rehash.
- The hash function is a multiplicative hash (`key * 0x9E3779B97F4A7C15`, the 64-bit golden-ratio constant) with a xor-shift mix for good distribution.
- Deletion uses backward-shift deletion (not tombstones) to maintain probe chain integrity.
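As a sketch, the bucket computation described above might look like this. Only the multiplicative constant comes from the text; the exact mix step and power-of-two masking are assumptions about the implementation.

```rust
const GOLDEN_RATIO: u64 = 0x9E37_79B9_7F4A_7C15;

/// Map a pointer address to a bucket index in a power-of-two-sized table.
/// Linear probing would then scan forward from this index.
fn bucket_for(key: usize, capacity: usize) -> usize {
    debug_assert!(capacity.is_power_of_two());
    let mut h = (key as u64).wrapping_mul(GOLDEN_RATIO);
    h ^= h >> 32; // xor-shift mix: fold the high bits into the low bits
    (h as usize) & (capacity - 1)
}
```

Multiplying by the golden-ratio constant spreads sequential pointer values across the table; the xor-shift brings the well-mixed high bits down into the masked range.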
Concurrency
The table is protected by a raw mutex (sync::RawMutex, implemented via Linux futex). All operations (insert, get, remove, mark_freed) acquire the lock for their duration.
Memory isolation
The table's backing memory is allocated via mmap(MAP_PRIVATE | MAP_ANONYMOUS), placing it at an address chosen by the kernel. This address is independent of the slab and large allocation regions, providing spatial separation.
Growth
When the load factor exceeds 75%, a new region of double the capacity is mapped, all entries are rehashed into it, and the old region is unmapped. This operation is performed under the lock to ensure consistency.
Lookup on every free
Every call to `free` looks up the pointer in the metadata table to:

- Check the `FLAG_FREED` bit for double-free detection.
- Retrieve the `requested_size` for canary checking and poison filling.
- Retrieve the `checksum_value` for integrity validation.
This adds a hash table lookup to every free operation, but the load factor is kept below 75%, so probe chains stay short, and the multiplicative hash provides good cache behavior.
Tradeoffs
| Benefit | Cost |
|---|---|
| Immune to heap metadata corruption attacks | Extra memory for the hash table (~25 bytes per live allocation) |
| Reliable double-free detection | Hash table lookup on malloc, free, and realloc |
| Canary and size tracking without inline headers | Mutex contention under heavy multi-threaded allocation |
Stale Pointer Mitigation
A stale pointer is a pointer that once referred to a valid allocation but now points to memory that has been freed and potentially reallocated for a different purpose. Stale pointers are the root cause of use-after-free vulnerabilities: if the memory is reallocated, the stale pointer now aliases a live object, and reads/writes through it corrupt unrelated data.
compatmalloc mitigates stale pointer exploitation through quarantine -- a bounded queue that delays the reuse of freed memory.
The quarantine
Feature flag: quarantine
When memory is freed, it is not immediately returned to the slab allocator's free list. Instead, it is pushed into a FIFO quarantine queue. The memory remains allocated (from the OS perspective) but is not available for new allocations. When the quarantine is full, the oldest entry is evicted and its slot is finally returned to the free list.
How it helps
Without quarantine, a freed slot can be immediately reused by the next malloc of the same size class. An attacker can trigger this reliably by controlling the timing of allocations and frees. With quarantine:
- **Temporal separation.** Hundreds of frees must occur before a specific slot is reused, making timing-based heap grooming attacks much harder.
- **Write-after-free detection window.** While memory is in quarantine, it remains poisoned. If anything writes to it during this window, the poison check on eviction will detect the corruption.
- **Reduced exploit reliability.** Even if an attacker can trigger a use-after-free, the window during which the freed memory is reused for a useful (to the attacker) object is dramatically reduced.
Implementation
The quarantine (hardening::quarantine::Quarantine) is a fixed-capacity ring buffer with 256 slots per arena, protected by the arena lock.
```text
            head                                tail
             |                                   |
[ evicted ] [ entry ] [ entry ] [ ... ] [ entry ] [ empty ] [ empty ]
            |_____________________________________|
                   queued (not yet reusable)
```
Eviction policy
Entries are evicted when either condition is met:

- **Byte budget exceeded.** The total bytes in quarantine plus the new entry would exceed `max_bytes`. The oldest entries are evicted until the budget is satisfied.
- **Slot count exceeded.** The ring buffer is full (256 entries). The oldest entry is evicted.

The byte budget defaults to 4 MiB (`DEFAULT_QUARANTINE_BYTES`) and can be configured via the `COMPATMALLOC_QUARANTINE_SIZE` environment variable.
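The two eviction conditions can be sketched with a FIFO queue. A `VecDeque` stands in for the fixed-capacity ring buffer, and the names are illustrative rather than compatmalloc's actual types:

```rust
use std::collections::VecDeque;

struct Quarantine {
    entries: VecDeque<(usize /* ptr */, usize /* size */)>,
    bytes: usize,
    max_bytes: usize,
    max_slots: usize,
}

impl Quarantine {
    /// Push a freed slot; returns the pointers evicted to make room,
    /// which the caller hands back to the slab free list.
    fn push(&mut self, ptr: usize, size: usize) -> Vec<usize> {
        let mut evicted = Vec::new();
        // Evict oldest entries until both budgets can admit the new entry.
        // (An entry larger than max_bytes is admitted alone in this sketch.)
        while !self.entries.is_empty()
            && (self.bytes + size > self.max_bytes || self.entries.len() >= self.max_slots)
        {
            let (old_ptr, old_size) = self.entries.pop_front().unwrap();
            self.bytes -= old_size;
            evicted.push(old_ptr);
        }
        self.entries.push_back((ptr, size));
        self.bytes += size;
        evicted
    }
}
```

FIFO order is what produces the temporal separation: the slot freed longest ago is always the next one returned for reuse.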
Eviction processing
When an entry is evicted from quarantine:

- If `write-after-free-check` is enabled, the poison bytes are verified.
- If `zero-on-free` is enabled, the memory is zeroed.
- The slot is returned to the slab allocator's free list for reuse.
Concurrency
The quarantine is embedded in each arena and protected by the arena lock. No separate quarantine lock is needed. A free call pushes one entry and potentially evicts older entries while the arena lock is held.
Configuration
| Environment variable | Default | Description |
|---|---|---|
| `COMPATMALLOC_QUARANTINE_SIZE` | 4194304 (4 MiB) | Maximum bytes held in quarantine |
Setting the quarantine size to 0 effectively disables quarantine (entries are evicted immediately), though the feature flag must also be disabled to eliminate the overhead entirely.
Setting a larger quarantine size increases the delay before memory is reused, improving detection probability at the cost of higher memory usage.
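A minimal sketch of how the variable might be read at startup. The parsing and fallback behavior are assumptions, and the real allocator reads the environment without `std`; this uses `std` for brevity:

```rust
/// Resolve the quarantine byte budget from a raw environment value,
/// falling back to the 4 MiB default on absence or a parse failure.
fn quarantine_budget(raw: Option<&str>) -> usize {
    const DEFAULT_QUARANTINE_BYTES: usize = 4 * 1024 * 1024; // 4194304
    raw.and_then(|v| v.trim().parse::<usize>().ok())
        .unwrap_or(DEFAULT_QUARANTINE_BYTES)
}

// At startup, something like:
// let budget = quarantine_budget(
//     std::env::var("COMPATMALLOC_QUARANTINE_SIZE").ok().as_deref());
```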
Tradeoffs
| Benefit | Cost |
|---|---|
| Delays memory reuse, breaking heap grooming attacks | Increased resident memory (up to quarantine_size bytes held in reserve) |
| Enables write-after-free detection during quarantine window | One mutex acquisition per free call |
| Makes exploit timing unreliable | Slight increase in free latency |
Guard Pages
Guard pages are regions of virtual memory marked as inaccessible (PROT_NONE) that the allocator places around allocation regions. Any read or write that crosses the boundary of an allocation into a guard page triggers an immediate hardware fault (segfault), providing deterministic detection of buffer overflows and underflows.
How guard pages work
Feature flag: guard-pages
When guard pages are enabled, the allocator inserts inaccessible pages at the boundaries of memory regions:
Large allocations
Each large allocation (>16 KiB) gets its own mmap region with the following layout:
```text
+-------------------+---------------------+-------------------+
|    Guard page     |      User data      |    Guard page     |
|   (PROT_NONE)     |   (PROT_READ |      |   (PROT_NONE)     |
|   4096 bytes      |    PROT_WRITE)      |   4096 bytes      |
+-------------------+---------------------+-------------------+
^                   ^                                         ^
|                   |                                         |
base                user_ptr                     base + total_size
```
A buffer overflow past the end of the user data hits the rear guard page and faults. A buffer underflow (writing before the allocation) hits the front guard page.
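The layout arithmetic behind the diagram can be sketched as plain computation. The rounding policy and names here are assumptions; the actual allocator may, for example, right-align user data against the rear guard to catch off-by-one overflows immediately:

```rust
const PAGE_SIZE: usize = 4096;

fn round_up_pages(n: usize) -> usize {
    (n + PAGE_SIZE - 1) / PAGE_SIZE * PAGE_SIZE
}

/// For a large allocation: (total bytes to mmap, offset of user data from base).
/// The first and last page of the mapping are later mprotect'ed to PROT_NONE.
fn large_alloc_layout(requested: usize) -> (usize, usize) {
    let data = round_up_pages(requested);
    (PAGE_SIZE + data + PAGE_SIZE, PAGE_SIZE) // front guard + data + rear guard
}
```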
Slab regions
Slab regions use the same pattern: guard pages are placed before and after the contiguous block of slots. This means that an overflow past the last slot in a slab, or an underflow before the first slot, will hit a guard page. However, overflows between adjacent slots within the same slab will not be caught by guard pages (canaries provide detection for those cases).
Implementation
Guard pages are implemented using platform memory protection primitives:
- Linux: `mprotect(addr, PAGE_SIZE, PROT_NONE)` on the guard regions after mapping the full region with `mmap`.
- The guard pages consume virtual address space but no physical memory (the kernel does not back `PROT_NONE` pages with RAM).
The overhead functions are defined in hardening::guard_pages:
```rust
// Per slab region: one guard page before + one after
pub const fn slab_guard_overhead() -> usize {
    PAGE_SIZE * 2 // 8192 bytes when enabled
}

// Per large allocation: one guard page before + one after
pub const fn large_guard_overhead() -> usize {
    PAGE_SIZE * 2
}
```
When the guard-pages feature is disabled, these functions return 0 and no guard pages are mapped.
What guard pages catch
| Scenario | Detected? |
|---|---|
| Linear buffer overflow past end of large allocation | Yes -- hits rear guard page |
| Linear buffer underflow before large allocation | Yes -- hits front guard page |
| Overflow past the last slot in a slab | Yes -- hits rear guard page |
| Overflow between adjacent slots in same slab | No -- caught by canaries instead |
| Wild pointer write to an arbitrary address | Only if it happens to land on a guard page |
Virtual memory cost
Guard pages consume virtual address space but not physical RAM. On 64-bit Linux, the virtual address space is 128 TiB, so the overhead is negligible. The per-region cost is:
- Large allocations: +8 KiB virtual per allocation (2 pages).
- Slab regions: +8 KiB virtual per slab (2 pages, amortized across all slots in the slab).
For a slab with 64 slots of 1024 bytes each (64 KiB of data), the guard page overhead is 8 KiB / 64 KiB = 12.5% of the region's size, all of it virtual-only. For smaller size classes with more slots per slab, the overhead is proportionally lower.
Interaction with other features
Guard pages complement the other hardening features:
- Canaries detect overflows within a slab (between adjacent slots) that guard pages cannot catch.
- Poison filling detects use-after-free, which guard pages do not address.
- Out-of-band metadata prevents corruption of allocator state, which guard pages alone cannot guarantee for within-slab overflows.
Together, these features provide comprehensive coverage: guard pages handle boundary overflows with hardware enforcement, canaries handle intra-slab overflows with software checks, and metadata isolation prevents allocator state corruption regardless of overflow direction.
ARM Memory Tagging (MTE)
On ARM64 processors with Memory Tagging Extension (ARMv8.5-A+), compatmalloc uses hardware memory tagging to replace several software hardening mechanisms with zero-cost hardware enforcement.
How it works
MTE assigns a 4-bit tag (values 1-15) to each 16-byte memory granule. Every pointer also carries a tag in its top byte. On every memory access, the CPU checks that the pointer tag matches the memory tag — a mismatch triggers a synchronous fault.
compatmalloc uses MTE as follows:
- On `malloc`: the slot is tagged with a random hardware tag via the `IRG` (Insert Random Tag) instruction. The returned pointer carries this tag.
- On `free`: the slot is re-tagged with a different random tag via `tag_freed`. Any dangling pointers still carrying the old tag will fault on access.
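The pointer side of this scheme can be illustrated with plain bit arithmetic: under AArch64's top-byte-ignore, the 4-bit tag occupies bits 59:56 of the pointer value. This is a sketch of the encoding only; real tags come from the `IRG` instruction, not from software:

```rust
const TAG_SHIFT: u32 = 56;
const TAG_MASK: u64 = 0xF << TAG_SHIFT;

/// Place a 4-bit tag into the top byte of a pointer value.
fn with_tag(addr: u64, tag: u64) -> u64 {
    (addr & !TAG_MASK) | ((tag & 0xF) << TAG_SHIFT)
}

/// Extract the 4-bit tag a pointer carries.
fn tag_of(ptr: u64) -> u64 {
    (ptr >> TAG_SHIFT) & 0xF
}
```

After a free re-tags the memory, a dangling pointer still carries the old tag; on access the CPU compares the two and faults on the mismatch.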
Runtime detection
MTE support is always compiled on aarch64 targets. At startup, compatmalloc checks for MTE hardware via getauxval(AT_HWCAP2) and enables it in synchronous mode via prctl(PR_SET_TAGGED_ADDR_CTRL). If MTE is not available, the allocator falls back to software hardening with no overhead from the detection check.
Slab backing memory is mapped with PROT_MTE when MTE is available to enable tag storage.
What MTE replaces
When MTE is active, the following software mechanisms are skipped:
| Software mechanism | What it does | MTE equivalent |
|---|---|---|
| Canary write (malloc) | Fills gap bytes with checksum-derived pattern | Hardware tag covers the entire slot |
| Canary check (free) | Verifies gap bytes are uncorrupted | Tag mismatch faults on any out-of-bounds access |
| Poison fill (free) | Fills freed memory with 0xCD pattern | Re-tagging prevents access to freed memory |
| Zero-on-free | Zeros freed memory to prevent info leak | Re-tagging prevents reads of freed memory |
The following mechanisms are kept with MTE because they are orthogonal:
| Mechanism | Why it stays |
|---|---|
| Quarantine | Delays slot reuse; MTE re-tagging detects access, but quarantine makes exploitation harder even if the 1/15 tag collision occurs |
| Guard pages | Protects against large overflows at page boundaries; MTE operates at 16-byte granularity |
| Slot randomization | Reduces heap spray predictability; orthogonal to tag-based detection |
| Double-free detection | Atomic CAS flag (try_mark_freed) runs before any MTE operations; MTE is not involved |
| Metadata integrity check | Checksum verification on out-of-band metadata; independent of MTE |
Coverage comparison
| Threat | Software hardening | MTE |
|---|---|---|
| Heap buffer overflow | Canary detects on free | Faults immediately on access |
| Heap buffer underflow | Front canary detects on free | Faults immediately on access |
| Use-after-free read | Poison corrupts data; zero-on-free clears it | Faults immediately (freed memory re-tagged) |
| Use-after-free write | Poison check detects on quarantine eviction | Faults immediately |
| Double free | Atomic CAS flag aborts immediately | Atomic CAS flag aborts immediately (same mechanism) |
| Info leak (freed data) | Zero-on-free clears freed slots | Re-tagging prevents reads (data not cleared) |
MTE provides strictly better detection timing for overflow, underflow, and use-after-free: faults occur at the moment of the invalid access rather than on the next free() or quarantine eviction.
Trade-offs
Probabilistic detection: MTE uses 15 possible tag values (4 bits, excluding tag 0). When a slot is freed and re-tagged, there is a 1/15 (~6.7%) chance the new tag matches the old tag, which would not detect a stale access. Software canaries are deterministic but only checked at free time.
No data clearing: MTE prevents access to freed memory but does not zero or poison the contents. If the 1/15 tag collision occurs, stale data could be read. Software zero-on-free eliminates this possibility entirely.
Hardware requirement: MTE requires ARMv8.5-A or later with OS kernel support. Compatible Linux platforms include AWS Graviton 3+ and Android devices with Pixel 8+ (or equivalent Armv9 SoCs). Apple Silicon has the hardware capability but macOS does not currently expose MTE to userspace. On hardware without MTE, the software fallback provides equivalent coverage at higher cost.
Performance impact
MTE eliminates the per-operation cost of canary writes, canary checks, poison fills, and zero-on-free. On MTE-capable hardware, this removes the dominant per-allocation overhead sources while maintaining equivalent or better security coverage.
Benchmarks
compatmalloc prioritizes security over raw performance. This page describes the performance characteristics, overhead sources, and how to run benchmarks to measure the impact on your workloads.
Latest CI Results (x86_64)
Auto-generated by CI on 2026-03-08 04:47 UTC from commit `760fbb2`. Results are from GitHub Actions runners (shared infrastructure) and may vary between runs. Each allocator is run 3 times; the best (lowest latency) result is kept.
Multi-Allocator Comparison
| Allocator | Weighted Overhead | Latency (64B) | Throughput 1T | Ratio | Throughput 4T | Ratio | Peak RSS |
|---|---|---|---|---|---|---|---|
| compatmalloc | +11.53% | 14.5 ns | 65.48 Mops/s | 0.87x | 150.78 Mops/s | 0.87x | 15096 KB |
| glibc | 0% | 12.3 ns | 74.53 Mops/s | 1.00x | 171.84 Mops/s | 1.00x | 10656 KB |
| jemalloc | +58.11% | 9.9 ns | 95.32 Mops/s | 1.27x | 257.40 Mops/s | 1.49x | 37576 KB |
| mimalloc | +16.62% | 8.9 ns | 81.17 Mops/s | 1.08x | 199.16 Mops/s | 1.15x | 25044 KB |
| passthrough | +64.59% | 20.9 ns | 43.21 Mops/s | 0.57x | 19.00 Mops/s | 0.11x | 10996 KB |
| scudo | +310.13% | 53.4 ns | 18.34 Mops/s | 0.24x | 38.73 Mops/s | 0.22x | 14664 KB |
Ratio interpretation: Latency ratio < 1.0 = faster than glibc. Throughput ratio > 1.0 = faster than glibc.
Hardened allocators: compatmalloc, scudo. These have security features (guard pages, quarantine, etc.) that add overhead vs. pure-performance allocators.
Peak RSS measured via `/usr/bin/time -v` during a single benchmark run. Hardening features (quarantine, guard pages) increase memory usage.
malloc/free Latency by Size (glibc)
```text
size=    16: 12.8 ns
size=    32: 12.3 ns
size=    64: 12.3 ns
size=   128: 12.4 ns
size=   256: 12.3 ns
size=   512: 12.3 ns
size=  1024: 12.3 ns
size=  4096: 23.2 ns
size= 16384: 23.5 ns
size= 65536: 24.0 ns
size=262144: 24.4 ns

size=    16: 15.9 ns
size=    64: 16.6 ns
size=   256: 26.8 ns
size=  1024: 28.5 ns
size=  4096: 70.3 ns
size= 65536: 764.3 ns
```
malloc/free Latency by Size (compatmalloc)
```text
size=    16: 14.9 ns
size=    32: 14.5 ns
size=    64: 14.5 ns
size=   128: 14.7 ns
size=   256: 14.5 ns
size=   512: 14.5 ns
size=  1024: 14.5 ns
size=  4096: 14.5 ns
size= 16384: 14.5 ns
size= 65536: 24.2 ns
size=262144: 24.2 ns

size=    16: 13.7 ns
size=    64: 13.8 ns
size=   256: 14.2 ns
size=  1024: 19.8 ns
size=  4096: 60.7 ns
size= 65536: 765.3 ns
```
Multi-threaded Throughput (glibc)
```text
threads=1:  74.53 Mops/sec
threads=2: 144.38 Mops/sec
threads=4: 171.84 Mops/sec
threads=8: 167.43 Mops/sec
```
Multi-threaded Throughput (compatmalloc)
```text
threads=1:  65.48 Mops/sec
threads=2: 125.34 Mops/sec
threads=4: 150.78 Mops/sec
threads=8: 145.36 Mops/sec
```
Real-World Application Overhead
| Application | glibc | compatmalloc | Overhead |
|---|---|---|---|
| python-json | 0.0728s | 0.0857s | 17.00% |
| redis | 3.3375s | 3.3545s | 0% |
| nginx | 5.1043s | 5.1042s | -1.00% |
| sqlite | 0.2104s | 0.1281s | -40.00% |
| git | 0.3197s | 0.1816s | -44.00% |
Application benchmarks measure wall-clock time for real programs (Python, Redis, nginx, SQLite, Git). Overhead = (compatmalloc_time / glibc_time - 1) * 100%.
Performance characteristics
Expected overhead
Compared to glibc's ptmalloc2, compatmalloc adds overhead from several sources:
| Source | Per-malloc cost | Per-free cost |
|---|---|---|
| Metadata table insert | Hash + linear probe + mutex | -- |
| Metadata table lookup | -- | Hash + linear probe + mutex |
| Canary write | memset of gap bytes | Canary check (byte comparison) |
| Poison fill | -- | memset of allocation |
| Quarantine push/evict | -- | Mutex + ring buffer enqueue |
| Zero-on-free | -- | memset of allocation (on eviction) |
| Guard page setup | mprotect (large alloc only) | -- |
For small allocations (16-256 bytes), the dominant costs are the metadata table operations and the canary/poison fills. For large allocations, the mmap/munmap syscalls dominate regardless of hardening.
Size class efficiency
The slab allocator uses 4-per-doubling size classes, which means internal fragmentation is at most 25% for any allocation. Size classes range from 16 bytes to 16,384 bytes (36 classes total).
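A jemalloc-style sketch of 4-per-doubling rounding is shown below. It is illustrative and compatmalloc's actual class table may differ, but it reproduces the ≤25% fragmentation bound and the 112-byte slot for `malloc(100)` described in the CVE case studies:

```rust
/// Round a request up to a 4-per-doubling size class (jemalloc-style spacing).
/// Sketch only: compatmalloc's exact class boundaries are not shown in the docs.
fn size_class(size: usize) -> usize {
    const MIN: usize = 16;
    if size <= MIN {
        return MIN;
    }
    // Power-of-two group the request falls into: floor(log2(size - 1)).
    let group = (usize::BITS - 1 - (size - 1).leading_zeros()) as usize;
    // Classes within a group are spaced a quarter of the group's base apart.
    let step = 1usize << (group - 2);
    (size + step - 1) / step * step
}
```

Worst case: a request just above a group base (e.g., 1025 bytes) lands in a slot one quarter-step larger (1280 bytes), wasting just under 25% of the requested size.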
Arena contention
With the default arena count (one per CPU), contention is low for most workloads. Programs with many threads performing high-frequency allocations may benefit from explicitly setting COMPATMALLOC_ARENA_COUNT to a higher value.
Running benchmarks
Microbenchmark suite
The benchmark suite is a standalone binary that measures allocator performance via LD_PRELOAD:
```sh
# Build the library and benchmark
cargo build --release
rustc -O benches/src/micro.rs -o target/release/micro

# Run with glibc (baseline)
ALLOCATOR_NAME=glibc ./target/release/micro

# Run with compatmalloc
ALLOCATOR_NAME=compatmalloc \
  LD_PRELOAD=./target/release/libcompatmalloc.so \
  ./target/release/micro
```
Full comparison script
To compare against multiple allocators (glibc, jemalloc, mimalloc, scudo):
```sh
./benches/scripts/run_comparison.sh
```
Disabling hardening for comparison
To measure the overhead of hardening features, build with no features:
```sh
cargo build --release --no-default-features
ALLOCATOR_NAME=minimal \
  LD_PRELOAD=./target/release/libcompatmalloc.so \
  ./target/release/micro
```
LD_PRELOAD benchmarks with external programs
For realistic benchmarks, test with real applications:
```sh
# Time a build with and without compatmalloc
time cargo build --release
time LD_PRELOAD=./target/release/libcompatmalloc.so \
  cargo build --release

# Python workload
time python3 -c "
import json
data = [{'key': str(i), 'value': list(range(100))} for i in range(10000)]
result = json.dumps(data)
parsed = json.loads(result)
"
time LD_PRELOAD=./target/release/libcompatmalloc.so python3 -c "
import json
data = [{'key': str(i), 'value': list(range(100))} for i in range(10000)]
result = json.dumps(data)
parsed = json.loads(result)
"
```
Tuning for performance
If the overhead is too high for your use case, you can selectively disable features:
| Configuration | Approximate overhead reduction |
|---|---|
| Disable `zero-on-free` | Removes one memset per free |
| Disable `poison-on-free` | Removes one memset per free (and disables write-after-free check) |
| Reduce quarantine size | Reduces memory pressure and eviction processing |
| Disable `guard-pages` | Removes mprotect calls and reduces virtual address space usage |
| Disable `canaries` | Removes canary write/check per alloc/free |
| `COMPATMALLOC_DISABLE=1` | Bypasses all hardening (passthrough to glibc) |
Weighted composite overhead
The headline "Weighted Overhead" metric computes a single overhead percentage that accounts for real-world allocation size distributions. Instead of reporting only the 64-byte latency, we weight each allocation size by its frequency in typical programs (based on jemalloc/tcmalloc telemetry data):
| Size | Weight | Rationale |
|---|---|---|
| 16B | 20% | Most common (tiny objects, pointers, small structs) |
| 32B | 15% | Second most common |
| 64B | 15% | Common for small structs, string headers |
| 128B | 12% | Medium-small objects |
| 256B | 10% | Strings, small buffers |
| 512B | 8% | Buffers |
| 1K | 5% | Page-ish allocations |
| 4K | 5% | Page-aligned allocations |
| 16K | 4% | Large buffers |
| 64K | 3% | Near mmap threshold |
| 256K | 3% | Very large allocations |
Formula: overhead = (Σ weight_i × (alloc_latency_i / glibc_latency_i) − 1) × 100%
A weighted overhead of +15% means compatmalloc is 15% slower than glibc across a representative workload mix. Negative values indicate compatmalloc is faster.
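The formula can be checked numerically. The weights below come from the table above; the latency ratios are hypothetical inputs:

```rust
/// (Σ wᵢ · ratioᵢ − 1) × 100, where ratioᵢ = alloc_latencyᵢ / glibc_latencyᵢ
/// and the weights sum to 1.
fn weighted_overhead_pct(weights: &[f64], ratios: &[f64]) -> f64 {
    assert_eq!(weights.len(), ratios.len());
    let s: f64 = weights.iter().zip(ratios).map(|(w, r)| w * r).sum();
    (s - 1.0) * 100.0
}
```

With a uniform 15% slowdown at every size (all ratios 1.15), the weighted overhead comes out to +15%, as expected.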
Methodology notes
When benchmarking allocators, keep the following in mind:
- Warm up the allocator. The first few allocations may be slower due to slab initialization and metadata table growth.
- Test with realistic workloads. Microbenchmarks of `malloc`/`free` loops do not represent real application behavior.
- Measure RSS, not just time. Hardening features (quarantine, guard pages) increase resident memory. Use `getrusage` or `/proc/self/status` to measure `VmRSS`.
- Account for variance. Run benchmarks multiple times and report medians. Allocator performance can be sensitive to ASLR and system load.
- Best-of-3 selection. CI results use the minimum latency and maximum throughput from 3 runs. This filters out noise from shared infrastructure while reflecting the allocator's true capability.
- Compare against other allocators. The comparison table includes jemalloc and mimalloc (performance-focused) alongside scudo (hardened, like compatmalloc). This provides context for the overhead of hardening features.
CVE Case Studies
This section demonstrates how compatmalloc's hardening features detect and prevent exploitation techniques used in real-world CVEs affecting glibc's heap allocator.
Methodology
For each CVE, we provide:
- A minimal proof-of-concept (a C program in `tests/cve/`) that demonstrates the exploitation technique
- Side-by-side output showing behavior under glibc vs. compatmalloc
- Analysis of which hardening features provide protection and their limitations
Honest assessment
compatmalloc is an open-source project, and we aim to be candid about what it can and cannot do. Where our hardening has limitations, we document them. Our software canary checks detect overflows on free(), not at the moment of overflow. Our quarantine detects write-after-free on eviction, not at the moment of write. On ARM64 hardware with Memory Tagging Extension (MTE), these limitations are eliminated — overflows and use-after-free are detected immediately at the point of access. On hardware without MTE, the software mechanisms provide equivalent coverage with delayed detection.
Case studies
| CVE | CVSS | Technique | compatmalloc detection |
|---|---|---|---|
| CVE-2024-2961 | 8.8 | Buffer overflow / tcache poisoning | Canary check + out-of-band metadata |
| CVE-2023-6246 | 7.8 | Heap buffer overflow / metadata corruption | Canary check + guard pages |
| Double-free | -- | Double-free / tcache dup | Metadata FLAG_FREED check |
Running the demos
```sh
# Build compatmalloc
cargo build --release

# Run all CVE demos
./tests/cve/run_demos.sh

# Run a specific demo
gcc -o /tmp/demo tests/cve/double_free.c

# With glibc:
/tmp/demo

# With compatmalloc:
LD_PRELOAD=./target/release/libcompatmalloc.so /tmp/demo
```
CVE-2024-2961: iconv Buffer Overflow
Vulnerability Summary
| Field | Value |
|---|---|
| CVE | CVE-2024-2961 |
| CVSS | 8.8 (High) |
| Affected | glibc <= 2.39 |
| Disclosed | 2024-04-17 |
| Type | Out-of-bounds write in iconv() |
The iconv() function in glibc overflows the output buffer by 1-3 bytes when converting strings to the ISO-2022-CN-EXT character set. The overflow occurs because escape sequence writes for SS2 and SS3 designations lack bounds checks.
Exploitation Technique
In real-world exploitation (the CNEXT exploit chain against PHP), attackers use this 1-3 byte overflow to corrupt tcache forward pointers in adjacent freed heap chunks:
1. Groom the heap so a freed chunk sits immediately after the iconv output buffer
2. Trigger the iconv overflow to modify the low byte(s) of the tcache `fd` pointer
3. The corrupted pointer redirects a subsequent `malloc()` to an attacker-controlled address
4. Write to that address to overwrite `__free_hook` or a GOT entry
5. Trigger the hook to achieve remote code execution
This technique -- tcache poisoning via buffer overflow -- works because glibc stores freelist metadata (the fd pointer) inline within freed chunks, directly adjacent to user data.
Proof of Concept
Source: tests/cve/tcache_poison.c
The PoC demonstrates the exploitation technique (1-byte write past the requested allocation size) rather than calling iconv() directly. This keeps it simple and version-independent.
```sh
gcc -o /tmp/tcache_poison tests/cve/tcache_poison.c
```
glibc output
```text
=== Tcache Poisoning via 1-Byte Overflow Demo ===
    (CVE-2024-2961 exploitation technique)

[1] chunk_a = malloc(50) => 0x...
[2] chunk_b = malloc(50) => 0x...
    distance: 64 bytes
[3] free(chunk_b) => chunk_b enters tcache
[4] chunk_b tcache fd = 0x...
[5] Simulating 1-byte overflow from chunk_a into chunk_b...
[6] free(chunk_a) => queued for deferred canary check
[7] Triggering batch flush (70 frees)...
[!] 1-byte overflow was NOT detected.
```
Under glibc, writing 1 byte past the requested 50-byte allocation lands within glibc's usable 56-byte region of the 64-byte chunk. No detection occurs. In the real CVE, larger overflows (1-3 bytes into adjacent chunks) corrupt the tcache fd pointer.
compatmalloc output
```text
=== Tcache Poisoning via 1-Byte Overflow Demo ===
    (CVE-2024-2961 exploitation technique)

[1] chunk_a = malloc(50) => 0x...
[2] chunk_b = malloc(50) => 0x...
    distance: -46784 bytes
[3] free(chunk_b) => chunk_b enters tcache
[4] chunk_b tcache fd = 0x4242424242424242
[5] Simulating 1-byte overflow from chunk_a into chunk_b...
[6] free(chunk_a) => queued for deferred canary check
[7] Triggering batch flush (70 frees)...
compatmalloc: heap buffer overflow detected (canary corrupted)
```
compatmalloc aborts immediately when the canary check detects the overflow.
What compatmalloc catches
Three independent layers of defense apply:

1. **Canary bytes.** compatmalloc places canary bytes in the padding between the requested size and the slot size. For `malloc(50)` in a 64-byte slot, bytes `[50..64)` contain canary values. The 1-byte write at offset 50 corrupts the canary, which is detected on `free()`.
2. **Out-of-band metadata.** Even without canaries, compatmalloc stores all freelist metadata in a separate `mmap` region -- not inline within freed chunks. There are no `fd` pointers adjacent to user data to corrupt. The fundamental prerequisite of tcache poisoning (corruptible inline metadata) does not exist.
3. **Slot randomization.** Allocations are not placed adjacently in predictable order, making heap grooming significantly harder.
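The canary layer can be sketched as follows. The gap bounds match the `malloc(50)`-in-a-64-byte-slot example above; the way the fill pattern is derived from the checksum is an assumption (compatmalloc derives it from a secret):

```rust
/// Fill the gap between the requested size and the slot size with a
/// checksum-derived byte pattern (the derivation here is illustrative).
fn write_canary(slot: &mut [u8], requested: usize, checksum: u64) {
    for (i, b) in slot[requested..].iter_mut().enumerate() {
        *b = (checksum >> ((i % 8) * 8)) as u8;
    }
}

/// Verify the gap bytes on free(); any overflow into the gap is detected.
fn check_canary(slot: &[u8], requested: usize, checksum: u64) -> bool {
    slot[requested..]
        .iter()
        .enumerate()
        .all(|(i, &b)| b == (checksum >> ((i % 8) * 8)) as u8)
}
```

Even a single-byte write at offset 50 lands inside the canary gap and flips a pattern byte, so the check on `free()` fails.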
What compatmalloc does NOT catch
- The overflow is not detected at the moment it happens. Canary checks run on `free()` (specifically during deferred batch verification). If the overflowed buffer is never freed, detection is delayed indefinitely.
- compatmalloc does not fix the `iconv()` bug itself. It prevents the exploitation technique (tcache poisoning) from succeeding, but the overflow still occurs in `iconv()`.
- Intra-slab overflows between adjacent slots in the same slab are caught by canaries, not guard pages. Guard pages only protect slab boundaries.
CVE-2023-6246: syslog Heap Buffer Overflow
Vulnerability Summary
| Field | Value |
|---|---|
| CVE | CVE-2023-6246 |
| CVSS | 7.8 (High) |
| Affected | glibc >= 2.36 |
| Disclosed | 2024-01-30 |
| Type | Heap-based buffer overflow in __vsyslog_internal() |
A heap-based buffer overflow in glibc's __vsyslog_internal() function, called by syslog() and vsyslog(). When openlog() is not called (or called with ident set to NULL) and the program name (argv[0]) exceeds 1024 bytes, the function overflows a heap buffer. Discovered by Qualys.
Exploitation Technique
The exploit targets su, a common SUID-root program:
1. Execute `su` with an extremely long `argv[0]` (> 1024 bytes)
2. PAM calls `syslog()` on authentication failure without calling `openlog()` first
3. `__vsyslog_internal()` copies the program name into a heap buffer without proper bounds checking
4. The overflow corrupts adjacent heap chunk metadata (size, flags, prev_size)
5. Subsequent heap operations trigger controlled writes via techniques like unsafe unlink
6. Achieve local privilege escalation to root
The key enabler: glibc stores malloc metadata (chunk headers) inline, directly adjacent to user data. An overflow from one allocation silently corrupts the metadata of the next chunk.
Proof of Concept
Source: tests/cve/heap_overflow.c
The PoC simulates the syslog overflow pattern: allocate a buffer and write past its end.
```sh
gcc -o /tmp/heap_overflow tests/cve/heap_overflow.c
```
glibc output
```text
=== Heap Buffer Overflow Detection Demo ===
    (CVE-2023-6246 pattern)

[1] malloc(100) => 0x...
[2] memset(0x..., 'X', 120) => overflow by 20 bytes!
[3] free(0x...) => queued for deferred canary check
[4] Triggering batch flush (70 frees)...
[!] Heap overflow was NOT detected on free().
    Under glibc, the adjacent chunk's metadata may be
    silently corrupted, enabling exploitation.
```
glibc does not detect the overflow. The 20 extra bytes silently overwrite whatever follows the allocation in memory.
compatmalloc output
```text
=== Heap Buffer Overflow Detection Demo ===
    (CVE-2023-6246 pattern)

[1] malloc(100) => 0x...
[2] memset(0x..., 'X', 120) => overflow by 20 bytes!
[3] free(0x...) => queued for deferred canary check
[4] Triggering batch flush (70 frees)...
compatmalloc: heap buffer overflow detected (canary corrupted)
```
compatmalloc detects the overflow when the canary bytes are checked during batch verification.
What compatmalloc catches
1. **Canary bytes.** For `malloc(100)`, compatmalloc returns a 112-byte slot. Bytes `[100..112)` contain canary values derived from a cryptographic secret. The 20-byte overflow destroys these canaries. On `free()`, the canary check detects the corruption and aborts.
2. **Out-of-band metadata.** Even if the overflow extends past the slot boundary, it cannot corrupt allocator metadata because metadata is stored in a separate `mmap` region. The fundamental unsafe-unlink exploitation technique (corrupting inline chunk headers) is not possible.
3. **Guard pages.** For overflows that extend past the end of a slab region, guard pages (`PROT_NONE`) trigger a hardware fault (SIGSEGV). This provides immediate detection without waiting for `free()`.
What compatmalloc does NOT catch
- The overflow is detected on `free()`, not at the moment of the write. Between the overflow and the canary check, the program continues executing with corrupted memory. If exploitation completes before `free()` is called, the canary check may come too late.
- compatmalloc does not fix the `syslog()` bug. Loading compatmalloc via `LD_PRELOAD` prevents the heap corruption from being exploitable, but the buffer overflow in `__vsyslog_internal()` still occurs.
- Intra-slab overflows between adjacent slots in the same slab are detected by canaries, not guard pages. An overflow that overwrites the canary gap with the correct canary values would go undetected, but forging those values is impractical without knowing the canary secret.
Double-Free Detection
Overview
Double-free is one of the most fundamental heap exploitation primitives, appearing across dozens of CVEs in components that rely on glibc's allocator. It has been observed in glibc's own regcomp(), in application-level parsers, and in many other components. Rather than tracking a single CVE, this page describes the general class of double-free vulnerabilities and how compatmalloc detects them.
Exploitation Technique: Tcache Dup
When a chunk is freed twice, it appears in the tcache freelist twice, creating a cycle:
```text
tcache[64B]: chunk_A -> chunk_A -> chunk_A -> ... (cycle)
```
Two subsequent `malloc()` calls of the same size return the same pointer:

```c
char *a = malloc(64); // returns chunk_A
char *b = malloc(64); // returns chunk_A again!
// a == b -- both point to the same memory
```
This enables type confusion: the program believes `a` and `b` are separate allocations, but writes through one are visible through the other. An attacker can use this to overwrite function pointers, vtable entries, or other security-sensitive data.
glibc's mitigation history
| glibc version | Detection mechanism | Bypassable? |
|---|---|---|
| < 2.29 | None | N/A -- no detection at all |
| 2.29+ | tcache key (random value stored at offset 8 in freed chunk) | Yes -- the key is stored inline and can be overwritten by a heap write primitive |
| 2.32+ | PROTECT_PTR (pointer mangling via XOR with address) | Harder but still inline -- can be bypassed with an info leak |
All of glibc's mitigations store detection data inline within the freed chunk's user data region. An attacker with any heap write capability can clear or forge these values before triggering the second `free()`.
Proof of Concept
Source: `tests/cve/double_free.c`

```sh
gcc -o /tmp/double_free tests/cve/double_free.c
```
glibc output (>= 2.29)
```text
=== Double-Free Detection Demo ===
[1] malloc(64) => 0x...
[2] free(0x...) => OK
[3] free(0x...) => double free! (should be caught)
free(): double free detected in tcache 2
```
Modern glibc (>= 2.29) does detect this case via the tcache key. However, the key is stored inline at `chunk + 8` and can be overwritten by an attacker with a write-after-free primitive before the second `free()`.
compatmalloc output
```text
=== Double-Free Detection Demo ===
[1] malloc(64) => 0x...
[2] free(0x...) => OK
[3] free(0x...) => double free! (should be caught)
compatmalloc: double free detected
```
compatmalloc aborts immediately on the second `free()`.
What compatmalloc catches
- **Out-of-band `FLAG_FREED` check.** The metadata table stores a `FLAG_FREED` bit for every allocation in a separate `mmap` region. On every `free()`:
  - Look up the pointer in the metadata table.
  - If `FLAG_FREED` is already set, abort with "double free detected".
  - Otherwise, set `FLAG_FREED`.
- **Cannot be bypassed by heap writes.** Because the metadata table is in a separate memory region (not adjacent to user data), an attacker cannot corrupt the `FLAG_FREED` bit via a buffer overflow or use-after-free write. This is the fundamental advantage over glibc's inline tcache key approach.
- **No version-dependent behavior.** The detection works identically regardless of glibc version, allocation size, or tcache state. Every `free()` is checked, every time.
What compatmalloc does NOT catch
- Aliased pointer double-frees are not actually a gap: if a program has two pointers to the same allocation (e.g., `a = malloc(64); b = a;`) and frees both, compatmalloc still detects the double free, because it tracks the allocation address, not the pointer variable. Both `free(a)` and `free(b)` resolve to the same metadata entry.
- Root cause identification. The abort happens at the second `free()` call, not at the point where the bug was introduced. For complex programs, the stack trace at the abort may not directly reveal why the double-free occurred.
- Deliberate double-free patterns. Some (buggy) programs intentionally double-free and rely on glibc silently accepting it. These programs will abort under compatmalloc. This is by design -- double-free is always a bug.