Memory Layout

OpenVMM has to decide where every byte of guest physical address space goes: RAM, MMIO windows for emulated and PCIe devices, paravisor private memory, and architectural ranges like the LAPIC or GIC. This page describes how those decisions are made.

Compatibility surface

Guest physical addresses are part of the VM's compatibility contract. Guests remember device and RAM locations across hibernation, and saved VM state references them. Changing request order, placement class, or alignment can move guest addresses and break resume on existing VMs.

Treat layout policy changes like VM ABI changes: a new default may be fine for new VMs, but existing persisted configuration must continue to resolve to the same guest physical addresses.

Two pieces

Layout resolution is split into two pieces that you should think about separately:

A pure address-space allocator in vm_topology::layout. It knows nothing about chipsets, firmware, VTLs, PCI, or the host. Callers describe what they need in terms of ranges, sizes, alignments, and a placement class, and the allocator returns deterministic guest physical addresses.
A worker resolver in openvmm_core::worker::memory_layout. This is where OpenVMM's policy lives: which platform ranges are pinned, what alignments NUMA nodes get, how PCIe ECAM is sized, and so on. The resolver describes the VM to the allocator, runs it, and builds the resulting MemoryLayout that the rest of the VM worker uses to look up RAM, MMIO, PCI ECAM, and PCI MMIO ranges.

Keeping the allocator policy-free means its behavior can be exhaustively tested in isolation, and the worker can be reasoned about as a list of requests that fully describes the VM.

The allocator

LayoutBuilder accepts four kinds of input:

Input	Purpose
`reserve(range)`	Block allocation at this address but do not include it in the layout top.
`fixed(range)`	A range whose address is already decided. Blocks allocation and counts as part of the layout.
`ram(size, alignment)`	Ordinary guest RAM. The only request type that may be split across multiple extents.
`request(size, alignment, placement)`	A single contiguous range, placed dynamically. The `placement` chooses one of three phases below.

reserve and fixed differ only in how they affect the layout top — the address one past the highest guest-visible byte. fixed ranges raise it; reserve ranges do not. This matters because the layout top determines where post-MMIO requests (such as paravisor private memory) start: a reserved hole high up in the address space should not push them even higher.

When allocate() runs, it processes requests in a fixed phase order. Each phase pulls from whatever address space the earlier phases left free:

Reserved ranges are removed from the free space.
Fixed ranges are removed from the free space.
Placement::Mmio32 requests are packed top down below 4 GiB, so RAM can start at GPA 0 and grow upward through the lowest free space.
RAM requests are placed bottom up, in caller order, splitting around any holes left by the earlier phases. The first request starts at GPA 0; each subsequent request starts at or above the highest address used by previous RAM requests, so later requests never backfill fragments earlier ones skipped. RAM is the only splittable kind.
Placement::Mmio64 requests are packed bottom up starting at the end of RAM. This makes the layout top a function of requested topology rather than a precomputed high MMIO bucket size.
Placement::PostMmio requests are placed after everything else (excluding reserved ranges from the "everything else"). They are for ranges that should not affect the guest-visible top of memory.

Within Mmio32 and Mmio64, requests are sorted by alignment (largest first), then size (largest first), then caller order. This keeps large, strictly-aligned device windows from being fragmented by small devices. RAM and PostMmio use caller order verbatim: RAM order is the NUMA vnode assignment, and PostMmio carries policy that should not be reordered by alignment.

Note

The allocator does not take host physical-address width as an input. The layout is computed as a pure function of VM configuration; the worker checks the resulting layout top against host capabilities after the fact. This keeps guest physical addresses from shifting when the same VM moves to a host with a different physical-address width.

Worker policy

The worker resolver in openvmm_core::worker::memory_layout issues requests in this order:

Low-RAM reserve (reserve) — when a nonzero ram_start_address is requested, the range 0..ram_start_address is reserved so RAM starts above it. This is used on aarch64 Linux direct boot to skip the low GPA region that conflicts with the host MSI-doorbell IOVA that iommufd reserves. Being a reserve (not fixed), it blocks allocation without raising the layout top.
Chipset low MMIO (fixed) — a window pinned to end at 4 GiB, advertised to firmware as \_SB.VMOD._CRS. The window always covers at least the per-architecture reserved zone (LAPIC, IOAPIC, GIC, PL011, battery, TPM, etc.) so guests can arbitrate fixed-address children against this window. The caller-requested size may extend it lower.

Architecture Minimum range (architectural reserved zone)

x86_64 0xFE00_0000..0x1_0000_0000

aarch64 0xEF00_0000..0x1_0000_0000
Chipset high MMIO (Mmio64) — the corresponding high range. 2 MB alignment.
PCIe ECAM (Mmio64), one block per PCI segment. Root complexes that share a segment are grouped so the segment gets a single contiguous ECAM block (keeping the MCFG bus-0 base consistent for every RC in the segment), which is then subdivided into per-RC sub-ranges. The block size is derived from the segment's combined bus window as (max_bus - min_bus + 1) * 1 MB (32 devices × 8 functions × 4 KiB per config space). ECAM is placed above 4 GiB so the low MMIO window stays free for devices that require 32-bit addressing. Every ECAM range must start at or above a fixed 256 MiB minimum: the ACPI MCFG table reports the bus-0 base as ecam_start - start_bus * 1 MiB, and since start_bus is a u8 up to 255 MiB of headroom may be required. A resolved ECAM below this minimum fails VM construction rather than producing an unrepresentable MCFG entry.
PCIe per-root-complex ranges, one set per root complex:
- Low MMIO (Mmio32), 2 MB aligned. A caller can pin this to a fixed range instead of supplying a size, for assigned-device, IOMMU, and physical-topology passthrough.
- High MMIO (Mmio64), 1 GB aligned. A caller can pin this to a fixed range as well. Per-BAR alignment would guarantee the entire window is usable for one large BAR, but burns address space on hosts with tight physical-address widths.
- CXL HDM and CHBCR (Mmio64) — when a root complex is CXL-enabled, a Host-managed Device Memory window and a Host Bridge Component Registers window are placed, each aligned to its CXL-spec size.
Virtio-mmio slots (Mmio32) — one contiguous region sized slot_count * 4 KiB, when any slots are configured.
IOMMU MMIO (Mmio32) — a VM has at most one IOMMU type; one region is placed per configured instance, sized by type: SMMUv3 128 KiB, AMD-Vi 16 KiB, Intel VT-d 4 KiB.
RAM, in vnode order. The first request becomes vnode 0, the second vnode 1, and so on. Each vnode starts at or above the highest address used by prior vnodes; vnode N+1 never backfills a fragment that vnode N skipped. This keeps vnode ordering equal to address ordering and turns vnode layout into a clean compatibility surface — adding a new fixed or reserved range below RAM end can only shift the first vnode whose own span actually covers it. Alignment depends on request size:

RAM request size Alignment

< 1 GB 2 MB

≥ 1 GB 1 GB

Alignment matters because RAM extents that start on a huge-page boundary can be mapped with 2 MB or 1 GB huge pages in host and guest page tables, avoiding the memory overhead and construction cost of thousands of smaller page table entries and reducing TLB pressure at runtime. Sub-GB nodes use 2 MB so small NUMA nodes do not waste a full GB of address space.
VTL2 chipset MMIO (PostMmio) — VTL2's own VMBus / chipset MMIO region, when VTL2 is configured. Placed after VTL0 so enabling VTL2 does not move any VTL0 address.
VTL2 private memory (PostMmio) — when the IGVM file requests layout-mode VTL2 memory, the worker takes only its size and alignment from the IGVM relocation header. The IGVM file's relocation min/max bounds are not fed in as constraints here; they are validated later by the IGVM loader against the selected base. Treating them as constraints here would over-constrain layout and could put holes in VTL0 just to accommodate an IGVM file we will reject anyway.
VTL2 framebuffer (PostMmio) — a page-aligned region for VTL2 graphics, when configured. Its resolved GPA base is returned separately for the framebuffer device.

Architecture	Minimum range (architectural reserved zone)
x86_64	`0xFE00_0000..0x1_0000_0000`
aarch64	`0xEF00_0000..0x1_0000_0000`

RAM request size	Alignment
< 1 GB	2 MB
≥ 1 GB	1 GB

After allocate() succeeds, the worker collects the resolved ranges into the MemoryLayout's MMIO, PCI ECAM, and PCI MMIO gap vectors, then checks the highest placed-range address (which includes VTL2 private memory and VTL2 chipset MMIO) against the host's physical-address width.

RAM splitting

RAM is the only splittable request. When contiguous free space is available, the full requested size is placed at an aligned start address — alignment constrains where the extent starts, not how large it is. A 1.5 GB request with 1 GB alignment in open space produces a single [0, 1.5 GB) extent with no wasted space.

Splitting only happens when a fixed or reserved range interrupts the free space. In that case the alignment also acts as the split granularity: partial chunks are rounded down to the alignment before continuing. This keeps every RAM extent on a huge-page boundary so the host and guest can use large pages (reducing page table overhead and TLB pressure), and avoids sub-alignment fragments that would complicate the NUMA and compatibility surface.

The practical effect is that 1 GB-aligned RAM stays in 1 GB-aligned chunks. A small fixed hole just above the 1 GB boundary will not cause a "nearly 1 GB" RAM extent to be placed in the interrupted range; instead, RAM resumes at the next 1 GB boundary.

Examples

These examples use compact synthetic configurations. Each one is covered by tests in vm_topology::layout or openvmm_core::worker::memory_layout.

A fixed MMIO range splits RAM

4 GB of RAM with a 1 GB fixed MMIO range from 1 GB to 2 GB:

Input	Range
RAM request	4 GB
Fixed MMIO	`0x4000_0000..0x8000_0000`

Output	Range
RAM	`0x0000_0000..0x4000_0000`
MMIO	`0x4000_0000..0x8000_0000`
RAM	`0x8000_0000..0x1_4000_0000`

Total RAM is still 4 GB — the fixed range is occupied address space, not RAM.

GB-aligned RAM stays GB-aligned

2 GB of RAM with a tiny fixed hole just above the 1 GB boundary should not produce a sub-GB RAM fragment:

Input	Range
RAM request	2 GB, 1 GB alignment
Fixed MMIO	`0x4010_0000..0x4020_0000`

Output	Range
RAM	`0x0000_0000..0x4000_0000`
Fixed MMIO	`0x4010_0000..0x4020_0000`
RAM	`0x8000_0000..0xC000_0000`

The splitter places one full 1 GB chunk, refuses to use the interrupted sub-GB fragment, and resumes at the next 1 GB boundary.

Small NUMA nodes use 2 MB alignment

Two 512 MB NUMA nodes:

Input	Size
vnode 0 RAM	512 MB
vnode 1 RAM	512 MB

Output	Range
vnode 0 RAM	`0x0000_0000..0x2000_0000`
vnode 1 RAM	`0x2000_0000..0x4000_0000`

With 1 GB alignment each node would burn a full GB of address space. Request order is the vnode assignment, so swapping the requests swaps the NUMA layout.

VTL2 does not move VTL0

Starting from 2 GB of VTL0 RAM and a fixed 1 GB MMIO hole:

VTL0 output	Range
RAM	`0x0000_0000..0x4000_0000`
MMIO	`0x4000_0000..0x8000_0000`
RAM	`0x8000_0000..0xC000_0000`

Adding a 2 MB VTL2 private-memory request leaves the VTL0 layout identical and places VTL2 after the VTL0-visible top:

Private output	Range
VTL2	`0xC000_0000..0xC020_0000`

MemoryLayout::end_of_layout() reports the top of all stored ranges, including VTL2 chipset MMIO when present. MemoryLayout::vtl2_range() reports the VTL2 private memory range separately.

Reserved holes do not raise the layout top

A reserved range blocks allocation but is not a guest-visible resource, so it does not push later post-MMIO ranges higher:

Input	Range
RAM request	2 GB
Reserved hole	`0xFD_0000_0000..0xFD_4000_0000`
Post-MMIO request	1 MB

Output	Range
RAM	`0x0000_0000..0x8000_0000`
Post-MMIO	`0x8000_0000..0x8010_0000`

Trailing reserved ranges are omitted from the returned allocation list, but a reserved range that sits between real allocations is reported so callers can see the full occupied map.

When to update this page

Update this page when any of these change:

the allocator's phase order or any phase's placement direction
the semantics of reserve, fixed, ram, or request
the architectural reserved zones or their per-architecture addresses
the worker's RAM alignment policy or the ram_start_address reserve
PCIe ECAM sizing, segment grouping, the ECAM minimum address, or per-BAR alignment policy
CXL HDM/CHBCR or IOMMU MMIO placement
VTL2 chipset MMIO, VTL2 private-memory, or VTL2 framebuffer placement
the host physical-address validation step
MemoryLayout::end_of_layout() or MemoryLayout::vtl2_range() semantics

The OpenVMM Guide