# Coding conventions
One of our major goals with OpenVMM is to provide a high quality coding experience for contributors, starting first-and-foremost by having a consistent set of coding conventions in the project.
Do your part and keep OpenVMM clean!
## rustfmt

Checked Automatically: Yes (via `cargo xtask fmt rustfmt`)
OpenVMM source must be formatted using rustfmt, which automatically and mechanically applies standard formatting to all the code. This eliminates time spent discussing or reviewing stylistic issues in pull requests.
The CI will run `rustfmt --check` to enforce consistent formatting, and will fail if it notices any discrepancies.
Unfortunately, `rustfmt` isn't infinitely customizable, and there are several rules that must be manually enforced:

- All lines must end with LF, not CRLF.
- Top-level `use` imports should be non-nested (see the sketch below).
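For example, a quick sketch of what the import rule means in practice (using standard library paths purely for illustration):

```rust
// Bad: nested top-level import
use std::{collections::HashMap, sync::Arc};

// Good: non-nested top-level imports
use std::collections::HashMap;
use std::sync::Arc;
```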
Some of these manually-enforced conventions were introduced late into OpenVMM's development, and there may be chunks of the codebase that do not adhere to these conventions.
If you're working in a file and notice that it isn't following a certain convention, please take a moment to fix it!
Assuming you've followed the suggested dev env setup and set up rust-analyzer to format-on-save, you should rarely have to think about formatting in `.rs` files.
## House Rules

Checked Automatically: Yes (via `cargo xtask fmt house-rules`)
"House-rules" are a set of misc code lints that are specific to OpenVMM, which are enforced using a custom in-house tool:
- enforce the presence of the standard Microsoft copyright header
- enforce in-repo crate names don't use '-' in their name (use '_' instead)
- enforce `Cargo.toml` files don't include autogenerated "see more keys" comments
- enforce `Cargo.toml` files don't contain author or version fields
- enforce files end with a single trailing newline
- deny usage of `#[repr(packed)]` (you want `#[repr(C, packed)]`)
- justify usage of `cfg(target_arch = ...)` (use `guest_arch` instead!)
- justify usage of `allow(unsafe_code)` with an UNSAFETY comment
Some of these lints are self-explanatory, whereas others are described in more detail elsewhere on this page.
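As a small illustration of the `repr(packed)` rule (the struct here is hypothetical):

```rust
// Denied by house-rules: `packed` on its own still uses the unspecified
// default Rust field ordering.
// #[repr(packed)]

// Preferred: pair `packed` with an explicit C layout.
#[repr(C, packed)]
struct WireHeader {
    magic: u32,
    version: u8,
}
```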
## Unused `Cargo.toml` Dependencies

Checked Automatically: Yes (via `cargo xtask fmt unused-deps`)
We have an in-repo fork of `cargo-machete` that ensures `Cargo.toml` files only include dependencies that are actually being used.
Avoiding unused dependencies makes it easier to reason about what a crate is doing just by looking at its dependencies, and also helps cut down on incremental compile times.
## Formatting (`Cargo.toml`)

Checked Automatically: No
When defining dependencies in `Cargo.toml` files, please organize dependencies into the following groups:
- crate-specific "subcrates"
- crates under `vm/`
- crates under `vm/vmcore/`
- crates under `support/`
- external dependencies
The rationale here is that crates should be grouped according to how widely applicable they are. i.e: crates from crates.io and `support/` are widely applicable outside of OpenVMM, whereas crates under `vmcore/` only make sense within the context of OpenVMM, and crate-specific subcrates are - by definition - only applicable to the crate they are being imported from.
Additionally, we make use of the workspace dependencies feature to ensure that all our dependencies stay in sync. This requires defining dependencies in your crate's `Cargo.toml` file and in the project's root `Cargo.toml`.
So, for example:
```toml
[package]
name = "openvmm"

[dependencies]
# crate-specific subcrates
openvmm_core.workspace = true
...

# /vmcore
vmcore.workspace = true
...

# /vm/devices
firmware_uefi_custom_vars.workspace = true
storvsp.workspace = true
...

# /support
guid.workspace = true
inspect.workspace = true
inspect_proto.workspace = true
...

# external dependencies
anyhow.workspace = true
cfg-if.workspace = true
clap.workspace = true
...
```
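The other half of each dependency lives in the root `Cargo.toml`'s `[workspace.dependencies]` table. A minimal sketch of what those entries might look like (the paths and versions below are illustrative, not the actual ones):

```toml
[workspace.dependencies]
# in-repo crates, referenced by path
vmcore = { path = "vm/vmcore/vmcore" }
storvsp = { path = "vm/devices/storage/storvsp" }

# external dependencies, pinned to a single version used repo-wide
anyhow = "1.0"
clap = { version = "4.0", features = ["derive"] }
```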
## Linting (via `clippy`)

Checked Automatically: Yes
OpenVMM uses `cargo clippy` to supplement rustc's built-in lints.
Assuming you've followed the guide and set up rust-analyzer to use clippy, you should see clippy lints appear inline when working on Rust code.
The CI runs `cargo clippy` on every crate in the repo prior to building the project, and will fast-fail if it catches any warnings / errors.
### Suppressing Lints
In general, lints should be fixed by modifying the code to satisfy the lint.
However, there are cases where a lint may need to be `allow`'d inline. In these cases, you must provide an inline comment giving reasonable justification for the suppressed lint.
e.g:
```rust
// x86_64-unknown-linux-musl targets have a different type defn for
// `libc::cmsghdr`, hence why these lints are being suppressed.
#[allow(clippy::needless_update, clippy::useless_conversion)]
libc::cmsghdr {
    cmsg_level: libc::SOL_SOCKET,
    cmsg_type: libc::SCM_RIGHTS,
    cmsg_len: (size_of::<libc::cmsghdr>() + size_of_val(fds))
        .try_into()
        .unwrap(),
    ..std::mem::zeroed()
}
```
### OpenVMM's `clippy` Configuration
We stick fairly close to the default set of rustc / clippy lints, though there are some default lints that we've decided to disable project-wide, and other non-default lints which we've explicitly opted into.
See the `[workspace.lints]` sections of OpenVMM's root `Cargo.toml` for a list of globally enabled/disabled lints, along with justification as to why certain lints have been enabled/disabled.
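For reference, entries in that table look roughly like the following (the specific lints and levels here are invented for illustration; the root `Cargo.toml` is the source of truth):

```toml
[workspace.lints.rust]
# hypothetical: opt into a non-default rustc lint, project-wide
unused_qualifications = "warn"

[workspace.lints.clippy]
# hypothetical: disable a default clippy lint that doesn't suit the codebase
too_many_arguments = "allow"
```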
## Unsafe Code Policy
When possible, try to avoid introducing new `unsafe` code!
Before rolling your own `unsafe` code, check to see if a safe abstraction already exists, either in-tree, on crates.io*, or in the standard library.
*subject to an unsafe-code audit
Rather than synthesizing our own unsafe code conventions, we follow the guidelines outlined in the following two resources:
In a nutshell:
- `unsafe` functions are required to include `/// # Safety` documentation describing any preconditions the caller must uphold when calling the function.
- `unsafe {}` blocks are required to include a `// SAFETY:` comment describing how the preconditions for calling the `unsafe` function(s) within the block are being satisfied.
- `allow(unsafe_code)` annotations are required to include an `// UNSAFETY:` comment justifying why the code in question needs to use `unsafe`. This annotation must be placed at the module or crate level.
These requirements are enforced by CI, and will cause the build to fail if required documentation is missing.
Editing a file containing unsafe code will trigger CI to automatically add the OpenVMM Unsafe Approvers group to your PR. This is to ensure that all unsafe code is audited for correctness by area experts.
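Putting those three requirements together, here's a minimal sketch of what they look like in practice (the module justification and the `first_byte` function are hypothetical, purely for illustration):

```rust
// UNSAFETY: required to dereference raw pointers handed to us by a
// (hypothetical) C API.
#![allow(unsafe_code)]

/// Reads the first byte of `buf`.
///
/// # Safety
///
/// `buf` must be non-null and point to at least one readable byte.
unsafe fn first_byte(buf: *const u8) -> u8 {
    // SAFETY: the caller guarantees that `buf` is non-null and points to at
    // least one readable byte.
    unsafe { *buf }
}
```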
## Uses of `cfg(target_arch = ...)` must be justified

Checked Automatically: Yes (via `cargo xtask fmt house-rules`)
Unless you're working on something that's genuinely tied to the host's CPU architecture, you should use `cfg(guest_arch = ...)` instead of `cfg(target_arch = ...)`.
OpenVMM is a multi-architecture VMM framework, capable of running x64 guests on an x64 host, as well as Aarch64 guests on Aarch64 hosts (with the potential of adding additional platforms in the future).
At the moment, OpenVMM requires that the host architecture and guest architecture match. That said, it's possible that at some point in the future, OpenVMM may also support mismatched guest/host architectures, via an emulated CPU virtualization backend (akin to QEMU).
Having an emulated CPU backend would enable OpenVMM to support such useful scenarios as:
- running Aarch64 guests on x86 machines
- running x86 guests on ARM
- running something exotic (e.g: RISC-V) on x86/ARM machines
  - ...assuming we had the bandwidth to implement + maintain something like that
- running OpenVMM on systems without hardware-accelerated CPU virtualization enabled
With these scenarios in mind, it would be short-sighted to rely entirely on `cfg(target_arch = ...)` to gate guest-facing arch-specific functionality, as it would inexorably tie the guest's arch to the host's arch, making any future initiatives to pry the two apart significantly more difficult!
As such, the OpenVMM repo includes infrastructure to specify a custom, OpenVMM-specific `cfg(guest_arch = ...)` cfg parameter.
By default, `cfg(guest_arch = ...)` will act the same way as `cfg(target_arch = ...)`, but it can be swapped to a different architecture by setting the `OPENVMM_GUEST_TARGET` env var at compile-time.
There are very few reasons to use `cfg(target_arch = ...)` within the OpenVMM repo, and to enforce this rule, we have an in-house `xtask fmt house-rules` check that requires each use of `cfg(target_arch = ...)` to include a "justification" for why it's being used.
e.g: `cfg(target_arch = ...)` would be applicable when feature-gating a CPU intrinsic (such as CPUID, or a SIMD instruction), or when implementing a `*-sys` crate where the underlying C API/ABI varies between architectures.
...otherwise, use `cfg(guest_arch = ...)`!
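As a rough sketch of what this looks like in practice (the function names are hypothetical, and the cfg values are assumed to mirror the usual `target_arch` names):

```rust
// Gate guest-facing logic on the *guest* architecture, not the host's.
#[cfg(guest_arch = "x86_64")]
fn reset_guest_vcpu_state() {
    // x64-specific guest register setup
}

#[cfg(guest_arch = "aarch64")]
fn reset_guest_vcpu_state() {
    // Aarch64-specific guest register setup
}
```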
## Avoid `Default` when using `zerocopy::FromZeroes`

Checked Automatically: No
The rule:
- A type can `derive(Default)` XOR `derive(FromZeroes)`.
- A type that is `FromZeroes` can also `impl Default`, but it must be a conscious, explicit choice, with justification (read: inline comment) as to why that particular default value was chosen.
The why:
- The all-zero value is often not a semantically valid `Default` value for a type
- There are plenty of types that don't have a `Default` value, but do have a valid all-zero repr
### Additional context
As per the Rust docs for `Default::default`:

> `fn default() -> Self`
>
> Returns the “default value” for a type.
>
> Default values are often some kind of initial value, identity value, or anything else that may make sense as a default.
Notably, `default` should not be used as some shorthand to "zero initialize" values! For most non-trivial structs, the all-zero representation is not a semantically valid `Default`!
This is true in many contexts... but one that's particularly relevant in OpenVMM is that of FFI via C-style APIs and ABIs.
In C, it's very common for types to undergo multi-stage instantiation, where they are initially allocated as all-zero, and then get "filled in" by some secondary init code. Notably: it's quite rare for that initial "all-zero" struct to be a valid instance of the type!
### Example: C FFI
For example, a common pattern in C libraries might look something like:
```c
struct Handle {
  uint16_t opaque_handle;
};

struct Handle handle = {0};
init_handle(&handle);
update_handle(handle, options);
do_thing(handle);
```
In this example: it would be an error to invoke `update_handle` or `do_thing` with an all-zero handle, as the type hadn't finished being fully initialized.
If we wanted to use this library from Rust, a "naive" approach would be to do something like:
```rust
#[repr(C)]
#[derive(Default)]
struct Handle {
    opaque_handle: u16,
}

let mut handle = Handle::default(); // BAD!
unsafe { init_handle(&handle) };
unsafe { update_handle(handle, options) };
unsafe { do_thing(handle) };
```
While this technically works... using `Default` here is kinda bogus!
After all - `Handle::default()` doesn't actually call `init_handle`, which means the value returned by `default()` doesn't match the "promise" of the trait! Namely: the returned value is not semantically valid yet!
In many other Rust codebases, this "overloading" of `Default` to represent both semantically valid values and all-zero values (in FFI) is par for the course, as once structs get more complicated, having a `derive` that is able to fully init an "uninitialized" struct in-memory is quite handy...
In OpenVMM, we don't do this. Instead, we use a separate trait to init all-zero structs.
In OpenVMM, we use `FromZeroes` and `FromZeroes::new_zeroed()` to work with types that have valid all-zero representations, without implying that those types also have valid all-zero default values!
So, for the example above:
```rust
#[repr(C)]
#[derive(zerocopy::FromZeroes)]
struct Handle {
    opaque_handle: u16,
}

let mut handle = Handle::new_zeroed(); // GOOD!
unsafe { init_handle(&handle) };
```
Now, it's impossible for code elsewhere to obtain a `Handle` via `Handle::default`, and mistakenly forget to invoke `init_handle` on it.
...but if it so happens that we do want a `Default` impl for `Handle`, we can do so by manually implementing `Default` ourselves:
```rust
// Default + FromZeroes: `default` returns fully initialized handle
impl Default for Handle {
    fn default() -> Handle {
        let mut handle = Handle::new_zeroed();
        unsafe { init_handle(&handle) };
        handle
    }
}
```
## Avoid Requiring `Debug` on Traits

Checked Automatically: No
TL;DR: Don't do this:
```rust
trait MyTrait: std::fmt::Debug
```
Implementations of the standard library's `Debug` trait can be surprisingly large, and the final binary size of OpenHCL and related binaries is a major concern for us. Unused implementations of this trait are usually removed during the optimization process (like all dead code), making this a non-issue. However, when traits, and more specifically trait objects, are involved, the compiler has a much more difficult time proving that implementations are unused. This can result in large amounts of functionally dead code ending up in the final binaries.
If you need to implement `Debug` for a struct containing such a trait object, you will need to do so manually, so that you can skip over that field.
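For example, here's a hedged sketch of a hand-written `Debug` impl that skips over a trait-object field (the `Device` / `Backend` types are made up for illustration):

```rust
use std::fmt;

trait Backend {}

struct Device {
    name: String,
    backend: Box<dyn Backend>, // note: `Backend` does *not* require `Debug`
}

impl fmt::Debug for Device {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.debug_struct("Device")
            .field("name", &self.name)
            // skip `backend` entirely, rather than forcing `Backend: Debug`
            .finish_non_exhaustive()
    }
}
```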
Moreover, it's usually not good form to leave tracing statements that log a struct's `Debug` representation in production. Prefer tracing just the fields you're interested in, and/or connecting objects to the `Inspect` graph.
## Crate Naming

Checked Automatically: Yes (via `cargo xtask fmt house-rules`)
Crates must be named with underscores, not dashes, and underscores should likewise be used in folder names.
- Bad: `my-cool-crate`
- Good: `my_cool_crate`
Rust does not allow dashes in imports: any dashes in a crate's name get replaced with underscores when the crate is referenced in code. Avoiding dashes altogether makes it easier to grep for crate names, and makes things more consistent across the repo.
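For instance, a crate published as `my-cool-crate` has to be referenced with underscores in code (the imported item here is hypothetical), which makes grepping for the literal crate name unreliable:

```rust
// In Cargo.toml: my-cool-crate = "1.0"
use my_cool_crate::SomeType; // the dash becomes an underscore in code
```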
This convention is enforced by CI.
Do not name crates with the words "base", "util", "common", or other terms that are overly general.
For example, consider a crate that provides a common data structure used by multiple devices:
- Bad: `devices_common`
- Good: `range_map`
Libraries named this way eventually become a mishmash of unrelated functionality that is located there purely for convenience. This blog post goes more in-depth as to why.
Instead, name things based on what they logically provide, like functionality or data types.