BZ: https://bugzilla.tianocore.org/show_bug.cgi?id=3275
Virtual Machine Privilege Level (VMPL) feature in the SEV-SNP
architecture allows a guest VM to divide its address space into four
levels. The level can be used to provide the hardware isolated
abstraction layers with a VM. The VMPL0 is the highest privilege, and
VMPL3 is the least privilege. Certain operations must be done by the
VMPL0 software, such as:
* Validate or invalidate memory range (PVALIDATE instruction)
* Allocate VMSA page (RMPADJUST instruction when VMSA=1)
The initial SEV-SNP support assumes that the guest is running on VMPL0.
Let's add function in the MemEncryptSevLib that can be used for checking
whether guest is booted under the VMPL0.
Cc: Michael Roth <michael.roth@amd.com>
Cc: James Bottomley <jejb@linux.ibm.com>
Cc: Min Xu <min.m.xu@intel.com>
Cc: Jiewen Yao <jiewen.yao@intel.com>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: Jordan Justen <jordan.l.justen@intel.com>
Cc: Ard Biesheuvel <ardb+tianocore@kernel.org>
Cc: Erdem Aktas <erdemaktas@google.com>
Cc: Gerd Hoffmann <kraxel@redhat.com>
Acked-by: Gerd Hoffmann <kraxel@redhat.com>
Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
BZ: https://bugzilla.tianocore.org/show_bug.cgi?id=3275
Many of the integrity guarantees of SEV-SNP are enforced through the
Reverse Map Table (RMP). Each RMP entry contains the GPA at which a
particular page of DRAM should be mapped. The guest can request the
hypervisor to add pages in the RMP table via the Page State Change VMGEXIT
defined in the GHCB specification section 2.5.1 and 4.1.6. Inside each RMP
entry is a Validated flag; this flag is automatically cleared to 0 by the
CPU hardware when a new RMP entry is created for a guest. Each VM page
can be either validated or invalidated, as indicated by the Validated
flag in the RMP entry. Memory access to a private page that is not
validated generates a #VC. A VM can use the PVALIDATE instruction to
validate the private page before using it.
During the guest creation, the boot ROM memory is pre-validated by the
AMD-SEV firmware. The MemEncryptSevSnpValidateSystemRam() can be called
during the SEC and PEI phase to validate the detected system RAM.
One of the fields in the Page State Change NAE is the RMP page size. The
page size input parameter indicates that either a 4KB or 2MB page should
be used while adding the RMP entry. During the validation, when possible,
the MemEncryptSevSnpValidateSystemRam() will use the 2MB entry. A
hypervisor backing the memory may choose to use the different page size
in the RMP entry. In those cases, the PVALIDATE instruction should return
SIZEMISMATCH. If a SIZEMISMATCH is detected, then validate all 512-pages
constituting a 2MB region.
Upon completion, the PVALIDATE instruction sets the rFLAGS.CF to 0 if
instruction changed the RMP entry and to 1 if the instruction did not
change the RMP entry. The rFlags.CF will be 1 only when a memory region
is already validated. We should not double validate a memory
as it could lead to a security compromise. If double validation is
detected, terminate the boot.
Cc: Michael Roth <michael.roth@amd.com>
Cc: James Bottomley <jejb@linux.ibm.com>
Cc: Min Xu <min.m.xu@intel.com>
Cc: Jiewen Yao <jiewen.yao@intel.com>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: Jordan Justen <jordan.l.justen@intel.com>
Cc: Ard Biesheuvel <ardb+tianocore@kernel.org>
Cc: Erdem Aktas <erdemaktas@google.com>
Cc: Gerd Hoffmann <kraxel@redhat.com>
Acked-by: Jiewen Yao <Jiewen.yao@intel.com>
Acked-by: Gerd Hoffmann <kraxel@redhat.com>
Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
BZ: https://bugzilla.tianocore.org/show_bug.cgi?id=3275
Commit 85b8eac59b added support to ensure
that MMIO is only performed against the un-encrypted memory. If MMIO
is performed against encrypted memory, a #GP is raised.
The AmdSevDxe uses the functions provided by the MemEncryptSevLib to
clear the memory encryption mask from the page table. If the
MemEncryptSevLib is extended to include VmgExitLib then depedency
chain will look like this:
OvmfPkg/AmdSevDxe/AmdSevDxe.inf
-----> MemEncryptSevLib class
-----> "OvmfPkg/BaseMemEncryptSevLib/DxeMemEncryptSevLib.inf" instance
-----> VmgExitLib class
-----> "OvmfPkg/VmgExitLib" instance
-----> LocalApicLib class
-----> "UefiCpuPkg/BaseXApicX2ApicLib/BaseXApicX2ApicLib.inf" instance
-----> TimerLib class
-----> "OvmfPkg/AcpiTimerLib/DxeAcpiTimerLib.inf" instance
-----> PciLib class
-----> "OvmfPkg/DxePciLibI440FxQ35/DxePciLibI440FxQ35.inf" instance
-----> PciExpressLib class
-----> "MdePkg/BasePciExpressLib/BasePciExpressLib.inf" instance
The LocalApicLib provides a constructor that gets called before the
AmdSevDxe can clear the memory encryption mask from the MMIO regions.
When running under the Q35 machine type, the call chain looks like this:
AcpiTimerLibConstructor () [AcpiTimerLib]
PciRead32 () [DxePciLibI440FxQ35]
PciExpressRead32 () [PciExpressLib]
The PciExpressRead32 () reads the MMIO region. The MMIO regions are not
yet mapped un-encrypted, so the check introduced in the commit
85b8eac59b raises a #GP.
The AmdSevDxe driver does not require the access to the extended PCI
config space. Accessing a normal PCI config space, via IO port should be
sufficent. Use the module-scope override to make the AmdSevDxe use the
BasePciLib instead of BasePciExpressLib so that PciRead32 () uses the
IO ports instead of the extended config space.
Cc: Michael Roth <michael.roth@amd.com>
Cc: James Bottomley <jejb@linux.ibm.com>
Cc: Min Xu <min.m.xu@intel.com>
Cc: Jiewen Yao <jiewen.yao@intel.com>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: Jordan Justen <jordan.l.justen@intel.com>
Cc: Ard Biesheuvel <ardb+tianocore@kernel.org>
Cc: Erdem Aktas <erdemaktas@google.com>
Cc: Gerd Hoffmann <kraxel@redhat.com>
Acked-by: Jiewen Yao <Jiewen.yao@intel.com>
Acked-by: Gerd Hoffmann <kraxel@redhat.com>
Suggested-by: Laszlo Ersek <lersek@redhat.com>
Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
SEV-SNP firmware allows a special guest page to be populated with
guest CPUID values so that they can be validated against supported
host features before being loaded into encrypted guest memory to be
used instead of hypervisor-provided values [1].
Add handling for this in the CPUID #VC handler and use it whenever
SEV-SNP is enabled. To do so, existing CPUID handling via VmgExit is
moved to a helper, GetCpuidHyp(), and a new helper that uses the CPUID
page to do the lookup, GetCpuidFw(), is used instead when SNP is
enabled. For cases where SNP CPUID lookups still rely on fetching
specific CPUID fields from hypervisor, GetCpuidHyp() is used there as
well.
[1]: SEV SNP Firmware ABI Specification, Rev. 0.8, 8.13.2.6
Cc: James Bottomley <jejb@linux.ibm.com>
Cc: Min Xu <min.m.xu@intel.com>
Cc: Jiewen Yao <jiewen.yao@intel.com>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: Jordan Justen <jordan.l.justen@intel.com>
Cc: Ard Biesheuvel <ardb+tianocore@kernel.org>
Cc: Erdem Aktas <erdemaktas@google.com>
Cc: Gerd Hoffmann <kraxel@redhat.com>
Acked-by: Jiewen Yao <Jiewen.yao@intel.com>
Acked-by: Gerd Hoffmann <kraxel@redhat.com>
Signed-off-by: Michael Roth <michael.roth@amd.com>
Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
BZ: https://bugzilla.tianocore.org/show_bug.cgi?id=3275
An SEV-SNP guest requires that private memory (aka pages mapped encrypted)
must be validated before being accessed.
The validation process consist of the following sequence:
1) Set the memory encryption attribute in the page table (aka C-bit).
Note: If the processor is in non-PAE mode, then all the memory accesses
are considered private.
2) Add the memory range as private in the RMP table. This can be performed
using the Page State Change VMGEXIT defined in the GHCB specification.
3) Use the PVALIDATE instruction to set the Validated Bit in the RMP table.
During the guest creation time, the VMM encrypts the OVMF_CODE.fd using
the SEV-SNP firmware provided LAUNCH_UPDATE_DATA command. In addition to
encrypting the content, the command also validates the memory region.
This allows us to execute the code without going through the validation
sequence.
During execution, the reset vector need to access some data pages
(such as page tables, SevESWorkarea, Sec stack). The data pages are
accessed as private memory. The data pages are not part of the
OVMF_CODE.fd, so they were not validated during the guest creation.
There are two approaches we can take to validate the data pages before
the access:
a) Enhance the OVMF reset vector code to validate the pages as described
above (go through step 2 - 3).
OR
b) Validate the pages during the guest creation time. The SEV firmware
provides a command which can be used by the VMM to validate the pages
without affecting the measurement of the launch.
Approach #b seems much simpler; it does not require any changes to the
OVMF reset vector code.
Update the OVMF metadata with the list of regions that must be
pre-validated by the VMM before the boot.
Cc: Michael Roth <michael.roth@amd.com>
Cc: James Bottomley <jejb@linux.ibm.com>
Cc: Min Xu <min.m.xu@intel.com>
Cc: Jiewen Yao <jiewen.yao@intel.com>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: Jordan Justen <jordan.l.justen@intel.com>
Cc: Ard Biesheuvel <ardb+tianocore@kernel.org>
Cc: Erdem Aktas <erdemaktas@google.com>
Cc: Gerd Hoffmann <kraxel@redhat.com>
Acked-by: Gerd Hoffmann <kraxel@redhat.com>
Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
RFC: https://bugzilla.tianocore.org/show_bug.cgi?id=3429
Intel's Trust Domain Extensions (Intel TDX) refers to an Intel technology
that extends Virtual Machines Extensions (VMX) and Multi-Key Total Memory
Encryption (MKTME) with a new kind of virutal machines guest called a
Trust Domain (TD). A TD is desinged to run in a CPU mode that protects the
confidentiality of TD memory contents and the TD's CPU state from other
software, including the hosting Virtual-Machine Monitor (VMM), unless
explicitly shared by the TD itself.
Note: Intel TDX is only available on X64, so the Tdx related changes are
in X64 path. In IA32 path, there may be null stub to make the build
success.
This patch includes below major changes.
1. Ia32/IntelTdx.asm
IntelTdx.asm includes below routines used in ResetVector
- IsTdx
Check if the running system is Tdx guest.
- InitTdxWorkarea
It initialize the TDX_WORK_AREA. Because it is called by both BSP and
APs and to avoid the race condition, only BSP can initialize the
WORK_AREA. AP will wait until the field of TDX_WORK_AREA_PGTBL_READY
is set.
- ReloadFlat32
After reset all CPUs in TDX are initialized to 32-bit protected mode.
But GDT register is not set. So this routine loads the GDT then jump
to Flat 32 protected mode again.
- InitTdx
This routine wrap above 3 routines together to do Tdx initialization
in ResetVector phase.
- IsTdxEnabled
It is a OneTimeCall to probe if TDX is enabled by checking the
CC_WORK_AREA.
- CheckTdxFeaturesBeforeBuildPagetables
This routine is called to check if it is Non-TDX guest, TDX-Bsp or
TDX-APs. Because in TDX guest all the initialization is done by BSP
(including the page tables). APs should not build the tables.
- TdxPostBuildPageTables
It is called after Page Tables are built by BSP.
byte[TDX_WORK_AREA_PGTBL_READY] is set by BSP to indicate APs can
leave spin and go.
2. Ia32/PageTables64.asm
As described above only the TDX BSP build the page tables. So
PageTables64.asm is updated to make sure only TDX BSP build the
PageTables. TDX APs will skip the page table building and set Cr3
directly.
3. Ia16/ResetVectorVtf0.asm
In Tdx all CPUs "reset" to run on 32-bit protected mode with flat
descriptor (paging disabled). But in Non-Td guest the initial state of
CPUs is 16-bit real mode. To resolve this conflict, BITS 16/32 is used
in the ResetVectorVtf0.asm. It checks the 32-bit protected mode or 16-bit
real mode, then jump to the corresponding entry point.
Cc: Ard Biesheuvel <ardb+tianocore@kernel.org>
Cc: Jordan Justen <jordan.l.justen@intel.com>
Cc: Gerd Hoffmann <kraxel@redhat.com>
Cc: Brijesh Singh <brijesh.singh@amd.com>
Cc: Erdem Aktas <erdemaktas@google.com>
Cc: James Bottomley <jejb@linux.ibm.com>
Cc: Jiewen Yao <jiewen.yao@intel.com>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Min Xu <min.m.xu@intel.com>
Reviewed-by: Jiewen Yao <jiewen.yao@intel.com>
RFC: https://bugzilla.tianocore.org/show_bug.cgi?id=3429
In TDX when host VMM creates a new guest TD, some initial set of
TD-private pages are added using the TDH.MEM.PAGE.ADD function. These
pages typically contain Virtual BIOS code and data along with some clear
pages for stacks and heap. In the meanwhile, some configuration data
need be measured by host VMM. Tdx Metadata is designed for this purpose
to indicate host VMM how to do the above tasks.
More detailed information of Metadata is in [TDVF] Section 11.
Tdx Metadata describes the information about the image for VMM use.
For example, the base address and length of the TdHob, Bfv, Cfv, etc.
The offset of the Metadata is stored in a GUID-ed structure which is
appended in the GUID-ed chain from a fixed GPA (0xffffffd0).
In this commit there are 2 new definitions of BFV & CFV.
Tdx Virtual Firmware (TDVF) includes one Firmware Volume (FV) known
as the Boot Firmware Volume (BFV). The FV format is defined in the
UEFI Platform Initialization (PI) spec. BFV includes all TDVF
components required during boot.
TDVF also include a configuration firmware volume (CFV) that is
separated from the BFV. The reason is because the CFV is measured in
RTMR, while the BFV is measured in MRTD.
In practice BFV is the code part of Ovmf image (OVMF_CODE.fd). CFV is
the vars part of Ovmf image (OVMF_VARS.fd).
Since AMD SEV has already defined some SEV specific memory region in
MEMFD. TDX re-uses some of the memory regions defined by SEV.
- MailBox : PcdOvmfSecGhcbBackupBase|PcdOvmfSecGhcbBackupSize
- TdHob : PcdOvmfSecGhcbBase|PcdOvmfSecGhcbSize
[TDVF] https://software.intel.com/content/dam/develop/external/us/en/
documents/tdx-virtual-firmware-design-guide-rev-1.pdf
Cc: Ard Biesheuvel <ardb+tianocore@kernel.org>
Cc: Jordan Justen <jordan.l.justen@intel.com>
Cc: Gerd Hoffmann <kraxel@redhat.com>
Cc: Brijesh Singh <brijesh.singh@amd.com>
Cc: Erdem Aktas <erdemaktas@google.com>
Cc: James Bottomley <jejb@linux.ibm.com>
Cc: Jiewen Yao <jiewen.yao@intel.com>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Acked-by: Gerd Hoffmann <kraxel@redhat.com>
Signed-off-by: Min Xu <min.m.xu@intel.com>
Reviewed-by: Jiewen Yao <jiewen.yao@intel.com>
RFC: https://bugzilla.tianocore.org/show_bug.cgi?id=3429
Previously WORK_AREA_GUEST_TYPE was cleared in SetCr3ForPageTables64.
This is workable for Legacy guest and SEV guest. But it doesn't work
after Intel TDX is introduced. It is because all TDX CPUs (BSP and APs)
start to run from 0xfffffff0, thus WORK_AREA_GUEST_TYPE will be cleared
multi-times if it is TDX guest. So the clearance of WORK_AREA_GUEST_TYPE
is moved to Main16 entry point in Main.asm.
Note: WORK_AREA_GUEST_TYPE is only defined for ARCH_X64.
For Intel TDX, its corresponding entry point is Main32 (which will be
introduced in next commit in this patch-set). WORK_AREA_GUEST_TYPE will
be cleared there.
Cc: Ard Biesheuvel <ardb+tianocore@kernel.org>
Cc: Jordan Justen <jordan.l.justen@intel.com>
Cc: Gerd Hoffmann <kraxel@redhat.com>
Cc: Brijesh Singh <brijesh.singh@amd.com>
Cc: Erdem Aktas <erdemaktas@google.com>
Cc: James Bottomley <jejb@linux.ibm.com>
Cc: Jiewen Yao <jiewen.yao@intel.com>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Min Xu <min.m.xu@intel.com>
Reviewed-by: Jiewen Yao <jiewen.yao@intel.com>
Microvm has no LPC bridge, so drop the PciSioSerialDxe driver.
Use SerialDxe instead, with ioport hardcoded to 0x3f8 aka com1 aka ttyS0.
With this tianocore boots to uefi shell prompt on the serial console.
Direct kernel boot can be used too.
Ref: https://bugzilla.tianocore.org/show_bug.cgi?id=3599
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Acked-by: Jiewen Yao <Jiewen.yao@intel.com>