Add the 'AsmRelocateApLoopStartGeneric' for X64 processors except 64-bit
AMD processors with SEV-ES.
Remove the unused arguments of AsmRelocateApLoopStartGeneric and update
the stack offset.
Create PageTable for the allocated reserved memory.
Keep the 4GB limit on memory allocation only for the case where APs still
need to be transferred to 32-bit mode before booting the OS.
Cc: Guo Dong <guo.dong@intel.com>
Cc: Ray Ni <ray.ni@intel.com>
Cc: Sean Rhodes <sean@starlabs.systems>
Cc: James Lu <james.lu@intel.com>
Cc: Gua Guo <gua.guo@intel.com>
Signed-off-by: Yuanhao Xie <yuanhao.xie@intel.com>
Acked-by: Gerd Hoffmann <kraxel@redhat.com>
Tested-by: Gerd Hoffmann <kraxel@redhat.com>
Reviewed-by: Ray Ni <ray.ni@intel.com>
During the finalization of MP initialization before booting into the OS,
AsmRelocateApLoop places the APs in an MWAIT loop or a HLT loop, depending
on whether MWAIT is supported.
Since paging is necessary for long mode, the original implementation
moved the APs to 32-bit mode and disabled paging to ensure that booting
does not crash.
The current modification creates a page table in reserved memory, which
avoids switching modes and keeps the page table from being reclaimed by
the OS. This modification is only for 64-bit mode.
More specifically, we keep the original code flow for the AMD logic and
extract and update the Intel-related code, so that the APs stay in 64-bit
mode and run in an MWAIT or HLT loop until the OS wakes them up.
Signed-off-by: Ray Ni <ray.ni@intel.com>
Signed-off-by: Yuanhao Xie <yuanhao.xie@intel.com>
Reviewed-by: Ray Ni <ray.ni@intel.com>
AsmRelocateApLoop is duplicated in preparation for extracting the Intel
logic, which will then keep the APs in 64-bit mode with paging enabled.
Signed-off-by: Yuanhao Xie <yuanhao.xie@intel.com>
Reviewed-by: Ray Ni <ray.ni@intel.com>
To remove the dependency on a CPU register, the 4 or 8 bytes at the top
of the stack are reserved for CpuMpData. BIST information is also handled
here. This modification is only for the PEI phase, since in the DXE phase
CpuMpData is accessed via a global variable.
Signed-off-by: Yuanhao Xie <yuanhao.xie@intel.com>
Cc: Eric Dong <eric.dong@intel.com>
Reviewed-by: Ray Ni <ray.ni@intel.com>
Cc: Rahul Kumar <rahul1.kumar@intel.com>
When switching the BSP, the old BSP and the new BSP push CR0/CR4 onto the
stack and save the IDT and GDT registers in a structure. After they
exchange their stacks, they restore these registers. This logic is
currently implemented in assembly code.
This patch reuses the (Save/Restore)VolatileRegisters functions to
replace that assembly code for better code readability.
Cc: Eric Dong <eric.dong@intel.com>
Reviewed-by: Ray Ni <ray.ni@intel.com>
Cc: Rahul Kumar <rahul1.kumar@intel.com>
Signed-off-by: Zhiguang Liu <zhiguang.liu@intel.com>
Currently, when waking up an AP, the AP's IDT is loaded in 16-bit code,
which assumes the IDT base is a 32-bit address. However, the IDT is
created by the BSP, so an issue occurs if the BSP allocates memory above
4GB for its IDT. Moreover, even if the IDT is located below 4GB, the
handler functions in the IDT are 64-bit and do not take effect until the
CPU transitions to 64-bit long mode. There is no benefit in setting the
IDT at such an early phase.
To avoid this issue, this patch moves the LIDT instruction into the
64-bit code.
Cc: Eric Dong <eric.dong@intel.com>
Reviewed-by: Ray Ni <ray.ni@intel.com>
Cc: Rahul Kumar <rahul1.kumar@intel.com>
Signed-off-by: Zhiguang Liu <zhiguang.liu@intel.com>
The AP vector consists of 2 parts:
1. the initial 16-bit code that should be under 1MB and page aligned.
2. the 32-bit/64-bit code that can be anywhere in the memory with any
alignment.
Part #2 is needed because the memory under 1MB is temporarily "stolen"
for use and is "given" back after all APs wake up. That range of memory
is not marked as a code page in the page table, so the CPU may trigger an
exception as soon as NX is enabled.
The memory for part #2 can be allocated in MpInitLibInitialize.
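The constraint on part #1 can be sketched in C as below. This is only an
illustration: the helper names are hypothetical and are not part of
MpInitLib. It relies on the architectural fact that a SIPI carries an
8-bit vector and the AP starts executing at (vector << 12), which is why
the 16-bit code must be 4KB aligned and below 1MB.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SIZE_1MB      0x100000u
#define PAGE_SIZE_4KB 0x1000u

/* Hypothetical helper: true if [base, base + size) can hold the 16-bit
   part of the AP vector, i.e. it is 4KB aligned and ends at or below 1MB. */
bool is_valid_wakeup_base(uint32_t base, uint32_t size)
{
    if (size == 0 || size > SIZE_1MB) {
        return false;
    }
    if ((base & (PAGE_SIZE_4KB - 1)) != 0) {
        return false;
    }
    return base <= SIZE_1MB - size;
}

/* Hypothetical helper: the SIPI vector is the 4KB page number of the
   start address, so it always fits in 8 bits for a valid base. */
uint8_t sipi_vector_for(uint32_t base)
{
    assert(is_valid_wakeup_base(base, PAGE_SIZE_4KB));
    return (uint8_t)(base >> 12);
}
```

Part #2 has no such constraint, which is what allows it to live anywhere
in memory that is mapped as executable.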
Signed-off-by: Ray Ni <ray.ni@intel.com>
Reviewed-by: Eric Dong <eric.dong@intel.com>
Today's implementation allocates memory below 1MB for the 16-bit,
32-bit and 64-bit code.
This is not necessary because the 32-bit and 64-bit code now runs in
high memory in both the PEI and DXE phases.
The patch simplifies the logic by removing the code that handles the
case when WakeupBufferHigh is 0.
It also reduces the memory footprint under 1MB by allocating
memory for the 16-bit code only.
MP_CPU_EXCHANGE_INFO still resides under 1MB, immediately
after the 16-bit code.
Signed-off-by: Ray Ni <ray.ni@intel.com>
Reviewed-by: Eric Dong <eric.dong@intel.com>
In NASM files, the global keyword is used for symbols that are
referenced from C files.
Remove the unneeded global keywords from the NASM file.
Signed-off-by: Ray Ni <ray.ni@intel.com>
Reviewed-by: Eric Dong <eric.dong@intel.com>
Today's implementation assumes the PEI phase runs in 32-bit mode, so
the execution-disable feature is not applicable.
This is not always true.
The patch allocates the 32-bit and 64-bit code buffer for the PEI phase
as well.
Signed-off-by: Ray Ni <ray.ni@intel.com>
Reviewed-by: Eric Dong <eric.dong@intel.com>
During AP bringup, just after switching to long mode, APs will do some
cpuid calls to verify that the extended topology leaf (0xB) is available
so they can fetch their x2 APIC IDs from it. In the case of SEV-ES,
these cpuid instructions must be handled by direct use of the GHCB MSR
protocol to fetch the values from the hypervisor, since a #VC handler
is not yet available due to the AP's stack not being set up yet.
For SEV-SNP, rather than relying on the GHCB MSR protocol, it is
expected that these values would be obtained from the SEV-SNP CPUID
table instead. The actual x2 APIC ID (and 8-bit APIC IDs) would still
be fetched from hypervisor using the GHCB MSR protocol however, so
introducing support for the SEV-SNP CPUID table in that part of the AP
bring-up code would only be to handle the checks/validation of the
extended topology leaf.
Rather than introducing all the added complexity needed to handle these
checks via the CPUID table, instead let the BSP do the check in advance,
since it can make use of the #VC handler to avoid the need to scan the
SNP CPUID table directly, and add a flag in ExchangeInfo to communicate
the result of this check to APs.
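The BSP-side check can be sketched as below. This is an illustration
under assumptions, not the code of this patch: the structure and field
names are hypothetical, and the CPUID results are passed in as plain
values so that the sketch does not execute CPUID itself. The validity
test (leaf 0xB is within the maximum basic leaf and EBX[15:0] of leaf 0xB
is non-zero) follows the usual CPUID convention for that leaf.

```c
#include <stdbool.h>
#include <stdint.h>

#define CPUID_EXTENDED_TOPOLOGY 0x0Bu

/* Hypothetical stand-in for the flag added to ExchangeInfo. */
typedef struct {
    bool ext_topo_avail;
} exchange_info_sketch_t;

/* max_basic_leaf: EAX returned by CPUID leaf 0.
   leaf_b_ebx:     EBX returned by CPUID leaf 0xB, sub-leaf 0. */
bool ext_topo_available(uint32_t max_basic_leaf, uint32_t leaf_b_ebx)
{
    if (max_basic_leaf < CPUID_EXTENDED_TOPOLOGY) {
        return false;
    }
    /* EBX[15:0] == 0 means the leaf is not implemented. */
    return (leaf_b_ebx & 0xFFFFu) != 0;
}

/* Run once on the BSP, where a #VC handler is available; the APs then
   only read the flag instead of issuing the CPUID checks themselves. */
void bsp_publish_ext_topo(exchange_info_sketch_t *info,
                          uint32_t max_basic_leaf, uint32_t leaf_b_ebx)
{
    info->ext_topo_avail = ext_topo_available(max_basic_leaf, leaf_b_ebx);
}
```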
Cc: Eric Dong <eric.dong@intel.com>
Cc: Ray Ni <ray.ni@intel.com>
Cc: Rahul Kumar <rahul1.kumar@intel.com>
Cc: James Bottomley <jejb@linux.ibm.com>
Cc: Min Xu <min.m.xu@intel.com>
Cc: Jiewen Yao <jiewen.yao@intel.com>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: Jordan Justen <jordan.l.justen@intel.com>
Cc: Ard Biesheuvel <ardb+tianocore@kernel.org>
Cc: Erdem Aktas <erdemaktas@google.com>
Cc: Gerd Hoffmann <kraxel@redhat.com>
Acked-by: Gerd Hoffmann <kraxel@redhat.com>
Acked-by: Ray Ni <ray.ni@intel.com>
Suggested-by: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Michael Roth <michael.roth@amd.com>
Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
BZ: https://bugzilla.tianocore.org/show_bug.cgi?id=3324
The SEV-ES stacks currently share a page with the reset code and data.
Separate the SEV-ES stacks from the reset vector code and data to avoid
possible stack overflows from overwriting the code and/or data.
When SEV-ES is enabled, invoke the GetWakeupBuffer() routine a second time
to allocate a new area, below the reset vector and data.
Both the PEI and DXE versions of GetWakeupBuffer() are changed so that
when PcdSevEsIsEnabled is true, they will track the previous reset buffer
allocation in order to ensure that the new buffer allocation is below the
previous allocation. When PcdSevEsIsEnabled is false, the original logic
is followed.
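The "allocate below the previous allocation" rule can be sketched as
below. This is a deliberately simplified model: the function names are
hypothetical, and it ignores the memory map that the real
GetWakeupBuffer() implementations have to consult. It only shows how
tracking the previous buffer keeps the second buffer below the first when
SEV-ES is enabled.

```c
#include <stdint.h>

#define SIZE_1MB  0x100000u
#define PAGE_MASK 0xFFFu
#define NO_BUFFER UINT32_MAX

/* Highest 4KB-aligned start such that [start, start + size) is below
   limit, or NO_BUFFER if it does not fit. */
uint32_t pick_buffer_below(uint32_t limit, uint32_t size)
{
    if (size == 0 || size > limit) {
        return NO_BUFFER;
    }
    return (limit - size) & ~PAGE_MASK;
}

/* previous: start of the last buffer handed out, or NO_BUFFER.
   With SEV-ES enabled, each new buffer must end below the previous one.
   Without SEV-ES, the original logic (a plain "below 1MB" search) is
   followed and nothing is tracked. */
uint32_t pick_wakeup_buffer(uint32_t size, int sev_es_enabled,
                            uint32_t *previous)
{
    uint32_t limit = SIZE_1MB;
    uint32_t start;

    if (sev_es_enabled && *previous != NO_BUFFER) {
        limit = *previous;
    }
    start = pick_buffer_below(limit, size);
    if (sev_es_enabled && start != NO_BUFFER) {
        *previous = start;
    }
    return start;
}
```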
Fixes: 7b7508ad78
Cc: Eric Dong <eric.dong@intel.com>
Cc: Ray Ni <ray.ni@intel.com>
Cc: Laszlo Ersek <lersek@redhat.com>
Cc: Rahul Kumar <rahul1.kumar@intel.com>
Cc: Marvin Häuser <mhaeuser@posteo.de>
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Message-Id: <3cae2ac836884b131725866264e0a0e1897052de.1621024125.git.thomas.lendacky@amd.com>
Acked-by: Laszlo Ersek <lersek@redhat.com>
MpInitLib contains a function MicrocodeDetect() which is called by
all threads as an AP procedure.
Today this function contains the code below:
  if (CurrentRevision != LatestRevision) {
    AcquireSpinLock(&CpuMpData->MpLock);
    DEBUG ((
      EFI_D_ERROR,
      "Updated microcode signature [0x%08x] does not match \
      loaded microcode signature [0x%08x]\n",
      CurrentRevision, LatestRevision
      ));
    ReleaseSpinLock(&CpuMpData->MpLock);
  }
When the if-check passes, the code may call into PEI services:
1. AcquireSpinLock
   When PcdSpinTimeout is not 0, the TimerLib function
   GetPerformanceCounterProperties() is called, and some TimerLib
   implementations get the information cached in a HOB. But an AP
   procedure cannot call PEI services to retrieve the HOB list.
2. DEBUG
   Certain DebugLib instances rely on ReportStatusCode services, and
   the ReportStatusCode PPI is retrieved through PEI services.
   DebugLibSerialPort should be used instead.
   But when SerialPortLib is implemented to depend on PEI services,
   even DebugLibSerialPort can cause the AP to call PEI services,
   resulting in a hang.
This costs a lot of debugging effort on the platform side.
There are 2 options to fix the problem:
1. Make sure the platform DSC chooses the proper DebugLib and sets
   PcdSpinTimeout to 0, so that AcquireSpinLock and DEBUG don't call
   PEI services.
2. Remove the AcquireSpinLock and DEBUG calls from the procedure.
Option #2 is preferred because it's not practical to ask every
platform DSC to be written properly.
Following option #2, there are two sub-options:
2.A. Just remove the if-check.
2.B. Capture the CurrentRevision and ExpectedRevision in memory
     for each AP and print them together from the BSP.
The patch follows option 2.B.
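Option 2.B can be sketched as below. This is a minimal illustration, not
the code of this patch: the type, array and function names are
hypothetical, and printf() stands in for DEBUG. The point is that each AP
only performs plain stores into its own slot, so it needs neither a lock
nor any service, and the BSP does all the printing afterwards.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#define MAX_CPUS 8

/* Hypothetical per-CPU record filled in by each AP. */
typedef struct {
    uint32_t current_revision;
    uint32_t expected_revision;
} ucode_record_t;

static ucode_record_t g_ucode_records[MAX_CPUS];

/* AP side: store into this CPU's own slot only. No lock, no DEBUG. */
void ap_record_revision(size_t cpu_index, uint32_t current,
                        uint32_t expected)
{
    if (cpu_index >= MAX_CPUS) {
        return;
    }
    g_ucode_records[cpu_index].current_revision = current;
    g_ucode_records[cpu_index].expected_revision = expected;
}

/* BSP side: called after all APs have finished. Prints every mismatch
   and returns how many were found. */
size_t bsp_report_mismatches(size_t cpu_count)
{
    size_t mismatches = 0;

    for (size_t i = 0; i < cpu_count && i < MAX_CPUS; i++) {
        const ucode_record_t *r = &g_ucode_records[i];
        if (r->current_revision != r->expected_revision) {
            printf("CPU[%zu]: loaded microcode signature [0x%08x] does "
                   "not match expected signature [0x%08x]\n",
                   i, (unsigned)r->current_revision,
                   (unsigned)r->expected_revision);
            mismatches++;
        }
    }
    return mismatches;
}
```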
Signed-off-by: Ray Ni <ray.ni@intel.com>
Reviewed-by: Eric Dong <eric.dong@intel.com>
Acked-by: Laszlo Ersek <lersek@redhat.com>
Cc: Rahul Kumar <rahul1.kumar@intel.com>
When an AP first wakes up, MpFuncs.nasm contains the logic below to
assign a unique ApIndex to each AP in order of arrival:
---ASM---
TestLock:
    xchg       [edi], eax
    cmp        eax, NotVacantFlag
    jz         TestLock
    mov        ecx, esi
    add        ecx, ApIndexLocation
    inc        dword [ecx]
    mov        ebx, [ecx]
Releaselock:
    mov        eax, VacantFlag
    xchg       [edi], eax
---ASM END---
"lock inc" cannot be used to increase ApIndex because not only the
global ApIndex should be increased, but also the result should be
stored to a local general purpose register EBX.
This patch follows the NASM implementation of
InternalSyncIncrement() and uses the "XADD" instruction, which can
increase the global ApIndex and store the original ApIndex to EBX in
one instruction.
With this patch, OVMF running on QEMU with 255 threads takes about
one second to wake up all APs. The original implementation needs more
than 10 seconds.
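The same idea can be expressed in C11 as a sketch. This is an analogy,
not the NASM code of this patch: the variable and function names are
hypothetical. atomic_fetch_add() increments the shared counter and
returns the value it held before the increment in a single atomic
operation, which is the property XADD provides, and x86 compilers
typically emit "lock xadd" for it.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical shared counter standing in for the global ApIndex. */
static _Atomic uint32_t g_next_ap_index = 0;

/* Each caller gets a distinct value without taking a spin lock:
   the increment and the read of the original value are one atomic step. */
uint32_t claim_ap_index(void)
{
    return atomic_fetch_add(&g_next_ap_index, 1);
}
```

Because no AP ever waits for another one to release a lock, the cost per
AP no longer grows with the number of APs contending at the same time.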
Signed-off-by: Ray Ni <ray.ni@intel.com>
Cc: Eric Dong <eric.dong@intel.com>
Cc: Laszlo Ersek <lersek@redhat.com>
Cc: Rahul Kumar <rahul1.kumar@intel.com>
Reviewed-by: Michael D Kinney <michael.d.kinney@intel.com>
Acked-by: Laszlo Ersek <lersek@redhat.com>
Reviewed-by: Eric Dong <eric.dong@intel.com>
REF: https://bugzilla.tianocore.org/show_bug.cgi?id=3179
When the BSP wakes all APs for the first time, each AP atomically
increases CpuMpData->CpuCount and CpuMpData->FinishedCount.
Each AP atomically increases CpuMpData->NumApsExecuting
in early assembly code and decreases it before it enters the HLT or
MWAIT state.
Putting them together, the 3 variables are changed in the following order:
1. NumApsExecuting++ // in assembly
2. CpuCount++
3. FinishedCount++
4. NumApsExecuting-- // in C
The BSP waits for a certain timeout and then polls NumApsExecuting
until it drops to zero. It assumes all APs are woken up concurrently
and NumApsExecuting only drops to zero when all APs have checked in.
Then it additionally waits for FinishedCount == CpuCount - 1.
(FinishedCount doesn't include the BSP while CpuCount includes the BSP.)
There is no need to additionally wait for
FinishedCount == CpuCount - 1 because when NumApsExecuting == 0,
the number of increments of FinishedCount and CpuCount should be equal.
This patch simplifies the code by removing "CpuCount++" from
ApWakeupFunction() and
assigning FinishedCount + 1 to CpuCount after WakeUpAP().
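The resulting counter protocol can be sketched as below. This is a
simplified model with hypothetical names, not the MpInitLib code: the AP
functions are called sequentially here, whereas in firmware they run
concurrently on each AP. It shows that once NumApsExecuting is back to
zero, CpuCount can be derived from FinishedCount instead of being
incremented separately by every AP.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the three counters in CPU_MP_DATA. */
typedef struct {
    _Atomic uint32_t num_aps_executing;
    _Atomic uint32_t finished_count;
    uint32_t         cpu_count;   /* written by the BSP only */
} mp_counters_t;

/* Step done in early assembly code on each AP. */
void ap_early_entry(mp_counters_t *c)
{
    atomic_fetch_add(&c->num_aps_executing, 1);
}

/* Steps done in C on each AP before it enters HLT or MWAIT. */
void ap_finish(mp_counters_t *c)
{
    atomic_fetch_add(&c->finished_count, 1);
    atomic_fetch_sub(&c->num_aps_executing, 1);
}

/* BSP side, after the wake-up timeout: returns false while some AP is
   still executing; otherwise sets cpu_count = finished_count + 1, where
   the +1 accounts for the BSP itself. */
bool bsp_collect_cpu_count(mp_counters_t *c)
{
    if (atomic_load(&c->num_aps_executing) != 0) {
        return false;
    }
    c->cpu_count = atomic_load(&c->finished_count) + 1;
    return true;
}
```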
Signed-off-by: Ray Ni <ray.ni@intel.com>
Reviewed-by: Eric Dong <eric.dong@intel.com>
Acked-by: Laszlo Ersek <lersek@redhat.com>
REF: https://bugzilla.tianocore.org/show_bug.cgi?id=3182
Fix the order of operations in ApWakeupFunction() when PcdCpuApLoopMode
is set to HLT mode that uses INIT-SIPI-SIPI to wake APs. In this mode,
volatile state is restored and saved each time an INIT-SIPI-SIPI is sent
to an AP to request a function to be executed on the AP. When the
function is completed the volatile state of the AP is saved. However,
the counters NumApsExecuting and FinishedCount are updated before
the volatile state is saved. This allows for a race condition window
for the BSP that is waiting on these counters to request a new
INIT-SIPI-SIPI before all the APs have completely saved their volatile
state. The fix is to save the AP volatile state before updating the
NumApsExecuting and FinishedCount counters.
Cc: Eric Dong <eric.dong@intel.com>
Reviewed-by: Ray Ni <ray.ni@intel.com>
Reviewed-by: Laszlo Ersek <lersek@redhat.com>
Cc: Rahul Kumar <rahul1.kumar@intel.com>
Reviewed-by: Star Zeng <star.zeng@intel.com>
Signed-off-by: Michael D Kinney <michael.d.kinney@intel.com>
This patch fixes a hang in UefiCpuPkg when it is dispatched above 4GB.
In the UEFI BIOS case, CpuInfoInHob is provided to DXE under 4GB from PEI.
With the UEFI payload and bootloaders, CpuInfoInHob is allocated above
4GB since it is not provided by the bootloader, so the code needs to be
updated to make sure this HOB can be accessed correctly in this case.
Signed-off-by: Guo Dong <guo.dong@intel.com>
Reviewed-by: Laszlo Ersek <lersek@redhat.com>
Reviewed-by: Ray Ni <ray.ni@intel.com>
BZ: https://bugzilla.tianocore.org/show_bug.cgi?id=3008
The QemuFlashPtrWrite() flash services runtime uses the GHCB and VmgExit()
directly to perform the flash write when running as an SEV-ES guest. If an
interrupt arrives between VmgInit() and VmgExit(), the Dr7 read in the
interrupt handler will generate a #VC, which can overwrite information in
the GHCB that QemuFlashPtrWrite() has set. This has been seen with the
timer interrupt firing and the CpuExceptionHandlerLib library code,
UefiCpuPkg/Library/CpuExceptionHandlerLib/X64/
Xcode5ExceptionHandlerAsm.nasm and
ExceptionHandlerAsm.nasm
reading the Dr7 register while QemuFlashPtrWrite() is using the GHCB. In
general, it is necessary to protect the GHCB whenever it is used, not just
in QemuFlashPtrWrite().
Disable interrupts around the usage of the GHCB by modifying the VmgInit()
and VmgDone() interfaces:
- VmgInit() will take an extra parameter that is a pointer to a BOOLEAN
that will hold the interrupt state at the time of invocation. VmgInit()
will get and save this interrupt state before updating the GHCB.
- VmgDone() will take an extra parameter that is used to indicate whether
interrupts are to be (re)enabled. Before exiting, VmgDone() will enable
interrupts if that is requested.
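The save/disable/restore pattern described above can be sketched as
below. This is a mock for illustration only: the mock_ names, the GHCB
stand-in type and the global flag that models the CPU interrupt-enable
state are all hypothetical, and no real GHCB is touched.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the CPU interrupt-enable flag. */
static bool g_interrupts_enabled = true;

/* Hypothetical stand-in for the GHCB page. */
typedef struct {
    uint64_t sw_exit_code;
} mock_ghcb_t;

/* Record the caller's interrupt state, then disable interrupts before
   the GHCB is modified. */
void mock_vmg_init(mock_ghcb_t *ghcb, bool *interrupt_state)
{
    *interrupt_state = g_interrupts_enabled;
    g_interrupts_enabled = false;
    ghcb->sw_exit_code = 0;
}

/* Re-enable interrupts only if they were enabled when mock_vmg_init()
   was called, so nested users do not enable them too early. */
void mock_vmg_done(mock_ghcb_t *ghcb, bool interrupt_state)
{
    (void)ghcb;
    if (interrupt_state) {
        g_interrupts_enabled = true;
    }
}
```

Passing the saved state back into the "done" call, rather than enabling
interrupts unconditionally, preserves the caller's state when the GHCB is
used from a context that already runs with interrupts disabled.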
Fixes: 437eb3f7a8
Cc: Eric Dong <eric.dong@intel.com>
Cc: Ray Ni <ray.ni@intel.com>
Cc: Laszlo Ersek <lersek@redhat.com>
Cc: Rahul Kumar <rahul1.kumar@intel.com>
Cc: Jordan Justen <jordan.l.justen@intel.com>
Cc: Ard Biesheuvel <ard.biesheuvel@arm.com>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: Brijesh Singh <brijesh.singh@amd.com>
Acked-by: Eric Dong <eric.dong@intel.com>
Reviewed-by: Laszlo Ersek <lersek@redhat.com>
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Message-Id: <c326a4fd78253f784b42eb317589176cf7d8592a.1604685192.git.thomas.lendacky@amd.com>
The AP reset vector stack allocation is only required if running as an
SEV-ES guest. Since the reset vector allocation is below 1MB in memory,
eliminate the requirement for bare-metal systems and non SEV-ES guests
to allocate the extra stack area, which can be large if the
PcdCpuMaxLogicalProcessorNumber value is large, and also remove the
CPU_STACK_ALIGNMENT alignment.
Fixes: 7b7508ad78 ("UefiCpuPkg: Allow AP booting under SEV-ES")
Cc: Garrett Kirkendall <garrett.kirkendall@amd.com>
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Message-Id: <21345cdbc906519558202b3851257ca07b9239ba.1600884239.git.thomas.lendacky@amd.com>
Reviewed-by: Laszlo Ersek <lersek@redhat.com>
[lersek@redhat.com: supply missing space character after "PcdGet32"]
BZ: https://bugzilla.tianocore.org/show_bug.cgi?id=2198
Before UEFI transfers control to the OS, it must park the AP. This is
done using the AsmRelocateApLoop function to transition into 32-bit
non-paging mode. For an SEV-ES guest, a few additional things must be
done:
- AsmRelocateApLoop must be updated to support SEV-ES. This means
performing a VMGEXIT AP Reset Hold instead of an MWAIT or HLT loop.
- Since the AP must transition to real mode, a small routine is copied
to the WakeupBuffer area. Since the WakeupBuffer will be used by
the AP during OS booting, it must be placed in reserved memory.
Additionally, the AP stack must be located where it can be accessed
in real mode.
- Once the AP is in real mode it will transfer control to the
destination specified by the OS in the SEV-ES AP Jump Table. The
SEV-ES AP Jump Table address is saved by the hypervisor for the OS
using the GHCB VMGEXIT AP Jump Table exit code.
Cc: Eric Dong <eric.dong@intel.com>
Cc: Ray Ni <ray.ni@intel.com>
Cc: Laszlo Ersek <lersek@redhat.com>
Reviewed-by: Eric Dong <eric.dong@intel.com>
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Regression-tested-by: Laszlo Ersek <lersek@redhat.com>
BZ: https://bugzilla.tianocore.org/show_bug.cgi?id=2198
Typically, an AP is booted using the INIT-SIPI-SIPI sequence. This
sequence is intercepted by the hypervisor, which sets the AP's registers
to the values requested by the sequence. At that point, the hypervisor can
start the AP, which will then begin execution at the appropriate location.
Under SEV-ES, AP booting presents some challenges since the hypervisor is
not allowed to alter the AP's register state. In this situation, we have
to distinguish between the AP's first boot and AP's subsequent boots.
First boot:
Once the AP's register state has been defined (which is before the guest
is first booted) it cannot be altered. Should the hypervisor attempt to
alter the register state, the change would be detected by the hardware
and the VMRUN instruction would fail. Given this, the first boot for the
AP is required to begin execution with this initial register state, which
is typically the reset vector. This prevents the BSP from directing the
AP startup location through the INIT-SIPI-SIPI sequence.
To work around this, the firmware will provide a build time reserved area
that can be used as the initial IP value. The hypervisor can extract this
location value by checking for the SEV-ES reset block GUID that must be
located 48 bytes from the end of the firmware. The format of the SEV-ES
reset block area is:
0x00 - 0x01 - SEV-ES Reset IP
0x02 - 0x03 - SEV-ES Reset CS Segment Base[31:16]
0x04 - 0x05 - Size of the SEV-ES reset block
0x06 - 0x15 - SEV-ES Reset Block GUID
(00f771de-1a7e-4fcb-890e-68c77e2fb44e)
The total size is 22 bytes. Any expansion to this block must be done
by adding new values before existing values.
The hypervisor will use the IP and CS values obtained from the SEV-ES
reset block to set as the AP's initial values. The CS Segment Base
represents the upper 16 bits of the CS segment base and must be left
shifted by 16 bits to form the complete CS segment base value.
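How a consumer might read this block can be sketched as below. This is
an illustration under assumptions: the function and constant names are
hypothetical, the fields are assumed to be little-endian, and the GUID is
assumed to be stored in the usual EFI_GUID in-memory byte order. Locating
the block inside the firmware image is left out.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define SEV_ES_RESET_BLOCK_SIZE 22u

/* 00f771de-1a7e-4fcb-890e-68c77e2fb44e in EFI_GUID byte order (assumed). */
static const uint8_t k_reset_block_guid[16] = {
    0xde, 0x71, 0xf7, 0x00, 0x7e, 0x1a, 0xcb, 0x4f,
    0x89, 0x0e, 0x68, 0xc7, 0x7e, 0x2f, 0xb4, 0x4e
};

static uint16_t read_le16(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

/* blk points at offset 0x00 of the 22-byte reset block.
   On success, *ip is the reset IP and *cs_base is the complete CS
   segment base, i.e. the stored upper 16 bits shifted left by 16. */
bool parse_sev_es_reset_block(const uint8_t *blk, uint16_t *ip,
                              uint32_t *cs_base)
{
    if (memcmp(blk + 0x06, k_reset_block_guid, 16) != 0) {
        return false;
    }
    /* A later revision may grow the block by adding values in front,
       so only reject sizes smaller than the layout described here. */
    if (read_le16(blk + 0x04) < SEV_ES_RESET_BLOCK_SIZE) {
        return false;
    }
    *ip = read_le16(blk + 0x00);
    *cs_base = (uint32_t)read_le16(blk + 0x02) << 16;
    return true;
}
```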
Before booting the AP for the first time, the BSP must initialize the
SEV-ES reset area. This consists of programming a FAR JMP instruction
to the contents of a memory location that is also located in the SEV-ES
reset area. The BSP must program the IP and CS values for the FAR JMP
based on values derived from the INIT-SIPI-SIPI sequence.
Subsequent boots:
Again, the hypervisor cannot alter the AP register state, so a method is
required to take the AP out of halt state and redirect it to the desired
IP location. If it is determined that the AP is running in an SEV-ES
guest, then instead of calling CpuSleep(), a VMGEXIT is issued with the
AP Reset Hold exit code (0x80000004). The hypervisor will put the AP in
a halt state, waiting for an INIT-SIPI-SIPI sequence. Once the sequence
is recognized, the hypervisor will resume the AP. At this point the AP
must transition from the current 64-bit long mode down to 16-bit real
mode and begin executing at the derived location from the INIT-SIPI-SIPI
sequence.
Another change is around the area of obtaining the (x2)APIC ID during AP
startup. During AP startup, the AP can't take a #VC exception before the
AP has established a stack. However, the AP stack is set by using the
(x2)APIC ID, which is obtained through CPUID instructions. A CPUID
instruction will cause a #VC, so a different method must be used. The
GHCB protocol supports a method to obtain CPUID information from the
hypervisor through the GHCB MSR. This method does not require a stack,
so it is used to obtain the necessary CPUID information to determine the
(x2)APIC ID.
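The value exchanged through the GHCB MSR for such a CPUID request can be
sketched as below. The bit positions and the request/response codes
(0x004 and 0x005) are assumptions based on the MSR protocol in the GHCB
specification and should be checked against it; the names are
hypothetical. The sketch only builds and decodes the 64-bit value; the
WRMSR, VMGEXIT and RDMSR steps that surround it in the real code are
omitted.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed GHCB MSR protocol codes, held in bits [11:0]. */
#define GHCB_MSR_CPUID_REQUEST  0x004u
#define GHCB_MSR_CPUID_RESPONSE 0x005u

/* Assumed encoding of the requested register, held in bits [31:30]. */
typedef enum {
    GHCB_CPUID_EAX = 0,
    GHCB_CPUID_EBX = 1,
    GHCB_CPUID_ECX = 2,
    GHCB_CPUID_EDX = 3
} ghcb_cpuid_reg_t;

/* Build the MSR value: bits [63:32] carry the CPUID function. */
uint64_t ghcb_msr_cpuid_request(uint32_t function, ghcb_cpuid_reg_t reg)
{
    return ((uint64_t)function << 32) |
           ((uint64_t)reg << 30) |
           GHCB_MSR_CPUID_REQUEST;
}

/* Decode the hypervisor's reply: bits [63:32] carry the register value.
   Returns false if the MSR does not hold a CPUID response. */
bool ghcb_msr_cpuid_response(uint64_t msr, uint32_t *value)
{
    if ((msr & 0xFFFu) != GHCB_MSR_CPUID_RESPONSE) {
        return false;
    }
    *value = (uint32_t)(msr >> 32);
    return true;
}
```

Since one request returns one register, reading the x2APIC ID from leaf
0xB would be a single request for EDX under these assumptions, and none
of it needs a stack.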
The new 16-bit protected mode GDT entry is used in order to transition
from 64-bit long mode down to 16-bit real mode.
A new assembler routine is created that takes the AP from 64-bit long mode
to 16-bit real mode. This is located under 1MB in memory and transitions
from 64-bit long mode to 32-bit compatibility mode to 16-bit protected
mode and finally 16-bit real mode.
Cc: Eric Dong <eric.dong@intel.com>
Cc: Ray Ni <ray.ni@intel.com>
Cc: Laszlo Ersek <lersek@redhat.com>
Reviewed-by: Eric Dong <eric.dong@intel.com>
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Regression-tested-by: Laszlo Ersek <lersek@redhat.com>
BZ: https://bugzilla.tianocore.org/show_bug.cgi?id=2198
When starting APs in an SMP configuration, the AP needs to know if it is
running as an SEV-ES guest in order to assign a GHCB page.
Add a field to the CPU_MP_DATA structure that will indicate if SEV-ES is
enabled. This new field is set during MP library initialization with the
PCD value PcdSevEsIsEnabled. This flag can then be used to determine if
SEV-ES is enabled.
Cc: Eric Dong <eric.dong@intel.com>
Cc: Ray Ni <ray.ni@intel.com>
Cc: Laszlo Ersek <lersek@redhat.com>
Reviewed-by: Eric Dong <eric.dong@intel.com>
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Regression-tested-by: Laszlo Ersek <lersek@redhat.com>