Bugzilla: 3668 (https://bugzilla.tianocore.org/show_bug.cgi?id=3668)
The Arm True Random Number Generator (TRNG) library defines an
interface to access the entropy source on a platform. On platforms
that do not have access to an entropy source, a NULL instance of
the TRNG library may be useful to satisfy the build dependency.
Therefore, add a NULL instance of the Arm TRNG library.
Signed-off-by: Pierre Gondois <pierre.gondois@arm.com>
Reviewed-by: Liming Gao <gaoliming@byosoft.com.cn>
RVCT is obsolete and no longer used.
Remove support for it.
Signed-off-by: Rebecca Cran <quic_rcran@quicinc.com>
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
REF? https://bugzilla.tianocore.org/show_bug.cgi?id=3912
UefiCpuPkg define a new Protocol with the new services
SmmWaitForAllProcessor(), which can be used by SMI handler
to optionally wait for other APs to complete SMM rendezvous in
relaxed AP mode.
VariableSmm and VariableStandaloneMM driver in MdeModulePkg need
to use this services but MdeModulePkg can't depend on UefiCpuPkg.
Thus, the solution is moving SmmCpuRendezvouslib.h from UefiCpuPkg
to MdePkg and creating SmmCpuRendezvousLib NullLib version
implementation in MdePkg as dependency for the pkg that can't
depend on UefiCpuPkg.
Cc: Michael D Kinney <michael.d.kinney@intel.com>
Cc: Liming Gao <gaoliming@byosoft.com.cn>
Cc: Eric Dong <eric.dong@intel.com>
Cc: Ray Ni <ray.ni@intel.com>
Cc: Michael Kubacki <mikuback@linux.microsoft.com>
Cc: Siyuan Fu <siyuan.fu@intel.com>
Signed-off-by: Zhihao Li <zhihao.li@intel.com>
Acked-by: Liming Gao <gaoliming@byosoft.com.cn>
BZ: https://bugzilla.tianocore.org/show_bug.cgi?id=3902
Bad IO performance in SEC phase is observed after TDX features was
introduced. (after commit b6b2de8848 - "MdePkg: Support mmio for
Tdx guest in BaseIoLibIntrinsic").
This is because IsTdxGuest() will be called in each MMIO operation.
It is trying to cache the result of the probe in the efi data segment.
However, that doesn't work in SEC, because the data segment is read only
(so the write seems to succeed but a read will always return the
original value), leading to us calling TdIsEnabled() check for every
mmio we do, which is causing the slowdown because it's very expensive.
This patch is to call CcProbe instead of TdIsEnabled in IsTdxGuest.
Null instance of CcProbe always returns CCGuestTypeNonEncrypted. Its
OvmfPkg version returns the guest type in Ovmf work area.
Cc: Michael D Kinney <michael.d.kinney@intel.com>
Cc: Liming Gao <gaoliming@byosoft.com.cn>
Cc: Zhiguang Liu <zhiguang.liu@intel.com>
Cc: James Bottomley <jejb@linux.ibm.com>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Cc: Jiewen Yao <jiewen.yao@intel.com>
Cc: Gerd Hoffmann <kraxel@redhat.com>
Reviewed-by: Liming Gao <gaoliming@byosoft.com.cn>
Reviewed-by: Jiewen Yao <jiewen.yao@intel.com>
Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Min Xu <min.m.xu@intel.com>
RFC: https://bugzilla.tianocore.org/show_bug.cgi?id=3429
Intel TDX architecture does not prescribe a specific software convention
to perform I/O from the guest TD. Guest TD providers have many choices to
provide I/O to the guest. The common I/O models are emulated devices,
para-virtualized devices, SRIOV devices and Direct Device assignments.
TDVF chooses para-virtualized I/O (Choice-A) which use the TDG.VP.VMCALL
function to invoke the funtions provided by the host VMM to perform I/O.
Another choice (Choice-B) is the emulation performed by the #VE handler.
There are 2 benefits of para-virtualized I/O:
1. Performance.
VMEXIT/VMENTRY is skipped so that the performance is better than #VE
handler.
2. De-couple with #VE handler.
Choice-B depends on the #VE handler which means I/O is not available
until #VE handler is installed. For example, in PEI phase #VE handler
is installed in CpuMpPei, while communication with Qemu (via I/O port)
happen earlier than it.
IoLibInternalTdx.c provides the helper functions for Tdx guest.
IoLibInternalTdxNull.c provides the null version of the helper functions.
It is included in the Non-X64 IoLib so that the build will not be broken.
Cc: Michael D Kinney <michael.d.kinney@intel.com>
Cc: Liming Gao <gaoliming@byosoft.com.cn>
Cc: Zhiguang Liu <zhiguang.liu@intel.com>
Cc: Brijesh Singh <brijesh.singh@amd.com>
Cc: Erdem Aktas <erdemaktas@google.com>
Cc: James Bottomley <jejb@linux.ibm.com>
Cc: Jiewen Yao <jiewen.yao@intel.com>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Reviewed-by: Liming Gao <gaoliming@byosoft.com.cn>
Reviewed-by: Jiewen Yao <jiewen.yao@intel.com>
Signed-off-by: Min Xu <min.m.xu@intel.com>
Make BaseRngLib more generic by moving x86-specific functionality into
'Rand' and adding files under 'AArch64' to support the optional ARMv8.5
RNG instruction RNDR that is a part of FEAT_RNG.
Signed-off-by: Rebecca Cran <rebecca@nuviainc.com>
Reviewed-by: Liming Gao <gaoliming@byosoft.com.cn>
Reviewed-by: Sami Mujawar <sami.mujawar@arm.com>
REF: https://bugzilla.tianocore.org/show_bug.cgi?id=3325
1. AsmReadMsr64() in X64/GccInlinePriv.c
AsmReadMsr64 can return uninitialized value if FilterBeforeMsrRead
returns False. This causes build error with the CLANG toolchain.
2. AsmWriteMsr64() in X64/GccInlinePriv.c
In the case that FilterBeforeMsrWrite changes Value and returns True,
The original Value, not the changed Value, is written to the MSR.
This behavior is different from the one of AsmWriteMsr64() in
X64/WriteMsr64.c for the MSFT toolchain.
Signed-off-by: Takuto Naito <naitaku@gmail.com>
Cc: Michael D Kinney <michael.d.kinney@intel.com>
Cc: Liming Gao <gaoliming@byosoft.com.cn>
Cc: Zhiguang Liu <zhiguang.liu@intel.com>
Reviewed-by: Liming Gao <gaoliming@byosoft.com.cn>
CpuPause() might allow the CPU to go into a lower power state
state while we spin.
On X86, CpuPause() executes a PAUSE instruction which the Intel
and AMD specs describe as follows:
Intel:
"PAUSE: An additional function of the PAUSE instruction is to reduce
the power consumed by a processor while executing a spin loop. A
processor can execute a spin-wait loop extremely quickly, causing the
processor to consume a lot of power while it waits for the resource it
is spinning on to become available. Inserting a pause instruction in a
spin-wait loop greatly reduces the processor?s power consumption."
AMD:
"PAUSE: Improves the performance of spin loops, by providing a hint to
the processor that the current code is in a spin loop. The processor
may use this to optimize power consumption while in the spin loop.
Architecturally, this instruction behaves like a NOP instruction."
On RISC-V and ARM64, CpuPause() executes a NOP, which is no worse than
the tight loop we have.
Cc: Liming Gao <gaoliming@byosoft.com.cn>
Signed-off-by: Ankur Arora <ankur.a.arora@oracle.com>
Reviewed-by: Michael D Kinney <michael.d.kinney@intel.com>
Reviewed-by: Laszlo Ersek <lersek@redhat.com>
Reviewed-by: Liming Gao <gaoliming@byosoft.com.cn>
REF: https://bugzilla.tianocore.org/show_bug.cgi?id=3168
This interface provides an abstration layer to allow MM modules to access
requested areas that are outside of MMRAM. On MM model that blocks all
non-MMRAM accesses, areas requested through this API will be mapped or
unblocked for accessibility inside MM environment.
For MM modules that need to access regions outside of MMRAMs, the agents
that set up these regions are responsible for invoking this API in order
for these memory areas to be accessible from inside MM.
Example usages:
1. To enable runtime cache feature for variable service, Variable MM
module will need to access the allocated runtime buffer. Thus the agent
sets up these buffers, VariableSmmRuntimeDxe, will need to invoke this
API to make these regions accessible by Variable MM.
2. For TPM ACPI table to communicate to physical presence handler, the
corresponding NVS region has to be accessible from inside MM. Once the
NVS region are assigned, it needs to be unblocked thourgh this API.
Cc: Michael D Kinney <michael.d.kinney@intel.com>
Cc: Liming Gao <gaoliming@byosoft.com.cn>
Cc: Zhiguang Liu <zhiguang.liu@intel.com>
Cc: Hao A Wu <hao.a.wu@intel.com>
Cc: Jiewen Yao <jiewen.yao@intel.com>
Cc: Laszlo Ersek <lersek@redhat.com>
Signed-off-by: Kun Qin <kun.q@outlook.com>
Acked-by: Laszlo Ersek <lersek@redhat.com>
Reviewed-by: Liming Gao <gaoliming@byosoft.com.cn>
Message-Id: <MWHPR06MB31028AF0D0785B93E4E7CF63F3969@MWHPR06MB3102.namprd06.prod.outlook.com>
The Raspberry Pi platform with Secure Boot enabled currently fails to build
with error:
Module type [DXE_RUNTIME_DRIVER] is not supported by library instance
[/home/appveyor/projects/rpi4/edk2/MdePkg/Library/DxeRngLib/DxeRngLib.inf]
Add the missing class to fix this issue.
Signed-off-by: Pete Batard <pete@akeo.ie>
Reviewed-by: Samer El-Haj-Mahmoud <Samer.El-Haj-Mahmoud@arm.com>
Reviewed-by: Andrei Warkentin <awarkentin@vmware.com>
Reviewed-by: Liming Gao <gaoliming@byosoft.com.cn>
The current SSE2 implementation of the ZeroMem(), SetMem(),
SetMem16(), SetMem32 and SetMem64 functions is writing 16 bytes per 16
bytes. It hurts the performances so bad that this is even slower than
a simple 'rep stos' (4% slower) in regular DRAM.
To take full advantages of the 'movntdq' instruction it is better to
"queue" a total of 64 bytes in the write combining buffers. This
patch implement such a change. Below is a table where I measured
(with 'rdtsc') the time to write an entire 100MB RAM buffer. These
functions operate almost two times faster.
| Function | Arch | Untouched | 64 bytes | Result |
|----------+------+-----------+----------+--------|
| ZeroMem | Ia32 | 17765947 | 9136062 | 1.945x |
| ZeroMem | X64 | 17525170 | 9233391 | 1.898x |
| SetMem | Ia32 | 17522291 | 9137272 | 1.918x |
| SetMem | X64 | 17949261 | 9176978 | 1.956x |
| SetMem16 | Ia32 | 18219673 | 9372062 | 1.944x |
| SetMem16 | X64 | 17523331 | 9275184 | 1.889x |
| SetMem32 | Ia32 | 18495036 | 9273053 | 1.994x |
| SetMem32 | X64 | 17368864 | 9285885 | 1.870x |
| SetMem64 | Ia32 | 18564473 | 9241362 | 2.009x |
| SetMem64 | X64 | 17506951 | 9280148 | 1.886x |
Signed-off-by: Jeremy Compostella <jeremy.compostella@intel.com>
Reviewed-by: Liming Gao <gaoliming@byosoft.com.cn>
Correct the memory offsets used in REG_ONE/REG_PAIR macros to
synchronize them with definition of the BASE_LIBRARY_JUMP_BUFFER
structure on AArch64.
The REG_ONE macro declares only a single 64-bit register be
read/written; however, the subsequent offset is 16 bytes larger,
creating an unused memory gap in the middle of the structure and
causing SetJump/LongJump functions to read/write 8 bytes of memory
past the end of the jump buffer struct.
Signed-off-by: Jan Bobek <jbobek@nvidia.com>
Reviewed-by: Ard Biesheuvel <ard.biesheuvel@arm.com>
Acked-by: Michael D Kinney <michael.d.kinney@intel.com>
Acked-by: Liming Gao <gaoliming@byosoft.com.cn>