Documentation: Update index.md and move files

* Add more subdirectories and index.mds.
* Move "getting started" and "lessons" into sub-directories.
* Move "NativeRaminit" into northbridge/intel/sandybridge folder.
* Move "MultiProcessorInit" into soc/intel/icelake folder.
* Reference new files

Change-Id: I78c3ec0e8bcc342686277ae141a88d0486680978
Signed-off-by: Philipp Deppenwiese <zaolin@das-labor.org>
Signed-off-by: Patrick Rudolph <patrick.rudolph@9elements.com>
Reviewed-on: https://review.coreboot.org/26262
Reviewed-by: Patrick Georgi <pgeorgi@google.com>
Reviewed-by: Philipp Deppenwiese <zaolin.daisuki@gmail.com>
Tested-by: build bot (Jenkins) <no-reply@coreboot.org>
This commit is contained in:
Philipp Deppenwiese
2018-05-13 15:47:18 +02:00
committed by Patrick Georgi
parent e462585c94
commit 438b463a8f
27 changed files with 73 additions and 24 deletions

View File

@@ -1,85 +0,0 @@
# Intel Common Code Block Publishing EFI_MP_SERVICES_PPI
## Introduction
This documentation is intended to document the purpose for creating EFI service
Interface inside coreboot space to perform CPU feature programming on Application
Processors for Intel 9th Gen (Cannon Lake) and beyond CPUs.
Today coreboot is capable enough to handle multi-processor initialization on IA platforms.
The multi-processor initialization code has to take care of lots of duties:
1. Bringing all cores out of reset
2. Load latest microcode on all cores
3. Sync latest MTRR snapshot between BSP and APs
4. Perform sets of CPU feature programming
* CPU Power & Thermal Management
* Overclocking
* Intel Trusted Execution Technology
* Intel Software Guard Extensions
* Intel Processor Trace etc.
This above CPU feature programming lists are expected to grow with current and future
CPU complexity and there might be some cases where certain feature programming mightbe
closed source in nature.
Platform code might need to compromise on those closed source nature of CPU programming
if we don't plan to provide an alternate interface which can be used by coreboot to
get-rid of such close source CPU programming.
## Proposal
As coreboot is doing CPU multi-processor initialization for IA platform before FSP-S
initialization and having all possible information about cores in terms of maximum number
of cores, APIC ids, stack size etc. Its also possible for coreboot to extend its own
support model and create a sets of APIs which later can be used by FSP to run CPU feature
programming using coreboot published APIs.
Due to the fact that FSP is using EFI infrastructure and need to relying on install/locate
PPI to perform certain API call, hence coreboot has to created MP services APIs known as
EFI_MP_SERVICES_PPI as per PI specification volume 1, section 8.3.9.
More details here: [PI_Spec_1_6]
### coreboot to publish EFI_MP_SERVICES_PPI APIs
```eval_rst
+------------------------------+------------------------------------------------------------------+
| API | Description |
+==============================+==================================================================+
| PeiGetNumberOfProcessors | Get the number of CPU's. |
+------------------------------+------------------------------------------------------------------+
| PeiGetProcessorInfo | Get information on a specific CPU. |
+------------------------------+------------------------------------------------------------------+
| PeiStartupAllAPs | Activate all of the application processors. |
+------------------------------+------------------------------------------------------------------+
| PeiStartupThisAP | Activate a specific application processor. |
+------------------------------+------------------------------------------------------------------+
| PeiSwitchBSP | Switch the boot strap processor. |
+------------------------------+------------------------------------------------------------------+
| PeiEnableDisableAP | Enable or disable an application processor. |
+------------------------------+------------------------------------------------------------------+
| PeiWhoAmI | Identify the currently executing processor. |
+------------------------------+------------------------------------------------------------------+
```
## Code Flow
Here is proposed design flow with coreboot has implemented EFI_MP_SERVICES_PPI API and FSP will make
use of the same to perform some CPU feature programming.
**coreboot-FSP MP init flow**
![coreboot-fsp mp init flow][coreboot_publish_mp_service_api]
[coreboot_publish_mp_service_api]: coreboot_publish_mp_service_api.png
## Benefits
1. coreboot was using SkipMpInit=1 which will skip entire FSP CPU feature programming.
With proposed model, coreboot will make use of SkipMpInit=0 which will allow to run all
Silicon recommended CPU programming.
2. CPU feature programming inside FSP will be more transparent than before as its using
coreboot interfaces to execute those programming.
3. coreboot will have more control over running those feature programming as API optimization
handled by coreboot.
[PI_Spec_1_6]: http://www.uefi.org/sites/default/files/resources/PI_Spec_1_6.pdf

Binary file not shown.

Before

Width:  |  Height:  |  Size: 120 KiB

File diff suppressed because it is too large Load Diff

View File

@@ -1,135 +0,0 @@
# Sandy Bridge Raminit
## Introduction
This documentation is intended to document the closed source memory controller
hardware for Intel 2nd Gen (Sandy Bride) and 3rd Gen (Ivy Bridge) core-i CPUs.
The memory initialization code has to take care of lots of duties:
1. Selection of operating frequency
* Selection of common timings
* Applying frequency specific compensation values
* Read training of all populated channels
* Write training of all populated channels
* Adjusting delay networks of address and command signals
* DQS training of all populated channels
* Programming memory map
* Report DRAM configuration
* Error handling
## Definitions
```eval_rst
+---------+-------------------------------------------------------------------+------------+--------------+
| Symbol | Description | Units | Valid region |
+=========+===================================================================+============+==============+
| SCK | DRAM system clock cycle time | s | |
+---------+-------------------------------------------------------------------+------------+--------------+
| tCK | DRAM system clock cycle time | 1/256th ns | |
+---------+-------------------------------------------------------------------+------------+--------------+
| DCK | Data clock cycle time: The time between two SCK clock edges | s | |
+---------+-------------------------------------------------------------------+------------+--------------+
| timA | IO phase: The phase delay of the IO signals | 1/64th DCK | [0-512) |
+---------+-------------------------------------------------------------------+------------+--------------+
| SPD | Manufacturer set memory timings located on an EEPROM on every DIMM| bytes | |
+---------+-------------------------------------------------------------------+------------+--------------+
| REFCK | Reference clock, either 100 or 133 | Mhz | 100, 133 |
+---------+-------------------------------------------------------------------+------------+--------------+
| MULT | DRAM PLL multiplier | | [3-12] |
+---------+-------------------------------------------------------------------+------------+--------------+
| XMP | Extreme Memory Profiles | | |
+---------+-------------------------------------------------------------------+------------+--------------+
```
## (Inoffical) register documentation
- [Sandy Bride - Register documentation](SandyBridge_registers.md)
## Frequency selection
- [Sandy Bride - Frequency selection](Sandybridge_freq.md)
## Read training
- [Sandy Bride - Read training](Sandybridge_read.md)
### SMBIOS type 17
The SMBIOS specification allows to report the memory configuration in use.
On GNU/Linux you can run `# dmidecode -t 17` to view it.
Example output of dmidecode:
```
Handle 0x0045, DMI type 17, 34 bytes
Memory Device
Array Handle: 0x0042
Error Information Handle: Not Provided
Total Width: 64 bits
Data Width: 64 bits
Size: 8192 MB
Form Factor: DIMM
Set: None
Locator: ChannelB-DIMM0
Bank Locator: BANK 2
Type: DDR3
Type Detail: Synchronous
Speed: 933 MHz
Manufacturer: 0420
Serial Number: 00000000
Asset Tag: 9876543210
Part Number: F3-1866C9-8GSR
Rank: 2
Configured Clock Speed: 933 MHz
```
The memory frequency printed by dmidecode is the active memory frequency. It's
**not** the double datarate and it's **not** the one encoded maximum frequency
in each DIMM's SPD.
> **Note:** This feature is available since coreboot 4.4
### MRC cache
The name *MRC cache* might be missleading as in case of *Native ram init*
there's no MRC, but for historical reasons it's still named *MRC cache*.
The MRC cache is part of flash memory that is writeable by coreboot.
At the end of the boot process coreboot will write the RAM training results to
flash for future use, as RAM training is time intensive. Storing the results
allows to boot faster on normal boot and allows to support S3 resume,
as the RAM training results can't be stored in RAM (you need to configure
the memory controller first to access RAM).
The MRC cache needs to be invalidated in case the memory configuration has
been changed. To detect a changed memory configuration the CRC16 of each DIMM
is stored to MRC cache.
> **Note:** This feature is available since coreboot 4.4
### Error handling
As of writing the only supported error handling is to disable the failing
channel and restart the memory training sequence. It's very likely to succeed,
as memory channels operate independent of each other.
In case no DIMM could be initilized coreboot will halt. The screen will stay
black until you power of your device. On some platforms there's additional
feedback to indicate such an event.
If you find `dmidecode -t 17` to report only half of the memory installed,
it's likely that a fatal memory init failure had happened.
It is assumed, that a working board with less physical memory, is much better,
than a board that doesn't boot at all.
> **Note:** This feature is available since coreboot 4.5
Try to swap memory modules and or try to use a different vendor. If nothing
helps you could have a look at capter [Debuggin] or report a ticket
at [ticket.coreboot.org]. Please provide a full RAM init log,
that has been captured using EHCI debug.
To enable extensive RAM training logging enable the Kconfig option
`DEBUG_RAM_SETUP`
#### Lenovo Thinkpads
Lenovo Thinkpads do have an additional feature to indicate that RAM init has
failed and coreboot has died (it calls die() on fatal error, thus the name).
The Kconfig options
`H8_BEEP_ON_DEATH`
`H8_FLASH_LEDS_ON_DEATH`
enable blinking LEDs and enable a beep to indicate death.
> **Note:** This feature is available since coreboot 4.7
## Debugging
It's recommended to use an external debugger, such as serial or EHCI debug
dongle. In case of failing memory init the board might not boot at all,
preventing you from using CBMEM.

View File

@@ -1,165 +0,0 @@
# Frequency selection
## Introduction
This chapter explains the frequency selection done on Sandybride and Ivybridge.
## Definitions
```eval_rst
+---------+-------------------------------------------------------------------+------------+--------------+
| Symbol | Description | Units | Valid region |
+=========+===================================================================+============+==============+
| SCK | DRAM system clock cycle time | s | |
+---------+-------------------------------------------------------------------+------------+--------------+
| tCK | DRAM system clock cycle time | 1/256th ns | |
+---------+-------------------------------------------------------------------+------------+--------------+
| DCK | Data clock cycle time: The time between two SCK clock edges | s | |
+---------+-------------------------------------------------------------------+------------+--------------+
| SPD | Manufacturer set memory timings located on an EEPROM on every DIMM| bytes | |
+---------+-------------------------------------------------------------------+------------+--------------+
| REFCK | Reference clock, either 100 or 133 | MHz | 100, 133 |
+---------+-------------------------------------------------------------------+------------+--------------+
| MULT | DRAM PLL multiplier | | [3-12] |
+---------+-------------------------------------------------------------------+------------+--------------+
| XMP | Extreme Memory Profiles | | |
+---------+-------------------------------------------------------------------+------------+--------------+
```
## SPD
The [SPD](https://de.wikipedia.org/wiki/Serial_Presence_Detect "Serial Presence Detect")
located on every DIMM is factory program with various timings. One of them
specifies the maximum clock frequency the DIMM should be used with. The
operating frequency is stores as fixed point value (tCK), rounded to the next
smallest supported operating frequency. Some
[SPD](https://de.wikipedia.org/wiki/Serial_Presence_Detect "Serial Presence Detect")
contains additional and optional
[XMP](https://de.wikipedia.org/wiki/Extreme_Memory_Profile "Extreme Memory Profile")
data, that stores so called "performance" modes, that advertises higher clock
frequencies.
## XMP profiles
At time of writing coreboot's raminit is able to parse XMP profile 1 and 2.
Only **XMP profile 1** is being used in case it advertises:
* 1.5V operating voltage
* The channel's installed DIMM count doesn't exceed the XMP coded limit
In case the XMP profile doesn't fullfill those limits, the regular SPD will be
used.
> **Note:** XMP Profiles are supported since coreboot 4.4.
It is possible to ignore the max DIMM count limit set by XMP profiles.
By activating Kconfig option `NATIVE_RAMINIT_IGNORE_XMP_MAX_DIMMS` it is
possible to install two DIMMs per channel, even if XMP tells you not to do.
> **Note:** Ignoring XMP Profiles limit is supported since coreboot 4.7.
## Soft fuses
Every board manufacturer does program "soft" fuses to indicate the maximum
DRAM frequency supported. However, those fuses don't set a limit in hardware
and thus are called "soft" fuses, as it is possible to ignore them.
> **Note:** Ignoring the fuses might cause system instability !
On Sandy Bride *CAPID0_A* is being read, and on Ivybridge *CAPID0_B* is being
read. coreboot reads those registers and honors the limit in case the Kconfig
option `CONFIG_NATIVE_RAMINIT_IGNORE_MAX_MEM_FUSES` wasn't set.
Power users that want to let their RAM run at DRAM's "stock" frequency need to
enable the Kconfig symbol.
It is possible to override the soft fuses limit by using a board-specific
[devicetree](#devicetree) setting.
> **Note:** Ignoring max mem freq. fuses is supported since coreboot 4.7.
## Hard fuses
"Hard" fuses are programmed by Intel and limit the maximum frequency that can
be used on a given CPU/board/chipset. At time of writing there's no register
to read this limit, before trying to set a given DRAM frequency. The memory PLL
won't lock, indicating that the chosen memory multiplier isn't available. In
this case coreboot tries the next smaller memory multiplier until the PLL will
lock.
## Devicetree
The devicetree register `max_mem_clock_mhz` overrides the "soft" fuses set
by the board manufacturer.
By using this register it's possible to force a minimum operating frequency.
## Reference clock
While Sandybride supports 133 MHz reference clock (REFCK), Ivy Bridge also
supports 100 MHz reference clock. The reference clock is multiplied by the DRAM
multiplier to select the DRAM frequency (SCK) by the following formula:
REFCK * MULT = 1 / DCK
> **Note:** Since coreboot 4.6 Ivy Bridge supports 100MHz REFCK.
## Sandy Bride's supported frequencies
```eval_rst
+------------+-----------+------------------+-------------------------+---------------+
| SCK [Mhz] | DDR [Mhz] | Mutiplier (MULT) | Reference clock (REFCK) | Comment |
+============+===========+==================+=========================+===============+
| 400 | DDR3-800 | 3 | 133 MHz | |
+------------+-----------+------------------+-------------------------+---------------+
| 533 | DDR3-1066 | 4 | 133 MHz | |
+------------+-----------+------------------+-------------------------+---------------+
| 666 | DDR3-1333 | 5 | 133 MHz | |
+------------+-----------+------------------+-------------------------+---------------+
| 800 | DDR3-1600 | 6 | 133 MHz | |
+------------+-----------+------------------+-------------------------+---------------+
| 933 | DDR3-1866 | 7 | 133 MHz | |
+------------+-----------+------------------+-------------------------+---------------+
| 1066 | DDR3-2166 | 8 | 133 MHz | |
+------------+-----------+------------------+-------------------------+---------------+
```
## Ivybridge's supported frequencies
```eval_rst
+------------+-----------+------------------+-------------------------+---------------+
| SCK [Mhz] | DDR [Mhz] | Mutiplier (MULT) | Reference clock (REFCK) | Comment |
+============+===========+==================+=========================+===============+
| 400 | DDR3-800 | 3 | 133 MHz | |
+------------+-----------+------------------+-------------------------+---------------+
| 533 | DDR3-1066 | 4 | 133 MHz | |
+------------+-----------+------------------+-------------------------+---------------+
| 666 | DDR3-1333 | 5 | 133 MHz | |
+------------+-----------+------------------+-------------------------+---------------+
| 800 | DDR3-1600 | 6 | 133 MHz | |
+------------+-----------+------------------+-------------------------+---------------+
| 933 | DDR3-1866 | 7 | 133 MHz | |
+------------+-----------+------------------+-------------------------+---------------+
| 1066 | DDR3-2166 | 8 | 133 MHz | |
+------------+-----------+------------------+-------------------------+---------------+
| 700 | DDR3-1400 | 7 | 100 MHz | '1 |
+------------+-----------+------------------+-------------------------+---------------+
| 800 | DDR3-1600 | 8 | 100 MHz | '1 |
+------------+-----------+------------------+-------------------------+---------------+
| 900 | DDR3-1800 | 9 | 100 MHz | '1 |
+------------+-----------+------------------+-------------------------+---------------+
| 1000 | DDR3-2000 | 10 | 100 MHz | '1 |
+------------+-----------+------------------+-------------------------+---------------+
| 1100 | DDR3-2200 | 11 | 100 MHz | '1 |
+------------+-----------+------------------+-------------------------+---------------+
| 1200 | DDR3-2400 | 12 | 100 MHz | '1 |
+------------+-----------+------------------+-------------------------+---------------+
```
> '1: since coreboot 4.6
## Multiplier selection
coreboot select the maximum frequency to operate at by the following formula:
```
if devicetree's max_mem_clock_mhz > 0:
freq_max := max_mem_clock_mhz
else:
freq_max := soft_fuse_max_mhz
for i in SPDs:
freq_max := MIN(freq_max, ddr_spd_max_mhz[i])
```
As you can see, by using DIMMs with different maximum DRAM frequencies, the
slowest DIMMs' frequency will be selected, to prevent over-clocking it.
The selected frequency gives the PLL multiplier to operate at. In case the PLL
locks (see Take me to [Hard fuses](#hard_fuses)) the frequency will be used for
all DIMMs. At this point it's not possible to change the multiplier again,
until the system has been powered off. In case the PLL doesn't lock, the next
smaller multiplier will be used until a working multiplier will be found.

View File

@@ -1,153 +0,0 @@
# Read training
## Introduction
This chapter explains the read training sequence done on Sandy Bride and
Ivy Bridge memory initialization.
Read training is done to compensate the skew between DQS and SCK and to find
the smallest supported roundtrip delay.
Every board does have a vendor depended routing topology, and can be equip
with any combination of DDR3 memory modules, that introduces different
skew between the memory lanes. With DDR3 a "Fly-By" routing topology
has been introduced, that makes the biggest part of DQS-SCK skew.
The memory code measures the actual skew and actives delay gates,
that will "compensate" the skew.
When in read training the DRAM and the controller are placed in a special mode.
On every read instruction the DRAM outputs a predefined pattern and the memory
controller samples the DQS after a given delay. As the pattern is known, the
actual delay of every lane can be measured.
The values programmed in read training effect DRAM-to-MC transfers only !
## Definitions
```eval_rst
+---------+-------------------------------------------------------------------+------------+--------------+
| Symbol | Description | Units | Valid region |
+=========+===================================================================+============+==============+
| SCK | DRAM system clock cycle time | s | |
+---------+-------------------------------------------------------------------+------------+--------------+
| tCK | DRAM system clock cycle time | 1/256th ns | |
+---------+-------------------------------------------------------------------+------------+--------------+
| DCK | Data clock cycle time: The time between two SCK clock edges | s | |
+---------+-------------------------------------------------------------------+------------+--------------+
| timA | IO phase: The phase delay of the IO signals | 1/64th DCK | [0-512) |
+---------+-------------------------------------------------------------------+------------+--------------+
| SPD | Manufacturer set memory timings located on an EEPROM on every DIMM| bytes | |
+---------+-------------------------------------------------------------------+------------+--------------+
| REFCK | Reference clock, either 100 or 133 | MHz | 100, 133 |
+---------+-------------------------------------------------------------------+------------+--------------+
| MULT | DRAM PLL multiplier | | [3-12] |
+---------+-------------------------------------------------------------------+------------+--------------+
| XMP | Extreme Memory Profiles | | |
+---------+-------------------------------------------------------------------+------------+--------------+
| DQS | Data Strobe signal used to sample all lane's DQ signals | | |
+---------+-------------------------------------------------------------------+------------+--------------+
```
## Hardware
The hardware does have delay logic blocks that can delay the DQ / DQS of a
lane/rank by one or multiple clock cylces and it does have delay logic blocks
that can delay the signal by a multiple of 1/64th DCK per lane.
All delay values can be controlled via software by writing registers in the
MCHBAR.
## IO phase
The IO phase can be adjusted in [0-512) * 1/64th DCK. Incrementing it by 64 is
the same as Incrementing IO delay by 1.
## IO delay
Delays the DQ / DQS signal by one or multiple clock cycles.
### Roundtrip time
The roundtrip time is the time the memory controller waits for data arraving
after a read has been issued. Due to clock-domain crossings, multiple
delay instances and phase interpolators, the signal runtime to DRAM and back
to memory controller defaults to 55 DCKs. The real roundtrip time has to be
measured.
After a read command has been issued, a counter counts down until zero has been
reached and activates the input buffers.
The following pictures shows the relationship between those three values.
The picture was generated from 16 IO delay values times 64 timA values.
The highest IO delay was set on the right-hand side, while the last block
on the left-hand side has zero IO delay.
#### roundtrip 55 DCKs
![timA for lane0 - lane3, roundtrip 55][timA_lane0-3_rt55]
[timA_lane0-3_rt55]: timA_lane0-3_rt55.png
#### roundtrip 54 DCKs
![timA for lane0 - lane3, roundtrip 54][timA_lane0-3_rt54]
[timA_lane0-3_rt54]: timA_lane0-3_rt54.png
#### roundtrip 53 DCKs
![timA for lane0 - lane3, roundtrip 53][timA_lane0-3_rt53]
[timA_lane0-3_rt53]: timA_lane0-3_rt53.png
As you can see the signal has some jitter as every sample was taken in a
different loop iteration. The result register only contains a single bit per
lane.
## Algorithm
### Steps
The algorithm finds the roundtrip time, IO delay and IO phase. The IO phase
will be adjusted to match the falling edge of the preamble of each lane.
The roundtrip time is adjusted to an minimal value, that still includes the
preamble.
### Synchronize to data phase
The first measurement done in read-leveling samples all DQS values for one
phase [0-64) * 1/64th DCK. It then searches for the middle of the low data
symbol and adjusts timA to the found phase and thus the following measurements
will be aligned to the low data symbol.
The code assumes that the initial roundtrip time causes the measurement to be
in the alternating pattern data phase.
### Finding the preamble
After adjusting the IO phase to the middle of one data symbol the preamble will
be located. Unlike the data phase, which is an alternating pattern (010101...),
the preamble consists of two high data cycles.
The code decrements the IO delay/RTT and samples the DQS signal with timA
untouched. As it has been positioned in the middle of the data symbol, it'll
read as either "low" or "high".
If it's "low" we are still in the data phase.
If it's "high" we have found the preamble.
The roundtrip time and IO delay will be adjusted until all lanes are aligned.
The resulting IO delay is visible in the picture below.
**roundtrip time: 49 DCKs, IO delay (at blue point): 6 DCKs**
![timA for lane0 - lane3, finding minimum roundtrip time][timA_lane0-3_discover_420x]
[timA_lane0-3_discover_420x]: timA_lane0-3_discover_420x.png
**Note: The sampled data has been shifted by timA. The preamble is now
in phase.**
## Fine adjustment
As timA still points the middle of the data symbol an offset of 32 is added.
It now points the falling edge of the preamble.
The fine adjustment is to reduce errors introduced by jitter. The phase is
adjusted from `timA - 25` to `timA + 25` and the DQS signal is sampled 100
times. The fine adjustment finds the middle of each rising edge (it's actual
the falling edge of the preamble) to get the final IO phase. You can see the
result in the picture below.
![timA for lane0 - lane3, fine adjustment][timA_lane0-3_adjust_fine]
[timA_lane0-3_adjust_fine]: timA_lane0-3_adjust_fine.png
Lanes 0 - 2 will be adjusted by a phase of -10, while lane 3 is already correct.

Binary file not shown.

Before

Width:  |  Height:  |  Size: 90 KiB

Binary file not shown.

Before

Width:  |  Height:  |  Size: 98 KiB

Binary file not shown.

Before

Width:  |  Height:  |  Size: 101 KiB

Binary file not shown.

Before

Width:  |  Height:  |  Size: 102 KiB

Binary file not shown.

Before

Width:  |  Height:  |  Size: 103 KiB