gaspar-ilom
Contributor

@gaspar-ilom gaspar-ilom commented Mar 9, 2025

T440p/W541 only. Untested, and only for external flashers. -> 10s spent in romstage, with a lot of debug info: the cbmem console log still gets truncated even with cbmem enlarged to 1 MB.

Quickly hacked together. Probably I still missed something, but I will let you have a look.
Takes upstream NRI patch train from https://review.coreboot.org/c/coreboot/+/64186/9 and changes Heads coreboot configs for t440p/w541 to test results on top of coreboot 24.12


  • @gaspar-ilom cherry-pick tlaurion@e42b913
  • edit board configs docs
  • understand what happens in the 10s spent in romstage, where the user is left in the dark without a bootsplash, unlike on other boards
  • Decide if 19s of boot time before reaching Heads is good enough (vs 50s with preppy's MRC blob under master)
  • Document
  • merge

@tlaurion See also #1825 (comment) and #1711 (comment)

TODOs:

Before merge:

So t440p/w541 board owners, now would be a good time to compare ROMs from before and after this PR:

  • sudo systemd-analyze blame
  • suspend/resume
  • cbmem -t on master vs this PR
  • sluggishness/unresponsiveness felt/measured

Board owners:

AFTER MERGE:

  • remove references in board owners haswell docs stating that MRC blobs are needed
  • others?

EDIT: 4cb6985 should probably be merged into #1908 as a cleanup -> done

@tlaurion
Collaborator

tlaurion commented Mar 9, 2025

EDIT: 4cb6985 should probably be merged into #1908 as a cleanup

@gaspar-ilom done

tlaurion added a commit to tlaurion/heads that referenced this pull request Mar 9, 2025
…oreboot.modify_and_save_oldconfig_in_place

Input for linuxboot#1923

Signed-off-by: Thierry Laurion <insurgo@riseup.net>
@tlaurion
Collaborator

tlaurion commented Mar 9, 2025

@gaspar-ilom cherry-pick tlaurion@e42b913

It would be nice if you gave repro instructions under 1b3cd51 for dumping the patches in the right place, so that audits, reproduction, and future patchsets that need cherry-picking can use this as a reference.

@tlaurion
Collaborator

tlaurion commented Mar 9, 2025

Wow @gaspar-ilom ! That was fast!

Edited OP for testing!

@MattClifton76

Does this need tested? I have everything still out and setup if it doesn't work.

@tlaurion
Collaborator

tlaurion commented Mar 9, 2025

Does this need tested? I have everything still out and setup if it doesn't work.

@MattClifton76 :
Testing is needed indeed! Only for t440p and w541. The changes I suggested for the coreboot configs are cosmetic on my side, and the CircleCI builds succeeded; testing will tell us the state of NRI (no more blobs for memory init).

See the OP for what to compare. Suspend/resume needs to work, and if there are no notable performance regressions, this will pass the tests here.

The other changes needed are basically docs-related.

Edit: @MattClifton76 you do not seem to be a board owner for either of the boards, though.

gaspar-ilom pushed a commit to gaspar-ilom/heads that referenced this pull request Mar 9, 2025
…oreboot.modify_and_save_oldconfig_in_place

Input for linuxboot#1923

Signed-off-by: Thierry Laurion <insurgo@riseup.net>
@gaspar-ilom
Contributor Author

* [ ]  edit board configs docs

Is that what you had in mind? I could not find any references in the files under boards/. Do you think we should mention NRI there?

* [ ]  remove references in board owners haswell docs telling MRC blobs are needed

Is that referring to heads-wiki?

@gaspar-ilom
Contributor Author

Would be nice if you gave repro instructions under 1b3cd51 to dump patches in the right place for audit/repro/future patchsets needing to be cherry-picked for future work to use this as ref

Hmm, do you think I should change the commit message or where exactly do you mean?

The steps I did:

  1. In coreboot: git checkout 5d291de6011a56bfd767c4bcdfdc3aa6ee87a2dd && git format-patch 7b36319fd9..HEAD --start-number=10. 7b36319fd9 is the last commit before the haswell-NRI patch train. Then move the resulting patch files under respective patches/coreboot-24.12/ directory in heads
  2. Remove make dependencies on the mrc.bin blobs and the scripts to create these blobs from the board config and in the blob directory.
  3. Run make menuconfig for the two haswell boards. (I did that in coreboot, hence the diff in tlaurion@e42b913.) Load the old config file. Under chipset: select "[NOT COMPLETE] Use native raminit". Save in place.

I think that's it. Add to the commit message?
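
Step 1 above can be sketched as a small shell helper. This is a minimal sketch, not the script used for this PR: the function name `export_patch_train` and its argument order are made up for illustration, and it uses `git format-patch base..tip` directly rather than checking out the tip first, which should produce the same patch files.

```shell
#!/bin/sh
# Hypothetical helper: export a commit range as numbered patch files.
#   $1 repo    path to a git checkout (for example, coreboot)
#   $2 base    last commit *before* the patch train
#   $3 tip     last commit *of* the patch train
#   $4 outdir  absolute path of the directory that receives the patches
#   $5 start   number given to the first patch file
export_patch_train() {
    repo=$1
    base=$2
    tip=$3
    outdir=$4
    start=$5
    mkdir -p "$outdir" || return 1
    # -o writes the patches to outdir instead of the current directory.
    git -C "$repo" format-patch "$base..$tip" \
        --start-number="$start" -o "$outdir"
}
```

With the values from the steps above, and assuming a coreboot checkout at the hypothetical path `~/coreboot` and the command being run from the heads tree, the call would be `export_patch_train ~/coreboot 7b36319fd9 5d291de6011a56bfd767c4bcdfdc3aa6ee87a2dd "$PWD/patches/coreboot-24.12" 10`.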

@MattClifton76

MattClifton76 commented Mar 9, 2025

[image]
Fresh boot: [Screenshot_2025-03-09_19-46-07]
From suspend: [Screenshot_2025-03-09_19-58-07]

Flashed, re-ownership, LUKS, Qubes; suspend to RAM appears to work on Qubes when selected from the menu.

@tlaurion
Collaborator

tlaurion commented Mar 10, 2025

[image] Fresh boot: [Screenshot_2025-03-09_19-46-07] From suspend: [Screenshot_2025-03-09_19-58-07]

Flashed, re-ownership, LUKs, Qubes, suspend to ram appears to work on Qubes when selected from the menu.

Awesome! so this works with tlaurion@e42b913, really good news! Thanks @MattClifton76

@tlaurion
Collaborator

tlaurion commented Mar 10, 2025

* [ ]  edit board configs docs

Is that what you had in mind? I could not find any references in the files under boards/. Do you think we should mention NRI there?

* [ ]  remove references in board owners haswell docs telling MRC blobs are needed

Is that referring to heads-wiki?

@gaspar-ilom : Quick searches below. Neglect heads-wiki for now; that can be done later. There is no t440p/w541 disassembly guide nor anything else under the wiki: these were promised by the original porters, who moved on to other things after the merge. One day those will be contributed, or not, by board owners wanting to contribute back :)


user@localhost:~/heads-wiki$ find ./ -name "*.md" | xargs grep -Rni mrc
./About/Heads-threat-model.md:169: (Intel's MRC and ME firmware, for instance), but the bulk of the vendor
./About/FAQ.md:104:Maybe. x230 has very few (MRC) since it has native vga init.
user@localhost:~/heads-wiki$ cd ~/heads
user@localhost:~/heads$ grep -Rni mrc boards/
user@localhost:~/heads$ grep -Rni mrc blobs/
blobs/t440p/README.md:10:- `mrc.bin` - Consists of Intel’s Memory Reference Code (MRC) and [is used to initialize the DRAM](https://doc.coreboot.org/northbridge/intel/haswell/mrc.bin.html).
blobs/t440p/README.md:17:When building any T440p board variant with `make`, the build system will download a copy of the MRC and Intel ME. We extract `mrc.bin` from a Chromebook firmware image and `me.bin` from a Lenovo firmware update.
blobs/w541/README.md:10:- `mrc.bin` - Consists of Intel’s Memory Reference Code (MRC) and [is used to initialize the DRAM](https://doc.coreboot.org/northbridge/intel/haswell/mrc.bin.html).
blobs/w541/README.md:17:When building any W541 board variant with `make`, the build system will download a copy of the MRC and Intel ME. We extract `mrc.bin` from a Chromebook firmware image and `me.bin` from a Lenovo firmware update.
user@localhost:~/heads$ 

@tlaurion
Collaborator

tlaurion commented Mar 10, 2025

Would be nice if you gave repro instructions under 1b3cd51 to dump patches in the right place for audit/repro/future patchsets needing to be cherry-picked for future work to use this as ref

Hmm, do you think I should change the commit message or where exactly do you mean?

The steps I did:

1. In coreboot: `git checkout 5d291de6011a56bfd767c4bcdfdc3aa6ee87a2dd && git format-patch 7b36319fd9..HEAD --start-number=10`. 7b36319fd9 is the last commit before the haswell-NRI patch train. Then move the resulting patch files under respective `patches/coreboot-24.12/` directory in heads

2. Remove make dependencies on the mrc.bin blobs and the scripts to create these blobs from the board config and in the blob directory.

3. Run make menuconfig for the two haswell boards. (I did that in coreboot, hence the diff in [tlaurion@e42b913](https://github.com/tlaurion/heads/commit/e42b913a122626471a438c3c4eec72f340aa5940).) Load the old config file. Under chipset: select "[NOT COMPLETE] Use native raminit". Save in place.

I think that's it. Add to the commit message?

Yes. That's what I try to do with everything: commit messages should always contain a "repro" section where relevant, so others can arrive at the same result. Here, one would have to replicate your exact steps to make sure that the patches you took from coreboot and the coreboot patches you put in the patch directory match. Reproducing that is the job of whoever merges the patchwork; otherwise we trust blindly, which is not recommended for security projects. Patches should be bit-for-bit the same as upstream, and be patched with another patch if we need to change something from where we got it. That also shows upstream what to modify if they want to replicate what was done here, and lets CircleCI arrive at a final hash that even coreboot devs would be able to replicate. It was once suggested that Heads become a base for testing patches for boards used by real users. This is kind of what we are doing here.

Awesome and quick work @gaspar-ilom :) Those board owners should give you a tip if you tell them where to do so! I love just guiding here; it seems like you get the gist of Heads! Thank you for your collaboration!

Note on your point 3, and my commit message for a206e15
./docker_repro.sh make BOARD=UNTESTED_XYZ-maximized coreboot.modify_and_save_oldconfig_in_place
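
The "bit by bit the same" check described above could be sketched as follows. This is a minimal sketch under assumptions: the function names `patch_manifest` and `patch_dirs_match` are made up for illustration, `sha256sum` is assumed to be available, and the reviewer is assumed to have regenerated the patches into a second directory. Note that `git format-patch` appends the git version as a signature by default, so patches regenerated with a different git version may differ in their last lines.

```shell
#!/bin/sh
# Hypothetical helper: print "<sha256>  <path>" for every *.patch file
# below a directory, sorted by path so two manifests are comparable.
patch_manifest() {
    (
        cd "$1" || exit 1
        find . -type f -name '*.patch' | LC_ALL=C sort |
            while IFS= read -r f; do
                sha256sum "$f"
            done
    )
}

# Succeeds only if both directories hold the same patch file names
# with byte-identical contents.
patch_dirs_match() {
    [ "$(patch_manifest "$1")" = "$(patch_manifest "$2")" ]
}
```

A reviewer could then run `patch_dirs_match regenerated/ patches/coreboot-24.12/ && echo identical`, where `regenerated/` is a hypothetical directory holding the freshly exported patches.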

@tlaurion
Collaborator

tlaurion commented Mar 10, 2025

@gaspar-ilom since @MattClifton76 confirmed things work on t440p, you could as well do
./docker_repro.sh make BOARD=UNTESTED_XYZ board.move_untested_to_tested

Which from Makefile does

board.move_untested_to_tested:
        @echo "Moving $(BOARD) from UNTESTED to tested status"
        @NEW_BOARD=$$(echo $(BOARD) | sed 's/^UNTESTED_//'); \
        INCLUDE_BOARD=$$(grep "include \$$(pwd)/boards/" boards/$(BOARD)/$(BOARD).config | sed 's/.*boards\/\(.*\)\/.*/\1/'); \
        NEW_INCLUDE_BOARD=$$(echo $$INCLUDE_BOARD | sed 's/^UNTESTED_//'); \
        echo "Updating config file: boards/$(BOARD)/$(BOARD).config"; \
        sed -i 's/$(BOARD)/'$${NEW_BOARD}'/g' boards/$(BOARD)/$(BOARD).config; \
        sed -i 's/'$$INCLUDE_BOARD'/'$$NEW_INCLUDE_BOARD'/g' boards/$(BOARD)/$(BOARD).config; \
        echo "Renaming config file to $${NEW_BOARD}.config"; \
        mv boards/$(BOARD)/$(BOARD).config boards/$(BOARD)/$${NEW_BOARD}.config; \
        echo "Renaming board directory to $${NEW_BOARD}"; \
        mv boards/$(BOARD) boards/$${NEW_BOARD}; \
        echo "Updating .circleci/config.yml"; \
        sed -i "s/$(BOARD)/$${NEW_BOARD}/g" .circleci/config.yml; \
        echo "Operation completed for $(BOARD) -> $${NEW_BOARD}"

Each time I have to do something that I feel I will have to redo in the future, I add helpers. Either in global Makefile or in modules/* makefiles
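
To see what the target above would rename before running it, the prefix-stripping step can be tried on its own. This is a minimal sketch: the function name `preview_board_rename` and the board name used in the example are made up, and the function only prints paths; it does not touch any files.

```shell
#!/bin/sh
# Hypothetical dry run of the rename done by board.move_untested_to_tested:
# strip a leading "UNTESTED_" the same way the Makefile recipe does and
# print the old and new config paths.
preview_board_rename() {
    board=$1
    new_board=$(printf '%s\n' "$board" | sed 's/^UNTESTED_//')
    printf 'boards/%s/%s.config -> boards/%s/%s.config\n' \
        "$board" "$board" "$new_board" "$new_board"
}
```

For example, `preview_board_rename UNTESTED_demo-board` prints `boards/UNTESTED_demo-board/UNTESTED_demo-board.config -> boards/demo-board/demo-board.config`.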

The current helpers are

user@localhost:~/heads$ make 
Display all 108 possibilities? (y or n)
all                                                          hidapi                                                       mbedtls.clean
bash                                                         hidapi.clean                                                 modules.clean
bash.clean                                                   initrd                                                       msrtools
board.move_tested_to_unmaintained                            initrd.clean                                                 msrtools.clean
board.move_tested_to_untested                                inject_gpg                                                   musl-cross-make
board.move_unmaintained_to_tested                            json-c                                                       musl-cross-make.clean
board.move_untested_to_tested                                json-c.clean                                                 ncurses
board.move_untested_to_unmaintained                          kexec                                                        ncurses.clean
busybox                                                      kexec.clean                                                  npth
busybox.clean                                                libaio                                                       npth.clean
cairo                                                        libaio.clean                                                 packages
cairo.clean                                                  libassuan                                                    payload
coreboot-24.12                                               libassuan.clean                                              pciutils
coreboot-24.12.clean                                         libgcrypt                                                    pciutils.clean
coreboot.modify_and_save_oldconfig_in_place                  libgcrypt.clean                                              pinentry
coreboot.modify_defconfig_in_place                           libgpg-error                                                 pinentry.clean
coreboot.save_in_defconfig_format_in_place                   libgpg-error.clean                                           pixman
coreboot.save_in_oldconfig_format_in_place                   libksba                                                      pixman.clean
cryptsetup2                                                  libksba.clean                                                popt
cryptsetup2.clean                                            libpng                                                       popt.clean
dropbear                                                     libpng.clean                                                 qrencode
dropbear.clean                                               libusb                                                       qrencode.clean
e2fsprogs                                                    libusb.clean                                                 real.clean
e2fsprogs.clean                                              linux                                                        real.gitclean
echo_modules                                                 linuxboot.run                                                real.gitclean_keep_packages
exfatprogs                                                   linux.clean                                                  real.gitclean_keep_packages_and_build
exfatprogs.clean                                             linux.modify_and_save_defconfig_in_place                     real.remove_canary_files-extract_patch_rebuild_what_changed
fbwhiptail                                                   linux.modify_and_save_oldconfig_in_place                     run
fbwhiptail.clean                                             linux.prompt_for_new_config_options_for_kernel_version_bump  tpmtotp
flashprog                                                    linux.save_in_defconfig_format_in_place                      tpmtotp.clean
flashprog.clean                                              linux.save_in_olddefconfig_format_in_place                   util-linux
flashtools                                                   linux.save_in_versioned_defconfig_format                     util-linux.clean
flashtools.clean                                             linux.save_in_versioned_oldconfig                            zlib
FORCE                                                        lvm2                                                         zlib.clean
gpg2                                                         lvm2.clean                                                   zstd
gpg2.clean                                                   mbedtls                                                      zstd.clean

RE-EDIT: @gaspar-ilom : please cherry-pick tlaurion@af84b6a (note order of repro notes: hotp includes non-hotp board variants) :)

@tlaurion
Collaborator

tlaurion commented Mar 10, 2025

EDITED @gaspar-ilom please cherry-pick tlaurion@1e23270

* [ ]  add @MattClifton76 to t440p owners under BOARD_TESTERS.md :)

@tlaurion
Collaborator

@gaspar-ilom : do you still have w541 and can report everything kosher?

gaspar-ilom pushed a commit to gaspar-ilom/heads that referenced this pull request Mar 10, 2025
…oreboot.modify_and_save_oldconfig_in_place

Input for linuxboot#1923

Signed-off-by: Thierry Laurion <insurgo@riseup.net>
@gaspar-ilom
Contributor Author

gaspar-ilom commented Mar 10, 2025

Yes. That's what I try to do with everything: commit messages should always contain a "repro" section where relevant, so others can arrive at the same result. Here, one would have to replicate your exact steps to make sure that the patches you took from coreboot and the coreboot patches you put in the patch directory match. Reproducing that is the job of whoever merges the patchwork; otherwise we trust blindly, which is not recommended for security projects. Patches should be bit-for-bit the same as upstream, and be patched with another patch if we need to change something from where we got it. That also shows upstream what to modify if they want to replicate what was done here, and lets CircleCI arrive at a final hash that even coreboot devs would be able to replicate. It was once suggested that Heads become a base for testing patches for boards used by real users. This is kind of what we are doing here.

Maybe it would be worth writing a short section on commit (message) guidelines about that in the heads-wiki. Mention this and maybe a few other things such as signing and signing-off commits. It is probably also one of those recurring tasks to get contributors to do this :-)

EDIT:

Awesome and quick work @gaspar-ilom :) those board owners should give you a tip if you tell where to do so! I love just guiding here, seems like you get the gist of Heads here! Thank you for your collaborations!

If anyone wants to give a tip it should be to you or the project in general. I see this as a small contribution I can make. I am using it on my boards too after all. And tbh. this one was such a small effort compared to the porting of the T480, but you know that @tlaurion Anyway, I would not want to take a tip as I might just disappear from the project at some point. The plus side of all work for the T480 under your guidance is that I am now much more familiar with the code base and the build system. So a contribution like this one takes a lot less time.

@gaspar-ilom
Contributor Author

@gaspar-ilom : do you still have w541 and can report everything kosher?

Still have it. Gonna test later.

@gaspar-ilom
Contributor Author

@gaspar-ilom cherry-pick tlaurion@e42b913

Would be nice if you gave repro instructions under 1b3cd51 to dump patches in the right place for audit/repro/future patchsets needing to be cherry-picked for future work to use this as ref

done

@gaspar-ilom
Contributor Author

gaspar-ilom commented Mar 10, 2025

@gaspar-ilom : do you still have w541 and can report everything kosher?

I have just tested with f3eb374

  • W541 works
    • but the boot is not as fast as I would have hoped. It took 16s from pressing the power button to showing the bootsplash and then a few more before the gui came up. This is way better than with the mrc.bin but does not compare to the T430 for instance where I get the boot splash almost immediately after pressing the power button. For me this is good enough.
  • T430 works.
    • Just tested this board to verify no regressions and anyway still wanted to flash after updating coreboot to 24.12.

@MattClifton76 have you measured boot time for T440p?

@tlaurion What is missing so that this can be merged? Can you do the review? If not who should we poke?

@MattClifton76

@gaspar-ilom I just timed it: 11-12 seconds to get to the splash screen. Much longer to get into Qubes because of LUKS and logging in. It's definitely much slower than my T480. Is there still some code refinement that needs to happen upstream? Will it get faster as coreboot matures?

gaspar-ilom pushed a commit to gaspar-ilom/heads that referenced this pull request Mar 10, 2025
…oreboot.modify_and_save_oldconfig_in_place

Input for linuxboot#1923

Signed-off-by: Thierry Laurion <insurgo@riseup.net>
@tlaurion
Collaborator

tlaurion commented Mar 10, 2025

@MattClifton76 @gaspar-ilom

The proper tool to get coreboot boot time measurements is cbmem -t.

Console logs come from cbmem -1 (that is a one); please share that output as a log so I can have eyes on it. A diff between t430 and t440p should tell us a lot here.

So

  • Heads recovery shell
  • mount-usb --mode rw
  • cbmem -t > /media/cbmem_stages_time.txt
  • cbmem -1 > /media/cbmem_console.txt
  • umount /media

Upload cbmem_stages_time.txt and cbmem_console.txt.
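
The two capture commands above can be wrapped in one helper so that both files are always produced together. This is a minimal sketch: the function name `collect_cbmem_logs` is made up for illustration, and it assumes `cbmem` is on the PATH and that the USB drive is already mounted read-write on the output directory.

```shell
#!/bin/sh
# Hypothetical helper: save coreboot stage timings and the console log
# into one directory (default /media, as in the steps above).
collect_cbmem_logs() {
    outdir=${1:-/media}
    # -t prints the timestamp table, -1 prints the console log of the last boot.
    cbmem -t > "$outdir/cbmem_stages_time.txt" || return 1
    cbmem -1 > "$outdir/cbmem_console.txt" || return 1
    ls -l "$outdir/cbmem_stages_time.txt" "$outdir/cbmem_console.txt"
}
```

From the Heads recovery shell this would be run between `mount-usb --mode rw` and `umount /media`, for example as `collect_cbmem_logs /media`.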

Also, coreboot config shows debug statements here and there. Nevermind.

We need a baseline prior to tuning things, to understand at least where time is spent, and why.


@gaspar-ilom once the 24.12 PR is merged, this PR should be rebased with --signoff so that comments here are only relevant to the NRI port effort. Nobody will look at all the changes coming from the T480 and 24.12 bump just to check NRI-related config changes.

I'm out of time today but this is where the fun starts. One would have to explore the changes and Kconfig dependencies and what can be done here.


12s at each boot means memory is most probably retrained at each boot. A high-level analysis of the upstream patchwork suggests that this should not be the case.

This is where collaboration upstream starts. Either here or in a subsequent PR.

But first, we need to understand the config options related to NRI. At first glance there are none outside of tweaking numerical values... Also, the Kconfig option says S3 suspend/resume is not working... while it is. I think it just got merged to stop it bitrotting and conflicting with the rest of the code base. The future of NRI starts here (while being aware that memory training code is the most complicated part, and really hard to reverse. So amazing work here already.)

My main concern here is to understand what happens when there is no bootsplash. Bootsplash shown on romstage. So hypothesis is that mrc cache is not reused. But that needs to be proven with logs.

@MattClifton76

@tlaurion here is the requested txt file.
cbmem_stages_time.txt

@tlaurion
Collaborator

@tlaurion here is the requested txt file.
cbmem_stages_time.txt

You were a bit too quick: I added cbmem -1 but not in my steps... Logs would help, and the config gives timestamps there as well, so the picture will be complete and I will be able to compare with an x230 posting the same logs.

Sorry for the edit while you were doing that. See #1923 (comment) again.

@Th3Fanbus

So, w541_usb_ge96f426.log has a raminit log where I don't see anything unusual (it's just four dual-rank, reference card F DDR3 SO-DIMMs getting trained). As there aren't any timestamps during raminit, I can't tell from the log file where the long boot times may come from. Even with logging, I would expect boot times to be less than 5 seconds.

If you watch the EHCI debug console in real time, are the romstage messages getting printed unusually slowly? Messages should be going pretty fast (compare against ramstage logs), unless there's big pauses somewhere. If so, it would be great to know at which point in the log those big pauses occur (note down the messages right before each pause).

@tlaurion
Collaborator

Result under tlaurion@e96f426 @gaspar-ilom

@tlaurion @Th3Fanbus here are the logs with increased debug level and ram init debugging enabled:
w541_usb_ge96f426.log
Btw, it takes way longer to boot than with the previous commit with a lower debug level.

@gaspar-ilom @Th3Fanbus

From log

[DEBUG]  MRC: Checking cached data update for 'RW_MRC_CACHE'.
[DEBUG]  flash size 0x2800000 bytes
[INFO ]  SF: Detected 00 0000 with sector size 0x1000, total 0x2800000
[ERROR]  SF size 0x2800000 does not correspond to CONFIG_ROM_SIZE 0xc00000!!
[NOTE ]  MRC: no data in 'RW_MRC_CACHE'
[DEBUG]  MRC: cache data 'RW_MRC_CACHE' needs update.
[DEBUG]  SF: Successfully written 2 bytes @ 0x20000
[DEBUG]  SF: Successfully written 2 bytes @ 0x20002
[DEBUG]  SF: Successfully written 20 bytes @ 0x20020
[DEBUG]  SF: Successfully written 4092 bytes @ 0x20034
[DEBUG]  MRC: updated 'RW_MRC_CACHE'.

That's what caught my attention. @Th3Fanbus?

ping @Th3Fanbus I do not think this is normal and seems to prevent MRC cache to be saved and reused no?

@tlaurion
Collaborator

tlaurion commented Apr 11, 2025

@Th3Fanbus @gaspar-ilom :
Can't ts be used to prepend a timestamp to every serial console line received by the app?

I find ts really useful in situations like this, where program output doesn't include timestamps where I need them. The result looks like ([...] redacted):

sudo wyng-util-qubes [...] | ts

[...]
Apr 11 10:38:56 Last updated 2025-04-11 09:57:46.894347 (-04:00)
Apr 11 10:41:50 
Apr 11 10:41:50 Preparing snapshots in '/var/lib/qubes/'...
Apr 11 10:41:50   Queuing full scan of import 'boot'
Apr 11 10:43:22 Acquiring deltas.
Apr 11 10:43:23 
Apr 11 10:43:23 Sending backup session 20250411-103854:
[...]

Which shows clearly where time was spent between each line printed on console.
minicom | ts > serial_nri_coreboot_output.txt ?
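
To illustrate how such ts-stamped output could be searched for pauses, here is a minimal Python sketch. It assumes the default ts format shown above (month, day, HH:MM:SS, then the original line); the function name `find_gaps` and the hard-coded year are illustrative assumptions, not part of ts or coreboot.

```python
from datetime import datetime


def find_gaps(lines, year=2025):
    """Return (seconds, line_before, line_after) for consecutive ts-stamped lines.

    `year` is an assumption: the default ts format does not print it.
    """
    gaps = []
    prev_stamp, prev_text = None, None
    for line in lines:
        parts = line.rstrip("\n").split(" ", 3)  # month, day, time, rest
        if len(parts) < 3:
            continue  # not a ts-stamped line (e.g. "[...]")
        try:
            stamp = datetime.strptime(
                f"{year} {parts[0]} {parts[1]} {parts[2]}", "%Y %b %d %H:%M:%S"
            )
        except ValueError:
            continue  # line does not start with a timestamp
        text = parts[3] if len(parts) > 3 else ""
        if prev_stamp is not None:
            seconds = int((stamp - prev_stamp).total_seconds())
            gaps.append((seconds, prev_text, text))
        prev_stamp, prev_text = stamp, text
    return gaps


# Sample lines copied from the ts output quoted above.
sample = [
    "[...]",
    "Apr 11 10:38:56 Last updated 2025-04-11 09:57:46.894347 (-04:00)",
    "Apr 11 10:41:50 ",
    "Apr 11 10:41:50 Preparing snapshots in '/var/lib/qubes/'...",
    "Apr 11 10:41:50   Queuing full scan of import 'boot'",
    "Apr 11 10:43:22 Acquiring deltas.",
    "Apr 11 10:43:23 ",
    "Apr 11 10:43:23 Sending backup session 20250411-103854:",
]

gaps = find_gaps(sample)
# Print the two longest pauses and the message printed right before each.
for seconds, before, _after in sorted(gaps, key=lambda g: g[0], reverse=True)[:2]:
    print(f"{seconds:4d} s pause after: {before.strip()!r}")
```

Run against a captured serial log, the largest gaps would point at the messages printed right before each pause.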

@Th3Fanbus

What exactly doesn't seem normal? I already said that [ERROR] SF size 0x2800000 does not correspond to CONFIG_ROM_SIZE 0xc00000!! seems wrong: it detects as if the flash chip size is much larger than CONFIG_ROM_SIZE, were any flash chips replaced?

The MRC cache itself seems to be populated properly, remember this is a first-time boot so the cache was empty.
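
For reference, the two sizes in that error line can be put side by side. The following is a minimal Python sketch of the arithmetic only; the hexadecimal values come from the log line, while the 4 MB + 8 MB chip sizes are the stock W541 sizes reported elsewhere in this thread and are used here as an assumption.

```python
MIB = 1024 * 1024

# Values copied from the log line:
# "SF size 0x2800000 does not correspond to CONFIG_ROM_SIZE 0xc00000!!"
sf_size = 0x2800000          # size reported by the SPI flash driver
config_rom_size = 0xC00000   # CONFIG_ROM_SIZE from the coreboot config

# Assumption: two stock flash chips of 4 MiB and 8 MiB.
stock_total = (4 + 8) * MIB


def to_mib(n_bytes: int) -> float:
    """Convert a byte count to MiB."""
    return n_bytes / MIB


print(f"SF size:         {to_mib(sf_size):.0f} MiB")
print(f"CONFIG_ROM_SIZE: {to_mib(config_rom_size):.0f} MiB")
print(f"Stock chips:     {to_mib(stock_total):.0f} MiB")
print("CONFIG_ROM_SIZE matches stock chips:", config_rom_size == stock_total)
```

Under that assumption, CONFIG_ROM_SIZE agrees with the combined size of the two chips, and only the size reported by the flash driver is the outlier.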

@tlaurion
Collaborator

@Th3Fanbus

What exactly doesn't seem normal? I already said that [ERROR] SF size 0x2800000 does not correspond to CONFIG_ROM_SIZE 0xc00000!! seems wrong: it detects as if the flash chip size is much larger than CONFIG_ROM_SIZE, were any flash chips replaced?

No. That's exactly the question here.

If you look at the full oldconfig file in the PR, the ROM size fits the BIOS region. Unfortunately I cannot help more here than repeating that this seems to be the cause of the issue, and that MRC training doesn't seem to stick and seems to be constantly redone, but post-first-boot logs are missing.

@gaspar-ilom ping on piping the logs through | ts on host receiving the logs so we have timestamps for each line of cbmem output over serial.

Nothing much else I can say here.

@Th3Fanbus

@gaspar-ilom did you happen to replace the flash chips on this computer with bigger ones? I need to understand why the logs contain the error I quoted earlier.

@tlaurion
Collaborator

tlaurion commented May 2, 2025

Will try to speed this up a little. Long delays and long context switches on things I'm not knowledgeable about are not easy for me here.

So the last commit from which @gaspar-ilom extracted logs was 6dfe541, in follow-up comment #1923 (comment) (even if the commit ID mismatches, @gaspar-ilom)

Result under tlaurion@e96f426 @gaspar-ilom

@tlaurion @Th3Fanbus here are the logs with increased debug level and ram init debugging enabled:

w541_usb_ge96f426.log

Btw, it takes way longer to boot than with the previous commit with a lower debug level.

Where my comment at #1923 (comment) pinpointed

[ERROR] SF size 0x2800000 does not correspond to CONFIG_ROM_SIZE 0xc00000!!

I checked all the comments of this PR @Th3Fanbus and recall your comment on ME being at fault #1923 (comment) but cannot find your comment on the size difference error. Sorry if I'm still missing it.

@gaspar-ilom uses w541 and roms built from CircleCI to test and report to this PR to make things reproducible for all board owners.

Therefore, from that commit 6dfe541's coreboot's oldconfig file

My question still stands: where is "SF size 0x2800000" obtained from? @gaspar-ilom correct me if I'm wrong, but you use ROMs from this PR and the stock SPI chips, right?

@Th3Fanbus

@tlaurion Sorry, the above does not tell me anything I didn't know already.

@tlaurion
Collaborator

tlaurion commented May 2, 2025

@tlaurion Sorry, the above does not tell me anything I didn't know already.

I just wanted to speed things up between @gaspar-ilom's answers by putting the hypothesis somewhere other than

@gaspar-ilom did you happen to replace the flash chips on this computer with bigger ones? I need to understand why the logs contain the error I quoted earlier.

and restated that I do not understand that error either, which appears twice in the logs:


[DEBUG]  Executing raminit task RAMINITEND
[DEBUG]  Waiting for mc_init_done acknowledgement... DONE!

[DEBUG]  ME: Requested 0MB UMA
[DEBUG]  ME: FW Partition Table      : OK
[DEBUG]  ME: Bringup Loader Failure  : NO
[DEBUG]  ME: Firmware Init Complete  : NO
[DEBUG]  ME: Manufacturing Mode      : YES
[DEBUG]  ME: Boot Options Present    : NO
[DEBUG]  ME: Update In Progress      : NO
[DEBUG]  ME: Current Working State   : Initializing
[DEBUG]  ME: Current Operation State : Bring up
[DEBUG]  ME: Current Operation Mode  : Debug
[DEBUG]  ME: Error Code              : No Error
[DEBUG]  ME: Progress Phase          : BUP Phase
[DEBUG]  ME: Power Management Event  : Pseudo-global reset
[DEBUG]  ME: Progress Phase State    : 0x4d
[DEBUG]  CBMEM:
[DEBUG]  IMD: root @ 0x7f7ff000 254 entries.
[DEBUG]  IMD: root @ 0x7f7fec00 62 entries.
[DEBUG]  FMAP: area COREBOOT found @ 30200 (12385792 bytes)
[DEBUG]  External stage cache:
[DEBUG]  IMD: root @ 0x7fbff000 254 entries.
[DEBUG]  IMD: root @ 0x7fbfec00 62 entries.
[DEBUG]  FMAP: area RW_MRC_CACHE found @ 20000 (65536 bytes)
[DEBUG]  MRC: Checking cached data update for 'RW_MRC_CACHE'.
[DEBUG]  flash size 0x2800000 bytes
[INFO ]  SF: Detected 00 0000 with sector size 0x1000, total 0x2800000
[ERROR]  SF size 0x2800000 does not correspond to CONFIG_ROM_SIZE 0xc00000!!
[NOTE ]  MRC: no data in 'RW_MRC_CACHE'
[DEBUG]  MRC: cache data 'RW_MRC_CACHE' needs update.

[INFO ]  Found TPM 1.2 ST33ZP24 (0x0000) by ST Microelectronics (0x104a)
[DEBUG]  PNP: 0c31.0 enabled
[DEBUG]  scan_bus: bus PCI: 00:00:1f.0 finished in 39 msecs
[DEBUG]  PCI: 00:00:1f.3 scanning...
[DEBUG]  scan_bus: bus PCI: 00:00:1f.3 finished in 0 msecs
[DEBUG]  scan_bus: bus DOMAIN: 00000000 finished in 364 msecs
[DEBUG]  scan_bus: bus Root Device finished in 382 msecs
[INFO ]  done
[DEBUG]  BS: BS_DEV_ENUMERATE run times (exec / console): 4 / 393 ms
[DEBUG]  BM-LOCKDOWN: Enabling boot media protection scheme 'readonly' using CTRL...
[DEBUG]  flash size 0x2800000 bytes
[INFO ]  SF: Detected 00 0000 with sector size 0x1000, total 0x2800000
[ERROR]  SF size 0x2800000 does not correspond to CONFIG_ROM_SIZE 0xc00000!!
[INFO ]  spi_flash_protect: FPR 0 is enabled for range 0x00000000-0x00bfffff
[INFO ]  BM-LOCKDOWN: Enabled bootmedia protection
[DEBUG]  BS: BS_DEV_RESOURCES entry times (exec / console): 0 / 39 ms
[INFO ]  Timestamp - device configuration: 163000581575
[DEBUG]  found VGA at PCI: 00:00:02.0
[DEBUG]  Setting up VGA for PCI: 00:00:02.0

@gaspar-ilom
Contributor Author

Sorry @Th3Fanbus and @tlaurion this project got sidelined. I will provide what you asked for as soon as I find the time. In the meanwhile anyone who cares about these boards is free to contribute as documented here.

@Th3Fanbus

I made https://review.coreboot.org/87830 which should print how long each NRI task took. I expect this to be pretty quick in spite of all the logging.

tlaurion added 2 commits June 16, 2025 11:21
Signed-off-by: Thierry Laurion <insurgo@riseup.net>
…M_SETUP=y

This is merged https://review.coreboot.org/c/coreboot/+/87830 to help troubleshoot linuxboot#1923 (comment)

As of now, unknowns are:
- if NRI can work with Heads neutered ME per https://github.com/linuxboot/heads/blob/master/blobs/t440p/download-clean-me and https://github.com/linuxboot/heads/blob/master/blobs/w541/download-clean-me
- What is causing '[ERROR]  SF size 0x2800000 does not correspond to CONFIG_ROM_SIZE 0xc00000!!'
- NRI per-task spent time; this is what this commit should shed some light for both w541 and t440p

Signed-off-by: Thierry Laurion <insurgo@riseup.net>
@tlaurion
Collaborator

tlaurion commented Jun 16, 2025

@gaspar-ilom I took the liberty of pushing to your branch with git push gaspar-ilom HEAD:haswell-nri; first time I've done that, I didn't know I could.

CircleCI should detect (via its cache layer checks) the new patches/coreboot-24.12/0025-haswell-NRI-measure_per_task_execution_time.patch, reuse the layer 1 cache (musl-cross-make), and rebuild layer 2 (coreboot) and layer 3 (all build artifacts that are part of workspace reuse, up to save_cache).

TL;DR: bcf27a5 contains merged master plus @Th3Fanbus's merged patch from https://review.coreboot.org/c/coreboot/+/87830 (hinted at in #1923 (comment)), to hopefully move this along faster. Once again, we cannot expect people coming to Heads and wanting to flash t440p/w541 to tolerate 30s (30s or 50s?) of boot time with preppy's MRC blob, and we cannot conclude that NRI is fully usable as of now (this PR is in DEBUG, and its full state and problems are not yet known).

https://review.coreboot.org/c/coreboot/+/87830 aims to provide logs in the form:

    +------------------+------------+
    | Task             |      msecs |
    +------------------+------------+
    | PROCSPD          |        503 |
    | INITMPLL         |         33 |
    | CONVTIM          |         43 |
    | CONFMC           |          1 |
    | MEMMAP           |         39 |
    | JEDECINIT        |          1 |
    | PRETRAIN         |         23 |
    | SOT              |        394 |
    | RCVET            |       1448 |
    | RDMPRT           |       1088 |
    | JWRL             |       1975 |
    | OPTCOMP          |          0 |
    | POSTTRAIN        |          0 |
    | ACTIVATE         |          0 |
    | SAVE_TRAIN       |          0 |
    | SAVE_NONT        |          0 |
    | RAMINITEND       |          4 |
    +------------------+------------+
    | Total            |       5558 |
    +------------------+------------+

So that @gaspar-ilom's UART + RPi Pico debug setup (see #1923 (comment)) can provide debug logs from this PR's w541 build (coreboot 24.12 + patches/coreboot-24.12/ NRI patches + w541 coreboot oldconfig + neutered ME) with CONFIG_DEBUG_RAM_SETUP=y.

Note once more the unknowns, as listed in the commit log of bcf27a5:

  • What is causing '[ERROR] SF size 0x2800000 does not correspond to CONFIG_ROM_SIZE 0xc00000!!'
  • Whether NRI can work with neutered ME

10-4 (bitrotting PR is no fun. merging master+adapt under ce63b85 took around 1h with changes + local build test. Not sure I will rebase/merge master in the future if this PR doesn't progress soon to a merging state myself).

@tlaurion
Collaborator

@gaspar-ilom : master's patch+revert patches+reapply logic was improved in the past months and was merged back in your PR.

So doing

  • ./docker_repro.sh make BOARD=EOL_UNTESTED_w541-maximized real.remove_canary_files-extract_patch_rebuild_what_changed
  • ./docker_repro.sh make BOARD=EOL_UNTESTED_w541-maximized

Should suffice to produce a local ROM that is reproducible against CI, to test and produce debug logs for bcf27a5 (CI is producing a clean ROM as I write these lines)

@tlaurion
Collaborator

Sorry @Th3Fanbus and @tlaurion this project got sidelined. I will provide what you asked for as soon as I find the time. In the meanwhile anyone who cares about these boards is free to contribute as documented here.

Unfortunately I'm not sure we have anyone else under Heads currently willing to reproduce the coreboot debug dongle setup you built under #1923 (comment). Hopefully others will come, but as I said under #1923 (comment), I unfortunately think this is my last attempt to help current technical board owners get NRI under Heads for w541/t440p.

@gaspar-ilom
Contributor Author

@gaspar-ilom did you happen to replace the flash chips on this computer with bigger ones? I need to understand why the logs contain the error I quoted earlier.

@Th3Fanbus I have not replaced the flash chips and neither did anyone else afaict. The chips definitely have the original size 4 and 8MB.

@gaspar-ilom
Contributor Author

Thanks @tlaurion for your work on this PR.

Here are the usb logs I captured on a W541 with the circleci build for bcf27a5:

@Th3Fanbus I hope this helps. It includes the changes https://review.coreboot.org/c/coreboot/+/87830 and thus hopefully improved logs. I do not have the time (and knowledge) to analyze the logs, but I hope this helps anyway. Let me know if I missed anything.

@tlaurion
Collaborator

tlaurion commented Jun 17, 2025

To speed up things once more:

Thanks @tlaurion for your work on this PR.

Here are the usb logs I captured on a W541 with the circleci build for bcf27a5:

excerpt:
[DEBUG]  +------------------+------------+
[DEBUG]  | Task             |      msecs |
[DEBUG]  +------------------+------------+
[DEBUG]  | PROCSPD          |       2491 |
[DEBUG]  | INITMPLL         |        183 |
[DEBUG]  | CONVTIM          |        251 |
[DEBUG]  | CONFMC           |          5 |
[DEBUG]  | MEMMAP           |        240 |
[DEBUG]  | JEDECINIT        |          8 |
[DEBUG]  | PRETRAIN         |        554 |
[DEBUG]  | SOT              |       3473 |
[DEBUG]  | RCVET            |      11811 |
[DEBUG]  | RDMPRT           |       8889 |
[DEBUG]  | JWRL             |      22270 |
[DEBUG]  | ACTIVATE         |          3 |
[DEBUG]  | SAVE_TRAIN       |         28 |
[DEBUG]  | SAVE_NONT        |          0 |
[DEBUG]  | RAMINITEND       |         22 |
[DEBUG]  +------------------+------------+
[DEBUG]  | Total            |      50235 |
[DEBUG]  +------------------+------------+

excerpt:

[DEBUG]  +------------------+------------+
[DEBUG]  | Task             |      msecs |
[DEBUG]  +------------------+------------+
[DEBUG]  | PROCSPD          |       2490 |
[DEBUG]  | INITMPLL         |        183 |
[DEBUG]  | CONVTIM          |        251 |
[DEBUG]  | CONFMC           |          5 |
[DEBUG]  | MEMMAP           |        240 |
[DEBUG]  | JEDECINIT        |          8 |
[DEBUG]  | PRETRAIN         |        554 |
[DEBUG]  | SOT              |       3472 |
[DEBUG]  | RCVET            |      12050 |
[DEBUG]  | RDMPRT           |       8875 |
[DEBUG]  | JWRL             |      22261 |
[DEBUG]  | ACTIVATE         |          3 |
[DEBUG]  | SAVE_TRAIN       |         28 |
[DEBUG]  | SAVE_NONT        |          0 |
[DEBUG]  | RAMINITEND       |         22 |
[DEBUG]  +------------------+------------+
[DEBUG]  | Total            |      50449 |
[DEBUG]  +------------------+------------+
  • Yet another boot but this time I piped the output through ts as suggested here. I did not stop the time, but it seems like the boot process is even slower when USB debugging, so I am not sure how valuable the absolute times are, but maybe relative time spent in different phases of NRI can be analyzed that way: w541_subsequent_boot_bcf27a5_ts.log

So, as can be seen above, memory training happens at each boot and the MRC cache is not reused (I pointed out above that it didn't seem to be saved, but it is for @Th3Fanbus to say what is actually happening here; I cannot help much more than I already have)
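
To make the per-task breakdown easier to read, here is a minimal Python sketch that parses a task table like the excerpts above and reports where the time goes. The table text is copied from the first excerpt; the helper name `parse_tasks` and the regular expression are illustrative assumptions, and the absolute times are likely inflated by USB debug logging, so only the relative shares are meaningful.

```python
import re

# One row looks like: "[DEBUG]  | RCVET            |      11811 |"
ROW = re.compile(r"\|\s*(\w+)\s*\|\s*(\d+)\s*\|")

LOG = """
[DEBUG]  | Task             |      msecs |
[DEBUG]  | PROCSPD          |       2491 |
[DEBUG]  | INITMPLL         |        183 |
[DEBUG]  | CONVTIM          |        251 |
[DEBUG]  | CONFMC           |          5 |
[DEBUG]  | MEMMAP           |        240 |
[DEBUG]  | JEDECINIT        |          8 |
[DEBUG]  | PRETRAIN         |        554 |
[DEBUG]  | SOT              |       3473 |
[DEBUG]  | RCVET            |      11811 |
[DEBUG]  | RDMPRT           |       8889 |
[DEBUG]  | JWRL             |      22270 |
[DEBUG]  | ACTIVATE         |          3 |
[DEBUG]  | SAVE_TRAIN       |         28 |
[DEBUG]  | SAVE_NONT        |          0 |
[DEBUG]  | RAMINITEND       |         22 |
[DEBUG]  | Total            |      50235 |
"""


def parse_tasks(log_text):
    """Map task name -> milliseconds, skipping the Total row."""
    tasks = {}
    for name, msecs in ROW.findall(log_text):
        if name != "Total":  # the header row never matches (no digits)
            tasks[name] = int(msecs)
    return tasks


tasks = parse_tasks(LOG)
task_sum = sum(tasks.values())
top3 = sorted(tasks.items(), key=lambda kv: kv[1], reverse=True)[:3]
top3_share = sum(ms for _, ms in top3) / task_sum

for name, ms in top3:
    print(f"{name:<10} {ms:>6} ms  ({ms / task_sum:.1%})")
print(f"Top 3 tasks account for {top3_share:.1%} of the summed task time")
```

In this excerpt the three longest tasks (JWRL, RCVET and RDMPRT) dominate the summed time, which is where a comparison between boots would be most informative.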

@Th3Fanbus I hope this helps. It includes the changes https://review.coreboot.org/c/coreboot/+/87830 and thus hopefully improved logs. I do not have the time (and knowledge) to analyze the logs, but I hope this helps anyway. Let me know if I missed anything.

@Th3Fanbus this log, with timestamps provided by ts, should help as if you were in front of the computer.
Let us know what we can do.

Known

  • No SPI chip was replaced, as expected (the ROM is produced from the PR's oldconfig; it wouldn't have made sense for the SPI chips to be of a different size). So the error quoted previously comes from an unknown source; the config seems right and is directly linked here once more.
  • MRC cache is not used (second boot even takes longer than first boot)
  • Why ME cannot be neutered here needs to be known; otherwise we should most probably give up on the NRI effort here. Please advise @Th3Fanbus

@gaspar-ilom thanks for your time and sharing the logs. I think logs with ts output from w541_subsequent_boot_bcf27a5_ts.log should tell us more, hopefully.

@crazyfox-ua

Hey, Team!

I'd like to offer some testing efforts from my side, but my experience in coreboot debugging is limited, so probably sort of step-by-step guidance will be needed.

Hardware: T440p

  • i7-4910MQ
  • Intel HD 4600
  • 1 x AMD AE32G1339S1-UO 2Gb 1333MHz (9-9-9-24) 1.5v
  • 2 x Zifei 8GB 1866MHz PC3L-14900 1.35V
  • 2 x Micron 16GB 1600MHz PC3L-12800S (MT16KTF2G64HZ-1G6A1)
  • FTDI232 dongle

Previously I was on Coreboot 25.03 (Edk2) with NRI enabled.

2Gb or 2x8Gb worked flawlessly.

1x16Gb was able to load Tianocore Menu, but with some artifacts on the upper side of display. SuperGrub2 able to boot, Linux EFI or Windows 11 unable to boot (hangs after few seconds)

2x16Gb able to load Tianocore Menu, but upper side of display is black, SuperGrub2 able to boot, Linux EFI and Windows still hangs.

Yesterday I've updated to CB 25.06 (Edk2) with NRI enabled.

2Gb works fine

2Gb + 8Gb works fine

2x8Gb hangs on Coreboot logo (Tianocore is not loaded)

Any combinations with 16Gb (2 + 16, 8 + 16, 16 + 16) able to load Tianocore Menu with artifacts on display, SuperGrub2 able to boot (still with artifacts), but Linux EFI and Windows still hangs.

So, if I may be helpful to the project - could someone guide me how to collect debugging info?

@MattClifton76

With an FT232H dongle, scroll up I believe @gaspar-ilom posted a little guide.

@gaspar-ilom
Contributor Author

So, if I may be helpful to the project - could someone guide me how to collect debugging info?

@crazyfox-ua I think you need an FT232H as pointed out by @Th3Fanbus here.

I have posted a guide in this comment

@crazyfox-ua

Thanks @gaspar-ilom, @MattClifton76 for assistance!
Here are logs from the same hardware:

  1. For CB 25.06 with MRC blob enabled (system boots with no issues) coreboot_mrc_2x8.log

  2. For CB 25.06 with NRI enabled (system hangs on CB logo or on black screen with minor artifacts) coreboot_nri_2x8.log

Not sure which part may be useful, so adding complete log.

@Th3Fanbus

@crazyfox-ua Hmmm, interesting that NRI would fail like that. I've checked the logs and the only thing I've noticed is some differences in the memory map, but NRI's memory map should be more optimal (it requires fewer MTRRs to configure cacheability of the various memory ranges).

I've noticed the NRI log did not have CONFIG_DEBUG_RAM_SETUP=y (it's in the Debugging menu in menuconfig) so it is missing quite a bit of the debug output (including the data plots), could you please try getting another log with that option enabled?

@crazyfox-ua

@Th3Fanbus here is log with debug enabled: coreboot_debug_nri_2x8.log.

@Th3Fanbus

@crazyfox-ua Thank you. I'm stumped: I don't see anything wrong in the log (wrong enough to cause problems with the payload and/or OS). I wonder if the issues also happen with a single 8 GiB or 16 GiB SO-DIMM (even though I wouldn't expect dual-channel operation to be the culprit).

@crazyfox-ua

@Th3Fanbus thanks for confirming, for me that behavior also looks strange.

For now, it's failing to boot in any of following combinations: 8+8, 16+16, 16, 8, 2+16, 8+16.
But, it's booting absolutely fine with 2 or 2+8 (just to remind - fails with single 8).

My guess is that it isn't directly related to RAM init itself, but that it somehow affects shared video RAM and causes those artifacts and boot freezes.

Is there anything else I may collect to debug such behavior?

@tlaurion
Collaborator

@Th3Fanbus gentle ping

@Th3Fanbus

Sorry, I haven't been able to make any progress on this.

Development

Successfully merging this pull request may close these issues.

Switch Haswell boards to NRI (Native Ram Initialization)

5 participants