-
-
Notifications
You must be signed in to change notification settings - Fork 198
Use Native Ram Init (NRI) on haswell boards (not chromebox's preppy borrowed MRC blob) #1923
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
@gaspar-ilom done |
…oreboot.modify_and_save_oldconfig_in_place Input for linuxboot#1923 Signed-off-by: Thierry Laurion <insurgo@riseup.net>
@gaspar-ilom cherry-pick tlaurion@e42b913 Would be nice if you gave repro instructions under 1b3cd51 to dump patches in the right place for audit/repro/future patchsets needing to be cherry-picked for future work to use this as ref |
Wow @gaspar-ilom ! That was fast! Edited OP for testing! |
Does this need tested? I have everything still out and setup if it doesn't work. |
@MattClifton76 : See OP for comparisons needs. Suspend/resume needs to work, and if no regression on performance are notable, this will pass the tests here. Other changes needed are docs related basically. Edit: @MattClifton76 you do not seem to be board owner for neither of the boards though. |
…oreboot.modify_and_save_oldconfig_in_place Input for linuxboot#1923 Signed-off-by: Thierry Laurion <insurgo@riseup.net>
Is that what you had in mind? I could not find any references in the files under
Is that referring to heads-wiki? |
Hmm, do you think I should change the commit message or where exactly do you mean? The steps I did:
I think that's it. Add to the commit message? |
Awesome! so this works with tlaurion@e42b913, really good news! Thanks @MattClifton76
|
@gaspar-ilom : Quick searches (neglect heads-wiki, can be done later, there is no t440p/w541 disassemble guide nor anything else under Wiki: were promised by original port guys who moved to other things after the merge. One day, those will be contributed back, or not, from board owners wanting to contribute back :) )
|
Yes, that's what I try to do with everything I do so that commit messages always contains a "repro" section where relevant, so others can arrive to the same result. Here, one would have to replicate exactly your steps to make sure that patches you took from coreboot, and the coreboot patches you put in the patch directory, matches. That is the job of the person that merges the patchwork to reproduce, otherwise we trust blindly, which is not recommended for security projects. Patches should be bit by bit the same, and patched with another patch if we need to change something from where we got it. That also shows upstream what to modify if they want to replicate what was done here, and for CircleCI to arrive at the final hash which even coreboot devs would be able to replicate. It was once suggested that Heads became a base for testing patches for boards used by real users. This is kind of what we are doing here. Awesome and quick work @gaspar-ilom :) those board owners should give you a tip if you tell where to do so! I love just guiding here, seems like you get the gist of Heads here! Thank you for your collaborations! Note on your point 3, and my commit message for a206e15 |
@gaspar-ilom since @MattClifton76 confirmed things work on t440p, you could as well do Which from Makefile does
Each time I have to do something that I feel I will have to redo in the future, I add helpers. Either in global Makefile or in modules/* makefiles The current helpers are
|
RE-EDIT: @gaspar-ilom : please cherry-pick tlaurion@af84b6a (note order of repro notes: hotp includes non-hotp board variants) :) |
EDITED @gaspar-ilom please cherry-pick tlaurion@1e23270
|
@gaspar-ilom : do you still have w541 and can report everything kosher? |
dd7979f
to
f3eb374
Compare
…oreboot.modify_and_save_oldconfig_in_place Input for linuxboot#1923 Signed-off-by: Thierry Laurion <insurgo@riseup.net>
Maybe it would be worth writing a short section on commit (message) guidelines about that in the heads-wiki. Mention this and maybe a few other things such as signing and signing-off commits. It is probably also one of those recurring tasks to get contributors to do this :-) EDIT:
If anyone wants to give a tip it should be to you or the project in general. I see this as a small contribution I can make. I am using it on my boards too after all. And tbh. this one was such a small effort compared to the porting of the T480, but you know that @tlaurion Anyway, I would not want to take a tip as I might just disappear from the project at some point. The plus side of all work for the T480 under your guidance is that I am now much more familiar with the code base and the build system. So a contribution like this one takes a lot less time. |
Still have it. Gonna test later. |
done |
I have just tested with f3eb374
@MattClifton76 have you measured boot time for T440p? @tlaurion What is missing so that this can be merged? Can you do the review? If not who should we poke? |
@gaspar-ilom i just timed it, 11-12 seconds to get to splash screen. Much longer to get into qubes because of LUks and logging in. It's definitely much slower than my T480. Is there still some code refinement that needs to happen up stream? Will it get faster as coreboot matures? |
f3eb374
to
7038b25
Compare
…oreboot.modify_and_save_oldconfig_in_place Input for linuxboot#1923 Signed-off-by: Thierry Laurion <insurgo@riseup.net>
The proper tool to get coreboot boot time measurements is cbmem -t. Console logs are cbmem -1 (this is a one) output as log so I can have eyes here. A diff between t430 and t440p should tell us a lot here. So
Upload cbmem_stages_time.txt and cbmem_console.txt.
We need a baseline prior of tuning things, understand where time is spent at least, and why. @gaspar-ilom once 24.12 PR is merged, thus or should be rebased --signoff so that comment here are only relevant to NRI port effort. Nobody will look at all the changes coming from T480 to 24.12 bump to only check NRI related config changes. I'm out of time today but this is where the fun starts. One would have to explore the changes and kconfig dependencies and what can be done here. 12s at each boot means memory is most probably retrained at each boot. High level analysis of upstream patchwork stipulates that it should not be the case. This is where collaboration upstream starts. Either here or subsequent PR. But first, we need to understand config options related to NRI, but at first glance there is none outside of tweaking numerical values... But also Kconfig options says s3 suspend resume is not working..... While it is. I thing it got just merged to stop bitorotting and conflicting with other code base. Future of NRI start here (while be aware that memory training code is the most complicated part, and really hard to reverse. So amazing work here already.) My main concern here is to understand what happens when there is no bootsplash. Bootsplash shown on romstage. So hypothesis is that mrc cache is not reused. But that needs to be proven with logs. |
@tlaurion here is the requested txt file. |
You were a bit too quick, I added cbmem -1 but not in my steps... Logs would help, and config gives timestamps there as well, figurenwill be complete and I will be able to compare with x230 posting same logs. Sorry for the edit while you were doing that. See #1923 (comment) again. |
So, w541_usb_ge96f426.log has a raminit log where I don't see anything unusual (it's just four dual-rank, reference card F DDR3 SO-DIMMs getting trained). As there aren't any timestamps during raminit, I can't tell from the log file where the long boot times may come from. Even with logging, I would expect boot times to be less than 5 seconds. If you watch the EHCI debug console in real time, are the romstage messages getting printed unusually slowly? Messages should be going pretty fast (compare against ramstage logs), unless there's big pauses somewhere. If so, it would be great to know at which point in the log those big pauses occur (note down the messages right before each pause). |
ping @Th3Fanbus I do not think this is normal and seems to prevent MRC cache to be saved and reused no? |
@Th3Fanbus @gaspar-ilom : I find
Which show clearly where time was spent between each line printed on console. |
What exactly doesn't seem normal? I already said that The MRC cache itself seems to be populated properly, remember this is a first-time boot so the cache was empty. |
No. That's exactly the question here. If you look at the full oldconfig file in PR, romsize fits bios region. Unfortunately I cannot help more here than repeating that this seems to be the cause of issue, and that MRC training de esnt seem to stick and constantly be retrained, but post first boot logs are missing. @gaspar-ilom ping on piping the logs through | ts on host receiving the logs so we have timestamps for each line of cbmem output over serial. Nothing much else I can say here. |
@gaspar-ilom did you happen to replace the flash chips on this computer with bigger ones? I need to understand why the logs contain the error I quoted earlier. |
Will try to speed this up a little. Long delays and long context switches for things I'm not knowledgeable into are not easy for me here. So Last commit from which logs were extracted by @gaspar-ilom were 6dfe541 in follow up comment #1923 (comment) (even if commit id mismatches @gaspar-ilom )
Where my comment at #1923 (comment) pinpointed
I checked all the comments of this PR @Th3Fanbus and recall your comment on ME being at fault #1923 (comment) but cannot find your comment on the size difference error. Sorry if i'm still missing it. @gaspar-ilom uses w541 and roms built from CircleCI to test and report to this PR to make things reproducible for all board owners. Therefore, from that commit 6dfe541's coreboot's oldconfig file
My question still stands; where "SF size 0x2800000" is obtained from. @gaspar-ilom correct me if i'm wrong, but you use roms from this PR right and stock SPI |
@tlaurion Sorry, the above does not tell me anything I didn't know already. |
I just wanted to speed up between @gaspar-ilom answers, putting the hypothesis else where than
and restated I do not understand either that error happening twice in logs
|
Sorry @Th3Fanbus and @tlaurion this project got sidelined. I will provide what you asked for as soon as I find the time. In the meanwhile anyone who cares about these boards is free to contribute as documented here. |
I made https://review.coreboot.org/87830 which should print how long each NRI task took. I expect this to be pretty quick in spite of all the logging. |
Signed-off-by: Thierry Laurion <insurgo@riseup.net>
…M_SETUP=y This is merged https://review.coreboot.org/c/coreboot/+/87830 to help troubleshoot linuxboot#1923 (comment) As of now, unknowns are: - if NRI can work with Heads neutered ME per https://github.com/linuxboot/heads/blob/master/blobs/t440p/download-clean-me and https://github.com/linuxboot/heads/blob/master/blobs/w541/download-clean-me - What is causing '[ERROR] SF size 0x2800000 does not correspond to CONFIG_ROM_SIZE 0xc00000!!' - NRI per-task spent time; this is what this commit should shed some light for both w541 and t440p Signed-off-by: Thierry Laurion <insurgo@riseup.net>
@gaspar-ilom I took liberty to push in your branch doing CircleCI should detect (cache layers checks) new TLDR: bcf27a5 contains merged master + @Th3Fanbus merged patch from https://review.coreboot.org/c/coreboot/+/87830 (hinted from #1923 (comment)) to hopefully move this faster. Once again, we cannot expect people coming to Heads and wanting to flash t440p/w541 to tolerate 30s (30s or 50s?) booting with preppy's MRC blob, while using NRI conclusions as of now (this PR is in DEBUG without full state and problems of it known) to be fully usable. https://review.coreboot.org/c/coreboot/+/87830 aims to provide logs in the form:
So that @gaspar-ilom's UART+RPI pico debug setup (see #1923 (comment)) can provide logs from this PR's w541 coreboot 24.12+ patches/coreboot-24.12/NRI patches+ w541 coreboot oldconfig+Neutered ME debug logs with Note once more as under commit log bcf27a5 unknowns:
10-4 (bitrotting PR is no fun. merging master+adapt under ce63b85 took around 1h with changes + local build test. Not sure I will rebase/merge master in the future if this PR doesn't progress soon to a merging state myself). |
@gaspar-ilom : master's patch+revert patches+reapply logic was improved in the past months and was merged back in your PR. So doing
Should suffice producing local reproducible rom to CI to test and produce debug logs for bcf27a5 (While CI is producing clean ROM while I wrote those lines) |
Unfortunately I'm not sure we have anyone else under Heads currently willing to reproduce coreboot debug dongle you done under #1923 (comment). Hopefully others will come but as I said under #1923 (comment), I unfortunately think this is my last attempt to help current technical board owners to get NRI under Heads for w541/t440p. |
@Th3Fanbus I have not replaced the flash chips and neither did anyone else afaict. The chips definitely have the original size 4 and 8MB. |
Thanks @tlaurion your work on this PR. Here are the usb logs I captured on a W541 with the circleci build for bcf27a5:
@Th3Fanbus I hope this helps. It includes the changes https://review.coreboot.org/c/coreboot/+/87830 and thus hopefully improved logs. I do not have the time (and knowledge) to analyze the logs, but I hope this helps anyway. Let me know if I missed anything. |
To speed up things once more:
excerpt:
So as can be seen above, memory training happens at each boot, MRC cache is not reused (I pointed above it didn't seem to be saved, but that is for @Th3Fanbus to say what is the truth here, I cannot help much more than what I've done before)
@Th3Fanbus this log will help as if you were in front of computer with ts provided timestamp. Known
@gaspar-ilom thanks for your time and sharing the logs. I think logs with ts output from w541_subsequent_boot_bcf27a5_ts.log should tell us more, hopefully. |
Hey, Team! I'd like to offer some testing efforts from my side, but my experience in coreboot debugging is limited, so probably sort of step-by-step guidance will be needed. Hardware: T440p
Previously I was on Coreboot 25.03 (Edk2) with NRI enabled.
Yesterday I've updated to CB 25.06 (Edk2) with NRI enabled.
So, if I may be helpful to the project - could someone guide me how to collect debugging info? |
With an FT232H dongle, scroll up I believe @gaspar-ilom posted a little guide. |
@crazyfox-ua I think you need an FT232H as pointed out by @Th3Fanbus here. I have posted a guide in this comment |
Thanks @gaspar-ilom, @MattClifton76 for assistance!
Not sure which part may be useful, so adding complete log. |
@crazyfox-ua Hmmm, interesting that NRI would fail like that. I've checked the logs and the only thing I've noticed are some differences in the memory map, but NRI's memory map should be more optimal (it requires using less MTRRs to configure cacheability of the various memory ranges). I've noticed the NRI log did not have |
@Th3Fanbus here is log with debug enabled: coreboot_debug_nri_2x8.log. |
@crazyfox-ua Thank you. I'm stumped: I don't see anything wrong in the log (wrong enough to cause problems with the payload and/or OS). I wonder if the issues also happen with a single 8 GiB or 16 GiB SO-DIMM (even though I wouldn't expect dual-channel operation to be the culprit). |
@Th3Fanbus thanks for confirming, for me that behavior also looks strange. For now, it's failing to boot in any of following combinations: 8+8, 16+16, 16, 8, 2+16, 8+16. Guessing, that it isn't directly related to RAM init itself, but somewhat affecting Shared Video RAM and causing those artifacts and boot freezes. Is there anything else I may collect to debug such behavior? |
@Th3Fanbus gentle ping |
Sorry, I haven't been able to make any progress on this. |
T440p/w541 only :
Untested: only for external flashers.-> 10s spent in romstage, lot of debug info: truncates cbmem up to cbmem being 1mb big stillQuickly hacked together. Probably I still missed something, but I will let you have a look.
Takes upstream NRI patch train from https://review.coreboot.org/c/coreboot/+/64186/9 and changes Heads coreboot configs for t440p/w541 to test results on top of coreboot 24.12
@tlaurion See also #1825 (comment) and #1711 (comment)
TODOs:
Before merge:
So t440p/w541 board owners, it would be a time to compare before/after this PR roms:
Board owners:
AFTER MERGE:
EDIT: 4cb6985 should probably be merged into #1908 as a cleanup-> done