[dv] Fix how we catch iside_err in cosim #1909
Conversation
In the case of a load/store access fault trap, the rvfi_order number stays the same but pc_id changes. If the older instruction is faulty, we still end up pushing it to the queue. This commit fixes that by checking for a matching rvfi_order number first.

Signed-off-by: Canberk Topal <ctopal@lowrisc.org>
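The failure mode can be illustrated with a minimal Python model of the queue handling (a sketch only: the real checker is SystemVerilog in the core_ibex DV environment, and the function and variable names here are simplified assumptions, not the actual testbench API):

```python
# Minimal Python model of the iside error queue dedup. The real code is
# SystemVerilog; names and structure here are simplified assumptions.

iside_error_queue = []   # entries are (rvfi_order, pc_id)
latest_order = -1

def on_iside_error(rvfi_order, pc_id):
    """Record an instruction-side error, replacing a stale entry pushed
    for the same rvfi_order (a load/store fault trap re-uses the order
    number with a new pc_id)."""
    global latest_order
    if latest_order == rvfi_order and iside_error_queue:
        # Same order seen twice: the older entry was flushed by the trap,
        # so drop it before recording the new one.
        iside_error_queue.pop()
    iside_error_queue.append((rvfi_order, pc_id))
    latest_order = rvfi_order

# A load/store access fault trap: order 7 is first seen with pc_id 0x100,
# then re-used when fetching the handler at pc_id 0x200.
on_iside_error(7, 0x100)
on_iside_error(7, 0x200)
print(iside_error_queue)  # only the newer (order, pc_id) entry survives
```

Without the `latest_order == rvfi_order` check, both entries would stay queued and the cosim would compare against the flushed, faulty instruction.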
With this change, riscv_mem_error_test passes for 14/15 seeds (starting at seed 16776). The one failure is due to an illegal cross bin:
riscv_mem_error_test.16781
--------------------------
binary: test.bin
rtl_log: rtl_sim.log
rtl_trace: trace_core_00000000.log
iss_cosim_trace: spike_cosim_trace_core_00000000.log
[FAILED]: error seen in 'rtl_sim.log'
---------------*LOG-EXTRACT*----------------
137: UVM_INFO /home/dev/src/dv/uvm/core_ibex/tb/core_ibex_tb_top.sv(163) @ 1247380: reporter [core_ibex_tb_top.unmblk1] Enabling assertions: core_ibex_tb_top.NoAlertsTriggered
138: UVM_INFO /home/dev/src/dv/uvm/core_ibex/tb/core_ibex_tb_top.sv(163) @ 1315760: reporter [core_ibex_tb_top.unmblk1] Enabling assertions: core_ibex_tb_top.NoAlertsTriggered
139: UVM_INFO /home/dev/src/dv/uvm/core_ibex/tb/core_ibex_tb_top.sv(163) @ 1404020: reporter [core_ibex_tb_top.unmblk1] Enabling assertions: core_ibex_tb_top.NoAlertsTriggered
140: UVM_INFO /home/dev/src/dv/uvm/core_ibex/tb/core_ibex_tb_top.sv(163) @ 1440280: reporter [core_ibex_tb_top.unmblk1] Enabling assertions: core_ibex_tb_top.NoAlertsTriggered
141: xmsim: *SE,EILLCT: (File: ./fcov/core_ibex_fcov_if.sv, Line: 691):(Time: 1440940 NS + 4) Illegal cross tuple (<0, 1, InstrCategoryFetchError, IdStallTypeMem, 0, 0, 0>) occurred corresponding to illegal cross bin (mem_stall_illegal) of cross (core_ibex_tb_top.dut.u_ibex_top.u_ibex_core.u_fcov_bind.uarch_cg_inst@11436_628.exception_stall_instr_cross).
[E] 142: xmsim: *SE,EILLCT: (File: ./fcov/core_ibex_fcov_if.sv, Line: 691):(Time: 1440980 NS + 4) Illegal cross tuple (<0, 1, InstrCategoryFetchError, IdStallTypeMem, 0, 0, 0>) occurred corresponding to illegal cross bin (mem_stall_illegal) of cross (core_ibex_tb_top.dut.u_ibex_top.gen_lockstep.u_ibex_lockstep.u_shadow_core.u_fcov_bind.uarch_cg_inst@11451_1.exception_stall_instr_cross).
143: UVM_INFO /home/dev/src/dv/uvm/core_ibex/tb/core_ibex_tb_top.sv(163) @ 1480360: reporter [core_ibex_tb_top.unmblk1] Enabling assertions: core_ibex_tb_top.NoAlertsTriggered
144: UVM_INFO /home/dev/src/dv/uvm/core_ibex/tb/core_ibex_tb_top.sv(163) @ 1569100: reporter [core_ibex_tb_top.unmblk1] Enabling assertions: core_ibex_tb_top.NoAlertsTriggered
145: UVM_INFO /home/dev/src/dv/uvm/core_ibex/tb/core_ibex_tb_top.sv(163) @ 1639980: reporter [core_ibex_tb_top.unmblk1] Enabling assertions: core_ibex_tb_top.NoAlertsTriggered
146: UVM_INFO /home/dev/src/dv/uvm/core_ibex/tb/core_ibex_tb_top.sv(163) @ 1717520: reporter [core_ibex_tb_top.unmblk1] Enabling assertions: core_ibex_tb_top.NoAlertsTriggered
--------------------------------------------
Summary of the full regression:
Test | Passing | Failed | Timed-out | Total | Pass Rate |
---|---|---|---|---|---|
riscv_arithmetic_basic_test | 10 | 0 | 0 | 10 | 100.00 % |
riscv_assorted_traps_interrupts_debug_test | 3 | 7 | 0 | 10 | 30.00 % |
riscv_bitmanip_balanced_test | 10 | 0 | 0 | 10 | 100.00 % |
riscv_bitmanip_otearlgrey_test | 10 | 0 | 0 | 10 | 100.00 % |
riscv_csr_test | 5 | 0 | 0 | 5 | 100.00 % |
riscv_debug_basic_test | 8 | 2 | 0 | 10 | 80.00 % |
riscv_debug_branch_jump_test | 9 | 0 | 1 | 10 | 90.00 % |
riscv_debug_csr_entry_test | 10 | 0 | 0 | 10 | 100.00 % |
riscv_debug_ebreak_test | 14 | 0 | 1 | 15 | 93.33 % |
riscv_debug_ebreakmu_test | 13 | 1 | 1 | 15 | 86.67 % |
riscv_debug_in_irq_test | 9 | 0 | 1 | 10 | 90.00 % |
riscv_debug_instr_test | 24 | 0 | 1 | 25 | 96.00 % |
riscv_debug_single_step_test | 14 | 1 | 0 | 15 | 93.33 % |
riscv_debug_stress_test | 15 | 0 | 0 | 15 | 100.00 % |
riscv_debug_triggers_test | 4 | 0 | 1 | 5 | 80.00 % |
riscv_debug_wfi_test | 8 | 1 | 1 | 10 | 80.00 % |
riscv_dret_test | 4 | 1 | 0 | 5 | 80.00 % |
riscv_ebreak_test | 9 | 1 | 0 | 10 | 90.00 % |
riscv_epmp_mml_execute_only_test | 20 | 0 | 0 | 20 | 100.00 % |
riscv_epmp_mml_read_only_test | 20 | 0 | 0 | 20 | 100.00 % |
riscv_epmp_mml_test | 0 | 0 | 20 | 20 | 0.00 % |
riscv_epmp_mmwp_test | 0 | 0 | 20 | 20 | 0.00 % |
riscv_epmp_rlb_test | 19 | 0 | 1 | 20 | 95.00 % |
riscv_hint_instr_test | 10 | 0 | 0 | 10 | 100.00 % |
riscv_icache_intg_test | 15 | 0 | 0 | 15 | 100.00 % |
riscv_illegal_instr_test | 14 | 1 | 0 | 15 | 93.33 % |
riscv_interrupt_csr_test | 10 | 0 | 0 | 10 | 100.00 % |
riscv_interrupt_instr_test | 25 | 0 | 0 | 25 | 100.00 % |
riscv_interrupt_wfi_test | 15 | 0 | 0 | 15 | 100.00 % |
riscv_invalid_csr_test | 10 | 0 | 0 | 10 | 100.00 % |
riscv_irq_in_debug_mode_test | 9 | 0 | 1 | 10 | 90.00 % |
riscv_jump_stress_test | 10 | 0 | 0 | 10 | 100.00 % |
riscv_loop_test | 10 | 0 | 0 | 10 | 100.00 % |
riscv_machine_mode_rand_test | 10 | 0 | 0 | 10 | 100.00 % |
riscv_mem_error_test | 14 | 1 | 0 | 15 | 93.33 % |
riscv_mem_intg_error_test | 13 | 2 | 0 | 15 | 86.67 % |
riscv_mmu_stress_test | 10 | 0 | 0 | 10 | 100.00 % |
riscv_multiple_interrupt_test | 9 | 0 | 1 | 10 | 90.00 % |
riscv_nested_interrupt_test | 10 | 0 | 0 | 10 | 100.00 % |
riscv_pc_intg_test | 15 | 0 | 0 | 15 | 100.00 % |
riscv_pmp_basic_test | 49 | 0 | 1 | 50 | 98.00 % |
riscv_pmp_disable_all_regions_test | 50 | 0 | 0 | 50 | 100.00 % |
riscv_pmp_full_random_test | 283 | 2 | 315 | 600 | 47.17 % |
riscv_pmp_out_of_bounds_test | 48 | 0 | 2 | 50 | 96.00 % |
riscv_rand_instr_test | 10 | 0 | 0 | 10 | 100.00 % |
riscv_rand_jump_test | 8 | 0 | 2 | 10 | 80.00 % |
riscv_reset_test | 15 | 0 | 0 | 15 | 100.00 % |
riscv_rf_intg_test | 15 | 0 | 0 | 15 | 100.00 % |
riscv_single_interrupt_test | 14 | 0 | 1 | 15 | 93.33 % |
riscv_umode_tw_test | 9 | 1 | 0 | 10 | 90.00 % |
riscv_unaligned_load_store_test | 5 | 0 | 0 | 5 | 100.00 % |
riscv_user_mode_rand_test | 10 | 0 | 0 | 10 | 100.00 % |
TOTAL | 963 | 21 | 371 | 1355 | 71.07 % |
This is a follow-up on #1895 to improve the pass rate.
```systemverilog
// Determine if we were here before. If so, that means this rvfi_order number was actually used
// twice (a load/store fault caused jumping to the handler). Delete the old iside_error_queue
// element.
if (latest_order == instr_vif.instr_cb.rvfi_order_id) begin
  iside_error_queue.pop_back();
end
```
After a bit of thought, I'm not sure this would work, considering we don't actually check for `failed_iside_accesses.exists(aligned_addr) && !iside_pmp_failure.exists(aligned_addr)` here. The concern is that we might end up deleting a valid iside_error. I guess the solution would be to check for that condition again before calculating the new `aligned_addr`. WDYT @GregAC? The regression numbers show no such problem, but I just wanted to be sure before merging.
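A sketch of what that re-check might look like, again as a Python model rather than the actual SystemVerilog: the predicate mirrors the `failed_iside_accesses`/`iside_pmp_failure` check quoted above, but the surrounding structure and helper names are assumptions for illustration only.

```python
# Hypothetical sketch of the guarded pop suggested above: only delete the
# queued entry when the rvfi_order matches AND the error condition is
# re-confirmed for the address, so a valid iside_error is never discarded.
# Names mirror the SV identifiers; everything else is assumed.

failed_iside_accesses = {0x80000100}  # addresses that saw an i-side error
iside_pmp_failure = set()             # addresses that failed due to PMP

def is_real_iside_error(aligned_addr):
    # Mirrors: failed_iside_accesses.exists(aligned_addr) &&
    #          !iside_pmp_failure.exists(aligned_addr)
    return (aligned_addr in failed_iside_accesses
            and aligned_addr not in iside_pmp_failure)

def maybe_pop_stale(queue, latest_order, rvfi_order, aligned_addr):
    """Pop the back entry only when the order matches and the error
    condition still holds, addressing the concern above."""
    if (latest_order == rvfi_order and queue
            and is_real_iside_error(aligned_addr)):
        queue.pop()
    return queue

# The order matches and the error condition re-checks true, so the stale
# entry is dropped; if the condition were false, the entry would be kept.
q = maybe_pop_stale([(7, 0x100)], 7, 7, 0x80000100)
```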
Thanks for looking into this @ctopal, it's a tricky scenario!

After some thought I think I'd prefer a different fix. Your fix uses `latest_order` matching the just-seen instruction order to detect the scenario, and whilst this currently seems effective (i.e. there aren't other places where this is the case), I worry it'll generate other bugs down the line.

My fix is here: #1922. It aims to detect the specific scenario and remove the flushed errors from the iside error queue.
Should we close this in favor of #1922?
Yes, I believe #1922 is preferred as it's a more general solution.