Skip to content

[dv] Fix how we catch iside_err in cosim #1909

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from

Conversation

ctopal
Copy link
Contributor

@ctopal ctopal commented Nov 7, 2022

In the case of a load/store access fault trap, rvfi_order number stays same but pc_id changes. And if the older instruction is faulty we still end up pushing it to the queue. This commit fixes that by checking for a matching rvfi_order number first.

Signed-off-by: Canberk Topal ctopal@lowrisc.org

In the case of a load/store access fault trap, rvfi_order number
stays same but pc_id changes. And if the older instruction is faulty
we still end up pushing it to the queue. This commit fixes that by
checking for a matching rvfi_order number first.

Signed-off-by: Canberk Topal <ctopal@lowrisc.org>
@ctopal ctopal added Type:Enhancement Feature requests, enhancements Component:DV Design verification (DV) or testing issue labels Nov 7, 2022
@ctopal ctopal requested review from GregAC and andreaskurth November 7, 2022 19:23
Copy link
Contributor

@andreaskurth andreaskurth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With this change, riscv_mem_error_test passes for 14/15 seeds (starting at seed 16776). The one failure is due to an illegal cross bin:

riscv_mem_error_test.16781
--------------------------
binary:          test.bin
rtl_log:         rtl_sim.log
rtl_trace:       trace_core_00000000.log
iss_cosim_trace: spike_cosim_trace_core_00000000.log

[FAILED]: error seen in 'rtl_sim.log'
---------------*LOG-EXTRACT*----------------
    137: UVM_INFO /home/dev/src/dv/uvm/core_ibex/tb/core_ibex_tb_top.sv(163) @ 1247380: reporter [core_ibex_tb_top.unmblk1] Enabling assertions: core_ibex_tb_top.NoAlertsTriggered
    138: UVM_INFO /home/dev/src/dv/uvm/core_ibex/tb/core_ibex_tb_top.sv(163) @ 1315760: reporter [core_ibex_tb_top.unmblk1] Enabling assertions: core_ibex_tb_top.NoAlertsTriggered
    139: UVM_INFO /home/dev/src/dv/uvm/core_ibex/tb/core_ibex_tb_top.sv(163) @ 1404020: reporter [core_ibex_tb_top.unmblk1] Enabling assertions: core_ibex_tb_top.NoAlertsTriggered
    140: UVM_INFO /home/dev/src/dv/uvm/core_ibex/tb/core_ibex_tb_top.sv(163) @ 1440280: reporter [core_ibex_tb_top.unmblk1] Enabling assertions: core_ibex_tb_top.NoAlertsTriggered
    141: xmsim: *SE,EILLCT: (File: ./fcov/core_ibex_fcov_if.sv, Line: 691):(Time: 1440940 NS + 4) Illegal cross tuple (<0, 1, InstrCategoryFetchError, IdStallTypeMem, 0, 0, 0>) occurred corresponding to illegal cross bin (mem_stall_illegal) of cross (core_ibex_tb_top.dut.u_ibex_top.u_ibex_core.u_fcov_bind.uarch_cg_inst@11436_628.exception_stall_instr_cross).
[E] 142: xmsim: *SE,EILLCT: (File: ./fcov/core_ibex_fcov_if.sv, Line: 691):(Time: 1440980 NS + 4) Illegal cross tuple (<0, 1, InstrCategoryFetchError, IdStallTypeMem, 0, 0, 0>) occurred corresponding to illegal cross bin (mem_stall_illegal) of cross (core_ibex_tb_top.dut.u_ibex_top.gen_lockstep.u_ibex_lockstep.u_shadow_core.u_fcov_bind.uarch_cg_inst@11451_1.exception_stall_instr_cross).
    143: UVM_INFO /home/dev/src/dv/uvm/core_ibex/tb/core_ibex_tb_top.sv(163) @ 1480360: reporter [core_ibex_tb_top.unmblk1] Enabling assertions: core_ibex_tb_top.NoAlertsTriggered
    144: UVM_INFO /home/dev/src/dv/uvm/core_ibex/tb/core_ibex_tb_top.sv(163) @ 1569100: reporter [core_ibex_tb_top.unmblk1] Enabling assertions: core_ibex_tb_top.NoAlertsTriggered
    145: UVM_INFO /home/dev/src/dv/uvm/core_ibex/tb/core_ibex_tb_top.sv(163) @ 1639980: reporter [core_ibex_tb_top.unmblk1] Enabling assertions: core_ibex_tb_top.NoAlertsTriggered
    146: UVM_INFO /home/dev/src/dv/uvm/core_ibex/tb/core_ibex_tb_top.sv(163) @ 1717520: reporter [core_ibex_tb_top.unmblk1] Enabling assertions: core_ibex_tb_top.NoAlertsTriggered
--------------------------------------------

Summary of the full regression:

Test Passing Failed Timed-out Total Pass Rate
riscv_arithmetic_basic_test 10 0 0 10 100.00 %
riscv_assorted_traps_interrupts_debug_test 3 7 0 10 30.00 %
riscv_bitmanip_balanced_test 10 0 0 10 100.00 %
riscv_bitmanip_otearlgrey_test 10 0 0 10 100.00 %
riscv_csr_test 5 0 0 5 100.00 %
riscv_debug_basic_test 8 2 0 10 80.00 %
riscv_debug_branch_jump_test 9 0 1 10 90.00 %
riscv_debug_csr_entry_test 10 0 0 10 100.00 %
riscv_debug_ebreak_test 14 0 1 15 93.33 %
riscv_debug_ebreakmu_test 13 1 1 15 86.67 %
riscv_debug_in_irq_test 9 0 1 10 90.00 %
riscv_debug_instr_test 24 0 1 25 96.00 %
riscv_debug_single_step_test 14 1 0 15 93.33 %
riscv_debug_stress_test 15 0 0 15 100.00 %
riscv_debug_triggers_test 4 0 1 5 80.00 %
riscv_debug_wfi_test 8 1 1 10 80.00 %
riscv_dret_test 4 1 0 5 80.00 %
riscv_ebreak_test 9 1 0 10 90.00 %
riscv_epmp_mml_execute_only_test 20 0 0 20 100.00 %
riscv_epmp_mml_read_only_test 20 0 0 20 100.00 %
riscv_epmp_mml_test 0 0 20 20 0.00 %
riscv_epmp_mmwp_test 0 0 20 20 0.00 %
riscv_epmp_rlb_test 19 0 1 20 95.00 %
riscv_hint_instr_test 10 0 0 10 100.00 %
riscv_icache_intg_test 15 0 0 15 100.00 %
riscv_illegal_instr_test 14 1 0 15 93.33 %
riscv_interrupt_csr_test 10 0 0 10 100.00 %
riscv_interrupt_instr_test 25 0 0 25 100.00 %
riscv_interrupt_wfi_test 15 0 0 15 100.00 %
riscv_invalid_csr_test 10 0 0 10 100.00 %
riscv_irq_in_debug_mode_test 9 0 1 10 90.00 %
riscv_jump_stress_test 10 0 0 10 100.00 %
riscv_loop_test 10 0 0 10 100.00 %
riscv_machine_mode_rand_test 10 0 0 10 100.00 %
riscv_mem_error_test 14 1 0 15 93.33 %
riscv_mem_intg_error_test 13 2 0 15 86.67 %
riscv_mmu_stress_test 10 0 0 10 100.00 %
riscv_multiple_interrupt_test 9 0 1 10 90.00 %
riscv_nested_interrupt_test 10 0 0 10 100.00 %
riscv_pc_intg_test 15 0 0 15 100.00 %
riscv_pmp_basic_test 49 0 1 50 98.00 %
riscv_pmp_disable_all_regions_test 50 0 0 50 100.00 %
riscv_pmp_full_random_test 283 2 315 600 47.17 %
riscv_pmp_out_of_bounds_test 48 0 2 50 96.00 %
riscv_rand_instr_test 10 0 0 10 100.00 %
riscv_rand_jump_test 8 0 2 10 80.00 %
riscv_reset_test 15 0 0 15 100.00 %
riscv_rf_intg_test 15 0 0 15 100.00 %
riscv_single_interrupt_test 14 0 1 15 93.33 %
riscv_umode_tw_test 9 1 0 10 90.00 %
riscv_unaligned_load_store_test 5 0 0 5 100.00 %
riscv_user_mode_rand_test 10 0 0 10 100.00 %
TOTAL 963 21 371 1355 71.07 %

@andreaskurth
Copy link
Contributor

andreaskurth commented Nov 8, 2022

This is a follow-up on #1895 to improve the pass rate of riscv_mem_error_test.

Comment on lines +244 to +249
// Determine if we were here before. If so, that means this rvfi_order number was actually used
// twice (a load/store fault caused jumping to the handler). Delete the old iside_error_queue
// element.
if (latest_order == instr_vif.instr_cb.rvfi_order_id) begin
iside_error_queue.pop_back();
end
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After a bit of thought, I am not sure if this would work considering we actually don't check for failed_iside_accesses.exists(aligned_addr) && !iside_pmp_failure.exists(aligned_addr) in here. Concern being, we might end up deleting a valid iside_error. I guess the solution would be to check for the condition again before calculating the new aligned_addr, WDYT @GregAC ? Regression numbers show no such problem but just wanted to be sure before merging this.

Copy link
Contributor

@GregAC GregAC left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for looking into this @ctopal it's a tricky scenario!

After some thought I think I prefer a different fix, your fix uses the latest_order matching the just seen instruction order to detect the scenario and whilst this currently seems effective (i.e. there aren't other places where this is the case) I worry it'll generated other bugs down the line.

My fix is here: #1922 it aims to detect the specific scenario and remove the flushed errors from the iside error queue.

@andreaskurth
Copy link
Contributor

Should we close this in favor of #1922?

@GregAC
Copy link
Contributor

GregAC commented Nov 14, 2022

Should we close this in favor of #1922?

That's the idea but I wanted to have #1922 reviewed first so we can agree (or not) that's the preferred solution.

@ctopal
Copy link
Contributor Author

ctopal commented Nov 14, 2022

Yes I believe #1922 is preferred as it's more of a general solution.

@ctopal ctopal closed this Nov 14, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Component:DV Design verification (DV) or testing issue Type:Enhancement Feature requests, enhancements
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants