-
Notifications
You must be signed in to change notification settings - Fork 5
RISC-V Vector Permutation & Narrow Instructions Implementation #27
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
WalkthroughModularizes VecALU into Arith/Comparison/Narrow/Permutation/FixedRoundMode, refactors VecALU IO (adds rs1_in, lmul, func3; replaces vl), updates pipeline wiring and VecRegFile (de_instr + LMUL-read adjustment), expands VaquitaConfig defaults, simplifies WBStage signature, cleans ForwardingUnit and VaquitaTop, and adds Decode-stage VecIO bundle. Changes
Sequence Diagram(s)sequenceDiagram
autonumber
participant IF as IFetch
participant DE as DecodeStage
participant VRF as VecRegFile
participant EX as ExcuteStage
participant VALU as VecALU
participant A as Arith
participant C as Comparison
participant N as Narrow
participant P as Permutation
participant WB as WBStage
IF->>DE: instr
DE->>VRF: de_instr
DE->>EX: decoded controls, operands
EX->>VALU: vs1/vs2/vs3, rs1_in, lmul, func3, alu_opcode, sew, vl_in
note right of VALU: classify op → arith / comp / narrow / perm
VALU->>A: arith path (when selected)
VALU->>C: comparison path (when selected)
VALU->>N: narrow path (when selected)
VALU->>P: permutation path (when selected)
A-->>VALU: xlen result
C-->>VALU: comparison result bits
N-->>VALU: narrowed result
P-->>VALU: permuted result
VALU-->>EX: vsd_out
EX-->>WB: writeback
WB-->>VRF: VRF write (as applicable)
sequenceDiagram
autonumber
participant DE as DecodeStage
participant VRF as VecRegFile
DE->>VRF: de_instr
VRF->>VRF: isNarrowOp = instr(31,26) in 0x44..0x47
VRF->>VRF: lmul_read = Mux(isNarrowOp, min(io.lmul+1, max), io.lmul)
VRF-->>DE: reads use lmul_read
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60–90 minutes Pre-merge checks (3 passed)✅ Passed checks (3 passed)
Poem
Tip 👮 Agentic pre-merge checks are now available in preview!Pro plan users can now enable pre-merge checks in their settings to enforce checklists before merging PRs.
Please see the documentation for more information. Example: reviews:
pre_merge_checks:
custom_checks:
- name: "Undocumented Breaking Changes"
mode: "warning"
instructions: |
Pass/fail criteria: All breaking changes to public APIs, CLI flags, environment variables, configuration keys, database schemas, or HTTP/GraphQL endpoints must be documented in the "Breaking Change" section of the PR description and in CHANGELOG.md. Exclude purely internal or private changes (e.g., code not exported from package entry points or explicitly marked as internal). Please share your feedback with us on this Discord post. ✨ Finishing touches🧪 Generate unit tests
Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 13
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/main/scala/vaquita/components/VecRegFile.scala (1)
28-33
: Address-width vs reg_count mismatch can alias registers when reg_count < 32.vrf has length reg_count (default now 8), but addresses are 5-bit; dynamic Vec indexing truncates high bits (mod reg_count). Add an assertion until reg_count is restored to 32 or IO addr width is parameterized.
val vrf = RegInit(VecInit(Seq.fill(config.reg_count){VecInit(Seq.fill(config.count_lanes) {0.S(config.XLEN.W)})})) dontTouch(vrf) + // Guard against silent aliasing when reg_count < 32 + assert(io.vs1_addr < config.reg_count.U && io.vs2_addr < config.reg_count.U && + io.vd_addr < config.reg_count.U && io.wb_vd_addr < config.reg_count.U, + "VRF address >= reg_count will wrap/truncate (reg_count = %d)".format(config.reg_count))
🧹 Nitpick comments (11)
src/main/scala/vaquita/pipeline/MemStage.scala (1)
27-28
: Remove unused initValue_vs3 (64-bit) to avoid confusion with XLEN paths.initValue_vs3 is never used, while vs3_data uses initValue (XLEN width). Drop the dead 64-bit init to prevent accidental width mismatches later.
- val initValue_vs3 = VecInit(Seq.fill(8)(VecInit(Seq.fill(config.count_lanes)(0.S(64.W))))) - val vs3_data = RegNext(WireDefault(initValue)) + // vs3 uses XLEN width; drop unused 64b init to avoid confusion + val vs3_data = RegNext(WireDefault(initValue))src/main/scala/vaquita/pipeline/WBStage.scala (1)
21-27
: Avoid shadowing io.mem_to_reg.Local val mem_to_reg shadows IO; rename for clarity.
- val mem_to_reg = RegNext(io.mem_to_reg) + val mem_to_reg_r = RegNext(io.mem_to_reg) @@ - when(mem_to_reg===0.B){ + when(mem_to_reg_r===0.B){src/main/scala/vaquita/components/VecRegFile.scala (1)
30-31
: Avoid reusing name vl (conflicts with io.vl).Rename the local constant to prevent confusion.
- val vl=3.U + val vlConst = 3.Usrc/main/scala/vaquita/pipeline/ExcuteStage.scala (1)
47-49
: Precompute opcode/type predicate once to avoid N× replication.Minor optimization/readability: hoist the repeated predicate out of the inner loops.
- for (i <- 0 to 7) { // for grouping = 8 + val isVX = io.ex_instr_out(6,0) === "b1010111".U && io.ex_instr_out(14,12) === "b100".U + for (i <- 0 to 7) { // for grouping = 8 for (j <- 0 until (config.count_lanes)) { - vec_alu_module.io.vs1_in(i)(j) := Mux(io.ex_instr_out(6,0)==="b1010111".U && io.ex_instr_out(14,12)==="b100".U,sew_selector.sew_selector_with_element(next_sew,io.hazard_rs1.asSInt),io.ex_vs1_data_in(i)(j)) + vec_alu_module.io.vs1_in(i)(j) := Mux(isVX, sew_selector.sew_selector_with_element(next_sew, io.hazard_rs1.asSInt), io.ex_vs1_data_in(i)(j)) } }src/main/scala/vaquita/components/ALUClasses/Arith.scala (1)
12-14
: Width-safe saturate bounds.Make max/min constants explicit-width to the current sew to avoid width/sign surprises.
- val maxValue = (1.S << (sew - 1)) - 1.S - val minValue = -(1.S << (sew - 1)) + val maxValue = ((BigInt(1) << (sew - 1)) - 1).S(sew.W) + val minValue = (-(BigInt(1) << (sew - 1))).S(sew.W)src/main/scala/vaquita/components/ALUClasses/Permutation.scala (2)
87-95
: Clean up unused localselem_idx_wire and result_val are declared but never used. Please remove to avoid dead logic.
- val elem_idx_wire = Wire(UInt(33.W)) - val result_val = WireInit(0.U(32.W)) - ... - elem_idx_wire := elem_idx.U + // removed unused locals
31-41
: LMUL bound construction: clarify/guard LMUL rangeswitch(lmul) only handles 0–3. If upstream ever drives a wider encoded LMUL, lmul_valid_cat remains 1.U. Add a default is(...) or assert for safety.
switch(lmul) { is(0.U) { lmul_valid_cat := sew_lanes } is(1.U) { lmul_valid_cat := Cat(sew_lanes, 0.U(1.W)) } is(2.U) { lmul_valid_cat := Cat(sew_lanes, 0.U(2.W)) } is(3.U) { lmul_valid_cat := Cat(sew_lanes, 0.U(3.W)) } + // default: keep lmul_valid_cat unchanged but flag + // (Optionally) assert(lmul <= 3.U, "Unsupported LMUL") }src/main/scala/vaquita/components/ALUClasses/Narrow.scala (2)
13-19
: vxrm is hard-wired to 0; verify rounding mode integrationRounding mode should come from vcsr.vxrm. If not plumbed yet, please add a TODO and ensure tests cover the intended default.
24-25
: Shift amount source for vnsrl/vnsraCurrently taken from vs1_in bits; if vx/vi forms are supported, ensure upstream maps rs1/imm into vs1_in consistently. Otherwise, consider using rs1_in where appropriate.
src/main/scala/vaquita/components/VecALU.scala (2)
26-33
: rs1_imm_value selection is ad-hoc; clarify func3 mapping and default
- func3==="b011" path pulls immediate from io.vs1_in(0)(0)(4,0), not from the instruction.
- Default path uses io.vs1_in(0)(0) instead of rs1_in.
Please document the intended encoding and ensure ExecuteStage feeds these consistently. Suggest using rs1_in for vx/vi forms and reserving vs1_in only for vv.
55-74
: dontTouch on vs1_8/vs2_8/vs3_8 helps debug, but consider removing laterKeep for bring-up; remove to allow DCE once verified.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (15)
src/main/scala/vaquita/VaquitaTop.scala
(0 hunks)src/main/scala/vaquita/components/ALUClasses/Arith.scala
(1 hunks)src/main/scala/vaquita/components/ALUClasses/Comparison.scala
(1 hunks)src/main/scala/vaquita/components/ALUClasses/FixedRoundMode.scala
(1 hunks)src/main/scala/vaquita/components/ALUClasses/Narrow.scala
(1 hunks)src/main/scala/vaquita/components/ALUClasses/Permutation.scala
(1 hunks)src/main/scala/vaquita/components/ALUObject.scala
(1 hunks)src/main/scala/vaquita/components/ForwardingUnit.scala
(0 hunks)src/main/scala/vaquita/components/VecALU.scala
(1 hunks)src/main/scala/vaquita/components/VecRegFile.scala
(3 hunks)src/main/scala/vaquita/configparameter/VaquitaConfig.scala
(1 hunks)src/main/scala/vaquita/pipeline/DecodeStage.scala
(1 hunks)src/main/scala/vaquita/pipeline/ExcuteStage.scala
(1 hunks)src/main/scala/vaquita/pipeline/MemStage.scala
(1 hunks)src/main/scala/vaquita/pipeline/WBStage.scala
(1 hunks)
💤 Files with no reviewable changes (2)
- src/main/scala/vaquita/VaquitaTop.scala
- src/main/scala/vaquita/components/ForwardingUnit.scala
🧰 Additional context used
🧬 Code graph analysis (5)
src/main/scala/vaquita/components/ALUClasses/Permutation.scala (2)
src/main/scala/vaquita/components/ALUObject.scala (1)
ALUObj
(6-61)src/main/scala/vaquita/configparameter/VaquitaConfig.scala (1)
VaquitaConfig
(5-12)
src/main/scala/vaquita/components/ALUClasses/Comparison.scala (2)
src/main/scala/vaquita/components/ALUObject.scala (1)
ALUObj
(6-61)src/main/scala/vaquita/configparameter/VaquitaConfig.scala (1)
VaquitaConfig
(5-12)
src/main/scala/vaquita/components/ALUClasses/Arith.scala (2)
src/main/scala/vaquita/components/ALUObject.scala (1)
ALUObj
(6-61)src/main/scala/vaquita/configparameter/VaquitaConfig.scala (1)
VaquitaConfig
(5-12)
src/main/scala/vaquita/components/ALUClasses/Narrow.scala (3)
src/main/scala/vaquita/components/ALUObject.scala (1)
ALUObj
(6-61)src/main/scala/vaquita/configparameter/VaquitaConfig.scala (1)
VaquitaConfig
(5-12)src/main/scala/vaquita/components/ALUClasses/FixedRoundMode.scala (1)
fixed_round_mode
(9-40)
src/main/scala/vaquita/components/VecALU.scala (4)
src/main/scala/vaquita/components/ALUClasses/Arith.scala (3)
Arith
(7-119)arith_8
(77-92)arith_8_result
(93-118)src/main/scala/vaquita/components/ALUClasses/Permutation.scala (1)
Permutation
(6-109)src/main/scala/vaquita/components/ALUClasses/Comparison.scala (2)
Comparison
(7-71)main_comp
(23-70)src/main/scala/vaquita/components/ALUClasses/Narrow.scala (2)
NarrowIns
(7-82)narrow_ins
(9-31)
🔇 Additional comments (10)
src/main/scala/vaquita/pipeline/MemStage.scala (1)
32-32
: No functional change — OK.Removal of the trailing comment is fine.
src/main/scala/vaquita/components/ALUObject.scala (2)
50-58
: Comment-only placeholders — OK.Placeholders are fine and don’t affect decode.
18-20
: Confirm intentional opcode aliasing (vslideup and vrgatherei16 both = 14.U).If these are disambiguated elsewhere (e.g., func fields), all good; otherwise this collides in decode.
src/main/scala/vaquita/pipeline/WBStage.scala (1)
7-7
: Constructor simplification — OK.No behavioral change; consistent with defaulted VaquitaConfig.
src/main/scala/vaquita/components/VecRegFile.scala (1)
26-26
: New de_instr port — OK.Port addition aligns with downstream LMUL read adjustments.
src/main/scala/vaquita/pipeline/DecodeStage.scala (1)
59-60
: Wiring de_instr into VRF looks right; please double-check timing/width.Confirm VecRegFile.io.de_instr is exactly 32 bits and used read-only inside VRF (no combinational path back to instr). If VRF decodes this combinationally, consider registering it at the VRF boundary if timing becomes tight.
src/main/scala/vaquita/pipeline/ExcuteStage.scala (1)
40-43
: New VecALU ports wiring: check stage alignment for func3/rs1/lmul.func3 is RegNext of ex_instr_in and you compare ex_instr_out (also RegNext), so they align. rs1_in and lmul are unregistered; verify they are in the correct cycle relative to ex_instr_out in VecALU. If not, register them similarly.
src/main/scala/vaquita/components/ALUClasses/Narrow.scala (1)
76-81
: Zeroing rows 4–7: confirm architectural intentnarrow_result forces rows 4–7 to zero. That matches “narrowing halves the number of elements per group,” but verify this is correct across LMUL>1 and writeback packing.
src/main/scala/vaquita/components/VecALU.scala (2)
122-132
: Operation-path gating seems finenarrow/slide/comp/reduction/arith classification by alu_opcode looks coherent with ALUObj.
214-229
: Enable slide_valid path for sew=32
The.elsewhen(slide_valid)
branch underio.sew==="b010"
is commented out, so vslideup/vslidedown/vrgather won’t run for 32-bit. Re-enable that branch withval Permutation_8 = new Permutation()(config)
and passrs1_imm_value
,io.func3
, andio.lmul
through (and import/definePermutation
if missing).
src/main/scala/vaquita/components/ALUClasses/FixedRoundMode.scala
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 5
♻️ Duplicate comments (3)
src/main/scala/vaquita/configparameter/VaquitaConfig.scala (1)
6-11
: Derive lane counts from vlen/XLEN and validate; avoid stale hardcoded defaults.Hardcoding
count_lanes=8
,lane8=32
,lane16=16
couples config tovlen=256
. Derive these fromvlen
/XLEN
and addrequire
s to catch misconfigs at elaboration.case class VaquitaConfig( - vlen: Int = 256, - reg_count: Int = 32, - XLEN: Int = 32, - count_lanes: Int = 8 ,// vlen/32=8 - lane8: Int = 32,//vlen/8=32 - lane16: Int = 16,//vlen/16=16 -) + vlen: Int = 256, + reg_count: Int = 32, // 5-bit VRF addrs + XLEN: Int = 32, + count_lanes: Int= vlen / XLEN, // derive: 256/32=8 + lane8: Int = vlen / 8, // derive: 256/8=32 + lane16: Int = vlen / 16 // derive: 256/16=16 +) { + require(vlen % XLEN == 0, s"vlen ($vlen) must be divisible by XLEN ($XLEN)") + require(vlen % 8 == 0 && vlen % 16 == 0, s"vlen ($vlen) must be divisible by 8 and 16") +}src/main/scala/vaquita/components/ALUClasses/Comparison.scala (1)
64-68
: Remove placeholder; preserve pass-through for untouched lanes.- result_val(i)(j) := "hdeadbeef".U //vs3(i)(j).asUInt + result_val(i)(j) := vs3(i)(j)src/main/scala/vaquita/components/VecALU.scala (1)
36-36
: Off-by-one slice fixed—LGTM.
vs0_mask
now correctly selects(config.vlen-1, 0)
. Thanks for addressing this.
🧹 Nitpick comments (2)
src/main/scala/vaquita/components/ALUClasses/VecIO.scala (1)
6-12
: Prevent IO-shape drift by centralizing the vector lane type.This duplicates the same 8×count_lanes×XLEN shape seen elsewhere. Introduce a shared type alias/Bundle (e.g., VecPorts[T]) to avoid divergence when config changes.
src/main/scala/vaquita/components/VecALU.scala (1)
135-136
: Remove unused locals.
val vl = 4
andtail
are unused here; drop to reduce noise.- val vl= 4 - val tail = 0.B + // removed unused locals
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (9)
src/main/scala/vaquita/components/ALUClasses/Arith.scala
(1 hunks)src/main/scala/vaquita/components/ALUClasses/Comparison.scala
(1 hunks)src/main/scala/vaquita/components/ALUClasses/FixedRoundMode.scala
(1 hunks)src/main/scala/vaquita/components/ALUClasses/Narrow.scala
(1 hunks)src/main/scala/vaquita/components/ALUClasses/Permutation.scala
(1 hunks)src/main/scala/vaquita/components/ALUClasses/VecIO.scala
(1 hunks)src/main/scala/vaquita/components/VecALU.scala
(1 hunks)src/main/scala/vaquita/components/VecRegFile.scala
(3 hunks)src/main/scala/vaquita/configparameter/VaquitaConfig.scala
(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (4)
- src/main/scala/vaquita/components/ALUClasses/Arith.scala
- src/main/scala/vaquita/components/ALUClasses/Permutation.scala
- src/main/scala/vaquita/components/ALUClasses/Narrow.scala
- src/main/scala/vaquita/components/VecRegFile.scala
🧰 Additional context used
🧬 Code graph analysis (3)
src/main/scala/vaquita/components/ALUClasses/VecIO.scala (1)
src/main/scala/vaquita/configparameter/VaquitaConfig.scala (1)
VaquitaConfig
(5-12)
src/main/scala/vaquita/components/ALUClasses/Comparison.scala (2)
src/main/scala/vaquita/components/ALUObject.scala (1)
ALUObj
(6-61)src/main/scala/vaquita/configparameter/VaquitaConfig.scala (1)
VaquitaConfig
(5-12)
src/main/scala/vaquita/components/VecALU.scala (4)
src/main/scala/vaquita/components/ALUClasses/Arith.scala (3)
Arith
(7-120)arith_8
(78-93)arith_8_result
(94-119)src/main/scala/vaquita/components/ALUClasses/Permutation.scala (2)
Permutation
(6-109)permutation
(44-108)src/main/scala/vaquita/components/ALUClasses/Comparison.scala (2)
Comparison
(7-71)main_comp
(23-70)src/main/scala/vaquita/components/ALUClasses/Narrow.scala (3)
NarrowIns
(7-85)narrow_ins
(9-32)narrow_result
(47-84)
Vector Permutation & Narrow Instructions (RISC-V Vector Extension)
Implemented vslideup, vslidedown, and vrgather instructions for different SEW (8,16,32) and LMUL (1,2,4,8).
Added support for vector narrowing instructions: vnsra, vnsrl, vssrl, vssra, vsrl, vsra.
All implementations are verified and pass the RISC-V Imperas test suites.
by @latifbhatti
Summary by CodeRabbit
New Features
Refactor
Bug Fixes
Chores