major changes to integrate AlphaPepDeep #34
Conversation
Dockerfile update
Enhancements and bug fixes
Note: Other AI code review bot(s) detected. CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

Walkthrough
Version bumped to 0.0.12. Renamed mask_modloss → consider_modloss across CLI, annotator, and AlphaPeptDeep components and wired modloss-aware MS2 model/prediction paths. Introduced a non-decoy, rank-1 calibration subset for MS2 model selection. idxmlreader now adds is_decoy and score and refines modification-string handling.

Changes
Sequence Diagram(s)
sequenceDiagram
participant User
participant CLI as msrescore2feature (CLI)
participant FA as FeatureAnnotator
participant APD as AlphaPeptDeepAnnotator
participant MM as ModelManager/pDeepModel
participant Pred as Predictor
User->>CLI: --consider_modloss [True/False]
CLI->>FA: construct FeatureAnnotator(consider_modloss)
FA->>APD: create AlphaPeptDeepAnnotator(consider_modloss)
alt consider_modloss == True
APD->>MM: load modloss-capable MS2 model/types
else
APD->>MM: load standard MS2 model/types
end
APD->>Pred: predict intensities (per PSM)
Pred-->>APD: fragment intensities + mz
APD-->>FA: annotated features
FA-->>User: output features
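In code, the propagation shown in this first diagram amounts to passing one boolean down a chain of constructors and flipping it once at the external ModelManager boundary. The sketch below is illustrative only: the class and argument names mirror the diagram, not the actual quantmsrescore implementation; only the mask_modloss = not consider_modloss inversion and the fragment-type lists are taken from the review comments further down.

from dataclasses import dataclass

@dataclass
class AlphaPeptDeepAnnotatorSketch:
    consider_modloss: bool = False

    def charged_frag_types(self):
        # Standard b/y fragment types; modloss types are appended only when requested.
        frag_types = ["b_z1", "y_z1", "b_z2", "y_z2"]
        if self.consider_modloss:
            frag_types += ["b_modloss_z1", "b_modloss_z2", "y_modloss_z1", "y_modloss_z2"]
        return frag_types

    def model_manager_kwargs(self):
        # The external ModelManager keeps the older, negated flag name.
        return {"mask_modloss": not self.consider_modloss, "device": "cpu"}

@dataclass
class FeatureAnnotatorSketch:
    consider_modloss: bool = False

    def build_alphapeptdeep(self):
        # The CLI flag is simply forwarded, as in the diagram above.
        return AlphaPeptDeepAnnotatorSketch(consider_modloss=self.consider_modloss)

annotator = FeatureAnnotatorSketch(consider_modloss=True).build_alphapeptdeep()
print(annotator.charged_frag_types())    # includes the four modloss fragment types
print(annotator.model_manager_kwargs())  # {'mask_modloss': False, 'device': 'cpu'}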
sequenceDiagram
participant FA as FeatureAnnotator
participant Data as PSMList/DF
participant MS2PIP as MS2PIP Selector
participant APD as AlphaPeptDeep Selector
FA->>Data: copy batch, build calibration_set (non-decoy rank-1)
FA->>MS2PIP: select best model(calibration_set)
FA->>APD: select best model(calibration_set, df_without_decoy)
FA->>FA: choose model_to_use
FA-->>Data: apply chosen model to full batch
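The calibration-driven selection in this second diagram can be illustrated with a small standalone sketch. Column names (is_decoy, rank, score) follow the walkthrough above; the two scoring callables are placeholders for the real MS2PIP/AlphaPeptDeep correlation routines, which are not reproduced here.

import pandas as pd

def build_calibration_set(psms_df: pd.DataFrame) -> pd.DataFrame:
    # Non-decoy, rank-1 PSMs only, as described in the walkthrough.
    return psms_df[(~psms_df["is_decoy"]) & (psms_df["rank"] == 1)]

def choose_backend(psms_df, score_ms2pip, score_alphapeptdeep) -> str:
    calibration_set = build_calibration_set(psms_df)
    corr_ms2pip = score_ms2pip(calibration_set)      # placeholder: correlation of predicted
    corr_apd = score_alphapeptdeep(calibration_set)  # vs. observed intensities per backend
    return "AlphaPeptDeep" if corr_apd > corr_ms2pip else "MS2PIP"

toy = pd.DataFrame({"is_decoy": [False, True, False], "rank": [1, 1, 2], "score": [0.9, 0.2, 0.5]})
print(choose_backend(toy, lambda df: 0.71, lambda df: 0.78))  # -> AlphaPeptDeep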
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~40 minutes
PR Reviewer Guide 🔍
Here are some key observations to aid the review process:
Pull Request Overview
This PR integrates AlphaPeptDeep as an alternative to MS2PIP for peptide spectrum prediction and rescoring. The changes reverse the logic from masking modloss ions to considering them, allowing better handling of modified peptides, particularly for phosphorylation analysis.
- Changes parameter semantics from mask_modloss to consider_modloss with inverted logic
- Updates AlphaPeptDeep integration to properly handle external models and modloss ions
- Adds decoy detection and scoring information to PSM processing pipeline
Reviewed Changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 2 comments.
File | Description |
---|---|
quantmsrescore/ms2rescore.py | Updates CLI parameter from mask_modloss to consider_modloss with inverted semantics |
quantmsrescore/idxmlreader.py | Adds decoy status and scoring fields to PSM data structure |
quantmsrescore/annotator.py | Updates annotator configuration to use new consider_modloss parameter |
quantmsrescore/alphapeptdeep.py | Major refactoring to support external models and proper modloss handling |
quantmsrescore/__init__.py | Version bump from 0.0.11 to 0.0.12 |
pyproject.toml | Version bump from 0.0.11 to 0.0.12 |
if len(modsites) == 3:
    mods_res.append(v.group(0)[1:-1] + "@" + modsites[-1] + "^" + "_".join(modsites[:-1]))
else:
    mods_res.append(v.group(0)[1:-1] + "@" + "_".join(modsites))
This complex string manipulation logic for modification site parsing has multiple nested conditions and repeated string operations. Consider extracting this into a separate helper method to improve readability and maintainability.
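One possible shape for such a helper, sketched from the snippet shown above (the standalone function and its name are hypothetical, not code from this PR):

def format_mod_site(mod_id: str, modsites: list) -> str:
    # Builds "Mod@Site" or "Mod@Residue^Scope" from the split modification-site tokens.
    if len(modsites) == 3:
        return mod_id + "@" + modsites[-1] + "^" + "_".join(modsites[:-1])
    return mod_id + "@" + "_".join(modsites)

# The call site would then reduce to:
# mods_res.append(format_mod_site(v.group(0)[1:-1], modsites))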
PR Code Suggestions ✨
Explore these optional code suggestions:
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Actionable comments posted: 5
🔭 Outside diff range comments (2)
quantmsrescore/idxmlreader.py (1)
265-274: New PSM DataFrame fields look good; ensure downstream uses boolean filtering for is_decoy.
Adding instrument, is_decoy, and score is useful. Note that is_decoy is stored as a boolean; in quantmsrescore/annotator.py line 438, the filter compares to 0 (integer). Prefer boolean masking there to avoid implicit casts.
Apply this diff in annotator.py Lines 438-439 to use a boolean mask:
- psms_df_without_decoy = psms_df[psms_df["is_decoy"] == 0]
+ psms_df_without_decoy = psms_df[~psms_df["is_decoy"]]

quantmsrescore/annotator.py (1)
447-455: Wrong fallback model variable for AlphaPeptDeep validation.
In the AlphaPeptDeep branch, the fallback uses original_model from MS2PIP. This will be invalid for AlphaPeptDeep (e.g., “HCD2021” vs “generic”).
Apply this diff:
- else:
-     # Fallback to original model if best model doesn't validate
-     if alphapeptdeep_generator.validate_features(psm_list, psms_df, model=original_model):
+ else:
+     # Fallback to the original AlphaPeptDeep model if best model doesn't validate
+     if alphapeptdeep_generator.validate_features(psm_list, psms_df, model=alphapeptdeep_generator.model):
Additionally, for clarity earlier in the method (outside this hunk), consider capturing both originals:
# Near where original_model is defined for MS2PIP
original_ms2pip_model = ms2pip_generator.model
original_alphapeptdeep_model = alphapeptdeep_generator.model
♻️ Duplicate comments (1)
quantmsrescore/alphapeptdeep.py (1)
720-729: Fragment type lists are duplicated again in the ModelManager path.
The same fragment type lists appear here as in lines 704-710. This reinforces the need to extract them as constants.
🧹 Nitpick comments (5)
quantmsrescore/ms2rescore.py (1)
178-182: Docstring: fix default value and typos.
Current text says “Defaults to True” and “prediciton”. Align with default False and fix spelling.
Apply this diff:
- consider_modloss: bool, optional
-     If modloss ions are considered in the ms2 model. `modloss`
-     ions are mostly useful for phospho MS2 prediciton model.
-     Defaults to True.
+ consider_modloss: bool, optional
+     Whether to consider modloss ions in the MS2 model (useful for phospho).
+     Defaults to False.

quantmsrescore/annotator.py (1)
33-34: consider_modloss default mismatch and typos in docstring.
Signature default is False, but docstring implies True and contains “prediciton”. Align the doc with the actual default and fix spelling.
Apply this diff:
- consider_modloss: bool, optional
-     If modloss ions are considered in the ms2 model. `modloss`
-     ions are mostly useful for phospho MS2 prediciton model.
-     Defaults to True.
+ consider_modloss: bool, optional
+     Whether to consider modloss ions in the MS2 model (useful for phospho).
+     Defaults to False.

Also applies to: 69-73
quantmsrescore/alphapeptdeep.py (3)
704-710: Consider extracting the fragment type lists to avoid duplication.
The fragment type lists are duplicated across multiple conditions. Consider defining them as constants to improve maintainability.
+# Define fragment types as constants
+STANDARD_FRAG_TYPES = ['b_z1', 'y_z1', 'b_z2', 'y_z2']
+MODLOSS_FRAG_TYPES = ['b_modloss_z1', 'b_modloss_z2', 'y_modloss_z1', 'y_modloss_z2']
+ALL_FRAG_TYPES = STANDARD_FRAG_TYPES + MODLOSS_FRAG_TYPES
+
 def make_prediction(enumerated_psm_list, psms_df, spec_file, spectrum_id_pattern, model, model_dir, ms2_tolerance, processes, consider_modloss):
     if model_dir is not None and os.path.exists(os.path.join(model_dir, "ms2.pth")):
         if consider_modloss:
-            model = pDeepModel(charged_frag_types=['b_z1', 'y_z1', 'b_z2', 'y_z2',
-                                                   'b_modloss_z1', 'b_modloss_z2',
-                                                   'y_modloss_z1', 'y_modloss_z2'], device="cpu")
-            theoretical_mz_df = create_fragment_mz_dataframe(psms_df, ['b_z1', 'y_z1', 'b_z2', 'y_z2',
-                                                                       'b_modloss_z1', 'b_modloss_z2',
-                                                                       'y_modloss_z1', 'y_modloss_z2'])
+            model = pDeepModel(charged_frag_types=ALL_FRAG_TYPES, device="cpu")
+            theoretical_mz_df = create_fragment_mz_dataframe(psms_df, ALL_FRAG_TYPES)
         else:
-            model = pDeepModel(charged_frag_types=['b_z1', 'y_z1', 'b_z2', 'y_z2'], device="cpu")
-            theoretical_mz_df = create_fragment_mz_dataframe(psms_df, ['b_z1', 'y_z1', 'b_z2', 'y_z2'])
+            model = pDeepModel(charged_frag_types=STANDARD_FRAG_TYPES, device="cpu")
+            theoretical_mz_df = create_fragment_mz_dataframe(psms_df, STANDARD_FRAG_TYPES)
718-718: Logic inversion for mask_modloss is correct but could benefit from a comment.
The inversion logic mask_modloss=not consider_modloss correctly maintains backward compatibility with the ModelManager API. Consider adding a comment to clarify this intentional inversion.
- model_mgr = ModelManager(mask_modloss=not consider_modloss, device="cpu")
+ # ModelManager uses inverse logic: mask_modloss=True means don't consider modloss
+ model_mgr = ModelManager(mask_modloss=not consider_modloss, device="cpu")
716-716: Typo in variable name: precusor_df should be precursor_df.
There's a typo in the variable name. While it's used consistently, it should be corrected for better code readability.
- precusor_df = psms_df
+ precursor_df = psms_df
Also update lines 730-731 and 734-735:
- precusor_df, predict_int_df, theoretical_mz_df = predictions["precursor_df"], predictions[
+ precursor_df, predict_int_df, theoretical_mz_df = predictions["precursor_df"], predictions[
      "fragment_intensity_df"], predictions["fragment_mz_df"]
  results = []
- precusor_df = precusor_df.set_index("provenance_data")
+ precursor_df = precursor_df.set_index("provenance_data")
And line 770:
- row = precusor_df.loc[next(iter(psm.provenance_data.keys()))]
+ row = precursor_df.loc[next(iter(psm.provenance_data.keys()))]
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (6)
pyproject.toml (1 hunks)
quantmsrescore/__init__.py (1 hunks)
quantmsrescore/alphapeptdeep.py (13 hunks)
quantmsrescore/annotator.py (5 hunks)
quantmsrescore/idxmlreader.py (2 hunks)
quantmsrescore/ms2rescore.py (4 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (2)
quantmsrescore/idxmlreader.py (1)
quantmsrescore/openms.py (1)
is_decoy_peptide_hit (131-149)
quantmsrescore/annotator.py (3)
quantmsrescore/idxmlreader.py (2)
psms_df (102-104)
psms_df (107-111)
quantmsrescore/ms2pip.py (1)
_find_best_ms2pip_model (413-460)
quantmsrescore/alphapeptdeep.py (1)
_find_best_ms2_model (596-639)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: Codacy Static Code Analysis
- GitHub Check: build (3.11)
- GitHub Check: build
🔇 Additional comments (10)
pyproject.toml (1)
6-6: Version bump aligns with codebase.
The project version is correctly updated to 0.0.12 and matches quantmsrescore/__init__.py.
quantmsrescore/__init__.py (1)
7-7: __version__ synchronized with pyproject.
Good consistency between package metadata and code.
quantmsrescore/ms2rescore.py (1)
199-200: Propagating consider_modloss to FeatureAnnotator is correct.
Wiring looks consistent with the new API.
quantmsrescore/annotator.py (3)
119-120: State wiring for consider_modloss looks good.
Internal storage is straightforward and consistent with usage downstream.
359-361: Passing consider_modloss to AlphaPeptDeepAnnotator is correct.
The new option is properly propagated into AlphaPeptDeep.
444-446: Model selection uses calibration subset correctly; ensure non-empty set.
If calibration_set is empty (e.g., all decoys), both model searches may degrade. Consider guarding and falling back to a small random target subset or the original model (a minimal guard sketch follows below).
Do you want me to add a safe fallback (e.g., select top-scoring targets via _get_top_batch_psms) when calibration_set is empty?
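A minimal guard along those lines (illustrative only; column names follow the walkthrough, and the top-scoring fallback is just one of the options mentioned above, not the project's implementation):

import logging
import pandas as pd

logger = logging.getLogger(__name__)

def calibration_or_fallback(psms_df: pd.DataFrame, top_n: int = 500) -> pd.DataFrame:
    # Non-decoy, rank-1 subset; fall back to top-scoring targets if it comes out empty.
    calibration_set = psms_df[(~psms_df["is_decoy"]) & (psms_df["rank"] == 1)]
    if calibration_set.empty:
        logger.warning("Calibration set is empty; falling back to top-scoring target PSMs")
        targets = psms_df[~psms_df["is_decoy"]]
        calibration_set = targets.nlargest(min(top_n, len(targets)), "score")
    return calibration_set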
quantmsrescore/alphapeptdeep.py (4)
46-46: Parameter naming change from mask_modloss to consider_modloss is appropriate.
The rename from mask_modloss to consider_modloss improves clarity by using positive logic rather than negative. This aligns with the CLI changes mentioned in the PR summary.
774-782: Good implementation of masking for zero values.
The masking approach to handle non-zero fragment values is correct and efficient. This ensures that only valid fragment ions are considered in the downstream processing.
787-789: Consistent use of masked arrays throughout the data flow.
The masked arrays are correctly propagated through theoretical_mz and predicted_intensity, ensuring consistency in the data processing pipeline.
401-401: Consistent propagation of the consider_modloss parameter.
The consider_modloss parameter is consistently passed through all the relevant methods in the AlphaPeptDeepAnnotator class, maintaining the modloss-aware functionality throughout the annotation process.
Also applies to: 419-419, 458-458, 512-512, 631-631
from peptdeep.model.ms2 import pDeepModel
from alphabase.peptide.fragment import create_fragment_mz_dataframe
💡 Verification agent
🧩 Analysis chain
Ensure the new imports are utilized correctly.
The new imports pDeepModel and create_fragment_mz_dataframe are added to support modloss-aware fragment modeling. Verify that these are available in the installed version of the peptdeep/alphabase packages.
🏁 Script executed:
#!/bin/bash
# Check if pDeepModel and create_fragment_mz_dataframe are properly imported and used elsewhere
rg -A 5 "from peptdeep.model.ms2 import pDeepModel|from alphabase.peptide.fragment import create_fragment_mz_dataframe"
Length of output: 652
Fix unused imports in quantmsrescore/alphapeptdeep.py
It looks like both pDeepModel and create_fragment_mz_dataframe are imported on lines 6–7 but never referenced elsewhere in this file. Please address the following:
- quantmsrescore/alphapeptdeep.py:6–7
  - from peptdeep.model.ms2 import pDeepModel
  - from alphabase.peptide.fragment import create_fragment_mz_dataframe
- If you plan to enable modloss-aware fragment modeling here, add code to instantiate pDeepModel and call create_fragment_mz_dataframe at the appropriate location.
- Otherwise, remove these two imports to keep the code clean and avoid lint errors.
- Finally, verify that your project’s pinned versions of peptdeep and alphabase actually export these symbols, and update your dependency definitions (requirements.txt / setup.py / pyproject.toml) if needed.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
from peptdeep.model.ms2 import pDeepModel
from alphabase.peptide.fragment import create_fragment_mz_dataframe
🤖 Prompt for AI Agents
In quantmsrescore/alphapeptdeep.py around lines 6–7, the imports pDeepModel and
create_fragment_mz_dataframe are unused; either remove these two import lines to
silence lint errors and keep the file clean, or if you intend to add
modloss-aware fragment modeling, instantiate pDeepModel and call
create_fragment_mz_dataframe at the correct integration point (e.g., when
building fragment predictions) and wire their outputs into the existing code
paths; finally, confirm that the pinned versions of peptdeep and alphabase in
requirements.txt / setup.py / pyproject.toml export these symbols and update the
dependency versions if they do not.
Actionable comments posted: 3
♻️ Duplicate comments (2)
quantmsrescore/idxmlreader.py (1)
290-300: Complex modification parsing logic could benefit from extraction.
The N-terminal modification parsing logic is complex with multiple nested conditions and string operations. Consider extracting this into a separate helper method for improved readability and maintainability.
Example refactoring approach:
+def _format_nterm_modification(mod_id: str, mod_parts: list) -> str:
+    """Format N-terminal modification string."""
+    if mod_parts[0] == "N-term":
+        suffix = mod_parts[-1] if len(mod_parts) == 2 else ""
+        term_label = "^Any_N-term" if suffix else "Any_N-term"
+        return f"{mod_id}@{suffix}{term_label}" if suffix else f"{mod_id}@{term_label}"
+    else:
+        if len(mod_parts) == 3:
+            return f"{mod_id}@{mod_parts[-1]}^{'_'.join(mod_parts[:-1])}"
+        return f"{mod_id}@{'_'.join(mod_parts)}"

 def extract_modifications(peptidoform, mods_name_dict):
     # ... existing code ...
     for i, v in enumerate(list(pattern.finditer(peptidoform))):
         if peptidoform.startswith(".") and i == 0:
             modsites = mods_name_dict[v.group(0)[1:-1]].split(" ")
-            if "".join(modsites[0]) == "N-term":
-                if len(modsites) == 2:
-                    mods_res.append(v.group(0)[1:-1] + "@" + modsites[-1] + "^Any_N-term")
-                else:
-                    mods_res.append(v.group(0)[1:-1] + "@" + "Any_N-term")
-            else:
-                if len(modsites) == 3:
-                    mods_res.append(v.group(0)[1:-1] + "@" + modsites[-1] + "^" + "_".join(modsites[:-1]))
-                else:
-                    mods_res.append(v.group(0)[1:-1] + "@" + "_".join(modsites))
+            mods_res.append(_format_nterm_modification(v.group(0)[1:-1], modsites))

quantmsrescore/alphapeptdeep.py (1)
716-716: Fix typo in variable name.
Variable name 'precusor_df' is misspelled. It should be 'precursor_df' to match the correct spelling used elsewhere in the codebase.
Apply this diff:
- precusor_df = psms_df
+ precursor_df = psms_df
🧹 Nitpick comments (2)
quantmsrescore/alphapeptdeep.py (1)
703-716: Consider extracting model initialization logic.
The model initialization logic with conditional fragment types could be extracted into a helper method to reduce duplication and improve maintainability.
Example refactoring:
+def _get_fragment_types(consider_modloss: bool) -> list:
+    """Get fragment types based on modloss consideration."""
+    base_types = ['b_z1', 'y_z1', 'b_z2', 'y_z2']
+    if consider_modloss:
+        return base_types + ['b_modloss_z1', 'b_modloss_z2', 'y_modloss_z1', 'y_modloss_z2']
+    return base_types
+
+def _initialize_local_model(model_dir: str, consider_modloss: bool) -> tuple:
+    """Initialize local pDeep model."""
+    fragment_types = _get_fragment_types(consider_modloss)
+    model = pDeepModel(charged_frag_types=fragment_types, device="cpu")
+    model.load(os.path.join(model_dir, "ms2.pth"))
+    return model, fragment_types

 def make_prediction(enumerated_psm_list, psms_df, spec_file, spectrum_id_pattern, model, model_dir, ms2_tolerance, processes, consider_modloss):
     if model_dir is not None and os.path.exists(os.path.join(model_dir, "ms2.pth")):
-        if consider_modloss:
-            model = pDeepModel(charged_frag_types=['b_z1', 'y_z1', 'b_z2', 'y_z2',
-                                                   'b_modloss_z1', 'b_modloss_z2',
-                                                   'y_modloss_z1', 'y_modloss_z2'], device="cpu")
-            theoretical_mz_df = create_fragment_mz_dataframe(psms_df, ['b_z1', 'y_z1', 'b_z2', 'y_z2',
-                                                                       'b_modloss_z1', 'b_modloss_z2',
-                                                                       'y_modloss_z1', 'y_modloss_z2'])
-        else:
-            model = pDeepModel(charged_frag_types=['b_z1', 'y_z1', 'b_z2', 'y_z2'], device="cpu")
-            theoretical_mz_df = create_fragment_mz_dataframe(psms_df, ['b_z1', 'y_z1', 'b_z2', 'y_z2'])
-        model.load(os.path.join(model_dir, "ms2.pth"))
+        model, fragment_types = _initialize_local_model(model_dir, consider_modloss)
+        theoretical_mz_df = create_fragment_mz_dataframe(psms_df, fragment_types)
     predict_int_df = model.predict(psms_df)
     precursor_df = psms_df

quantmsrescore/annotator.py (1)
446-464: Consider extracting duplicated validation logic.
The validation and fallback logic for AlphaPeptDeep (lines 447-463) is nearly identical to the MS2PIP logic (lines 466-481). This could be refactored into a shared method to improve maintainability.
Consider extracting a helper method:
def _apply_best_model(self, generator, psm_list, best_model, best_corr, original_model, generator_name, psms_df=None):
    """Apply the best model with validation and fallback logic."""
    if psms_df is not None:
        # AlphaPeptDeep case
        validation_args = (psm_list, psms_df)
    else:
        # MS2PIP case
        validation_args = (psm_list,)
    if best_model and generator.validate_features(psm_list=psm_list, *validation_args[1:], model=best_model):
        model_to_use = best_model
        logger.info(f"Using best model: {model_to_use} with correlation: {best_corr:.4f}")
    else:
        # Fallback to original model if best model doesn't validate
        if generator.validate_features(*validation_args, model=original_model):
            logger.warning("Best model validation failed, falling back to original model")
            model_to_use = original_model
        else:
            logger.error("Both best model and original model validation failed")
            return False  # Indicate failure
    # Apply the selected model
    generator.model = model_to_use
    generator.add_features(*validation_args)
    logger.info(f"Successfully applied {generator_name} annotation using model: {model_to_use}")
    self.ms2_generator = generator_name
    return True  # Indicate success
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (6)
pyproject.toml (1 hunks)
quantmsrescore/__init__.py (1 hunks)
quantmsrescore/alphapeptdeep.py (13 hunks)
quantmsrescore/annotator.py (5 hunks)
quantmsrescore/idxmlreader.py (2 hunks)
quantmsrescore/ms2rescore.py (4 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (2)
quantmsrescore/idxmlreader.py (1)
quantmsrescore/openms.py (1)
is_decoy_peptide_hit (131-149)
quantmsrescore/annotator.py (3)
quantmsrescore/idxmlreader.py (2)
psms_df (102-104)
psms_df (107-111)
quantmsrescore/ms2pip.py (1)
_find_best_ms2pip_model (413-460)
quantmsrescore/alphapeptdeep.py (1)
_find_best_ms2_model (596-639)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: Codacy Static Code Analysis
- GitHub Check: build
- GitHub Check: build (3.11)
🔇 Additional comments (12)
pyproject.toml (1)
6-6: LGTM!
Version bump to 0.0.12 correctly aligns with the changes in quantmsrescore/__init__.py.

quantmsrescore/idxmlreader.py (1)
271-273: LGTM! Good addition of is_decoy and score fields.
The inclusion of is_decoy and score fields in the PSM dataframe provides valuable information for downstream processing and filtering. The implementation correctly uses the existing helper method.

quantmsrescore/__init__.py (1)
7-7: LGTM!
Version bump to 0.0.12 is consistent with the version update in pyproject.toml.
quantmsrescore/ms2rescore.py (2)
109-111: Parameter rename from mask_modloss to consider_modloss is clear and intuitive.
The semantic change from "masking" to "considering" modloss ions better reflects the actual behavior of the parameter.
199-199: LGTM!
The parameter is correctly passed to the FeatureAnnotator with the updated name.
quantmsrescore/alphapeptdeep.py (1)
774-782: Good implementation of explicit masking for zero values.
The explicit masking of zero values in fragment arrays ensures that only valid fragments are considered in downstream processing. This is a robust approach to handle missing fragments (a toy illustration follows below).
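For intuition, the masking pattern being praised here can be reproduced with toy NumPy arrays (illustrative values only, not the PR's actual data flow):

import numpy as np

theoretical_mz = np.array([0.0, 245.1, 388.2, 612.3])    # zeros mark slots with no valid fragment
predicted_intensity = np.array([0.05, 0.80, 0.10, 0.40])
observed_intensity = np.array([0.00, 0.75, 0.12, 0.35])

mask = theoretical_mz != 0                                # keep only positions with a real theoretical m/z
corr = np.corrcoef(predicted_intensity[mask], observed_intensity[mask])[0, 1]
print(mask, round(corr, 3))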
quantmsrescore/annotator.py (6)
33-33: LGTM! Parameter rename is consistent across the module.
The parameter rename from mask_modloss to consider_modloss is properly implemented here.
119-119: LGTM! Instance variable properly initialized.
The parameter is correctly stored as a private instance variable.
359-361: LGTM! AlphaPeptDeep annotator creation properly updated.
The consider_modloss parameter is correctly passed to the AlphaPeptDeep annotator constructor.
443-446: LGTM! Model selection logic properly implements calibration-based comparison.
The implementation correctly:
- Uses the calibration set (non-decoy PSMs) for MS2PIP model selection
- Uses the filtered DataFrame (without decoys) for AlphaPeptDeep model selection
- Compares correlations to determine the best model
606-608: The replacement only affects how feature keys are labeled on the PSMs before being written out; there are no downstream branches or lookups keyed on the literal "MS2PIP:*" strings. All the .replace("MS2PIP", "AlphaPeptDeep") does is rename the column prefixes in the final psm["rescoring_features"] dict. Since:
• No code ever does if canonical_feature == "MS2PIP:…".
• No mapping back to MS2PIP_FEATURES happens after this point.
• The rescoring engine consumes whatever feature names are present.
This renaming will not break anything downstream. I recommend resolving this comment.
427-439: Please confirm PSMList.copy() behavior and decoy-filter consistency.
In quantmsrescore/annotator.py (lines 427–439):
- PSMList.copy(): We weren’t able to locate its implementation in the repo. Verify that this method returns a new PSMList instance with its own underlying list (otherwise filtering on
batch_psms_copy.psm_list
may still mutate the original).- Decoy filtering: You use
if not result.is_decoy
on the PSM objects andpsms_df["is_decoy"] == 0
on the DataFrame. AlthoughFalse == 0
evaluates as True in pandas, consider standardizing the check (e.g.psms_df.loc[~psms_df["is_decoy"]]
or== False
) to improve readability and avoid confusion.
@@ -747,29 +767,31 @@ def make_prediction(enumerated_psm_list, psms_df, spec_file, spectrum_id_pattern
         # Process each PSM for this spectrum
         for psm_idx, psm in psms_by_specid[spectrum_id]:
-            row = precusor_df.loc[next(iter(psm.provenance_data.keys()))]
+            row = precursor_df.loc[next(iter(psm.provenance_data.keys()))]
Ensure proper error handling for missing provenance data.
Using next(iter(psm.provenance_data.keys())) could raise a StopIteration if provenance_data is empty. Consider adding error handling.
Apply this diff to add error handling:
- row = precursor_df.loc[next(iter(psm.provenance_data.keys()))]
+ try:
+ provenance_key = next(iter(psm.provenance_data.keys()))
+ row = precursor_df.loc[provenance_key]
+ except (StopIteration, KeyError) as e:
+ logger.warning(f"Skipping PSM due to missing provenance data: {e}")
+ continue
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
- row = precursor_df.loc[next(iter(psm.provenance_data.keys()))]
+ try:
+     provenance_key = next(iter(psm.provenance_data.keys()))
+     row = precursor_df.loc[provenance_key]
+ except (StopIteration, KeyError) as e:
+     logger.warning(f"Skipping PSM due to missing provenance data: {e}")
+     continue
🤖 Prompt for AI Agents
In quantmsrescore/alphapeptdeep.py around line 770, the code uses
next(iter(psm.provenance_data.keys())) which will raise StopIteration if
provenance_data is empty; guard against an empty provenance_data by checking its
truthiness before using next(iter(...)) and handle the case (e.g., log and skip
this PSM or raise a clear ValueError) and then retrieve the key safely
(next(iter(psm.provenance_data)) or use list(...) with bounds check) so the code
either continues gracefully or fails with an explicit message when provenance is
missing.
consider_modloss: bool, optional
    If modloss ions are considered in the ms2 model. `modloss`
    ions are mostly useful for phospho MS2 prediciton model.
    Defaults to True.
Fix docstring inconsistency.
The docstring states "Defaults to True" but the parameter default is False.
Apply this diff to fix the documentation:
- consider_modloss: bool, optional
- If modloss ions are considered in the ms2 model. `modloss`
- ions are mostly useful for phospho MS2 prediciton model.
- Defaults to True.
+ consider_modloss: bool, optional
+ If modloss ions are considered in the ms2 model. `modloss`
+ ions are mostly useful for phospho MS2 prediction model.
+ Defaults to False.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
- consider_modloss: bool, optional
-     If modloss ions are considered in the ms2 model. `modloss`
-     ions are mostly useful for phospho MS2 prediciton model.
-     Defaults to True.
+ consider_modloss: bool, optional
+     If modloss ions are considered in the ms2 model. `modloss`
+     ions are mostly useful for phospho MS2 prediction model.
+     Defaults to False.
🤖 Prompt for AI Agents
In quantmsrescore/annotator.py around lines 69 to 72, the docstring for the
parameter `consider_modloss` incorrectly states "Defaults to True" while the
function default is False; update the docstring text to "Defaults to False" (or
otherwise match the actual default) so the documentation and code are
consistent.
consider_modloss: bool, optional
    If modloss ions are considered in the ms2 model. `modloss`
    ions are mostly useful for phospho MS2 prediciton model.
    Defaults to True.
Fix inconsistent docstring default value.
The docstring states "Defaults to True" but the CLI flag is actually False by default (using is_flag=True without default=True).
Apply this diff to fix the docstring:
consider_modloss: bool, optional
If modloss ions are considered in the ms2 model. `modloss`
ions are mostly useful for phospho MS2 prediciton model.
- Defaults to True.
+ Defaults to False.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
- consider_modloss: bool, optional
-     If modloss ions are considered in the ms2 model. `modloss`
-     ions are mostly useful for phospho MS2 prediciton model.
-     Defaults to True.
+ consider_modloss: bool, optional
+     If modloss ions are considered in the ms2 model. `modloss`
+     ions are mostly useful for phospho MS2 prediciton model.
+     Defaults to False.
🤖 Prompt for AI Agents
In quantmsrescore/ms2rescore.py around lines 178 to 181, the docstring for the
parameter `consider_modloss` incorrectly states "Defaults to True" while the CLI
flag is false by default; update the docstring to reflect the actual default by
changing the sentence to "Defaults to False." so the documentation matches the
CLI behavior.
Actionable comments posted: 1
♻️ Duplicate comments (1)
quantmsrescore/alphapeptdeep.py (1)
6-7: Duplicate comment detected for unused imports.
Both pDeepModel and create_fragment_mz_dataframe are imported but never used anywhere in this file. This was previously flagged but remains unaddressed.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
quantmsrescore/alphapeptdeep.py (13 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (1)
quantmsrescore/alphapeptdeep.py (1)
quantmsrescore/idxmlreader.py (2)
psms_df (102-104)
psms_df (107-111)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: Codacy Static Code Analysis
- GitHub Check: build
- GitHub Check: build (3.11)
🔇 Additional comments (10)
quantmsrescore/alphapeptdeep.py (10)
46-46: LGTM! Parameter rename successfully implemented.
The parameter has been correctly renamed from mask_modloss to consider_modloss with appropriate default value and storage as instance variable.
Also applies to: 83-83
195-195: LGTM! Parameter propagation is consistent.
The consider_modloss parameter is correctly propagated through all the custom_correlate function calls across different methods.
Also applies to: 458-458, 512-512, 631-631
401-401: LGTM! AlphaPeptDeepAnnotator correctly inherits the new parameter.
The consider_modloss parameter is properly added to the constructor and stored as an instance variable.
Also applies to: 419-419
679-679: LGTM! Function signature updates are consistent.
The mask_modloss parameter has been correctly renamed to consider_modloss across all function signatures.
Also applies to: 689-689, 702-702
720-731: LGTM! Modloss-aware fragment type handling is correctly implemented.
The conditional logic properly handles different fragment types based on the consider_modloss flag, including modloss fragment types when enabled.
770-770: LGTM! Provenance data handling is correct.
The code properly accesses the provenance data key to locate the corresponding row in the precursor dataframe.
774-775: LGTM! Masking logic correctly filters non-zero fragments.
The implementation correctly applies boolean masks to filter out zero-valued theoretical m/z values and uses these masks consistently across theoretical m/z, targets, and predictions.
Also applies to: 778-778, 781-782, 787-789
568-572: LGTM! Decoy filtering for calibration set is correctly implemented.
The validation method properly filters out decoy PSMs and selects only rank-1 target PSMs for the calibration set, which aligns with the PR objectives.
704-716: No action required: pDeepModel and create_fragment_mz_dataframe are used.
Both imports are invoked in the consider_modloss and default branches, so they are not unused.
718-718: Verify the mask_modloss inversion in ModelManager.
Please confirm that inverting the consider_modloss flag when calling ModelManager matches the intended behavior in the external peptdeep API—i.e., that setting mask_modloss=True indeed disables modloss handling and vice versa.
Locations to check:
- quantmsrescore/alphapeptdeep.py:718
model_mgr = ModelManager(mask_modloss=not consider_modloss, device="cpu")
Since ModelManager comes from an external package, please review its documentation or source to ensure that mask_modloss was renamed and inverted correctly relative to consider_modloss. If the logic is reversed, adjust the argument accordingly or add an inline comment to clarify.
  precusor_df, predict_int_df, theoretical_mz_df = predictions["precursor_df"], predictions[
      "fragment_intensity_df"], predictions["fragment_mz_df"]
- precusor_df = precusor_df.set_index("provenance_data")
+ precursor_df = precursor_df.set_index("provenance_data")
Fix variable assignment after conditional initialization.
The precursor_df assignment occurs outside the conditional blocks, but precursor_df is only defined in the else block (lines 730-731). In the if block, it should be assigned to psms_df.
if model_dir is not None and os.path.exists(os.path.join(model_dir, "ms2.pth")):
if consider_modloss:
model = pDeepModel(charged_frag_types=['b_z1', 'y_z1', 'b_z2', 'y_z2',
'b_modloss_z1', 'b_modloss_z2',
'y_modloss_z1', 'y_modloss_z2'], device="cpu")
theoretical_mz_df = create_fragment_mz_dataframe(psms_df, ['b_z1', 'y_z1', 'b_z2', 'y_z2',
'b_modloss_z1', 'b_modloss_z2',
'y_modloss_z1', 'y_modloss_z2'])
else:
model = pDeepModel(charged_frag_types=['b_z1', 'y_z1', 'b_z2', 'y_z2'], device="cpu")
theoretical_mz_df = create_fragment_mz_dataframe(psms_df, ['b_z1', 'y_z1', 'b_z2', 'y_z2'])
model.load(os.path.join(model_dir, "ms2.pth"))
predict_int_df = model.predict(psms_df)
- precursor_df = psms_df
+ precursor_df = psms_df
else:
model_mgr = ModelManager(mask_modloss=not consider_modloss, device="cpu")
model_mgr.load_installed_models(model)
if consider_modloss:
predictions = model_mgr.predict_all(precursor_df=psms_df, predict_items=["ms2"],
frag_types=['b_z1', 'y_z1', 'b_z2', 'y_z2',
'b_modloss_z1', 'b_modloss_z2',
'y_modloss_z1', 'y_modloss_z2'],
process_num=processes)
else:
predictions = model_mgr.predict_all(precursor_df=psms_df, predict_items=["ms2"],
frag_types=['b_z1', 'y_z1', 'b_z2', 'y_z2'],
process_num=processes)
precursor_df, predict_int_df, theoretical_mz_df = predictions["precursor_df"], predictions[
"fragment_intensity_df"], predictions["fragment_mz_df"]
- results = []
- precursor_df = precursor_df.set_index("provenance_data")
+ results = []
+ precursor_df = precursor_df.set_index("provenance_data")
Committable suggestion skipped: line range outside the PR's diff.
🤖 Prompt for AI Agents
In quantmsrescore/alphapeptdeep.py around line 734, precursor_df is being
set_index("provenance_data") but precursor_df is only defined in the else
branch; in the if branch you should assign precursor_df = psms_df (or set index
on psms_df and assign that to precursor_df) before calling set_index so
precursor_df is always defined; update the if branch to create/assign
precursor_df from psms_df and then call set_index("provenance_data") on it.
PR Type
Enhancement
Description
Renamed mask_modloss parameter to consider_modloss with inverted logic
Enhanced AlphaPeptDeep model integration with custom model loading
Improved modification parsing for N-terminal modifications
Added decoy filtering for model validation
Diagram Walkthrough
File Walkthrough
__init__.py — Version bump to 0.0.12
quantmsrescore/__init__.py

pyproject.toml — Version bump in project configuration
pyproject.toml

alphapeptdeep.py — Major AlphaPeptDeep model integration refactor
quantmsrescore/alphapeptdeep.py
mask_modloss to consider_modloss with inverted boolean logic

annotator.py — Annotator parameter updates and decoy filtering
quantmsrescore/annotator.py
mask_modloss to consider_modloss

idxmlreader.py — Enhanced PSM data structure and modification parsing
quantmsrescore/idxmlreader.py
Added is_decoy and score fields to PSM dataframe

ms2rescore.py — CLI parameter rename for modloss handling
quantmsrescore/ms2rescore.py
--mask_modloss to --consider_modloss
Summary by CodeRabbit
New Features
Refactor
Documentation
Chores