
Conversation

ypriverol (Member) commented Aug 12, 2025

PR Type

Enhancement


Description

  • Renamed mask_modloss parameter to consider_modloss with inverted logic

  • Enhanced AlphaPeptDeep model integration with custom model loading

  • Improved modification parsing for N-terminal modifications

  • Added decoy filtering for model validation


Diagram Walkthrough

flowchart LR
  A["mask_modloss parameter"] --> B["consider_modloss parameter"]
  C["AlphaPeptDeep model"] --> D["Enhanced model loading"]
  E["Modification parsing"] --> F["Improved N-term handling"]
  G["Model validation"] --> H["Decoy filtering"]

File Walkthrough

Relevant files
Version update
__init__.py
Version bump to 0.0.12                                                                     

quantmsrescore/__init__.py

  • Updated version from 0.0.11 to 0.0.12
+1/-1     
pyproject.toml
Version bump in project configuration                                       

pyproject.toml

  • Updated version from 0.0.11 to 0.0.12
+1/-1     
Enhancement
alphapeptdeep.py
Major AlphaPeptDeep model integration refactor                     

quantmsrescore/alphapeptdeep.py

  • Renamed mask_modloss to consider_modloss with inverted boolean logic
  • Added custom model loading with pDeepModel for external models
  • Enhanced fragment type handling for modloss ions
  • Improved prediction pipeline with conditional fragment types
+48/-26 
annotator.py
Annotator parameter updates and decoy filtering                   

quantmsrescore/annotator.py

  • Updated parameter name from mask_modloss to consider_modloss
  • Added decoy filtering for calibration set in model validation
  • Enhanced model comparison logic
+21/-6   
idxmlreader.py
Enhanced PSM data structure and modification parsing         

quantmsrescore/idxmlreader.py

  • Added is_decoy and score fields to PSM dataframe
  • Enhanced modification parsing for N-terminal modifications
  • Improved modification site handling with position-specific logic
+14/-2   
ms2rescore.py
CLI parameter rename for modloss handling                               

quantmsrescore/ms2rescore.py

  • Updated CLI parameter from --mask_modloss to --consider_modloss
  • Updated parameter documentation and help text
+6/-6     

Summary by CodeRabbit

  • New Features

    • Added option to consider neutral-loss (modloss) ions in MS2 modeling and scoring.
    • Improved automatic model selection using a calibration subset for more reliable results.
  • Refactor

    • Renamed parameter/flag from “mask_modloss” to “consider_modloss” across CLI and APIs. Note: this is a breaking change for scripts.
  • Documentation

    • Updated CLI help and parameter descriptions to reflect the new “consider_modloss” option.
  • Chores

    • Version bumped to 0.0.12.

coderabbitai bot (Contributor) commented Aug 12, 2025

Note

Other AI code review bot(s) detected

CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

Walkthrough

Version bumped to 0.0.12. Renamed mask_modloss → consider_modloss across CLI, annotator, and AlphaPeptDeep components and wired modloss-aware MS2 model/prediction paths. Introduced a non-decoy, rank-1 calibration subset for MS2 model selection. idxmlreader now adds is_decoy and score and refines modification-string handling.

Changes

Cohort / File(s) Change summary
Version bump
pyproject.toml, quantmsrescore/__init__.py
Bump project and package version from 0.0.11 to 0.0.12.
AlphaPeptDeep modloss handling
quantmsrescore/alphapeptdeep.py
Replace mask_modloss with consider_modloss in constructors and functions; add modloss-capable fragment types and pDeepModel usage; alter MS2 model loading/prediction to support modloss-aware and standard paths; produce masked theoretical/predicted fragment arrays.
Annotator calibration & flag propagation
quantmsrescore/annotator.py
FeatureAnnotator now uses consider_modloss and forwards it to AlphaPeptDeep; introduce calibration_set (non-decoy rank-1) and filtered dataframe for selecting best MS2 models; adjust model-selection calls to use calibration subset.
CLI option rename
quantmsrescore/ms2rescore.py
CLI option and function parameter renamed from --mask_modloss/mask_modloss to --consider_modloss/consider_modloss; pass through to FeatureAnnotator; update docstrings.
PSM parsing tweaks
quantmsrescore/idxmlreader.py
_build_psm_index_ adds is_decoy and score columns; extract_modifications reworks initial-mod handling for N-term and multi-mod cases, changing textual mod representation for those cases without public API changes.

Sequence Diagram(s)

sequenceDiagram
  participant User
  participant CLI as msrescore2feature (CLI)
  participant FA as FeatureAnnotator
  participant APD as AlphaPeptDeepAnnotator
  participant MM as ModelManager/pDeepModel
  participant Pred as Predictor

  User->>CLI: --consider_modloss [True/False]
  CLI->>FA: construct FeatureAnnotator(consider_modloss)
  FA->>APD: create AlphaPeptDeepAnnotator(consider_modloss)
  alt consider_modloss == True
    APD->>MM: load modloss-capable MS2 model/types
  else
    APD->>MM: load standard MS2 model/types
  end
  APD->>Pred: predict intensities (per PSM)
  Pred-->>APD: fragment intensities + mz
  APD-->>FA: annotated features
  FA-->>User: output features
sequenceDiagram
  participant FA as FeatureAnnotator
  participant Data as PSMList/DF
  participant MS2PIP as MS2PIP Selector
  participant APD as AlphaPeptDeep Selector

  FA->>Data: copy batch, build calibration_set (non-decoy rank-1)
  FA->>MS2PIP: select best model(calibration_set)
  FA->>APD: select best model(calibration_set, df_without_decoy)
  FA->>FA: choose model_to_use
  FA-->>Data: apply chosen model to full batch

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~40 minutes


Suggested labels

Review effort 4/5

Suggested reviewers

  • daichengxin

Poem

I hop and nibble code—0.0.12 on my back,
Modloss now welcomed, not hidden from track.
I sift calibration crumbs, pick models that fit,
Mark decoys and scores—every peptide a bit.
A joyful thump—features bloom, hop-hop, clap-clap! 🐰✨



PR Reviewer Guide 🔍

Here are some key observations to aid the review process:

⏱️ Estimated effort to review: 3 🔵🔵🔵⚪⚪
🧪 No relevant tests
🔒 No security concerns identified
⚡ Recommended focus areas for review

Possible Issue

In the custom model-loading path, variables like precusor_df may be used later without being defined consistently between the external-model and installed-model branches; the predict_int_df/theoretical_mz_df assignments also differ between the branches. Verify that precusor_df, predict_int_df, and theoretical_mz_df are consistently set for both the external and installed model paths.

def make_prediction(enumerated_psm_list, psms_df, spec_file, spectrum_id_pattern, model, model_dir,
                    ms2_tolerance, processes, consider_modloss):
    if model_dir is not None and os.path.exists(os.path.join(model_dir, "ms2.pth")):
        if consider_modloss:
            model = pDeepModel(charged_frag_types=['b_z1', 'y_z1', 'b_z2', 'y_z2',
                                                   'b_modloss_z1', 'b_modloss_z2',
                                                   'y_modloss_z1', 'y_modloss_z2'], device="cpu")
            theoretical_mz_df = create_fragment_mz_dataframe(psms_df, ['b_z1', 'y_z1', 'b_z2', 'y_z2',
                                                                       'b_modloss_z1', 'b_modloss_z2',
                                                                       'y_modloss_z1', 'y_modloss_z2'])
        else:
            model = pDeepModel(charged_frag_types=['b_z1', 'y_z1', 'b_z2', 'y_z2'], device="cpu")
            theoretical_mz_df = create_fragment_mz_dataframe(psms_df, ['b_z1', 'y_z1', 'b_z2', 'y_z2'])
        model.load(os.path.join(model_dir, "ms2.pth"))
        predict_int_df = model.predict(psms_df)
        precusor_df = psms_df
    else:
        model_mgr = ModelManager(mask_modloss=not consider_modloss, device="cpu")
        model_mgr.load_installed_models(model)
        if consider_modloss:
            predictions = model_mgr.predict_all(precursor_df=psms_df, predict_items=["ms2"],
                                                frag_types=['b_z1', 'y_z1', 'b_z2', 'y_z2',
                                                            'b_modloss_z1', 'b_modloss_z2',
                                                            'y_modloss_z1', 'y_modloss_z2'],
                                                process_num=processes)
        else:
            predictions = model_mgr.predict_all(precursor_df=psms_df, predict_items=["ms2"],
                                                frag_types=['b_z1', 'y_z1', 'b_z2', 'y_z2'],
                                                process_num=processes)
        precusor_df, predict_int_df, theoretical_mz_df = predictions["precursor_df"], predictions[
            "fragment_intensity_df"], predictions["fragment_mz_df"]

    results = []
    precusor_df = precusor_df.set_index("provenance_data")
Typo/Variable Name

The variable precusor_df appears multiple times; likely intended precursor_df. This typo risks confusion and bugs when integrating or refactoring.

results = []
precusor_df = precusor_df.set_index("provenance_data")

b_cols = [col for col in theoretical_mz_df.columns if col.startswith('b')]
y_cols = [col for col in theoretical_mz_df.columns if col.startswith('y')]
# Compile regex for spectrum ID matching
API Change Propagation

CLI now uses consider_modloss and internal classes accept it, but ensure all call sites and downstream consumers handle inverted logic vs previous mask_modloss. Double-check default semantics and documentation consistency.

    spectrum_path=self._idxml_reader.spectrum_path,
    spectrum_id_pattern=self._spectrum_id_pattern,
    model_dir=self._ms2_model_path,
    calibration_set_size=self._calibration_set_size,
    valid_correlations_size=self._valid_correlations_size,
    correlation_threshold=0.7,  # Consider making this configurable
    higher_score_better=self._higher_score_better,
    processes=self._processes,
    force_model=self._force_model,
    consider_modloss=self._consider_modloss
)

Copilot AI (Contributor) left a comment

Pull Request Overview

This PR integrates AlphaPeptDeep as an alternative to MS2PIP for peptide spectrum prediction and rescoring. The changes reverse the logic from masking modloss ions to considering them, allowing better handling of modified peptides, particularly for phosphorylation analysis.

  • Changes parameter semantics from mask_modloss to consider_modloss with inverted logic
  • Updates AlphaPeptDeep integration to properly handle external models and modloss ions
  • Adds decoy detection and scoring information to PSM processing pipeline

Reviewed Changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
quantmsrescore/ms2rescore.py Updates CLI parameter from mask_modloss to consider_modloss with inverted semantics
quantmsrescore/idxmlreader.py Adds decoy status and scoring fields to PSM data structure
quantmsrescore/annotator.py Updates annotator configuration to use new consider_modloss parameter
quantmsrescore/alphapeptdeep.py Major refactoring to support external models and proper modloss handling
quantmsrescore/__init__.py Version bump from 0.0.11 to 0.0.12
pyproject.toml Version bump from 0.0.11 to 0.0.12

if len(modsites) == 3:
    mods_res.append(v.group(0)[1:-1] + "@" + modsites[-1] + "^" + "_".join(modsites[:-1]))
else:
    mods_res.append(v.group(0)[1:-1] + "@" + "_".join(modsites))
Copilot AI commented Aug 12, 2025

This complex string manipulation logic for modification site parsing has multiple nested conditions and repeated string operations. Consider extracting this into a separate helper method to improve readability and maintainability.
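One possible shape for such a helper, extracted from the quoted branching (the name `format_mod_site` and its argument types are illustrative, not the repo's actual API; the regex match is replaced by a plain string argument):

```python
def format_mod_site(mod_name: str, modsites: list) -> str:
    # Mirrors the quoted logic: a three-part site list encodes an
    # N-terminal modification as "name@position^terminus", otherwise
    # the site parts are simply joined with underscores.
    if len(modsites) == 3:
        return mod_name + "@" + modsites[-1] + "^" + "_".join(modsites[:-1])
    return mod_name + "@" + "_".join(modsites)
```

With the helper in place, the call site reduces to a single `mods_res.append(format_mod_site(...))` per branch-free match, which is easier to test in isolation.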



qodo-merge-pro bot commented Aug 12, 2025

PR Code Suggestions ✨

Explore these optional code suggestions:

Category: High-level
Verify modloss handling logic

The rename from mask_modloss to consider_modloss inverts semantics, but the new
code mixes both directions (e.g., ModelManager(mask_modloss=not
consider_modloss), custom pDeepModel frag_types, masking zeros) and changes
default behavior, risking inconsistent fragment sets and correlation
comparability across code paths. Add a centralized adapter that derives fragment
types, ModelManager flags, and downstream masking consistently from a single
consider_modloss boolean, and validate output shapes/indices between external
and installed models to avoid silent misalignment.

Examples:

quantmsrescore/alphapeptdeep.py [703-731]
if model_dir is not None and os.path.exists(os.path.join(model_dir, "ms2.pth")):
        if consider_modloss:
            model = pDeepModel(charged_frag_types=['b_z1', 'y_z1', 'b_z2', 'y_z2',
                                                   'b_modloss_z1', 'b_modloss_z2',
                                                   'y_modloss_z1', 'y_modloss_z2'], device="cpu")
            theoretical_mz_df = create_fragment_mz_dataframe(psms_df, ['b_z1', 'y_z1', 'b_z2', 'y_z2',
                                                                       'b_modloss_z1', 'b_modloss_z2',
                                                                       'y_modloss_z1', 'y_modloss_z2'])
        else:
            model = pDeepModel(charged_frag_types=['b_z1', 'y_z1', 'b_z2', 'y_z2'], device="cpu")

 ... (clipped 19 lines)

Solution Walkthrough:

Before:

def make_prediction(..., consider_modloss):
    if custom_model_path_exists:
        if consider_modloss:
            frag_types = MODLOSS_FRAGS
        else:
            frag_types = NO_MODLOSS_FRAGS
        model = pDeepModel(charged_frag_types=frag_types)
        ...
    else:
        model_mgr = ModelManager(mask_modloss=not consider_modloss)
        if consider_modloss:
            frag_types = MODLOSS_FRAGS
        else:
            frag_types = NO_MODLOSS_FRAGS
        predictions = model_mgr.predict_all(frag_types=frag_types, ...)
        ...

After:

def get_frag_types(consider_modloss):
    if consider_modloss:
        return MODLOSS_FRAGS
    else:
        return NO_MODLOSS_FRAGS

def make_prediction(..., consider_modloss):
    frag_types = get_frag_types(consider_modloss)
    if custom_model_path_exists:
        model = pDeepModel(charged_frag_types=frag_types)
        ...
    else:
        model_mgr = ModelManager(mask_modloss=not consider_modloss)
        predictions = model_mgr.predict_all(frag_types=frag_types, ...)
        ...
Suggestion importance[1-10]: 9


Why: This suggestion correctly identifies a critical design issue where duplicated and potentially inconsistent logic for handling modloss fragments exists across different code paths, which could lead to subtle bugs and maintenance challenges.

Impact: High
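A self-contained, runnable version of the adapter this suggestion proposes (the constant names are illustrative; the fragment-type strings themselves come from the diffs above):

```python
# Centralized fragment-type derivation, so both model paths
# (pDeepModel and ModelManager) consume the same list.
STANDARD_FRAG_TYPES = ['b_z1', 'y_z1', 'b_z2', 'y_z2']
MODLOSS_FRAG_TYPES = ['b_modloss_z1', 'b_modloss_z2',
                      'y_modloss_z1', 'y_modloss_z2']

def get_frag_types(consider_modloss: bool) -> list:
    # A single derivation point removes the duplicated if/else branches
    # and guarantees both paths agree on column order.
    if consider_modloss:
        return STANDARD_FRAG_TYPES + MODLOSS_FRAG_TYPES
    return list(STANDARD_FRAG_TYPES)
```

Both `pDeepModel(charged_frag_types=...)` and `model_mgr.predict_all(frag_types=...)` would then receive `get_frag_types(consider_modloss)`, keeping output shapes aligned between the external and installed model paths.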
Category: Possible issue
Unify prediction outputs and fix typo
Suggestion Impact: The commit fixed the typo by renaming the variable from precusor_df to precursor_df in the pDeepModel branch.

code diff:

-        precusor_df = psms_df
+        precursor_df = psms_df

Ensure consistent variable naming and returned structures between the custom
pDeepModel path and ModelManager path. Currently precusor_df is misspelled and
the external model branch does not create a fragment_mz_df consistent with later
usage. Align names (precursor_df) and construct theoretical_mz_df and
predict_int_df with matching columns for downstream code.

quantmsrescore/alphapeptdeep.py [703-731]

 if model_dir is not None and os.path.exists(os.path.join(model_dir, "ms2.pth")):
     if consider_modloss:
-        model = pDeepModel(charged_frag_types=['b_z1', 'y_z1', 'b_z2', 'y_z2',
-                                               'b_modloss_z1', 'b_modloss_z2',
-                                               'y_modloss_z1', 'y_modloss_z2'], device="cpu")
-        theoretical_mz_df = create_fragment_mz_dataframe(psms_df, ['b_z1', 'y_z1', 'b_z2', 'y_z2',
-                                                                   'b_modloss_z1', 'b_modloss_z2',
-                                                                   'y_modloss_z1', 'y_modloss_z2'])
+        frag_types = ['b_z1', 'y_z1', 'b_z2', 'y_z2',
+                      'b_modloss_z1', 'b_modloss_z2',
+                      'y_modloss_z1', 'y_modloss_z2']
     else:
-        model = pDeepModel(charged_frag_types=['b_z1', 'y_z1', 'b_z2', 'y_z2'], device="cpu")
-        theoretical_mz_df = create_fragment_mz_dataframe(psms_df, ['b_z1', 'y_z1', 'b_z2', 'y_z2'])
+        frag_types = ['b_z1', 'y_z1', 'b_z2', 'y_z2']
+    model = pDeepModel(charged_frag_types=frag_types, device="cpu")
     model.load(os.path.join(model_dir, "ms2.pth"))
-    predict_int_df = model.predict(psms_df)
-    precusor_df = psms_df
+    # Predict intensities and compute theoretical m/z to mirror ModelManager output
+    precursor_df = psms_df.copy()
+    predict_int_df = model.predict(precursor_df)
+    theoretical_mz_df = create_fragment_mz_dataframe(precursor_df, frag_types)
 else:
     model_mgr = ModelManager(mask_modloss=not consider_modloss, device="cpu")
     model_mgr.load_installed_models(model)
-    if consider_modloss:
-        predictions = model_mgr.predict_all(precursor_df=psms_df, predict_items=["ms2"],
-                                            frag_types=['b_z1', 'y_z1', 'b_z2', 'y_z2',
-                                                        'b_modloss_z1', 'b_modloss_z2',
-                                                        'y_modloss_z1', 'y_modloss_z2'],
-                                            process_num=processes)
-    else:
-        predictions = model_mgr.predict_all(precursor_df=psms_df, predict_items=["ms2"],
-                                            frag_types=['b_z1', 'y_z1', 'b_z2', 'y_z2'],
-                                            process_num=processes)
-    precusor_df, predict_int_df, theoretical_mz_df = predictions["precursor_df"], predictions[
-        "fragment_intensity_df"], predictions["fragment_mz_df"]
+    predictions = model_mgr.predict_all(
+        precursor_df=psms_df, predict_items=["ms2"],
+        frag_types=frag_types if 'frag_types' in locals() else ['b_z1','y_z1','b_z2','y_z2'],
+        process_num=processes
+    )
+    precursor_df = predictions["precursor_df"]
+    predict_int_df = predictions["fragment_intensity_df"]
+    theoretical_mz_df = predictions["fragment_mz_df"]

[Suggestion processed]

Suggestion importance[1-10]: 7


Why: The suggestion correctly identifies a typo in the variable name precusor_df and proposes to fix it, which improves code readability and maintainability.

Impact: Medium
  • Update

ypriverol and others added 2 commits August 12, 2025 11:42
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
coderabbitai bot (Contributor) left a comment

Actionable comments posted: 5

🔭 Outside diff range comments (2)
quantmsrescore/idxmlreader.py (1)

265-274: New PSM DataFrame fields look good; ensure downstream uses boolean filtering for is_decoy.

Adding instrument, is_decoy, and score is useful. Note that is_decoy is stored as a boolean; in quantmsrescore/annotator.py Line 438, the filter compares to 0 (integer). Prefer boolean masking there to avoid implicit casts.

Apply this diff in annotator.py Lines 438-439 to use a boolean mask:

-            psms_df_without_decoy = psms_df[psms_df["is_decoy"] == 0]
+            psms_df_without_decoy = psms_df[~psms_df["is_decoy"]]
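A toy DataFrame illustrates why the boolean mask is preferable (the column contents here are invented for illustration):

```python
import pandas as pd

psms_df = pd.DataFrame({
    "peptidoform": ["PEPTIDE/2", "DECOYSEQ/2", "PROTEIN/3"],
    "is_decoy": [False, True, False],  # stored as booleans by idxmlreader
})

# Boolean masking states the intent directly and avoids relying on the
# implicit bool -> int cast that the `== 0` comparison depends on.
psms_df_without_decoy = psms_df[~psms_df["is_decoy"]]
```

The `== 0` form happens to work because `False == 0` in Python, but `~psms_df["is_decoy"]` keeps working even if the column type later changes to a nullable boolean dtype where integer comparison behaves differently.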
quantmsrescore/annotator.py (1)

447-455: Wrong fallback model variable for AlphaPeptDeep validation.

In the AlphaPeptDeep branch, the fallback uses original_model from MS2PIP. This will be invalid for AlphaPeptDeep (e.g., “HCD2021” vs “generic”).

Apply this diff:

-                else:
-                    # Fallback to original model if best model doesn't validate
-                    if alphapeptdeep_generator.validate_features(psm_list, psms_df, model=original_model):
+                else:
+                    # Fallback to the original AlphaPeptDeep model if best model doesn't validate
+                    if alphapeptdeep_generator.validate_features(psm_list, psms_df, model=alphapeptdeep_generator.model):

Additionally, for clarity earlier in the method (outside this hunk), consider capturing both originals:

# Near where original_model is defined for MS2PIP
original_ms2pip_model = ms2pip_generator.model
original_alphapeptdeep_model = alphapeptdeep_generator.model
♻️ Duplicate comments (1)
quantmsrescore/alphapeptdeep.py (1)

720-729: Fragment type lists are duplicated again in the ModelManager path.

The same fragment type lists appear here as in lines 704-710. This reinforces the need to extract them as constants.

🧹 Nitpick comments (5)
quantmsrescore/ms2rescore.py (1)

178-182: Docstring: fix default value and typos.

Current text says “Defaults to True” and “prediciton”. Align with default False and fix spelling.

Apply this diff:

-    consider_modloss: bool, optional
-        If modloss ions are considered in the ms2 model. `modloss`
-        ions are mostly useful for phospho MS2 prediciton model.
-        Defaults to True.
+    consider_modloss: bool, optional
+        Whether to consider modloss ions in the MS2 model (useful for phospho).
+        Defaults to False.
quantmsrescore/annotator.py (1)

33-34: consider_modloss default mismatch and typos in docstring.

Signature default is False, but docstring implies True and contains “prediciton”. Align the doc with the actual default and fix spelling.

Apply this diff:

-        consider_modloss: bool, optional
-            If modloss ions are considered in the ms2 model. `modloss`
-            ions are mostly useful for phospho MS2 prediciton model.
-            Defaults to True.
+        consider_modloss: bool, optional
+            Whether to consider modloss ions in the MS2 model (useful for phospho).
+            Defaults to False.

Also applies to: 69-73

quantmsrescore/alphapeptdeep.py (3)

704-710: Consider extracting the fragment type lists to avoid duplication.

The fragment type lists are duplicated across multiple conditions. Consider defining them as constants to improve maintainability.

+# Define fragment types as constants
+STANDARD_FRAG_TYPES = ['b_z1', 'y_z1', 'b_z2', 'y_z2']
+MODLOSS_FRAG_TYPES = ['b_modloss_z1', 'b_modloss_z2', 'y_modloss_z1', 'y_modloss_z2']
+ALL_FRAG_TYPES = STANDARD_FRAG_TYPES + MODLOSS_FRAG_TYPES
+
 def make_prediction(enumerated_psm_list, psms_df, spec_file, spectrum_id_pattern, model, model_dir,
                    ms2_tolerance, processes, consider_modloss):
     if model_dir is not None and os.path.exists(os.path.join(model_dir, "ms2.pth")):
         if consider_modloss:
-            model = pDeepModel(charged_frag_types=['b_z1', 'y_z1', 'b_z2', 'y_z2',
-                                                   'b_modloss_z1', 'b_modloss_z2',
-                                                   'y_modloss_z1', 'y_modloss_z2'], device="cpu")
-            theoretical_mz_df = create_fragment_mz_dataframe(psms_df, ['b_z1', 'y_z1', 'b_z2', 'y_z2',
-                                                                       'b_modloss_z1', 'b_modloss_z2',
-                                                                       'y_modloss_z1', 'y_modloss_z2'])
+            model = pDeepModel(charged_frag_types=ALL_FRAG_TYPES, device="cpu")
+            theoretical_mz_df = create_fragment_mz_dataframe(psms_df, ALL_FRAG_TYPES)
         else:
-            model = pDeepModel(charged_frag_types=['b_z1', 'y_z1', 'b_z2', 'y_z2'], device="cpu")
-            theoretical_mz_df = create_fragment_mz_dataframe(psms_df, ['b_z1', 'y_z1', 'b_z2', 'y_z2'])
+            model = pDeepModel(charged_frag_types=STANDARD_FRAG_TYPES, device="cpu")
+            theoretical_mz_df = create_fragment_mz_dataframe(psms_df, STANDARD_FRAG_TYPES)

718-718: Logic inversion for mask_modloss is correct but could benefit from a comment.

The inversion logic mask_modloss=not consider_modloss correctly maintains backward compatibility with the ModelManager API. Consider adding a comment to clarify this intentional inversion.

-        model_mgr = ModelManager(mask_modloss=not consider_modloss, device="cpu")
+        # ModelManager uses inverse logic: mask_modloss=True means don't consider modloss
+        model_mgr = ModelManager(mask_modloss=not consider_modloss, device="cpu")

716-716: Typo in variable name: precusor_df should be precursor_df.

There's a typo in the variable name. While it's used consistently, it should be corrected for better code readability.

-        precusor_df = psms_df
+        precursor_df = psms_df

Also update line 730-731 and 734-735:

-        precusor_df, predict_int_df, theoretical_mz_df = predictions["precursor_df"], predictions[
+        precursor_df, predict_int_df, theoretical_mz_df = predictions["precursor_df"], predictions[
             "fragment_intensity_df"], predictions["fragment_mz_df"]

     results = []
-    precusor_df = precusor_df.set_index("provenance_data")
+    precursor_df = precursor_df.set_index("provenance_data")

And line 770:

-            row = precusor_df.loc[next(iter(psm.provenance_data.keys()))]
+            row = precursor_df.loc[next(iter(psm.provenance_data.keys()))]
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f070da8 and 7028c97.

📒 Files selected for processing (6)
  • pyproject.toml (1 hunks)
  • quantmsrescore/__init__.py (1 hunks)
  • quantmsrescore/alphapeptdeep.py (13 hunks)
  • quantmsrescore/annotator.py (5 hunks)
  • quantmsrescore/idxmlreader.py (2 hunks)
  • quantmsrescore/ms2rescore.py (4 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (2)
quantmsrescore/idxmlreader.py (1)
quantmsrescore/openms.py (1)
  • is_decoy_peptide_hit (131-149)
quantmsrescore/annotator.py (3)
quantmsrescore/idxmlreader.py (2)
  • psms_df (102-104)
  • psms_df (107-111)
quantmsrescore/ms2pip.py (1)
  • _find_best_ms2pip_model (413-460)
quantmsrescore/alphapeptdeep.py (1)
  • _find_best_ms2_model (596-639)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: Codacy Static Code Analysis
  • GitHub Check: build (3.11)
  • GitHub Check: build
🔇 Additional comments (10)
pyproject.toml (1)

6-6: Version bump aligns with codebase.

The project version is correctly updated to 0.0.12 and matches quantmsrescore/__init__.py.

quantmsrescore/__init__.py (1)

7-7: version synchronized with pyproject.

Good consistency between package metadata and code.

quantmsrescore/ms2rescore.py (1)

199-200: Propagating consider_modloss to FeatureAnnotator is correct.

Wiring looks consistent with the new API.

quantmsrescore/annotator.py (3)

119-120: State wiring for consider_modloss looks good.

Internal storage is straightforward and consistent with usage downstream.


359-361: Passing consider_modloss to AlphaPeptDeepAnnotator is correct.

The new option is properly propagated into AlphaPeptDeep.


444-446: Model selection uses calibration subset correctly; ensure non-empty set.

If calibration_set is empty (e.g., all decoys), both model searches may degrade. Consider guarding and falling back to a small random target subset or the original model.

Do you want me to add a safe fallback (e.g., select top-scoring targets via _get_top_batch_psms) when calibration_set is empty?
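One possible shape for that guard (the column names `is_decoy`, `rank`, and `score`, and the fallback size, are assumptions for illustration, not the repo's actual schema):

```python
import pandas as pd

def build_calibration_set(psms_df, fallback_size=100):
    # Preferred subset: non-decoy, rank-1 PSMs.
    calibration = psms_df[(~psms_df["is_decoy"]) & (psms_df["rank"] == 1)]
    if calibration.empty:
        # Fallback: top-scoring targets, so model selection never runs
        # on an empty (or all-decoy) calibration set.
        targets = psms_df[~psms_df["is_decoy"]]
        calibration = targets.nlargest(min(fallback_size, len(targets)), "score")
    return calibration
```

The fallback keeps model selection deterministic even for pathological batches, at the cost of calibrating on PSMs that did not rank first.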

quantmsrescore/alphapeptdeep.py (4)

46-46: Parameter naming change from mask_modloss to consider_modloss is appropriate.

The rename from mask_modloss to consider_modloss improves clarity by using positive logic rather than negative. This aligns with the CLI changes mentioned in the PR summary.


774-782: Good implementation of masking for zero values.

The masking approach to handle non-zero fragment values is correct and efficient. This ensures that only valid fragment ions are considered in the downstream processing.
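In miniature, the masking pattern looks like this (the array values are invented; only the idea of filtering both arrays on non-zero theoretical m/z comes from the review):

```python
import numpy as np

theoretical_mz = np.array([145.1, 0.0, 262.2, 0.0])
predicted_intensity = np.array([0.30, 0.05, 0.90, 0.00])

# Positions with a zero theoretical m/z do not correspond to real
# fragment ions; mask them out of both arrays so downstream
# correlation is computed only on valid positions.
valid = theoretical_mz > 0
masked_mz = theoretical_mz[valid]
masked_intensity = predicted_intensity[valid]
```

Applying the same boolean mask to both arrays keeps them index-aligned, which is what makes the downstream theoretical/predicted comparison meaningful.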


787-789: Consistent use of masked arrays throughout the data flow.

The masked arrays are correctly propagated through theoretical_mz and predicted_intensity, ensuring consistency in the data processing pipeline.


401-401: Consistent propagation of consider_modloss parameter.

The consider_modloss parameter is consistently passed through all the relevant methods in the AlphaPeptDeepAnnotator class, maintaining the modloss-aware functionality throughout the annotation process.

Also applies to: 419-419, 458-458, 512-512, 631-631

Comment on lines +6 to +7
from peptdeep.model.ms2 import pDeepModel
from alphabase.peptide.fragment import create_fragment_mz_dataframe

💡 Verification agent

🧩 Analysis chain

Ensure the new imports are utilized correctly.

The new imports pDeepModel and create_fragment_mz_dataframe are added to support modloss-aware fragment modeling. Verify that these are available in the installed version of the peptdeep/alphabase packages.


🏁 Script executed:

#!/bin/bash
# Check if pDeepModel and create_fragment_mz_dataframe are properly imported and used elsewhere
rg -A 5 "from peptdeep.model.ms2 import pDeepModel|from alphabase.peptide.fragment import create_fragment_mz_dataframe"

Length of output: 652


Fix unused imports in quantmsrescore/alphapeptdeep.py

It looks like both pDeepModel and create_fragment_mz_dataframe are imported on lines 6–7 but never referenced elsewhere in this file. Please address the following:

  • quantmsrescore/alphapeptdeep.py:6–7
    - from peptdeep.model.ms2 import pDeepModel
    - from alphabase.peptide.fragment import create_fragment_mz_dataframe
  • If you plan to enable modloss-aware fragment modeling here, add code to instantiate pDeepModel and call create_fragment_mz_dataframe at the appropriate location.
  • Otherwise, remove these two imports to keep the code clean and avoid lint errors.
  • Finally, verify that your project’s pinned versions of peptdeep and alphabase actually export these symbols, and update your dependency definitions (requirements.txt / setup.py / pyproject.toml) if needed.
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
from peptdeep.model.ms2 import pDeepModel
from alphabase.peptide.fragment import create_fragment_mz_dataframe
🤖 Prompt for AI Agents
In quantmsrescore/alphapeptdeep.py around lines 6–7, the imports pDeepModel and
create_fragment_mz_dataframe are unused; either remove these two import lines to
silence lint errors and keep the file clean, or if you intend to add
modloss-aware fragment modeling, instantiate pDeepModel and call
create_fragment_mz_dataframe at the correct integration point (e.g., when
building fragment predictions) and wire their outputs into the existing code
paths; finally, confirm that the pinned versions of peptdeep and alphabase in
requirements.txt / setup.py / pyproject.toml export these symbols and update the
dependency versions if they do not.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 3

♻️ Duplicate comments (2)
quantmsrescore/idxmlreader.py (1)

290-300: Complex modification parsing logic could benefit from extraction.

The N-terminal modification parsing logic is complex with multiple nested conditions and string operations. Consider extracting this into a separate helper method for improved readability and maintainability.

Example refactoring approach:

+def _format_nterm_modification(mod_id: str, mod_parts: list) -> str:
+    """Format N-terminal modification string."""
+    if mod_parts[0] == "N-term":
+        suffix = mod_parts[-1] if len(mod_parts) == 2 else ""
+        term_label = "^Any_N-term" if suffix else "Any_N-term"
+        return f"{mod_id}@{suffix}{term_label}" if suffix else f"{mod_id}@{term_label}"
+    else:
+        if len(mod_parts) == 3:
+            return f"{mod_id}@{mod_parts[-1]}^{'_'.join(mod_parts[:-1])}"
+        return f"{mod_id}@{'_'.join(mod_parts)}"

 def extract_modifications(peptidoform, mods_name_dict):
     # ... existing code ...
     for i, v in enumerate(list(pattern.finditer(peptidoform))):
         if peptidoform.startswith(".") and i == 0:
             modsites = mods_name_dict[v.group(0)[1:-1]].split(" ")
-            if "".join(modsites[0]) == "N-term":
-                if len(modsites) == 2:
-                    mods_res.append(v.group(0)[1:-1] + "@" + modsites[-1] + "^Any_N-term")
-                else:
-                    mods_res.append(v.group(0)[1:-1] + "@" + "Any_N-term")
-            else:
-                if len(modsites) == 3:
-                    mods_res.append(v.group(0)[1:-1] + "@" + modsites[-1] + "^" + "_".join(modsites[:-1]))
-                else:
-                    mods_res.append(v.group(0)[1:-1] + "@" + "_".join(modsites))
+            mods_res.append(_format_nterm_modification(v.group(0)[1:-1], modsites))
quantmsrescore/alphapeptdeep.py (1)

716-716: Fix typo in variable name.

Variable name 'precusor_df' is misspelled. It should be 'precursor_df' to match the correct spelling used elsewhere in the codebase.

Apply this diff:

-        precusor_df = psms_df
+        precursor_df = psms_df
🧹 Nitpick comments (2)
quantmsrescore/alphapeptdeep.py (1)

703-716: Consider extracting model initialization logic.

The model initialization logic with conditional fragment types could be extracted into a helper method to reduce duplication and improve maintainability.

Example refactoring:

+def _get_fragment_types(consider_modloss: bool) -> list:
+    """Get fragment types based on modloss consideration."""
+    base_types = ['b_z1', 'y_z1', 'b_z2', 'y_z2']
+    if consider_modloss:
+        return base_types + ['b_modloss_z1', 'b_modloss_z2', 'y_modloss_z1', 'y_modloss_z2']
+    return base_types
+
+def _initialize_local_model(model_dir: str, consider_modloss: bool) -> tuple:
+    """Initialize local pDeep model."""
+    fragment_types = _get_fragment_types(consider_modloss)
+    model = pDeepModel(charged_frag_types=fragment_types, device="cpu")
+    model.load(os.path.join(model_dir, "ms2.pth"))
+    return model, fragment_types

 def make_prediction(enumerated_psm_list, psms_df, spec_file, spectrum_id_pattern, model, model_dir,
                     ms2_tolerance, processes, consider_modloss):
     if model_dir is not None and os.path.exists(os.path.join(model_dir, "ms2.pth")):
-        if consider_modloss:
-            model = pDeepModel(charged_frag_types=['b_z1', 'y_z1', 'b_z2', 'y_z2',
-                                                   'b_modloss_z1', 'b_modloss_z2',
-                                                   'y_modloss_z1', 'y_modloss_z2'], device="cpu")
-            theoretical_mz_df = create_fragment_mz_dataframe(psms_df, ['b_z1', 'y_z1', 'b_z2', 'y_z2',
-                                                                       'b_modloss_z1', 'b_modloss_z2',
-                                                                       'y_modloss_z1', 'y_modloss_z2'])
-        else:
-            model = pDeepModel(charged_frag_types=['b_z1', 'y_z1', 'b_z2', 'y_z2'], device="cpu")
-            theoretical_mz_df = create_fragment_mz_dataframe(psms_df, ['b_z1', 'y_z1', 'b_z2', 'y_z2'])
-        model.load(os.path.join(model_dir, "ms2.pth"))
+        model, fragment_types = _initialize_local_model(model_dir, consider_modloss)
+        theoretical_mz_df = create_fragment_mz_dataframe(psms_df, fragment_types)
         predict_int_df = model.predict(psms_df)
         precursor_df = psms_df
quantmsrescore/annotator.py (1)

446-464: Consider extracting duplicated validation logic.

The validation and fallback logic for AlphaPeptDeep (lines 447-463) is nearly identical to the MS2PIP logic (lines 466-481). This could be refactored into a shared method to improve maintainability.

Consider extracting a helper method:

def _apply_best_model(self, generator, psm_list, best_model, best_corr, 
                      original_model, generator_name, psms_df=None):
    """Apply the best model with validation and fallback logic."""
    if psms_df is not None:
        # AlphaPeptDeep case
        validation_args = (psm_list, psms_df)
    else:
        # MS2PIP case
        validation_args = (psm_list,)
    
    if best_model and generator.validate_features(*validation_args, model=best_model):
        model_to_use = best_model
        logger.info(f"Using best model: {model_to_use} with correlation: {best_corr:.4f}")
    else:
        # Fallback to original model if best model doesn't validate
        if generator.validate_features(*validation_args, model=original_model):
            logger.warning("Best model validation failed, falling back to original model")
            model_to_use = original_model
        else:
            logger.error("Both best model and original model validation failed")
            return False  # Indicate failure
    
    # Apply the selected model
    generator.model = model_to_use
    generator.add_features(*validation_args)
    logger.info(f"Successfully applied {generator_name} annotation using model: {model_to_use}")
    self.ms2_generator = generator_name
    return True  # Indicate success
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f070da8 and 6c71878.

📒 Files selected for processing (6)
  • pyproject.toml (1 hunks)
  • quantmsrescore/__init__.py (1 hunks)
  • quantmsrescore/alphapeptdeep.py (13 hunks)
  • quantmsrescore/annotator.py (5 hunks)
  • quantmsrescore/idxmlreader.py (2 hunks)
  • quantmsrescore/ms2rescore.py (4 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (2)
quantmsrescore/idxmlreader.py (1)
quantmsrescore/openms.py (1)
  • is_decoy_peptide_hit (131-149)
quantmsrescore/annotator.py (3)
quantmsrescore/idxmlreader.py (2)
  • psms_df (102-104)
  • psms_df (107-111)
quantmsrescore/ms2pip.py (1)
  • _find_best_ms2pip_model (413-460)
quantmsrescore/alphapeptdeep.py (1)
  • _find_best_ms2_model (596-639)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: Codacy Static Code Analysis
  • GitHub Check: build
  • GitHub Check: build (3.11)
🔇 Additional comments (12)
pyproject.toml (1)

6-6: LGTM!

Version bump to 0.0.12 correctly aligns with the changes in quantmsrescore/__init__.py.

quantmsrescore/idxmlreader.py (1)

271-273: LGTM! Good addition of is_decoy and score fields.

The inclusion of is_decoy and score fields in the PSM dataframe provides valuable information for downstream processing and filtering. The implementation correctly uses the existing helper method.

quantmsrescore/__init__.py (1)

7-7: LGTM!

Version bump to 0.0.12 is consistent with the version update in pyproject.toml.

quantmsrescore/ms2rescore.py (2)

109-111: Parameter rename from mask_modloss to consider_modloss is clear and intuitive.

The semantic change from "masking" to "considering" modloss ions better reflects the actual behavior of the parameter.


199-199: LGTM!

The parameter is correctly passed to the FeatureAnnotator with the updated name.

quantmsrescore/alphapeptdeep.py (1)

774-782: Good implementation of explicit masking for zero values.

The explicit masking of zero values in fragment arrays ensures that only valid fragments are considered in downstream processing. This is a robust approach to handle missing fragments.

quantmsrescore/annotator.py (6)

33-33: LGTM! Parameter rename is consistent across the module.

The parameter rename from mask_modloss to consider_modloss is properly implemented here.


119-119: LGTM! Instance variable properly initialized.

The parameter is correctly stored as a private instance variable.


359-361: LGTM! AlphaPeptDeep annotator creation properly updated.

The consider_modloss parameter is correctly passed to the AlphaPeptDeep annotator constructor.


443-446: LGTM! Model selection logic properly implements calibration-based comparison.

The implementation correctly:

  1. Uses the calibration set (non-decoy PSMs) for MS2PIP model selection
  2. Uses the filtered DataFrame (without decoys) for AlphaPeptDeep model selection
  3. Compares correlations to determine the best model

606-608: The replacement only affects how feature keys are labeled on the PSMs before being written out; there are no downstream branches or lookups keyed on the literal "MS2PIP:*" strings. All of the .replace("MS2PIP", "AlphaPeptDeep") does is rename the column prefixes in the final psm["rescoring_features"] dict. Since:

• No code ever does if canonical_feature == "MS2PIP:…".
• No mapping back to MS2PIP_FEATURES happens after this point.
• The rescoring engine consumes whatever feature names are present.

this renaming will not break anything downstream.

I recommend resolving this comment.
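The renaming amounts to a key-level string replace; a sketch with invented feature names (only the `.replace` pattern mirrors what the annotator does to the `rescoring_features` dict):

```python
# Feature names below are illustrative placeholders.
rescoring_features = {
    "MS2PIP:spec_pearson": 0.91,
    "MS2PIP:ionb_mse": 0.02,
    "SomeOther:score": 1.0,
}
renamed = {k.replace("MS2PIP", "AlphaPeptDeep"): v
           for k, v in rescoring_features.items()}
```

Keys without the prefix pass through untouched, which is why the rename is safe as long as nothing downstream matches on the literal prefix.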


427-439: Please confirm PSMList.copy() behavior and decoy-filter consistency

In quantmsrescore/annotator.py (lines 427–439):

  • PSMList.copy(): We weren’t able to locate its implementation in the repo. Verify that this method returns a new PSMList instance with its own underlying list (otherwise filtering on batch_psms_copy.psm_list may still mutate the original).
  • Decoy filtering: You use if not result.is_decoy on the PSM objects and psms_df["is_decoy"] == 0 on the DataFrame. Although False == 0 evaluates as True in pandas, consider standardizing the check (e.g. psms_df.loc[~psms_df["is_decoy"]] or == False) to improve readability and avoid confusion.
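Why the two spellings agree, and why the boolean form reads better, can be shown without pandas (the records are illustrative):

```python
# In Python, False == 0 evaluates True, so an equality check against 0 and
# a boolean negation select the same rows even when the decoy flag is stored
# as an int in some places and a bool in others.
rows = [
    {"psm": "A", "is_decoy": False},
    {"psm": "B", "is_decoy": True},
    {"psm": "C", "is_decoy": 0},  # int-coded flag
]
by_equality = [r["psm"] for r in rows if r["is_decoy"] == 0]
by_negation = [r["psm"] for r in rows if not r["is_decoy"]]
```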

@@ -747,29 +767,31 @@ def make_prediction(enumerated_psm_list, psms_df, spec_file, spectrum_id_pattern

# Process each PSM for this spectrum
for psm_idx, psm in psms_by_specid[spectrum_id]:

-            row = precusor_df.loc[next(iter(psm.provenance_data.keys()))]
+            row = precursor_df.loc[next(iter(psm.provenance_data.keys()))]

⚠️ Potential issue

Ensure proper error handling for missing provenance data.

Using next(iter(psm.provenance_data.keys())) could raise a StopIteration if provenance_data is empty. Consider adding error handling.

Apply this diff to add error handling:

-            row = precursor_df.loc[next(iter(psm.provenance_data.keys()))]
+            try:
+                provenance_key = next(iter(psm.provenance_data.keys()))
+                row = precursor_df.loc[provenance_key]
+            except (StopIteration, KeyError) as e:
+                logger.warning(f"Skipping PSM due to missing provenance data: {e}")
+                continue
📝 Committable suggestion


Suggested change
row = precursor_df.loc[next(iter(psm.provenance_data.keys()))]
try:
provenance_key = next(iter(psm.provenance_data.keys()))
row = precursor_df.loc[provenance_key]
except (StopIteration, KeyError) as e:
logger.warning(f"Skipping PSM due to missing provenance data: {e}")
continue
🤖 Prompt for AI Agents
In quantmsrescore/alphapeptdeep.py around line 770, the code uses
next(iter(psm.provenance_data.keys())) which will raise StopIteration if
provenance_data is empty; guard against an empty provenance_data by checking its
truthiness before using next(iter(...)) and handle the case (e.g., log and skip
this PSM or raise a clear ValueError) and then retrieve the key safely
(next(iter(psm.provenance_data)) or use list(...) with bounds check) so the code
either continues gracefully or fails with an explicit message when provenance is
missing.

Comment on lines +69 to 72
consider_modloss: bool, optional
If modloss ions are considered in the ms2 model. `modloss`
ions are mostly useful for phospho MS2 prediciton model.
Defaults to True.

⚠️ Potential issue

Fix docstring inconsistency.

The docstring states "Defaults to True" but the parameter default is False.

Apply this diff to fix the documentation:

-        consider_modloss: bool, optional
-            If modloss ions are considered in the ms2 model. `modloss`
-            ions are mostly useful for phospho MS2 prediciton model.
-            Defaults to True.
+        consider_modloss: bool, optional
+            If modloss ions are considered in the ms2 model. `modloss`
+            ions are mostly useful for phospho MS2 prediction model.
+            Defaults to False.
📝 Committable suggestion


Suggested change
consider_modloss: bool, optional
If modloss ions are considered in the ms2 model. `modloss`
ions are mostly useful for phospho MS2 prediciton model.
Defaults to True.
consider_modloss: bool, optional
If modloss ions are considered in the ms2 model. `modloss`
ions are mostly useful for phospho MS2 prediction model.
Defaults to False.
🤖 Prompt for AI Agents
In quantmsrescore/annotator.py around lines 69 to 72, the docstring for the
parameter `consider_modloss` incorrectly states "Defaults to True" while the
function default is False; update the docstring text to "Defaults to False" (or
otherwise match the actual default) so the documentation and code are
consistent.

Comment on lines +178 to 181
consider_modloss: bool, optional
If modloss ions are considered in the ms2 model. `modloss`
ions are mostly useful for phospho MS2 prediciton model.
Defaults to True.

⚠️ Potential issue

Fix inconsistent docstring default value.

The docstring states "Defaults to True" but the CLI flag is actually False by default (using is_flag=True without default=True).

Apply this diff to fix the docstring:

     consider_modloss: bool, optional
         If modloss ions are considered in the ms2 model. `modloss`
         ions are mostly useful for phospho MS2 prediciton model.
-        Defaults to True.
+        Defaults to False.
📝 Committable suggestion


Suggested change
consider_modloss: bool, optional
If modloss ions are considered in the ms2 model. `modloss`
ions are mostly useful for phospho MS2 prediciton model.
Defaults to True.
consider_modloss: bool, optional
If modloss ions are considered in the ms2 model. `modloss`
ions are mostly useful for phospho MS2 prediciton model.
Defaults to False.
🤖 Prompt for AI Agents
In quantmsrescore/ms2rescore.py around lines 178 to 181, the docstring for the
parameter `consider_modloss` incorrectly states "Defaults to True" while the CLI
flag is false by default; update the docstring to reflect the actual default by
changing the sentence to "Defaults to False." so the documentation matches the
CLI behavior.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

♻️ Duplicate comments (1)
quantmsrescore/alphapeptdeep.py (1)

6-7: Duplicate comment detected for unused imports.

Both pDeepModel and create_fragment_mz_dataframe are imported but never used anywhere in this file. This was previously flagged but remains unaddressed.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 7028c97 and 6c71878.

📒 Files selected for processing (1)
  • quantmsrescore/alphapeptdeep.py (13 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (1)
quantmsrescore/alphapeptdeep.py (1)
quantmsrescore/idxmlreader.py (2)
  • psms_df (102-104)
  • psms_df (107-111)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: Codacy Static Code Analysis
  • GitHub Check: build
  • GitHub Check: build (3.11)
🔇 Additional comments (10)
quantmsrescore/alphapeptdeep.py (10)

46-46: LGTM! Parameter rename successfully implemented.

The parameter has been correctly renamed from mask_modloss to consider_modloss with appropriate default value and storage as instance variable.

Also applies to: 83-83


195-195: LGTM! Parameter propagation is consistent.

The consider_modloss parameter is correctly propagated through all the custom_correlate function calls across different methods.

Also applies to: 458-458, 512-512, 631-631


401-401: LGTM! AlphaPeptDeepAnnotator correctly inherits the new parameter.

The consider_modloss parameter is properly added to the constructor and stored as an instance variable.

Also applies to: 419-419


679-679: LGTM! Function signature updates are consistent.

The mask_modloss parameter has been correctly renamed to consider_modloss across all function signatures.

Also applies to: 689-689, 702-702


720-731: LGTM! Modloss-aware fragment type handling is correctly implemented.

The conditional logic properly handles different fragment types based on the consider_modloss flag, including modloss fragment types when enabled.


770-770: LGTM! Provenance data handling is correct.

The code properly accesses the provenance data key to locate the corresponding row in the precursor dataframe.


774-775: LGTM! Masking logic correctly filters non-zero fragments.

The implementation correctly applies boolean masks to filter out zero-valued theoretical m/z values and uses these masks consistently across theoretical m/z, targets, and predictions.

Also applies to: 778-778, 781-782, 787-789


568-572: LGTM! Decoy filtering for calibration set is correctly implemented.

The validation method properly filters out decoy PSMs and selects only rank-1 target PSMs for the calibration set, which aligns with the PR objectives.


704-716: No action required: pDeepModel and create_fragment_mz_dataframe are used
Both imports are invoked in the consider_modloss and default branches, so they are not unused.


718-718: Verify the mask_modloss inversion in ModelManager

Please confirm that inverting the consider_modloss flag when calling ModelManager matches the intended behavior in the external peptdeep API—i.e., that setting mask_modloss=True indeed disables modloss handling and vice versa.

Locations to check:

  • quantmsrescore/alphapeptdeep.py:718
        model_mgr = ModelManager(mask_modloss=not consider_modloss, device="cpu")

Since ModelManager comes from an external package, please review its documentation or source to ensure that mask_modloss was renamed and inverted correctly relative to consider_modloss. If the logic is reversed, adjust the argument accordingly or add an inline comment to clarify.
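One way to keep the inversion auditable is a tiny translation helper (hypothetical; the codebase currently inlines `not consider_modloss` at the call site):

```python
def to_mask_modloss(consider_modloss: bool) -> bool:
    """Translate the positive-logic flag exposed by this codebase into the
    negative-logic mask_modloss flag expected by the external ModelManager:
    considering modloss ions means NOT masking them."""
    return not consider_modloss
```

Centralizing the inversion in a named function documents the intended mapping and gives one obvious place to fix things if the external API's semantics turn out to differ.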

precusor_df, predict_int_df, theoretical_mz_df = predictions["precursor_df"], predictions[
"fragment_intensity_df"], predictions["fragment_mz_df"]
-precusor_df = precusor_df.set_index("provenance_data")
+precursor_df = precursor_df.set_index("provenance_data")

⚠️ Potential issue

Fix variable assignment after conditional initialization.

The precursor_df assignment occurs outside the conditional blocks, but precursor_df is only defined in the else block (line 730-731). In the if block, it should be assigned to psms_df.

     if model_dir is not None and os.path.exists(os.path.join(model_dir, "ms2.pth")):
         if consider_modloss:
             model = pDeepModel(charged_frag_types=['b_z1', 'y_z1', 'b_z2', 'y_z2',
                                                    'b_modloss_z1', 'b_modloss_z2',
                                                    'y_modloss_z1', 'y_modloss_z2'], device="cpu")
             theoretical_mz_df = create_fragment_mz_dataframe(psms_df, ['b_z1', 'y_z1', 'b_z2', 'y_z2',
                                                                        'b_modloss_z1', 'b_modloss_z2',
                                                                        'y_modloss_z1', 'y_modloss_z2'])
         else:
             model = pDeepModel(charged_frag_types=['b_z1', 'y_z1', 'b_z2', 'y_z2'], device="cpu")
             theoretical_mz_df = create_fragment_mz_dataframe(psms_df, ['b_z1', 'y_z1', 'b_z2', 'y_z2'])
         model.load(os.path.join(model_dir, "ms2.pth"))
         predict_int_df = model.predict(psms_df)
-        precursor_df = psms_df
+        precursor_df = psms_df
     else:
         model_mgr = ModelManager(mask_modloss=not consider_modloss, device="cpu")
         model_mgr.load_installed_models(model)
         if consider_modloss:
             predictions = model_mgr.predict_all(precursor_df=psms_df, predict_items=["ms2"],
                                                 frag_types=['b_z1', 'y_z1', 'b_z2', 'y_z2',
                                                             'b_modloss_z1', 'b_modloss_z2',
                                                             'y_modloss_z1', 'y_modloss_z2'],
                                                 process_num=processes)
         else:
             predictions = model_mgr.predict_all(precursor_df=psms_df, predict_items=["ms2"],
                                                 frag_types=['b_z1', 'y_z1', 'b_z2', 'y_z2'],
                                                 process_num=processes)
         precursor_df, predict_int_df, theoretical_mz_df = predictions["precursor_df"], predictions[
             "fragment_intensity_df"], predictions["fragment_mz_df"]

-    results = []
-    precursor_df = precursor_df.set_index("provenance_data")
+    results = []
+    precursor_df = precursor_df.set_index("provenance_data")

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In quantmsrescore/alphapeptdeep.py around line 734, precursor_df is being
set_index("provenance_data") but precursor_df is only defined in the else
branch; in the if branch you should assign precursor_df = psms_df (or set index
on psms_df and assign that to precursor_df) before calling set_index so
precursor_df is always defined; update the if branch to create/assign
precursor_df from psms_df and then call set_index("provenance_data") on it.

@ypriverol ypriverol merged commit 13f498f into main Aug 12, 2025
4 checks passed