
Conversation

Collaborator

@daichengxin commented Aug 12, 2025

PR Type

Enhancement, Bug fix


Description

  • Renamed mask_modloss parameter to consider_modloss for clarity

  • Enhanced modification parsing for N-terminal modifications

  • Added decoy and score information to PSM dataframe

  • Improved model prediction logic with modloss consideration


Diagram Walkthrough

flowchart LR
  A["Parameter Rename"] --> B["mask_modloss → consider_modloss"]
  C["Modification Parsing"] --> D["Enhanced N-term handling"]
  E["PSM DataFrame"] --> F["Added decoy/score fields"]
  G["Model Prediction"] --> H["Improved modloss logic"]

File Walkthrough

Relevant files
Enhancement
alphapeptdeep.py
Enhanced modloss handling and model prediction logic         

quantmsrescore/alphapeptdeep.py

  • Renamed mask_modloss parameter to consider_modloss throughout the file
  • Added new imports for pDeepModel and create_fragment_mz_dataframe
  • Enhanced model prediction logic to handle modloss ions properly
  • Improved fragment filtering with proper masking for zero values
+48/-26 
annotator.py
Updated modloss parameter and model selection                       

quantmsrescore/annotator.py

  • Updated parameter name from mask_modloss to consider_modloss
  • Added filtering for target PSMs only in model selection
  • Enhanced model comparison logic for better accuracy
+21/-6   
ms2rescore.py
Updated CLI parameter for modloss handling                             

quantmsrescore/ms2rescore.py

  • Updated CLI parameter from --mask_modloss to --consider_modloss
  • Updated parameter documentation for clarity
+6/-6     
Bug fix
idxmlreader.py
Fixed modification parsing and added PSM metadata               

quantmsrescore/idxmlreader.py

  • Added is_decoy and score fields to PSM dataframe
  • Fixed N-terminal modification parsing with proper format handling
  • Enhanced modification extraction logic for complex modifications
+14/-2   


coderabbitai bot commented Aug 12, 2025

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.


PR Reviewer Guide 🔍

Here are some key observations to aid the review process:

⏱️ Estimated effort to review: 3 🔵🔵🔵⚪⚪
🧪 No relevant tests
🔒 No security concerns identified
⚡ Recommended focus areas for review

Possible Issue

In make_prediction, when an external ms2 model file is loaded, the variables precusor_df, predict_int_df, and theoretical_mz_df are created, but the code simply sets precusor_df = psms_df right before use. Ensure precusor_df has the expected indices/columns (e.g., frag_start_idx/frag_stop_idx and provenance_data), or use the precursor_df returned by the model; otherwise indexing may fail or select the wrong ranges.

if model_dir is not None and os.path.exists(os.path.join(model_dir, "ms2.pth")):
    if consider_modloss:
        model = pDeepModel(charged_frag_types=['b_z1', 'y_z1', 'b_z2', 'y_z2',
                                               'b_modloss_z1', 'b_modloss_z2',
                                               'y_modloss_z1', 'y_modloss_z2'], device="cpu")
        theoretical_mz_df = create_fragment_mz_dataframe(psms_df, ['b_z1', 'y_z1', 'b_z2', 'y_z2',
                                                                   'b_modloss_z1', 'b_modloss_z2',
                                                                   'y_modloss_z1', 'y_modloss_z2'])
    else:
        model = pDeepModel(charged_frag_types=['b_z1', 'y_z1', 'b_z2', 'y_z2'], device="cpu")
        theoretical_mz_df = create_fragment_mz_dataframe(psms_df, ['b_z1', 'y_z1', 'b_z2', 'y_z2'])
    model.load(os.path.join(model_dir, "ms2.pth"))
    predict_int_df = model.predict(psms_df)
    precusor_df = psms_df
else:
    model_mgr = ModelManager(mask_modloss=not consider_modloss, device="cpu")
    model_mgr.load_installed_models(model)
    if consider_modloss:
        predictions = model_mgr.predict_all(precursor_df=psms_df, predict_items=["ms2"],
                                            frag_types=['b_z1', 'y_z1', 'b_z2', 'y_z2',
                                                        'b_modloss_z1', 'b_modloss_z2',
                                                        'y_modloss_z1', 'y_modloss_z2'],
                                            process_num=processes)
    else:
        predictions = model_mgr.predict_all(precursor_df=psms_df, predict_items=["ms2"],
                                            frag_types=['b_z1', 'y_z1', 'b_z2', 'y_z2'],
                                            process_num=processes)
    precusor_df, predict_int_df, theoretical_mz_df = predictions["precursor_df"], predictions[
        "fragment_intensity_df"], predictions["fragment_mz_df"]

results = []
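One defensive option (a sketch only; the helper and the exact column list are assumptions based on the review comment, not code from the PR) is to verify up front that the dataframe assigned to precusor_df actually carries the fragment-index columns the downstream slicing relies on:

```python
# Hypothetical guard, not part of the PR: check that the dataframe reused as
# precusor_df exposes the fragment-index columns before any slicing happens.
REQUIRED_PRECURSOR_COLUMNS = ("frag_start_idx", "frag_stop_idx")

def missing_precursor_columns(columns):
    """Return the required fragment-index columns absent from `columns`."""
    present = set(columns)
    return [c for c in REQUIRED_PRECURSOR_COLUMNS if c not in present]

# Before `precusor_df = psms_df`, fail fast instead of mis-indexing later:
# missing = missing_precursor_columns(psms_df.columns)
# if missing:
#     raise ValueError(f"psms_df lacks columns required as precursor_df: {missing}")
```

Failing fast here turns a silent wrong-range selection into an explicit error at the point where the two prediction paths diverge.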
Name Shadowing

The parameter model is reused as a local variable to hold a pDeepModel instance inside make_prediction. This can be confusing and may impact log messages that expect a string model name. Consider using a different local variable (e.g., ms2_model) and preserve the original model identifier for reporting.

if model_dir is not None and os.path.exists(os.path.join(model_dir, "ms2.pth")):
    if consider_modloss:
        model = pDeepModel(charged_frag_types=['b_z1', 'y_z1', 'b_z2', 'y_z2',
                                               'b_modloss_z1', 'b_modloss_z2',
                                               'y_modloss_z1', 'y_modloss_z2'], device="cpu")
        theoretical_mz_df = create_fragment_mz_dataframe(psms_df, ['b_z1', 'y_z1', 'b_z2', 'y_z2',
                                                                   'b_modloss_z1', 'b_modloss_z2',
                                                                   'y_modloss_z1', 'y_modloss_z2'])
    else:
        model = pDeepModel(charged_frag_types=['b_z1', 'y_z1', 'b_z2', 'y_z2'], device="cpu")
        theoretical_mz_df = create_fragment_mz_dataframe(psms_df, ['b_z1', 'y_z1', 'b_z2', 'y_z2'])
    model.load(os.path.join(model_dir, "ms2.pth"))
    predict_int_df = model.predict(psms_df)
    precusor_df = psms_df
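A minimal illustration of the suggested split (names are illustrative, not from the PR): bind the loaded network to a new local and keep the caller-supplied string for log messages:

```python
def describe_model(model_name: str, from_external_file: bool) -> str:
    """Build a log-friendly label without overwriting `model_name` itself."""
    source = "external ms2.pth" if from_external_file else "installed"
    return f"{model_name} ({source})"

# Inside make_prediction this would read, e.g.:
#   ms2_model = pDeepModel(charged_frag_types=..., device="cpu")
#   logging.info("Predicting with %s", describe_model(model, from_external_file=True))
```

Because `model` is never reassigned, any later log line or error message that reports the configured model name still sees the original string.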
Docstring Mismatch

The new consider_modloss option's docstring says "Defaults to True", but the default passed/defined is False. Align the documentation with the actual default to prevent user confusion.

remove_missing_spectra : bool, optional
    Remove PSMs with missing spectra (default: True).
ms2_only : bool, optional
    Process only MS2-level PSMs (default: True).
force_model : bool, optional
    Force the use of the provided MS2 model (default: False).
find_best_model : bool, optional
    Force the use of the best MS2 model (default: False).
consider_modloss: bool, optional
    If modloss ions are considered in the ms2 model. `modloss`
    ions are mostly useful for phospho MS2 prediciton model.
    Defaults to True.
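A corrected fragment might look like this (the signature is trimmed to the one parameter and purely illustrative; only the default/docstring alignment is the point):

```python
def rescore(consider_modloss: bool = False):
    """Rescore PSMs.

    Parameters
    ----------
    consider_modloss : bool, optional
        Whether modloss ions are considered in the ms2 model. `modloss`
        ions are mostly useful for the phospho MS2 prediction model.
        Defaults to False.
    """
    return consider_modloss
```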


qodo-merge-pro bot commented Aug 12, 2025

PR Code Suggestions ✨

Explore these optional code suggestions:

Category / Suggestion / Impact
High-level
Risky pDeep integration path

The new code conditionally bypasses ModelManager to load a pDeepModel directly
when an external ms2.pth exists, but leaves variables like
precusor_df/predict_int_df/theoretical_mz_df inconsistently set and may diverge
from expected preprocessing/postprocessing (e.g., frag type alignment,
normalization). Unify the prediction path (external vs installed) through a
single abstraction that guarantees identical outputs, validated shapes, and
consistent masking, or ensure the custom pDeep path builds and returns the same
data structures as ModelManager, including proper precursor_df construction and
fragment_mz_df generation.

Examples:

quantmsrescore/alphapeptdeep.py [701-731]
def make_prediction(enumerated_psm_list, psms_df, spec_file, spectrum_id_pattern, model, model_dir,
                    ms2_tolerance, processes, consider_modloss):
    if model_dir is not None and os.path.exists(os.path.join(model_dir, "ms2.pth")):
        if consider_modloss:
            model = pDeepModel(charged_frag_types=['b_z1', 'y_z1', 'b_z2', 'y_z2',
                                                   'b_modloss_z1', 'b_modloss_z2',
                                                   'y_modloss_z1', 'y_modloss_z2'], device="cpu")
            theoretical_mz_df = create_fragment_mz_dataframe(psms_df, ['b_z1', 'y_z1', 'b_z2', 'y_z2',
                                                                       'b_modloss_z1', 'b_modloss_z2',
                                                                       'y_modloss_z1', 'y_modloss_z2'])

 ... (clipped 21 lines)

Solution Walkthrough:

Before:

def make_prediction(...):
    if local_model_exists:
        # Path 1: Use pDeepModel directly
        model = pDeepModel(...)
        predict_int_df = model.predict(psms_df)
        theoretical_mz_df = create_fragment_mz_dataframe(...)
        precusor_df = psms_df  # Potentially inconsistent with else-branch
    else:
        # Path 2: Use ModelManager
        model_mgr = ModelManager(...)
        predictions = model_mgr.predict_all(...)
        precusor_df, predict_int_df, theoretical_mz_df = predictions[...]
    
    # ... process results using potentially inconsistent dataframes

After:

def make_prediction(...):
    # Unified prediction path
    model_mgr = ModelManager(...)
    if local_model_exists:
        # The old implementation used this, which provides a unified interface
        model_mgr.load_external_models(ms2_model_file=...)
    else:
        model_mgr.load_installed_models(...)

    # A single, consistent prediction call
    predictions = model_mgr.predict_all(...)
    precusor_df, predict_int_df, theoretical_mz_df = predictions[...]
    
    # ... process results using consistent dataframes
Suggestion importance[1-10]: 9


Why: This suggestion correctly identifies a critical design flaw where two separate prediction code paths can lead to inconsistent data structures (precusor_df, predict_int_df), potentially causing subtle bugs or incorrect results.

High
General
Handle absent decoy column

Guard against missing is_decoy in psms_df when older idXML inputs are used.
Default to treating all PSMs as targets if the column is absent to prevent
KeyError during best model selection.

quantmsrescore/annotator.py [438]

-psms_df_without_decoy = psms_df[psms_df["is_decoy"] == 0]
+if "is_decoy" in psms_df.columns:
+    psms_df_without_decoy = psms_df[psms_df["is_decoy"] == 0]
+else:
+    psms_df_without_decoy = psms_df
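Sketched end to end (pandas assumed available; the frames are toy data, and the helper name is illustrative):

```python
import pandas as pd

def target_psms(psms_df: pd.DataFrame) -> pd.DataFrame:
    """Keep target PSMs; if `is_decoy` is missing, treat every row as a target."""
    if "is_decoy" in psms_df.columns:
        return psms_df[psms_df["is_decoy"] == 0]
    return psms_df

# A frame with the column is filtered; one without passes through unchanged.
with_col = pd.DataFrame({"peptide": ["A", "B", "C"], "is_decoy": [0, 1, 0]})
without_col = pd.DataFrame({"peptide": ["A", "B"]})
```

Falling back to the full frame keeps best-model selection working on older idXML inputs at the cost of letting decoys influence it, which matches the suggestion's intent of backward compatibility over strictness.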
Suggestion importance[1-10]: 7


Why: This suggestion correctly identifies a potential KeyError if the is_decoy column is missing and provides a robust solution, improving backward compatibility.

Medium

@daichengxin requested a review from ypriverol August 12, 2025 07:15
@ypriverol ypriverol merged commit 4dcdb40 into bigbio:dev Aug 12, 2025
2 checks passed