
Conversation

Collaborator

@daichengxin commented Aug 12, 2025

PR Type

Enhancement, Bug fix


Description

  • Renamed mask_modloss parameter to consider_modloss for clarity

  • Enhanced modification parsing for N-terminal modifications

  • Added decoy and score information to PSM dataframe

  • Improved model prediction logic with modloss consideration


Diagram Walkthrough

flowchart LR
  A["Parameter Rename"] --> B["mask_modloss → consider_modloss"]
  C["Modification Parsing"] --> D["Enhanced N-term handling"]
  E["PSM DataFrame"] --> F["Added decoy/score fields"]
  G["Model Prediction"] --> H["Improved modloss logic"]

File Walkthrough

Relevant files
Enhancement
alphapeptdeep.py
Enhanced modloss handling and model prediction logic         

quantmsrescore/alphapeptdeep.py

  • Renamed mask_modloss parameter to consider_modloss throughout the file
  • Added new imports for pDeepModel and create_fragment_mz_dataframe
  • Enhanced model prediction logic to handle modloss ions properly
  • Improved fragment filtering with proper masking for zero values
+48/-26 
annotator.py
Updated modloss parameter and model selection                       

quantmsrescore/annotator.py

  • Updated parameter name from mask_modloss to consider_modloss
  • Added filtering for target PSMs only in model selection
  • Enhanced model comparison logic for better accuracy
+21/-6   
ms2rescore.py
Updated CLI parameter for modloss handling                             

quantmsrescore/ms2rescore.py

  • Updated CLI parameter from --mask_modloss to --consider_modloss
  • Updated parameter documentation for clarity
+6/-6     
Bug fix
idxmlreader.py
Fixed modification parsing and added PSM metadata               

quantmsrescore/idxmlreader.py

  • Added is_decoy and score fields to PSM dataframe
  • Fixed N-terminal modification parsing with proper format handling
  • Enhanced modification extraction logic for complex modifications
+14/-2   


coderabbitai bot commented Aug 12, 2025

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.


PR Reviewer Guide 🔍

Here are some key observations to aid the review process:

⏱️ Estimated effort to review: 3 🔵🔵🔵⚪⚪
🧪 No relevant tests
🔒 No security concerns identified
⚡ Recommended focus areas for review

Possible Issue

In make_prediction, when an external ms2 model file is loaded, the variables precusor_df, predict_int_df, and theoretical_mz_df are created, but the code simply sets precusor_df = psms_df right before use. Ensure precusor_df has the expected indices/columns (e.g., frag_start_idx/frag_stop_idx and provenance_data), or use the precursor_df returned by the model; otherwise indexing may fail or select the wrong ranges.

if model_dir is not None and os.path.exists(os.path.join(model_dir, "ms2.pth")):
    if consider_modloss:
        model = pDeepModel(charged_frag_types=['b_z1', 'y_z1', 'b_z2', 'y_z2',
                                               'b_modloss_z1', 'b_modloss_z2',
                                               'y_modloss_z1', 'y_modloss_z2'], device="cpu")
        theoretical_mz_df = create_fragment_mz_dataframe(psms_df, ['b_z1', 'y_z1', 'b_z2', 'y_z2',
                                                                   'b_modloss_z1', 'b_modloss_z2',
                                                                   'y_modloss_z1', 'y_modloss_z2'])
    else:
        model = pDeepModel(charged_frag_types=['b_z1', 'y_z1', 'b_z2', 'y_z2'], device="cpu")
        theoretical_mz_df = create_fragment_mz_dataframe(psms_df, ['b_z1', 'y_z1', 'b_z2', 'y_z2'])
    model.load(os.path.join(model_dir, "ms2.pth"))
    predict_int_df = model.predict(psms_df)
    precusor_df = psms_df
else:
    model_mgr = ModelManager(mask_modloss=not consider_modloss, device="cpu")
    model_mgr.load_installed_models(model)
    if consider_modloss:
        predictions = model_mgr.predict_all(precursor_df=psms_df, predict_items=["ms2"],
                                            frag_types=['b_z1', 'y_z1', 'b_z2', 'y_z2',
                                                        'b_modloss_z1', 'b_modloss_z2',
                                                        'y_modloss_z1', 'y_modloss_z2'],
                                            process_num=processes)
    else:
        predictions = model_mgr.predict_all(precursor_df=psms_df, predict_items=["ms2"],
                                            frag_types=['b_z1', 'y_z1', 'b_z2', 'y_z2'],
                                            process_num=processes)
    precusor_df, predict_int_df, theoretical_mz_df = predictions["precursor_df"], predictions[
        "fragment_intensity_df"], predictions["fragment_mz_df"]

results = []
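One defensive option (a sketch only; the helper and the exact column list are assumptions based on the review comment, not code from the PR) is to verify up front that the dataframe assigned to precusor_df actually carries the fragment-index columns the downstream slicing relies on:

```python
# Hypothetical guard, not part of the PR: check that the dataframe reused as
# precusor_df exposes the fragment-index columns before any slicing happens.
REQUIRED_PRECURSOR_COLUMNS = ("frag_start_idx", "frag_stop_idx")

def missing_precursor_columns(columns):
    """Return the required fragment-index columns absent from `columns`."""
    present = set(columns)
    return [c for c in REQUIRED_PRECURSOR_COLUMNS if c not in present]

# Before `precusor_df = psms_df`, fail fast instead of mis-indexing later:
# missing = missing_precursor_columns(psms_df.columns)
# if missing:
#     raise ValueError(f"psms_df lacks columns required as precursor_df: {missing}")
```

Failing fast here turns a silent wrong-range selection into an explicit error at the point where the two prediction paths diverge.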
Name Shadowing

The parameter model is reused as a local variable to hold a pDeepModel instance inside make_prediction. This can be confusing and may impact log messages that expect a string model name. Consider using a different local variable (e.g., ms2_model) and preserve the original model identifier for reporting.

if model_dir is not None and os.path.exists(os.path.join(model_dir, "ms2.pth")):
    if consider_modloss:
        model = pDeepModel(charged_frag_types=['b_z1', 'y_z1', 'b_z2', 'y_z2',
                                               'b_modloss_z1', 'b_modloss_z2',
                                               'y_modloss_z1', 'y_modloss_z2'], device="cpu")
        theoretical_mz_df = create_fragment_mz_dataframe(psms_df, ['b_z1', 'y_z1', 'b_z2', 'y_z2',
                                                                   'b_modloss_z1', 'b_modloss_z2',
                                                                   'y_modloss_z1', 'y_modloss_z2'])
    else:
        model = pDeepModel(charged_frag_types=['b_z1', 'y_z1', 'b_z2', 'y_z2'], device="cpu")
        theoretical_mz_df = create_fragment_mz_dataframe(psms_df, ['b_z1', 'y_z1', 'b_z2', 'y_z2'])
    model.load(os.path.join(model_dir, "ms2.pth"))
    predict_int_df = model.predict(psms_df)
    precusor_df = psms_df
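A minimal illustration of the suggested split (names are illustrative, not from the PR): bind the loaded network to a new local and keep the caller-supplied string for log messages:

```python
def describe_model(model_name: str, from_external_file: bool) -> str:
    """Build a log-friendly label without overwriting `model_name` itself."""
    source = "external ms2.pth" if from_external_file else "installed"
    return f"{model_name} ({source})"

# Inside make_prediction this would read, e.g.:
#   ms2_model = pDeepModel(charged_frag_types=..., device="cpu")
#   logging.info("Predicting with %s", describe_model(model, from_external_file=True))
```

Because `model` is never reassigned, any later log line or error message that reports the configured model name still sees the original string.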
Docstring Mismatch

The new consider_modloss option's docstring says "Defaults to True", but the default passed/defined is False. Align the documentation with the actual default to prevent user confusion.

remove_missing_spectra : bool, optional
    Remove PSMs with missing spectra (default: True).
ms2_only : bool, optional
    Process only MS2-level PSMs (default: True).
force_model : bool, optional
    Force the use of the provided MS2 model (default: False).
find_best_model : bool, optional
    Force the use of the best MS2 model (default: False).
consider_modloss: bool, optional
    If modloss ions are considered in the ms2 model. `modloss`
    ions are mostly useful for phospho MS2 prediciton model.
    Defaults to True.
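A corrected fragment might look like this (the signature is trimmed to the one parameter and purely illustrative; only the default/docstring alignment is the point):

```python
def rescore(consider_modloss: bool = False):
    """Rescore PSMs.

    Parameters
    ----------
    consider_modloss : bool, optional
        Whether modloss ions are considered in the ms2 model. `modloss`
        ions are mostly useful for the phospho MS2 prediction model.
        Defaults to False.
    """
    return consider_modloss
```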


qodo-merge-pro bot commented Aug 12, 2025

PR Code Suggestions ✨

Explore these optional code suggestions:

Category / Suggestion / Impact
High-level
Risky pDeep integration path

The new code conditionally bypasses ModelManager to load a pDeepModel directly
when an external ms2.pth exists, but leaves variables like
precusor_df/predict_int_df/theoretical_mz_df inconsistently set and may diverge
from expected preprocessing/postprocessing (e.g., frag type alignment,
normalization). Unify the prediction path (external vs installed) through a
single abstraction that guarantees identical outputs, validated shapes, and
consistent masking, or ensure the custom pDeep path builds and returns the same
data structures as ModelManager, including proper precursor_df construction and
fragment_mz_df generation.

Examples:

quantmsrescore/alphapeptdeep.py [701-731]
def make_prediction(enumerated_psm_list, psms_df, spec_file, spectrum_id_pattern, model, model_dir,
                    ms2_tolerance, processes, consider_modloss):
    if model_dir is not None and os.path.exists(os.path.join(model_dir, "ms2.pth")):
        if consider_modloss:
            model = pDeepModel(charged_frag_types=['b_z1', 'y_z1', 'b_z2', 'y_z2',
                                                   'b_modloss_z1', 'b_modloss_z2',
                                                   'y_modloss_z1', 'y_modloss_z2'], device="cpu")
            theoretical_mz_df = create_fragment_mz_dataframe(psms_df, ['b_z1', 'y_z1', 'b_z2', 'y_z2',
                                                                       'b_modloss_z1', 'b_modloss_z2',
                                                                       'y_modloss_z1', 'y_modloss_z2'])

 ... (clipped 21 lines)

Solution Walkthrough:

Before:

def make_prediction(...):
    if local_model_exists:
        # Path 1: Use pDeepModel directly
        model = pDeepModel(...)
        predict_int_df = model.predict(psms_df)
        theoretical_mz_df = create_fragment_mz_dataframe(...)
        precusor_df = psms_df  # Potentially inconsistent with else-branch
    else:
        # Path 2: Use ModelManager
        model_mgr = ModelManager(...)
        predictions = model_mgr.predict_all(...)
        precusor_df, predict_int_df, theoretical_mz_df = predictions[...]
    
    # ... process results using potentially inconsistent dataframes

After:

def make_prediction(...):
    # Unified prediction path
    model_mgr = ModelManager(...)
    if local_model_exists:
        # The old implementation used this, which provides a unified interface
        model_mgr.load_external_models(ms2_model_file=...)
    else:
        model_mgr.load_installed_models(...)

    # A single, consistent prediction call
    predictions = model_mgr.predict_all(...)
    precusor_df, predict_int_df, theoretical_mz_df = predictions[...]
    
    # ... process results using consistent dataframes
Suggestion importance[1-10]: 9


Why: This suggestion correctly identifies a critical design flaw where two separate prediction code paths can lead to inconsistent data structures (precusor_df, predict_int_df), potentially causing subtle bugs or incorrect results.

High
General
Handle absent decoy column

Guard against missing is_decoy in psms_df when older idXML inputs are used.
Default to treating all PSMs as targets if the column is absent to prevent
KeyError during best model selection.

quantmsrescore/annotator.py [438]

-psms_df_without_decoy = psms_df[psms_df["is_decoy"] == 0]
+if "is_decoy" in psms_df.columns:
+    psms_df_without_decoy = psms_df[psms_df["is_decoy"] == 0]
+else:
+    psms_df_without_decoy = psms_df
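Sketched end to end (pandas assumed available; the frames are toy data, and the helper name is illustrative):

```python
import pandas as pd

def target_psms(psms_df: pd.DataFrame) -> pd.DataFrame:
    """Keep target PSMs; if `is_decoy` is missing, treat every row as a target."""
    if "is_decoy" in psms_df.columns:
        return psms_df[psms_df["is_decoy"] == 0]
    return psms_df

# A frame with the column is filtered; one without passes through unchanged.
with_col = pd.DataFrame({"peptide": ["A", "B", "C"], "is_decoy": [0, 1, 0]})
without_col = pd.DataFrame({"peptide": ["A", "B"]})
```

Falling back to the full frame keeps best-model selection working on older idXML inputs at the cost of letting decoys influence it, which matches the suggestion's intent of backward compatibility over strictness.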
Suggestion importance[1-10]: 7


Why: This suggestion correctly identifies a potential KeyError if the is_decoy column is missing and provides a robust solution, improving backward compatibility.

Medium

@daichengxin requested a review from ypriverol August 12, 2025 07:15
@ypriverol ypriverol merged commit 4dcdb40 into bigbio:dev Aug 12, 2025
2 checks passed