major changes to integrate AlphaPepDeep #34
Conversation
Dockerfile update
Enhancements and bug fixes
Note: Other AI code review bot(s) detected. CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

Walkthrough
Version bumped to 0.0.12. Renamed mask_modloss → consider_modloss across CLI, annotator, and AlphaPeptDeep components and wired modloss-aware MS2 model/prediction paths. Introduced a non-decoy, rank-1 calibration subset for MS2 model selection. idxmlreader now adds is_decoy and score and refines modification-string handling.

Changes
Sequence Diagram(s)
sequenceDiagram
participant User
participant CLI as msrescore2feature (CLI)
participant FA as FeatureAnnotator
participant APD as AlphaPeptDeepAnnotator
participant MM as ModelManager/pDeepModel
participant Pred as Predictor
User->>CLI: --consider_modloss [True/False]
CLI->>FA: construct FeatureAnnotator(consider_modloss)
FA->>APD: create AlphaPeptDeepAnnotator(consider_modloss)
alt consider_modloss == True
APD->>MM: load modloss-capable MS2 model/types
else
APD->>MM: load standard MS2 model/types
end
APD->>Pred: predict intensities (per PSM)
Pred-->>APD: fragment intensities + mz
APD-->>FA: annotated features
FA-->>User: output features
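In code, the propagation shown in this first diagram amounts to passing one boolean down a chain of constructors and flipping it once at the external ModelManager boundary. The sketch below is illustrative only: the class and argument names mirror the diagram, not the actual quantmsrescore implementation; only the mask_modloss = not consider_modloss inversion and the fragment-type lists are taken from the review comments further down.

from dataclasses import dataclass

@dataclass
class AlphaPeptDeepAnnotatorSketch:
    consider_modloss: bool = False

    def charged_frag_types(self):
        # Standard b/y fragment types; modloss types are appended only when requested.
        frag_types = ["b_z1", "y_z1", "b_z2", "y_z2"]
        if self.consider_modloss:
            frag_types += ["b_modloss_z1", "b_modloss_z2", "y_modloss_z1", "y_modloss_z2"]
        return frag_types

    def model_manager_kwargs(self):
        # The external ModelManager keeps the older, negated flag name.
        return {"mask_modloss": not self.consider_modloss, "device": "cpu"}

@dataclass
class FeatureAnnotatorSketch:
    consider_modloss: bool = False

    def build_alphapeptdeep(self):
        # The CLI flag is simply forwarded, as in the diagram above.
        return AlphaPeptDeepAnnotatorSketch(consider_modloss=self.consider_modloss)

annotator = FeatureAnnotatorSketch(consider_modloss=True).build_alphapeptdeep()
print(annotator.charged_frag_types())    # includes the four modloss fragment types
print(annotator.model_manager_kwargs())  # {'mask_modloss': False, 'device': 'cpu'}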
sequenceDiagram
participant FA as FeatureAnnotator
participant Data as PSMList/DF
participant MS2PIP as MS2PIP Selector
participant APD as AlphaPeptDeep Selector
FA->>Data: copy batch, build calibration_set (non-decoy rank-1)
FA->>MS2PIP: select best model(calibration_set)
FA->>APD: select best model(calibration_set, df_without_decoy)
FA->>FA: choose model_to_use
FA-->>Data: apply chosen model to full batch
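The calibration-driven selection in this second diagram can be illustrated with a small standalone sketch. Column names (is_decoy, rank, score) follow the walkthrough above; the two scoring callables are placeholders for the real MS2PIP/AlphaPeptDeep correlation routines, which are not reproduced here.

import pandas as pd

def build_calibration_set(psms_df: pd.DataFrame) -> pd.DataFrame:
    # Non-decoy, rank-1 PSMs only, as described in the walkthrough.
    return psms_df[(~psms_df["is_decoy"]) & (psms_df["rank"] == 1)]

def choose_backend(psms_df, score_ms2pip, score_alphapeptdeep) -> str:
    calibration_set = build_calibration_set(psms_df)
    corr_ms2pip = score_ms2pip(calibration_set)      # placeholder: correlation of predicted
    corr_apd = score_alphapeptdeep(calibration_set)  # vs. observed intensities per backend
    return "AlphaPeptDeep" if corr_apd > corr_ms2pip else "MS2PIP"

toy = pd.DataFrame({"is_decoy": [False, True, False], "rank": [1, 1, 2], "score": [0.9, 0.2, 0.5]})
print(choose_backend(toy, lambda df: 0.71, lambda df: 0.78))  # -> AlphaPeptDeep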
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~40 minutes
PR Reviewer Guide 🔍
Here are some key observations to aid the review process:
Pull Request Overview
This PR integrates AlphaPeptDeep as an alternative to MS2PIP for peptide spectrum prediction and rescoring. The changes reverse the logic from masking modloss ions to considering them, allowing better handling of modified peptides, particularly for phosphorylation analysis.
- Changes parameter semantics from mask_modloss to consider_modloss with inverted logic
- Updates AlphaPeptDeep integration to properly handle external models and modloss ions
- Adds decoy detection and scoring information to PSM processing pipeline
Reviewed Changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 2 comments.
File | Description |
---|---|
quantmsrescore/ms2rescore.py | Updates CLI parameter from mask_modloss to consider_modloss with inverted semantics |
quantmsrescore/idxmlreader.py | Adds decoy status and scoring fields to PSM data structure |
quantmsrescore/annotator.py | Updates annotator configuration to use new consider_modloss parameter |
quantmsrescore/alphapeptdeep.py | Major refactoring to support external models and proper modloss handling |
quantmsrescore/__init__.py | Version bump from 0.0.11 to 0.0.12 |
pyproject.toml | Version bump from 0.0.11 to 0.0.12 |
if len(modsites) == 3:
    mods_res.append(v.group(0)[1:-1] + "@" + modsites[-1] + "^" + "_".join(modsites[:-1]))
else:
    mods_res.append(v.group(0)[1:-1] + "@" + "_".join(modsites))
This complex string manipulation logic for modification site parsing has multiple nested conditions and repeated string operations. Consider extracting this into a separate helper method to improve readability and maintainability.
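One possible shape for such a helper, sketched from the snippet shown above (the standalone function and its name are hypothetical, not code from this PR):

def format_mod_site(mod_id: str, modsites: list) -> str:
    # Builds "Mod@Site" or "Mod@Residue^Scope" from the split modification-site tokens.
    if len(modsites) == 3:
        return mod_id + "@" + modsites[-1] + "^" + "_".join(modsites[:-1])
    return mod_id + "@" + "_".join(modsites)

# The call site would then reduce to:
# mods_res.append(format_mod_site(v.group(0)[1:-1], modsites))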
PR Code Suggestions ✨
Explore these optional code suggestions:
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Actionable comments posted: 5
🔭 Outside diff range comments (2)
quantmsrescore/idxmlreader.py (1)
265-274: New PSM DataFrame fields look good; ensure downstream uses boolean filtering for is_decoy.
Adding instrument, is_decoy, and score is useful. Note that is_decoy is stored as a boolean; in quantmsrescore/annotator.py line 438, the filter compares to 0 (integer). Prefer boolean masking there to avoid implicit casts.
Apply this diff in annotator.py Lines 438-439 to use a boolean mask:
- psms_df_without_decoy = psms_df[psms_df["is_decoy"] == 0]
+ psms_df_without_decoy = psms_df[~psms_df["is_decoy"]]

quantmsrescore/annotator.py (1)
447-455: Wrong fallback model variable for AlphaPeptDeep validation.
In the AlphaPeptDeep branch, the fallback uses original_model from MS2PIP. This will be invalid for AlphaPeptDeep (e.g., “HCD2021” vs “generic”).
Apply this diff:
- else:
-     # Fallback to original model if best model doesn't validate
-     if alphapeptdeep_generator.validate_features(psm_list, psms_df, model=original_model):
+ else:
+     # Fallback to the original AlphaPeptDeep model if best model doesn't validate
+     if alphapeptdeep_generator.validate_features(psm_list, psms_df, model=alphapeptdeep_generator.model):
Additionally, for clarity earlier in the method (outside this hunk), consider capturing both originals:
# Near where original_model is defined for MS2PIP
original_ms2pip_model = ms2pip_generator.model
original_alphapeptdeep_model = alphapeptdeep_generator.model
♻️ Duplicate comments (1)
quantmsrescore/alphapeptdeep.py (1)
720-729: Fragment type lists are duplicated again in the ModelManager path.
The same fragment type lists appear here as in lines 704-710. This reinforces the need to extract them as constants.
🧹 Nitpick comments (5)
quantmsrescore/ms2rescore.py (1)
178-182: Docstring: fix default value and typos.
Current text says “Defaults to True” and “prediciton”. Align with default False and fix spelling.
Apply this diff:
- consider_modloss: bool, optional
-     If modloss ions are considered in the ms2 model. `modloss`
-     ions are mostly useful for phospho MS2 prediciton model.
-     Defaults to True.
+ consider_modloss: bool, optional
+     Whether to consider modloss ions in the MS2 model (useful for phospho).
+     Defaults to False.

quantmsrescore/annotator.py (1)
33-34: consider_modloss default mismatch and typos in docstring.
Signature default is False, but docstring implies True and contains “prediciton”. Align the doc with the actual default and fix spelling.
Apply this diff:
- consider_modloss: bool, optional
-     If modloss ions are considered in the ms2 model. `modloss`
-     ions are mostly useful for phospho MS2 prediciton model.
-     Defaults to True.
+ consider_modloss: bool, optional
+     Whether to consider modloss ions in the MS2 model (useful for phospho).
+     Defaults to False.

Also applies to: 69-73
quantmsrescore/alphapeptdeep.py (3)
704-710: Consider extracting the fragment type lists to avoid duplication.
The fragment type lists are duplicated across multiple conditions. Consider defining them as constants to improve maintainability.
+# Define fragment types as constants
+STANDARD_FRAG_TYPES = ['b_z1', 'y_z1', 'b_z2', 'y_z2']
+MODLOSS_FRAG_TYPES = ['b_modloss_z1', 'b_modloss_z2', 'y_modloss_z1', 'y_modloss_z2']
+ALL_FRAG_TYPES = STANDARD_FRAG_TYPES + MODLOSS_FRAG_TYPES
+
 def make_prediction(enumerated_psm_list, psms_df, spec_file, spectrum_id_pattern, model, model_dir, ms2_tolerance, processes, consider_modloss):
     if model_dir is not None and os.path.exists(os.path.join(model_dir, "ms2.pth")):
         if consider_modloss:
-            model = pDeepModel(charged_frag_types=['b_z1', 'y_z1', 'b_z2', 'y_z2',
-                                                   'b_modloss_z1', 'b_modloss_z2',
-                                                   'y_modloss_z1', 'y_modloss_z2'], device="cpu")
-            theoretical_mz_df = create_fragment_mz_dataframe(psms_df, ['b_z1', 'y_z1', 'b_z2', 'y_z2',
-                                                                       'b_modloss_z1', 'b_modloss_z2',
-                                                                       'y_modloss_z1', 'y_modloss_z2'])
+            model = pDeepModel(charged_frag_types=ALL_FRAG_TYPES, device="cpu")
+            theoretical_mz_df = create_fragment_mz_dataframe(psms_df, ALL_FRAG_TYPES)
         else:
-            model = pDeepModel(charged_frag_types=['b_z1', 'y_z1', 'b_z2', 'y_z2'], device="cpu")
-            theoretical_mz_df = create_fragment_mz_dataframe(psms_df, ['b_z1', 'y_z1', 'b_z2', 'y_z2'])
+            model = pDeepModel(charged_frag_types=STANDARD_FRAG_TYPES, device="cpu")
+            theoretical_mz_df = create_fragment_mz_dataframe(psms_df, STANDARD_FRAG_TYPES)
718-718: Logic inversion for mask_modloss is correct but could benefit from a comment.
The inversion logic mask_modloss=not consider_modloss correctly maintains backward compatibility with the ModelManager API. Consider adding a comment to clarify this intentional inversion.
- model_mgr = ModelManager(mask_modloss=not consider_modloss, device="cpu")
+ # ModelManager uses inverse logic: mask_modloss=True means don't consider modloss
+ model_mgr = ModelManager(mask_modloss=not consider_modloss, device="cpu")
716-716: Typo in variable name: precusor_df should be precursor_df.
There's a typo in the variable name. While it's used consistently, it should be corrected for better code readability.
- precusor_df = psms_df
+ precursor_df = psms_df
Also update lines 730-731 and 734-735:
- precusor_df, predict_int_df, theoretical_mz_df = predictions["precursor_df"], predictions[
+ precursor_df, predict_int_df, theoretical_mz_df = predictions["precursor_df"], predictions[
      "fragment_intensity_df"], predictions["fragment_mz_df"]
  results = []
- precusor_df = precusor_df.set_index("provenance_data")
+ precursor_df = precursor_df.set_index("provenance_data")
And line 770:
- row = precusor_df.loc[next(iter(psm.provenance_data.keys()))]
+ row = precursor_df.loc[next(iter(psm.provenance_data.keys()))]
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (6)
pyproject.toml (1 hunks)
quantmsrescore/__init__.py (1 hunks)
quantmsrescore/alphapeptdeep.py (13 hunks)
quantmsrescore/annotator.py (5 hunks)
quantmsrescore/idxmlreader.py (2 hunks)
quantmsrescore/ms2rescore.py (4 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (2)
quantmsrescore/idxmlreader.py (1)
quantmsrescore/openms.py (1)
is_decoy_peptide_hit (131-149)
quantmsrescore/annotator.py (3)
quantmsrescore/idxmlreader.py (2)
psms_df (102-104)
psms_df (107-111)
quantmsrescore/ms2pip.py (1)
_find_best_ms2pip_model (413-460)
quantmsrescore/alphapeptdeep.py (1)
_find_best_ms2_model (596-639)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: Codacy Static Code Analysis
- GitHub Check: build (3.11)
- GitHub Check: build
🔇 Additional comments (10)
pyproject.toml (1)
6-6: Version bump aligns with codebase.
The project version is correctly updated to 0.0.12 and matches quantmsrescore/__init__.py.
quantmsrescore/__init__.py (1)
7-7: __version__ synchronized with pyproject.
Good consistency between package metadata and code.
quantmsrescore/ms2rescore.py (1)
199-200: Propagating consider_modloss to FeatureAnnotator is correct.
Wiring looks consistent with the new API.
quantmsrescore/annotator.py (3)
119-120: State wiring for consider_modloss looks good.
Internal storage is straightforward and consistent with usage downstream.
359-361: Passing consider_modloss to AlphaPeptDeepAnnotator is correct.
The new option is properly propagated into AlphaPeptDeep.
444-446: Model selection uses calibration subset correctly; ensure non-empty set.
If calibration_set is empty (e.g., all decoys), both model searches may degrade. Consider guarding and falling back to a small random target subset or the original model (a minimal guard sketch follows below).
Do you want me to add a safe fallback (e.g., select top-scoring targets via _get_top_batch_psms) when calibration_set is empty?
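A minimal guard along those lines (illustrative only; column names follow the walkthrough, and the top-scoring fallback is just one of the options mentioned above, not the project's implementation):

import logging
import pandas as pd

logger = logging.getLogger(__name__)

def calibration_or_fallback(psms_df: pd.DataFrame, top_n: int = 500) -> pd.DataFrame:
    # Non-decoy, rank-1 subset; fall back to top-scoring targets if it comes out empty.
    calibration_set = psms_df[(~psms_df["is_decoy"]) & (psms_df["rank"] == 1)]
    if calibration_set.empty:
        logger.warning("Calibration set is empty; falling back to top-scoring target PSMs")
        targets = psms_df[~psms_df["is_decoy"]]
        calibration_set = targets.nlargest(min(top_n, len(targets)), "score")
    return calibration_set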
quantmsrescore/alphapeptdeep.py (4)
46-46: Parameter naming change from mask_modloss to consider_modloss is appropriate.
The rename from mask_modloss to consider_modloss improves clarity by using positive logic rather than negative. This aligns with the CLI changes mentioned in the PR summary.
774-782: Good implementation of masking for zero values.
The masking approach to handle non-zero fragment values is correct and efficient. This ensures that only valid fragment ions are considered in the downstream processing.
787-789: Consistent use of masked arrays throughout the data flow.
The masked arrays are correctly propagated through theoretical_mz and predicted_intensity, ensuring consistency in the data processing pipeline.
401-401: Consistent propagation of the consider_modloss parameter.
The consider_modloss parameter is consistently passed through all the relevant methods in the AlphaPeptDeepAnnotator class, maintaining the modloss-aware functionality throughout the annotation process.
Also applies to: 419-419, 458-458, 512-512, 631-631
from peptdeep.model.ms2 import pDeepModel
from alphabase.peptide.fragment import create_fragment_mz_dataframe
💡 Verification agent
🧩 Analysis chain
Ensure the new imports are utilized correctly.
The new imports pDeepModel and create_fragment_mz_dataframe are added to support modloss-aware fragment modeling. Verify that these are available in the installed version of the peptdeep/alphabase packages.
🏁 Script executed:
#!/bin/bash
# Check if pDeepModel and create_fragment_mz_dataframe are properly imported and used elsewhere
rg -A 5 "from peptdeep.model.ms2 import pDeepModel|from alphabase.peptide.fragment import create_fragment_mz_dataframe"
Length of output: 652
Fix unused imports in quantmsrescore/alphapeptdeep.py
It looks like both pDeepModel and create_fragment_mz_dataframe are imported on lines 6–7 but never referenced elsewhere in this file. Please address the following:
- quantmsrescore/alphapeptdeep.py:6–7
  - from peptdeep.model.ms2 import pDeepModel
  - from alphabase.peptide.fragment import create_fragment_mz_dataframe
- If you plan to enable modloss-aware fragment modeling here, add code to instantiate pDeepModel and call create_fragment_mz_dataframe at the appropriate location.
- Otherwise, remove these two imports to keep the code clean and avoid lint errors.
- Finally, verify that your project’s pinned versions of peptdeep and alphabase actually export these symbols, and update your dependency definitions (requirements.txt / setup.py / pyproject.toml) if needed.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
from peptdeep.model.ms2 import pDeepModel
from alphabase.peptide.fragment import create_fragment_mz_dataframe
🤖 Prompt for AI Agents
In quantmsrescore/alphapeptdeep.py around lines 6–7, the imports pDeepModel and
create_fragment_mz_dataframe are unused; either remove these two import lines to
silence lint errors and keep the file clean, or if you intend to add
modloss-aware fragment modeling, instantiate pDeepModel and call
create_fragment_mz_dataframe at the correct integration point (e.g., when
building fragment predictions) and wire their outputs into the existing code
paths; finally, confirm that the pinned versions of peptdeep and alphabase in
requirements.txt / setup.py / pyproject.toml export these symbols and update the
dependency versions if they do not.
Actionable comments posted: 3
♻️ Duplicate comments (2)
quantmsrescore/idxmlreader.py (1)
290-300: Complex modification parsing logic could benefit from extraction.
The N-terminal modification parsing logic is complex with multiple nested conditions and string operations. Consider extracting this into a separate helper method for improved readability and maintainability.
Example refactoring approach:
+def _format_nterm_modification(mod_id: str, mod_parts: list) -> str:
+    """Format N-terminal modification string."""
+    if mod_parts[0] == "N-term":
+        suffix = mod_parts[-1] if len(mod_parts) == 2 else ""
+        term_label = "^Any_N-term" if suffix else "Any_N-term"
+        return f"{mod_id}@{suffix}{term_label}" if suffix else f"{mod_id}@{term_label}"
+    else:
+        if len(mod_parts) == 3:
+            return f"{mod_id}@{mod_parts[-1]}^{'_'.join(mod_parts[:-1])}"
+        return f"{mod_id}@{'_'.join(mod_parts)}"

 def extract_modifications(peptidoform, mods_name_dict):
     # ... existing code ...
     for i, v in enumerate(list(pattern.finditer(peptidoform))):
         if peptidoform.startswith(".") and i == 0:
             modsites = mods_name_dict[v.group(0)[1:-1]].split(" ")
-            if "".join(modsites[0]) == "N-term":
-                if len(modsites) == 2:
-                    mods_res.append(v.group(0)[1:-1] + "@" + modsites[-1] + "^Any_N-term")
-                else:
-                    mods_res.append(v.group(0)[1:-1] + "@" + "Any_N-term")
-            else:
-                if len(modsites) == 3:
-                    mods_res.append(v.group(0)[1:-1] + "@" + modsites[-1] + "^" + "_".join(modsites[:-1]))
-                else:
-                    mods_res.append(v.group(0)[1:-1] + "@" + "_".join(modsites))
+            mods_res.append(_format_nterm_modification(v.group(0)[1:-1], modsites))

quantmsrescore/alphapeptdeep.py (1)
716-716: Fix typo in variable name.
Variable name 'precusor_df' is misspelled. It should be 'precursor_df' to match the correct spelling used elsewhere in the codebase.
Apply this diff:
- precusor_df = psms_df
+ precursor_df = psms_df
🧹 Nitpick comments (2)
quantmsrescore/alphapeptdeep.py (1)
703-716: Consider extracting model initialization logic.
The model initialization logic with conditional fragment types could be extracted into a helper method to reduce duplication and improve maintainability.
Example refactoring:
+def _get_fragment_types(consider_modloss: bool) -> list:
+    """Get fragment types based on modloss consideration."""
+    base_types = ['b_z1', 'y_z1', 'b_z2', 'y_z2']
+    if consider_modloss:
+        return base_types + ['b_modloss_z1', 'b_modloss_z2', 'y_modloss_z1', 'y_modloss_z2']
+    return base_types
+
+def _initialize_local_model(model_dir: str, consider_modloss: bool) -> tuple:
+    """Initialize local pDeep model."""
+    fragment_types = _get_fragment_types(consider_modloss)
+    model = pDeepModel(charged_frag_types=fragment_types, device="cpu")
+    model.load(os.path.join(model_dir, "ms2.pth"))
+    return model, fragment_types

 def make_prediction(enumerated_psm_list, psms_df, spec_file, spectrum_id_pattern, model, model_dir, ms2_tolerance, processes, consider_modloss):
     if model_dir is not None and os.path.exists(os.path.join(model_dir, "ms2.pth")):
-        if consider_modloss:
-            model = pDeepModel(charged_frag_types=['b_z1', 'y_z1', 'b_z2', 'y_z2',
-                                                   'b_modloss_z1', 'b_modloss_z2',
-                                                   'y_modloss_z1', 'y_modloss_z2'], device="cpu")
-            theoretical_mz_df = create_fragment_mz_dataframe(psms_df, ['b_z1', 'y_z1', 'b_z2', 'y_z2',
-                                                                       'b_modloss_z1', 'b_modloss_z2',
-                                                                       'y_modloss_z1', 'y_modloss_z2'])
-        else:
-            model = pDeepModel(charged_frag_types=['b_z1', 'y_z1', 'b_z2', 'y_z2'], device="cpu")
-            theoretical_mz_df = create_fragment_mz_dataframe(psms_df, ['b_z1', 'y_z1', 'b_z2', 'y_z2'])
-        model.load(os.path.join(model_dir, "ms2.pth"))
+        model, fragment_types = _initialize_local_model(model_dir, consider_modloss)
+        theoretical_mz_df = create_fragment_mz_dataframe(psms_df, fragment_types)
     predict_int_df = model.predict(psms_df)
     precursor_df = psms_df

quantmsrescore/annotator.py (1)
446-464: Consider extracting duplicated validation logic.
The validation and fallback logic for AlphaPeptDeep (lines 447-463) is nearly identical to the MS2PIP logic (lines 466-481). This could be refactored into a shared method to improve maintainability.
Consider extracting a helper method:
def _apply_best_model(self, generator, psm_list, best_model, best_corr, original_model, generator_name, psms_df=None):
    """Apply the best model with validation and fallback logic."""
    if psms_df is not None:
        # AlphaPeptDeep case
        validation_args = (psm_list, psms_df)
    else:
        # MS2PIP case
        validation_args = (psm_list,)
    if best_model and generator.validate_features(psm_list=psm_list, *validation_args[1:], model=best_model):
        model_to_use = best_model
        logger.info(f"Using best model: {model_to_use} with correlation: {best_corr:.4f}")
    else:
        # Fallback to original model if best model doesn't validate
        if generator.validate_features(*validation_args, model=original_model):
            logger.warning("Best model validation failed, falling back to original model")
            model_to_use = original_model
        else:
            logger.error("Both best model and original model validation failed")
            return False  # Indicate failure
    # Apply the selected model
    generator.model = model_to_use
    generator.add_features(*validation_args)
    logger.info(f"Successfully applied {generator_name} annotation using model: {model_to_use}")
    self.ms2_generator = generator_name
    return True  # Indicate success
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (6)
pyproject.toml (1 hunks)
quantmsrescore/__init__.py (1 hunks)
quantmsrescore/alphapeptdeep.py (13 hunks)
quantmsrescore/annotator.py (5 hunks)
quantmsrescore/idxmlreader.py (2 hunks)
quantmsrescore/ms2rescore.py (4 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (2)
quantmsrescore/idxmlreader.py (1)
quantmsrescore/openms.py (1)
is_decoy_peptide_hit (131-149)
quantmsrescore/annotator.py (3)
quantmsrescore/idxmlreader.py (2)
psms_df (102-104)
psms_df (107-111)
quantmsrescore/ms2pip.py (1)
_find_best_ms2pip_model (413-460)
quantmsrescore/alphapeptdeep.py (1)
_find_best_ms2_model (596-639)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: Codacy Static Code Analysis
- GitHub Check: build
- GitHub Check: build (3.11)
🔇 Additional comments (12)
pyproject.toml (1)
6-6: LGTM!
Version bump to 0.0.12 correctly aligns with the changes in quantmsrescore/__init__.py.

quantmsrescore/idxmlreader.py (1)
271-273: LGTM! Good addition of is_decoy and score fields.
The inclusion of is_decoy and score fields in the PSM dataframe provides valuable information for downstream processing and filtering. The implementation correctly uses the existing helper method.

quantmsrescore/__init__.py (1)
7-7: LGTM!
Version bump to 0.0.12 is consistent with the version update in pyproject.toml.
quantmsrescore/ms2rescore.py (2)
109-111: Parameter rename from mask_modloss to consider_modloss is clear and intuitive.
The semantic change from "masking" to "considering" modloss ions better reflects the actual behavior of the parameter.
199-199: LGTM!
The parameter is correctly passed to the FeatureAnnotator with the updated name.
quantmsrescore/alphapeptdeep.py (1)
774-782: Good implementation of explicit masking for zero values.
The explicit masking of zero values in fragment arrays ensures that only valid fragments are considered in downstream processing. This is a robust approach to handle missing fragments (a toy illustration follows below).
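For intuition, the masking pattern being praised here can be reproduced with toy NumPy arrays (illustrative values only, not the PR's actual data flow):

import numpy as np

theoretical_mz = np.array([0.0, 245.1, 388.2, 612.3])    # zeros mark slots with no valid fragment
predicted_intensity = np.array([0.05, 0.80, 0.10, 0.40])
observed_intensity = np.array([0.00, 0.75, 0.12, 0.35])

mask = theoretical_mz != 0                                # keep only positions with a real theoretical m/z
corr = np.corrcoef(predicted_intensity[mask], observed_intensity[mask])[0, 1]
print(mask, round(corr, 3))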
quantmsrescore/annotator.py (6)
33-33: LGTM! Parameter rename is consistent across the module.
The parameter rename from mask_modloss to consider_modloss is properly implemented here.
119-119: LGTM! Instance variable properly initialized.
The parameter is correctly stored as a private instance variable.
359-361: LGTM! AlphaPeptDeep annotator creation properly updated.
The consider_modloss parameter is correctly passed to the AlphaPeptDeep annotator constructor.
443-446: LGTM! Model selection logic properly implements calibration-based comparison.
The implementation correctly:
- Uses the calibration set (non-decoy PSMs) for MS2PIP model selection
- Uses the filtered DataFrame (without decoys) for AlphaPeptDeep model selection
- Compares correlations to determine the best model
606-608: The replacement only affects how feature keys are labeled on the PSMs before being written out; there are no downstream branches or lookups keyed on the literal "MS2PIP:*" strings. All the .replace("MS2PIP", "AlphaPeptDeep") does is rename the column prefixes in the final psm["rescoring_features"] dict. Since:
• No code ever does if canonical_feature == "MS2PIP:…".
• No mapping back to MS2PIP_FEATURES happens after this point.
• The rescoring engine consumes whatever feature names are present.
This renaming will not break anything downstream. I recommend resolving this comment.
427-439: Please confirm PSMList.copy() behavior and decoy-filter consistency.
In quantmsrescore/annotator.py (lines 427–439):
- PSMList.copy(): We weren’t able to locate its implementation in the repo. Verify that this method returns a new PSMList instance with its own underlying list (otherwise filtering on
batch_psms_copy.psm_list
may still mutate the original).- Decoy filtering: You use
if not result.is_decoy
on the PSM objects andpsms_df["is_decoy"] == 0
on the DataFrame. AlthoughFalse == 0
evaluates as True in pandas, consider standardizing the check (e.g.psms_df.loc[~psms_df["is_decoy"]]
or== False
) to improve readability and avoid confusion.
@@ -747,29 +767,31 @@ def make_prediction(enumerated_psm_list, psms_df, spec_file, spectrum_id_pattern
         # Process each PSM for this spectrum
         for psm_idx, psm in psms_by_specid[spectrum_id]:
-            row = precusor_df.loc[next(iter(psm.provenance_data.keys()))]
+            row = precursor_df.loc[next(iter(psm.provenance_data.keys()))]
Ensure proper error handling for missing provenance data.
Using next(iter(psm.provenance_data.keys())) could raise a StopIteration if provenance_data is empty. Consider adding error handling.
Apply this diff to add error handling:
- row = precursor_df.loc[next(iter(psm.provenance_data.keys()))]
+ try:
+ provenance_key = next(iter(psm.provenance_data.keys()))
+ row = precursor_df.loc[provenance_key]
+ except (StopIteration, KeyError) as e:
+ logger.warning(f"Skipping PSM due to missing provenance data: {e}")
+ continue
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
- row = precursor_df.loc[next(iter(psm.provenance_data.keys()))]
+ try:
+     provenance_key = next(iter(psm.provenance_data.keys()))
+     row = precursor_df.loc[provenance_key]
+ except (StopIteration, KeyError) as e:
+     logger.warning(f"Skipping PSM due to missing provenance data: {e}")
+     continue
🤖 Prompt for AI Agents
In quantmsrescore/alphapeptdeep.py around line 770, the code uses
next(iter(psm.provenance_data.keys())) which will raise StopIteration if
provenance_data is empty; guard against an empty provenance_data by checking its
truthiness before using next(iter(...)) and handle the case (e.g., log and skip
this PSM or raise a clear ValueError) and then retrieve the key safely
(next(iter(psm.provenance_data)) or use list(...) with bounds check) so the code
either continues gracefully or fails with an explicit message when provenance is
missing.
consider_modloss: bool, optional
    If modloss ions are considered in the ms2 model. `modloss`
    ions are mostly useful for phospho MS2 prediciton model.
    Defaults to True.
Fix docstring inconsistency.
The docstring states "Defaults to True" but the parameter default is False.
Apply this diff to fix the documentation:
- consider_modloss: bool, optional
- If modloss ions are considered in the ms2 model. `modloss`
- ions are mostly useful for phospho MS2 prediciton model.
- Defaults to True.
+ consider_modloss: bool, optional
+ If modloss ions are considered in the ms2 model. `modloss`
+ ions are mostly useful for phospho MS2 prediction model.
+ Defaults to False.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
- consider_modloss: bool, optional
-     If modloss ions are considered in the ms2 model. `modloss`
-     ions are mostly useful for phospho MS2 prediciton model.
-     Defaults to True.
+ consider_modloss: bool, optional
+     If modloss ions are considered in the ms2 model. `modloss`
+     ions are mostly useful for phospho MS2 prediction model.
+     Defaults to False.
🤖 Prompt for AI Agents
In quantmsrescore/annotator.py around lines 69 to 72, the docstring for the
parameter `consider_modloss` incorrectly states "Defaults to True" while the
function default is False; update the docstring text to "Defaults to False" (or
otherwise match the actual default) so the documentation and code are
consistent.
consider_modloss: bool, optional
    If modloss ions are considered in the ms2 model. `modloss`
    ions are mostly useful for phospho MS2 prediciton model.
    Defaults to True.
Fix inconsistent docstring default value.
The docstring states "Defaults to True" but the CLI flag is actually False by default (using is_flag=True without default=True).
Apply this diff to fix the docstring:
consider_modloss: bool, optional
If modloss ions are considered in the ms2 model. `modloss`
ions are mostly useful for phospho MS2 prediciton model.
- Defaults to True.
+ Defaults to False.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
- consider_modloss: bool, optional
-     If modloss ions are considered in the ms2 model. `modloss`
-     ions are mostly useful for phospho MS2 prediciton model.
-     Defaults to True.
+ consider_modloss: bool, optional
+     If modloss ions are considered in the ms2 model. `modloss`
+     ions are mostly useful for phospho MS2 prediciton model.
+     Defaults to False.
🤖 Prompt for AI Agents
In quantmsrescore/ms2rescore.py around lines 178 to 181, the docstring for the
parameter `consider_modloss` incorrectly states "Defaults to True" while the CLI
flag is false by default; update the docstring to reflect the actual default by
changing the sentence to "Defaults to False." so the documentation matches the
CLI behavior.
Actionable comments posted: 1
♻️ Duplicate comments (1)
quantmsrescore/alphapeptdeep.py (1)
6-7: Duplicate comment detected for unused imports.
Both pDeepModel and create_fragment_mz_dataframe are imported but never used anywhere in this file. This was previously flagged but remains unaddressed.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
quantmsrescore/alphapeptdeep.py (13 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (1)
quantmsrescore/alphapeptdeep.py (1)
quantmsrescore/idxmlreader.py (2)
psms_df (102-104)
psms_df (107-111)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: Codacy Static Code Analysis
- GitHub Check: build
- GitHub Check: build (3.11)
🔇 Additional comments (10)
quantmsrescore/alphapeptdeep.py (10)
46-46: LGTM! Parameter rename successfully implemented.
The parameter has been correctly renamed from mask_modloss to consider_modloss with appropriate default value and storage as instance variable.
Also applies to: 83-83
195-195: LGTM! Parameter propagation is consistent.
The consider_modloss parameter is correctly propagated through all the custom_correlate function calls across different methods.
Also applies to: 458-458, 512-512, 631-631
401-401: LGTM! AlphaPeptDeepAnnotator correctly inherits the new parameter.
The consider_modloss parameter is properly added to the constructor and stored as an instance variable.
Also applies to: 419-419
679-679: LGTM! Function signature updates are consistent.
The mask_modloss parameter has been correctly renamed to consider_modloss across all function signatures.
Also applies to: 689-689, 702-702
720-731: LGTM! Modloss-aware fragment type handling is correctly implemented.
The conditional logic properly handles different fragment types based on the consider_modloss flag, including modloss fragment types when enabled.
770-770: LGTM! Provenance data handling is correct.
The code properly accesses the provenance data key to locate the corresponding row in the precursor dataframe.
774-775: LGTM! Masking logic correctly filters non-zero fragments.
The implementation correctly applies boolean masks to filter out zero-valued theoretical m/z values and uses these masks consistently across theoretical m/z, targets, and predictions.
Also applies to: 778-778, 781-782, 787-789
568-572: LGTM! Decoy filtering for calibration set is correctly implemented.
The validation method properly filters out decoy PSMs and selects only rank-1 target PSMs for the calibration set, which aligns with the PR objectives.
704-716: No action required: pDeepModel and create_fragment_mz_dataframe are used.
Both imports are invoked in the consider_modloss and default branches, so they are not unused.
718-718: Verify the mask_modloss inversion in ModelManager.
Please confirm that inverting the consider_modloss flag when calling ModelManager matches the intended behavior in the external peptdeep API—i.e., that setting mask_modloss=True indeed disables modloss handling and vice versa.
Locations to check:
- quantmsrescore/alphapeptdeep.py:718
model_mgr = ModelManager(mask_modloss=not consider_modloss, device="cpu")
Since ModelManager comes from an external package, please review its documentation or source to ensure that mask_modloss was renamed and inverted correctly relative to consider_modloss. If the logic is reversed, adjust the argument accordingly or add an inline comment to clarify.
  precusor_df, predict_int_df, theoretical_mz_df = predictions["precursor_df"], predictions[
      "fragment_intensity_df"], predictions["fragment_mz_df"]
- precusor_df = precusor_df.set_index("provenance_data")
+ precursor_df = precursor_df.set_index("provenance_data")
Fix variable assignment after conditional initialization.
The precursor_df assignment occurs outside the conditional blocks, but precursor_df is only defined in the else block (lines 730-731). In the if block, it should be assigned to psms_df.
if model_dir is not None and os.path.exists(os.path.join(model_dir, "ms2.pth")):
if consider_modloss:
model = pDeepModel(charged_frag_types=['b_z1', 'y_z1', 'b_z2', 'y_z2',
'b_modloss_z1', 'b_modloss_z2',
'y_modloss_z1', 'y_modloss_z2'], device="cpu")
theoretical_mz_df = create_fragment_mz_dataframe(psms_df, ['b_z1', 'y_z1', 'b_z2', 'y_z2',
'b_modloss_z1', 'b_modloss_z2',
'y_modloss_z1', 'y_modloss_z2'])
else:
model = pDeepModel(charged_frag_types=['b_z1', 'y_z1', 'b_z2', 'y_z2'], device="cpu")
theoretical_mz_df = create_fragment_mz_dataframe(psms_df, ['b_z1', 'y_z1', 'b_z2', 'y_z2'])
model.load(os.path.join(model_dir, "ms2.pth"))
predict_int_df = model.predict(psms_df)
- precursor_df = psms_df
+ precursor_df = psms_df
else:
model_mgr = ModelManager(mask_modloss=not consider_modloss, device="cpu")
model_mgr.load_installed_models(model)
if consider_modloss:
predictions = model_mgr.predict_all(precursor_df=psms_df, predict_items=["ms2"],
frag_types=['b_z1', 'y_z1', 'b_z2', 'y_z2',
'b_modloss_z1', 'b_modloss_z2',
'y_modloss_z1', 'y_modloss_z2'],
process_num=processes)
else:
predictions = model_mgr.predict_all(precursor_df=psms_df, predict_items=["ms2"],
frag_types=['b_z1', 'y_z1', 'b_z2', 'y_z2'],
process_num=processes)
precursor_df, predict_int_df, theoretical_mz_df = predictions["precursor_df"], predictions[
"fragment_intensity_df"], predictions["fragment_mz_df"]
- results = []
- precursor_df = precursor_df.set_index("provenance_data")
+ results = []
+ precursor_df = precursor_df.set_index("provenance_data")
Committable suggestion skipped: line range outside the PR's diff.
🤖 Prompt for AI Agents
In quantmsrescore/alphapeptdeep.py around line 734, precursor_df is being
set_index("provenance_data") but precursor_df is only defined in the else
branch; in the if branch you should assign precursor_df = psms_df (or set index
on psms_df and assign that to precursor_df) before calling set_index so
precursor_df is always defined; update the if branch to create/assign
precursor_df from psms_df and then call set_index("provenance_data") on it.
PR Type
Enhancement
Description
Renamed mask_modloss parameter to consider_modloss with inverted logic
Enhanced AlphaPeptDeep model integration with custom model loading
Improved modification parsing for N-terminal modifications
Added decoy filtering for model validation
Diagram Walkthrough
File Walkthrough
__init__.py — Version bump to 0.0.12
quantmsrescore/__init__.py

pyproject.toml — Version bump in project configuration
pyproject.toml

alphapeptdeep.py — Major AlphaPeptDeep model integration refactor
quantmsrescore/alphapeptdeep.py
mask_modloss to consider_modloss with inverted boolean logic

annotator.py — Annotator parameter updates and decoy filtering
quantmsrescore/annotator.py
mask_modloss to consider_modloss

idxmlreader.py — Enhanced PSM data structure and modification parsing
quantmsrescore/idxmlreader.py
Added is_decoy and score fields to PSM dataframe

ms2rescore.py — CLI parameter rename for modloss handling
quantmsrescore/ms2rescore.py
--mask_modloss to --consider_modloss
Summary by CodeRabbit
New Features
Refactor
Documentation
Chores