Conversation

@jfy133
Member

@jfy133 jfy133 commented Jun 22, 2025

See code TODOs

To close #833

PR checklist

  • This comment contains a description of changes (with reason).
  • If you've fixed a bug or added code that should be tested, add tests!
  • If you've added a new tool - have you followed the pipeline conventions in the contribution docs
  • If necessary, also make a PR on the nf-core/mag branch on the nf-core/test-datasets repository.
  • Make sure your code lints (nf-core pipelines lint).
  • Ensure the test suite passes (nextflow run . -profile test,docker --outdir <OUTDIR>).
  • Check for unexpected warnings in debug mode (nextflow run . -profile debug,test,docker --outdir <OUTDIR>).
  • Usage Documentation in docs/usage.md is updated.
  • Output Documentation in docs/output.md is updated.
  • CHANGELOG.md is updated.
  • README.md is updated (including new tool citations and authors/contributors).

@jfy133 jfy133 linked an issue Jun 22, 2025 that may be closed by this pull request
@github-actions

github-actions bot commented Jun 22, 2025

nf-core pipelines lint overall result: Passed ✅ ⚠️

Posted for pipeline commit db7f618

+| ✅ 384 tests passed       |+
#| ❔   1 tests were ignored |#
!| ❗   6 tests had warnings |!

❗ Test warnings:

  • nextflow_config - Config manifest.version should end in dev: 5.3.0
  • pipeline_todos - TODO string in main.nf: Remove this line if you don't need a FASTA file [TODO: try and test using for --host_fasta and --host_genome]
  • pipeline_todos - TODO string in methods_description_template.yml: #Update the HTML below to your preferred methods description, e.g. add publication citation for this pipeline
  • pipeline_todos - TODO string in main.nf: Optionally add in-text citation tools to this list.
  • pipeline_todos - TODO string in main.nf: Optionally add bibliographic entries to this list.
  • pipeline_todos - TODO string in main.nf: Only uncomment below if logic in toolCitationText/toolBibliographyText has been filled!

❔ Tests ignored:

  • files_unchanged - File ignored due to lint config: .github/PULL_REQUEST_TEMPLATE.md

✅ Tests passed:

Run details

  • nf-core/tools version 3.5.1
  • Run at 2025-12-03 13:11:21

@nf-core-bot
Member

nf-core-bot commented Sep 5, 2025

Warning

Newer version of the nf-core template is available.

Your pipeline is using an old version of the nf-core template: 3.4.1.
Please update your pipeline to the latest version.

For more documentation on how to update your pipeline, please see the nf-core documentation and Synchronisation documentation.

@jfy133
Member Author

jfy133 commented Sep 12, 2025

@nf-core-bot fix linting

@jfy133
Member Author

jfy133 commented Sep 12, 2025

TODO:

  • Verify the reordering of contigs is correct
  • Update output.md
  • Update/fix snapshots

@jfy133
Member Author

jfy133 commented Nov 18, 2025

Update:

  • My original tests were too simplistic.
  • ✅ I realised the join wasn't working properly because we can have one contig in multiple bins, so in the last PR I switched to a combine instead.
  • TODO: I will need to 'clean' the keys before the join to drop anything after the first as MetaBinner keeps the MEGAHIT headers but the other binners appear to drop them (MetaBAT2, MetaBinner), preventing joining.
  • TODO: the current test seems to run through without executing the pydamage bin summaries, so I will need to continue investigating (I think due to a join failure in the creation of ch_pydamage_to_bins in pydamage_bins).
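
For illustration only, the key cleaning and the one-contig-in-many-bins matching described above can be sketched in plain Python, outside Nextflow. This is not the pipeline's code: the function names are hypothetical, the split on the first whitespace is an assumption about what should be dropped from the MEGAHIT-style headers, and the `reference` column name is assumed to be the contig column of the pyDamage results.

```python
from collections import defaultdict


def clean_contig_id(header: str) -> str:
    """Keep only the first whitespace-delimited token of a (non-empty) FASTA header."""
    # Assumption: everything after the first whitespace is descriptive text
    # that some binners keep and others drop.
    return header.lstrip(">").split(maxsplit=1)[0]


def attach_pydamage_to_bins(bin_contigs, pydamage_rows):
    """Pair each pyDamage row with every bin that contains its contig.

    bin_contigs: iterable of (bin_id, contig_header) pairs
    pydamage_rows: iterable of dicts with a 'reference' key (assumed column name)
    """
    bins_by_contig = defaultdict(list)
    for bin_id, header in bin_contigs:
        bins_by_contig[clean_contig_id(header)].append(bin_id)

    matched = []
    for row in pydamage_rows:
        contig = clean_contig_id(row["reference"])
        # One contig may sit in several bins, so emit one output row per bin.
        for bin_id in bins_by_contig.get(contig, []):
            matched.append({"bin": bin_id, **row, "reference": contig})

    # Sort so the result does not depend on the order the inputs arrived in.
    return sorted(matched, key=lambda r: (r["bin"], r["reference"]))


if __name__ == "__main__":
    bins = [
        ("bin1", "k141_1 flag=1 multi=3.0000 len=300"),
        ("bin2", "k141_1"),
        ("bin2", "k141_2"),
    ]
    rows = [{"reference": "k141_1 flag=1 multi=3.0000 len=300", "qvalue": 0.01}]
    for record in attach_pydamage_to_bins(bins, rows):
        print(record)
```

Contigs that appear in no bin are silently dropped here; whether that is the desired behaviour for the real summary is a separate decision.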

@jfy133
Member Author

jfy133 commented Nov 21, 2025

TODO

  • Fixed the issues from before, and now can produce the reordered and a summary file, however I'm getting variable numbers of rows in the pydamage summary and number of reordered files per -resume run (it appears to be two different values that it flip flops between). To investigate:
    • Check contents of each reordered file before summarizing
    • Set up sorting of all relevant channels
    • I notice SUMMARISE_PYDAMAGE doesn't get cached again, implying it is not receiving the same number of rows each time
    • I should try an nf-console reprex
  • Triple check that the median summaries make sense by manually pulling the rows from the original pydamage results and the contig IDs from the bins, then calculating the median pydamage results by hand
    • I am wondering if there is some consuming during combine (it's not doing a true all-by-all combine, but once one key is used it discards the rest?)
  • Test with CONCOCT
  • Add to a relevant test and update snapshot
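
As a rough aid for the manual median check in the list above, a per-bin median can be computed independently of the pipeline in a few lines of Python. This is a sketch under assumptions, not the summarise script itself: `median_per_bin` is a hypothetical helper, the rows are assumed to already carry a `bin` key, and `predicted_accuracy` is used only as an example column.

```python
from collections import defaultdict
from statistics import median


def median_per_bin(rows, value_columns):
    """Return {bin_id: {column: median}} with bins in sorted order.

    rows: iterable of dicts with a 'bin' key plus the numeric columns
    value_columns: names of the columns to summarise
    """
    grouped = defaultdict(lambda: defaultdict(list))
    for row in rows:
        for col in value_columns:
            grouped[row["bin"]][col].append(float(row[col]))

    # Sorting bins and columns keeps the output stable between runs,
    # which makes it easier to diff against the pipeline's summary.
    return {
        bin_id: {col: median(vals) for col, vals in sorted(cols.items())}
        for bin_id, cols in sorted(grouped.items())
    }


if __name__ == "__main__":
    example = [
        {"bin": "bin2", "predicted_accuracy": "0.5"},
        {"bin": "bin1", "predicted_accuracy": "0.25"},
        {"bin": "bin1", "predicted_accuracy": "0.75"},
        {"bin": "bin1", "predicted_accuracy": "0.5"},
    ]
    print(median_per_bin(example, ["predicted_accuracy"]))
```

If the row count per bin differs between two `-resume` runs, the medians from this check would differ too, so it can help tell a summarising bug apart from an upstream channel problem.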

@jfy133
Member Author

jfy133 commented Nov 21, 2025

@nf-core-bot fix linting

@jfy133
Member Author

jfy133 commented Nov 26, 2025

I realise now I have made this overly complicated, I think I can basically have a single local module with a custom script that does the reordering (as we actually rename the pydamage output files to make them unique)... so will start again

@jfy133
Member Author

jfy133 commented Dec 3, 2025

Latest status: nextflow code is mostly working, now trying to get the summarise_pydamage.py script working.

Last time, however, I found there was some discrepancy: a lot of bins were missing in one of the files and I'm not sure why
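
One way to narrow down which bins go missing is to compare the bin IDs expected from the binning output with the bin IDs that actually appear in a summary file. The snippet below is only a diagnostic sketch: `compare_bin_ids` is a hypothetical helper and says nothing about the cause of the mismatch.

```python
def compare_bin_ids(expected, observed):
    """Report bin IDs present in one collection but not the other."""
    expected, observed = set(expected), set(observed)
    return {
        "missing": sorted(expected - observed),      # expected but absent
        "unexpected": sorted(observed - expected),   # present but not expected
    }


if __name__ == "__main__":
    report = compare_bin_ids(
        expected=["bin1", "bin2", "bin3"],
        observed=["bin1", "bin3", "bin9"],
    )
    print(report)
```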

Development

Successfully merging this pull request may close these issues.

Ancient DNA mode: Add pyDamage results to bin_summary.tsv

4 participants