NLR Finder identifies and extracts the best (longest) protein sequence from transcripts annotated by NLR-Annotator for phylogenetic analysis and visualisation with iTOL. Note that this tool has only been tested on the Chinese Spring v2.1 Reference Genome and may not work for other genomes (see Issues).
This code is based on an analysis performed by Burkhard Steuernagel for the paper "The NLR-Annotator Tool Enables Annotation of the Intracellular Immune Receptor Repertoire".
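To illustrate the "best (longest) protein" selection described above, here is a minimal sketch. It is not the code in `src/main.py`. It assumes transcript IDs take the form `<gene>.<isoform>` (for example `GeneA.1`), and the function names `read_fasta` and `longest_per_gene` are hypothetical. It uses only the standard library so it runs without Biopython.

```python
from io import StringIO


def read_fasta(handle):
    """Yield (identifier, sequence) pairs from a FASTA-formatted text handle."""
    name, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            # Keep only the first whitespace-delimited token as the identifier.
            tokens = line[1:].split()
            name, chunks = (tokens[0] if tokens else ""), []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def longest_per_gene(records):
    """Keep the longest protein for each gene.

    Assumption: the gene ID is the transcript ID with the final ".N" removed.
    Ties keep the first transcript encountered.
    """
    best = {}
    for transcript_id, seq in records:
        gene_id = transcript_id.rsplit(".", 1)[0]
        if gene_id not in best or len(seq) > len(best[gene_id][1]):
            best[gene_id] = (transcript_id, seq)
    return best


# Placeholder sequences for illustration only.
example = StringIO(">GeneA.1\nMKT\n>GeneA.2\nMKTLLV\n>GeneB.1\nMA\n")
selected = longest_per_gene(read_fasta(example))
```

Here `selected` maps each gene ID to the transcript ID and sequence of its longest isoform. How the tool itself breaks ties or derives gene IDs may differ.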
- Pandas v2.1.4
- Biopython v1.78
- NLR-Annotator v2.1 (required for NLR annotations)
- iTOL v6
NLR Finder can be run using the following command:
python3 src/main.py \
-c cds.fasta \
-a annotation.txt \
-p protein.fasta \
-o output.faa
The following parameters are required to run NLR Finder:
parameter | description |
---|---|
-c, --cds | Coding sequences from a reference genome |
-a, --annotation | NLR-Annotator output (-o output.txt) of coding sequences |
-p, --protein | Protein sequences corresponding to the coding sequences |
-o, --output | Name of the output fasta file |
The following optional parameters can be provided to add cloned sequences and annotate NBD-NBARC NLRs:
parameter | description |
---|---|
-x, --cloned_nlrs | Two-column, headerless TSV file of gene name (used for annotation) and GenBank accession number (used to fetch the sequence). An example can be found in example.tsv |
-e, --email | Email address for accessing the NCBI API (required when using --cloned_nlrs) |
-nbd | Enables annotation of NBD-NBARC NLRs |
-m, --motifs | Motifs from NLR-Annotator (-m output.motifs.bed) for CDS (required when using -nbd) |
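The `--cloned_nlrs` file layout described above can be read as in the following sketch. This is an illustration of the format under stated assumptions, not the tool's own parser: `read_cloned_nlrs` is a hypothetical name, and the gene names and accessions are placeholders rather than real records.

```python
import csv
from io import StringIO


def read_cloned_nlrs(handle):
    """Return (gene_name, accession) pairs from a headerless two-column TSV."""
    pairs = []
    for row_number, row in enumerate(csv.reader(handle, delimiter="\t"), start=1):
        if not row or all(not field.strip() for field in row):
            continue  # skip blank lines
        if len(row) != 2:
            raise ValueError(
                f"line {row_number}: expected 2 columns, found {len(row)}"
            )
        gene_name, accession = (field.strip() for field in row)
        pairs.append((gene_name, accession))
    return pairs


# Placeholder values only; these are not real gene names or accessions.
example_tsv = StringIO("GeneA\tACC00001.1\nGeneB\tACC00002.1\n")
cloned = read_cloned_nlrs(example_tsv)
```

Checking the column count up front surfaces a malformed row before any sequence is requested from NCBI.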
output | description |
---|---|
output.faa | A fasta file of NLR protein sequences identified from the provided cds, plus any sequences specified by --cloned_nlrs. File name specified by --output |
itol_nlr_labels.txt | iTOL annotations for sequences provided by --cloned_nlrs. The corresponding sequences in the tree are labelled with the name provided in the tsv. |
itol_nbs_nbarc_stars.txt | iTOL annotations for sequences re-annotated as NBD-NBARC NLRs. The corresponding sequences in the tree are denoted by a red star. |
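For orientation, the sketch below builds a label annotation in the general shape of iTOL's LABELS template (a `LABELS` line, a `SEPARATOR` line, then `DATA` followed by one `leaf_id<TAB>label` row per sequence). The exact contents NLR Finder writes to `itol_nlr_labels.txt` may differ; `itol_labels` is a hypothetical helper, and the leaf ID and label are placeholders.

```python
def itol_labels(labels):
    """Build the text of a tab-separated iTOL LABELS annotation.

    `labels` maps a tree leaf ID to the name that should be displayed.
    """
    lines = ["LABELS", "SEPARATOR TAB", "DATA"]
    for leaf_id, label in labels.items():
        lines.append(f"{leaf_id}\t{label}")
    return "\n".join(lines) + "\n"


# Placeholder leaf ID and label, for illustration only.
text = itol_labels({"ACC00001.1": "GeneA"})
```

The leaf IDs must match the sequence identifiers used in the tree for iTOL to apply the labels.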
The following is a mock analysis using NLR Finder to create a phylogenetic tree of unique NLRs from a reference genome:
- Run NLR-Annotator on reference cds
- Run NLR Finder
- Generate multiple sequence alignment from NLR Finder output
- Generate phylogenetic tree
- Load tree into iTOL and visualise annotations from NLR Finder
Any issues can be reported on the repository's issues page. Alternatively, you can contact me directly here.
This tool was designed to analyse the Chinese Spring v2.1 Reference Genome and may not work with other genomes. If you are interested in applying this analysis to your genome of interest, please contact me and I'd be happy to help!
If you use this tool in your research please cite: