issue with gene file #48

@jcerca

Dear Evan,

I am really excited about getting Tephra running; it seems to be a beautiful piece of software. I ran into some issues I'd like to solve, though. I am posting them here so everyone can see, but please let me know if this doesn't belong in the issue tracker.

I got the Docker version running after installing Docker:

$ docker run -it --name tephra-con -v $(pwd)/db:/db:Z sestaton/tephra
$ cd /db
$ wget https://raw.githubusercontent.com/sestaton/tephra/master/config/tephra_config.yml
#### changed the "logfile", "genome", "outfile", "repeatdb" (using your sunflower library, thank you for that!).
$ tephra all -c tephra_config.yml

[ERROR]: gene file was not defined in configuration or does not exist. Check input. Exiting.
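
The message covers two cases: the key is absent, or the file it names does not exist. A minimal sketch for telling them apart is below. The helper names `config_value` and `check_inputs` are hypothetical and not part of Tephra, and the sketch assumes relative paths in the config are resolved against the current working directory (here `/db`), which may not match how Tephra resolves them.

```shell
#!/bin/sh
# config_value FILE KEY: print the value of the first "- KEY: value" entry,
# with trailing whitespace stripped. (Hypothetical helper, not a Tephra command.)
config_value() {
  sed -n "s/^[[:space:]]*-[[:space:]]*$2:[[:space:]]*//p" "$1" \
    | sed 's/[[:space:]]*$//' \
    | head -n 1
}

# check_inputs FILE: for each file-valued key under "all", report whether the
# key is defined and whether the path it names exists relative to the cwd.
check_inputs() {
  for key in genome repeatdb genefile; do
    path=$(config_value "$1" "$key")
    if [ -z "$path" ]; then
      echo "$key: not defined"
    elif [ -e "$path" ]; then
      echo "$key: found ($path)"
    else
      echo "$key: missing ($path)"
    fi
  done
}

# Usage, from the directory that holds the config:
#   check_inputs tephra_config.yml
```

With the config shown later in this report, a line such as `genefile: missing (TAIR10_genes.fas)` would point to the second case: the key is defined but the file is not in the working directory.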

I noticed that the new config file has the line below. It may be a recent addition, since it is not mentioned in the manual or the help pages.

  • genefile: TAIR10_genes.fas

I deleted it:

$ sed "s/.*genefile.*//; /^$/d" tephra_config.yml > tephra_config2.yml
$ tephra all -c tephra_config2.yml

[ERROR]: 'trnadb' under 'all' is not defined after parsing configuration file.
         This indicates there may be a blank line in your configuration file.
         Please check your configuration file and try again. Exiting.
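
Since the error suggests a blank line, one quick sanity check is to count blank lines in the filtered file and confirm which keys survived. The sketch below does this on a throwaway four-line file rather than the real config. `count_blank_lines` and `has_key` are hypothetical helpers, and the check only inspects the text of the file; it says nothing about how Tephra itself parses it.

```shell
#!/bin/sh
# count_blank_lines FILE: print the number of empty or whitespace-only lines.
# grep -c exits 1 when the count is 0, so "|| true" keeps the status clean.
count_blank_lines() {
  grep -c '^[[:space:]]*$' "$1" || true
}

# has_key FILE KEY: succeed if a "- KEY:" entry is present in FILE.
has_key() {
  grep -Eq "^[[:space:]]*-[[:space:]]*$2:" "$1"
}

# Throwaway demo input (illustrative values, not the real config).
demo=$(mktemp)
filtered=$(mktemp)
trap 'rm -f "$demo" "$filtered"' EXIT
printf 'all:\n  - repeatdb:  repeats.fasta\n  - genefile:  TAIR10_genes.fas\n  - trnadb:    TephraDB\n' > "$demo"

# Same sed expression as in the report: blank the genefile line, then drop empty lines.
sed 's/.*genefile.*//; /^$/d' "$demo" > "$filtered"

blank=$(count_blank_lines "$filtered")
echo "blank lines: $blank"
if has_key "$filtered" genefile; then echo "genefile: present"; else echo "genefile: absent"; fi
if has_key "$filtered" trnadb; then echo "trnadb: present"; else echo "trnadb: absent"; fi
```

If the real `tephra_config2.yml` also reports zero blank lines and still contains `trnadb`, the blank-line hint in the error message is probably not the literal cause, and the missing `genefile` entry itself becomes the more likely suspect.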

Q1: I take this to mean that Tephra did not like my reformatting of the config file. That made me wonder what "TAIR10_genes.fas" is. Is it the gene annotation set for Arabidopsis? I checked NCBI, and TAIR10 seems to be an assembly name for this species (https://www.ncbi.nlm.nih.gov/assembly/GCF_000001735.4).
Q2: Is there a way to run the "all" command without specifying the annotations?
See the config file below.

$ cat t*yml
## For more information about this file, see:
## https://github.com/sestaton/tephra/wiki/Specifications-and-example-usage.
all:
  - logfile:          tephra.log
  - genome:           scalesia_atractyloides.fasta
  - outfile:          scalesia_atractyloides_thra_transposons.gff3
  - repeatdb:         Ha412v1r1_transposons_v1.0.fasta
  - genefile:         TAIR10_genes.fas
  - trnadb:           TephraDB
  - hmmdb:            TephraDB
  - threads:          24
  - clean:            YES
  - debug:            NO
  - subs_rate:        1e-8
findltrs:
  - dedup:             NO
  - tnpfilter:         NO
  - domains_required:  NO
  - ltrharvest:
     - mintsd:         4
     - maxtsd:         20
     - minlenltr:      100
     - maxlenltr:      1000
     - mindistltr:     1000
     - maxdistltr:     15000
     - seedlength:     30
     - tsdradius:      60
     - xdrop:          5
     - swmat:          2
     - swmis:          -2
     - swins:          -3
     - swdel:          -3
     - overlaps:       best
  - ltrdigest:
     - pptradius:      30
     - pptlen:         8 30
     - pptagpr:        0.25
     - uboxlen:        3 30
     - uboxutpr:       0.91
     - pbsradius:      30
     - pbslen:         11 30
     - pbsoffset:      0 5
     - pbstrnaoffset:  0 5
     - pbsmaxeditdist: 1
     - pdomevalue:     1E-6
     - pdomcutoff:     NONE
     - maxgaplen:      50
classifyltrs:
  - percentcov:       50
  - percentid:        80
  - hitlen:           80
illrecomb:
  - repeat_pid:       10
ltrage:
  - all:              NO
maskref:
  - percentid:        80
  - hitlength:        70
  - splitsize:        5000000
  - overlap:          100
sololtr:
  - percentid:        39
  - percentcov:       80
  - matchlen:         80
  - numfamilies:      20
  - allfamilies:      NO
tirage:
  - all:              NO
