Home
This is the wiki for the project 'SemiautomaticGeneAnnotation'.
This program is the main entry point for the project. It relies on the 'executions.xml' file, which specifies the sub-programs to run along with their arguments, so that together they perform the whole annotation process. The corresponding jar file can be found in the **jar** project folder.
Expected execution times:
- PredictGenes: about 10 minutes
- RemoveDuplicatedGenes: 10 to 15 minutes
- SolveOverlappings: Almost instantaneous
- FillDataFromUniprot: Directly proportional to the number of proteins (with many proteins it sometimes stalls for a while; we suspect UniProt temporarily blocks access from our IP)
- PotatizeGenes: Almost instantaneous
FillDataFromUniprot (deprecated)
Completes protein data by performing HTTP requests to the UniProt site.
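Since requests to UniProt can stall or fail temporarily, one common way to cope is to retry with increasing waits. The following is a minimal Python sketch of that idea, not the project's own implementation: `fetch_with_backoff` and its parameters are hypothetical names, and `fetch` stands for any zero-argument callable that performs a single request and raises on failure.

```python
import time


def fetch_with_backoff(fetch, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call fetch() until it succeeds, waiting longer after each failure.

    fetch is a hypothetical zero-argument callable that performs one
    request and raises an exception when the request fails.
    """
    for attempt in range(max_attempts):
        try:
            return fetch()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            # Wait base_delay, 2 * base_delay, 4 * base_delay, ... between tries.
            sleep(base_delay * 2 ** attempt)
```

The `sleep` parameter is injectable so that the waiting behaviour can be exercised without real delays.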
Removes all genes that are duplicated.
Resolves every overlap found between genes and RNAs.
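The detection half of this step can be illustrated with a small Python sketch. It only finds overlapping gene/RNA pairs; how the project decides which feature to keep or trim is not described here, so no resolution rule is shown. The `Feature` type and `find_overlaps` function are hypothetical, and the sketch assumes all features lie on the same contig and ignores strand.

```python
from typing import NamedTuple


class Feature(NamedTuple):
    name: str
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive


def find_overlaps(genes, rnas):
    """Return (gene name, RNA name) pairs whose coordinate ranges intersect."""
    pairs = []
    for gene in genes:
        for rna in rnas:
            # Two closed intervals overlap when each starts before the other ends.
            if gene.start <= rna.end and rna.start <= gene.end:
                pairs.append((gene.name, rna.name))
    return pairs
```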
This is one of the most important programs/steps in the semi-automatic annotation process. It carries out the gene prediction phase of the process.
Generates two multifasta files for the genes that have been predicted by the end of the process: one with the nucleotide sequences and the other with the amino acid sequences.
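For readers unfamiliar with the format, a multifasta file is a sequence of records, each with a `>` header line followed by the sequence. A minimal Python sketch of the formatting, assuming records are given as (identifier, sequence) pairs; the `to_multifasta` name and the 60-character line width are illustrative choices, not taken from the project:

```python
def to_multifasta(records, width=60):
    """Format (identifier, sequence) pairs as multi-FASTA text."""
    lines = []
    for identifier, sequence in records:
        lines.append(f">{identifier}")
        # Split the sequence into fixed-width lines, as is conventional in FASTA.
        for i in range(0, len(sequence), width):
            lines.append(sequence[i:i + width])
    return "\n".join(lines) + "\n"
```

The same function works for both files, since nucleotide and amino acid records differ only in the sequence alphabet.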
Generates the corresponding GFF file for the final XML results file.
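As an illustration of the target format, each feature in a GFF file is one line of nine tab-separated columns: sequence ID, source, type, start, end, score, strand, phase, and attributes. The Python sketch below assumes GFF3-style `key=value` attributes (the wiki does not state which GFF version the project writes) and does not escape special characters in attribute values; `gff3_line` is a hypothetical helper.

```python
def gff3_line(seqid, source, feature_type, start, end, strand, attributes,
              score=".", phase="."):
    """Build one tab-separated GFF3 feature line (nine columns)."""
    # Attributes are written as key=value pairs separated by semicolons.
    attr_text = ";".join(f"{key}={value}" for key, value in attributes.items())
    columns = [seqid, source, feature_type, str(start), str(end),
               str(score), strand, str(phase), attr_text]
    return "\t".join(columns)
```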
Potatizes the final XML results file, generating a CSV file that is 'more accepted by the general public'.
Creates a new annotation XML file that omits every dismissed gene present in the input annotation XML file.
Exports the final XML annotation file to EMBL format (one file for each contig).
Exports the final XML annotation file to GenBank format.
Looks for malformed <Iteration_query-def> values in BLAST output XML files, specifically values with the wrong number of '|' characters.
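A check of this kind can be sketched in a few lines of Python using the standard library XML parser. The expected number of `|` characters is not stated here, so it is left as a parameter; `find_suspicious_query_defs` is a hypothetical name and this is not the project's own code.

```python
import xml.etree.ElementTree as ET


def find_suspicious_query_defs(xml_text, expected_pipes):
    """Return <Iteration_query-def> values whose '|' count is unexpected."""
    root = ET.fromstring(xml_text)
    suspicious = []
    for element in root.iter("Iteration_query-def"):
        text = element.text or ""  # treat an empty element as an empty string
        if text.count("|") != expected_pipes:
            suspicious.append(text)
    return suspicious
```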
Generates some statistics about proteins grouped by organism.
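The simplest statistic of this kind is a count of proteins per organism. A minimal Python sketch, assuming the input is a list of (protein ID, organism name) pairs; which statistics the program actually reports is not specified here, and `count_proteins_by_organism` is a hypothetical name.

```python
from collections import Counter


def count_proteins_by_organism(proteins):
    """Count proteins per organism from (protein_id, organism) pairs."""
    return Counter(organism for _, organism in proteins)
```

Calling `most_common()` on the result lists organisms from most to least frequent.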
Performs a very basic (but still useful) quality control check on the final annotation results XML file.
Performs an automatic quality control check on a random selection of results from the final annotation XML file.
Quality control program for the exporter of '5 columns' GenBank files (those used for genome submissions): Export 5 columns GenBank files