Preprocessing Script Overview
Marcus Fedarko edited this page Jul 16, 2018 · 13 revisions
MetagenomeScope's "preprocessing script" is a program that takes as input an assembly graph file and produces a SQLite database file that can be visualized in MetagenomeScope's viewer interface.
The preprocessing script is located in the graph_collator/ directory of MetagenomeScope. The script can be run with the command python graph_collator/collate.py.
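As a rough illustration of driving the script from Python, the sketch below assembles a command line from the -i, -o, -d, and -w options listed in the usage text on this page. The helper name build_collate_command and the example filenames are hypothetical and not part of MetagenomeScope; only the flags and the script path come from this page.

```python
import subprocess


def build_collate_command(input_file, output_prefix, output_dir=None,
                          overwrite=False):
    """Return an argv list for collate.py.

    Hypothetical helper for illustration; it only strings together
    flags documented in the script's usage text.
    """
    # Script path as given on this page, relative to the MetagenomeScope root.
    cmd = ["python", "graph_collator/collate.py",
           "-i", input_file, "-o", output_prefix]
    if output_dir is not None:
        cmd += ["-d", output_dir]  # defaults to the current directory if omitted
    if overwrite:
        cmd.append("-w")  # overwrite existing output files
    return cmd


def run_collate(cmd):
    """Run the assembled command; raises CalledProcessError on failure."""
    subprocess.run(cmd, check=True)


# Example filenames below are placeholders.
example_cmd = build_collate_command("assembly.gfa", "assembly",
                                    output_dir="out", overwrite=True)
print(" ".join(example_cmd))
```

Calling run_collate(example_cmd) from the MetagenomeScope root directory would then launch the script; building the argument list separately keeps the command easy to inspect before anything is executed.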
- System Requirements: a list of the various libraries needed to run the preprocessing script
- Installation: a guide to installing the "SPQR version" of the script on your system
  - The "non-SPQR version" of the script is written solely in Python, so it is relatively portable.
  - However, the "SPQR version" of the preprocessing script uses OGDF (a C++ library), along with a C++ script that interfaces with it, to generate SPQR tree decompositions for MetagenomeScope's "decomposition mode." Installing this version therefore requires a few extra steps to compile the C++ script on your system.
- Settings: information about the various options available when running the script
./collate.py [-h] -i INPUTFILE -o OUTPUTPREFIX [-d OUTPUTDIRECTORY]
             [-spqr] [-pg] [-px] [-sp] [-w] [-nt] [-b BICOMPONENTFILE]
             [-ub USERBUBBLEFILE] [-ubl] [-up USERPATTERNFILE] [-upl]
             [-nbdf]

  -h, --help            show this help message and exit
  -i INPUTFILE, --inputfile INPUTFILE
                        input assembly graph filename (LastGraph, GFA, or
                        MetaCarvel GML)
  -o OUTPUTPREFIX, --outputprefix OUTPUTPREFIX
                        output file prefix for .db and .xdot/.gv files
  -d OUTPUTDIRECTORY, --outputdirectory OUTPUTDIRECTORY
                        directory in which all output files will be stored;
                        defaults to current working directory
  -spqr, --computespqrdata
                        compute data for the SPQR "decomposition modes" in
                        MetagenomeScope; necessitates a few additional system
                        requirements (see wiki for details)
  -pg, --preservegv     save all .gv (DOT) files generated for nontrivial
                        (i.e. containing more than one node, or at least one
                        edge or node group) connected components
  -px, --preservexdot   save all .xdot files generated for nontrivial
                        connected components
  -sp, --structuralpatterns
                        create .txt files in the output directory containing
                        node information for all structural patterns
                        identified in the graph
  -w, --overwrite       overwrite output files
  -nt, --notriangulation
                        disable triangle smoothing in the SPQR mode
  -b BICOMPONENTFILE, --bicomponentfile BICOMPONENTFILE
                        file containing bicomponent information for the
                        assembly graph (will be generated using the SPQR
                        script in the output directory if not passed)
  -ub USERBUBBLEFILE, --userbubblefile USERBUBBLEFILE
                        file describing pre-identified bubbles in the graph,
                        in the format of MetaCarvel's bubbles.txt output: each
                        line of the file is formatted as (source ID) (tab)
                        (sink ID) (tab) (all node IDs in the bubble, including
                        source and sink IDs, all separated by tabs)
  -ubl, --userbubblelabelsused
                        use node labels instead of IDs in the pre-identified
                        bubbles file
  -up USERPATTERNFILE, --userpatternfile USERPATTERNFILE
                        file describing pre-identified miscellaneous
                        structural patterns in the graph: each line of the
                        file is formatted as (pattern type) (tab) (all node
                        IDs in the pattern, all separated by tabs)
  -upl, --userpatternlabelsused
                        use node labels instead of IDs in the pre-identified
                        misc. patterns file
  -nbdf, --nobackfilldotfiles
                        produces .gv (DOT) files without cluster "backfilling"
                        for each nontrivial connected component in the graph;
                        use of this argument doesn't impact the .db files
                        produced by this script -- it just demonstrates the
                        functionality in layout linearization provided by
                        cluster "backfilling"
- Controls (Work in progress)
- Viewer Interface Tutorial
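The tab-separated layouts described for the -ub and -up options above can be read with a few lines of Python. The sketch below is an illustration only, not the parsing code used by collate.py: the function names, the sample node IDs, and the pattern type "some_pattern" are all made up for the example, and the page does not say which pattern type names the script accepts.

```python
def parse_bubbles(text):
    """Parse bubble rows: source ID, sink ID, then all node IDs in the bubble.

    Per the -ub help text, the trailing node IDs include the source and sink.
    Hypothetical helper, not taken from collate.py.
    """
    bubbles = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        fields = line.split("\t")
        if len(fields) < 3:
            raise ValueError("expected source, sink, and node IDs: %r" % line)
        source, sink, nodes = fields[0], fields[1], fields[2:]
        bubbles.append((source, sink, nodes))
    return bubbles


def parse_patterns(text):
    """Parse pattern rows: pattern type, then all node IDs in the pattern.

    Hypothetical helper, not taken from collate.py.
    """
    patterns = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        fields = line.split("\t")
        if len(fields) < 2:
            raise ValueError("expected pattern type and node IDs: %r" % line)
        patterns.append((fields[0], fields[1:]))
    return patterns


# Sample rows with made-up IDs and a placeholder pattern type.
sample_bubbles = "1\t4\t1\t2\t3\t4\n"
sample_patterns = "some_pattern\t5\t6\t7\n"
print(parse_bubbles(sample_bubbles))
print(parse_patterns(sample_patterns))
```

When -ubl or -upl is passed, the same layout would hold node labels in place of IDs, so the parsing itself would not need to change.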