Description
At the moment there are two major scripts in the pipeline, countAmpliconsAWS.R and dict_align.py.
Currently, countAmpliconsAWS.R runs dict_align.py near the beginning of the pipeline to align and count the amplicons. When dict_align.py finishes running, it saves its output as results.csv, which countAmpliconsAWS.R then loads into memory for downstream analysis. This write/read step takes extra time, and it would be better to keep the results in memory for downstream analysis instead of writing them to the drive and then reading them back in.
The reticulate library for R provides an R interface to Python that may allow us to bypass this write/read step (https://rstudio.github.io/reticulate/). I haven't used this library yet, but I'm interested in trying it out to see if it improves speed.
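
As a rough sketch of what the Python side might look like: the counting step could be exposed as a function that returns its results instead of writing results.csv. Everything below is hypothetical. The function name count_amplicons, its arguments, and the exact-match lookup are placeholders for illustration, not the actual logic in dict_align.py.

```python
from collections import Counter


def count_amplicons(reads, amplicons):
    """Hypothetical stand-in for the align-and-count step in dict_align.py.

    reads:     iterable of read sequences (strings).
    amplicons: dict mapping amplicon sequence -> amplicon name.

    Returns a dict of equal-length columns rather than writing results.csv,
    so the caller can keep the results in memory.
    """
    counts = Counter()
    for read in reads:
        # Exact-match lookup is an assumption; the real alignment may differ.
        name = amplicons.get(read)
        if name is not None:
            counts[name] += 1

    # Sort names so the output order is deterministic.
    names = sorted(counts)
    return {"amplicon": names, "count": [counts[n] for n in names]}
```

With a function like this, countAmpliconsAWS.R could load the script with reticulate's source_python("dict_align.py") and call count_amplicons(...) directly. reticulate converts a returned Python dict into an R named list, which could then be turned into a data frame. Whether this is actually faster than the CSV round trip would still need to be measured.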