
Implementation Details for the "Overwrite" Option

Marcus Fedarko edited this page Jul 14, 2018 · 5 revisions

(Regarding save_aux_file() in the preprocessing script, and the -w option)

When we call check_file_existence() before creating a new auxiliary file, there is a window between that check and the moment we start writing to the checked filepath. If a user or another process creates a file or directory at that filepath during this window, the check is effectively bypassed (a classic time-of-check-to-time-of-use race condition). This could result in data loss for whoever owns the newly created file/directory, or it could cause this script to run into an error. Either outcome is undesirable, although the situation is an uncommon one.
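To make the race concrete, here is a minimal sketch of the vulnerable check-then-write pattern. It is not MetagenomeScope's code: os.path.exists() stands in for check_file_existence() (whose implementation isn't shown here), and the hypothetical created_in_window callback deterministically simulates another process creating a file during the window between the check and the write.

```python
import os
import tempfile


def naive_save(path, text, created_in_window=None):
    """Check-then-write; vulnerable to a time-of-check-to-time-of-use race.

    created_in_window is a hypothetical hook simulating another user or
    process creating something at `path` after the check has passed.
    """
    if os.path.exists(path):
        raise FileExistsError(path)
    # Window: another process may create `path` here.
    if created_in_window is not None:
        created_in_window(path)
    # open(..., "w") silently truncates whatever now exists at `path`.
    with open(path, "w") as f:
        f.write(text)


def other_process(path):
    # Simulated third party writing its own data to the same path.
    with open(path, "w") as f:
        f.write("someone else's data")


with tempfile.TemporaryDirectory() as d:
    target = os.path.join(d, "aux.txt")
    naive_save(target, "our data", created_in_window=other_process)
    with open(target) as f:
        clobbered = f.read()

print(clobbered)  # the other process's data has been overwritten
```

If the other process had created a directory instead of a file, the final open() would raise an error rather than lose data, which corresponds to the second failure mode described above.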

We avoid this by creating files here with os.fdopen() wrapped around os.open(), passing os.open() a set of flags that depends on whether or not the user passed -w. (This function is the one place where MetagenomeScope's preprocessing script writes directly to a file; all other file creation is handled by other processes, e.g. the SPQR script or pysqlite.) This approach guarantees that, if the aforementioned race condition occurs, an error will be raised and no data will be erroneously written.
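A minimal sketch of this approach, assuming a hypothetical helper named save_aux_file_sketch() (not the actual save_aux_file()). The exact flags the real script uses aren't listed here; this sketch assumes O_TRUNC when -w is given and O_EXCL otherwise. With O_CREAT | O_EXCL, the existence check and the file creation happen as a single atomic operation in the kernel, so there is no window for another process to slip into.

```python
import os
import tempfile


def save_aux_file_sketch(path, text, overwrite=False):
    """Hypothetical stand-in for save_aux_file(); not the project's code."""
    # Always open write-only, creating the file if it doesn't exist.
    flags = os.O_WRONLY | os.O_CREAT
    if overwrite:
        # Assumed -w behavior: truncate any existing file at `path`.
        flags |= os.O_TRUNC
    else:
        # O_EXCL: if anything already exists at `path` (even something
        # created after an earlier check), os.open() raises
        # FileExistsError and nothing is written.
        flags |= os.O_EXCL
    # 0o644 is an assumed permission mode (further masked by the umask).
    fd = os.open(path, flags, 0o644)
    # Wrap the raw descriptor in a regular file object; closing the file
    # object also closes the descriptor.
    with os.fdopen(fd, "w") as f:
        f.write(text)


with tempfile.TemporaryDirectory() as d:
    target = os.path.join(d, "aux.txt")
    # Simulate another process creating the file before we write.
    with open(target, "w") as f:
        f.write("someone else's data")
    try:
        save_aux_file_sketch(target, "our data")
        outcome = "written"
    except FileExistsError:
        outcome = "refused"
    with open(target) as f:
        preserved = f.read()
    # With overwrite=True (i.e. -w), the existing file is replaced.
    save_aux_file_sketch(target, "our data", overwrite=True)
    with open(target) as f:
        overwritten = f.read()

print(outcome, preserved, overwritten)
```

Without -w the existing file is left untouched and the caller gets an exception it can report; with -w, overwriting is the behavior the user explicitly asked for, so the race no longer matters.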

(Note that on NFS, this approach only works "...when using NFSv3 or later on kernel 2.6 or later," according to the open(2) man page as of June 8, 2018. That being said, NFSv3 dates back to June 1995 and Linux kernel 2.6 dates back to December 2003, so this limitation shouldn't affect most modern systems.)

The use of os.open() with the os.O_EXCL flag to prevent the race condition, along with the background information for this writeup, is based on the answer by Adam Dinwoodie (username me_and) to this Stack Overflow question.
