This is a set of methods for DNA sequence compression using Python.

jameswilsenach/DNACompressionTools

DNACompressionTools

The current version uses a variation of the Lempel-Ziv-Welch (LZW) lossless compression algorithm to compress a string of base pairs, using k-mers whose lengths fall within a fixed range.

Input (lzw_encode):

sequence - A string containing IUPAC bases

n - minimum k-mer length

m - maximum k-mer length

dictionary (optional, advanced) - a customised starting dictionary for compression. By default, it contains all sequences of base pairs of length n.
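As a rough illustration of the default described above, the starting dictionary can be built by enumerating every k-mer of length n. This is a minimal sketch, not the repository's code: the name `default_dictionary`, the four-letter alphabet `ACGT`, and the choice to number codes in enumeration order are all assumptions.

```python
from itertools import product


def default_dictionary(n, alphabet="ACGT"):
    """Hypothetical helper: map every k-mer of length n to an integer code.

    Assumption: codes are assigned in the order itertools.product yields
    the k-mers; the actual tool may number them differently.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    return {
        "".join(kmer): code
        for code, kmer in enumerate(product(alphabet, repeat=n))
    }


starting = default_dictionary(2)
print(len(starting))  # 4**2 k-mers of length 2
```

A dictionary of this shape (k-mer to integer code) could also be passed in as the customised `dictionary` argument, assuming the function accepts the same structure it returns.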

Output (lzw_encode):

result - encoded list of ints

dictionary - dict object used to compress the full sequence, structured as dictionary[k] = i, where:

k - k-mer

i - integer code
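To make the inputs and outputs above concrete, here is a minimal sketch of an LZW-style encoder whose dictionary entries are restricted to lengths between n and m. It is not the repository's implementation: the names `lzw_encode_sketch` and `lzw_decode_sketch`, the greedy longest-match strategy, and the handling of a trailing fragment shorter than n are assumptions made for illustration.

```python
from itertools import product


def lzw_encode_sketch(sequence, n, m, dictionary=None):
    """Hypothetical LZW-style encoder with k-mer lengths limited to n..m."""
    if not 1 <= n <= m:
        raise ValueError("expected 1 <= n <= m")
    if dictionary is None:
        # Default from the description above: all k-mers of length n.
        dictionary = {"".join(p): i
                      for i, p in enumerate(product("ACGT", repeat=n))}
    else:
        dictionary = dict(dictionary)  # do not mutate the caller's dict
    next_code = max(dictionary.values(), default=-1) + 1

    result = []
    pos = 0
    while pos < len(sequence):
        remaining = len(sequence) - pos
        # Greedily look for the longest known k-mer starting at pos.
        match = None
        for k in range(min(m, remaining), n - 1, -1):
            candidate = sequence[pos:pos + k]
            if candidate in dictionary:
                match = candidate
                break
        if match is None:
            # Assumption: an unseen k-mer (or a tail shorter than n)
            # is simply given a fresh code of its own.
            match = sequence[pos:pos + n]
            dictionary[match] = next_code
            next_code += 1
        result.append(dictionary[match])
        # Learn the match extended by one base, if it still fits within m.
        extended = sequence[pos:pos + len(match) + 1]
        if len(match) < len(extended) <= m and extended not in dictionary:
            dictionary[extended] = next_code
            next_code += 1
        pos += len(match)
    return result, dictionary


def lzw_decode_sketch(result, dictionary):
    """Invert dictionary[k] = i and concatenate the k-mers."""
    inverse = {code: kmer for kmer, code in dictionary.items()}
    return "".join(inverse[code] for code in result)


codes, table = lzw_encode_sketch("ACGTACGTACGTTTGA", n=2, m=4)
assert lzw_decode_sketch(codes, table) == "ACGTACGTACGTTTGA"
```

Because the returned dictionary maps each k-mer to its code, decoding in this sketch only needs the inverse mapping; whether the actual tool ships a matching decoder is not stated here.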

Each ambiguous IUPAC symbol is replaced at random by one of the base pair identities it is compatible with.
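The random assignment described above can be sketched with the standard IUPAC nucleotide codes. The helper name `resolve_ambiguous`, the uniform choice among compatible bases, and the optional `rng` parameter are assumptions for illustration; the tool's own selection procedure may differ.

```python
import random

# Standard IUPAC nucleotide codes and the bases each one can stand for.
IUPAC_BASES = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG",
    "N": "ACGT",
}


def resolve_ambiguous(sequence, rng=None):
    """Hypothetical helper: replace each symbol by a random compatible base.

    Assumption: every compatible base is equally likely. Pass a seeded
    random.Random as rng to make the result reproducible.
    """
    rng = rng or random.Random()
    return "".join(rng.choice(IUPAC_BASES[base]) for base in sequence.upper())


resolved = resolve_ambiguous("ACGTNRY", rng=random.Random(0))
print(resolved)  # first four bases unchanged, last three drawn at random
```

Note that a random assignment of this kind means the original ambiguous symbols cannot be recovered from the compressed output; only the resolved sequence is preserved.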
