
Example for finding features with epistatic effects with scikit-mdr #26

@weixuanfu


It seems that the utilities in `mdr.utils` are designed for this purpose, but there is no documentation on how to use them. I had a quick look at the code and made a demo that calculates scores for all n-way feature combinations. I think this may be a way to find feature combinations with epistatic effects. Please let me know if this is the correct way.

```python
import operator

import pandas as pd
from mdr import MDRClassifier
from mdr.utils import n_way_models

genetic_data = pd.read_csv('https://github.com/EpistasisLab/scikit-mdr/raw/development/data/GAMETES_Epistasis_2-Way_20atts_0.4H_EDM-1_1.tsv.gz', sep='\t', compression='gzip')

features = genetic_data.drop('class', axis=1).values
labels = genetic_data['class'].values
# Take names from the feature columns only, so they line up with `features`
feature_names = list(genetic_data.drop('class', axis=1).columns)

my_mdr = MDRClassifier()
my_mdr.fit(features, labels)
# Training accuracy: the model is scored on the same data it was fitted on
print("Score for using all features", my_mdr.score(features, labels))

# n: list (default: [2])
# The size(s) of the MDR models to generate,
# e.g., if n == [3], all 3-way models will be generated.
n = [2]
mdr_score_list = []
# Note that this function performs an exhaustive search through all feature
# combinations and can be computationally expensive.
for _, mdr_model_score, model_features in n_way_models(my_mdr, features, labels, n=n, feature_names=feature_names):
    mdr_score_list.append((model_features, mdr_model_score))
# Rank combinations by score, best first
mdr_score_list.sort(key=operator.itemgetter(1), reverse=True)
print("The combination with highest score:", mdr_score_list[0])
```

Output:

```
Score for using all features 0.998125
The combination with highest score: (['P1', 'P2'], 0.793125)
```
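To sanity-check what the `n_way_models` loop is doing, the classic MDR idea can be sketched in plain Python without scikit-mdr. This is a simplified re-implementation under stated assumptions, not scikit-mdr's code: rows are pooled into cells by their genotype tuple over a feature combination, each cell is labelled high-risk when its case:control ratio exceeds the overall ratio (ties go to low-risk, which may differ from scikit-mdr's tie handling), and the combination is scored by the training accuracy of that labelling. The helper names `mdr_training_accuracy` and `n_way_scores` and the toy data are hypothetical.

```python
import itertools
from collections import Counter


def mdr_training_accuracy(X, y, cols):
    """Simplified MDR scoring of one feature combination (illustrative sketch).

    Rows are pooled into cells by their genotype tuple over `cols`. A cell is
    labelled high-risk (predict 1) when its case:control ratio exceeds the
    overall case:control ratio; ties and everything else are low-risk
    (predict 0). Assumes y holds 0/1 labels and contains at least one control.
    """
    cases, controls = Counter(), Counter()
    for row, label in zip(X, y):
        cell = tuple(row[c] for c in cols)
        if label == 1:
            cases[cell] += 1
        else:
            controls[cell] += 1
    n_cases = sum(y)
    threshold = n_cases / (len(y) - n_cases)  # overall case:control ratio
    correct = 0
    for cell in set(cases) | set(controls):
        # cases/controls > threshold, cross-multiplied so controls == 0 is safe
        high_risk = cases[cell] > threshold * controls[cell]
        correct += cases[cell] if high_risk else controls[cell]
    return correct / len(y)


def n_way_scores(X, y, feature_names, n=(2,)):
    """Exhaustively score every k-way combination for each k in n, best first."""
    results = []
    for k in n:
        for cols in itertools.combinations(range(len(feature_names)), k):
            names = [feature_names[c] for c in cols]
            results.append((names, mdr_training_accuracy(X, y, cols)))
    results.sort(key=lambda item: item[1], reverse=True)
    return results


# Hypothetical toy data: every genotype combination (0/1/2) of three SNPs,
# with the class determined jointly by the first two: (a + b) % 2.
X = list(itertools.product(range(3), repeat=3))
y = [(a + b) % 2 for a, b, _ in X]
names = ["snp_a", "snp_b", "snp_c"]

ranking = n_way_scores(X, y, names, n=[2])
print(ranking[0])  # (['snp_a', 'snp_b'], 1.0)
```

Like the demo, this ranks combinations by training accuracy, and the search is exhaustive: for the 20 attributes in the GAMETES file, `n=[2]` scores C(20, 2) = 190 pairs and `n=[3]` scores C(20, 3) = 1140 triples.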
