@sgjoshi25
Contributor

Create filtered lists of the data based on the CV threshold and NaN filters.
Run DFA of the datasheets and Gene/RxnKO knockouts to find the full change between the conditions

…filtered datasheets based on CV and NaN filter to put into DFA
…heets. Then, it runs the outputted model against GeneKO and RxnKO knockouts to find the full change
@sgjoshi25 sgjoshi25 requested a review from ScottCampit July 28, 2020 15:02
@ScottCampit
Member

The code works and I don't see any apparent bugs, so kudos on getting it to run. Depending on what you have tried so far, the identical results you are getting may be due to the way the data is processed to compute the flux activity coefficients, since your EDA suggests we should expect significant changes between metabolites. I would suggest the following:

  • Play with extreme kappa/kappa2 values, like 1E-12, 1E-6, 1, and 10. If this doesn't affect the model at all, then I would move on to the next point.

  • Check the flux activity coefficients directly. If they are all the same, that would explain why you're getting the same results. Then, you should think about how to normalize the data to extract more information. MAV and Quantile norms are built into DFA, but you can try other methods if you think they would work better.

  • Try the -dox model if possible. This is another thing they wanted me to do, as it is a more direct control than NT.
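For the second point, one quick way to see whether the flux activity coefficients carry any signal is to look at their spread across conditions. This is only a sketch: `coeffs` is a hypothetical metabolites-by-conditions matrix with made-up values (not actual DFA output), and the tolerance is arbitrary.

```matlab
% Hypothetical coefficients: rows = metabolites, columns = conditions.
% These numbers are made up for illustration and are not DFA output.
coeffs = [0.5 0.5 0.5;
          0.2 0.9 0.4;
          1.0 1.0 1.0];

tol = 1e-9;  % arbitrary tolerance for "no change"

% Range of each row across conditions; a range near zero means the
% metabolite has the same coefficient in every condition.
row_range = max(coeffs, [], 2) - min(coeffs, [], 2);
is_flat = row_range < tol;

fprintf('%d of %d metabolites are constant across conditions\n', ...
        sum(is_flat), numel(is_flat));
```

If most rows come back flat, that points to the normalization step rather than the model itself.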

To improve the scripts, here are some of my suggestions you may want to implement:

  • You should keep the formatting for all of your sections and livescripts consistent - for instance, dfa_tu8902.mlx seems to be more polished than dfa_tu8902_filtered_data.mlx.

  • The latter file name is a bit of a misnomer I feel - that script is filtering data by CV, not running DFA on the filtered data, as I initially thought.

  • The latter has a lot of redundant code. You can turn it into a for loop using the following pseudocode:

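array_avg = table2array(unfiltered_dataset_avg(:, 12:end));
array_std = table2array(unfiltered_dataset_std(:, 12:end));
CV = array_std ./ array_avg;   % element-wise coefficient of variation

cv_values = [...]
filenames = [...]   % assumed to be a string array; use filenames{i} for a cell array
for i = 1:length(cv_values)
    % Rows where every CV is at or above the threshold get removed
    high_cv = all(CV >= cv_values(i), 2);
    filtered_data = unfiltered_dataset_avg(~high_cv, :);

    % NaN filter (assumed intent): drop rows with more than 5 NaN values
    nan_count = sum(isnan(table2array(filtered_data(:, 12:end))), 2);
    nan_filtered_data = filtered_data(nan_count <= 5, :);

    % Write the fully filtered table for this threshold
    writetable(nan_filtered_data, filenames(i), ...);
end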
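The filtering logic can be tried on a toy matrix before applying it to the real tables. The sketch below uses plain numeric arrays instead of tables, and every value, the cutoff `cv_threshold`, and the limit `max_nan` are hypothetical choices for illustration only.

```matlab
% Toy averages and standard deviations: rows = metabolites, columns = samples.
% All values are made up for illustration.
avg = [10  20 30;
        5   5  5;
        8 NaN  4];
sd  = [ 1   2  3;
        4   4  4;
        2 NaN  1];

CV = sd ./ avg;          % element-wise coefficient of variation
cv_threshold = 0.5;      % hypothetical cutoff

% Flag rows whose CV meets or exceeds the cutoff in every sample.
% NaN comparisons evaluate to false, so rows containing NaN are not flagged.
high_cv = all(CV >= cv_threshold, 2);

% Keep the remaining rows by logical indexing instead of deleting in place.
kept = avg(~high_cv, :);

% Count NaNs per kept row and drop rows with more than max_nan of them.
max_nan = 0;             % hypothetical limit
nan_count = sum(isnan(kept), 2);
kept = kept(nan_count <= max_nan, :);

disp(kept);
```

With these toy values, the second row is removed by the CV filter and the third by the NaN filter, leaving only the first row. Indexing with `~high_cv` avoids the `(...) = []` deletion syntax, which cannot be combined with an assignment to a new variable.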

@ScottCampit ScottCampit added documentation Improvements or additions to documentation enhancement New feature or request labels Aug 5, 2020