More detail about the dataset #21
-
I would like some more detail about the dataset to help me understand it. There are two data types, "cna" and "mut". Where do they come from: WGS or WES? How can we obtain them, and what do the numbers mean? I noticed that the numbers may have been preprocessed with a log transform. Do the numbers represent gene expression in the patient, or how many times the gene is mutated in the genome? Also, I do not fully understand the concept of "cna". I found this definition: "Copy number variation (CNV) refers to the amplification or deletion of gene regions, rather than changes in single bases." However, does PCR during sequencing affect this measurement, given that PCR can amplify the whole genome?
-
The notebook for the homework dataset contains a link to the resource (cBioPortal: https://www.cbioportal.org/study/summary?id=lgggbm_tcga_pub). CNA usually means copy number aberration/alteration. If you are curious about how CNAs are computed, you can check out tools that compute copy number variations. Usually, to avoid PCR artifacts, you need a matched normal sample as a reference, so that aberrations in the copy number of a gene or genomic segment can be quantified with a certain confidence.
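To make the matched-normal idea concrete, here is a minimal sketch of a log2 copy ratio computed from read depths. It is only an illustration under simplifying assumptions, not how the values in this dataset were produced: the function name `log2_copy_ratio`, the `pseudocount` parameter, and the per-gene depths are all hypothetical. The point is that a global effect such as overall amplification or sequencing depth is divided out, so only the relative change against the normal sample remains.

```python
import numpy as np


def log2_copy_ratio(tumor_depth, normal_depth, pseudocount=1.0):
    """Log2 ratio of tumor to matched-normal read depth per gene (hypothetical helper)."""
    # The pseudocount avoids division by zero and log of zero for uncovered genes.
    tumor = np.asarray(tumor_depth, dtype=float) + pseudocount
    normal = np.asarray(normal_depth, dtype=float) + pseudocount
    # Scale each sample by its own median depth so that global differences
    # (library size, overall amplification) cancel out.
    tumor = tumor / np.median(tumor)
    normal = normal / np.median(normal)
    return np.log2(tumor / normal)


# Made-up read depths for five genes in a tumor and its matched normal sample.
tumor_depth = [200, 100, 50, 100, 100]
normal_depth = [100, 100, 100, 100, 100]

ratios = log2_copy_ratio(tumor_depth, normal_depth, pseudocount=0.0)
print(ratios)  # positive = gain, negative = loss, zero = no change
```

In this toy input the first gene has twice the normal depth and the third has half of it, so they come out as a gain and a loss respectively, while the other genes stay at zero.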
Transform is a method that converts a dataset to the embedding space. Basically, the function runs a dataset through a trained model and extracts the sample embedding layer. See here: https://github.com/BIMSBbioinfo/flexynesis/blob/ac31a9280f256e6c9b650e0e18ada43efa77f500/flexynesis/models/direct_pred.py#L292
The model predictions for the target variables are obtained using the `model.predict` method.
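The difference between the two methods can be sketched with a toy model. This is not the flexynesis implementation: `ToyModel`, its random weights, and the array inputs are hypothetical stand-ins, and the real methods work on the package's own dataset objects, so check the linked source for the actual signatures. The sketch only shows that transform stops at the embedding layer, while predict continues through an output head to the target variable.

```python
import numpy as np


class ToyModel:
    """Hypothetical stand-in for a trained model (not the flexynesis code)."""

    def __init__(self, n_features, n_latent, seed=0):
        rng = np.random.default_rng(seed)
        # Random weights play the role of trained parameters in this sketch.
        self.encoder_w = rng.normal(size=(n_features, n_latent))
        self.head_w = rng.normal(size=(n_latent, 1))

    def transform(self, X):
        # Run the samples through the encoder and return the embedding layer.
        return np.tanh(np.asarray(X, dtype=float) @ self.encoder_w)

    def predict(self, X):
        # Feed the embeddings through the output head to get one value per sample.
        return (self.transform(X) @ self.head_w).ravel()


# Five made-up samples with twenty features each.
X = np.random.default_rng(1).normal(size=(5, 20))
model = ToyModel(n_features=20, n_latent=3)

embeddings = model.transform(X)   # one 3-dimensional embedding per sample
predictions = model.predict(X)    # one predicted target value per sample
print(embeddings.shape, predictions.shape)
```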