More detail about the dataset #21

Dewey-Wang · 2025-03-11T18:13:05Z

I would like some more detail about the dataset to help me understand it better.

There are two data types, "cna" and "mut". I would like to know where they come from: are they derived from WGS or WES? Also, how are they obtained, and what do the numbers mean? I noticed that the numbers may have been preprocessed with a log transform. Do the numbers represent gene expression in the patient, or how many times the gene is mutated in a genome?

Also, I do not really understand the concept of "cna". I found this: "Copy number variation (CNV) refers to the amplification or deletion of gene regions, rather than changes in single bases." However, does PCR during sequencing affect this measurement, given that PCR can amplify the whole genome?

borauyar · 2025-03-11T18:49:45Z

borauyar
Mar 11, 2025
Maintainer

The notebook for the dataset used in the homework contains a link to the resource (cBioPortal: https://www.cbioportal.org/study/summary?id=lgggbm_tcga_pub).
There you can explore the dataset further. You can also find a link to the study (https://pubmed.ncbi.nlm.nih.gov/26824661/), where you should be able to find specific details on how the data was collected and generated.

CNA usually means copy number aberration/alteration. If you are curious about how CNAs are computed you can check out tools that compute copy number variations. Usually to avoid PCR artifacts, you will need a matched normal sample as reference, so that aberrations in the copy number of a gene or a genomic segment can be quantified with a certain confidence.
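To make the role of the matched normal concrete, here is a minimal sketch of one common way to quantify a copy number change: the log2 ratio of tumor to normal read depth for a gene or segment. The gene names, depth values, and the `log2_copy_ratio` helper are hypothetical and only for illustration; this is not necessarily how the values in the homework dataset were produced, so check the study for the actual pipeline.

```python
import math

def log2_copy_ratio(tumor_depth, normal_depth, pseudocount=1.0):
    """Log2 ratio of tumor to matched-normal read depth for one segment.

    The pseudocount avoids division by zero for segments with no reads.
    """
    return math.log2((tumor_depth + pseudocount) / (normal_depth + pseudocount))

# Hypothetical, already depth-normalized coverage per gene: (tumor, normal).
segments = {
    "GENE_A": (199.0, 99.0),  # about twice the normal coverage
    "GENE_B": (99.0, 99.0),   # same coverage as the normal
    "GENE_C": (49.0, 99.0),   # about half the normal coverage
}

ratios = {gene: log2_copy_ratio(t, n) for gene, (t, n) in segments.items()}
for gene, ratio in ratios.items():
    print(gene, ratio)
```

A value near 0 means the tumor has roughly the same coverage as the normal, positive values suggest a gain, and negative values suggest a loss. Because both samples go through similar library preparation, biases that affect them in the same way tend to cancel in the ratio.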

2 replies

Dewey-Wang Mar 11, 2025
Author

Thank you! I will check it!

I have another question about the code:

train_embeddings = model.transform(train_dataset)
flexynesis.plot_dim_reduced(train_embeddings, train_dataset.ann['STUDY'])

What does transform mean here? I thought the model would only do prediction and would only give us the probability of a sample being "LGG" or "GBM", because we set target_variables=['STUDY'].

borauyar Mar 11, 2025
Maintainer

transform is a method that converts a dataset to the embedding space. Basically, the function runs a dataset through a trained model and extracts the sample embeddings layer. See here: https://github.com/BIMSBbioinfo/flexynesis/blob/ac31a9280f256e6c9b650e0e18ada43efa77f500/flexynesis/models/direct_pred.py#L292

The model predictions for the target variables are obtained using the model.predict method.
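The difference between the two methods can be sketched with a toy model. The `ToyClassifier` class below, along with its weights and inputs, is a hypothetical stand-in written in plain Python. It is not the flexynesis implementation and only mirrors the idea that transform returns the intermediate embeddings while predict returns the output for the target variable.

```python
import math

def dot(u, v):
    """Dot product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))

class ToyClassifier:
    """Hypothetical encoder + classification head (not flexynesis code)."""

    def __init__(self, encoder_weights, head_weights, labels):
        self.encoder_weights = encoder_weights  # embedding_dim x n_features
        self.head_weights = head_weights        # n_classes x embedding_dim
        self.labels = labels

    def transform(self, samples):
        # Stop at the embedding layer: one low-dimensional vector per sample.
        return [[math.tanh(dot(row, x)) for row in self.encoder_weights]
                for x in samples]

    def predict(self, samples):
        # Continue from the embeddings through the head to a class label.
        predictions = []
        for z in self.transform(samples):
            scores = [dot(row, z) for row in self.head_weights]
            predictions.append(self.labels[scores.index(max(scores))])
        return predictions

model = ToyClassifier(
    encoder_weights=[[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]],
    head_weights=[[1.0, -1.0], [-1.0, 1.0]],
    labels=["LGG", "GBM"],
)
samples = [[2.0, -1.0, 0.5], [-1.0, 2.0, 0.3]]  # two samples, three features

embeddings = model.transform(samples)   # what you would plot after dimension reduction
predictions = model.predict(samples)    # what you would compare against the true labels
print(embeddings)
print(predictions)
```

In this sketch the embeddings are two numbers per sample, which is the kind of representation a plot such as plot_dim_reduced visualizes, while the predictions are the class labels for the target variable.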

Answer selected by Dewey-Wang
