GDCquery_clinic not working for TCGA projects #639
Comments
I have the same problem |
Hi, Thank you for the bug report. It seems there were changes in the API/data retrieved that broke the code. If anyone finds other issues, please let me know. I will need to rewrite the code due to those changes.
|
Hi, The function is working, and I can access the clinical data. For example, I retrieved the clinical data for the BRCA cohort, but there are too many NA values in days_to_last_follow_up. I have worked with this cohort before, and it seems like there weren’t this many NA values. Could there be an error here? |
It seems to be an issue on the API side. I have null for days_to_last_follow_up for all the TCGA-GBM samples.
[image: Screenshot 2025-01-31 at 2.54.32 PM.png]
I will send an email to GDC regarding this issue.
|
I really hope this issue gets resolved soon because I don’t want my project to be delayed. Do you know of any other way to access this data? I have written many of my functions based on this output. Also, can we manually access this data from the GDC portal in the same way? @tiagochst |
I just got some answers from GDC. I still need to check which changes need to be made.
1. The GDC just deployed our latest data release, in which almost all of the TCGA clinical data was indexed in the API instead of being largely available in the supplemental files.
2. I think the days_to_last_followup field has been moved to the follow_up node as "days_to_followup". Add "follow_up" to your expand fields and you should see it. I think you will need to add an extra step to determine which follow_up has the largest value.
This data should be the same one downloaded in this TSV file if you want to download manually.
[image: Screenshot 2025-01-31 at 5.03.07 PM.png]
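The "extra step" mentioned in point 2 could look roughly like the sketch below. It assumes the follow-up records have already been retrieved into a data frame with one row per follow-up event. The data frame `follow_ups`, the column names `submitter_id` and `days_to_follow_up`, and all values are made up for illustration; the field names actually returned by the API may differ.

```r
# Toy follow-up table: one row per follow-up event (hypothetical values).
follow_ups <- data.frame(
  submitter_id      = c("case-A", "case-A", "case-B", "case-C", "case-C", "case-D"),
  days_to_follow_up = c(120, 450, 300, NA, 90, NA)
)

# Largest value per case. NA events are ignored; a case whose events are
# all NA stays NA instead of turning into -Inf.
max_or_na <- function(x) if (all(is.na(x))) NA_real_ else max(x, na.rm = TRUE)

last <- tapply(follow_ups$days_to_follow_up, follow_ups$submitter_id, max_or_na)

last_follow_up <- data.frame(
  submitter_id           = names(last),
  days_to_last_follow_up = as.numeric(last)
)
print(last_follow_up)
```

The result has one row per case, so it can be merged back into a per-patient clinical table by the case identifier.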
|
@tiagochst Thank you so much for the prompt response and the effort in resolving the issue! I think I may need to download the data manually for now, but for some reason I can't see this screenshot image from your previous post. Any chance you can upload it again? Thanks!
|
Hi, I'm facing the same issue. |
Hello all, I just found the columns paper_Follow.up.days and paper_Days.to.death in the colData of the SummarizedExperiment object returned by GDCprepare (together with many other clinical data attributes).
Can someone confirm whether these are equivalent to the clinical data we used to pull separately with GDCquery_clinic? Unfortunately I did not save the data frames I pulled earlier, so I cannot check myself. Thanks, |
Hi Ramya,
Every column with the prefix paper_ in the metadata is pulled from the supplemental files of the TCGA analysis groups' articles.
If the clinical data continues to be updated, it will differ from what was published years ago.
…On Mon, Feb 3, 2025 at 11:49 AM Ramya Purkanti ***@***.***> wrote:
Hello all,
I just found the columns *paper_Follow.up.days* and *paper_Days.to.death* in the colData of the SummarizedExperiment object returned by GDCprepare (together with many other clinical data attributes).
```r
library(TCGAbiolinks)
library(SummarizedExperiment)  # provides colData()

query_paad_all <- GDCquery(
  project = "TCGA-PAAD",
  data.category = "Transcriptome Profiling",
  experimental.strategy = "RNA-Seq",
  workflow.type = "STAR - Counts",
  data.type = "Gene Expression Quantification",
  sample.type = "Primary Tumor",
  access = "open"
)

# TCGAbiolinks_dir must point to the directory the files were downloaded to.
tcga_paad_data <- GDCprepare(query_paad_all, summarizedExperiment = TRUE,
                             directory = TCGAbiolinks_dir)

# Sample-level metadata, including the paper_ columns.
tcga_paad_coldata <- as.data.frame(colData(tcga_paad_data))
tcga_paad_coldata$paper_Follow.up.days
tcga_paad_coldata$paper_Days.to.death
```
Can someone confirm whether these are equivalent to the clinical data we used to pull separately with GDCquery_clinic? Unfortunately I did not save the data frames I pulled earlier, so I cannot check myself.
Thanks,
Ramya
|
Hi @tiagochst, I faced the same issue with the TCGA-DLBC project: all "days_to_last_follow_up" values are NA. Thanks! |
An update: I pulled out the follow_ups node details, but follow_ups.days_to_follow_up does not contain a single integer per case; it holds one value for each follow-up event. I could not find a ready-made days_to_last_follow_up column anywhere (the one within diagnoses is all NA). I have not yet checked whether the maximum of all values in the follow_ups.days_to_follow_up column matches the value we get by downloading manually. In case someone wants to check, the commands to pull out the clinical data are:
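The commands referred to above are not reproduced here. Separately, the check described in this comment (does the per-case maximum match the manually downloaded value?) could be sketched as follows, assuming both sources are already loaded as data frames. The names `api_events`, `manual`, and `case_id`, and all numbers, are hypothetical placeholders rather than actual GDC output.

```r
# Per-event values as they might come from the follow_ups node (toy data).
api_events <- data.frame(
  case_id           = c("c1", "c1", "c2", "c3"),
  days_to_follow_up = c(100, 380, 210, 55)
)

# Per-case values as they might appear in a manually downloaded table (toy data).
manual <- data.frame(
  case_id                = c("c1", "c2", "c3"),
  days_to_last_follow_up = c(380, 210, 60)
)

# Collapse the events to one maximum per case, then line both sources up.
api_max <- aggregate(days_to_follow_up ~ case_id, data = api_events, FUN = max)
cmp <- merge(api_max, manual, by = "case_id", all = TRUE)

# A case counts as a mismatch if the values differ or one side is missing.
cmp$matches <- cmp$days_to_follow_up == cmp$days_to_last_follow_up
mismatches <- cmp[is.na(cmp$matches) | !cmp$matches, ]
print(mismatches)
```

An empty `mismatches` table would suggest that taking the maximum reproduces the manual values for those cases; any remaining rows are the cases worth inspecting by hand.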
|
Hello, Is there any plan to reintegrate the field
Thanks. |
Have you solved this problem? I ran into the same problem. |
Dear developers, the issue with empty
```r
clin <- GDCquery_clinic("TCGA-LUAD", "clinical")
table(clin$days_to_last_follow_up, useNA="ifany")
```
```
<NA>
 585
```
Though the majority of patients are alive:
```r
table(clin$vital_status, useNA="ifany")
```
```
Alive  Dead  <NA>
  334   188    63
```
I am using the 'dev' version of TCGAbiolinks (2.35.3). Any plans for a fix? Thanks! |
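To see at a glance how many living patients are affected, the two tables above can be combined into one cross-tabulation. This is only a sketch on a toy data frame: `clin_toy` and its values are invented, and the column names are assumed to match those in the `GDCquery_clinic` output shown above.

```r
# Toy stand-in for the clinical table (hypothetical values).
clin_toy <- data.frame(
  vital_status           = c("Alive", "Alive", "Dead", "Alive", NA),
  days_to_last_follow_up = c(NA, 730, NA, NA, NA)
)

# Rows: vital status; columns: whether the follow-up value is missing.
missing_by_status <- table(
  vital_status      = clin_toy$vital_status,
  follow_up_missing = is.na(clin_toy$days_to_last_follow_up),
  useNA = "ifany"
)
print(missing_by_status)

# Number of patients recorded as alive but without a follow-up time.
alive <- !is.na(clin_toy$vital_status) & clin_toy$vital_status == "Alive"
n_alive_missing <- sum(alive & is.na(clin_toy$days_to_last_follow_up))
print(n_alive_missing)
```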
Is there an update on this issue? I have tried to download the TCGA-COAD data today (April 7) with the developer version (version 2.35.3, installed from github) of TCGAbiolinks and the 'days_to_last_follow_up' column is still absent from the clinical data.
|
Thanks @tiagochst for the fix in v2.35.4! Days to follow-up are back. |
I am trying to carry out differential gene expression analysis between drug-sensitive and drug-resistant patient samples. To download the clinical data, I ran the following code to update TCGAbiolinks. |
@Rutwik-Garge try installing from the master branch. It seems the 'devel' branch is behind master by a few commits. |
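A small sketch of how one might check whether the installed version already contains the fix before reinstalling. The version number 2.35.4 comes from the comment above; the helper `needs_update` is a made-up name, and the repository path in the commented-out install call is an assumption that does not appear in this thread, so verify it before running.

```r
# Version reported in this thread as containing the fix.
fixed_in <- package_version("2.35.4")

# TRUE if the given version string is older than the fixed version.
needs_update <- function(installed) package_version(installed) < fixed_in

print(needs_update("2.35.3"))
print(needs_update("2.35.4"))

# With TCGAbiolinks installed, the check and reinstall could look like this
# (left commented out because it needs network access and the remotes package;
# the repository path is an assumption):
# if (needs_update(as.character(packageVersion("TCGAbiolinks")))) {
#   remotes::install_github("BioinformaticsFMRP/TCGAbiolinks", ref = "master")
# }
```

Restart the R session after reinstalling so that the new version is the one actually loaded.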
I am still getting NA in those columns of the data. |
I tried to download clinical data from TCGA projects but it returned an error message (please see below). It happened for various different TCGA tumor types.
Any chance you could look into this issue? Thanks.
Version: 2.32.0