ValueError: The output generated by func have different column names than the ones provided by get_feature_names_out. Got output with columns names: ['x0', 'x1', 'x2', 'x3', 'x4'] #43

@emmanuel-contreras

Hello, I am running into the error below when trying to run the TPM example provided in the docstrings.

ValueError: The output generated by `func` have different column names than the ones provided by `get_feature_names_out`. 
Got output with columns names: ['x0', 'x1', 'x2', 'x3', 'x4'] and 
`get_feature_names_out` returned: ['Gene_1', 'Gene_2', 'Gene_3', 'Gene_4', 'Gene_5'].
 The column names can be overridden by setting `set_output(transform='pandas')` or 
`set_output(transform='polars')` such that the column names are set to the names provided by `get_feature_names_out`.

This is the code I am running, taken from the example:

from rnanorm.datasets import load_toy_data
from rnanorm import TPM
dataset = load_toy_data()
dataset.exp
#          Gene_1  Gene_2  Gene_3  Gene_4  Gene_5
#Sample_1     200     300     500    2000    7000
#Sample_2     400     600    1000    4000   14000
#Sample_3     200     300     500    2000   17000
#Sample_4     200     300     500    2000    2000
tpm = TPM(gtf=dataset.gtf_path).set_output(transform="pandas")
tpm.fit_transform(dataset.exp)

I also tried running the example code from issue #20, which produces the same error message:

from rnanorm import TPM
import pandas as pd
df = pd.DataFrame([[200, 400, 400], [300, 300, 800]], index=["Sample1", "Sample2"], columns=["Gene1", "Gene2", "Gene3"])
gene_lengths = pd.Series([100, 100, 200], index=["Gene1", "Gene2", "Gene3"])
df
#          Gene1  Gene2  Gene3
# Sample1    200    400    400
# Sample2    300    300    800

# In [6]: gene_lengths
# Gene1    100
# Gene2    100
# Gene3    200
# dtype: int64

TPM(gene_lengths=gene_lengths).set_output(transform="pandas").fit_transform(df)
# Out[7]:
#             Gene1     Gene2     Gene3
# Sample1  250000.0  500000.0  250000.0
# Sample2  300000.0  300000.0  400000.0
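
As a cross-check that does not touch rnanorm, pandas, or scikit-learn, the values expected from the issue #20 example can be reproduced with the standard TPM definition (count divided by gene length, then scaled so each sample sums to one million). This is a minimal sketch in plain Python; `tpm_by_hand` is a hypothetical helper name, not part of rnanorm, and it is not a fix for the error.

```python
def tpm_by_hand(counts, lengths):
    """Compute TPM for a single sample (hypothetical helper, not rnanorm API).

    counts:  dict mapping gene name -> raw count
    lengths: dict mapping gene name -> gene length (same unit for every gene)
    """
    # Length-normalize each gene's count.
    rate = {gene: counts[gene] / lengths[gene] for gene in counts}
    total = sum(rate.values())
    # Rescale so that the values of this sample sum to one million.
    return {gene: value * 1e6 / total for gene, value in rate.items()}


# Same inputs as the issue #20 example above.
gene_lengths = {"Gene1": 100, "Gene2": 100, "Gene3": 200}
samples = {
    "Sample1": {"Gene1": 200, "Gene2": 400, "Gene3": 400},
    "Sample2": {"Gene1": 300, "Gene2": 300, "Gene3": 800},
}

expected = {name: tpm_by_hand(counts, gene_lengths) for name, counts in samples.items()}
for name, values in expected.items():
    print(name, values)
```

With these inputs the result should agree with the values listed under `# Out[7]` above. The error message itself is about column names, not about the computed values.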

The error happens when running tpm.fit_transform(dataset.exp). This is in a new conda environment with Python 3.13, pandas 2.3.2, rnanorm 2.2.0, and scikit-learn 1.7.2. As you can see, the error occurs even with set_output(transform="pandas").
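
For anyone trying to reproduce this, the installed versions can be printed with the standard library alone. This is a small sketch; `report_versions` is a hypothetical helper name, and the distribution names passed to it are assumed to be the ones used at install time.

```python
import platform
from importlib.metadata import PackageNotFoundError, version


def report_versions(packages=("pandas", "rnanorm", "scikit-learn")):
    """Return a dict of installed versions (hypothetical helper for bug reports)."""
    found = {"python": platform.python_version()}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            # The distribution is not installed in the active environment.
            found[name] = "not installed"
    return found


for name, ver in report_versions().items():
    print(f"{name}: {ver}")
```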

Thank you
