
Add option to ignore keys/columns when loading a dataset from jsonl (or any other data format) #7594

Open
@avishaiElmakies


Feature request

Hi, I would like the option to ignore keys/columns when loading a dataset from files (e.g. jsonl).

Motivation

I am working with a dataset stored as jsonl. The data is unclean: one column has a different type in different rows. I can't clean the data or remove the column myself (it is not my data, and it is too big for me to clean and save on my own hardware).
I would like the option to simply ignore this column when using load_dataset, since I don't need it.
I looked for an existing way to do this but couldn't find one. If there is one, I would love some help. If it is not currently possible, I would love to see this feature added.
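
To illustrate the behavior I have in mind, here is a minimal sketch in plain Python. It is not the datasets API: the helper name `iter_jsonl_without`, the `ignore_keys` parameter, and the sample rows are all made up for illustration. It streams jsonl lines and drops the unwanted keys from each row before anything tries to infer a type for them.

```python
import json
from typing import Any, Dict, Iterable, Iterator


def iter_jsonl_without(
    lines: Iterable[str], ignore_keys: Iterable[str]
) -> Iterator[Dict[str, Any]]:
    """Yield one dict per non-empty jsonl line, with the ignored keys dropped.

    Hypothetical helper, for illustration only.
    """
    ignored = set(ignore_keys)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        row = json.loads(line)
        yield {key: value for key, value in row.items() if key not in ignored}


# Made-up sample: the "meta" column has a different type in each row.
sample = [
    '{"text": "hello", "meta": 1}',
    '{"text": "world", "meta": {"source": "web"}}',
    '{"text": "again", "meta": "n/a"}',
]

rows = list(iter_jsonl_without(sample, ignore_keys=["meta"]))
print(rows)
```

An open file handle can be passed as `lines`, so rows are handled one at a time without loading or rewriting the whole file. A generator like this could probably be wrapped and passed to `Dataset.from_generator` as a workaround, though I have not verified that on my data, and a built-in option in load_dataset would be much more convenient.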

Your contribution

I don't think I can help this time, unfortunately.

Labels: enhancement
