Feature Requests from LoG Poster Session #406

@xingjian-zhang

During the LoG poster session, I collected these relevant feature requests:

Explicit Data Versioning

Keeping the dataset version stable throughout a research project is important. Currently, our dataset versioning relies implicitly on the git log of urls.json. An explicit dataset version would be more useful: we could append new versions of a dataset to urls.json and let users specify the version when loading data.
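
To make the idea concrete, here is a minimal sketch of how a versioned urls.json could be resolved at load time. The JSON layout (a `latest` field plus a `versions` map), the example URLs, and the `resolve_url` helper are all hypothetical illustrations, not the current format or API.

```python
import json

# Hypothetical urls.json layout: each dataset lists its versions explicitly.
# The real urls.json may be structured differently.
URLS_JSON = """
{
  "example_dataset": {
    "latest": "1.1.0",
    "versions": {
      "1.0.0": "https://example.org/example_dataset/1.0.0/data.zip",
      "1.1.0": "https://example.org/example_dataset/1.1.0/data.zip"
    }
  }
}
"""


def resolve_url(urls, name, version=None):
    """Return (version, url) for a dataset; fall back to the entry marked latest."""
    entry = urls[name]
    chosen = version if version is not None else entry["latest"]
    if chosen not in entry["versions"]:
        available = ", ".join(sorted(entry["versions"]))
        raise ValueError(
            f"Unknown version {chosen!r} for {name!r}; available: {available}"
        )
    return chosen, entry["versions"][chosen]


urls = json.loads(URLS_JSON)
pinned = resolve_url(urls, "example_dataset", version="1.0.0")  # pinned by the user
default = resolve_url(urls, "example_dataset")                  # no version given
```

With a scheme like this, a project that pins `version="1.0.0"` keeps getting the same files even after newer versions are appended.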

Dataset-specific Preprocessing (Feature Extractor)

Some datasets (e.g. molecular) are small before preprocessing (~50 MB) but can expand massively after featurization (~2 GB). We might want to add a (user-defined) step (an abstraction layer) between downloading and data loading that preprocesses the dataset locally. Perhaps we should declare this step in metadata.json and cache the processed dataset after the first data load.
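
A rough sketch of the caching part, assuming the preprocessing step is a plain Python callable supplied by the user. The `load_with_cache` helper, the `tag` parameter, and the use of pickle as the cache format are assumptions made for illustration; the actual design might store tensors or use another format.

```python
import hashlib
import pickle
import tempfile
from pathlib import Path


def load_with_cache(raw_path, preprocess, cache_dir, tag="v1"):
    """Run `preprocess` on the raw file once, then reuse the cached result."""
    raw_path = Path(raw_path)
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)
    # Key the cache on the raw bytes plus a user-chosen tag, so a new raw file
    # or a new featurizer tag produces a fresh cache entry.
    digest = hashlib.sha256(raw_path.read_bytes() + tag.encode()).hexdigest()[:16]
    cache_file = cache_dir / f"{raw_path.stem}-{digest}.pkl"
    if cache_file.exists():
        with cache_file.open("rb") as f:
            return pickle.load(f)
    processed = preprocess(raw_path)
    with cache_file.open("wb") as f:
        pickle.dump(processed, f)
    return processed


# Toy featurizer standing in for an expensive molecular featurization.
calls = []


def featurize(path):
    calls.append(path.name)
    return [len(line) for line in path.read_text().splitlines()]


with tempfile.TemporaryDirectory() as tmp:
    raw = Path(tmp) / "molecules.txt"
    raw.write_text("CCO\nc1ccccc1\n")
    first = load_with_cache(raw, featurize, Path(tmp) / "cache")   # runs featurize
    second = load_with_cache(raw, featurize, Path(tmp) / "cache")  # served from cache
```

Only the small raw file needs to be downloaded; the large featurized form is built locally on the first load and read back from disk afterwards.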

Support of PyG

Many Graph ML projects use PyG for development (possibly because PyG has been around longer). As a next step, we plan to implement a PyG data-loading pipeline.

We thank Ladislav, Remy, Song, Semih, and everyone else who gave us valuable feedback.

Labels: enhancement (New feature or request)
