Feature Requests from LoG Poster Session

During the LoG poster session, I collected these relevant feature requests:

### Explicit Data Versioning
It is important to keep the dataset version stable for the duration of a research project. Currently, our dataset versioning relies implicitly on the git log of `urls.json`. An explicit dataset version would be more useful: we could append each new version of a dataset to `urls.json` and let users specify the version at data-loading time.
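As a minimal sketch of how version selection could work, the snippet below resolves a requested version against a versioned `urls.json`. The `latest` and `versions` keys, the example URLs, and the `resolve_urls` helper are all hypothetical and only illustrate the idea; the actual schema is still to be decided.

```python
import json

# Hypothetical versioned layout for urls.json; the real schema may differ.
EXAMPLE_URLS_JSON = """
{
  "latest": "1.1",
  "versions": {
    "1.0": {"graph.npz": "https://example.org/v1.0/graph.npz"},
    "1.1": {"graph.npz": "https://example.org/v1.1/graph.npz"}
  }
}
"""


def resolve_urls(urls_config, version=None):
    """Return (version, file-to-URL mapping) for the requested dataset version.

    Falls back to the entry named by "latest" when no version is given.
    """
    versions = urls_config["versions"]
    chosen = urls_config["latest"] if version is None else version
    if chosen not in versions:
        available = ", ".join(sorted(versions))
        raise KeyError(f"Unknown dataset version {chosen!r}; available: {available}")
    return chosen, versions[chosen]


config = json.loads(EXAMPLE_URLS_JSON)
pinned_version, pinned_urls = resolve_urls(config, version="1.0")
default_version, default_urls = resolve_urls(config)
```

Pinning `version="1.0"` in a project keeps the data stable even after newer versions are appended, while omitting it follows whatever `latest` points to.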

### Dataset-specific Preprocessing (Feature Extractor)
Some datasets (e.g. molecular ones) are small before preprocessing (~50 MB) but can expand massively after featurization (~2 GB). We might want to add a user-defined step (an abstraction layer) between downloading and data loading that preprocesses the dataset locally. Perhaps we should declare this step in `metadata.json` and cache the processed dataset after the first data load.
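One possible shape for such a step is sketched below: a user-supplied function runs once on the downloaded file, and its result is cached on disk for later loads. The `load_processed` helper, the pickle-based cache, and the toy featurizer are assumptions made for illustration, not an existing API of this project.

```python
import pickle
import tempfile
from pathlib import Path


def load_processed(raw_path, cache_dir, preprocess):
    """Return the preprocessed dataset, running `preprocess` only on a cache miss."""
    raw_path = Path(raw_path)
    cache_path = Path(cache_dir) / (raw_path.stem + ".processed.pkl")
    if cache_path.exists():
        # Cache hit: skip the (potentially expensive) featurization.
        with cache_path.open("rb") as f:
            return pickle.load(f)
    processed = preprocess(raw_path)
    cache_path.parent.mkdir(parents=True, exist_ok=True)
    with cache_path.open("wb") as f:
        pickle.dump(processed, f)
    return processed


calls = []  # records how often the featurizer actually runs


def toy_featurizer(path):
    """Stand-in for a real featurizer: maps each line to its length."""
    calls.append(path)
    return [len(line.strip()) for line in path.read_text().splitlines()]


with tempfile.TemporaryDirectory() as tmp:
    raw = Path(tmp) / "molecules.txt"
    raw.write_text("CCO\nC1=CC=CC=C1\n")
    cache = Path(tmp) / "cache"
    first = load_processed(raw, cache, toy_featurizer)   # runs the featurizer
    second = load_processed(raw, cache, toy_featurizer)  # served from the cache
```

This sketch does not invalidate the cache when the raw file or the preprocessing function changes; a real implementation would likely key the cache on the dataset version and a hash of the preprocessing configuration.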

### Support of PyG
Many graph ML projects use PyG for development (possibly because PyG has been around longer). We plan to implement a PyG data-loading pipeline as our next step.

We thank Ladislav, Remy, Song, Semih, and everyone else who gave us valuable feedback.

Feature Requests from LoG Poster Session #406
