
Problem with downstream task data #2

@PosoSAgapo


The project works perfectly in the BERT setting, but unfortunately it does not work with many other transformer models.

As far as I can tell, the downstream task data is provided in an already-processed format that is specific to BERT. This prevents using other transformer models whose tokenization methods differ from BERT's.

pattern_extraction.py seems to work only for generating the TacoLM pre-training data. It cannot process the downstream task data, and the downstream data in the data directory is provided in a processed format, which means it can only be used with BERT. For example, the augmented MC-TACO dataset contains tokens like [unused7] that never appear in pattern_extraction.py, so I suspect the downstream tasks were prepared with a different extraction script.
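As a possible stopgap (a rough sketch, not the authors' pipeline), one could turn the BERT-tokenized data back into plain text and then re-tokenize it with another model's tokenizer. The sketch assumes the released files can be read as lists of WordPiece tokens; the function `detokenize_wordpieces` and the marker strings in `MARKER_MAP` are hypothetical, since I don't know what `[unused7]` and similar tokens actually stand for.

```python
import re

# Hypothetical mapping: the meaning of each [unusedN] marker in the released
# data is unknown, so these replacement strings are placeholders only.
MARKER_MAP = {
    "[unused7]": "<marker7>",
}

# BERT structural tokens that carry no text of their own.
BERT_SPECIAL = {"[CLS]", "[SEP]", "[PAD]"}

UNUSED_RE = re.compile(r"\[unused\d+\]")


def detokenize_wordpieces(tokens, marker_map=MARKER_MAP):
    """Rebuild whitespace-separated text from BERT WordPiece tokens.

    - drops [CLS], [SEP] and [PAD]
    - merges '##' continuation pieces into the preceding word
    - rewrites [unusedN] markers via marker_map (unmapped markers are kept as-is)
    """
    words = []
    for tok in tokens:
        if tok in BERT_SPECIAL:
            continue
        if tok.startswith("##") and words:
            # Continuation piece: glue it onto the previous word.
            words[-1] += tok[2:]
        elif UNUSED_RE.fullmatch(tok):
            words.append(marker_map.get(tok, tok))
        else:
            words.append(tok)
    return " ".join(words)


if __name__ == "__main__":
    example = ["[CLS]", "the", "meeting", "[unused7]", "last", "##ed",
               "two", "hours", "[SEP]"]
    print(detokenize_wordpieces(example))  # the meeting <marker7> lasted two hours
```

The recovered text is lossy if the data came from an uncased BERT vocabulary (casing and accents are gone), and the marker strings would still need to be registered with the target tokenizer, e.g. via Hugging Face's `add_special_tokens({"additional_special_tokens": [...]})` followed by `model.resize_token_embeddings(len(tokenizer))`. Having the original data and processing code would avoid both problems.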

This is a dead end for reproducing these results with other transformer models. Is there a way to convert the downstream task data into a format suitable for transformer models other than BERT? Or are there plans to release the downstream task processing code and the original data? It would be a pity if this code only worked with BERT :)
