Problem about downstream task data.

The project works perfectly in the BERT setting, but unfortunately it does not work for many other transformer models.
  
As far as I can tell, the downstream task data is provided only in a **processed** format that suits BERT. This limits use with other transformer models whose tokenization differs from BERT's.
  
The **pattern_extraction.py** code seems to work only for generating the **pre-train TacoML data**. It cannot process the downstream task data, and the downstream data in the data directory is provided only in **processed** format, which means it can only be used with BERT. For example, in the augmented MC-TACO dataset, tokens like [unused7] do not appear in **pattern_extraction.py**, so I guess the downstream tasks used different extraction code.
  
This makes reproducing these results with other transformer models a dead end. Is there any way to process the downstream task data into a format suitable for transformer models other than BERT? Or is there any plan to release the downstream task processing code and the original data? It would be such a pity if this code only worked for BERT :)
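
To make the request concrete, here is a rough sketch of the kind of conversion I have in mind. It is only a guess at a workaround, not the authors' pipeline: it assumes the processed files contain BERT WordPiece tokens (with `##` continuation pieces) and `[unusedN]` placeholders, and the helper names `remap_unused_tokens` and `merge_wordpieces`, as well as the `<marker_N>` naming, are hypothetical.

```python
import re

# Matches BERT-style placeholder tokens such as [unused7].
UNUSED_RE = re.compile(r"\[unused(\d+)\]")


def remap_unused_tokens(text, template="<marker_{}>"):
    """Replace [unusedN] placeholders with generic marker strings.

    Returns the rewritten text and the list of distinct markers found,
    in order of first appearance. The marker template is an assumption.
    """
    found = []

    def _sub(match):
        marker = template.format(match.group(1))
        if marker not in found:
            found.append(marker)
        return marker

    return UNUSED_RE.sub(_sub, text), found


def merge_wordpieces(tokens):
    """Join WordPiece tokens back into whitespace-separated words.

    Pieces starting with '##' are appended to the previous word.
    """
    words = []
    for tok in tokens:
        if tok.startswith("##") and words:
            words[-1] += tok[2:]
        else:
            words.append(tok)
    return " ".join(words)


if __name__ == "__main__":
    text, markers = remap_unused_tokens("he slept [unused7] 8 hours [unused7]")
    print(text)
    print(markers)
    print(merge_wordpieces(["token", "##ization", "works"]))
```

The collected markers could then, presumably, be registered with the target model's tokenizer (for example via Hugging Face's `add_special_tokens` with `additional_special_tokens`, followed by `resize_token_embeddings`), but without the original data and the meaning of each `[unusedN]` token this stays guesswork.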
