Create dataset loader for SEACrowd Instruct Multi-task Collection #723

@SamuelCahyawijaya

Dataloader name: seacrowd_instruct/seacrowd_instruct.py
DataCatalogue: http://seacrowd.github.io/seacrowd-catalogue/card.html?seacrowd_instruct

Dataset seacrowd_instruct
Description The SEACrowd Instruct Multi-task Collection is a multi-task, instruction-formatted compilation of 29 preexisting datasets from SEACrowd. The collection comprises 332,040 question-answer pairs and is intended both to integrate the permissively licensed datasets in SEACrowd into the Data Provenance Explorer tool and to support multi-task instruction fine-tuning.
Subsets Bhinneka Korpus - Translated, CebuaNER - Translated, EMoTES-3K - Translated, Fake News Filipino - Translated, Filipino Hate Speech Tiktok - Text - Translated, Filipino Slang Spelling Normalization - Translated, id-vaccines-tweets - Translated, identifikasi-bahasa - Translated, IndoNER-Tourism - Translated, Indonesia-Chinese-MTRobustEval - Translated, LimeSoda - Translated, MABL - Translated, MKQA - Translated, Multilabel Multiclass Sentiment and Emotion Dataset from Indonesian Mobile Application Review - Translated, Myanmar (Burmese) Name Romanization with Alignment on Grapheme-Level - Translated, NTREX-128 - Translated, SPAMID-PAIR - Translated, Tagalog Profanity Dataset - Translated, Thai Depression Dataset - Translated, Thai Toxicity Tweet Corpus - Translated, Typhoon Yolanda Tweets - Translated, UIT-ViCTSD - Translated, UIT-ViOCD - Translated, Vietnamese Social Media Emotion Corpus (UIT-VSMEC) - Translated, ViHealthQA - Translated, Wisesight Thai Sentiment Corpus - Translated, Wongnai Reviews - Translated, XNLI - Translated, XStoryCloze - Translated
Languages eng, cmn, ind, mya, vie, tha, fil, tgl, khg, hmv, hmf, hnj, lao, zlm, por, tam, yue, khm, jav, abs, ceb, day, xdy, aoz
Tasks Instruction Tuning
License Unknown (unknown)
Homepage https://github.com/Data-Provenance-Initiative/Data-Provenance-Collection/blob/main/data_summaries/Seacrowd.json
HF URL https://huggingface.co/datasets/minnieliang5/seacrowd
Paper URL -
