Dataset Creation using LLMs! #96

@Sepideh-Ahmadian

Description

We’re so happy to have you on board with the LADy project, Calder! We use the issue pages for many purposes, and we especially like using them to record good articles and our findings on every aspect of the project.

We can use this issue page to compile all our findings about LLMs for data generation. A great article to start with is "On LLMs-Driven Synthetic Data Generation, Curation, and Evaluation: A Survey", which you can also find in the team’s article repository.

The key questions we’re exploring are: Which language models perform best at data creation (given the domain and the task at hand), and what are their advantages and disadvantages? As you go through the suggested paper and similar ones, feel free to add and suggest articles both in the Google Doc and here.

Once we've covered the research, we’ll dive into Q1, as mentioned by Hossein in today’s session, where we’ll test the LLMs on our gathered dataset.

If you have any questions, feel free to ask here and mention either me or Hossein!

Metadata

Labels

documentation: Improvements or additions to documentation
experiment
literature-review: Summary of the paper related to the work

