@Akash-Nayak
Contributor

Description of the change

This PR adds documentation providing guidance on selecting the appropriate data config based on the format of the training data.

@github-actions

Thanks for making a pull request! 😃
One of the maintainers will review and advise on the next steps.

@github-actions github-actions bot added the feat label Jun 20, 2025
Signed-off-by: Akash-Nayak <akash19nayak@gmail.com>
Collaborator

@ashokponkumar ashokponkumar left a comment

The indentation seems to be off in most of the YAML files. Can we please check it and fix it?

Comment on lines 26 to 40
- name: dataset_1
  # sampling: 1.0
  data_paths:
    - "tests/artifacts/jsonl/twitter_complaints_small.jsonl"
  # Either the below data_handlers section can be used or the dataset_text_field in the tuning config can be used for specifying the field in the dataset that contains the training text for EPT.
  # In this sample ept_data, "output" field contains the text for training. Please change it according to your data.
  # If your data is already tokenized data, then comment the data handlers section
  data_handlers:
    - name: tokenize
      arguments:
        remove_columns: all
        batched: false
        fn_kwargs:
          text_column_name: "output"
          max_length: 4096
Collaborator

The indentation seems to be off.

Contributor Author

I have corrected the indentation.
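
For reference, indentation slips like the one discussed in this thread can be caught before review with a small check. The sketch below is a rough lint only, not a YAML parser and not part of this PR: the function name `find_indent_issues` and the two-space `step` default are assumptions made for illustration. It flags tabs in leading whitespace and indents that are not a multiple of the chosen step.

```python
def find_indent_issues(text, step=2):
    """Return (line_number, message) pairs for suspicious indentation.

    Hypothetical helper for illustration: it flags tabs in leading
    whitespace and indents that are not a multiple of `step`.
    It does not validate YAML structure.
    """
    issues = []
    for number, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # blank lines carry no indentation information
        leading = line[: len(line) - len(line.lstrip())]
        if "\t" in leading:
            issues.append((number, "tab in indentation"))
        elif len(leading) % step != 0:
            issues.append(
                (number, f"indent of {len(leading)} is not a multiple of {step}")
            )
    return issues


# Sample fragment with a deliberately misaligned list item (3 spaces).
sample = (
    "- name: dataset_1\n"
    "  data_paths:\n"
    '   - "tests/artifacts/jsonl/twitter_complaints_small.jsonl"\n'
)

for number, message in find_indent_issues(sample):
    print(f"line {number}: {message}")
```

Run on the sample, this reports line 3, whose list item is indented by three spaces. A check like this only covers spacing consistency; loading the file with a real YAML parser would still be needed to confirm that the nesting matches what the data config expects.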

Akash-Nayak and others added 2 commits June 30, 2025 15:03
Signed-off-by: Akash-Nayak <akash19nayak@gmail.com>
Collaborator

@dushyantbehl dushyantbehl left a comment

LGTM

@dushyantbehl dushyantbehl enabled auto-merge (squash) November 7, 2025 05:12
3 participants