Add an example using Optuna and Transformers #304

Open · wants to merge 7 commits into main
2 changes: 2 additions & 0 deletions notebooks/en/_toctree.yml
@@ -82,6 +82,8 @@
title: HuatuoGPT-o1 Medical RAG and Reasoning
- local: fine_tune_chatbot_docs_synthetic
title: Documentation Chatbot with Meta Synthetic Data Kit
- local: optuna_hpo_with_transformers
title: Hyperparameter Optimization with Optuna and Transformers



218 changes: 218 additions & 0 deletions notebooks/en/optuna_hpo_with_transformers.ipynb
@@ -0,0 +1,218 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "08092aa8",
"metadata": {},
"source": [
"# Hyperparameter Optimization with Optuna and Transformers\n",
"\n",
"_Authored by: [Parag Ekbote](https://github.com/ParagEkbote)_\n",
"\n",
"In this notebook, we are going to use the [optuna](https://github.com/optuna/optuna) library to perform hyperparameter optimization on a light-weight BERT model using a small subset of the IMDB dataset. To learn more about transformers' hyperparameter search, you can check the following documentation [here](https://huggingface.co/docs/transformers/en/hpo_train).\n",
"\n",
"Firstly, we will install the following dependencies for our code:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "a309e1a0",
"metadata": {},
"outputs": [],
"source": [
"!pip install -q datasets evaluate transformers optuna"
]
},
{
"cell_type": "markdown",
"id": "eff9ccd6",
"metadata": {},
"source": [
"# Prepare the dataset and set the Model\n",
"\n",
"We will load the IMDB dataset which is a standard benchmark for sentiment analysis. We will define 2000 examples for the training split and 1000 examples for validation. Both sets are shuffled with a fixed seed to ensure reproducibility.\n",
"\n",
"We shall also tokenize the text and map to efficiently preprocesses all the dataset samples. Next, we will load the accuracy metric. We will also initialize the model to be instantiated for binary classification. "
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "2cfb9d5e",
"metadata": {},
"outputs": [],
"source": [
"from datasets import load_dataset\n",
"import evaluate\n",
"\n",
"from transformers import AutoModelForSequenceClassification\n",
"from transformers import AutoTokenizer\n",
"from transformers import set_seed\n",
"from transformers import Trainer\n",
"from transformers import TrainingArguments\n",
"\n",
"\n",
"set_seed(42)\n",
"\n",
"\n",
"train_dataset = load_dataset(\"imdb\", split=\"train\").shuffle(seed=42).select(range(2000))\n",
"valid_dataset = load_dataset(\"imdb\", split=\"test\").shuffle(seed=42).select(range(1000))\n",
"\n",
"model_name = \"lvwerra/distilbert-imdb\"\n",
"tokenizer = AutoTokenizer.from_pretrained(model_name)\n",
"\n",
"\n",
"def tokenize(batch):\n",
" return tokenizer(batch[\"text\"], padding=\"max_length\", truncation=True, max_length=512)\n",
"\n",
"\n",
"tokenized_train = train_dataset.map(tokenize, batched=True).select_columns(\n",
" [\"input_ids\", \"attention_mask\", \"label\"]\n",
")\n",
"tokenized_valid = valid_dataset.map(tokenize, batched=True).select_columns(\n",
" [\"input_ids\", \"attention_mask\", \"label\"]\n",
")\n",
"\n",
"\n",
"metric = evaluate.load(\"accuracy\")\n",
"\n",
"\n",
"def model_init():\n",
" return AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)\n"
]
},
{
"cell_type": "markdown",
"id": "5a46fac4",
"metadata": {},
"source": [
"# Set the Metrics and define the Trainer class\n",
"\n",
"Now, we can define the metric function to calculate evaluation metrics after each eval step. We shall also define the objective function to maximize the accuracy when selecting the best hyperparameters.\n",
"\n",
"Finally, we will also define the training arguments for the Trainer that will handle the evaluation, checkpointing, logging and hyperparameter search."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "e0ee0bae",
"metadata": {},
"outputs": [],
"source": [
"def compute_metrics(eval_pred):\n",
" predictions = eval_pred.predictions.argmax(axis=-1)\n",
" labels = eval_pred.label_ids\n",
" return metric.compute(predictions=predictions, references=labels)\n",
"\n",
"\n",
"def compute_objective(metrics):\n",
" return metrics[\"eval_accuracy\"]\n",
"\n",
"\n",
"training_args = TrainingArguments(\n",
" eval_strategy=\"epoch\",\n",
" save_strategy=\"best\",\n",
" load_best_model_at_end=True,\n",
" logging_strategy=\"epoch\",\n",
" report_to=\"none\",\n",
")\n",
"\n",
"\n",
"trainer = Trainer(\n",
" model_init=model_init,\n",
" args=training_args,\n",
" train_dataset=tokenized_train,\n",
" eval_dataset=tokenized_valid,\n",
" processing_class=tokenizer,\n",
" compute_metrics=compute_metrics,\n",
")"
]
},
{
"cell_type": "markdown",
"id": "b10c26c6",
"metadata": {},
"source": [
"# Define the Search Space and Start the Trials\n",
"\n",
"We will now define the optuna hyperparameter search space to find the best set of hyperparameters for learning rate, weight decay and batch size. We can now launch the hyperparameter search by passing the following metrics:\n",
"\n",
"1. direction: We aim to maxime the evaluation metric\n",
"2. backend: We will use optuna for searching\n",
"3. n_trials: The number of trials optuna will be executed \n",
"4. compute_objective: The objective to minimize or maximize from the metrics returned by `evaluate`"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "4ef88312",
"metadata": {},
"outputs": [],
"source": [
"def optuna_hp_space(trial):\n",
" return {\n",
" \"learning_rate\": trial.suggest_float(\"learning_rate\", 1e-6, 1e-4, log=True),\n",
" \"per_device_train_batch_size\": trial.suggest_categorical(\n",
" \"per_device_train_batch_size\", [16, 32, 64, 128]\n",
" ),\n",
" \"weight_decay\": trial.suggest_float(\"weight_decay\", 0.0, 0.3),\n",
" }\n",
"\n",
"\n",
"best_run = trainer.hyperparameter_search(\n",
" direction=\"maximize\",\n",
" backend=\"optuna\",\n",
" hp_space=optuna_hp_space,\n",
" n_trials=20,\n",
" compute_objective=compute_objective,\n",
")\n",
"\n",
"print(best_run)"
]
},
{
"cell_type": "markdown",
"id": "26a95ef3",
"metadata": {},
"source": [
"# Visualize the results\n",
"\n",
"After the completion of the trials, we can visualize the results in a simple manner using the `optuna` study object.\n",
"We can pass the object and plot visualizations that can help to understand the patterns in the trial outcomes. Here, we are plotting the key hyperparameters and how different hyperparameter combinations relate to performance of the model."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "a8f14007",
"metadata": {},
"outputs": [],
"source": [
"import optuna\n",
"import optuna.visualization as vis\n",
"\n",
"study = best_run.study \n",
"\n",
"optuna.visualization.plot_param_importances(study).show()\n",
"\n",
"vis.plot_parallel_coordinate(study)\n"
]
}
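,
{
"cell_type": "markdown",
"id": "c3f9d2a1",
"metadata": {},
"source": [
"# Train with the Best Hyperparameters\n",
"\n",
"Optionally, we can apply the best hyperparameters found by the search and run a final fine-tuning pass. The cell below is a minimal sketch that reuses the `trainer` and `best_run` objects defined above."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "9d41c7be",
"metadata": {},
"outputs": [],
"source": [
"# Copy the best hyperparameters into the existing training arguments\n",
"for param, value in best_run.hyperparameters.items():\n",
"    setattr(trainer.args, param, value)\n",
"\n",
"# Fine-tune a fresh model (via model_init) with the selected hyperparameters\n",
"trainer.train()\n",
"\n",
"# Report the final accuracy on the validation set\n",
"trainer.evaluate()"
]
}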
],
"metadata": {
"kernelspec": {
"display_name": "optuna",
"language": "python",
"name": "python3"
},
"language_info": {
"name": "python",
"version": "3.12.1"
}
},
"nbformat": 4,
"nbformat_minor": 5
}