Add an example using Optuna and Transformers #304

Open · wants to merge 7 commits into main
2 changes: 2 additions & 0 deletions notebooks/en/_toctree.yml
@@ -82,6 +82,8 @@
title: HuatuoGPT-o1 Medical RAG and Reasoning
- local: fine_tune_chatbot_docs_synthetic
title: Documentation Chatbot with Meta Synthetic Data Kit
- local: optuna_hpo_with_transformers
title: Hyperparameter Optimization with Optuna and Transformers



218 changes: 218 additions & 0 deletions notebooks/en/optuna_hpo_with_transformers.ipynb
@@ -0,0 +1,218 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "08092aa8",
"metadata": {},
"source": [
"# Hyperparameter Optimization with Optuna and Transformers\n",
"\n",
"_Authored by: [Parag Ekbote](https://github.com/ParagEkbote)_\n",
"\n",
"In this notebook, we are going to use the [optuna](https://github.com/optuna/optuna) library to perform hyperparameter optimization on a light-weight BERT model using a small subset of the IMDB dataset. To learn more about transformers' hyperparameter search, you can check the following documentation [here](https://huggingface.co/docs/transformers/en/hpo_train).\n",
"\n",
"Firstly, we will install the following dependencies for our code:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "a309e1a0",
"metadata": {},
"outputs": [],
"source": [
"!pip install -q datasets evaluate transformers optuna"
]
},
{
"cell_type": "markdown",
"id": "eff9ccd6",
"metadata": {},
"source": [
"# Prepare the dataset and set the Model\n",
"\n",
"We will load the IMDB dataset which is a standard benchmark for sentiment analysis. We will define 2000 examples for the training split and 1000 examples for validation. Both sets are shuffled with a fixed seed to ensure reproducibility.\n",
"\n",
"We shall also tokenize the text and map to efficiently preprocesses all the dataset samples. Next, we will load the accuracy metric. We will also initialize the model to be instantiated for binary classification. "
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "2cfb9d5e",
"metadata": {},
"outputs": [],
"source": [
"from datasets import load_dataset\n",
"import evaluate\n",
"\n",
"from transformers import AutoModelForSequenceClassification\n",
"from transformers import AutoTokenizer\n",
"from transformers import set_seed\n",
"from transformers import Trainer\n",
"from transformers import TrainingArguments\n",
"\n",
"\n",
"set_seed(42)\n",
"\n",
"\n",
"train_dataset = load_dataset(\"imdb\", split=\"train\").shuffle(seed=42).select(range(2000))\n",
"valid_dataset = load_dataset(\"imdb\", split=\"test\").shuffle(seed=42).select(range(1000))\n",
"\n",
"model_name = \"lvwerra/distilbert-imdb\"\n",
"tokenizer = AutoTokenizer.from_pretrained(model_name)\n",
"\n",
"\n",
"def tokenize(batch):\n",
" return tokenizer(batch[\"text\"], padding=\"max_length\", truncation=True, max_length=512)\n",
"\n",
"\n",
"tokenized_train = train_dataset.map(tokenize, batched=True).select_columns(\n",
" [\"input_ids\", \"attention_mask\", \"label\"]\n",
")\n",
"tokenized_valid = valid_dataset.map(tokenize, batched=True).select_columns(\n",
" [\"input_ids\", \"attention_mask\", \"label\"]\n",
")\n",
"\n",
"\n",
"metric = evaluate.load(\"accuracy\")\n",
"\n",
"\n",
"def model_init():\n",
" return AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)\n"
]
},
{
"cell_type": "markdown",
"id": "5a46fac4",
"metadata": {},
"source": [
"# Set the Metrics and define the Trainer class\n",
"\n",
"Now, we can define the metric function to calculate evaluation metrics after each eval step. We shall also define the objective function to maximize the accuracy when selecting the best hyperparameters.\n",
"\n",
"Finally, we will also define the training arguments for the Trainer that will handle the evaluation, checkpointing, logging and hyperparameter search."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "e0ee0bae",
"metadata": {},
"outputs": [],
"source": [
"def compute_metrics(eval_pred):\n",
" predictions = eval_pred.predictions.argmax(axis=-1)\n",
" labels = eval_pred.label_ids\n",
" return metric.compute(predictions=predictions, references=labels)\n",
"\n",
"\n",
"def compute_objective(metrics):\n",
" return metrics[\"eval_accuracy\"]\n",
"\n",
"\n",
"training_args = TrainingArguments(\n",
" eval_strategy=\"epoch\",\n",
" save_strategy=\"best\",\n",
" load_best_model_at_end=True,\n",
" logging_strategy=\"epoch\",\n",
" report_to=\"none\",\n",
")\n",
"\n",
"\n",
"trainer = Trainer(\n",
" model_init=model_init,\n",
" args=training_args,\n",
" train_dataset=tokenized_train,\n",
" eval_dataset=tokenized_valid,\n",
" processing_class=tokenizer,\n",
" compute_metrics=compute_metrics,\n",
")"
]
},
{
"cell_type": "markdown",
"id": "b10c26c6",
"metadata": {},
"source": [
"# Define the Search Space and Start the Trials\n",
"\n",
"We will now define the optuna hyperparameter search space to find the best set of hyperparameters for learning rate, weight decay and batch size. We can now launch the hyperparameter search by passing the following metrics:\n",
"\n",
"1. direction: We aim to maxime the evaluation metric\n",
"2. backend: We will use optuna for searching\n",
"3. n_trials: The number of trials optuna will be executed \n",
"4. compute_objective: The objective to minimize or maximize from the metrics returned by `evaluate`"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "4ef88312",
"metadata": {},
"outputs": [],
"source": [
"def optuna_hp_space(trial):\n",
" return {\n",
" \"learning_rate\": trial.suggest_float(\"learning_rate\", 1e-6, 1e-4, log=True),\n",
" \"per_device_train_batch_size\": trial.suggest_categorical(\n",
" \"per_device_train_batch_size\", [16, 32, 64, 128]\n",
" ),\n",
" \"weight_decay\": trial.suggest_float(\"weight_decay\", 0.0, 0.3),\n",
" }\n",
"\n",
"\n",
"best_run = trainer.hyperparameter_search(\n",
" direction=\"maximize\",\n",
" backend=\"optuna\",\n",
" hp_space=optuna_hp_space,\n",
" n_trials=20,\n",
" compute_objective=compute_objective,\n",
")\n",
"\n",
"print(best_run)"
]
},
{
"cell_type": "markdown",
"id": "26a95ef3",
"metadata": {},
"source": [
"# Visualize the results\n",
"\n",
"After the completion of the trials, we can visualize the results in a simple manner using the `optuna` study object.\n",
"We can pass the object and plot visualizations that can help to understand the patterns in the trial outcomes. Here, we are plotting the key hyperparameters and how different hyperparameter combinations relate to performance of the model."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "a8f14007",
"metadata": {},
"outputs": [],
"source": [
"import optuna\n",
"import optuna.visualization as vis\n",
"\n",
"study = best_run.study \n",
"\n",
"optuna.visualization.plot_param_importances(study).show()\n",
"\n",
"vis.plot_parallel_coordinate(study)\n"
]
}
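,
{
"cell_type": "markdown",
"id": "c3f9d2a1",
"metadata": {},
"source": [
"# Train with the Best Hyperparameters\n",
"\n",
"Optionally, we can apply the best hyperparameters found by the search and run a final fine-tuning pass. The cell below is a minimal sketch that reuses the `trainer` and `best_run` objects defined above."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "9d41c7be",
"metadata": {},
"outputs": [],
"source": [
"# Copy the best hyperparameters into the existing training arguments\n",
"for param, value in best_run.hyperparameters.items():\n",
"    setattr(trainer.args, param, value)\n",
"\n",
"# Fine-tune a fresh model (via model_init) with the selected hyperparameters\n",
"trainer.train()\n",
"\n",
"# Report the final accuracy on the validation set\n",
"trainer.evaluate()"
]
}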
],
"metadata": {
"kernelspec": {
"display_name": "optuna",
"language": "python",
"name": "python3"
},
"language_info": {
"name": "python",
"version": "3.12.1"
}
},
"nbformat": 4,
"nbformat_minor": 5
}