diff --git a/01-tutorials/01-fundamentals/02-model-providers/03-llamacpp-model/.gitignore b/01-tutorials/01-fundamentals/02-model-providers/03-llamacpp-model/.gitignore new file mode 100644 index 00000000..a271c463 --- /dev/null +++ b/01-tutorials/01-fundamentals/02-model-providers/03-llamacpp-model/.gitignore @@ -0,0 +1,39 @@ +# Model files +models/*.gguf + +# Python +__pycache__/ +*.py[cod] +*$py.class +.Python +venv/ +env/ + +# Jupyter +.ipynb_checkpoints/ +*.ipynb_checkpoints + +# Audio files +*.wav +*.mp3 +*.m4a + +# Image files +*.png +*.jpg +*.jpeg +!docs/*.png + +# Logs +*.log +server.log + +# OS +.DS_Store +Thumbs.db + +# IDE +.vscode/ +.idea/ +*.swp +*.swo \ No newline at end of file diff --git a/01-tutorials/01-fundamentals/02-model-providers/03-llamacpp-model/README.md b/01-tutorials/01-fundamentals/02-model-providers/03-llamacpp-model/README.md new file mode 100644 index 00000000..1af91d4c --- /dev/null +++ b/01-tutorials/01-fundamentals/02-model-providers/03-llamacpp-model/README.md @@ -0,0 +1,202 @@ +# LlamaCpp Provider for Strands SDK + +This tutorial demonstrates using the LlamaCpp provider with Strands Agents. LlamaCpp enables running quantized models locally with advanced features like grammar constraints, multimodal support, and custom sampling parameters. + +## Prerequisites + +- Python 3.8+ +- llama.cpp with server support ([Installation Guide](https://github.com/ggerganov/llama.cpp)) +- 16GB RAM (minimum 8GB) +- 8GB storage for model files + +## Quick Start + +### 1. Install Dependencies + +```bash +pip install -r requirements.txt +``` + +### 2. Download Model Files + +Download the quantized Qwen2.5-Omni model for multimodal capabilities: + +```bash +# Create models directory +mkdir -p models && cd models + +# Download main model (4.68 GB) +huggingface-cli download ggml-org/Qwen2.5-Omni-7B-GGUF \ + Qwen2.5-Omni-7B-Q4_K_M.gguf --local-dir . + +# Download multimodal projector (1.55 GB) +huggingface-cli download ggml-org/Qwen2.5-Omni-7B-GGUF \ + mmproj-Qwen2.5-Omni-7B-Q8_0.gguf --local-dir . + +cd .. +``` + +Both files are required for audio and vision support. + +### 3. Start LlamaCpp Server + +```bash +llama-server -m models/Qwen2.5-Omni-7B-Q4_K_M.gguf \ + --mmproj models/mmproj-Qwen2.5-Omni-7B-Q8_0.gguf \ + --host 0.0.0.0 --port 8080 -c 8192 -ngl 50 --jinja +``` + +Key parameters: +- `-m`: Path to main model file +- `--mmproj`: Path to multimodal projector +- `-c`: Context window size (default: 8192) +- `-ngl`: Number of GPU layers (0 for CPU-only) +- `--jinja`: Enable template support for tools + +The server will log "loaded multimodal model" when ready. + +### 4. 
Run the Tutorial
+
+```bash
+jupyter notebook llamacpp.ipynb
+```
+
+## Key Features
+
+### Grammar Constraints
+
+Control output format using GBNF (GGML BNF, llama.cpp's extended Backus-Naur form) grammars:
+
+```python
+model.use_grammar_constraint('root ::= "yes" | "no"')
+```
+
+### Advanced Sampling
+
+LlamaCpp provides fine-grained control over text generation:
+
+- **Mirostat**: Dynamic perplexity control
+- **TFS**: Tail-free sampling for quality improvement
+- **Min-p**: Minimum probability threshold
+- **Custom sampler ordering**: Control the order of the sampling pipeline
+
+### Structured Output
+
+Generate validated JSON output using Pydantic models:
+
+```python
+agent.structured_output(MyModel, "Generate user data")
+```
+
+### Multimodal Capabilities
+
+- **Audio Input**: Process speech and audio files
+- **Vision Input**: Analyze images
+- **Combined Processing**: Simultaneous audio-visual understanding
+
+### Performance Optimization
+
+- Prompt caching for repeated queries
+- Slot-based session management
+- GPU acceleration with configurable layers
+
+## Tutorial Content
+
+The Jupyter notebook demonstrates:
+
+1. **Grammar Constraints**: Enforce specific output formats
+2. **Sampling Strategies**: Compare generation quality with different parameters
+3. **Structured Output**: Type-safe data generation
+4. **Tool Integration**: Function calling with LlamaCpp
+5. **Audio Processing**: Speech recognition and understanding
+6. **Image Analysis**: Visual content interpretation
+7. **Multimodal Agents**: Combined audio-visual processing
+8. **Performance Testing**: Optimization techniques and benchmarks
+
+## Additional Examples
+
+The `examples/` directory contains standalone Python scripts demonstrating specific features.
+
+## Parameter Reference
+
+### Standard Parameters
+
+- `temperature`: Controls randomness (0.0-2.0)
+- `max_tokens`: Maximum response length
+- `top_p`: Nucleus sampling threshold
+- `frequency_penalty`: Reduce repetition
+- `presence_penalty`: Encourage topic diversity
+
+### LlamaCpp-Specific Parameters
+
+- `grammar`: GBNF grammar string
+- `json_schema`: JSON schema for structured output
+- `mirostat`: Enable Mirostat sampling (0, 1, or 2)
+- `min_p`: Minimum probability cutoff
+- `repeat_penalty`: Penalize token repetition
+- `cache_prompt`: Enable prompt caching
+- `slot_id`: Session slot for multi-user support
+
+See the notebook for detailed parameter usage examples.
+
+## Hardware Requirements
+
+### Minimum Configuration
+- 8GB RAM
+- 4GB VRAM (or CPU-only mode)
+- 8GB storage
+
+### Recommended Configuration
+- 16GB RAM
+- 8GB+ VRAM
+- CUDA-capable GPU or Apple Silicon
+
+### GPU Acceleration
+- **NVIDIA**: Requires CUDA toolkit
+- **Apple Silicon**: Metal support included
+- **AMD**: ROCm support (experimental)
+- **CPU Mode**: Set `-ngl 0` when starting the server
+
+## About Quantized Models
+
+### What is Quantization?
+
+Quantization reduces model size by storing weights at lower numerical precision (e.g., 4-bit instead of 16-bit). This makes it possible to run large language models on consumer hardware with minimal quality loss.
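+
+As a rough sketch of the arithmetic (assuming ~7.6B parameters and ignoring quantization metadata and runtime overhead such as the KV cache), the savings can be estimated directly:
+
+```python
+params = 7.6e9                # Qwen2.5-Omni-7B parameter count
+fp16_gb = params * 2 / 1e9    # 16-bit weights: 2 bytes each -> ~15 GB
+q4_gb = params * 0.5 / 1e9    # 4-bit weights: 0.5 bytes each -> ~4 GB
+print(f"FP16: {fp16_gb:.1f} GB, Q4: {q4_gb:.1f} GB")
+```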
+ +### Qwen2.5-Omni-7B Model + +- **Parameters**: 7.6 billion +- **Quantization**: 4-bit (Q4_K_M format) +- **Size**: 4.68GB (vs ~15GB unquantized) +- **Context**: 8,192 tokens (expandable to 32K) +- **Languages**: 23 languages supported + +### LlamaCpp vs Ollama + +| Feature | LlamaCpp | Ollama | +|---------|----------|--------| +| **Model Format** | GGUF files | Modelfile abstraction | +| **Control** | Full parameter access | Simplified interface | +| **Features** | Grammar, multimodal, sampling | Basic generation | +| **Use Case** | Advanced applications | Quick prototyping | + +LlamaCpp provides lower-level control suitable for production applications requiring specific output formats or advanced features. + +## Troubleshooting + +### Common Issues + +1. **Server won't start**: Verify llama.cpp installation and model file paths +2. **Out of memory**: Reduce GPU layers with `-ngl` parameter +3. **No multimodal support**: Ensure both model files are downloaded +4. **Slow performance**: Enable GPU acceleration or reduce context size + +### Additional Resources + +- [LlamaCpp Documentation](https://github.com/ggerganov/llama.cpp) +- [GGUF Format Specification](https://github.com/ggerganov/ggml/blob/master/docs/gguf.md) +- [Strands SDK Documentation](https://docs.strands.dev) + +## License + +MIT \ No newline at end of file diff --git a/01-tutorials/01-fundamentals/02-model-providers/03-llamacpp-model/llamacpp.ipynb b/01-tutorials/01-fundamentals/02-model-providers/03-llamacpp-model/llamacpp.ipynb new file mode 100644 index 00000000..4ce4c030 --- /dev/null +++ b/01-tutorials/01-fundamentals/02-model-providers/03-llamacpp-model/llamacpp.ipynb @@ -0,0 +1,604 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Building a Multimodal Agent with LlamaCpp and Strands SDK\n", + "\n", + "This notebook demonstrates how to create agents using LlamaCpp with Strands SDK. You'll learn to use quantized models locally with advanced features like grammar constraints, multimodal processing, and custom tools.\n", + "\n", + "LlamaCpp supports any GGUF-format quantized model. You can easily switch between models by downloading different GGUF files and updating the model path. Popular options include Llama, Mistral, Phi, and Qwen families. Simply change the model file in your server command to use a different model:" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Setup and Installation\n", + "\n", + "For this tutorial, we use Qwen2.5-Omni for its multimodal capabilities (audio + vision + text).\n", + "\n", + "Before running this notebook, ensure you have:\n", + "\n", + "1. **Python 3.8+** installed\n", + "2. **llama.cpp** with server support ([Installation Guide](https://github.com/ggerganov/llama.cpp))\n", + "3. **Model files** downloaded:\n", + "\n", + "```bash\n", + "# Download Qwen2.5-Omni model files\n", + "mkdir -p models && cd models\n", + "\n", + "# Main model (4.68 GB)\n", + "huggingface-cli download ggml-org/Qwen2.5-Omni-7B-GGUF \\\n", + " Qwen2.5-Omni-7B-Q4_K_M.gguf --local-dir .\n", + "\n", + "# Multimodal projector (1.55 GB) - Required for audio/vision\n", + "huggingface-cli download ggml-org/Qwen2.5-Omni-7B-GGUF \\\n", + " mmproj-Qwen2.5-Omni-7B-Q8_0.gguf --local-dir .\n", + "\n", + "cd ..\n", + "```\n", + "\n", + "4. 
**Start the server**:\n", + "\n", + "```bash\n", + "llama-server -m models/Qwen2.5-Omni-7B-Q4_K_M.gguf \\\n", + " --mmproj models/mmproj-Qwen2.5-Omni-7B-Q8_0.gguf \\\n", + " --host 0.0.0.0 --port 8080 -c 8192 -ngl 50 --jinja\n", + "```" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Install Python Dependencies\n", + "\n", + "Install the Strands SDK and required libraries for audio processing, image handling, and notebook widgets." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Install dependencies from requirements.txt\n", + "!pip install -q -r requirements.txt" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Import Required Libraries\n", + "\n", + "Import the Strands SDK components and utility functions we'll use throughout this tutorial. The utils folder contains helper functions for audio, image, grammar, and benchmarking operations." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import os\n", + "import sys\n", + "import json\n", + "from datetime import datetime\n", + "from typing import List, Dict, Any\n", + "from pathlib import Path\n", + "\n", + "# Add utils to path\n", + "utils_path = os.path.join(os.getcwd(), 'utils')\n", + "if utils_path not in sys.path:\n", + " sys.path.append(utils_path)\n", + "\n", + "# Import Strands SDK\n", + "from strands import Agent, tool\n", + "from strands.models.llamacpp import LlamaCppModel\n", + "from pydantic import BaseModel, Field\n", + "\n", + "# Import utilities\n", + "from utils import (\n", + " # Audio utilities\n", + " AudioRecorder, create_audio_interface, display_audio_interface,\n", + " \n", + " # Image utilities \n", + " create_test_image, image_to_bytes, analyze_image_with_llamacpp,\n", + " \n", + " # Grammar and sampling utilities\n", + " demonstrate_grammar_constraint, test_sampling_strategy,\n", + " get_predefined_grammars, get_sampling_strategies,\n", + " \n", + " # Benchmark utilities\n", + " benchmark_performance, run_comprehensive_benchmark\n", + ")\n", + "\n", + "# IPython for multimedia display\n", + "from IPython.display import Audio, Image as IPImage, display, HTML\n", + "import ipywidgets as widgets" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Define Structured Data Models\n", + "\n", + "First, define Pydantic models that will be used for type-safe structured output generation. These models ensure the AI generates data in exactly the format you need." 
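+    ,
+    "\n",
+    "\n",
+    "Once defined, a model can be handed to `agent.structured_output()` to get back a validated instance instead of free text. A minimal sketch (assuming an agent has already been created, and using the `TaskPlan` model defined below):\n",
+    "\n",
+    "```python\n",
+    "plan = agent.structured_output(TaskPlan, \"Plan a 30-minute home workout\")\n",
+    "print(plan.title, plan.estimated_time, plan.difficulty)\n",
+    "```"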
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Define structured output models\n", + "class TaskPlan(BaseModel):\n", + " \"\"\"A structured task plan.\"\"\"\n", + " title: str = Field(description=\"Brief title of the task\")\n", + " steps: List[str] = Field(description=\"List of steps to complete\")\n", + " estimated_time: int = Field(description=\"Estimated time in minutes\")\n", + " difficulty: str = Field(description=\"Easy, Medium, or Hard\")\n", + " \n", + "class ProductReview(BaseModel):\n", + " \"\"\"A structured product review.\"\"\"\n", + " product_name: str = Field(description=\"Name of the product\")\n", + " rating: int = Field(description=\"Rating from 1 to 5\")\n", + " pros: List[str] = Field(description=\"Positive aspects\")\n", + " cons: List[str] = Field(description=\"Negative aspects\")\n", + " recommendation: bool = Field(description=\"Would you recommend it?\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Sampling Parameters\n", + "\n", + "LlamaCpp offers fine-grained control over text generation through various sampling strategies. The following examples demonstrate how different parameters affect output quality and creativity." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Test different sampling strategies\n", + "strategies = get_sampling_strategies()\n", + "prompt = \"Write a creative story opening about a mysterious door:\"\n", + "\n", + "# Test first 3 strategies\n", + "for strategy in strategies[:3]:\n", + " test_sampling_strategy(\n", + " params=strategy[\"params\"],\n", + " name=strategy[\"name\"],\n", + " prompt=prompt\n", + " )" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Grammar Constraints in Action\n", + "\n", + "Test predefined GBNF grammars that force specific output formats. Watch how the model's responses are constrained to match exact patterns like yes/no, multiple choice, or JSON structures." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Demonstrate various grammar constraints\n", + "grammars = get_predefined_grammars()\n", + "\n", + "# Test a few interesting examples\n", + "examples_to_test = [\"yes_no\", \"multiple_choice\", \"simple_json\", \"color_names\"]\n", + "\n", + "for grammar_name in examples_to_test:\n", + " if grammar_name in grammars:\n", + " grammar_info = grammars[grammar_name]\n", + " \n", + " demonstrate_grammar_constraint(\n", + " grammar=grammar_info[\"grammar\"],\n", + " prompt=grammar_info[\"example_prompt\"],\n", + " description=f\"{grammar_name.upper()}: {grammar_info['description']}\",\n", + " base_url=\"http://localhost:8080\",\n", + " temperature=0.1,\n", + " max_tokens=50\n", + " )" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Custom Grammar Examples\n", + "\n", + "Create your own GBNF grammar for specific use cases and use JSON schemas as an alternative constraint method. Both approaches guarantee structured output without post-processing." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Example: Create a custom grammar for star ratings\n", + "star_rating_grammar = '''root ::= rating \" stars\"\n", + "rating ::= \"1\" | \"2\" | \"3\" | \"4\" | \"5\"'''\n", + "\n", + "# Create model with custom grammar\n", + "model = LlamaCppModel(\n", + " base_url=\"http://localhost:8080\",\n", + " params={\"temperature\": 0.1, \"max_tokens\": 20}\n", + ")\n", + "\n", + "# Apply the grammar constraint\n", + "model.use_grammar_constraint(star_rating_grammar)\n", + "agent = Agent(model=model)\n", + "\n", + "# Test the constraint\n", + "test_prompts = [\n", + " \"How would you rate this restaurant?\",\n", + " \"What's your opinion on this movie?\",\n", + " \"Rate the customer service experience:\"\n", + "]\n", + "\n", + "for prompt in test_prompts:\n", + " response = agent(prompt)\n", + " print(f\"{prompt} -> {response}\")\n", + "\n", + "# Example: JSON Schema constraint\n", + "json_model = LlamaCppModel(\n", + " base_url=\"http://localhost:8080\",\n", + " params={\"temperature\": 0.3, \"max_tokens\": 100}\n", + ")\n", + "\n", + "# Define JSON schema for structured output\n", + "product_schema = {\n", + " \"type\": \"object\",\n", + " \"properties\": {\n", + " \"product_name\": {\"type\": \"string\"},\n", + " \"price\": {\"type\": \"number\", \"minimum\": 0},\n", + " \"category\": {\"type\": \"string\", \"enum\": [\"electronics\", \"clothing\", \"food\", \"books\"]},\n", + " \"in_stock\": {\"type\": \"boolean\"}\n", + " },\n", + " \"required\": [\"product_name\", \"price\", \"category\", \"in_stock\"]\n", + "}\n", + "\n", + "json_model.use_json_schema(product_schema)\n", + "json_agent = Agent(model=json_model)\n", + "\n", + "response = json_agent(\"Generate information for a laptop product:\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Custom Tools\n", + "\n", + "Extend your agent's capabilities by defining custom functions that the model can call. The following tools demonstrate how to add domain-specific functionality to your agent." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Define custom tools\n", + "@tool\n", + "def calculate_bmi(weight_kg: float, height_m: float) -> Dict[str, Any]:\n", + " \"\"\"\n", + " Calculate Body Mass Index (BMI).\n", + " \n", + " Args:\n", + " weight_kg: Weight in kilograms\n", + " height_m: Height in meters\n", + " \n", + " Returns:\n", + " BMI value and category\n", + " \"\"\"\n", + " bmi = weight_kg / (height_m ** 2)\n", + " \n", + " if bmi < 18.5:\n", + " category = \"Underweight\"\n", + " elif bmi < 25:\n", + " category = \"Normal weight\"\n", + " elif bmi < 30:\n", + " category = \"Overweight\"\n", + " else:\n", + " category = \"Obese\"\n", + " \n", + " return {\n", + " \"bmi\": round(bmi, 2),\n", + " \"category\": category,\n", + " \"healthy_range\": \"18.5 - 24.9\"\n", + " }\n", + "\n", + "@tool\n", + "def get_weather_description(condition: str) -> str:\n", + " \"\"\"\n", + " Get a poetic description of weather conditions.\n", + " \n", + " Args:\n", + " condition: Weather condition (sunny, rainy, cloudy, etc.)\n", + " \n", + " Returns:\n", + " Poetic weather description\n", + " \"\"\"\n", + " descriptions = {\n", + " \"sunny\": \"Golden rays dance across azure skies\",\n", + " \"rainy\": \"Silver droplets paint the world anew\",\n", + " \"cloudy\": \"Cotton castles drift through endless blue\",\n", + " \"snowy\": \"Crystal blankets hush the sleeping earth\"\n", + " }\n", + " \n", + " return descriptions.get(condition.lower(), f\"The weather shows its {condition} face\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Create Agent with Tools\n", + "\n", + "Initialize an agent with access to your custom tools. The agent will automatically determine when to use these tools based on the user's query." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Create agent with tools\n", + "model = LlamaCppModel(\n", + " base_url=\"http://localhost:8080\",\n", + " params={\n", + " \"temperature\": 0.7,\n", + " \"max_tokens\": 300,\n", + " \"top_k\": 40\n", + " }\n", + ")\n", + "\n", + "agent = Agent(\n", + " model=model,\n", + " tools=[calculate_bmi, get_weather_description],\n", + " system_prompt=\"You are a helpful assistant with access to calculation and description tools.\"\n", + ")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Test Tool Usage\n", + "\n", + "Observe how the agent intelligently calls the appropriate tools based on natural language queries. The agent handles both single and compound tool requests seamlessly." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Test tool usage\n", + "test_queries = [\n", + " \"What's the BMI for someone who is 1.75m tall and weighs 70kg?\",\n", + " \"Give me a poetic description of rainy weather\",\n", + " \"Calculate BMI for 85kg and 1.80m, then describe sunny weather\"\n", + "]\n", + "\n", + "for query in test_queries:\n", + " response = agent(query)\n", + " print(f\"Q: {query}\")\n", + " print(f\"A: {response}\\n\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Multimodal Processing\n", + "\n", + "Process audio and images alongside text using Qwen2.5-Omni's multimodal capabilities:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Create speech recognition interface\n", + "recorder = AudioRecorder(sample_rate=16000)\n", + "\n", + "# Create interface for multilingual speech recognition\n", + "interface_components = create_audio_interface(\n", + " recorder=recorder,\n", + " base_url=\"http://localhost:8080\"\n", + ")\n", + "\n", + "# Display the interface\n", + "display_audio_interface(interface_components)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + " Example: Audio message format for Qwen2.5-Omni\n", + "\n", + "```python\n", + "example_message = {\n", + " \"role\": \"user\",\n", + " \"content\": [\n", + " {\n", + " \"type\": \"audio\",\n", + " \"audio\": {\n", + " \"data\": \"base64_encoded_audio_data_here\",\n", + " \"format\": \"wav\"\n", + " }\n", + " },\n", + " {\n", + " \"type\": \"text\", \n", + " \"text\": \"Please transcribe exactly what was said. If not in English, provide: 1) Original transcription 2) Language detected 3) English translation\"\n", + " }\n", + " ]\n", + "}\n", + "```\n", + " The SDK handles this formatting automatically when using the interface above" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Create and analyze test image\n", + "test_image = create_test_image()\n", + "display(test_image)\n", + "\n", + "# Analyze image\n", + "analysis = analyze_image_with_llamacpp(\n", + " test_image,\n", + " \"Describe this image in detail. What shapes and colors do you see?\",\n", + " max_tokens=200\n", + ")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Performance Optimization\n", + "\n", + "Compare three optimization strategies to understand the trade-offs between quality, speed, and resource usage. Each configuration is tailored for different production scenarios." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Test different performance optimization settings\n", + "\n", + "# Configuration 1: High Quality (slower)\n", + "high_quality_model = LlamaCppModel(\n", + " base_url=\"http://localhost:8080\",\n", + " params={\n", + " \"temperature\": 0.3,\n", + " \"top_k\": 10,\n", + " \"repeat_penalty\": 1.2,\n", + " \"max_tokens\": 100\n", + " }\n", + ")\n", + "\n", + "# Configuration 2: Balanced Performance\n", + "balanced_model = LlamaCppModel(\n", + " base_url=\"http://localhost:8080\",\n", + " params={\n", + " \"temperature\": 0.7,\n", + " \"top_k\": 40,\n", + " \"min_p\": 0.05,\n", + " \"max_tokens\": 100,\n", + " \"cache_prompt\": True # Enable prompt caching\n", + " }\n", + ")\n", + "\n", + "# Configuration 3: Speed Optimized\n", + "speed_model = LlamaCppModel(\n", + " base_url=\"http://localhost:8080\",\n", + " params={\n", + " \"temperature\": 0.8,\n", + " \"top_k\": 20,\n", + " \"max_tokens\": 100,\n", + " \"cache_prompt\": True,\n", + " \"n_probs\": 0 # Disable probability computation\n", + " }\n", + ")\n", + "\n", + "# Test each configuration\n", + "prompt = \"Explain machine learning in simple terms:\"\n", + "\n", + "agent_hq = Agent(model=high_quality_model)\n", + "response_hq = agent_hq(prompt)\n", + "\n", + "agent_balanced = Agent(model=balanced_model)\n", + "response_balanced = agent_balanced(prompt)\n", + "\n", + "agent_speed = Agent(model=speed_model)\n", + "response_speed = agent_speed(prompt)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Performance Benchmark\n", + "\n", + "Run a comprehensive benchmark to measure response times and quality across different configurations. This data helps you choose optimal settings for your specific use case." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Run comprehensive performance benchmark\n", + "benchmark_results = run_comprehensive_benchmark(base_url=\"http://localhost:8080\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Next Steps\n", + "\n", + "This notebook demonstrated the key features of LlamaCpp with Strands SDK. You can now:\n", + "\n", + "- Experiment with different GGUF models from Hugging Face\n", + "- Create custom grammars for your specific use cases\n", + "- Build production applications with local AI\n", + "- Explore multimodal capabilities with other models\n", + "\n", + "For more examples and documentation, visit the [Strands SDK Documentation](https://docs.strands.ai)." 
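+    ,
+    "\n",
+    "\n",
+    "As a sketch of switching models (the repository and filename below are illustrative; any instruction-tuned GGUF from Hugging Face works), only the download and the `-m` argument change:\n",
+    "\n",
+    "```bash\n",
+    "huggingface-cli download bartowski/Mistral-7B-Instruct-v0.3-GGUF \\\n",
+    "    Mistral-7B-Instruct-v0.3-Q4_K_M.gguf --local-dir models\n",
+    "\n",
+    "llama-server -m models/Mistral-7B-Instruct-v0.3-Q4_K_M.gguf \\\n",
+    "    --host 0.0.0.0 --port 8080 -c 8192 --jinja\n",
+    "```"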
+   ]
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "Python 3",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.11.11"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 4
+}
diff --git a/01-tutorials/01-fundamentals/02-model-providers/03-llamacpp-model/requirements.txt b/01-tutorials/01-fundamentals/02-model-providers/03-llamacpp-model/requirements.txt
new file mode 100644
index 00000000..a680aeec
--- /dev/null
+++ b/01-tutorials/01-fundamentals/02-model-providers/03-llamacpp-model/requirements.txt
@@ -0,0 +1,26 @@
+# Core Strands SDK
+strands-agents>=1.4.0
+
+# Web framework
+fastapi>=0.115.0
+
+# Data validation
+pydantic>=2.0.0
+
+# Numerical operations
+numpy>=1.24.0
+
+# HTTP requests
+requests>=2.31.0
+
+# Audio recording and processing
+sounddevice>=0.4.6
+scipy>=1.10.0
+soundfile>=0.12.0
+
+# Image processing
+pillow>=10.0.0
+
+# Jupyter notebook support
+jupyter>=1.0.0
+ipywidgets>=8.0.0
\ No newline at end of file
diff --git a/01-tutorials/01-fundamentals/02-model-providers/03-llamacpp-model/utils/README.md b/01-tutorials/01-fundamentals/02-model-providers/03-llamacpp-model/utils/README.md
new file mode 100644
index 00000000..500799ed
--- /dev/null
+++ b/01-tutorials/01-fundamentals/02-model-providers/03-llamacpp-model/utils/README.md
@@ -0,0 +1,162 @@
+# LlamaCpp Tutorial Utilities
+
+Support modules for the LlamaCpp tutorial notebook.
+
+## Modules
+
+### audio_recorder.py
+Audio recording and analysis utilities.
+
+**Classes:**
+- `AudioRecorder`: Simple audio recorder for capturing microphone input
+  - Record audio with configurable duration and sample rate
+  - Play back recorded audio
+  - Convert audio to bytes for SDK integration
+
+**Functions:**
+- `create_audio_interface()`: Creates a comprehensive Jupyter widget interface
+- `display_audio_interface()`: Displays the audio interface in notebooks
+
+**Features:**
+- Progress tracking during recording and analysis
+- Separate output areas for recording status, analysis, and transcription
+- Error handling with troubleshooting guidance
+- Support for Qwen2.5-Omni multimodal analysis
+
+### image_utils.py
+Image processing and analysis utilities.
+
+**Functions:**
+- `create_test_image()`: Create simple test images with geometric shapes
+- `create_complex_test_image()`: Create complex scenes for advanced testing
+- `image_to_bytes()`: Convert PIL images to bytes for the SDK
+- `analyze_image_with_llamacpp()`: Analyze images using LlamaCpp multimodal models
+- `create_image_analysis_demo()`: Complete image analysis demonstration
+- `load_external_image()`: Load images from file paths
+- `resize_image()`: Resize images while maintaining aspect ratio
+
+**Features:**
+- Programmatic test image generation
+- Direct integration with Strands SDK
+- Error handling for analysis failures
+- Support for various image formats
+
+### grammar_utils.py
+Grammar constraints and sampling utilities.
+
+**Functions:**
+- `demonstrate_grammar_constraint()`: Test specific GBNF grammar constraints
+- `get_predefined_grammars()`: Collection of common grammar patterns
+- `test_sampling_strategy()`: Test different sampling configurations
+- `get_sampling_strategies()`: Predefined sampling strategy configurations
+- `test_structured_output()`: Generate structured output with Pydantic models
+- `run_grammar_constraints_demo()`: Comprehensive grammar demonstration
+- `run_sampling_strategies_demo()`: Comprehensive sampling demonstration
+- `create_json_grammar()`: Generate GBNF grammars from JSON schemas
+
+**Features:**
+- Pre-built grammar patterns
+- Multiple sampling strategies
+- Structured output generation
+- Response timing analysis
+
+### benchmark_utils.py
+Performance benchmarking utilities.
+
+**Functions:**
+- `benchmark_performance()`: Comprehensive performance testing
+- `analyze_benchmark_results()`: Statistical analysis of benchmark data
+- `visualize_performance()`: Text-based performance visualizations
+- `run_comprehensive_benchmark()`: Complete benchmark suite with analysis
+
+**Features:**
+- Multiple configuration testing with statistical analysis
+- Performance comparison with baseline measurements
+- Text-based charts for response time, tokens/sec, and consistency
+- Recommendations based on benchmark results
+- Error handling for failed benchmark runs
+
+## Usage Examples
+
+### Audio Recording
+```python
+from utils import AudioRecorder, create_audio_interface, display_audio_interface
+
+# Create recorder
+recorder = AudioRecorder(sample_rate=16000)
+
+# Create interface
+interface = create_audio_interface(recorder)
+display_audio_interface(interface)
+```
+
+### Image Analysis
+```python
+from utils import create_test_image, analyze_image_with_llamacpp
+
+# Create and analyze image
+image = create_test_image()
+analysis = analyze_image_with_llamacpp(image, "Describe this image")
+print(analysis)
+```
+
+### Grammar Constraints
+```python
+from utils import demonstrate_grammar_constraint, get_predefined_grammars
+
+# Get available grammars
+grammars = get_predefined_grammars()
+
+# Test yes/no constraint
+demonstrate_grammar_constraint(
+    grammars["yes_no"]["grammar"],
+    "Is Python interpreted?",
+    "Yes/No responses only"
+)
+```
+
+### Performance Benchmarking
+```python
+from utils import run_comprehensive_benchmark
+
+# Run complete benchmark suite
+results = run_comprehensive_benchmark()
+print(f"Fastest config: {results['summary']['fastest_config']}")
+```
+
+## Dependencies
+
+- `strands-agents`: Strands SDK
+- `sounddevice`, `soundfile`, `scipy`: Audio processing
+- `pillow` (imported as `PIL`): Image manipulation
+- `ipywidgets`: Notebook widgets
+- `pydantic`: Data validation
+- `numpy`: Numerical operations
+
+## Integration with Notebook
+
+The main notebook imports all utilities with:
+
+```python
+from utils import (
+    # Audio utilities
+    AudioRecorder, create_audio_interface, display_audio_interface,
+
+    # Image utilities
+    create_test_image, analyze_image_with_llamacpp,
+
+    # Grammar utilities
+    demonstrate_grammar_constraint, get_predefined_grammars,
+
+    # Benchmark utilities
+    run_comprehensive_benchmark
+)
+```
+
+This keeps the notebook clean and focused on demonstrating LlamaCpp capabilities while maintaining all functionality in reusable, well-organized modules.
+ +## Notes + +- All functions include error handling +- Modular design for easy extension +- See individual module docstrings for detailed API documentation \ No newline at end of file diff --git a/01-tutorials/01-fundamentals/02-model-providers/03-llamacpp-model/utils/__init__.py b/01-tutorials/01-fundamentals/02-model-providers/03-llamacpp-model/utils/__init__.py new file mode 100644 index 00000000..a754371f --- /dev/null +++ b/01-tutorials/01-fundamentals/02-model-providers/03-llamacpp-model/utils/__init__.py @@ -0,0 +1,75 @@ +""" +Utils package for LlamaCpp model provider demo. + +This package contains helper utilities for the LlamaCpp demo notebook, +organized into logical modules for better code organization and reusability. +""" + +from .audio_recorder import ( + AudioRecorder, + create_audio_interface, + display_audio_interface, + clear_audio_interface_cache +) +from .image_utils import ( + create_test_image, + create_complex_test_image, + image_to_base64, + image_to_bytes, + analyze_image_with_llamacpp, + create_image_analysis_demo, + load_external_image, + resize_image +) +from .grammar_utils import ( + demonstrate_grammar_constraint, + test_sampling_strategy, + get_predefined_grammars, + get_sampling_strategies, + run_grammar_constraints_demo, + run_sampling_strategies_demo, + test_structured_output, + create_json_grammar +) +from .benchmark_utils import ( + benchmark_performance, + analyze_benchmark_results, + visualize_performance, + run_comprehensive_benchmark +) + +__all__ = [ + # Audio utilities + 'AudioRecorder', + 'create_audio_interface', + 'display_audio_interface', + 'clear_audio_interface_cache', + + # Image utilities + 'create_test_image', + 'image_to_bytes', + 'analyze_image_with_llamacpp', + + # Grammar and sampling utilities + 'demonstrate_grammar_constraint', + 'test_sampling_strategy', + 'get_predefined_grammars', + 'get_sampling_strategies', + + # Benchmark utilities + 'benchmark_performance', + 'run_comprehensive_benchmark', + + # Additional utilities (not used in notebook but available) + 'create_complex_test_image', + 'image_to_base64', + 'create_image_analysis_demo', + 'load_external_image', + 'resize_image', + 'run_grammar_constraints_demo', + 'run_sampling_strategies_demo', + 'test_structured_output', + 'create_json_grammar', + 'analyze_benchmark_results', + 'visualize_performance', +] \ No newline at end of file diff --git a/01-tutorials/01-fundamentals/02-model-providers/03-llamacpp-model/utils/audio_recorder.py b/01-tutorials/01-fundamentals/02-model-providers/03-llamacpp-model/utils/audio_recorder.py new file mode 100644 index 00000000..65c6724b --- /dev/null +++ b/01-tutorials/01-fundamentals/02-model-providers/03-llamacpp-model/utils/audio_recorder.py @@ -0,0 +1,329 @@ +""" +Audio recording utilities for the LlamaCpp tutorial. + +Provides audio recording and speech transcription functionality +for multimodal AI applications. 
+""" + +import os +import base64 +import tempfile +import threading +import time +from typing import Optional + +import numpy as np +import sounddevice as sd +import soundfile as sf +import ipywidgets as widgets +from IPython.display import HTML, display + +from strands import Agent +from strands.models.llamacpp import LlamaCppModel + + +class AudioRecorder: + """Audio recorder for speech capture and processing.""" + + def __init__(self, sample_rate: int = 16000): + self.sample_rate = sample_rate + self.recording: Optional[np.ndarray] = None + self.is_recording = False + + def record(self, duration: int = 5) -> np.ndarray: + """Record audio for specified duration.""" + self.recording = sd.rec( + int(duration * self.sample_rate), + samplerate=self.sample_rate, + channels=1, + dtype='float32' + ) + sd.wait() + return self.recording + + def get_audio_bytes(self) -> bytes: + """Get audio data as bytes for SDK.""" + if self.recording is None: + raise ValueError("No recording available") + + with tempfile.NamedTemporaryFile(suffix='.wav', delete=False) as tmp_file: + sf.write(tmp_file.name, self.recording, self.sample_rate, format='WAV') + tmp_filename = tmp_file.name + + with open(tmp_filename, 'rb') as f: + audio_bytes = f.read() + + os.unlink(tmp_filename) + return audio_bytes + + def play(self) -> None: + """Play the recorded audio.""" + if self.recording is None: + raise ValueError("No recording available") + sd.play(self.recording, self.sample_rate) + sd.wait() + + +# Global widget cache to prevent duplication +_audio_interface_cache = {} + +def create_audio_interface(recorder: AudioRecorder, base_url: str = "http://localhost:8080") -> dict: + """ + Create an audio recording interface for speech transcription. + + Args: + recorder: AudioRecorder instance + base_url: LlamaCpp server URL + + Returns: + Dictionary containing interface widgets and handlers + """ + + # Use cached widgets if they exist to prevent duplication + cache_key = id(recorder) + if cache_key in _audio_interface_cache: + cached = _audio_interface_cache[cache_key] + # Clear existing outputs + cached['widgets']['recording_output'].clear_output() + cached['widgets']['analysis_output'].clear_output() + cached['widgets']['status_label'].value = "Ready to record" + cached['widgets']['play_button'].disabled = True + cached['widgets']['analyze_button'].disabled = True + cached['widgets']['progress_bar'].value = 0 + cached['widgets']['progress_bar'].layout.visibility = 'hidden' + return cached + + # Basic controls + duration_slider = widgets.IntSlider( + value=5, + min=1, + max=15, + step=1, + description='Duration (sec):', + style={'description_width': '100px'} + ) + + record_button = widgets.Button( + description='Record', + button_style='info', + layout=widgets.Layout(width='80px') + ) + + play_button = widgets.Button( + description='Play', + button_style='success', + disabled=True, + layout=widgets.Layout(width='80px') + ) + + analyze_button = widgets.Button( + description='Transcribe', + button_style='primary', + disabled=True, + layout=widgets.Layout(width='90px') + ) + + clear_button = widgets.Button( + description='Clear', + button_style='warning', + layout=widgets.Layout(width='80px') + ) + + # Status label + status_label = widgets.Label(value="Ready to record") + + # Output areas + recording_output = widgets.Output(layout=widgets.Layout(height='50px')) + analysis_output = widgets.Output(layout=widgets.Layout(height='200px', overflow='auto')) + + # Progress bar + progress_bar = widgets.IntProgress( + value=0, + min=0, + 
max=100, + description='', + bar_style='info', + layout=widgets.Layout(width='100%', visibility='hidden') + ) + + def on_record_click(b): + """Handle record button click.""" + recording_output.clear_output(wait=True) + with recording_output: + status_label.value = f"Recording for {duration_slider.value} seconds..." + progress_bar.layout.visibility = 'visible' + progress_bar.value = 0 + + def update_progress(): + for i in range(duration_slider.value * 10): + time.sleep(0.1) + progress_bar.value = (i + 1) / (duration_slider.value * 10) * 100 + + progress_thread = threading.Thread(target=update_progress) + progress_thread.start() + + recorder.record(duration_slider.value) + + progress_thread.join() + progress_bar.layout.visibility = 'hidden' + + status_label.value = "Recording ready" + play_button.disabled = False + analyze_button.disabled = False + + def on_play_click(b): + """Handle play button click.""" + recording_output.clear_output(wait=True) + with recording_output: + status_label.value = "Playing audio..." + recorder.play() + status_label.value = "Recording ready" + + def on_analyze_click(b): + """Handle analyze button click.""" + analysis_output.clear_output(wait=True) + + status_label.value = "Transcribing audio..." + progress_bar.layout.visibility = 'visible' + progress_bar.value = 20 + + try: + # Get audio bytes + audio_bytes = recorder.get_audio_bytes() + progress_bar.value = 40 + + # Create LlamaCpp model + clean_base_url = base_url.rstrip('/').replace('/v1', '') + model = LlamaCppModel( + base_url=clean_base_url, + params={"temperature": 0.7, "max_tokens": 300} + ) + agent = Agent(model=model) + progress_bar.value = 60 + + # Create message with audio content + message_content = [ + { + "audio": { + "source": {"bytes": audio_bytes}, + "format": "wav" + } + }, + { + "text": "Please transcribe exactly what was said in this audio recording. If the speech is in a language other than English, first provide the exact transcription in the original language, then provide an English translation. Format your response as:\n1. Original transcription: [exact words spoken]\n2. Language detected: [language name]\n3. English translation: [translation if needed, or 'Already in English']" + } + ] + + progress_bar.value = 80 + response = agent(message_content) + progress_bar.value = 100 + + # Extract and display response + with analysis_output: + if hasattr(response, 'message') and 'content' in response.message: + full_response = "" + for content_block in response.message['content']: + if 'text' in content_block: + full_response += content_block['text'] + display(HTML(f'
<div>{full_response}</div>'))
+ else:
+ display(HTML(f'<div>{str(response)}</div>'))
+
+ status_label.value = "Transcription complete"
+
+ except Exception as e:
+ with analysis_output:
+ display(HTML(f'