Commit 8033392

docs: add guide for bot reasoning guardrails
# Guardrailing Bot Reasoning Content

Modern reasoning-capable LLMs expose their internal thought process as reasoning traces. These traces reveal how the model arrives at its conclusions, which can be valuable for transparency but may also contain sensitive information or problematic reasoning patterns.

NeMo Guardrails allows you to inspect and control these reasoning traces by extracting them and making them available throughout your guardrails configuration. This enables you to write guardrails that block responses based on the model's reasoning process, enrich moderation decisions with reasoning context, or monitor reasoning patterns.

```{important}
The examples in this guide range from minimal toy examples (for understanding the concepts) to a complete reference implementation. They are designed to teach you how to access and work with `bot_thinking` in different contexts, not to serve as production-ready code to copy-paste. Adapt these patterns to your specific use case with appropriate validation, error handling, and business logic for your application.
```
## Accessing Reasoning Content

When an LLM generates a response with reasoning traces, NeMo Guardrails automatically extracts the reasoning and makes it available in three ways:

### In Colang Flows: `$bot_thinking` Variable

The reasoning content is available as a context variable in Colang output rails:

```colang
define flow check_reasoning
  if $bot_thinking
    $captured_reasoning = $bot_thinking
```
### In Custom Actions: `context.get("bot_thinking")`

When writing Python actions, you can access the reasoning via the context dictionary:

```python
from typing import Optional

from nemoguardrails.actions import action


@action(is_system_action=True)
async def check_reasoning(context: Optional[dict] = None):
    # The context may be absent, and not all models emit reasoning traces.
    bot_thinking = context.get("bot_thinking") if context else None
    if bot_thinking and "sensitive" in bot_thinking:
        return False
    return True
```
### In Prompt Templates: `{{ bot_thinking }}`

When rendering prompts for LLM tasks (such as `self check output`), the reasoning is available as a Jinja2 template variable:

```yaml
prompts:
  - task: self_check_output
    content: |
      Bot message: "{{ bot_response }}"

      {% if bot_thinking %}
      Bot reasoning: "{{ bot_thinking }}"
      {% endif %}

      Should this be blocked (Yes or No)?
```

**Important**: Always check whether reasoning exists before using it, as not all models provide reasoning traces.
## Guardrailing with Output Rails

Output rails can use the `$bot_thinking` variable to inspect and control responses based on reasoning content.

### Basic Pattern Matching

```colang
define bot refuse to respond
  "I'm sorry, I can't respond to that."

define flow block_sensitive_reasoning
  if $bot_thinking
    if "confidential" in $bot_thinking or "internal only" in $bot_thinking
      bot refuse to respond
      stop
```
Add this flow to your output rails in `config.yml`:

```yaml
rails:
  output:
    flows:
      - block_sensitive_reasoning
```

```{note}
This demonstrates basic pattern matching for learning purposes. Real implementations should use more comprehensive validation and consider edge cases.
```
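
To see the rail in action, you can load the configuration and send a message. This is a minimal sketch, assuming the files above live in a `./config` directory and the main model emits reasoning traces:

```python
import asyncio

from nemoguardrails import LLMRails, RailsConfig


async def main():
    # Load config.yml and the Colang flows from ./config (path is an assumption).
    config = RailsConfig.from_path("./config")
    rails = LLMRails(config)

    response = await rails.generate_async(
        messages=[{"role": "user", "content": "What are we launching next quarter?"}]
    )
    # If the model's reasoning trace matched a forbidden phrase, the
    # block_sensitive_reasoning flow replaced the answer with the refusal.
    print(response["content"])


asyncio.run(main())
```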
## Guardrailing with Custom Actions

For complex validation logic or reusable checks across multiple flows, write custom Python actions:

**config/actions.py**:

```python
from typing import Optional

from nemoguardrails.actions import action


@action(is_system_action=True)
async def check_reasoning_quality(context: Optional[dict] = None):
    bot_thinking = context.get("bot_thinking") if context else None

    # If the model did not emit a reasoning trace, there is nothing to check.
    if not bot_thinking:
        return True

    forbidden_patterns = [
        "proprietary information",
        "trade secret",
        "confidential data",
    ]

    for pattern in forbidden_patterns:
        if pattern.lower() in bot_thinking.lower():
            return False

    return True
```
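
As a quick sanity check, you can call the action directly with hand-built contexts, outside of a running guardrails app (illustrative only, with `check_reasoning_quality` imported from `config/actions.py`):

```python
import asyncio

# Hand-built contexts standing in for what the guardrails runtime would pass.
assert asyncio.run(check_reasoning_quality({"bot_thinking": "This uses a trade secret."})) is False
assert asyncio.run(check_reasoning_quality({"bot_thinking": "Plain, harmless reasoning."})) is True
assert asyncio.run(check_reasoning_quality(None)) is True  # no trace: nothing to block
```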
**config/rails/reasoning_check.co**:

```colang
define bot refuse to respond
  "I'm sorry, I can't respond to that."

define flow quality_check_reasoning
  $is_safe = execute check_reasoning_quality

  if not $is_safe
    bot refuse to respond
    stop
```

Add to your output rails:

```yaml
rails:
  output:
    flows:
      - quality_check_reasoning
```
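
Actions placed in `config/actions.py` are picked up automatically when the configuration loads. If you keep the action elsewhere, you can register it explicitly instead; a minimal sketch, where the `my_actions` module path is an assumption:

```python
from nemoguardrails import LLMRails, RailsConfig

# Hypothetical module; adjust to wherever the action actually lives.
from my_actions import check_reasoning_quality

config = RailsConfig.from_path("./config")
rails = LLMRails(config)

# Make the action callable from Colang via `execute check_reasoning_quality`.
rails.register_action(check_reasoning_quality, "check_reasoning_quality")
```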
## Using Reasoning in Self-Check Output

This is the **complete reference implementation** from [examples/configs/self_check_thinking/](https://github.com/NVIDIA/NeMo-Guardrails/tree/bc799fbb05e1f12f1ba79461f7f7378b3af50c22/examples/configs/self_check_thinking), showing how `bot_thinking` is used in practice. This pattern provides reasoning traces to your self-check output rail, allowing the moderation LLM to make more informed decisions.

### Configuration

**config.yml**:

```yaml
models:
  - type: main
    engine: <your_engine>
    model: <your_reasoning_model>
  - type: self_check_output
    engine: <your_engine>
    model: <your_moderation_model>

rails:
  output:
    flows:
      - self check output
```
**prompts.yml**:

```yaml
prompts:
  - task: self_check_output
    content: |
      Your task is to check if the bot message below complies with the company policy.

      Company policy for the bot:
      - messages should not contain any explicit content
      - messages should not contain abusive language or offensive content
      - messages should not contain any harmful content
      - messages should not contain racially insensitive content
      - if a message is a refusal, it should be polite

      Bot message: "{{ bot_response }}"

      {% if bot_thinking %}
      Bot thinking/reasoning: "{{ bot_thinking }}"
      {% endif %}

      Question: Should the message be blocked (Yes or No)?
      Answer:
```

The `{% if bot_thinking %}` conditional ensures the prompt works with both reasoning and non-reasoning models. When reasoning is available, the self-check LLM can evaluate both the final response and the reasoning process that produced it.
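
To make the conditional concrete, here is a small standalone sketch of how Jinja2 renders such a template with and without a reasoning trace. This illustrates the templating behavior only, not guardrails internals:

```python
from jinja2 import Template

template = Template(
    'Bot message: "{{ bot_response }}"\n'
    "{% if bot_thinking %}"
    'Bot thinking/reasoning: "{{ bot_thinking }}"\n'
    "{% endif %}"
)

# With a reasoning model, the trace is interpolated into the prompt.
print(template.render(bot_response="Hi!", bot_thinking="The user sent a greeting."))

# With a non-reasoning model, the block simply disappears; rendering still works.
print(template.render(bot_response="Hi!"))
```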
189+
190+
**Explore the complete implementation**: You can find the full working configuration in [examples/configs/self_check_thinking/](https://github.com/NVIDIA/NeMo-Guardrails/tree/bc799fbb05e1f12f1ba79461f7f7378b3af50c22/examples/configs/self_check_thinking) with all files ready to use as a reference for your own implementation.
191+
192+
## See Also
193+
194+
- [LLM Configuration - Using LLMs with Reasoning Traces](../configuration-guide/llm-configuration.md#using-llms-with-reasoning-traces) - API response handling and breaking changes
195+
- [Output Rails](../../getting-started/5-output-rails/README.md) - General guide on output rails
196+
- [Self-Check Output Example](https://github.com/NVIDIA/NeMo-Guardrails/tree/bc799fbb05e1f12f1ba79461f7f7378b3af50c22/examples/configs/self_check_thinking) - Complete working configuration
197+
- [Custom Actions](../../colang-language-syntax-guide.md#actions) - Guide on writing custom actions
