Commit 209b51d

image updates
1 parent c8ca54c commit 209b51d

1 file changed: +14 -40 lines changed

docs/guardrails/images.md

Lines changed: 14 additions & 40 deletions
@@ -1,67 +1,41 @@
-# Images (WIP)
+# Images

 <div class='subtitle'>
-Guardrail the visual perception of your agentic system.
+Secure images given to, or produced by, your agentic system.
 </div>

 At the core of computer vision agents is the ability to perceive their environment through images, typically by taking screenshots to assess the current state. This visual perception allows agents to understand interfaces, identify interactive elements, and make decisions based on what they "see."

-For security and privacy reasons, it is important to ensure that all visual information an agent processes is validated and well-scoped, to prevent exposure of sensitive information or inappropriate content.
-
-Guardrails provide you a powerful way to enforce visual security policies, and to limit the agent's perception to only the visual information that is necessary and appropriate for the task at hand.
+Additionally, some systems may allow users to submit images, posing additional risks.

 <div class='risks'/>
 > **Image Risks**<br/>
-> Since images are an agent's window to perceive the world, they can expose sensitive or inappropriate content. For example, an insecure vision agent could:
+> Images may be produced by, or provided to, an agentic system, presenting potential security risks. For example, an insecure agent could:

-> * Capture personally identifiable information (PII) like names or addresses
+> * Capture **personally identifiable information (PII)** like names or addresses.
 >
-> * View credentials such as passwords, API keys, or access tokens
+> * View credentials such as **passwords, API keys, or access tokens**.
 >
-> * Capture copyrighted material that shouldn't be processed or shared
-
-## Checking Image Content
-
-**Example**: Checking for PII in images
-
-```python
-from invariant.parsers import ocr
+> * Get **prompt injected** from text in an image.

-raise "PII in image text" if:
-(img: Image)
-image_text := ocr(img)
-any(pii(image_text))
-```
-
-**Example**: Check copyrighted material

-
-// Maybe something that uses the information in the image
-// So combine with like toolcalls?
-```python
-from ...
-
-raise "Copyrighted text in image" if:
-(msg: Assistant)
-images := image(msg) # Extract all images in a single message
-copyright(ocr(images))
-```
+Guardrails provide you a powerful way to enforce visual security policies, and to limit the agent's perception to only the visual information that is necessary and appropriate for the task at hand.


 ## ocr <span class="parser-badge"/>
 ```python
 def ocr(
-data: Union[str, List[str]],
+data: str, List[str],
 config: Optional[dict]
 ) -> List[str]
 ```
-Parser to extract text from images.
+Given an image as input, this parser extracts and returns the text in the image using [Tesseract](https://github.com/tesseract-ocr/tesseract).

 **Parameters**

 | Name | Type | Description |
 |-------------|--------|----------------------------------------|
-| `data` | `Union[str, List[str]]` | A single base64 encoded image or a list of base64 encoded images. |
+| `data` | `str, List[str]` | A single base64 encoded image or a list of base64 encoded images. |

 **Returns**

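The `data` parameter of `ocr` in the hunk above accepts either a single base64-encoded image or a list of them (the `str, List[str]` annotation reads most naturally as `Union[str, List[str]]`). As a minimal sketch of how a caller might prepare such input before it reaches a parser like `ocr`, the helpers below encode raw image bytes and normalize either form to a list. The names `encode_image_bytes` and `normalize_images` are hypothetical and are not part of the Invariant API; only Python's standard library is used.

```python
import base64
from typing import List, Union


def encode_image_bytes(raw: bytes) -> str:
    """Base64-encode raw image bytes into an ASCII string."""
    return base64.b64encode(raw).decode("ascii")


def normalize_images(data: Union[str, List[str]]) -> List[str]:
    """Return a list of base64 strings, accepting one string or a list.

    Hypothetical helper: rejects entries that are not valid base64.
    """
    items = [data] if isinstance(data, str) else list(data)
    for item in items:
        # validate=True raises binascii.Error (a ValueError) on invalid input
        base64.b64decode(item, validate=True)
    return items
```

Validating eagerly means a malformed string fails before any OCR work is attempted; whether `ocr` itself performs such a check is not stated in the documentation shown here.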
@@ -70,7 +44,7 @@ Parser to extract text from images.
 | `List[str]` | A list of extracted pieces of text from `data`. |

 ### Analyzing Text in Images
-The `ocr` function is a <span class="parser-badge" size-mod="small"></span> so it returns the data found from parsing its content, in this case extracting text from an image. The extracted text can then be used for further detection, for example detecting a prompt injection in an image, like the example below.
+The `ocr` function is a <span class="parser-badge" size-mod="small"></span> so it returns the data found from parsing its content; in this case any text present in an image will be extracted. The extracted text can then be used for further detection, for example detecting a prompt injection in an image, like the example below.

 **Example:** Image Prompt Injection Detection.
 ```python
@@ -89,7 +63,7 @@ raise "Found Prompt Injection in Image" if:

 ```python
 def image(
-content: Union[Content | List[Content]]
+content: Content | List[Content]
 ) -> List[Image]
 ```
 Given some `Content`, this <span class="builtin-badge" size-mod="small"></span> extracts all images. This is useful when messages may contain mixed content.
@@ -98,7 +72,7 @@ Given some `Content`, this <span class="builtin-badge" size-mod="small"></span>

 | Name | Type | Description |
 |-------------|--------|----------------------------------------|
-| `content` | `Union[Content | List[Content]]` | A single instance of `Content` or a list of `Content`, possibly with mixed types. |
+| `content` | `Content | List[Content]` | A single instance of `Content` or a list of `Content`, possibly with mixed types. |

 **Returns**

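The `image` builtin documented in this diff is described as pulling every image out of content that may mix types, given either one `Content` item or a list. A rough sketch of that filtering behaviour, assuming hypothetical stand-in classes `TextPart` and `ImagePart` (the real `Content` and `Image` types are not shown in this diff), might look like:

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class TextPart:
    """Hypothetical stand-in for a text content item."""
    text: str


@dataclass
class ImagePart:
    """Hypothetical stand-in for an image content item (base64 payload)."""
    base64_data: str


Part = Union[TextPart, ImagePart]


def extract_images(content: Union[Part, List[Part]]) -> List[ImagePart]:
    """Return only the image items, accepting a single item or a list."""
    parts = content if isinstance(content, list) else [content]
    # Keep items in their original order, dropping anything that is not an image
    return [part for part in parts if isinstance(part, ImagePart)]
```

Always returning a list, even for a single input item, mirrors the documented `List[Image]` return type and lets callers iterate without checking the input shape first.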