image updates

lbeurerkellner · lbeurerkellner · commit 209b51d136c3 · 2025-04-14T13:56:01.000+02:00
diff --git a/docs/guardrails/images.md b/docs/guardrails/images.md
@@ -1,67 +1,41 @@
-# Images (WIP)
+# Images
 
 <div class='subtitle'>
-Guardrail the visual perception of your agentic system.
+Secure images given to, or produced by, your agentic system.
 </div>
 
 At the core of computer vision agents is the ability to perceive their environment through images, typically by taking screenshots to assess the current state. This visual perception allows agents to understand interfaces, identify interactive elements, and make decisions based on what they "see."
 
-For security and privacy reasons, it is important to ensure that all visual information an agent processes is validated and well-scoped, to prevent exposure of sensitive information or inappropriate content.
-
-Guardrails provide you a powerful way to enforce visual security policies, and to limit the agent's perception to only the visual information that is necessary and appropriate for the task at hand.
+Some systems also let users submit images, which introduces further risks.
 
 <div class='risks'/>
 > **Image Risks**<br/>
-> Since images are an agent's window to perceive the world, they can expose sensitive or inappropriate content. For example, an insecure vision agent could:
+> Images may be produced by, or provided to, an agentic system, presenting potential security risks. For example, an insecure agent could:
 
-> * Capture personally identifiable information (PII) like names or addresses
+> * Capture **personally identifiable information (PII)** like names or addresses.
 > 
-> * View credentials such as passwords, API keys, or access tokens
+> * View credentials such as **passwords, API keys, or access tokens**.
 > 
-> * Capture copyrighted material that shouldn't be processed or shared
-
-## Checking Image Content
-
-**Example**: Checking for PII in images
-
-```python
-from invariant.parsers import ocr
+> * Be **prompt injected** by text embedded in an image.
 
-raise "PII in image text" if:
-    (img: Image)
-    image_text := ocr(img)
-    any(pii(image_text))
-```
-
-**Example**: Check copyrighted material
 
-
-// Maybe something that uses the information in the image
-// So combine with like toolcalls?
-```python
-from ...
-
-raise "Copyrighted text in image" if:
-    (msg: Assistant)
-    images := image(msg) # Extract all images in a single message
-    copyright(ocr(images))
-```
+Guardrails give you a powerful way to enforce visual security policies and to limit the agent's perception to only the visual information that is necessary and appropriate for the task at hand.
 
 
 ## ocr <span class="parser-badge"/>
 ```python
 def ocr(
-    data: Union[str, List[str]],
+    data: str | List[str],
     config: Optional[dict]
 ) -> List[str]
 ```
-Parser to extract text from images.
+Given an image as input, this parser extracts and returns the text in the image using [Tesseract](https://github.com/tesseract-ocr/tesseract).
 
 **Parameters**
 
 | Name        | Type   | Description                            |
 |-------------|--------|----------------------------------------|
-| `data`      | `Union[str, List[str]]` | A single base64 encoded image or a list of base64 encoded images. |
+| `data`      | `str | List[str]` | A single base64 encoded image or a list of base64 encoded images. |
 
 **Returns**
 
@@ -70,7 +44,7 @@ Parser to extract text from images.
 | `List[str]` | A list of extracted pieces of text from `data`. |
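
As a rough illustration of the input contract described above, the sketch below normalises a single base64 string or a list of them into a list of decoded image bytes, which is the form an OCR engine would typically consume. The helper name `decode_images` is hypothetical and is not part of Invariant's API; the actual `ocr` parser may handle its input differently.

```python
import base64
from typing import List, Union


def decode_images(data: Union[str, List[str]]) -> List[bytes]:
    """Hypothetical helper: accept one base64 image or a list of them
    and return the decoded bytes of each image, in input order."""
    # Treat a single string as a one-element list so callers get a uniform type.
    items = [data] if isinstance(data, str) else list(data)
    # validate=True rejects characters outside the base64 alphabet
    # instead of silently discarding them.
    return [base64.b64decode(item, validate=True) for item in items]


# Placeholder payload standing in for real image bytes.
single = base64.b64encode(b"fake-image-bytes").decode("ascii")
print(len(decode_images(single)))            # single string input
print(len(decode_images([single, single])))  # list input
```

Returning a list in both cases mirrors the documented `List[str]` return type of `ocr`, so downstream rules can iterate over results without checking which input form was used.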
 
 ### Analyzing Text in Images
-The `ocr` function is a  <span class="parser-badge" size-mod="small"></span> so it returns the data found from parsing its content, in this case extracting text from an image. The extracted text can then be used for further detection, for example detecting a prompt injection in an image, like the example below.
+The `ocr` function is a <span class="parser-badge" size-mod="small"></span>, so it returns the data found by parsing its input; in this case, any text present in the image is extracted. The extracted text can then be used for further detection, for example to detect a prompt injection in an image, as in the example below.
 
 **Example:** Image Prompt Injection Detection.
 ```python
@@ -89,7 +63,7 @@ raise "Found Prompt Injection in Image" if:
 
 ```python
 def image(
-    content: Union[Content | List[Content]]
+    content: Content | List[Content]
 ) -> List[Image]
 ```
 Given some `Content`, this <span class="builtin-badge" size-mod="small"></span> extracts all images. This is useful when messages may contain mixed content.
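
To make the "mixed content" case concrete, here is a minimal sketch of the filtering idea in plain Python. The `TextChunk` and `ImageChunk` classes and the `extract_images` function are hypothetical stand-ins invented for this illustration; they are not Invariant's `Content` or `Image` types, and the built-in `image` function may be implemented differently.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class TextChunk:
    """Hypothetical stand-in for a text content item."""
    text: str


@dataclass
class ImageChunk:
    """Hypothetical stand-in for an image content item."""
    base64_data: str


Chunk = Union[TextChunk, ImageChunk]


def extract_images(content: Union[Chunk, List[Chunk]]) -> List[ImageChunk]:
    """Return only the image items, preserving their original order."""
    # Accept either a single item or a list, as the signature above does.
    items = content if isinstance(content, list) else [content]
    return [item for item in items if isinstance(item, ImageChunk)]


# A message that mixes text and image content.
message = [
    TextChunk("Here is a screenshot:"),
    ImageChunk("aGVsbG8="),
    TextChunk("Done."),
]
images = extract_images(message)
print(len(images))
```

Filtering by type first means that later checks, such as running OCR, only ever receive image data, regardless of how many text items surround it in the message.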
@@ -98,7 +72,7 @@ Given some `Content`, this <span class="builtin-badge" size-mod="small"></span>
 
 | Name        | Type   | Description                            |
 |-------------|--------|----------------------------------------|
-| `content`      | `Union[Content | List[Content]]` | A single instance of `Content` or a list of `Content`, possibly with mixed types. |
+| `content`      | `Content | List[Content]` | A single instance of `Content` or a list of `Content`, possibly with mixed types. |
 
 **Returns**