fix(images): Generate Non-Square Images #5626
Greptile Overview
Summary
This PR implements comprehensive support for non-square image generation throughout the Onyx application stack. Previously, the system could only generate square (1:1 aspect ratio) images, but now users can create landscape and portrait images with different dimensions.
The changes span the entire image generation pipeline from backend to frontend. On the backend, the image generation tool now supports shape detection through multiple methods: JSON parameter parsing, metadata extraction, and keyword detection in user queries. The system maps shapes (square, landscape, portrait) to specific pixel dimensions that vary by model; for example, `gpt-image-1` uses different dimension sets than other models. A new heartbeat mechanism was added to provide real-time feedback about expected image dimensions during generation.
On the frontend, components were updated to handle dynamic aspect ratios instead of fixed square dimensions. The `GeneratingImageDisplay` and `InMessageImage` components now accept optional width/height props and calculate CSS aspect ratios dynamically. The streaming models were extended with new packet types and interfaces to communicate dimension metadata between backend and frontend.
The architecture maintains backward compatibility by making all new dimension fields optional throughout the codebase. When explicit dimensions aren't provided, the system falls back to natural image dimensions or defaults to square ratios. This ensures existing functionality continues working while enabling the new non-square capabilities.
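The shape-to-dimension mapping described above could look roughly like the following. This is a minimal sketch, not the PR's code: the `ImageShape` members, the `resolve_size` and `size_param` helpers, and the choice of the DALL-E 3 sizes as the fallback table are all assumptions made for illustration. The pixel sizes themselves are the ones OpenAI documents for `gpt-image-1` and `dall-e-3`.

```python
from enum import Enum
from typing import Dict, Optional, Tuple


class ImageShape(str, Enum):
    # Assumed members; the source only names the three shapes.
    SQUARE = "square"
    LANDSCAPE = "landscape"
    PORTRAIT = "portrait"


Size = Tuple[int, int]  # (width, height) in pixels

# Sizes documented for gpt-image-1.
_GPT_IMAGE_1_SIZES: Dict[ImageShape, Size] = {
    ImageShape.SQUARE: (1024, 1024),
    ImageShape.LANDSCAPE: (1536, 1024),
    ImageShape.PORTRAIT: (1024, 1536),
}

# Sizes documented for dall-e-3, used here as the table for "other models"
# (an assumption; the PR's real table may differ).
_DEFAULT_SIZES: Dict[ImageShape, Size] = {
    ImageShape.SQUARE: (1024, 1024),
    ImageShape.LANDSCAPE: (1792, 1024),
    ImageShape.PORTRAIT: (1024, 1792),
}


def resolve_size(model: str, shape: Optional[ImageShape]) -> Size:
    """Pick pixel dimensions for a model, defaulting to square when no shape is given."""
    table = _GPT_IMAGE_1_SIZES if model == "gpt-image-1" else _DEFAULT_SIZES
    return table[shape or ImageShape.SQUARE]


def size_param(size: Size) -> str:
    """Format dimensions as the "WIDTHxHEIGHT" string that image APIs commonly accept."""
    width, height = size
    return f"{width}x{height}"
```

Keeping the table per model makes the square default explicit, which matches the backward-compatibility behaviour the summary describes.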
Important Files Changed
Changed Files
Filename | Score | Overview |
---|---|---|
`web/src/app/chat/services/streamingModels.ts` | 5/5 | Adds heartbeat packet types and dimension metadata to support real-time image generation updates |
`web/src/app/chat/components/tools/GeneratingImageDisplay.tsx` | 4/5 | Updates component to accept width/height props and display images with dynamic aspect ratios |
`backend/onyx/tools/tool_implementations/images/image_generation_tool.py` | 3/5 | Enhances image generation with size parsing and dimension metadata; contains potential string iteration issue |
`backend/onyx/server/query_and_chat/streaming_models.py` | 5/5 | Adds shape and dimension fields to ImageGenerationToolHeartbeat class |
`web/src/app/chat/components/files/images/InMessageImage.tsx` | 4/5 | Implements dynamic aspect ratio handling and natural dimension detection for image display |
`backend/onyx/agents/agent_search/dr/sub_agents/image_generation/dr_image_generation_2_act.py` | 4/5 | Adds comprehensive shape detection and dimension calculation logic for image generation requests |
`web/src/app/chat/message/messageComponents/renderers/ImageToolRenderer.tsx` | 4/5 | Integrates heartbeat dimension data and passes dimensions to image display components |
`backend/onyx/agents/agent_search/dr/sub_agents/image_generation/models.py` | 5/5 | Extends GeneratedImage model with optional width, height, and shape fields |
Confidence score: 3/5
- This PR requires careful review due to potential issues in the image generation tool implementation and complex cross-stack changes
- Score reflects concerns about a potential bug in string iteration and the complexity of the shape detection logic that could cause runtime errors
- Pay close attention to `backend/onyx/tools/tool_implementations/images/image_generation_tool.py`, where `response.revised_prompt` is being iterated as if it were a collection when it's a string field
Sequence Diagram
sequenceDiagram
participant User
participant ImageGenerationTool
participant DR_Agent as "DR Image Generation Agent"
participant LiteLLM as "LiteLLM API"
participant FileStore as "File Store"
participant Frontend as "Frontend UI"
User->>DR_Agent: "Request image with specific shape"
DR_Agent->>DR_Agent: "_extract_prompt_and_shape(query, metadata)"
DR_Agent->>DR_Agent: "_normalize_shape(raw_shape)"
DR_Agent->>ImageGenerationTool: "run(prompt, shape)"
ImageGenerationTool->>ImageGenerationTool: "Start background thread for generation"
loop "While generating"
ImageGenerationTool->>Frontend: "yield ToolResponse(IMAGE_GENERATION_HEARTBEAT_ID)"
Frontend->>Frontend: "Show GeneratingImageDisplay with progress"
end
ImageGenerationTool->>ImageGenerationTool: "_generate_image(prompt, shape, format)"
ImageGenerationTool->>LiteLLM: "image_generation(prompt, model, size, format)"
LiteLLM-->>ImageGenerationTool: "ImageGenerationResponse with dimensions"
ImageGenerationTool->>DR_Agent: "yield ToolResponse(IMAGE_GENERATION_RESPONSE_ID)"
DR_Agent->>FileStore: "save_files(urls, base64_files)"
FileStore-->>DR_Agent: "file_ids"
DR_Agent->>DR_Agent: "Create GeneratedImage objects with dimensions"
DR_Agent->>Frontend: "Stream ImageGenerationToolDelta with images"
Frontend->>Frontend: "Update ImageToolRenderer with completed images"
Frontend->>User: "Display InMessageImage with download button"
User->>Frontend: "Click download button"
Frontend->>FileStore: "fetch(buildImgUrl(fileId))"
FileStore-->>Frontend: "Image blob"
Frontend->>User: "Download image file"
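The heartbeat loop in the diagram (a background thread doing the generation while the tool keeps yielding progress packets) can be sketched as a plain Python generator. This is an illustration under assumptions, not the tool's implementation: `run_with_heartbeats`, the packet dictionaries, and the `heartbeat_interval` parameter are hypothetical names standing in for the real `ToolResponse` objects.

```python
import threading
from typing import Callable, Dict, Iterator


def run_with_heartbeats(
    generate: Callable[[], str],
    heartbeat_interval: float = 0.05,
) -> Iterator[Dict[str, object]]:
    """Run `generate` in a background thread, yielding heartbeats until it finishes."""
    result: Dict[str, object] = {}

    def worker() -> None:
        try:
            result["image"] = generate()
        except Exception as exc:  # re-raised in the caller's thread below
            result["error"] = exc

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()

    # Wake up every `heartbeat_interval` seconds; emit a heartbeat if still working.
    while thread.is_alive():
        thread.join(timeout=heartbeat_interval)
        if thread.is_alive():
            yield {"type": "heartbeat"}

    if "error" in result:
        raise result["error"]  # type: ignore[misc]
    yield {"type": "response", "image": result["image"]}
```

A consumer simply iterates the generator: every packet before the last one is a heartbeat, and the final packet carries the result.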
Additional Comments (1)
- `backend/onyx/tools/tool_implementations/images/image_generation_tool.py`, line 464-467: **logic:** Potential bug: iterating over the `revised_prompt` string instead of a collection. `revised_prompt` is a string field, so iterating it yields individual characters rather than prompts.
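To make the concern concrete: a Python `str` can be looped over, but the loop produces single characters, which is rarely what code handling prompts intends. The sketch below uses a hypothetical `ImageResult` stand-in (the real response type is not shown in this PR page) to contrast iterating the result objects with iterating one prompt string.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ImageResult:
    # Hypothetical stand-in for one generated image in an API response.
    url: str
    revised_prompt: str


def collect_revised_prompts(results: List[ImageResult]) -> List[str]:
    """Iterate over the result objects, reading one prompt string from each."""
    return [result.revised_prompt for result in results]


def iterate_prompt_by_mistake(revised_prompt: str) -> List[str]:
    """The pitfall: looping over the string itself yields characters."""
    return [piece for piece in revised_prompt]
```

With a single result whose prompt is `"a red fox"`, the first helper returns one prompt and the second returns nine one-character strings.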
8 files reviewed, 5 comments
shape: str | None = None
width: int | None = None
height: int | None = None
**style:** Consider using the ImageShape enum from the tool implementation instead of `str | None` for the shape field to ensure type safety and consistency across the codebase
Path: backend/onyx/server/query_and_chat/streaming_models.py
Line: 75:77
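One way to follow this suggestion without changing the wire format is a string-valued enum. The sketch below is an assumption-laden illustration: the dataclass is a simplified stand-in for the real `ImageGenerationToolHeartbeat` class, and the `ImageShape` members are guessed from the three shapes named in the summary.

```python
import json
from dataclasses import asdict, dataclass
from enum import Enum
from typing import Optional


class ImageShape(str, Enum):
    # Assumed members; subclassing str keeps JSON output as plain strings.
    SQUARE = "square"
    LANDSCAPE = "landscape"
    PORTRAIT = "portrait"


@dataclass
class ImageGenerationToolHeartbeat:
    # Simplified stand-in: only the three fields shown in the diff.
    shape: Optional[ImageShape] = None
    width: Optional[int] = None
    height: Optional[int] = None


def to_packet_json(heartbeat: ImageGenerationToolHeartbeat) -> str:
    """Serialize the heartbeat; the enum member is emitted as its string value."""
    return json.dumps(asdict(heartbeat))
```

Because the enum subclasses `str`, a consumer that expects `"landscape"` in the payload keeps working, while the Python side rejects values outside the enum at construction time via `ImageShape(raw)`.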
onLoad={() => setImageLoaded(true)}
onLoad={(event) => {
  setImageLoaded(true);
  if (!width || !height) {
**logic:** The condition `!width || !height` could be problematic if one dimension is 0 but the other is valid. Consider using `!width && !height` or checking for falsy values more explicitly.
Path: web/src/app/chat/components/files/images/InMessageImage.tsx
Line: 72:72
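The component under review is TypeScript, where `!width` is true for both `undefined` and `0`. The fallback order it needs (explicit dimensions, then natural image dimensions, then square) can be written with explicit checks that separate "missing" from "zero". The following is a language-neutral sketch in Python; `resolve_aspect_ratio` and its parameters are hypothetical and do not mirror the component's actual props.

```python
from typing import Optional


def _usable(width: Optional[int], height: Optional[int]) -> bool:
    """Both values must be present and strictly positive to form a ratio."""
    return width is not None and height is not None and width > 0 and height > 0


def resolve_aspect_ratio(
    width: Optional[int],
    height: Optional[int],
    natural_width: Optional[int] = None,
    natural_height: Optional[int] = None,
) -> float:
    """Return width/height from the first usable source, defaulting to square (1.0)."""
    if _usable(width, height):
        return width / height  # explicit dimensions from the backend
    if _usable(natural_width, natural_height):
        return natural_width / natural_height  # measured once the image has loaded
    return 1.0
```

Spelling out the presence and positivity checks avoids relying on truthiness, so a zero or missing value in either dimension falls through to the next source instead of producing a division by zero.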
run_kwargs: dict[str, str] = {"prompt": image_prompt}
if image_shape_enum:
    run_kwargs["shape"] = image_shape_enum.value
**logic:** Type annotation inconsistency - run_kwargs is declared as `dict[str, str]` but `image_shape_enum.value` could be None, causing a type mismatch if shape is added
Path: backend/onyx/agents/agent_search/dr/sub_agents/image_generation/dr_image_generation_2_act.py
Line: 191:193
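Whether this mismatch can actually occur depends on how `ImageShape` is defined, which this page does not show. As a sketch under the assumption that it is a string-valued enum, the guard plus the enum's value type keep the `dict[str, str]` annotation accurate; `build_run_kwargs` is a hypothetical helper wrapping the three lines from the diff.

```python
from enum import Enum
from typing import Dict, Optional


class ImageShape(str, Enum):
    # Assumed definition: every member's value is a str.
    SQUARE = "square"
    LANDSCAPE = "landscape"
    PORTRAIT = "portrait"


def build_run_kwargs(
    image_prompt: str, image_shape_enum: Optional[ImageShape]
) -> Dict[str, str]:
    """Build the tool arguments, adding "shape" only when a shape was resolved."""
    run_kwargs: Dict[str, str] = {"prompt": image_prompt}
    if image_shape_enum is not None:
        # .value is a str for a str-valued enum, so the annotation holds.
        run_kwargs["shape"] = image_shape_enum.value
    return run_kwargs
```

Using `is not None` rather than a truthiness check states the intent directly and lets a type checker narrow `Optional[ImageShape]` before `.value` is read.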
revised_prompt: str
width: int | None = None
height: int | None = None
shape: str | None = None
**style:** Consider using an enum for shape values instead of string to maintain consistency with ImageShape enum used elsewhere in the codebase
Path: backend/onyx/agents/agent_search/dr/sub_agents/image_generation/models.py
Line: 10:10
No issues found across 8 files
Good frontend changes. For the backend, the tool args already contain shape; we should extract shape from the generated tool args and pass it through the graph into the image gen tool call, rather than doing static analysis on the prompt.
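The approach suggested in this review (read the shape the model already put in its tool call, instead of scanning the prompt for keywords) might look like the sketch below. Everything here is hypothetical: the shape of the `tool_args` dictionary, the `"shape"` key, and the `shape_from_tool_args` helper are assumptions, since the graph code is not shown on this page.

```python
from enum import Enum
from typing import Any, Mapping, Optional


class ImageShape(str, Enum):
    # Assumed members, matching the shapes named in the PR summary.
    SQUARE = "square"
    LANDSCAPE = "landscape"
    PORTRAIT = "portrait"


def shape_from_tool_args(tool_args: Mapping[str, Any]) -> Optional[ImageShape]:
    """Read and validate the shape from generated tool args; None if absent or invalid."""
    raw = tool_args.get("shape")
    if not isinstance(raw, str):
        return None
    try:
        return ImageShape(raw.strip().lower())
    except ValueError:
        # Unknown value: let the caller fall back to the default (square).
        return None
```

The result can then be passed along the graph state and handed to the tool call unchanged, so the prompt text never needs to be inspected for words like "landscape".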
3b565db to 9bd1bee
Closing in favor of #5631
Description
Allows users to generate images that are not square. Users can request landscape or portrait dimensions to suit different layouts.
Also fixes the loading state and download button shown once an image is generated.
How Has This Been Tested?
Ran this locally and tested multiple different prompts to confirm that images can be generated with different dimensions.