Overview
This issue tracks the implementation of image content handling in FileReadObservation, which was previously implemented in OpenHands PR #10207 (OpenHands/OpenHands#10207).
Background
The OpenHands repository previously had functionality to detect and process image data URLs in FileReadObservation, wrapping them in ImageContent instead of TextContent. This ensures proper handling of image content in agent conversations.
Changes Required
The implementation should include:
1. ConversationMemory Updates
Update the conversation memory to:
- Detect image data URLs (base64-encoded images with format data:image/[type];base64,[data]) in FileReadObservation content
- Wrap image data in ImageContent instead of TextContent for proper LLM processing
- Handle images in tool messages by adding them as separate user messages (workaround for LLM API limitations)
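As a rough, self-contained illustration of the three points above, the sketch below uses stand-in dataclasses rather than the real SDK types. `TextContent`, `ImageContent`, and `Message` here are simplified placeholders, and `wrap_file_content` and `image_followup` are hypothetical helper names that do not come from the original PR.

```python
import re
from dataclasses import dataclass


# Stand-in types for illustration only; the real classes have more fields.
@dataclass
class TextContent:
    text: str


@dataclass
class ImageContent:
    image_urls: list[str]


@dataclass
class Message:
    role: str
    content: list


# Matches the data:image/[type];base64,[data] prefix described above.
IMAGE_DATA_URL = re.compile(r'^data:image/[^;]+;base64,')


def wrap_file_content(raw: str):
    """Wrap file content in ImageContent if it is an image data URL, else TextContent."""
    stripped = raw.strip()
    if IMAGE_DATA_URL.match(stripped):
        return ImageContent(image_urls=[stripped])
    return TextContent(text=raw)


def image_followup(tool_message: Message) -> list[Message]:
    """Return an extra user message carrying the tool message's images, if it has any."""
    images = [c for c in tool_message.content if isinstance(c, ImageContent)]
    if not images:
        return []
    return [Message(role='user', content=images)]
```

For example, `wrap_file_content('data:image/png;base64,AAAA')` yields an `ImageContent`, while plain text yields a `TextContent`. A tool message without images produces no follow-up message.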
2. Runtime File Reading
Update the action execution server to:
- Read image files (.png, .jpg, .jpeg, .bmp, .gif) as base64-encoded data URLs
- Return image content in the format: data:image/[type];base64,[encoded_data]
- Prioritize image handling over binary file error handling
- Skip OH-ACI file editor for image files
3. Tool Description Updates
Update the str_replace_editor tool description to:
- Clearly indicate support for viewing image file types
- Update the binary file handling documentation
Original Implementation Details
ConversationMemory Changes
    import re

    # In _process_observation method for FileReadObservation:
    if isinstance(obs, FileReadObservation):
        content = []
        if re.match(r'^data:image/[^;]+;base64,', obs.content.strip()):
            content.append(ImageContent(image_urls=[obs.content.strip()]))
        else:
            content.append(TextContent(text=obs.content))
        message = Message(role='user', content=content)
Image Content in Tool Messages
When a tool message contains images, add them as a separate user message:
    # After creating tool message, check for images
    if message.contains_image:
        return [
            Message(
                role='user',
                content=[
                    content
                    for content in message.content
                    if isinstance(content, ImageContent)
                ],
                vision_enabled=vision_is_active,
            )
        ]
Action Execution Server Changes
    import base64

    # Check if file is an image
    is_image_file = action.path.lower().endswith(
        ('.png', '.jpg', '.jpeg', '.bmp', '.gif')
    )

    # Skip OH-ACI for images
    if action.impl_source == FileReadSource.OH_ACI and not is_image_file:
        # Use file editor...
        pass

    # Read image files as base64
    if is_image_file:
        with open(filepath, 'rb') as file:
            image_data = file.read()
            encoded_image = base64.b64encode(image_data).decode('utf-8')
            # Determine image type from extension
            ext = filepath.split('.')[-1].lower()
            if ext == 'jpg':
                ext = 'jpeg'
            content = f'data:image/{ext};base64,{encoded_image}'
            return FileReadObservation(path=filepath, content=content)

    # Check for binary files AFTER image handling
    if is_binary(action.path):
        return ErrorObservation('ERROR_BINARY_FILE')
Related Issues
- Original PR: Handle image content in FileReadObservation OpenHands#10207
- Related issue: Workaround: Add image content as user message after tool message OpenHands#10200 (Workaround for images in tool messages)
Testing
The implementation should be tested with:
- Reading various image file formats (PNG, JPG, JPEG, BMP, GIF)
- Verifying base64 encoding and data URL format
- Confirming proper ImageContent wrapping in conversation memory
- Testing image handling in both agent and non-agent contexts (FileReadSource.DEFAULT vs FileReadSource.AGENT)
- Ensuring images in tool messages are properly moved to user messages
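One possible shape for the encoding and data URL format checks is sketched below. It is only an assumption about how such a test could look: `image_file_to_data_url` is a hypothetical helper that mirrors the encoding described in this issue, not the runtime's actual function, and the payload bytes are arbitrary because the encoding step does not validate image contents.

```python
import base64
import re
import tempfile
from pathlib import Path

# Captures the image type and the base64 payload of a data URL.
DATA_URL_RE = re.compile(r'^data:image/([^;]+);base64,(.+)$', re.DOTALL)


def image_file_to_data_url(path: str) -> str:
    """Hypothetical helper: encode a file as data:image/[type];base64,[data]."""
    ext = Path(path).suffix.lstrip('.').lower()
    if ext == 'jpg':
        ext = 'jpeg'  # .jpg files are reported with the jpeg type
    encoded = base64.b64encode(Path(path).read_bytes()).decode('utf-8')
    return f'data:image/{ext};base64,{encoded}'


def check_round_trip(suffix: str, payload: bytes) -> str:
    """Write payload to a temp file, encode it, and verify it decodes back unchanged."""
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / f'sample{suffix}'
        path.write_bytes(payload)
        url = image_file_to_data_url(str(path))
    match = DATA_URL_RE.match(url)
    assert match is not None, url
    assert base64.b64decode(match.group(2)) == payload
    return match.group(1)  # the image type that ended up in the URL
```

Calling `check_round_trip` once per extension in the list above (with a non-empty payload) covers the format and encoding bullets. The `ImageContent` wrapping and tool-message checks would need the real SDK types.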
Notes
- The original PR was closed in the OpenHands repository with the intention of moving this functionality to the software-agent-sdk
- The implementation includes a workaround for LLM API limitations that don't allow images in tool messages (per OpenAI community discussion: https://community.openai.com/t/allowing-images-in-non-user-messages/804176/13)