Handle image content in FileReadObservation #1003

## Overview

This issue tracks porting image content handling for `FileReadObservation`, originally implemented in OpenHands PR #10207 (https://github.com/OpenHands/OpenHands/pull/10207).

## Background

The OpenHands repository previously had functionality to detect image data URLs in `FileReadObservation` content and wrap them in `ImageContent` instead of `TextContent`. This lets images read from files reach the LLM as images in agent conversations, rather than as long base64 text.

## Changes Required

The implementation should include:

### 1. ConversationMemory Updates

Update the conversation memory to:
- Detect image data URLs (base64-encoded images with format `data:image/[type];base64,[data]`) in `FileReadObservation` content
- Wrap image data in `ImageContent` instead of `TextContent` for proper LLM processing
- Handle images in tool messages by re-sending them in a separate user message, as a workaround for LLM APIs that do not accept images in tool-role messages
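A minimal sketch of the detection step, assuming a hypothetical helper `parse_image_data_url` (not part of OpenHands or the SDK). Unlike the prefix-only regex in the original implementation, it matches the whole `data:image/[type];base64,[data]` string and checks that the payload actually decodes as base64, so text that merely starts with the prefix stays text.

```python
import base64
import binascii
import re
from typing import Optional, Tuple

# Hypothetical helper: matches the full data URL, not just its prefix.
_IMAGE_DATA_URL = re.compile(r'^data:(image/[\w.+-]+);base64,([A-Za-z0-9+/=\s]+)$')


def parse_image_data_url(text: str) -> Optional[Tuple[str, bytes]]:
    """Return (mime_type, decoded_bytes) if `text` is an image data URL, else None."""
    match = _IMAGE_DATA_URL.match(text.strip())
    if match is None:
        return None
    mime_type, payload = match.groups()
    try:
        # Drop embedded whitespace; validate=True rejects non-alphabet characters
        # and binascii.Error is also raised for bad padding.
        data = base64.b64decode(''.join(payload.split()), validate=True)
    except binascii.Error:
        return None
    return mime_type, data
```

Validating the payload costs a full decode; if that is too expensive for large images, the prefix check alone may be an acceptable trade-off.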

### 2. Runtime File Reading

Update the action execution server to:
- Read image files (`.png`, `.jpg`, `.jpeg`, `.bmp`, `.gif`) as base64-encoded data URLs
- Return image content in the format: `data:image/[type];base64,[encoded_data]`
- Check for image files before the binary-file check, so images are returned as data URLs instead of triggering a binary-file error
- Skip OH-ACI file editor for image files
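A self-contained sketch of the runtime side, assuming a hypothetical `read_image_as_data_url` helper and an explicit extension-to-MIME map for the five listed formats. The original derives the subtype from the extension and maps `jpg` to `jpeg`; the map below encodes the same rule.

```python
import base64
import os
from typing import Optional

# Extension -> MIME type for the formats listed above (assumed mapping).
IMAGE_MIME_TYPES = {
    '.png': 'image/png',
    '.jpg': 'image/jpeg',
    '.jpeg': 'image/jpeg',
    '.bmp': 'image/bmp',
    '.gif': 'image/gif',
}


def image_mime_type(path: str) -> Optional[str]:
    """Return the MIME type if `path` has a supported image extension, else None."""
    _, ext = os.path.splitext(path)
    return IMAGE_MIME_TYPES.get(ext.lower())


def read_image_as_data_url(path: str) -> str:
    """Read an image file and return it as `data:image/[type];base64,[data]`."""
    mime_type = image_mime_type(path)
    if mime_type is None:
        raise ValueError(f'not a supported image file: {path}')
    with open(path, 'rb') as f:
        encoded = base64.b64encode(f.read()).decode('ascii')
    return f'data:{mime_type};base64,{encoded}'
```

An explicit map keeps unsupported extensions out and avoids emitting non-standard types such as `image/jpg`.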

### 3. Tool Description Updates

Update the `str_replace_editor` tool description to:
- Clearly indicate support for viewing image file types
- Update the binary file handling documentation to note that supported image files are returned as images rather than rejected as binary

## Original Implementation Details

### ConversationMemory Changes

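```python
import re

# Message, TextContent, ImageContent and FileReadObservation come from the
# surrounding OpenHands modules.

# In the _process_observation method, for FileReadObservation:
if isinstance(obs, FileReadObservation):
    content = []
    stripped = obs.content.strip()
    # Content such as data:image/png;base64,... is sent as an image
    if re.match(r'^data:image/[^;]+;base64,', stripped):
        content.append(ImageContent(image_urls=[stripped]))
    else:
        content.append(TextContent(text=obs.content))
    message = Message(role='user', content=content)
```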
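To exercise the wrapping logic outside OpenHands, the sketch below uses minimal stand-in dataclasses named after `TextContent`, `ImageContent`, and `Message`. Their fields are assumptions modeled on the snippet above, not the actual OpenHands classes, and `file_read_to_message` is a hypothetical function name.

```python
import re
from dataclasses import dataclass, field
from typing import List, Union


# Stand-ins for the OpenHands content/message types (fields are assumptions).
@dataclass
class TextContent:
    text: str


@dataclass
class ImageContent:
    image_urls: List[str]


@dataclass
class Message:
    role: str
    content: List[Union[TextContent, ImageContent]] = field(default_factory=list)


_IMAGE_DATA_URL_PREFIX = re.compile(r'^data:image/[^;]+;base64,')


def file_read_to_message(observation_content: str) -> Message:
    """Wrap a file-read result as ImageContent if it is an image data URL, else TextContent."""
    stripped = observation_content.strip()
    if _IMAGE_DATA_URL_PREFIX.match(stripped):
        content = [ImageContent(image_urls=[stripped])]
    else:
        # Non-image content is kept verbatim, including surrounding whitespace.
        content = [TextContent(text=observation_content)]
    return Message(role='user', content=content)
```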

### Image Content in Tool Messages

When a tool message contains images, add them as a separate user message:

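```python
# After creating the tool message, check whether it contains images.
# Because some LLM APIs reject images in tool-role messages, the images are
# re-sent in a separate user message.
if message.contains_image:
    return [
        Message(
            role='user',
            content=[
                item
                for item in message.content
                if isinstance(item, ImageContent)
            ],
            vision_enabled=vision_is_active,
        )
    ]
```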
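A sketch of the tool-message workaround with stand-in types (redefined so the block runs on its own; the real ones carry more fields, such as a tool call id). It assumes the tool message keeps only its non-image parts and the images follow in a user message. The placeholder text for an image-only tool message is hypothetical, added because many APIs reject empty tool messages; whether the original implementation strips images from the tool message is not stated here.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class TextContent:
    text: str


@dataclass
class ImageContent:
    image_urls: List[str]


@dataclass
class Message:
    role: str
    content: List[Union[TextContent, ImageContent]] = field(default_factory=list)

    @property
    def contains_image(self) -> bool:
        return any(isinstance(item, ImageContent) for item in self.content)


# Hypothetical placeholder so the tool message is never left empty.
IMAGE_PLACEHOLDER = 'Image content is attached in the following user message.'


def split_tool_message(tool_message: Message) -> List[Message]:
    """Return the tool message followed by a user message carrying its images, if any."""
    if not tool_message.contains_image:
        return [tool_message]
    images = [item for item in tool_message.content if isinstance(item, ImageContent)]
    others = [item for item in tool_message.content if not isinstance(item, ImageContent)]
    if not others:
        others = [TextContent(text=IMAGE_PLACEHOLDER)]
    return [
        Message(role=tool_message.role, content=others),
        Message(role='user', content=images),
    ]
```

Keeping the tool message in place preserves the tool-call/tool-result pairing that chat APIs expect, while the user message carries the pixels.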

### Action Execution Server Changes

```python
import base64

# FileReadObservation, ErrorObservation, FileReadSource and is_binary come from
# the surrounding OpenHands modules; `filepath` is the path resolved from
# `action.path`.

# Check if the file is an image, based on its extension
is_image_file = action.path.lower().endswith(
    ('.png', '.jpg', '.jpeg', '.bmp', '.gif')
)

# Skip the OH-ACI file editor for images
if action.impl_source == FileReadSource.OH_ACI and not is_image_file:
    # ... read via the OH-ACI file editor and return its observation ...
    pass

# Read image files as base64-encoded data URLs
if is_image_file:
    with open(filepath, 'rb') as file:
        encoded_image = base64.b64encode(file.read()).decode('utf-8')
    # Determine the MIME subtype from the extension; 'jpg' maps to 'jpeg'
    ext = filepath.rsplit('.', 1)[-1].lower()
    if ext == 'jpg':
        ext = 'jpeg'
    content = f'data:image/{ext};base64,{encoded_image}'
    return FileReadObservation(path=filepath, content=content)

# Check for binary files only AFTER image handling, so images are not rejected
if is_binary(filepath):
    return ErrorObservation('ERROR_BINARY_FILE')
```
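The ordering matters because image files are themselves binary. The self-contained sketch below uses a hypothetical `looks_binary` heuristic (a NUL byte in the first 1 KiB), not OpenHands' actual `is_binary`, and a hypothetical `read_for_agent` function, to show that checking the extension first returns a data URL for images while other binary files still produce the error.

```python
import base64
import os

IMAGE_EXTENSIONS = ('.png', '.jpg', '.jpeg', '.bmp', '.gif')


def looks_binary(path: str) -> bool:
    """Hypothetical heuristic: a file is binary if its first 1 KiB contains a NUL byte."""
    with open(path, 'rb') as f:
        return b'\x00' in f.read(1024)


def read_for_agent(path: str) -> str:
    """Return a data URL for images, an error marker for other binaries, else the text."""
    # 1. Images first: they would otherwise be caught by the binary check below.
    if path.lower().endswith(IMAGE_EXTENSIONS):
        ext = os.path.splitext(path)[1].lower().lstrip('.')
        if ext == 'jpg':
            ext = 'jpeg'
        with open(path, 'rb') as f:
            encoded = base64.b64encode(f.read()).decode('ascii')
        return f'data:image/{ext};base64,{encoded}'
    # 2. Only then reject other binary files.
    if looks_binary(path):
        return 'ERROR_BINARY_FILE'
    # 3. Everything else is read as text.
    with open(path, encoding='utf-8') as f:
        return f.read()
```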

## Related Issues

- Original PR: https://github.com/OpenHands/OpenHands/pull/10207
- Related issue: https://github.com/OpenHands/OpenHands/issues/10200 (Workaround for images in tool messages)

## Testing

The implementation should be tested with:
- Reading various image file formats (PNG, JPG, JPEG, BMP, GIF)
- Verifying base64 encoding and data URL format
- Confirming proper `ImageContent` wrapping in conversation memory
- Testing image handling with both read implementations (`FileReadSource.DEFAULT` and `FileReadSource.OH_ACI`)
- Ensuring images in tool messages are properly moved to user messages
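A sketch of the format and round-trip checks in the list above, written as a pytest-collectable test function that also runs as a plain call. `to_data_url` is a hypothetical stand-in for the runtime encoder, defined inline so the test runs on its own; a real test would read files through the action execution server instead.

```python
import base64
import re

DATA_URL_RE = re.compile(r'^data:image/(png|jpeg|bmp|gif);base64,([A-Za-z0-9+/]*={0,2})$')


def to_data_url(data: bytes, ext: str) -> str:
    """Hypothetical encoder mirroring the runtime: extension -> MIME subtype, jpg -> jpeg."""
    subtype = 'jpeg' if ext.lower() in ('jpg', 'jpeg') else ext.lower()
    return f'data:image/{subtype};base64,{base64.b64encode(data).decode("ascii")}'


def test_data_url_format_and_round_trip():
    payload = b'\x89PNG\r\n\x1a\n fake image bytes'
    for ext in ('png', 'jpg', 'jpeg', 'bmp', 'gif', 'PNG'):
        url = to_data_url(payload, ext)
        match = DATA_URL_RE.match(url)
        assert match is not None, url
        # jpg and jpeg must both map to the registered image/jpeg type
        expected = 'jpeg' if ext.lower() in ('jpg', 'jpeg') else ext.lower()
        assert match.group(1) == expected
        # Decoding the payload must give back the original bytes
        assert base64.b64decode(match.group(2)) == payload
```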

## Notes

- The original PR was closed in the OpenHands repository with the intention of moving this functionality to the software-agent-sdk
- The implementation includes a workaround for LLM API limitations that don't allow images in tool messages (per OpenAI community discussion: https://community.openai.com/t/allowing-images-in-non-user-messages/804176/13)
