feat(samples): Add backend blocking poll pattern for HITL workflows #3224

jpantsjoha · 2025-10-19T14:52:43Z

Summary

This PR adds a backend blocking poll pattern for human-in-the-loop (HITL) approval workflows as an alternative to the existing LongRunningFunctionTool pattern.

Pattern Overview

The backend blocking poll pattern handles polling internally within the tool, allowing the agent to call the approval tool once and receive the final decision without manual intervention.
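Here is a minimal sketch of a blocking poll loop. It assumes a poll-only backend reached through a `fetch_status(request_id)` callable that returns a status string. The name `wait_for_approval`, the status strings, and the default interval and timeout are illustrative assumptions, not the code in `blocking_poll_approval_example.py`. The injectable `sleep` and `clock` parameters exist only so the loop can be unit-tested without real waiting.

```python
import time
from typing import Callable

# Statuses that end the wait. These strings are assumptions; map them to
# whatever the real approval system returns.
TERMINAL_STATUSES = {"approved", "rejected"}


def wait_for_approval(
    request_id: str,
    fetch_status: Callable[[str], str],
    poll_interval: float = 5.0,
    timeout: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> dict:
    """Poll until the request reaches a terminal status or the timeout expires.

    The caller (the agent's tool invocation) blocks here and receives only the
    final decision, so no intermediate FunctionResponse is needed.
    """
    deadline = clock() + timeout
    while True:
        status = fetch_status(request_id)
        if status in TERMINAL_STATUSES:
            return {"request_id": request_id, "status": status}
        remaining = deadline - clock()
        if remaining <= 0:
            # Report the timeout as a result instead of raising, so the agent
            # can tell the user the approval is still outstanding.
            return {"request_id": request_id, "status": "timeout"}
        # Wait before the next poll, but never past the deadline.
        sleep(min(poll_interval, remaining))


# Simulated poll-only backend: pending twice, then approved.
responses = iter(["pending", "pending", "approved"])
print(wait_for_approval("req-1", lambda _id: next(responses), poll_interval=0.01))
# {'request_id': 'req-1', 'status': 'approved'}
```

Returning a `"timeout"` result rather than raising is one possible design choice: it lets the agent report an outstanding approval to the user. An LLM-facing tool would typically be a thin wrapper that takes only `request_id` and supplies `fetch_status` itself.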

Key Benefits:

✅ Simpler integration: No manual FunctionResponse injection required
✅ Seamless UX: Agent waits automatically, no manual "continue" clicks needed
✅ Fewer LLM API calls: 1 inference per approval vs. 15+ for agent-level polling
✅ Works with poll-only systems: Jira, ServiceNow, email approvals, custom dashboards

GitHub Issues Addressed

This pattern directly addresses:

Issue #3184: Parent agent doesn't pause properly for sub-agent approvals

Direct Solution: This pattern eliminates the need for parent agents to pause and resume. The approval tool blocks internally while polling, and the agent naturally waits for the final decision.

Issue #1797: Need HITL event support

Alternative Provided: For systems that don't support webhooks (poll-only systems like Jira, ServiceNow), this pattern provides a simple alternative to webhook-based LongRunningFunctionTool.

Files Added

All files in contributing/samples/human_in_loop_blocking_poll/:

Core Implementation

blocking_poll_approval_example.py - Synchronous version (standalone agents, low concurrency)
blocking_poll_approval_example_async.py - Asynchronous version (production, high concurrency)

Testing Infrastructure

mock_approval_api.py - FastAPI-based mock approval server with HTML dashboard
test_standalone.py - Standalone sync integration test (no ADK dependencies)
test_standalone_async.py - Standalone async integration test
test_blocking_poll_core.py - Unit tests with pytest (12 tests)

Documentation

README.md - Comprehensive documentation (359 lines) including:
- Decision matrix (when to use vs. LongRunningFunctionTool)
- Setup and usage examples
- Production considerations
- Performance metrics (93% API call reduction)
- Troubleshooting guide

Test Results

100% Pass Rate (20 tests total):

Integration Tests

Sync version: 4/4 tests PASSED (4.0s)
Async version: 4/4 tests PASSED (4.0s)

Unit Tests

Core logic: 12/12 tests PASSED (0.06s)

All tests validated locally before submission.

Production Validation

This pattern has been validated in a production multi-agent RFQ approval system:

Metric	Agent-Level Polling	Backend Blocking Poll
LLM API calls	15+ per approval	1 per approval
Manual user clicks	20+ "continue" clicks	0 clicks
API call reduction	Baseline	93% reduction
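The 93% figure follows directly from the call counts in the table, taking the lower bound of "15+" for agent-level polling:

```latex
\text{reduction} = \frac{N_{\text{agent}} - N_{\text{backend}}}{N_{\text{agent}}}
                 = \frac{15 - 1}{15} = \frac{14}{15} \approx 93.3\%
```

Because (N - 1)/N grows with N, approvals that need more than 15 agent-level polls see a reduction slightly above 93%.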

Real-World Use Case:

Multi-agent RFQ approval workflow
10-minute average approval duration
Handled gracefully with no manual intervention

When to Use This Pattern

✅ Use Backend Blocking Poll When:

External system doesn't support webhooks (poll-only)
Simple approval workflow (single decision)
Prefer simple application code (no FunctionResponse management)
Approval typically completes in <10 minutes

⚠️ Use LongRunningFunctionTool When:

External system supports webhooks or callbacks
Need to show progress updates to user during waiting
Multi-step approval workflows with state transitions
Very long-duration approvals (>10 minutes)
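Read together, the two lists form a simple rule: use the blocking poll only when every condition in the first list holds. The helper below is a hypothetical encoding of that rule. `choose_hitl_pattern` and `ApprovalWorkflow` are not part of the sample. Treating exactly 10 minutes as long-running is an assumption, since the lists leave that boundary open. "Prefer simple application code" is a preference rather than a constraint, so it is not encoded.

```python
from dataclasses import dataclass

# Cut-off taken from the "<10 minutes" / ">10 minutes" guidance above.
MAX_BLOCKING_MINUTES = 10.0


@dataclass
class ApprovalWorkflow:
    supports_webhooks: bool       # external system can call back
    needs_progress_updates: bool  # user must see status while waiting
    multi_step: bool              # more than one decision or state transition
    expected_minutes: float       # typical time until a decision


def choose_hitl_pattern(w: ApprovalWorkflow) -> str:
    """Return the pattern the decision matrix points to."""
    # Any LongRunningFunctionTool criterion wins.
    if w.supports_webhooks or w.needs_progress_updates or w.multi_step:
        return "LongRunningFunctionTool"
    if w.expected_minutes >= MAX_BLOCKING_MINUTES:
        return "LongRunningFunctionTool"
    # Poll-only, single decision, short wait.
    return "backend_blocking_poll"


jira_like = ApprovalWorkflow(False, False, False, expected_minutes=5)
print(choose_hitl_pattern(jira_like))  # backend_blocking_poll
```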

Design Decisions

Why Async + Sync Versions?

Sync version: Simple, straightforward for standalone agents
Async version: Non-blocking I/O for production multi-agent systems (recommended)
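The practical difference between the two versions is what happens while waiting. A synchronous loop holds its thread in `time.sleep`. An async loop awaits `asyncio.sleep`, which yields to the event loop so other agents and requests keep running. Below is a minimal sketch under those assumptions. `wait_for_approval_async`, the fake backend, and the status strings are hypothetical, not the contents of `blocking_poll_approval_example_async.py`.

```python
import asyncio
from typing import Awaitable, Callable

# Assumed terminal statuses of the approval system.
TERMINAL_STATUSES = {"approved", "rejected"}


async def wait_for_approval_async(
    request_id: str,
    fetch_status: Callable[[str], Awaitable[str]],
    poll_interval: float = 5.0,
    timeout: float = 600.0,
) -> dict:
    """Await a terminal status without blocking the event loop."""
    loop = asyncio.get_running_loop()
    deadline = loop.time() + timeout
    while True:
        status = await fetch_status(request_id)
        if status in TERMINAL_STATUSES:
            return {"request_id": request_id, "status": status}
        remaining = deadline - loop.time()
        if remaining <= 0:
            return {"request_id": request_id, "status": "timeout"}
        # asyncio.sleep yields control, so other coroutines keep running.
        await asyncio.sleep(min(poll_interval, remaining))


def make_fake_backend(pending_polls: int) -> Callable[[str], Awaitable[str]]:
    """Hypothetical backend: each request stays pending for `pending_polls` polls."""
    counts: dict = {}

    async def fetch(request_id: str) -> str:
        counts[request_id] = counts.get(request_id, 0) + 1
        return "approved" if counts[request_id] > pending_polls else "pending"

    return fetch


async def main() -> None:
    fetch = make_fake_backend(pending_polls=2)
    # Two approvals wait concurrently on a single event loop.
    results = await asyncio.gather(
        wait_for_approval_async("req-a", fetch, poll_interval=0.01),
        wait_for_approval_async("req-b", fetch, poll_interval=0.01),
    )
    print([r["status"] for r in results])  # ['approved', 'approved']


asyncio.run(main())
```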

Why Mock API?

Provides complete testing infrastructure without external dependencies, allowing developers to validate the pattern locally.
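The sample's mock server is FastAPI-based and includes an HTML dashboard. To illustrate what such a mock has to provide, here is a much smaller stdlib-only stand-in. The class `InMemoryApprovalStore` is hypothetical and has no HTTP layer. Requests start as `"pending"`, a reviewer decides from another thread, and the agent side only ever reads the status.

```python
import threading
import time
import uuid


class InMemoryApprovalStore:
    """Hypothetical stand-in for a poll-only approval backend (no HTTP)."""

    def __init__(self) -> None:
        self._lock = threading.Lock()  # reviewer and poller run on different threads
        self._statuses: dict = {}

    def create_request(self) -> str:
        request_id = uuid.uuid4().hex
        with self._lock:
            self._statuses[request_id] = "pending"
        return request_id

    def decide(self, request_id: str, approved: bool) -> None:
        """Record a reviewer's decision; each request can be decided once."""
        with self._lock:
            if self._statuses.get(request_id) != "pending":
                raise ValueError(f"request {request_id!r} is not pending")
            self._statuses[request_id] = "approved" if approved else "rejected"

    def get_status(self, request_id: str) -> str:
        """The only call the agent side needs: read the current status."""
        with self._lock:
            return self._statuses[request_id]


store = InMemoryApprovalStore()
req = store.create_request()
# Simulate a reviewer approving 50 ms later from another thread.
threading.Timer(0.05, store.decide, args=(req, True)).start()
while store.get_status(req) == "pending":
    time.sleep(0.01)
print(store.get_status(req))  # approved
```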

Why Comprehensive Documentation?

The 359-line README includes:

Decision matrix to help developers choose the right pattern
Production validation metrics to demonstrate real-world value
Detailed comparison with LongRunningFunctionTool to clarify differences

Checklist

All tests passing (100% pass rate)
Apache 2.0 license headers on all files
No domain-specific references (generic approval workflows)
Comprehensive documentation with decision matrix
Both sync and async versions included
Mock test infrastructure provided
Production considerations documented
GitHub issues #3184 (Human in the loop within custom agent and SequentialAgent workflow does not pause execution) and #1797 (human-in-the-loop event support for ADK) addressed

Reviewer Notes

This contribution complements the existing human_in_loop sample by providing an alternative pattern for poll-only systems. It does not replace LongRunningFunctionTool but offers a simpler option when webhooks are not available.

Related Samples:

contributing/samples/human_in_loop/ - Existing LongRunningFunctionTool pattern
contributing/samples/a2a_human_in_loop/ - A2A human-in-the-loop example

Production Impact:

Reduces LLM API calls for approval workflows by 93% (1 call instead of 15+ per approval)
Eliminates manual intervention (no "continue" clicking)
Works with existing enterprise systems (Jira, ServiceNow, etc.)

Improve developer experience by providing actionable error messages with:
- Clear description of what went wrong
- List of available tools/agents (truncated to 20 for readability)
- Possible causes and suggested fixes
- Fuzzy matching suggestions ("Did you mean...?")

Addresses community issues:
- google#2050: Tool verification callback request
- google#2933: How to handle Function Not Found error (12 comments)
- google#2164: Agent not found ValueError

Changes:
- Enhanced _get_tool() error message in functions.py
- Enhanced __get_agent_to_run() error message in llm_agent.py
- Added _get_available_agent_names() helper for agent tree traversal
- Added fuzzy matching using difflib (standard library)
- Truncates long lists to first 20 items for readability
- Comprehensive unit tests for error scenarios (8 tests, all passing)

Testing:
- pytest tests/unittests/flows/llm_flows/test_functions_error_messages.py: 4/4 passed
- pytest tests/unittests/agents/test_llm_agent_error_messages.py: 4/4 passed
- Performance: < 0.03ms per error (error path only, no hot path impact)

Fixes google#3217
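The commit mentions a `_get_available_agent_names()` helper for agent tree traversal without showing it. Below is a sketch of one way such a traversal can work, assuming each agent exposes a `name` and a list of `sub_agents`. ADK agents have both attributes, but the `Agent` dataclass and the `get_available_agent_names` function here are hypothetical stand-ins, not the helper in `llm_agent.py`.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Agent:
    """Hypothetical minimal agent: just a name and its sub-agents."""
    name: str
    sub_agents: List["Agent"] = field(default_factory=list)


def get_available_agent_names(root: Agent) -> List[str]:
    """Collect every agent name in the tree, depth-first, in declaration order."""
    names: List[str] = []
    stack = [root]
    while stack:
        agent = stack.pop()
        names.append(agent.name)
        # Push children reversed so the first-declared child is visited first.
        stack.extend(reversed(agent.sub_agents))
    return names


tree = Agent("root", [Agent("planner", [Agent("approver")]), Agent("writer")])
print(get_available_agent_names(tree))  # ['root', 'planner', 'approver', 'writer']
```

An explicit stack avoids Python's recursion limit on unusually deep agent hierarchies. A recursive version would work equally well for typical trees.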

Addresses Gemini Code Assist review feedback on PR google#3219:
1. String Construction: Use list-based approach with join() instead of multiple string concatenations for better readability and performance
2. DRY Principle: Extract shared utility function to eliminate ~80 lines of duplicated error formatting logic across two files

Changes:
- Created src/google/adk/utils/error_messages.py with format_not_found_error() utility function
- Refactored functions.py to use shared utility (~32 lines removed)
- Refactored llm_agent.py to use shared utility (~32 lines removed)

Benefits:
- Single source of truth for error message formatting
- More Pythonic string construction (list-based approach)
- Easier to maintain and extend
- Consistent error messages across tools and agents

Testing:
- All 8 existing unit tests passing (4 for tools, 4 for agents)
- Autoformatting applied (isort + pyink)
- GCPADK_SME review: 9.5/10 APPROVED

No breaking changes - backward compatible.
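The commit names a shared `format_not_found_error()` utility in `src/google/adk/utils/error_messages.py` but not its signature or exact wording. The sketch below is a hypothetical version that follows the description. It collects lines in a list and joins them once, truncates the available names to the first 20, and draws "Did you mean" suggestions from `difflib.get_close_matches`. The parameters, the `cutoff=0.6` threshold, and the message text are assumptions.

```python
import difflib
from typing import Sequence


def format_not_found_error(
    kind: str, name: str, available: Sequence[str], max_listed: int = 20
) -> str:
    """Build an actionable 'not found' message for a tool or agent."""
    lines = [f"{kind} '{name}' not found."]
    # Fuzzy "Did you mean...?" suggestions from the standard library.
    suggestions = difflib.get_close_matches(name, available, n=3, cutoff=0.6)
    if suggestions:
        lines.append(f"Did you mean: {', '.join(suggestions)}?")
    # Truncate long lists so the message stays readable.
    shown = list(available[:max_listed])
    lines.append(f"Available {kind.lower()}s: {', '.join(shown) or '(none)'}")
    if len(available) > max_listed:
        lines.append(f"... and {len(available) - max_listed} more")
    lines.append(
        f"Possible causes: the {kind.lower()} name is misspelled, or the "
        f"{kind.lower()} was not registered with this agent."
    )
    # Build the message as a list and join once, as the commit describes.
    return "\n".join(lines)


print(format_not_found_error("Tool", "get_wether", ["get_weather", "get_time"]))
```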

Implements backend blocking poll pattern as an alternative to LongRunningFunctionTool for human-in-the-loop approval workflows.

Pattern Benefits:
- Simpler integration (no FunctionResponse injection)
- Seamless UX (agent waits automatically)
- 93% reduction in LLM API calls vs agent-level polling
- Works with poll-only systems (Jira, ServiceNow, dashboards)

Implementation:
- Synchronous version for standalone agents
- Asynchronous version for production multi-agent systems
- Mock approval API with HTML dashboard for testing
- Comprehensive test suite (20 tests, 100% pass rate)
- Decision matrix comparing with LongRunningFunctionTool

Addresses:
- google#3184: Parent agent pause bug (directly solved)
- google#1797: HITL event support (alternative for poll-only systems)

Production Validated:
- Tested in multi-agent RFQ approval system
- 10-minute average approval duration handled gracefully
- No manual intervention required

adk-bot added the core label ("[Component] This issue is related to the core interface and implementation") Oct 19, 2025

adk-bot requested a review from Jacksunwei October 19, 2025 14:53

jpantsjoha added 3 commits October 19, 2025 15:55

jpantsjoha force-pushed the feat/hitl-blocking-poll-pattern branch from 63a01c3 to 411d3c4 October 19, 2025 14:55

jpantsjoha mentioned this pull request Oct 19, 2025 in issue #3184 (Human in the loop within custom agent and SequentialAgent workflow does not pause execution; status: Open)

Merge branch 'main' into feat/hitl-blocking-poll-pattern (b68a62a)
