
Commit ced0adf

aliafzal authored and facebook-github-bot committed
Configerator based PlanLoader implementation (pytorch#3356)
Summary:
Pull Request resolved: pytorch#3356

Add `ConfigeratorPlanLoader`, an implementation of the `PlanLoader` interface that loads sharding plans stored in Configerator.

**Key Features:**
1. Plan Retrieval: Loads compressed sharding plans from Configerator using `plan_id`.
2. Database Integration: Queries `PlannerStatsDB` to get the storage location and context hash.
3. Decompression: Uses zstd to decompress the stored plan data.
4. Thrift Conversion: Deserializes Thrift structures and converts them back to Python `ShardingOption` objects.
5. Error Handling: Handles failure scenarios with configurable fallback behavior.

**Error Handling & Fallback Scenarios:**
The implementation supports two distinct error-handling modes, controlled by `enable_fallback`:

**Normal Mode (`enable_fallback=False`, default):**
- Raises `PlannerError` with the `PLAN_LOADING_FAILED` type for any failure.
- Error scenarios include:
  - Network connectivity issues (Configerator service unavailable)
  - Invalid plan ID or config path
  - Data decompression failures
  - Thrift deserialization errors
  - Thrift-to-Python conversion failures

**Fallback Mode (`enable_fallback=True`):**
- Returns `None` instead of raising exceptions on loading failures.
- Logs detailed warning messages with the plan ID, config path, and error details.
- Enables graceful degradation, so the system can fall back to alternative planning strategies.
- Suitable for development, experimentation, or scenarios that prioritize availability over strict error handling.
- Warning logs include full context for debugging: plan ID, Configerator path, and the original error.

Reviewed By: mserturk

Differential Revision: D81573577

fbshipit-source-id: 93e84c86fb0b9bccd443a93e2b5785e1bc06a349
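The loading flow and the two error-handling modes described in the summary can be sketched as follows. This is a minimal, hypothetical sketch, not the actual `ConfigeratorPlanLoader`: the Configerator fetch and `PlannerStatsDB` lookup are replaced by an in-memory dict, the standard-library `zlib` and `json` modules stand in for zstd and Thrift so the example runs without extra dependencies, and `SketchPlanLoader`, `load`, and the local `PlannerError`/`PlannerErrorType` classes are illustrative stand-ins rather than torchrec imports.

```python
import json
import logging
import zlib
from enum import Enum
from typing import Dict, List, Optional

logger = logging.getLogger(__name__)


class PlannerErrorType(Enum):
    # Local stand-in mirroring the enum extended by this commit (subset only).
    OTHER = "other"
    PLAN_LOADING_FAILED = "plan_loading_failed"


class PlannerError(Exception):
    # Local stand-in: carries an error type alongside the message.
    def __init__(
        self, message: str, error_type: PlannerErrorType = PlannerErrorType.OTHER
    ) -> None:
        self.error_type = error_type
        super().__init__(message)


class SketchPlanLoader:
    """Hypothetical loader: fetch -> decompress -> deserialize -> convert."""

    def __init__(self, store: Dict[str, bytes], enable_fallback: bool = False) -> None:
        # `store` stands in for Configerator + PlannerStatsDB (plan_id -> compressed blob).
        self._store = store
        self._enable_fallback = enable_fallback

    def load(self, plan_id: str) -> Optional[List[dict]]:
        try:
            blob = self._store[plan_id]          # plan retrieval (KeyError if unknown)
            raw = zlib.decompress(blob)          # zstd in the described loader
            records = json.loads(raw)            # Thrift deserialization in the described loader
            return [dict(r) for r in records]    # ShardingOption conversion in the described loader
        except Exception as e:
            if self._enable_fallback:
                # Fallback mode: log the failure with context and return None.
                logger.warning("Failed to load plan %s: %s", plan_id, e)
                return None
            # Normal mode: surface every failure as PLAN_LOADING_FAILED.
            raise PlannerError(
                f"Failed to load plan {plan_id}: {e}",
                error_type=PlannerErrorType.PLAN_LOADING_FAILED,
            ) from e


# Made-up sample data: one compressed "plan" under a hypothetical plan ID.
store = {
    "plan-1": zlib.compress(
        json.dumps([{"fqn": "table_0", "sharding_type": "table_wise"}]).encode()
    )
}
print(SketchPlanLoader(store).load("plan-1"))
print(SketchPlanLoader(store, enable_fallback=True).load("missing"))  # None (warning logged)
```

Catching failures at a single boundary keeps the two modes symmetric: whichever step fails, normal mode raises one error type and fallback mode returns `None`.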
1 parent 228f430 commit ced0adf


1 file changed (+1, -0 lines)


torchrec/distributed/planner/types.py

Lines changed: 1 addition & 0 deletions
```diff
@@ -796,6 +796,7 @@ class PlannerErrorType(Enum):
     PARTITION = "partition"
     OTHER = "other"
     PLANNER_INPUT_CONTEXT_MISMATCH = "planner_input_context_mismatch"
+    PLAN_LOADING_FAILED = "plan_loading_failed"
 
 
 class PlannerError(Exception):
```
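With the new member in place, a caller can tell plan-loading failures apart from other planner errors by inspecting the error type. The sketch below is hypothetical: it uses local stand-ins that mirror the names in `torchrec/distributed/planner/types.py` instead of importing torchrec, assumes `PlannerError` exposes an `error_type` attribute, and `load_or_plan`, `plan_from_scratch`, and `failing_load` are illustrative names, not torchrec APIs.

```python
from enum import Enum
from typing import Callable, List


class PlannerErrorType(Enum):
    # Excerpt of the enum after this commit (other members omitted).
    PARTITION = "partition"
    OTHER = "other"
    PLANNER_INPUT_CONTEXT_MISMATCH = "planner_input_context_mismatch"
    PLAN_LOADING_FAILED = "plan_loading_failed"


class PlannerError(Exception):
    # Stand-in that stores the error type next to the message (assumed shape).
    def __init__(
        self, message: str, error_type: PlannerErrorType = PlannerErrorType.OTHER
    ) -> None:
        self.error_type = error_type
        super().__init__(message)


def load_or_plan(
    load: Callable[[], List[str]], plan_from_scratch: Callable[[], List[str]]
) -> List[str]:
    """Hypothetical caller: re-plan only when loading a stored plan failed."""
    try:
        return load()
    except PlannerError as e:
        if e.error_type is PlannerErrorType.PLAN_LOADING_FAILED:
            return plan_from_scratch()
        raise  # other planner errors are not handled here


def failing_load() -> List[str]:
    raise PlannerError("plan not found", PlannerErrorType.PLAN_LOADING_FAILED)


print(load_or_plan(failing_load, lambda: ["freshly planned"]))  # ['freshly planned']
```

Comparing `error_type` rather than parsing the message keeps such a caller independent of the exact wording of error messages.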
