OSX : 'TensorFlow.js v4.22.0' / webgpu - Err: "...Failed to execute 'mapAsync' on 'GPUBuffer'..." #8571

@DXXS

System information

OSX, tf.js v4.22.0, Chrome 137.0.7151.120 or MS Edge 138.0.3351.65

Standalone code to reproduce the issue

I've put a copy of my AI-co-developed tic-tac-toe reinforcement learning code, which hit this error while running, at:

https://computacor.com/WebGPU_TicTacToe_Ov7_GPUError.html

Describe the current behavior

Execution crashes about two-thirds of the way through training, at this point in the console log:
"Initializing application...
Loading TensorFlow.js...
Loaded TensorFlow.js v4.22.0
Attempting webgpu backend...
✅ Using webgpu backend
🤖 AI initialized. Click 'Start Training'.
🚀 Starting robust self-play training...
Goal: Achieve 100% draw rate consistently.
Ep 500/100000 | Draws: 11.2% | P1 Wins: 60.4% | P2 Wins: 28.4% | ε: 0.779
...
Ep 64000/100000 | Draws: 95.4% | P1 Wins: 2.4% | P2 Wins: 2.2% | ε: 0.010
Ep 64500/100000 | Draws: 96.4% | P1 Wins: 2.6% | P2 Wins: 1.0% | ε: 0.010
Ep 65000/100000 | Draws: 96.4% | P1 Wins: 2.2% | P2 Wins: 1.4% | ε: 0.010
Ep 65500/100000 | Draws: 94.6% | P1 Wins: 2.8% | P2 Wins: 2.6% | ε: 0.010
[06:09:37 PM] [UNHANDLEDREJECTION] AbortError: Failed to execute 'mapAsync' on 'GPUBuffer': A valid external Instance reference no longer exists."

I then tried it in MS Edge 138.0.3351.65, which crashed similarly shortly thereafter, with a somewhat different error:
"...
WebGPU_TicTacToe_Ov7.html:157 Ep 64500/100000 | Draws: 97.0% | P1 Wins: 2.2% | P2 Wins: 0.8% | ε: 0.010
WebGPU_TicTacToe_Ov7.html:157 Ep 65000/100000 | Draws: 95.4% | P1 Wins: 2.8% | P2 Wins: 1.8% | ε: 0.010
WebGPU_TicTacToe_Ov7.html:157 Ep 65500/100000 | Draws: 52.2% | P1 Wins: 46.4% | P2 Wins: 1.4% | ε: 0.010
WebGPU_TicTacToe_Ov7.html:157 Ep 66000/100000 | Draws: 2.2% | P1 Wins: 95.8% | P2 Wins: 2.0% | ε: 0.010
WebGPU_TicTacToe_Ov7.html:1 A valid external Instance reference no longer exists.
WebGPU_TicTacToe_Ov7.html:412 Uncaught (in promise) AbortError: Failed to execute 'mapAsync' on 'GPUBuffer': A valid external Instance reference no longer exists."

[NOTE: After third parties suggested that this error can occur in low-memory situations, further extended debugging indicates that a couple of tensors are leaked on each call to train(), even though the system shows that plenty of CPU memory remains available.]
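
For anyone trying to confirm the per-call leak described in the note, here is a minimal sketch of how it could be measured. It relies on `tf.memory().numTensors`, which tf.js does provide; the helper name `measureTensorLeak` and the `agent.train()` call in the usage comment are hypothetical stand-ins, not taken from the linked page. The first call may legitimately show growth if the optimizer allocates its internal state lazily, so compare later calls.

```javascript
// Sketch only: reports how many tensors one async step leaves behind.
// `tf` is the TensorFlow.js namespace; `step` is a hypothetical stand-in
// for whatever function runs one training pass (e.g. train()).
async function measureTensorLeak(tf, step) {
  const before = tf.memory().numTensors; // live tensors before the step
  await step();
  const after = tf.memory().numTensors;  // live tensors after the step
  return after - before;                 // > 0 means tensors were not disposed
}

// Hypothetical usage inside the training loop:
//   const leaked = await measureTensorLeak(tf, () => agent.train());
//   if (leaked > 0) console.warn(`train() leaked ${leaked} tensor(s)`);
```

A steady positive value across many calls would be consistent with memory growing until the GPU side fails, though that link is an assumption and not something this sketch proves.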

Describe the expected behavior

If low memory is indeed the whole issue, I think tf.js functions should at least throw a more intelligible error indicating that memory has run out (and maybe why and how much). I also think that tf.tidy()'s refusal to accept async callbacks (even when awaited?) makes manual tensor cleanup in tf.js considerably harder.
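
To illustrate the cleanup difficulty: `tf.tidy()` is synchronous and throws if its callback returns a Promise, so tensors created around an `await` have to be disposed by hand. Below is a rough sketch of one possible workaround, not an API that tf.js offers. The names `asyncTidy` and `track` are hypothetical, and the sketch only assumes that each tracked object has a `dispose()` method, as `tf.Tensor` does.

```javascript
// Sketch only: a tidy-like scope for async code (hypothetical helper,
// not part of tf.js). `track` registers any object with a dispose() method;
// everything tracked is disposed when `fn` settles, even if it throws.
async function asyncTidy(fn) {
  const tracked = [];
  const track = (t) => {
    tracked.push(t);
    return t; // returned so calls can be wrapped inline
  };
  try {
    return await fn(track);
  } finally {
    for (const t of tracked) t.dispose();
  }
}

// Hypothetical usage (model, batch and target are assumed to exist):
//   const lossValue = await asyncTidy(async (track) => {
//     const pred = track(model.predict(batch));
//     const loss = track(tf.losses.meanSquaredError(target, pred));
//     return (await loss.data())[0]; // return plain numbers, not tensors
//   });
```

Unlike `tf.tidy()`, this does not catch intermediate tensors automatically; anything not passed through `track` still leaks, and a tracked tensor must not be returned from the callback because it is disposed on exit.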

Other info / logs

Partial logs are listed under "Describe the current behavior" above and can be reproduced by running the sample code.
