Closed
Hi, thank you for this repository! I've been trying to use BenchMARL with Melting Pot, but I keep encountering a PyTorch device mismatch error whenever I run my experiment. I'm currently using the configuration from #78, where the algorithm is IPPO, the train_device is "cuda", and the sampling_device is "cpu". The stack trace is below:
Traceback (most recent call last):
  File "/home/gridsan/rfan/Melting-Pot-MARL/melting_pot_run.py", line 27, in hydra_experiment
    experiment.run()
  File "/home/gridsan/rfan/.local/lib/python3.10/site-packages/benchmarl/experiment/experiment.py", line 649, in run
    raise err
  File "/home/gridsan/rfan/.local/lib/python3.10/site-packages/benchmarl/experiment/experiment.py", line 641, in run
    self._collection_loop()
  File "/home/gridsan/rfan/.local/lib/python3.10/site-packages/benchmarl/experiment/experiment.py", line 718, in _collection_loop
    group_batch = self.algorithm.process_batch(group, group_batch)
  File "/home/gridsan/rfan/.local/lib/python3.10/site-packages/benchmarl/algorithms/ippo.py", line 246, in process_batch
    loss.value_estimator(
  File "/state/partition1/llgrid/pkg/anaconda/python-ML-2025a/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/state/partition1/llgrid/pkg/anaconda/python-ML-2025a/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/gridsan/rfan/.local/lib/python3.10/site-packages/torchrl/objectives/value/advantages.py", line 79, in new_func
    return fun(self, *args, **kwargs)
  File "/home/gridsan/rfan/.local/lib/python3.10/site-packages/torchrl/objectives/value/advantages.py", line 68, in new_fun
    return fun(self, *args, **kwargs)
  File "/home/gridsan/rfan/.local/lib/python3.10/site-packages/tensordict/nn/common.py", line 328, in wrapper
    return func(_self, tensordict, *args, **kwargs)
  File "/home/gridsan/rfan/.local/lib/python3.10/site-packages/torchrl/objectives/value/advantages.py", line 1468, in forward
    value, next_value = self._call_value_nets(
  File "/home/gridsan/rfan/.local/lib/python3.10/site-packages/torchrl/objectives/value/advantages.py", line 527, in _call_value_nets
    data_out = _vmap_func(
  File "/state/partition1/llgrid/pkg/anaconda/python-ML-2025a/lib/python3.10/site-packages/torch/_functorch/apis.py", line 203, in wrapped
    return vmap_impl(
  File "/state/partition1/llgrid/pkg/anaconda/python-ML-2025a/lib/python3.10/site-packages/torch/_functorch/vmap.py", line 331, in vmap_impl
    return _flat_vmap(
  File "/state/partition1/llgrid/pkg/anaconda/python-ML-2025a/lib/python3.10/site-packages/torch/_functorch/vmap.py", line 479, in _flat_vmap
    batched_outputs = func(*batched_inputs, **kwargs)
  File "/home/gridsan/rfan/.local/lib/python3.10/site-packages/torchrl/objectives/utils.py", line 539, in decorated_module
    return module(*module_args)
  File "/state/partition1/llgrid/pkg/anaconda/python-ML-2025a/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/state/partition1/llgrid/pkg/anaconda/python-ML-2025a/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/gridsan/rfan/.local/lib/python3.10/site-packages/benchmarl/models/common.py", line 161, in forward
    tensordict = self._forward(tensordict)
  File "/home/gridsan/rfan/.local/lib/python3.10/site-packages/benchmarl/models/cnn.py", line 281, in _forward
    cnn_out = self.cnn.forward(input)
  File "/home/gridsan/rfan/.local/lib/python3.10/site-packages/torchrl/modules/models/multiagent.py", line 153, in forward
    output = self._empty_net(inputs)
  File "/state/partition1/llgrid/pkg/anaconda/python-ML-2025a/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/state/partition1/llgrid/pkg/anaconda/python-ML-2025a/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/gridsan/rfan/.local/lib/python3.10/site-packages/torchrl/modules/models/models.py", line 542, in forward
    out = super().forward(inputs)
  File "/state/partition1/llgrid/pkg/anaconda/python-ML-2025a/lib/python3.10/site-packages/torch/nn/modules/container.py", line 250, in forward
    input = module(input)
  File "/state/partition1/llgrid/pkg/anaconda/python-ML-2025a/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/state/partition1/llgrid/pkg/anaconda/python-ML-2025a/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
  File "/state/partition1/llgrid/pkg/anaconda/python-ML-2025a/lib/python3.10/site-packages/torch/nn/modules/conv.py", line 554, in forward
    return self._conv_forward(input, self.weight, self.bias)
  File "/state/partition1/llgrid/pkg/anaconda/python-ML-2025a/lib/python3.10/site-packages/torch/nn/modules/conv.py", line 549, in _conv_forward
    return F.conv2d(
RuntimeError: Input type (torch.FloatTensor) and weight type (torch.cuda.FloatTensor) should be the same or input should be a MKLDNN tensor and weight is a dense tensor
I'm running Python 3.10.14, Torch 2.5.1, and BenchMARL 1.5.0.
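For reference, the final `RuntimeError` can be illustrated outside BenchMARL with plain PyTorch. The snippet below is only a sketch of the general mismatch, not BenchMARL's actual code path: `conv` and `obs` are hypothetical stand-ins for the critic's convolution layer and a batch collected on the sampling device, and the shapes are made up. It falls back to the CPU when no GPU is available, in which case the devices already match and nothing is moved.

```python
import torch
from torch import nn

# Mirror the configuration described above: train on "cuda", sample on "cpu".
# Falls back to CPU when CUDA is unavailable so the sketch still runs.
train_device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
sampling_device = torch.device("cpu")

# Hypothetical stand-ins (not BenchMARL objects):
# `conv` for a CNN layer whose weights live on the training device,
# `obs` for an image batch that is still on the sampling device.
conv = nn.Conv2d(in_channels=3, out_channels=4, kernel_size=3).to(train_device)
obs = torch.zeros(1, 3, 8, 8, device=sampling_device)

weight_device = next(conv.parameters()).device
if obs.device != weight_device:
    # Calling conv(obs) at this point would raise the same kind of
    # "Input type ... and weight type ... should be the same" error,
    # because the input is a CPU tensor and the weights are CUDA tensors.
    obs = obs.to(weight_device)

out = conv(obs)
print(out.shape, out.device)
```

This only shows why the error appears when a CPU batch reaches a CUDA model. Where the batch should be moved inside the BenchMARL pipeline (before `process_batch`, or somewhere else) is something I have not been able to determine.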