Device Mismatch Error with cpu and cuda #213

@ImNotRog

Hi, thank you for this repository! I've been trying to use BenchMARL with Melting Pot, but I hit a PyTorch device mismatch error every time I run my experiment. I'm using the configuration from #78: the algorithm is IPPO, train_device is "cuda", and sampling_device is "cpu". The stack trace is below:

Traceback (most recent call last):
  File "/home/gridsan/rfan/Melting-Pot-MARL/melting_pot_run.py", line 27, in hydra_experiment
    experiment.run()
  File "/home/gridsan/rfan/.local/lib/python3.10/site-packages/benchmarl/experiment/experiment.py", line 649, in run
    raise err
  File "/home/gridsan/rfan/.local/lib/python3.10/site-packages/benchmarl/experiment/experiment.py", line 641, in run
    self._collection_loop()
  File "/home/gridsan/rfan/.local/lib/python3.10/site-packages/benchmarl/experiment/experiment.py", line 718, in _collection_loop
    group_batch = self.algorithm.process_batch(group, group_batch)
  File "/home/gridsan/rfan/.local/lib/python3.10/site-packages/benchmarl/algorithms/ippo.py", line 246, in process_batch
    loss.value_estimator(
  File "/state/partition1/llgrid/pkg/anaconda/python-ML-2025a/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/state/partition1/llgrid/pkg/anaconda/python-ML-2025a/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/gridsan/rfan/.local/lib/python3.10/site-packages/torchrl/objectives/value/advantages.py", line 79, in new_func
    return fun(self, *args, **kwargs)
  File "/home/gridsan/rfan/.local/lib/python3.10/site-packages/torchrl/objectives/value/advantages.py", line 68, in new_fun
    return fun(self, *args, **kwargs)
  File "/home/gridsan/rfan/.local/lib/python3.10/site-packages/tensordict/nn/common.py", line 328, in wrapper
    return func(_self, tensordict, *args, **kwargs)
  File "/home/gridsan/rfan/.local/lib/python3.10/site-packages/torchrl/objectives/value/advantages.py", line 1468, in forward
    value, next_value = self._call_value_nets(
  File "/home/gridsan/rfan/.local/lib/python3.10/site-packages/torchrl/objectives/value/advantages.py", line 527, in _call_value_nets
    data_out = _vmap_func(
  File "/state/partition1/llgrid/pkg/anaconda/python-ML-2025a/lib/python3.10/site-packages/torch/_functorch/apis.py", line 203, in wrapped
    return vmap_impl(
  File "/state/partition1/llgrid/pkg/anaconda/python-ML-2025a/lib/python3.10/site-packages/torch/_functorch/vmap.py", line 331, in vmap_impl
    return _flat_vmap(
  File "/state/partition1/llgrid/pkg/anaconda/python-ML-2025a/lib/python3.10/site-packages/torch/_functorch/vmap.py", line 479, in _flat_vmap
    batched_outputs = func(*batched_inputs, **kwargs)
  File "/home/gridsan/rfan/.local/lib/python3.10/site-packages/torchrl/objectives/utils.py", line 539, in decorated_module
    return module(*module_args)
  File "/state/partition1/llgrid/pkg/anaconda/python-ML-2025a/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/state/partition1/llgrid/pkg/anaconda/python-ML-2025a/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/gridsan/rfan/.local/lib/python3.10/site-packages/benchmarl/models/common.py", line 161, in forward
    tensordict = self._forward(tensordict)
  File "/home/gridsan/rfan/.local/lib/python3.10/site-packages/benchmarl/models/cnn.py", line 281, in _forward
    cnn_out = self.cnn.forward(input)
  File "/home/gridsan/rfan/.local/lib/python3.10/site-packages/torchrl/modules/models/multiagent.py", line 153, in forward
    output = self._empty_net(inputs)
  File "/state/partition1/llgrid/pkg/anaconda/python-ML-2025a/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/state/partition1/llgrid/pkg/anaconda/python-ML-2025a/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/gridsan/rfan/.local/lib/python3.10/site-packages/torchrl/modules/models/models.py", line 542, in forward
    out = super().forward(inputs)
  File "/state/partition1/llgrid/pkg/anaconda/python-ML-2025a/lib/python3.10/site-packages/torch/nn/modules/container.py", line 250, in forward
    input = module(input)
  File "/state/partition1/llgrid/pkg/anaconda/python-ML-2025a/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/state/partition1/llgrid/pkg/anaconda/python-ML-2025a/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
  File "/state/partition1/llgrid/pkg/anaconda/python-ML-2025a/lib/python3.10/site-packages/torch/nn/modules/conv.py", line 554, in forward
    return self._conv_forward(input, self.weight, self.bias)
  File "/state/partition1/llgrid/pkg/anaconda/python-ML-2025a/lib/python3.10/site-packages/torch/nn/modules/conv.py", line 549, in _conv_forward
    return F.conv2d(
RuntimeError: Input type (torch.FloatTensor) and weight type (torch.cuda.FloatTensor) should be the same or input should be a MKLDNN tensor and weight is a dense tensor

I'm running Python 3.10.14, Torch 2.5.1, and BenchMARL 1.5.0.
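
For anyone reading the trace: the final frames show a conv2d call receiving a CPU input (torch.FloatTensor) while the layer's weights live on the GPU (torch.cuda.FloatTensor), reached from process_batch through the value estimator. Below is a minimal sketch of that mismatch in plain PyTorch, plus one generic way to avoid it by moving the input to the module's device before the call. This is not BenchMARL's code or its fix; the helper names module_device and forward_on_module_device are hypothetical, and the layer sizes are made up for illustration. It falls back to the CPU when CUDA is unavailable, in which case no mismatch can occur.

```python
import torch
from torch import nn


def module_device(module: nn.Module) -> torch.device:
    """Return the device of the module's first parameter (CPU if it has none)."""
    param = next(module.parameters(), None)
    return param.device if param is not None else torch.device("cpu")


def forward_on_module_device(module: nn.Module, x: torch.Tensor) -> torch.Tensor:
    """Hypothetical helper: move x to the module's device, then call the module."""
    return module(x.to(module_device(module)))


# Weights go to CUDA when available, mirroring train_device="cuda".
train_device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
conv = nn.Conv2d(in_channels=3, out_channels=8, kernel_size=3).to(train_device)

# A batch created on the CPU, mirroring sampling_device="cpu".
batch = torch.rand(4, 3, 16, 16)

# On a CUDA machine, calling conv(batch) directly raises the RuntimeError
# shown in the trace, because the input and the weights are on different devices.
out = forward_on_module_device(conv, batch)
print(out.shape, out.device)
```

If the same experiment runs cleanly with train_device and sampling_device set to the same value, that would suggest the batch is not being moved to the training device before the value estimator is called.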
