Thank you for providing this reproduction!
I have a question about the grouped convolution: in this line, you use grouped convolution to solve the mini-batch training problem.
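For context, here is my rough understanding of what that line does, as a minimal sketch with toy shapes (the names `batch`, `K`, `out_planes`, `in_planes` are my own assumptions, not necessarily the repo's exact variables):

```python
import torch
import torch.nn.functional as F

batch, K, out_planes, in_planes, k = 4, 4, 16, 8, 3
x = torch.randn(batch, in_planes, 32, 32)
weight = torch.randn(K, out_planes, in_planes, k, k)              # K candidate kernels
softmax_attention = torch.softmax(torch.randn(batch, K), dim=1)   # (batch, K), one mixture per sample

# Mix the K kernels into one kernel per sample ...
aggregate_weight = torch.mm(softmax_attention, weight.view(K, -1))
aggregate_weight = aggregate_weight.view(batch * out_planes, in_planes, k, k)

# ... then fold the batch into the channel dimension and run a single
# convolution with groups=batch, so each sample sees its own kernel.
out = F.conv2d(x.view(1, batch * in_planes, 32, 32), aggregate_weight, padding=1, groups=batch)
out = out.view(batch, out_planes, 32, 32)
```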
Could we use torch.Tensor.expand to replace the grouped convolution, like:

```python
weight_prime = weight.expand(K, weight.shape[0], weight.shape[1], weight.shape[2], weight.shape[3])
# torch.mm expects 2-D inputs, so the expanded weight is flattened to (K, -1) before the multiply
weight = torch.mm(softmax_attention, weight_prime.reshape(K, -1)).view(-1, x.shape[1], self.kernel_size, self.kernel_size)
```
In this way, we might aggregate the attention weights and the convolution weights together. However, this may cause another problem with the batch size: we might instead have to take torch.mean(attention_weight, dim=0) or torch.max(attention_weight, dim=0), since the attention weights are calculated within the batch, where their range is very close.
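To make that concrete, here is a minimal sketch of the mean/max alternative I have in mind (the toy shapes and the names `K`, `out_planes`, `in_planes` are my own assumptions):

```python
import torch
import torch.nn.functional as F

batch, K, out_planes, in_planes, k = 4, 4, 16, 8, 3
x = torch.randn(batch, in_planes, 32, 32)
weight = torch.randn(K, out_planes, in_planes, k, k)                   # K candidate kernels
attention_weight = torch.softmax(torch.randn(batch, K), dim=1)         # (batch, K)

# Collapse the per-sample attention to a single vector for the whole batch.
shared_attention = torch.mean(attention_weight, dim=0, keepdim=True)   # (1, K)
# Or: shared_attention = torch.max(attention_weight, dim=0, keepdim=True).values

# Aggregate one kernel shared by all samples; an ordinary convolution
# (no groups trick) can then be applied to the whole batch.
shared_weight = torch.mm(shared_attention, weight.view(K, -1)).view(out_planes, in_planes, k, k)
out = F.conv2d(x, shared_weight, padding=1)                            # (batch, out_planes, 32, 32)
```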
I am not sure whether this calculation is equivalent to that line :)