s2anet issue: #3870

jackie8310 · 2021-08-03T03:13:11Z

I trained on the spine_coco dataset with s2anet_1x_spine.yml from release2.2. The config file is as follows:
_BASE_: [
  '../datasets/spine_coco.yml',
  '../runtime.yml',
  '_base_/s2anet_optimizer_1x.yml',
  '_base_/s2anet.yml',
  '_base_/s2anet_reader.yml',
]

weights: output/s2anet_1x_spine/model_final

# for 4 card
LearningRate:
  base_lr: 0.00125

S2ANetHead:
  anchor_strides: [8, 16, 32, 64, 128]
  anchor_scales: [4]
  anchor_ratios: [4.0]
  anchor_assign: RBoxAssigner
  stacked_convs: 2
  feat_in: 256
  feat_out: 256
  num_classes: 9
  align_conv_type: 'AlignConv' # AlignConv Conv
  align_conv_size: 3
  use_sigmoid_cls: True
  reg_loss_weight: [1.0, 1.0, 1.0, 1.0, 1.05]
  cls_loss_weight: [1.05, 1.0]
  reg_loss_type: 'l1'

After training for 12 epochs: mAP(0.50, 11point) = 9.45%

[08/03 10:48:35] ppdet.engine INFO: Epoch: [11] [200/230] learning_rate: 0.000013 fam_cls_loss: 0.093969 fam_reg_loss: 1.341817 odm_cls_loss: 0.100388 odm_reg_loss: 0.080870 loss: 1.627664 eta: 0:00:15 batch_cost: 0.5041 data_cost: 0.0002 ips: 1.9838 images/s
[08/03 10:48:53] ppdet.utils.checkpoint INFO: Save checkpoint: output/s2anet_1x_spine
[08/03 10:48:54] ppdet.engine INFO: Eval iter: 0
[08/03 10:49:36] ppdet.metrics.metrics INFO: The bbox result is saved to bbox.json.
[08/03 10:49:36] ppdet.metrics.metrics INFO: Accumulating evaluatation results...
[08/03 10:49:36] ppdet.metrics.metrics INFO: mAP(0.50, 11point) = 9.45%
[08/03 10:49:36] ppdet.engine INFO: Total sample number: 57, averge FPS: 1.3343229141074622
[08/03 10:49:36] ppdet.engine INFO: Best test bbox ap is 0.094.

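Training on my own dataset (rotated boxes annotated with roLabelImg and converted to COCO format) for 24 epochs gives mAP(0.50, 11point) = 3.1%.
Questions:
1. Neither dataset has any problems, so why is the mAP so low?
2. anchor_scales: [4] and anchor_ratios: [1.0] control anchor generation. If the box shapes in my own dataset are widely spread, several anchor sizes might regress better, but setting multiple values in anchor_ratios raises the error below. Can only a single ratio be used? How should I adjust this to improve mAP?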
Traceback (most recent call last):
  File "tools/train.py", line 139, in <module>
    main()
  File "tools/train.py", line 135, in main
    run(FLAGS, cfg)
  File "tools/train.py", line 110, in run
    trainer.train(FLAGS.eval)
  File "/home/aistudio/PaddleDetection-release-2.2/ppdet/engine/trainer.py", line 357, in train
    outputs = model(data)
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/fluid/dygraph/layers.py", line 898, in __call__
    outputs = self.forward(*inputs, **kwargs)
  File "/home/aistudio/PaddleDetection-release-2.2/ppdet/modeling/architectures/meta_arch.py", line 26, in forward
    out = self.get_loss()
  File "/home/aistudio/PaddleDetection-release-2.2/ppdet/modeling/architectures/s2anet.py", line 97, in get_loss
    loss = self._forward()
  File "/home/aistudio/PaddleDetection-release-2.2/ppdet/modeling/architectures/s2anet.py", line 73, in _forward
    self.s2anet_head(body_feats)
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/fluid/dygraph/layers.py", line 898, in __call__
    outputs = self.forward(*inputs, **kwargs)
  File "/home/aistudio/PaddleDetection-release-2.2/ppdet/modeling/heads/s2anet_head.py", line 441, in forward
    featmap_size, self.anchor_strides[feat_idx])
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/fluid/dygraph/layers.py", line 898, in __call__
    outputs = self.forward(*inputs, **kwargs)
  File "/home/aistudio/PaddleDetection-release-2.2/ppdet/modeling/heads/s2anet_head.py", line 90, in forward
    all_anchors = self.base_anchors[:, :] + shifts[:, :]
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/fluid/dygraph/math_op_patch.py", line 250, in __impl__
    return math_op(self, other_var, 'axis', axis)
ValueError: (InvalidArgument) Broadcast dimension mismatch. Operands could not be broadcast together with the shape of X = [3, 4] and the shape of Y = [13824, 4]. Received [3] in X is not equal to [13824] in Y at i:0.
[Hint: Expected x_dims_array[i] == y_dims_array[i] || x_dims_array[i] <= 1 || y_dims_array[i] <= 1 == true, but received x_dims_array[i] == y_dims_array[i] || x_dims_array[i] <= 1 || y_dims_array[i] <= 1:0 != true:1.] (at /paddle/paddle/fluid/operators/elementwise/elementwise_op_function.h:169)
[operator < elementwise_add > error]
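
The shapes in the error hint at the cause: base_anchors has one row per scale/ratio combination (3 here), shifts has one row per feature-map location (13824 here), and a plain + cannot broadcast [3, 4] against [13824, 4]. The NumPy sketch below is not the PaddleDetection code. It is a minimal illustration, with hypothetical helper names and a hypothetical 96 x 144 feature map, of the pattern many anchor generators use: add an axis to each operand so that every location receives every base anchor. Whether the rest of the S2ANet head can consume more than one anchor per location is a separate question that this sketch does not answer.

```python
import numpy as np

def make_base_anchors(base_size, scales, ratios):
    """Return (len(scales) * len(ratios), 4) boxes as x1, y1, x2, y2 around the origin.

    Here ratio is taken to mean height / width; that convention is an assumption.
    """
    anchors = []
    for ratio in ratios:
        for scale in scales:
            w = base_size * scale / np.sqrt(ratio)
            h = base_size * scale * np.sqrt(ratio)
            anchors.append([-w / 2, -h / 2, w / 2, h / 2])
    return np.array(anchors, dtype=np.float32)

def make_shifts(feat_h, feat_w, stride):
    """Return (feat_h * feat_w, 4) offsets, one row per feature-map location."""
    xs = np.arange(feat_w, dtype=np.float32) * stride
    ys = np.arange(feat_h, dtype=np.float32) * stride
    xx, yy = np.meshgrid(xs, ys)
    return np.stack([xx.ravel(), yy.ravel(), xx.ravel(), yy.ravel()], axis=1)

base_anchors = make_base_anchors(base_size=8, scales=[4], ratios=[0.5, 1.0, 2.0])
shifts = make_shifts(feat_h=96, feat_w=144, stride=8)  # 96 * 144 = 13824 locations

# The naive sum fails for the same reason as in the traceback: (3, 4) vs (13824, 4).
try:
    base_anchors + shifts
    naive_sum_works = True
except ValueError:
    naive_sum_works = False

# Adding one axis to each operand broadcasts to (13824, 3, 4); flatten to a list of boxes.
all_anchors = (shifts[:, None, :] + base_anchors[None, :, :]).reshape(-1, 4)

print(naive_sum_works, base_anchors.shape, shifts.shape, all_anchors.shape)
```

With a single ratio, base_anchors has shape (1, 4) and the naive sum broadcasts without error, which is consistent with the observation that only one ratio works.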

lyuwenyu · 2021-08-04T03:19:13Z

Regarding "anchor_scales: [4] and anchor_ratios: [1.0]":
Have you tried other ratios, and do they produce the same error?

lyuwenyu · 2021-08-04T09:11:34Z

I recommend using the 8-card config file. With 4 cards, you need to modify the optimizer hyperparameters.
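
One common heuristic for adjusting the optimizer when the number of cards changes is the linear scaling rule: scale the learning rate in proportion to the total batch size. The sketch below assumes that rule. The function name is hypothetical and nothing here is taken from the PaddleDetection configs, so check the reference values in the config you start from.

```python
def scale_learning_rate(reference_lr, reference_gpus, gpus,
                        reference_batch_per_gpu=1, batch_per_gpu=1):
    """Scale a learning rate linearly with the total batch size (a heuristic, not a guarantee)."""
    reference_total = reference_gpus * reference_batch_per_gpu
    total = gpus * batch_per_gpu
    if reference_total <= 0 or total <= 0:
        raise ValueError("GPU counts and batch sizes must be positive")
    return reference_lr * total / reference_total

# If base_lr 0.00125 is tuned for 4 cards, as the "# for 4 card" comment in the
# config suggests, the rule gives these values for 1 and 8 cards.
lr_single_card = scale_learning_rate(0.00125, reference_gpus=4, gpus=1)
lr_eight_cards = scale_learning_rate(0.00125, reference_gpus=4, gpus=8)
print(lr_single_card, lr_eight_cards)
```

The rule is only a starting point; warmup length and the number of training steps may also need adjusting when the total batch size changes.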

jackie8310 · 2021-08-04T11:01:25Z

@lyuwenyu, I used the latest develop branch with single-card training, LR: 0.00125, 36 epochs. The loss decreases, but mAP(0.50, 11point) = 4.72%
[08/04 18:49:33] ppdet.engine INFO: Eval iter: 200
[08/04 18:49:45] ppdet.metrics.metrics INFO: The bbox result is saved to bbox.json.
[08/04 18:49:45] ppdet.metrics.metrics INFO: Accumulating evaluatation results...
[08/04 18:49:45] ppdet.metrics.metrics INFO: mAP(0.50, 11point) = 4.72%
I then changed the ratios as follows:
anchor_strides: [8, 16, 32, 64, 128]
anchor_scales: [4]
anchor_ratios: [1.0, 16.0]
This raises the error:
Traceback (most recent call last):
  File "tools/train.py", line 139, in <module>
    main()
  File "tools/train.py", line 135, in main
    run(FLAGS, cfg)
  File "tools/train.py", line 110, in run
    trainer.train(FLAGS.eval)
  File "/home/aistudio/PaddleDetection-develop/ppdet/engine/trainer.py", line 360, in train
    outputs = model(data)
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/fluid/dygraph/layers.py", line 898, in __call__
    outputs = self.forward(*inputs, **kwargs)
  File "/home/aistudio/PaddleDetection-develop/ppdet/modeling/architectures/meta_arch.py", line 26, in forward
    out = self.get_loss()
  File "/home/aistudio/PaddleDetection-develop/ppdet/modeling/architectures/s2anet.py", line 97, in get_loss
    loss = self._forward()
  File "/home/aistudio/PaddleDetection-develop/ppdet/modeling/architectures/s2anet.py", line 73, in _forward
    self.s2anet_head(body_feats)
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/fluid/dygraph/layers.py", line 898, in __call__
    outputs = self.forward(*inputs, **kwargs)
  File "/home/aistudio/PaddleDetection-develop/ppdet/modeling/heads/s2anet_head.py", line 441, in forward
    featmap_size, self.anchor_strides[feat_idx])
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/fluid/dygraph/layers.py", line 898, in __call__
    outputs = self.forward(*inputs, **kwargs)
  File "/home/aistudio/PaddleDetection-develop/ppdet/modeling/heads/s2anet_head.py", line 90, in forward
    all_anchors = self.base_anchors[:, :] + shifts[:, :]
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/fluid/dygraph/math_op_patch.py", line 250, in __impl__
    return math_op(self, other_var, 'axis', axis)
ValueError: (InvalidArgument) Broadcast dimension mismatch. Operands could not be broadcast together with the shape of X = [2, 4] and the shape of Y = [16384, 4]. Received [2] in X is not equal to [16384] in Y at i:0.
[Hint: Expected x_dims_array[i] == y_dims_array[i] || x_dims_array[i] <= 1 || y_dims_array[i] <= 1 == true, but received x_dims_array[i] == y_dims_array[i] || x_dims_array[i] <= 1 || y_dims_array[i] <= 1:0 != true:1.] (at /paddle/paddle/fluid/operators/elementwise/elementwise_op_function.h:169)
[operator < elementwise_add > error]

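Conclusion: only a single ratio can be set. Question: the box shapes in my dataset are unevenly distributed, and multiple ratios might fit them better and give a higher mAP. How should I configure this?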
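
While only a single ratio is accepted, one way to choose it is to look at the aspect-ratio distribution of the ground-truth boxes. The sketch below is an illustration under stated assumptions, not a PaddleDetection utility: it takes a list of hypothetical (width, height) pairs, which you would extract from your own annotation file, and summarizes the long-side / short-side ratio. How that number maps onto anchor_ratios (height / width or width / height) depends on the anchor generator and is not confirmed in this thread.

```python
import statistics

def aspect_ratio_summary(box_sizes):
    """Summarize long-side / short-side ratios for (width, height) pairs.

    Boxes with a non-positive side are skipped as invalid.
    """
    ratios = []
    for width, height in box_sizes:
        if width <= 0 or height <= 0:
            continue
        ratios.append(max(width, height) / min(width, height))
    if not ratios:
        raise ValueError("no valid boxes")
    ratios.sort()
    return {
        "count": len(ratios),
        "median": statistics.median(ratios),
        "min": ratios[0],
        "max": ratios[-1],
    }

# Hypothetical box sizes in pixels; the last entry is invalid and is skipped.
sizes = [(40, 10), (80, 20), (30, 30), (120, 10), (0, 5)]
summary = aspect_ratio_summary(sizes)
print(summary)
```

A wide gap between min and max indicates that a single ratio will match some boxes poorly, which is the concern raised above.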

yumeng88 · 2021-08-05T11:46:12Z

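I have run into the same problem. With single-card training and evaluation during training, the mAP is only about 3.5. Evaluating after training finishes is no better: I tested several datasets, one of which reached 4.1 while all the others gave 0. I have not solved this yet.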

lyuwenyu · 2021-08-06T08:30:27Z

Try the develop branch.

jackie8310 · 2021-08-07T08:34:52Z

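@lyuwenyu, I am already using the latest develop branch: single-card training, LR: 0.00125, 36 epochs. The loss decreases, but for both the recommended spine dataset and my own dataset the best result is mAP(0.50, 11point) = 4.72%.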

lyuwenyu added the "question" (Further information is requested) and "help wanted" (Extra attention is needed) labels on Aug 3, 2021
