Face detection Validation #1514
Please note the error log. This model needs a lot of GPU memory, and from the log there is not enough free memory on your GPU card. Maybe you can paste
It says: "Please shrink FLAGS_fraction_of_gpu_memory_to_use environment variable to a lower value." Actually, no matter how I adjust the value, the same problem occurs. And I have a 12GB GPU (Titan Xp), which seems enough for it. Also, where can I change the image as you suggested ("You also can try a small image for a testing")? I changed image_shape = [3, 1024, 1024] at line 302 of widerface_eval.py, but it doesn't work.
Yes, a 12GB GPU is enough for one test. Please make sure there is no other job running before you test.
The image shape is determined by the real input image, not by this setting in the
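For example, since the memory cost follows the real image size, one option is to downscale each image before it reaches the detector. A minimal sketch, assuming PIL is used to load images and that the resized image is then passed to detect_face from widerface_eval.py; the helper resize_for_memory is hypothetical:

```python
from PIL import Image

def resize_for_memory(image, max_side=1024):
    """Hypothetical helper: downscale the real input image so its longest
    side is at most max_side pixels, which lowers the GPU memory needed."""
    w, h = image.size
    scale = float(max_side) / max(w, h)
    if scale >= 1.0:
        return image  # already small enough
    return image.resize((int(w * scale), int(h * scale)), Image.BILINEAR)

# Example usage: shrink the real image before it is handed to the detector.
image = Image.open("path/to/a/wider_face_image.jpg").convert("RGB")
image = resize_for_memory(image, max_side=1024)
# det = detect_face(image, 1.0)  # detect_face is defined in widerface_eval.py
```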
@abcdvzz Is there any progress?
No. Could you please tell me where I can resize the input image, if it's not image_shape = [3, 1024, 1024] at line 302 of widerface_eval.py?
I met the same problem. Could you please tell me how to deal with it? I have four 1080 Ti GPUs, but I still hit this problem.
I cannot solve it, and I have had to use another version...
Has anyone solved this problem? I met the same problem too. Thanks a lot.
I met the same problem.
@LeLiu You can try the following solution:
@LeLiu, another suggestion: could you run nvidia-smi in your terminal to make sure the GPU device you are using has enough available memory? That is, check that no other processes are using the GPU.
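To make that check easy to repeat, here is a small sketch that prints free and used memory per GPU. It assumes nvidia-smi is installed and on the PATH, and is only an illustration, not code from this repository:

```python
import subprocess

# Query per-GPU memory with nvidia-smi (assumes the tool is available on PATH).
out = subprocess.check_output([
    "nvidia-smi",
    "--query-gpu=index,memory.free,memory.used",
    "--format=csv,noheader,nounits",
])
for line in out.decode().strip().splitlines():
    idx, free_mb, used_mb = [field.strip() for field in line.split(",")]
    print("GPU {}: {} MiB free, {} MiB used".format(idx, free_mb, used_mb))
```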
@chengduoZH @zhhsplendid
My code was working well on the CPU but crashed when using the GPU. The following is the error log.
run command:
export CUDA_VISIBLE_DEVICES=0
python -u train.py --batch_size=4 --pretrained_model=vgg_ilsvrc_16_fc_reduced --data_dir=/home/users/data/WIDERFACE/

dataset in /home/users/data/WIDERFACE/:
|-- wider_face_split
|   |-- readme.txt
|   |-- wider_face_test_filelist.txt
|   |-- wider_face_test.mat
|   |-- wider_face_train_bbx_gt.txt
|   |-- wider_face_train.mat
|   |-- wider_face_val_bbx_gt.txt
|   `-- wider_face_val.mat
|-- WIDER_test
|   `-- images
|-- WIDER_train
|   `-- images
`-- WIDER_val
    `-- images

log:
And although I set
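As a quick sanity check of the --data_dir layout quoted above, something like the following sketch (my own illustration, not from the repo) can confirm the expected folders and annotation files exist before training:

```python
import os

# Hypothetical check of the WIDER FACE layout shown above; adjust data_dir as needed.
data_dir = "/home/users/data/WIDERFACE/"
expected = [
    "wider_face_split/wider_face_train_bbx_gt.txt",
    "wider_face_split/wider_face_val_bbx_gt.txt",
    "WIDER_train/images",
    "WIDER_val/images",
    "WIDER_test/images",
]
for rel_path in expected:
    path = os.path.join(data_dir, rel_path)
    print("{} {}".format("OK  " if os.path.exists(path) else "MISS", path))
```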
@qingqing01 I didn't run the WIDERFACE face detection model. I ran code written by myself on private data and just hit the same problem as this issue; sorry I didn't make that clear. With batch sizes of 1, 4, 32, 64, and 128, it failed every time. Could it be an issue with the GPU/CUDA configuration (even though other programs that use CUDA work fine)?
@LeLiu Please note that the reader process uses Python multiprocessing; if it fails once, you must kill all of the worker processes. You can try running PyramidBox. Or, if it's convenient, you can send me your code and I will try it on my machine.
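To illustrate the "kill all the processes" step, here is a minimal sketch, assuming the stale reader workers were launched from train.py and that pgrep is available; inspect the PIDs before actually killing anything:

```python
import os
import signal
import subprocess

# Find leftover reader workers started from train.py (the script name is an assumption).
try:
    pids = subprocess.check_output(["pgrep", "-f", "train.py"]).decode().split()
except subprocess.CalledProcessError:
    pids = []  # pgrep exits with a non-zero status when nothing matches

for pid in pids:
    print("stale process: {}".format(int(pid)))
    # os.kill(int(pid), signal.SIGKILL)  # uncomment only after double-checking the PIDs
```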
Thank you again. I think I've solved this problem. I read the log carefully and found that a lot of memory was being used (more than 10GB). After simplifying the network and using a smaller batch size (8), the error disappeared.
But I'm still not sure whether I have the same problem as @abcdvzz, because I do not understand the log very well.
I have a problem with train.py. When I debug, it shows an error at this line
and it shows
and my nvidia
When I ran the validation code, I encountered this error. Please help me. I raised lots of questions yesterday. Please, or I'll be fired soon.
W1210 10:20:21.553225 13318 device_context.cc:203] Please NOTE: device: 0, CUDA Capability: 61, Driver Version: 9.2, Runtime Version: 9.0
W1210 10:20:21.553249 13318 device_context.cc:210] device: 0, cuDNN Version: 7.0.
W1210 10:20:22.738585 13318 system_allocator.cc:122] Cannot malloc 217.012 MB GPU memory. Please shrink FLAGS_fraction_of_gpu_memory_to_use environment variable to a lower value. Current value is 5e-06
W1210 10:20:22.738677 13318 legacy_allocator.cc:161] Cannot allocate 217.011719MB in GPU 0, available 201.375000MB
W1210 10:20:22.738684 13318 legacy_allocator.cc:164] total 12787122176
W1210 10:20:22.738692 13318 legacy_allocator.cc:165] GpuMinChunkSize 256.000000B
W1210 10:20:22.738700 13318 legacy_allocator.cc:168] GpuMaxChunkSize 59.314453kB
W1210 10:20:22.738708 13318 legacy_allocator.cc:171] GPU memory used: 902.250000kB
Traceback (most recent call last):
File "widerface_eval.py", line 317, in
infer(args, config)
File "widerface_eval.py", line 63, in infer
[det2, det3] = multi_scale_test(image, max_shrink)
File "widerface_eval.py", line 203, in multi_scale_test
det_b = detect_face(image, bt)
File "widerface_eval.py", line 121, in detect_face
return_numpy=False)
File "/usr/local/lib/python2.7/dist-packages/paddle/fluid/executor.py", line 472, in run
self.executor.run(program.desc, scope, 0, True, True)
RuntimeError: parallel_for failed: out of memory
terminate called after throwing an instance of 'paddle::platform::EnforceNotMet'
what(): cudaFree{Host} failed in GPUAllocator::Free.: an illegal memory access was encountered at [/paddle/paddle/fluid/memory/detail/system_allocator.cc:150]
PaddlePaddle Call Stacks:
0 0x7fa26295ce86p paddle::platform::EnforceNotMet::EnforceNotMet(std::__exception_ptr::exception_ptr, char const*, int) + 486
1 0x7fa2641fda0ap paddle::memory::detail::GPUAllocator::Free(void*, unsigned long, unsigned long) + 266
2 0x7fa2641fb922p paddle::memory::detail::BuddyAllocator::Free(void*) + 1122
3 0x7fa2641f78a5p paddle::memory::allocation::LegacyAllocator::Free(paddle::memory::allocation::Allocation*) + 69
4 0x7fa262960949p std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release() + 57
5 0x7fa262961cfdp paddle::framework::Variable::PlaceholderImpl<paddle::framework::LoDTensor>::~PlaceholderImpl() + 61
6 0x7fa26419999dp paddle::framework::Scope::~Scope() + 141
7 0x7fa2641998a1p paddle::framework::Scope::DropKids() + 81
8 0x7fa26419992dp paddle::framework::Scope::~Scope() + 29
9 0x7fa26295a80ap
*** Aborted at 1544408422 (unix time) try "date -d @1544408422" if you are using GNU date ***
PC: @ 0x0 (unknown)
*** SIGABRT (@0x3e800003406) received by PID 13318 (TID 0x7fa2b30c2700) from PID 13318; stack trace: ***
@ 0x7fa2b2cb9390 (unknown)
@ 0x7fa2b2913428 gsignal
@ 0x7fa2b291502a abort
@ 0x7fa2a891884d __gnu_cxx::__verbose_terminate_handler()
@ 0x7fa2a89166b6 (unknown)
@ 0x7fa2a89156a9 (unknown)
@ 0x7fa2a8916005 __gxx_personality_v0
@ 0x7fa2a8e37f83 (unknown)
@ 0x7fa2a8e38487 _Unwind_Resume
@ 0x7fa2641fbc75 paddle::memory::detail::BuddyAllocator::Free()
@ 0x7fa2641f78a5 paddle::memory::allocation::LegacyAllocator::Free()
@ 0x7fa262960949 std::_Sp_counted_base<>::_M_release()
@ 0x7fa262961cfd paddle::framework::Variable::PlaceholderImpl<>::~PlaceholderImpl()
@ 0x7fa26419999d paddle::framework::Scope::~Scope()
@ 0x7fa2641998a1 paddle::framework::Scope::DropKids()
@ 0x7fa26419992d paddle::framework::Scope::~Scope()
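For reference, the warning from system_allocator.cc above concerns the FLAGS_fraction_of_gpu_memory_to_use environment variable. Below is a minimal sketch of setting it (along with the GPU to use) before Paddle initializes, assuming the flag is picked up from the environment at import time; the value 0.5 and the device index are just examples:

```python
import os

# Set Paddle GPU flags before paddle.fluid is imported, so they take effect at init.
os.environ["CUDA_VISIBLE_DEVICES"] = "0"                   # pick one free GPU
os.environ["FLAGS_fraction_of_gpu_memory_to_use"] = "0.5"  # example value, tune as needed

import paddle.fluid as fluid  # noqa: E402  (imported after setting the flags on purpose)

place = fluid.CUDAPlace(0)
exe = fluid.Executor(place)
# ...build or load the inference program and run the widerface_eval.py logic as usual...
```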