This version changes the default I/O behavior to use pin_memory, removing the need to read
the whole image stack into memory.
This version also introduces gradient accumulation, which enables training on GPUs with less memory.
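
As a rough illustration of why gradient accumulation reduces memory requirements, the sketch below sums the gradients of several small micro-batches before applying a single update, so only one micro-batch needs to be processed at a time. It is not this project's training code: the one-parameter least-squares model and the names grad_mse, accumulated_step, and full_batch_step are hypothetical and written in plain Python so the example runs without any dependencies.

```python
# Hypothetical toy example: gradient accumulation for the model y ~ w * x.

def grad_mse(w, xs, ys):
    """Gradient of the mean squared error of y ~ w * x with respect to w."""
    n = len(xs)
    return sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) / n


def full_batch_step(w, xs, ys, lr):
    """One gradient-descent step computed from the whole batch at once."""
    return w - lr * grad_mse(w, xs, ys)


def accumulated_step(w, xs, ys, micro_batch_size, lr):
    """One step whose gradient is accumulated over small micro-batches."""
    n = len(xs)
    grad = 0.0
    for start in range(0, n, micro_batch_size):
        mx = xs[start:start + micro_batch_size]
        my = ys[start:start + micro_batch_size]
        # Weight each micro-batch by its share of the full batch so the
        # accumulated sum equals the full-batch mean gradient.
        grad += grad_mse(w, mx, my) * len(mx) / n
    # The parameter is updated only once, after all micro-batches are seen.
    return w - lr * grad


xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [2.0, 4.0, 6.0, 8.0, 10.0, 12.0]

w_full = full_batch_step(0.0, xs, ys, lr=0.01)
w_acc = accumulated_step(0.0, xs, ys, micro_batch_size=2, lr=0.01)
print(w_full, w_acc)
```

Both updates agree up to floating-point rounding, which is the point of the technique: the effective batch size stays the same while the peak memory use is set by the micro-batch size.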
The recommended training script in this version is train_multi; its usage is compatible with train_cv from the previous version.
train_multi also supports multi-body dynamics.