Commit 45fdd38

[Fea] Support tensorboardX and add corresponding guidance (#812)
* support tensorboardX for viv as demo and add tensorboardX guide in user_guide.md
* fix comma
1 parent 7c6f6aa commit 45fdd38

8 files changed, +97 -30 lines changed

docs/zh/examples/viv.md

+2 -2
@@ -130,9 +130,9 @@ examples/fsi/viv.py:53:54

 Next, we need to specify the number of training epochs and the learning rate. Based on experimental experience, we train for 10000 epochs and evaluate model accuracy once every 10000 epochs.

-``` yaml linenums="41"
+``` yaml linenums="42"
 --8<--
-examples/fsi/conf/viv.yaml:41:56
+examples/fsi/conf/viv.yaml:42:57
 --8<--
 ```

docs/zh/user_guide.md

+52 -8
@@ -274,7 +274,7 @@ pip install paddle2onnx
 [Paddle2ONNX] Start to parsing Paddle model...
 [Paddle2ONNX] Use opset_version = 13 for ONNX export.
 [Paddle2ONNX] PaddlePaddle model is exported as ONNX format now.
-[2024/03/02 05:47:51] ppsci MESSAGE: ONNX model has been exported to: ./inference/aneurysm.onnx
+ppsci MESSAGE: ONNX model has been exported to: ./inference/aneurysm.onnx
 ```

 ### 1.3 Model inference and prediction
@@ -410,6 +410,9 @@ PaddleScience provides multiple inference configuration combinations, which can be combined via the command line
 3. Run the inference function of `aneurysm.py` with TensorRT specified as the inference engine.

     ``` sh
+    # Set the target GPU before running; otherwise TensorRT may fail to start
+    export CUDA_VISIBLE_DEVICES=0
+
     python aneurysm.py mode=infer \
         INFER.device=gpu \
         INFER.engine=tensorrt \
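@@ -556,7 +559,47 @@ solver = ppsci.solver.Solver(
 solver.eval()
 ```

-### 1.7 Recording experiments with VisualDL
+### 1.7 Visualizing the experiment process
+
+#### 1.7.1 TensorBoardX
+
+[TensorBoardX](https://github.com/lanpa/tensorboardX) is a visualization and analysis tool built on TensorBoard. It uses a rich set of charts to present training parameter trends, data samples, model structure, PR curves, ROC curves, high-dimensional data distributions, and more, helping users understand the training process and structure of deep learning models clearly and intuitively so they can tune models efficiently.
+
+PaddleScience supports using TensorBoardX to record basic experiment data during training, including train/eval loss, eval metric, and learning rate. Follow the steps below to use this feature.
+
+1. Install TensorBoard and TensorBoardX
+
+    ``` sh
+    pip install tensorboard tensorboardX
+    ```
+
+2. Specify `use_tbd=True` when instantiating `Solver` in the example code, then start training
+
+    ``` py hl_lines="3"
+    solver = ppsci.solver.Solver(
+        ...,
+        use_tbd=True,
+    )
+    ```
+
+3. Visualize the recorded data
+
+    Following the steps above, TensorBoardX automatically records data during training and saves it to the `${solver.output_dir}/tensorboard` directory. The exact path is printed to the terminal when `Solver` is instantiated, as shown below.
+
+    ``` log hl_lines="3" hl_lines="2"
+    ppsci MESSAGE: TensorboardX tool is enabled for logging, you can view it by running:
+    tensorboard --logdir outputs_VIV/2024-01-01/08-00-00/tensorboard
+    ```
+
+    !!! tip
+
+        You can also run `tensorboard --logdir ./outputs_VIV` to display all training records under the `outputs_VIV` directory on one web page, which makes comparison easier.
+
+    Enter the visualization command above in a terminal and open the address given by TensorBoardX in a browser to view the recorded data, as shown in the figure below.
+
+    ![tensorboardx_preview](https://paddle-org.bj.bcebos.com/paddlescience/docs/user_guide/tensorboardx_preview.JPG)
+
+#### 1.7.2 VisualDL

 [VisualDL](https://www.paddlepaddle.org.cn/paddle/visualdl) is a visualization and analysis tool released by PaddlePaddle. It uses a rich set of charts to present training parameter trends, data samples, model structure, PR curves, ROC curves, high-dimensional data distributions, and more, helping users understand the training process and structure of deep learning models clearly and intuitively so they can tune models efficiently.
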
@@ -568,30 +611,31 @@ PaddleScience supports using VisualDL to record basic experiment data during training
     pip install -U visualdl
     ```

-2. Specify `use_visualdl=True` when instantiating `Solver` in the example code, then start training
+2. Specify `use_vdl=True` when instantiating `Solver` in the example code, then start training

     ``` py hl_lines="3"
     solver = ppsci.solver.Solver(
         ...,
-        use_visualdl=True,
+        use_vdl=True,
     )
     ```

 3. Visualize the recorded data

-    Following the steps above, VisualDL automatically records data during training and saves it in the `${solver.output_dir}/vdl` directory. The path of `vdl` is printed to the terminal when `Solver` is instantiated, as shown below.
+    Following the steps above, VisualDL automatically records data during training and saves it to the `${solver.output_dir}/vdl` directory. The exact path is printed to the terminal when `Solver` is instantiated, as shown below.

-    ``` log hl_lines="3"
+    ``` log hl_lines="4"
     Please NOTE: device: 0, GPU Compute Capability: 7.0, Driver API Version: 11.8, Runtime API Version: 11.6
     device: 0, cuDNN Version: 8.4.
-    ppsci INFO: VisualDL tool enabled for logging, you can view it by running: 'visualdl --logdir outputs_darcy2d/2023-10-08/10-00-00/TRAIN.epochs=400/vdl --port 8080'.
+    ppsci INFO: VisualDL tool enabled for logging, you can view it by running:
+    visualdl --logdir outputs_darcy2d/2023-10-08/10-00-00/TRAIN.epochs=400/vdl --port 8080
     ```

     Enter the visualization command above in a terminal and open the address given by VisualDL in a browser to view the recorded data, as shown in the figure below.

     ![visualdl_record](https://paddle-org.bj.bcebos.com/paddlescience/docs/user_guide/VisualDL_preview.png)

-### 1.8 Recording experiments with WandB
+#### 1.7.3 WandB

 [WandB](https://wandb.ai/) is a third-party experiment tracking tool that uploads experiment data to the user's private account while recording it, which prevents experiment records from being lost.
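
The tip added in section 1.7.1 says that pointing `tensorboard --logdir` at the parent `outputs_VIV` directory shows every run at once. As a rough illustration of why that works, the sketch below walks an output root and lists each run directory that contains a `tensorboard` subfolder. Only the `<output_dir>/tensorboard` layout comes from the commit; the helper name `find_tensorboard_runs` and the temporary directory tree are assumptions made for this example and are not part of PaddleScience.

```python
import tempfile
from pathlib import Path


def find_tensorboard_runs(output_root):
    """Return run directories under output_root that hold a 'tensorboard' folder.

    Hypothetical helper: it only mirrors the <output_dir>/tensorboard layout
    described in the guide and is not part of PaddleScience.
    """
    root = Path(output_root)
    runs = [p.parent for p in root.rglob("tensorboard") if p.is_dir()]
    # Report paths relative to the root, sorted for stable output.
    return sorted(r.relative_to(root).as_posix() for r in runs)


# Build a throwaway layout that imitates two timestamped runs.
with tempfile.TemporaryDirectory() as tmp:
    for run in ("2024-01-01/08-00-00", "2024-01-02/09-30-00"):
        (Path(tmp) / "outputs_VIV" / run / "tensorboard").mkdir(parents=True)
    found_runs = find_tensorboard_runs(Path(tmp) / "outputs_VIV")

print(found_runs)
```

Each entry corresponds to one run that TensorBoard would pick up when started from the parent directory.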

examples/fsi/conf/viv.yaml

+1
@@ -27,6 +27,7 @@ mode: train # running mode: train/eval
 seed: 42
 output_dir: ${hydra:run.dir}
 log_freq: 20
+use_tbd: false

 VIV_DATA_PATH: "./VIV_Training_Neta100.mat"

examples/fsi/viv.py

+1
@@ -111,6 +111,7 @@ def train(cfg: DictConfig):
         lr_scheduler,
         cfg.TRAIN.epochs,
         cfg.TRAIN.iters_per_epoch,
+        use_tbd=cfg.use_tbd,
         save_freq=cfg.TRAIN.save_freq,
         log_freq=cfg.log_freq,
         eval_during_train=cfg.TRAIN.eval_during_train,

ppsci/solver/printer.py

+2
@@ -103,6 +103,7 @@ def log_train_info(
         step=trainer.global_step,
         vdl_writer=trainer.vdl_writer,
         wandb_writer=trainer.wandb_writer,
+        tbd_writer=trainer.tbd_writer,
     )


@@ -145,4 +146,5 @@ def log_eval_info(
         step=trainer.global_step,
         vdl_writer=trainer.vdl_writer,
         wandb_writer=trainer.wandb_writer,
+        tbd_writer=trainer.tbd_writer,
     )
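
Both logging helpers now forward `trainer.tbd_writer` to `logger.scalar`. The following is a minimal sketch of what the receiving side does with it: each entry of the metric dict becomes one `add_scalar` call, and only the master rank writes. The names `RecordingWriter`, `log_scalars`, and the `is_master` flag are hypothetical stand-ins for `tensorboardX.SummaryWriter`, `ppsci.utils.logger.scalar`, and `misc.RankZeroOnly`, chosen so the example runs without any third-party package.

```python
class RecordingWriter:
    """Stand-in for a SummaryWriter-like object; stores calls instead of writing files."""

    def __init__(self):
        self.calls = []

    def add_scalar(self, tag, value, global_step=None):
        self.calls.append((tag, value, global_step))


def log_scalars(metric_dict, step, tbd_writer=None, is_master=True):
    """Write each metric to the writer, but only on the master rank (assumed flag)."""
    if tbd_writer is not None and is_master:
        for name, value in metric_dict.items():
            tbd_writer.add_scalar(name, value, global_step=step)


writer = RecordingWriter()
log_scalars({"train/loss": 0.5, "train/lr": 0.001}, step=20, tbd_writer=writer)
# Simulate a non-master rank: nothing should be recorded for this call.
log_scalars({"train/loss": 0.4}, step=40, tbd_writer=writer, is_master=False)
print(writer.calls)
```

Writing one scalar per metric name keeps each curve under its own tag, which is what lets TensorBoard plot `train/loss` and `train/lr` as separate charts.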

ppsci/solver/solver.py

+24 -2
@@ -70,6 +70,7 @@ class Solver:
         seed (int, optional): Random seed. Defaults to 42.
         use_vdl (Optional[bool]): Whether use VisualDL to log scalars. Defaults to False.
         use_wandb (Optional[bool]): Whether use wandb to log data. Defaults to False.
+        use_tbd (Optional[bool]): Whether use tensorboardX to log data. Defaults to False.
         wandb_config (Optional[Dict[str, str]]): Config dict of WandB. Defaults to None.
         device (Literal["cpu", "gpu", "xpu"], optional): Runtime device. Defaults to "gpu".
         equation (Optional[Dict[str, ppsci.equation.PDE]]): Equation dict. Defaults to None.
@@ -130,6 +131,7 @@ def __init__(
         seed: int = 42,
         use_vdl: bool = False,
         use_wandb: bool = False,
+        use_tbd: bool = False,
         wandb_config: Optional[Mapping] = None,
         device: Literal["cpu", "gpu", "xpu"] = "gpu",
         equation: Optional[Dict[str, ppsci.equation.PDE]] = None,
@@ -337,8 +339,8 @@ def dist_wrapper(model: nn.Layer) -> paddle.DataParallel:
                 if is_master:
                     self.vdl_writer = vdl.LogWriter(osp.join(output_dir, "vdl"))
             logger.info(
-                "VisualDL tool is enabled for logging, you can view it by "
-                f"running: 'visualdl --logdir {self.vdl_writer._logdir} --port 8080'."
+                "VisualDL is enabled for logging, you can view it by "
+                f"running:\nvisualdl --logdir {self.vdl_writer._logdir} --port 8080"
             )

         # set WandB tool
@@ -354,6 +356,25 @@ def dist_wrapper(model: nn.Layer) -> paddle.DataParallel:
                 if is_master:
                     self.wandb_writer = wandb.init(**wandb_config)

+        # set TensorBoardX tool
+        self.tbd_writer = None
+        if use_tbd:
+            try:
+                import tensorboardX
+            except ModuleNotFoundError:
+                raise ModuleNotFoundError(
+                    "Please install 'tensorboardX' with `pip install tensorboardX` first."
+                )
+            with misc.RankZeroOnly(self.rank) as is_master:
+                if is_master:
+                    self.tbd_writer = tensorboardX.SummaryWriter(
+                        osp.join(output_dir, "tensorboard")
+                    )
+            logger.message(
+                "TensorboardX is enabled for logging, you can view it by "
+                f"running:\ntensorboard --logdir {self.tbd_writer.logdir}"
+            )
+
         self.global_step = 0

         # log paddlepaddle's version
# log paddlepaddle's version
@@ -462,6 +483,7 @@ def train(self) -> None:
                         epoch_id,
                         self.vdl_writer,
                         self.wandb_writer,
+                        self.tbd_writer,
                     )

                 # visualize after evaluation
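
The `use_tbd` branch in `Solver.__init__` imports `tensorboardX` lazily and turns a missing package into an error that names the install command. A minimal sketch of that optional-dependency guard, using only the standard library, might look like the following. The helper name `import_optional` and the deliberately non-existent module name are assumptions for illustration, and the `from err` exception chaining is an addition that the commit itself does not use.

```python
import importlib


def import_optional(module_name, install_hint):
    """Import an optional dependency or raise with an actionable hint.

    Hypothetical helper that mirrors the try/except pattern in the hunk above.
    """
    try:
        return importlib.import_module(module_name)
    except ModuleNotFoundError as err:
        raise ModuleNotFoundError(
            f"Please install '{module_name}' with `{install_hint}` first."
        ) from err


# A standard-library module imports normally.
json_module = import_optional("json", "pip install <package>")

# A deliberately non-existent module name triggers the hint.
try:
    import_optional("not_a_real_module_xyz", "pip install not_a_real_module_xyz")
except ModuleNotFoundError as err:
    error_message = str(err)

print(error_message)
```

Deferring the import until the flag is set keeps `tensorboardX` out of the hard requirements for users who never enable it.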

ppsci/utils/logger.py

+12 -2
@@ -31,6 +31,7 @@
 if TYPE_CHECKING:
     import visualdl  # isort:skip
     import wandb  # isort:skip
+    import tensorboardX as tbd

 _logger: logging.Logger = None

@@ -200,6 +201,7 @@ def scalar(
     step: int,
     vdl_writer: Optional["visualdl.LogWriter"] = None,
     wandb_writer: Optional["wandb.run"] = None,
+    tbd_writer: Optional["tbd.SummaryWriter"] = None,
 ):
     """This function will add scalar data to VisualDL or WandB for plotting curve(s).

@@ -210,14 +212,22 @@ def scalar(
         wandb_writer (wandb.run): Run object of WandB to record metrics. Defaults to None.
     """
     if vdl_writer is not None:
-        for name, value in metric_dict.items():
-            vdl_writer.add_scalar(name, value, step)
+        with misc.RankZeroOnly() as is_master:
+            if is_master:
+                for name, value in metric_dict.items():
+                    vdl_writer.add_scalar(name, value, step)

     if wandb_writer is not None:
         with misc.RankZeroOnly() as is_master:
             if is_master:
                 wandb_writer.log({"step": step, **metric_dict})

+    if tbd_writer is not None:
+        with misc.RankZeroOnly() as is_master:
+            if is_master:
+                for name, value in metric_dict.items():
+                    tbd_writer.add_scalar(name, value, global_step=step)
+

 def advertise():
     """

ppsci/utils/symbolic.py

+3 -16
@@ -107,19 +107,6 @@
 }


-def _numerator_of_derivative(expr: sp.Basic) -> sp.Basic:
-    if not isinstance(expr, sp.Derivative):
-        raise TypeError(
-            f"expr({expr}) should be of type sp.Derivative, but got {type(expr)}"
-        )
-    if len(expr.args) <= 2:
-        if expr.args[1][1] == 1:
-            return expr.args[0]
-        return sp.Derivative(expr.args[0], (expr.args[1][0], expr.args[1][1] - 1))
-    else:
-        return sp.Derivative(*expr.args[:-1])
-
-
 def _cvt_to_key(expr: sp.Basic) -> str:
     """Convert sympy expression to a string key, mainly as retrieval key in dict.

@@ -585,7 +572,7 @@ def _visualize_graph(nodes: List[sp.Basic], graph_filename: str):
     }
     naming_counter = {k: 0 for k in SYMPY_BUILTIN_NAME}

-    def get_operator_name(node):
+    def get_operator_name(node: sp.Function):
         ret = f"{SYMPY_BUILTIN_NAME[node.func]}_{naming_counter[node.func]}"
         naming_counter[node.func] += 1
         return ret
@@ -601,8 +588,8 @@ def add_edge(u: str, v: str, u_color: str = C_DATA, v_color: str = C_DATA):
         Args:
             u (str): Name of begin node u.
             v (str): Name of end node v.
-            u_color (str, optional): _description_. Defaults to C_DATA.
-            v_color (str, optional): _description_. Defaults to C_DATA.
+            u_color (str, optional): Color of node u. Defaults to '#feb64d'.
+            v_color (str, optional): Color of node v. Defaults to '#feb64d'.
         """
         graph.add_node(u, style="filled", shape="ellipse", color=u_color)
         graph.add_node(v, style="filled", shape="ellipse", color=v_color)
