- PHENICX
- URMP

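These datasets are all fairly small, easy to download, multi-timbral and well suited to evaluation. They also consist entirely of recordings of real performances, which makes the results more convincing.
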
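First, the datasets have to be preprocessed. The audio is resampled to 22050 Hz, and the annotations are converted to a unified format (`.mid`). The MIDI files shipped with the datasets are not used because they encode the original score rather than the actual performance. Performance annotations are usually provided as tables. The processing for each dataset is in:
- [bach10.ipynb](bach10.ipynb)
- [phenicx.ipynb](phenicx.ipynb)
- [urmp.ipynb](urmp.ipynb)
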
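
The annotation-conversion step can be illustrated with a minimal sketch. It writes a note table (one row per note: onset and offset in seconds, MIDI pitch) to a format-0 Standard MIDI File, using only the Python standard library. The row layout, the fixed 120 BPM tempo, the velocity of 100 and all function names are assumptions for illustration. None of them are taken from the notebooks above.

```python
import struct

TICKS_PER_BEAT = 480          # assumed MIDI resolution
TEMPO_US_PER_BEAT = 500_000   # assumed fixed tempo: 120 BPM, i.e. one beat = 0.5 s


def seconds_to_ticks(t: float) -> int:
    """Convert a time in seconds to MIDI ticks at the fixed tempo above."""
    return round(t * 1_000_000 / TEMPO_US_PER_BEAT * TICKS_PER_BEAT)


def var_len(value: int) -> bytes:
    """Encode a non-negative integer as a MIDI variable-length quantity."""
    out = [value & 0x7F]
    value >>= 7
    while value:
        out.append((value & 0x7F) | 0x80)  # continuation bit on all but the last byte
        value >>= 7
    return bytes(reversed(out))


def notes_to_midi_bytes(notes, velocity=100, channel=0) -> bytes:
    """notes: iterable of (onset_s, offset_s, midi_pitch) rows from an annotation table."""
    # (tick, order, status, pitch, velocity); order 0 puts note-offs before note-ons at the same tick
    events = []
    for onset, offset, pitch in notes:
        events.append((seconds_to_ticks(onset), 1, 0x90 | channel, int(pitch), velocity))
        events.append((seconds_to_ticks(offset), 0, 0x80 | channel, int(pitch), 0))
    events.sort()

    track = bytearray()
    # Tempo meta event at tick 0: FF 51 03 followed by microseconds per beat (3 bytes)
    track += var_len(0) + b"\xff\x51\x03" + TEMPO_US_PER_BEAT.to_bytes(3, "big")
    last_tick = 0
    for tick, _, status, pitch, vel in events:
        track += var_len(tick - last_tick) + bytes([status, pitch, vel])  # delta time + event
        last_tick = tick
    track += var_len(0) + b"\xff\x2f\x00"  # end-of-track meta event

    # Header chunk: length 6, format 0, one track, ticks per quarter note
    header = b"MThd" + struct.pack(">IHHH", 6, 0, 1, TICKS_PER_BEAT)
    return header + b"MTrk" + struct.pack(">I", len(track)) + bytes(track)
```

For example, `open("example.mid", "wb").write(notes_to_midi_bytes([(0.0, 0.5, 60), (0.5, 1.0, 64)]))` writes two consecutive notes. With a fixed tempo, the seconds-to-ticks conversion stays linear. The performed timing from the table is therefore kept up to the tick resolution (about 1 ms here).
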
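Next, the raw model outputs are saved so they do not have to be recomputed each time the binarization threshold is tuned. Finally, the model is evaluated at the frame level. The best threshold is found by repeatedly subdividing the search interval and serves as a reference for the later note-creation step. For "timbre-independent transcription", frame-level evaluation is very simple; see [eval_basicamt.ipynb](eval_basicamt.ipynb). For "timbre-separated transcription", the estimated sources have to be matched to the reference instruments with PIT (permutation invariant training), i.e. by choosing the permutation with the highest score. "Highest score" can equivalently be replaced by "lowest error"; see [eval_septimbre.ipynb](eval_septimbre.ipynb).
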
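
Because the raw outputs are cached, each candidate threshold only costs a comparison. Below is a minimal sketch of the subdivision search. It assumes flattened activations in [0, 1], a flattened 0/1 reference, and frame-level F1 as the selection criterion; the text above does not say which metric is maximised. The search evaluates a coarse grid, shrinks the interval around the best value and repeats. `steps` and `rounds` are hypothetical defaults.

```python
from typing import Sequence, Tuple


def frame_f1(scores: Sequence[float], labels: Sequence[int], threshold: float) -> float:
    """Frame-level F1 of binarised activations against a 0/1 reference (both flattened)."""
    tp = fp = fn = 0
    for s, y in zip(scores, labels):
        pred = s >= threshold
        if pred and y:
            tp += 1
        elif pred:
            fp += 1
        elif y:
            fn += 1
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0


def refine_threshold(scores, labels, lo=0.0, hi=1.0, steps=10, rounds=4) -> Tuple[float, float]:
    """Coarse-to-fine search: scan a grid on [lo, hi], then zoom in around the best point."""
    best_t, best_f1 = lo, -1.0
    for _ in range(rounds):
        step = (hi - lo) / steps
        for i in range(steps + 1):
            t = lo + i * step
            f1 = frame_f1(scores, labels, t)
            if f1 > best_f1:
                best_t, best_f1 = t, f1
        # The next round only searches one grid step on either side of the current best.
        lo, hi = max(0.0, best_t - step), min(1.0, best_t + step)
    return best_t, best_f1
```

With 10 steps and 4 rounds, the search evaluates 44 thresholds and ends on a grid step of 0.0008. A uniform grid with the same step would need about 1250 evaluations. The trade-off is that a narrow optimum away from the coarse-grid maximum can be missed if F1 is not roughly unimodal in the threshold.
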
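Evaluation uses the `mir_eval` library. I installed the `mirdata` library, which lists `mir_eval` as a dependency. According to [mirdata:issue627](https://github.com/mir-dataset-loaders/mirdata/issues/627), you first need to install the latest `jams` from source (from its GitHub repository) and only then install `mirdata`. It seems, however, that a newer `pull request` has since removed `jams` from `mirdata`, so installing `mirdata` from the latest source may work much better.

Only frame-level evaluation was performed. Converting frames into notes introduces additional parameters, which would distort the assessment of the model's own capability.
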
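
The four metrics reported in the Results section can be sketched with the usual frame-level definitions: Acc = TP / (TP + FP + FN), P = TP / (TP + FP), R = TP / (TP + FN), and F1 as the harmonic mean of P and R. This is a simplified stand-in for `mir_eval.multipitch`, not a reimplementation of it. It compares sets of MIDI pitch numbers per frame instead of frequencies in Hz with a tolerance window. It also pools the counts over all frames, whereas the notebooks may instead average per track (this is not stated).

```python
from typing import Dict, List, Set


def frame_metrics(ref: List[Set[int]], est: List[Set[int]]) -> Dict[str, float]:
    """Frame-level multi-pitch metrics.

    ref / est: one set of active MIDI pitches per frame, on the same frame grid.
    True/false positives and misses are summed over all frames before taking ratios.
    """
    if len(ref) != len(est):
        raise ValueError("reference and estimate must share the same frame grid")
    tp = fp = fn = 0
    for r, e in zip(ref, est):
        tp += len(r & e)   # pitches active in both
        fp += len(e - r)   # estimated but not in the reference
        fn += len(r - e)   # in the reference but missed
    p = tp / (tp + fp) if tp + fp else 0.0
    rc = tp / (tp + fn) if tp + fn else 0.0
    return {
        "Acc": tp / (tp + fp + fn) if tp + fp + fn else 0.0,
        "P": p,
        "R": rc,
        "F1": 2 * p * rc / (p + rc) if p + rc else 0.0,
    }
```

For instance, `frame_metrics([{60}, {60, 64}, set()], [{60}, {64, 67}, {72}])` counts TP = 2, FP = 2 and FN = 1. This gives Acc = 0.4, P = 0.5, R = 2/3 and F1 = 4/7.
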
## Results
### Timbre-independent transcription
- **basicamt**: my "timbre-independent transcription" model
- **basicpitch**: the basicpitch model trained on my dataset
- **basicpitch_raw**: the model used in the basicpitch paper, installed via pip

<table border="2">
  <caption>Evaluation results of the timbre-independent transcription models</caption>
  <tr>
    <th>Dataset</th> <th>Metric</th> <th>basicpitch</th> <th>basicamt</th> <th>basicpitch_raw</th>
  </tr>
  <tr> <td rowspan="5">BACH10<br>ensemble</td>
    <td>Threshold</td>
    <td>0.212</td> <td>0.15312</td> <td>0.356</td>
  </tr>
  <tr>
    <td>Acc</td>
    <td>0.64909</td> <td>0.67135</td> <td>0.68033</td>
  </tr>
  <tr>
    <td>P</td>
    <td>0.77487</td> <td>0.80869</td> <td>0.81744</td>
  </tr>
  <tr>
    <td>R</td>
    <td>0.79957</td> <td>0.79789</td> <td>0.80097</td>
  </tr>
  <tr>
    <td>F1</td>
    <td>0.78689</td> <td>0.80291</td> <td><strong>0.80916</strong></td>
  </tr>
  <tr> <td rowspan="5">BACH10<br>all</td>
    <td>Threshold</td>
    <td>0.385</td> <td>0.30184</td> <td>0.47428</td>
  </tr>
  <tr>
    <td>Acc</td>
    <td>0.79255</td> <td>0.76612</td> <td>0.71582</td>
  </tr>
  <tr>
    <td>P</td>
    <td>0.88961</td> <td>0.87571</td> <td>0.83161</td>
  </tr>
  <tr>
    <td>R</td>
    <td>0.87691</td> <td>0.85777</td> <td>0.83598</td>
  </tr>
  <tr>
    <td>F1</td>
    <td><strong>0.87913</strong></td> <td>0.86087</td> <td>0.83157</td>
  </tr>
  <tr> <td rowspan="5">PHENICX<br>ensemble</td>
    <td>Threshold</td>
    <td>0.13912</td> <td>0.06464</td> <td>0.2032</td>
  </tr>
  <tr>
    <td>Acc</td>
    <td>0.33753</td> <td>0.42476</td> <td>0.26585</td>
  </tr>
  <tr>
    <td>P</td>
    <td>0.52686</td> <td>0.58628</td> <td>0.37882</td>
  </tr>
  <tr>
    <td>R</td>
    <td>0.48936</td> <td>0.60524</td> <td>0.46700</td>
  </tr>
  <tr>
    <td>F1</td>
    <td>0.50307</td> <td><strong>0.59512</strong></td> <td>0.41823</td>
  </tr>
  <tr> <td rowspan="5">URMP<br>ensemble</td>
    <td>Threshold</td>
    <td>0.19984</td> <td>0.13840</td> <td>0.30640</td>
  </tr>
  <tr>
    <td>Acc</td>
    <td>0.52076</td> <td>0.57102</td> <td>0.36206</td>
  </tr>
  <tr>
    <td>P</td>
    <td>0.68602</td> <td>0.74857</td> <td>0.49911</td>
  </tr>
  <tr>
    <td>R</td>
    <td>0.68401</td> <td>0.70437</td> <td>0.53963</td>
  </tr>
  <tr>
    <td>F1</td>
    <td>0.68058</td> <td><strong>0.72323</strong></td> <td>0.51706</td>
  </tr>
  <tr> <td rowspan="5">URMP<br>solo</td>
    <td>Threshold</td> <td>0.3848</td> <td>0.35183</td> <td>0.49</td>
  </tr>
  <tr>
    <td>Acc</td>
    <td>0.67248</td> <td>0.68630</td> <td>0.40660</td>
  </tr>
  <tr>
    <td>P</td>
    <td>0.82921</td> <td>0.85788</td> <td>0.54802</td>
  </tr>
  <tr>
    <td>R</td>
    <td>0.77676</td> <td>0.76998</td> <td>0.56436</td>
  </tr>
  <tr>
    <td>F1</td>
    <td>0.79619</td> <td><strong>0.80552</strong></td> <td>0.55520</td>
  </tr>
  <tr>
    <td>Parameters</td>
    <td>CQT<br>19944</td><td>56517<br>excl. CQT</td> <td>46564<br>incl. CQT</td>
    <td>27518<br>excl. CQT</td>
  </tr>
</table>

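> Note: the parameter count of `basicpitch_raw` was obtained by rebuilding the model in PyTorch from the architecture figure in the paper and counting its parameters. Strictly speaking, the counts for both basicpitch models (`basicpitch` and `basicpitch_raw`) should also include the CQT parameters. Their CQT parameters are not trained, however, so they are not added in the table.
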
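
Counting parameters by hand reduces to a per-layer formula. A 2-D convolution with C_in input channels, C_out output channels and a k_h × k_w kernel has C_out · C_in · k_h · k_w weights plus C_out biases. The sketch below applies this to a made-up stack of layers. Following the note above, it can leave fixed (untrained) layers out of the total. The layer shapes are hypothetical and are not the actual basic-pitch architecture. In PyTorch, the equivalent count of trained parameters is `sum(p.numel() for p in model.parameters() if p.requires_grad)`.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Conv2d:
    in_ch: int
    out_ch: int
    kh: int
    kw: int
    bias: bool = True
    trainable: bool = True  # False for fixed layers, e.g. an untrained front end

    def num_params(self) -> int:
        weights = self.out_ch * self.in_ch * self.kh * self.kw
        return weights + (self.out_ch if self.bias else 0)


def count_params(layers: List[Conv2d], trainable_only: bool = True) -> int:
    """Total parameter count, optionally skipping layers that are not trained."""
    return sum(layer.num_params() for layer in layers if layer.trainable or not trainable_only)
```

For example, `Conv2d(1, 16, 5, 5)` has 16 · 1 · 5 · 5 + 16 = 416 parameters.
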
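Comparing `basicamt` with `basicpitch` shows that my model is smaller yet stronger: it has fewer parameters and a higher F1 on four of the five test sets. Comparing `basicpitch` with `basicpitch_raw`, my dataset surprisingly seems to be better, since `basicpitch` also wins on four of the five test sets. In addition, the threshold of `basicpitch` is higher than that of `basicamt` on every test set, which is a consequence of the different choice of loss function.
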
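### Timbre-separated transcription
- **septimbre**: my "timbre-separated transcription" model

I have not yet found a comparable open-source model of the same kind. The frame-level evaluation results are as follows:
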
<table border="2">
  <caption>Evaluation results of the timbre-separated transcription model septimbre</caption>
  <thead>
    <tr>
      <th rowspan="1">Type</th>
      <th colspan="4">Timbre-independent transcription</th>
      <th colspan="4">Timbre-separated transcription</th>
    </tr>
    <tr>
      <th>Sources mixed</th>
      <th>Acc</th>
      <th>P</th>
      <th>R</th>
      <th>F1</th>
      <th>Acc</th>
      <th>P</th>
      <th>R</th>
      <th>F1</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>2</td>
      <td>0.680</td>
      <td>0.823</td>
      <td>0.789</td>
      <td>0.804</td>
      <td>0.419</td>
      <td>0.586</td>
      <td>0.558</td>
      <td>0.563</td>
    </tr>
    <tr>
      <td>3</td>
      <td>0.625</td>
      <td>0.786</td>
      <td>0.749</td>
      <td>0.766</td>
      <td>0.270</td>
      <td>0.436</td>
      <td>0.402</td>
      <td>0.407</td>
    </tr>
  </tbody>
</table>

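Performance gets worse as the number of mixed sources grows, and accuracy drops sharply once separation is applied, which is entirely expected. Even without separation, however, septimbre does not match basicamt. This suggests that the clustering loss interferes with the AMT loss.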
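
The separated-transcription scores depend on the PIT matching described earlier in this README. Each estimated source is assigned to a reference instrument, and the assignment with the lowest total error is kept. Below is a brute-force sketch. It assumes each source is a flattened 0/1 piano roll and uses the number of mismatched cells as the error. Both are assumptions, and the notebook may define the error differently; the function names are hypothetical.

```python
from itertools import permutations
from typing import List, Sequence, Tuple


def mismatch(a: Sequence[int], b: Sequence[int]) -> int:
    """Number of cells where two flattened binary piano rolls disagree."""
    return sum(x != y for x, y in zip(a, b))


def pit_match(refs: List[Sequence[int]], ests: List[Sequence[int]]) -> Tuple[Tuple[int, ...], int]:
    """Return (perm, error) where perm[i] is the estimate assigned to reference i.

    Tries all n! orderings, which is cheap for the 2- and 3-source mixes evaluated here.
    """
    if len(refs) != len(ests):
        raise ValueError("need as many estimated sources as reference sources")
    best_perm, best_err = None, None
    for perm in permutations(range(len(ests))):
        err = sum(mismatch(refs[i], ests[j]) for i, j in enumerate(perm))
        if best_err is None or err < best_err:
            best_perm, best_err = perm, err
    return best_perm, best_err
```

The total error is a sum of pairwise errors, so for many sources the same optimum can be found in polynomial time. To do this, build the pairwise error matrix and pass it to `scipy.optimize.linear_sum_assignment` instead of enumerating permutations.
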

## File structure
```
│  eval_basicamt.ipynb        [eval my timbre-independent transcription model]
│  eval_basicpitch.ipynb      [eval basic-pitch model trained with my data]
│  eval_basicpitch_raw.ipynb  [eval basic-pitch model trained by its author]
│  eval_septimbre.ipynb       [eval my timbre-separation transcription model]
│
│  bach10.ipynb               [pre-process of BACH10 dataset]
│  phenicx.ipynb              [pre-process of PHENICX dataset]
│  urmp.ipynb                 [pre-process of URMP dataset]
│
│  README.md                  [this file]
│
├─basicamt                    [from ./eval_basicamt.ipynb]
│  ├─BACH10_eval
│  │      01-AchGottundHerr@0.npy
│  │      ...
│  │      10-NunBitten@4.npy
│  │
│  ├─PHENICX_eval
│  │      ...
│  │      mozart.npy
│  │
│  └─URMP_eval
│          01_Jupiter_vn_vc@0.npy
│          ...
│          44_K515_vn_vn_va_va_vc@5.npy
│
├─basicpitch                  [from ./eval_basicpitch.ipynb]
│      ... (same as ./basicamt)
│
├─basicpitch_raw              [from ./eval_basicpitch_raw.ipynb]
│      ... (same as ./basicamt)
│
├─septimbre                   [from ./eval_septimbre.ipynb]
│  └─BACH10_eval
│      ├─01-AchGottundHerr_1&2
│      │      emb.npy
│      │      midi.npy
│      │      note.npy
│      │ ...
│      │
│      └─10-NunBitten_3&4
│              ...
│
├─BACH10_processed            [from ./bach10.ipynb]
│  ├─01-AchGottundHerr@0
│  │      01-AchGottundHerr.mid
│  │      01-AchGottundHerr.npy
│  │      01-AchGottundHerr.wav
│  │ ...
│  │
│  └─10-NunBitten@4
│          ...
│
├─PHENICX_processed           [from ./phenicx.ipynb]
│  ├─beethoven
│  │      beethoven.mid
│  │      beethoven.npy
│  │      beethoven.wav
│  │ ...
│  │
│  └─mozart
│          ...
│
└─URMP_processed              [from ./urmp.ipynb]
    ├─01_Jupiter_vn_vc@0
    │      01_Jupiter_vn_vc.mid
    │      01_Jupiter_vn_vc.npy
    │      01_Jupiter_vn_vc.wav
    │ ...
    │
    └─44_K515_vn_vn_va_va_vc@5
            5_vc_44_K515.mid
            5_vc_44_K515.npy
            5_vc_44_K515.wav
```
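
The sketch below shows how the two halves of this layout line up. Each cached model output, such as `basicamt/BACH10_eval/01-AchGottundHerr@0.npy`, is paired with the folder of the same name under the matching `*_processed` directory. Judging only from the tree above, it assumes that every processed folder holds exactly one `.npy` file. The file names inside a folder do not always repeat the folder name (see `44_K515_vn_vn_va_va_vc@5`), so the sketch globs for the file instead of rebuilding its name. `pair_outputs` is a hypothetical helper, and it does not load any data.

```python
from pathlib import Path
from typing import Dict


def pair_outputs(eval_dir: Path, processed_dir: Path) -> Dict[Path, Path]:
    """Map each cached output <eval_dir>/<name>.npy to the single .npy in <processed_dir>/<name>/."""
    pairs = {}
    for out_file in sorted(eval_dir.glob("*.npy")):
        folder = processed_dir / out_file.stem  # e.g. "01-AchGottundHerr@0"
        candidates = sorted(folder.glob("*.npy"))
        if len(candidates) != 1:
            raise FileNotFoundError(f"expected exactly one .npy in {folder}, found {len(candidates)}")
        pairs[out_file] = candidates[0]
    return pairs
```

A typical call would be `pair_outputs(Path("basicamt/URMP_eval"), Path("URMP_processed"))`, run from the directory this README lives in.
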