two pitch?

The first pitch is in sample(), as follows:
https://github.com/lucidrains/naturalspeech2-pytorch/blob/659bec7f7543e7747e809e950cc2f84242fbeec7/naturalspeech2_pytorch/naturalspeech2_pytorch.py#L1478-L1479

The second pitch is in forward() of NaturalSpeech2, as follows:
https://github.com/lucidrains/naturalspeech2-pytorch/blob/659bec7f7543e7747e809e950cc2f84242fbeec7/naturalspeech2_pytorch/naturalspeech2_pytorch.py#L1543-L1556

1. I think the first pitch comes from the prompt and the second pitch comes from the training data. Is that right?
2. I think the prompt is a small part of the training data. For example, if a training clip is 10 s long, the prompt is a 2 s segment taken from it. Is that right?
3. Since the prompt and the training data have the same input format, why is pitch calculated differently in the two places?
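
To make question 2 concrete, here is a minimal sketch of what I mean by taking a 2 s prompt out of a 10 s clip. `take_prompt_segment` is a hypothetical helper written only for this illustration; it is not a function from naturalspeech2-pytorch, and whether the repository actually selects prompts this way is exactly what I am asking. The toy sample rate of 100 Hz is also an assumption, chosen to keep the numbers small.

```python
def take_prompt_segment(audio, sample_rate, prompt_seconds, start_seconds=0.0):
    """Return a contiguous slice of `audio` to use as the prompt.

    Hypothetical helper for illustration only (not part of the repo).
    `audio` is a flat sequence of samples.
    """
    start = int(start_seconds * sample_rate)
    end = start + int(prompt_seconds * sample_rate)
    assert 0 <= start and end <= len(audio), "prompt must lie inside the clip"
    return audio[start:end]


sample_rate = 100                      # toy rate (assumption), real audio would be e.g. 24000
clip = [0.0] * (10 * sample_rate)      # stand-in for a 10 s training clip
prompt = take_prompt_segment(clip, sample_rate, prompt_seconds=2.0, start_seconds=3.0)
```

Under this reading, the prompt and the training clip are the same kind of data (raw audio), which is what motivates question 3.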

Snippet from forward() (second link):

	if not exists(pitch):
	    assert exists(audio) and audio.ndim == 2
	    assert exists(self.target_sample_hz)

	    if self.calc_pitch_with_pyworld:
	        pitch = compute_pitch_pyworld(
	            audio,
	            sample_rate = self.target_sample_hz,
	            hop_length = self.mel_hop_length
	        )
	    else:
	        pitch = compute_pitch_pytorch(audio, self.target_sample_hz)

	    pitch = rearrange(pitch, 'b n -> b 1 n')
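
As I read the forward() snippet, pitch is only computed from the audio when the caller has not already supplied it. The sketch below reproduces that fallback pattern in plain Python. `resolve_pitch` and `estimate_pitch_zero_crossings` are hypothetical names, and the zero-crossing estimate is only a crude stand-in for `compute_pitch_pyworld` / `compute_pitch_pytorch`: it returns a single value per clip, whereas the repository functions return a pitch contour per batch item.

```python
import math


def exists(val):
    # Same idea as the `exists` helper used in the snippet: "is not None".
    return val is not None


def estimate_pitch_zero_crossings(audio, sample_rate):
    """Crude stand-in pitch estimator (assumption, not the repo's method).

    Counts sign changes; a periodic signal crosses zero twice per cycle.
    """
    crossings = sum(1 for a, b in zip(audio, audio[1:]) if (a < 0) != (b < 0))
    duration = len(audio) / sample_rate
    return crossings / (2 * duration)


def resolve_pitch(pitch=None, audio=None, sample_rate=None):
    """Use the given pitch if present, otherwise derive it from the audio."""
    if not exists(pitch):
        assert exists(audio) and exists(sample_rate)
        pitch = estimate_pitch_zero_crossings(audio, sample_rate)
    return pitch


# Toy input: a 5 Hz sine sampled at 1000 Hz for one second.
sr = 1000
sine = [math.sin(2 * math.pi * 5 * i / sr + 0.1) for i in range(sr)]

computed = resolve_pitch(audio=sine, sample_rate=sr)   # derived from audio
passed_in = resolve_pitch(pitch=123.0)                 # caller-supplied value wins
```

In sample() there is no target audio to measure, so this fallback cannot apply there; the pitch instead comes out of `self.duration_pitch(phoneme_enc, prompt_enc)`.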

two pitch? #40

Snippet from sample() (first link):

	duration, pitch = self.duration_pitch(phoneme_enc, prompt_enc)
	pitch = rearrange(pitch, 'b n -> b 1 n')
