two pitch? #40

@a897456

The first pitch appears in sample(), as follows:

duration, pitch = self.duration_pitch(phoneme_enc, prompt_enc)
pitch = rearrange(pitch, 'b n -> b 1 n')

The second pitch appears in the forward() of NaturalSpeech2, as follows:

if not exists(pitch):
    assert exists(audio) and audio.ndim == 2
    assert exists(self.target_sample_hz)
    if self.calc_pitch_with_pyworld:
        pitch = compute_pitch_pyworld(
            audio,
            sample_rate = self.target_sample_hz,
            hop_length = self.mel_hop_length
        )
    else:
        pitch = compute_pitch_pytorch(audio, self.target_sample_hz)
    pitch = rearrange(pitch, 'b n -> b 1 n')

  1. As I understand it, the first pitch comes from the prompt and the second pitch comes from the training data. Is that right?
  2. I also assume the prompt is a small part of the training data; for example, if a training clip is 10 s long, the prompt takes 2 s of it. Is that right?
  3. Since the prompt and the training data have the same input format, why is the pitch calculated differently in the two cases?
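
To make the question concrete, here is a minimal sketch of how I read the two paths. It rests on an assumption that is not confirmed here: sample() has no target audio, so pitch must be predicted, while forward() has the training audio and can extract pitch from it. The stub functions, their names, and their constant return values are hypothetical stand-ins, not the repository's code; only the final reshape mirrors `rearrange(pitch, 'b n -> b 1 n')` from both snippets.

```python
from typing import List

Batch2D = List[List[float]]        # shape (b, n): one pitch value per frame
Batch3D = List[List[List[float]]]  # shape (b, 1, n)


def add_channel_dim(pitch: Batch2D) -> Batch3D:
    # Pure-Python equivalent of rearrange(pitch, 'b n -> b 1 n')
    return [[row] for row in pitch]


def predict_pitch_stub(num_frames: int, batch: int = 1) -> Batch2D:
    # Hypothetical stand-in for self.duration_pitch(phoneme_enc, prompt_enc):
    # returns a dummy constant contour instead of a learned prediction.
    return [[100.0] * num_frames for _ in range(batch)]


def extract_pitch_stub(audio: Batch2D, hop_length: int) -> Batch2D:
    # Hypothetical stand-in for compute_pitch_pyworld / compute_pitch_pytorch:
    # yields one dummy value per hop of audio samples.
    return [[0.0] * (len(wave) // hop_length) for wave in audio]


def pitch_at_sampling(num_frames: int) -> Batch3D:
    # Assumed sample() path: no target audio exists, so pitch is predicted.
    return add_channel_dim(predict_pitch_stub(num_frames))


def pitch_at_training(audio: Batch2D, hop_length: int) -> Batch3D:
    # Assumed forward() path: pitch is measured from the training audio.
    return add_channel_dim(extract_pitch_stub(audio, hop_length))


if __name__ == "__main__":
    sampled = pitch_at_sampling(num_frames=3)
    trained = pitch_at_training([[0.0] * 8], hop_length=4)
    # Both paths end with the same (b, 1, n) layout.
    print(len(sampled), len(sampled[0]), len(sampled[0][0]))
    print(len(trained), len(trained[0]), len(trained[0][0]))
```

If this reading is correct, the two snippets differ only in where the pitch values come from, not in the shape handed to the rest of the model.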
