Using text prompts to control rhythm features in singing voice synthesis (SVS) offers a convenient method for non-professional musicians to generate a target voice. However, due to the diversity and ...
Abstract: We propose a novel description-based controllable text-to-speech (TTS) method with cross-lingual control capability. To address the lack of audio-description paired data in the target ...