IndexTTS2 Thai LoRA

Zero-Shot Voice Cloning for Thai Speech

About

This page demonstrates the voice cloning quality of our Thai LoRA fine-tune of IndexTTS2. The model was trained on Thai speech data and supports zero-shot voice cloning for Thai text, while retaining the original model's English and Chinese capabilities.

For each speaker, we show:

  • Reference: the original voice clip used as the speaker prompt (TH001.wav)
  • Same Text: TTS generation of the reference clip's transcription (reconstruction test)
  • Different Text: TTS generation of a fixed Thai sentence, "ปัญญาประดิษฐ์กำลังเปลี่ยนแปลงวิถีชีวิตและการทำงานของเรา" (English: "Artificial intelligence is changing the way we live and work")
  • EN Generated: English TTS using the Thai voice reference (cross-lingual test, bilingual speakers only)
  • EN Groundtruth: the speaker's actual English recording, for comparison with EN Generated (bilingual speakers only)

All generated audio uses seed=42 for reproducibility.
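
The evaluation layout described above can be sketched as a small job manifest. This is only an illustration, not the script used to produce this page: the `Job` class, the `build_jobs` helper, the condition names, and the placeholder texts are all hypothetical, and the English sentence used for each bilingual speaker is not specified here, so it is passed in as an assumed input. Only the speaker ranges, the fixed Thai sentence, and seed=42 come from the page itself.

```python
from dataclasses import dataclass

SEED = 42  # stated on this page
FIXED_THAI_TEXT = "ปัญญาประดิษฐ์กำลังเปลี่ยนแปลงวิถีชีวิตและการทำงานของเรา"

THAI_ONLY = range(1, 13)   # speakers 1-12
BILINGUAL = range(13, 25)  # speakers 13-24


@dataclass(frozen=True)
class Job:
    """One TTS generation request (hypothetical structure)."""
    speaker: int
    condition: str  # "same_text", "different_text", or "en_generated"
    text: str
    seed: int = SEED


def build_jobs(transcripts, english_texts):
    """Enumerate the generations listed on this page.

    transcripts:   speaker -> transcription of that speaker's reference clip
    english_texts: speaker -> English sentence for the cross-lingual test
                   (assumed input; its source is not stated on this page)
    """
    jobs = []
    for speaker in list(THAI_ONLY) + list(BILINGUAL):
        # Reconstruction test: re-synthesize the reference clip's own text.
        jobs.append(Job(speaker, "same_text", transcripts[speaker]))
        # Unseen-text test: the same fixed Thai sentence for every speaker.
        jobs.append(Job(speaker, "different_text", FIXED_THAI_TEXT))
        # Cross-lingual test applies to bilingual speakers only.
        if speaker in BILINGUAL:
            jobs.append(Job(speaker, "en_generated", english_texts[speaker]))
    return jobs


# Placeholder texts stand in for the real transcriptions.
transcripts = {s: f"<transcript of speaker {s}>" for s in range(1, 25)}
english_texts = {s: f"<English text for speaker {s}>" for s in BILINGUAL}
jobs = build_jobs(transcripts, english_texts)
print(len(jobs), "generation jobs")
```

Every job carries the same seed, so under this sketch a rerun with identical inputs would request identical sampling conditions for each clip.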

Thai-Only Speakers (1–12)

Speaker | Gender | Reference | Same Text | Different Text

Bilingual Speakers (13–24)

Speaker | Gender | Reference | Same Text | Different Text | EN Generated | EN Groundtruth