SSML

Speech Synthesis Markup Language (SSML) is an XML-based standard that provides fine-grained control over how synthesised speech sounds. SSML tags specify pauses, emphasis, speaking rate, pitch variations, pronunciation of abbreviations or numbers, and the insertion of pre-recorded audio clips. For example, SSML can instruct a TTS engine to spell out an account number digit by digit rather than reading it as a whole number, or to pause before delivering important information. SSML is essential for producing professional-quality, brand-appropriate voice interactions. NiCE Cognigy Voice Gateway supports full SSML, enabling precise voice design across all supported TTS engines.
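The account-number example above can be sketched in a few lines of Python. This is a minimal illustration, not a NiCE Cognigy API: the function name `build_account_prompt` and the sample number are hypothetical, and the `interpret-as="characters"` value is an assumption, since the accepted `say-as` values vary by TTS engine. Some platforms also expect only the inner fragment rather than a full `<speak>` document, so check the engine's documentation before relying on this form.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def build_account_prompt(account_number: str, pause_ms: int = 500) -> str:
    """Build SSML that pauses, then reads the account number one character at a time.

    Hypothetical helper for illustration; the say-as value "characters" is
    commonly supported, but engines differ in which values they accept.
    """
    safe_number = escape(account_number)  # escape &, <, > so the markup stays well-formed
    return (
        '<speak version="1.1" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-US">'
        "Your account number is"
        f'<break time="{pause_ms}ms"/>'
        f'<say-as interpret-as="characters">{safe_number}</say-as>.'
        "</speak>"
    )


ssml = build_account_prompt("40721")
ET.fromstring(ssml)  # raises xml.etree.ElementTree.ParseError if the markup is not well-formed
print(ssml)
```

Parsing the generated string before sending it to a TTS engine is a cheap way to catch malformed markup early, because many engines reject or silently ignore SSML that is not well-formed XML.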

For enterprise teams, the value of SSML depends on how it is integrated, governed, and measured, not just on the underlying TTS technology.

Key Points

  • XML-based markup language for precise control of synthesised speech behaviour
  • Controls pauses, emphasis, speed, pitch, pronunciation, and audio insertion
  • Enables natural-sounding number and abbreviation reading in voice conversations
  • Essential for professional, brand-consistent voice interaction design
  • Fully supported in NiCE Cognigy Voice Gateway across all TTS engine integrations