A COMPREHENSIVE ANALYSIS OF ARTIFICIAL SPEECH SYNTHESIS TECHNOLOGIES, THEIR CLASSIFICATION, AND EXISTING APPROACHES AND ALGORITHMS FOR DISTINGUISHING HUMAN SPEECH FROM SYNTHETIC SPEECH

Authors

  • S. U. Nasirov, Tashkent University of Information Technologies named after Muhammad al-Khorezmi

Keywords:

Speech synthesis, Text-to-Speech, Zero-shot TTS, Wolfgang von Kempelen, Neural audio codec, Assistive communication

Abstract

This article traces the history of speech synthesis from Wolfgang von Kempelen’s 18th-century mechanical speech device (often overshadowed by his deceptive “Mechanical Turk”) to modern zero-shot text-to-speech (TTS) systems. Whereas early mechanical synthesizers required manual manipulation, contemporary TTS uses large-scale multi-speaker datasets, neural audio codecs, and language models to generate speech in unseen voices from short reference samples. The article also highlights key applications of TTS, including assistive communication (for example, the synthesizer used by Stephen Hawking), hands-free reading, and human-computer interaction.

Published

2026-06-14