What is a realistic voice for text-to-speech

In the early days of text-to-speech (TTS), mechanical devices built from tubes and chambers were used to emulate the human voice. Wolfgang von Kempelen’s “acoustic-mechanical speech machine” from the 18th century is often cited as an important precursor to text-to-speech technology, but it was far from producing a realistic voice. The development of formant synthesis in the mid-20th century made it possible to simulate the resonances of the human vocal tract electronically, but the results still sounded quite robotic.

Fast forward to the 21st century: advances in neural networks led to systems trained on large datasets of recorded human speech, and these systems produce voices that sound convincingly human. Today, neural TTS is widely used in accessibility tools, virtual assistants, customer service, audiobooks, and more, and the software is advanced enough to generate both masculine and feminine voices.