
AI that simulates voices with three-second samples

The technology first analyses how a person sounds and then breaks that information into discrete components

Mathures Paul Published 12.01.23, 02:50 PM

At a time when the world is waking up to ChatGPT, the power of AI is getting stronger on the text-to-speech front. Microsoft researchers have announced a new AI model called VALL-E that can simulate a person’s voice when given a three-second audio sample. This comes days after Apple quietly launched a feature called “digital narration” (not yet available in India) for its Books app, which lets users listen to written titles as audiobooks narrated by artificial intelligence. It’s a feature that may upend the fast-growing audiobook market.

VALL-E generates discrete audio codec codes from text and acoustic prompts. The technology first analyses how a person sounds, breaks that information into discrete components, and then uses its training data to match what it “knows” about how that voice would sound beyond the three-second sample.
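For readers curious about the mechanics, the Python sketch below illustrates the general idea described above: a neural audio codec turns the three-second voice sample into a sequence of discrete codes, a language model conditioned on the target text continues that code sequence, and a decoder turns the new codes back into audio. This is not Microsoft’s actual code; the codebook size, frame rate and every function here are simplified, hypothetical stand-ins.

```python
import numpy as np

# Assumed values, loosely inspired by neural audio codecs; the real VALL-E
# components are not publicly available, so everything below is a stand-in.
CODEBOOK_SIZE = 1024   # number of discrete codes per codec quantiser (assumed)
FRAMES_PER_SEC = 75    # codec frames per second of audio (assumed)
SAMPLE_RATE = 16000    # audio sample rate in Hz (assumed)

def encode_to_codec_codes(waveform: np.ndarray) -> np.ndarray:
    """Stand-in for a neural codec encoder: maps a waveform to a sequence of
    discrete code indices, one per codec frame."""
    n_frames = max(1, int(len(waveform) / SAMPLE_RATE * FRAMES_PER_SEC))
    # A real codec quantises learned latents; here we just derive dummy codes.
    rng = np.random.default_rng(abs(int(waveform[:10].sum() * 1e6)) % (2**32))
    return rng.integers(0, CODEBOOK_SIZE, size=n_frames)

def continue_codes(text: str, prompt_codes: np.ndarray, n_new: int) -> np.ndarray:
    """Stand-in for the language model: given the text to be spoken and the
    codec codes of the 3-second prompt, autoregressively predict the codec
    codes for the rest of the utterance in the same voice."""
    rng = np.random.default_rng(hash(text) % (2**32))
    generated = list(prompt_codes)
    for _ in range(n_new):
        # A real model samples each next code from a learned distribution
        # conditioned on the text and on all codes generated so far.
        generated.append(int(rng.integers(0, CODEBOOK_SIZE)))
    return np.array(generated[len(prompt_codes):])

def decode_codes_to_audio(codes: np.ndarray) -> np.ndarray:
    """Stand-in for the codec decoder: turns code indices back into audio."""
    return np.zeros(int(len(codes) / FRAMES_PER_SEC * SAMPLE_RATE))

# Pipeline: 3-second voice sample -> codec codes -> LM continuation -> audio.
sample = np.random.randn(3 * SAMPLE_RATE)        # pretend 3 s of speech
prompt_codes = encode_to_codec_codes(sample)
new_codes = continue_codes("Hello from the cloned voice.", prompt_codes,
                           n_new=5 * FRAMES_PER_SEC)
audio_out = decode_codes_to_audio(new_codes)
print(len(prompt_codes), "prompt codes ->", len(new_codes), "generated codes")
```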


The upside of the technology is better text-to-speech accessibility for people with visual impairment. But what about a future where podcasts are streamed with guests that are AI models?

To keep potential misuse in check, Microsoft has not released VALL-E’s code publicly for people to experiment with.
