OpenAI’s voice cloning AI model only needs a 15-second sample to work

A rendition of OpenAI’s logo, which looks like a stylized whirlpool. Illustration: The Verge

OpenAI is offering limited access to a text-to-voice generation platform it developed called Voice Engine, which can create a synthetic voice based on a 15-second clip of someone’s voice. The AI-generated voice can read out text prompts on command in the same language as the speaker or in a number of other languages. “These small scale deployments are helping to inform our approach, safeguards, and thinking about how Voice Engine could be used for good across various industries,” OpenAI said in its blog post.

Companies with access include the education technology company Age of Learning, visual storytelling platform HeyGen, frontline health software maker Dimagi, AI communication app creator Livox, and health system Lifespan.

In these samples posted by OpenAI, you can hear what Age of Learning has been doing with the technology to generate pre-scripted voice-over content, as well as reading out “real-time, personalized responses” to students written by GPT-4.

First, the reference audio in English:

And here are three AI-generated audio clips based on that sample:

OpenAI said it began developing Voice Engine in late 2022 and that the technology has already powered preset voices for the text-to-speech API and ChatGPT’s Read Aloud feature. In an interview with TechCrunch, Jeff Harris, a member of OpenAI’s product team for Voice Engine, said the model was trained on “a mix of licensed and publicly available data.” OpenAI told the publication the model will only be available to about 10 developers.

AI text-to-audio generation is an area of generative AI that’s continuing to evolve. While most efforts focus on instrumental or natural sounds, fewer have focused on voice generation, partially due to the questions OpenAI cited. Some names in the space include companies like Podcastle and ElevenLabs, which provide AI voice cloning technology and tools the Vergecast explored last year.

At the same time, the US government is trying to curb unethical uses of AI voice technology. Last month, the Federal Communications Commission banned robocalls using AI voices after people received spam calls from an AI-cloned voice of President Joe Biden.

According to OpenAI, its partners agreed to abide by its usage policies that say they will not use Voice Generation to impersonate people or organizations without their consent. It also requires the partners to get the “explicit and informed consent” of the original speaker, not build ways for individual users to create their own voices, and to disclose to listeners that the voices are AI-generated. OpenAI also added watermarking to the audio clips to trace their origin and actively monitor how the audio is used.

OpenAI suggested several steps that it thinks could limit the risks around tools like these, including phasing out voice-based authentication to access bank accounts, policies to protect the use of people’s voices in AI, greater education on AI deepfakes, and development of tracking systems for AI content.
