🚀 Infinite Custom Voices, AI Music, & Sound Effects with the ElevenLabs API 🚀
🎙️ Infinite AI Audio: Custom Voices, Music, & Sound Effects with ElevenLabs
This week, Startup School delivered an “off the hook” session featuring our partner, ElevenLabs. Thor (Developer Experience) walked us through their V3 engine and the wide range of things you can build with generative audio. This is a must-read for anyone building next-gen applications! 👇
🧠 Part 1: The V3 Engine & Expressive Speech
The core of ElevenLabs’ offering is the Eleven V3 model, which aims to solve the problem of flat, one-dimensional AI voices. The idea grew out of the founder’s own “childhood trauma” with poor movie dubbing in Poland.
🎭 Directing the Voice Actor
The V3 model is described as the most expressive and human-like text-to-speech engine available. It can be controlled using director-like natural language tags:
Emotion & Tone: Use tags like [laughter], [whispering], or [sigh] to direct the delivery and add non-verbal sounds.
Pacing & Accent: The model naturally incorporates characteristics like tone, accent, inflection, pitch, emotion, and pacing to create realistic dialogue.
Multimodal Prompts: You can mix sound effects directly into your speech prompt, such as the sound of a cheering crowd in the sports commentary demo.
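To make the tag-based directing above concrete, here is a minimal sketch of how a tagged script could be packaged into a text-to-speech request. It assumes the public REST endpoint POST /v1/text-to-speech/{voice_id}, the xi-api-key header, and the model ID eleven_v3; none of these were spelled out in the session, so check them against the current API reference. The helper name build_tts_request and the placeholder voice ID and key are hypothetical.

```python
import json
import urllib.request

API_BASE = "https://api.elevenlabs.io/v1"  # assumed base URL


def build_tts_request(text: str, voice_id: str, api_key: str,
                      model_id: str = "eleven_v3") -> urllib.request.Request:
    """Prepare (but do not send) a text-to-speech request.

    The model ID and endpoint path are assumptions, not confirmed by the recap.
    """
    payload = {"text": text, "model_id": model_id}
    return urllib.request.Request(
        url=f"{API_BASE}/text-to-speech/{voice_id}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"xi-api-key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


# Director-style tags are written inline, in square brackets, inside the text.
script = "[whispering] I have a secret. [laughter] Just kidding! [sigh] Back to work."

request = build_tts_request(
    script,
    voice_id="YOUR_VOICE_ID",  # placeholder
    api_key="YOUR_API_KEY",    # placeholder
)

# Sending is left commented out so the sketch runs without a key or network:
# with urllib.request.urlopen(request) as response:
#     with open("line.mp3", "wb") as f:
#         f.write(response.read())
```

The point of the sketch is that the "direction" lives in the text itself: there is no separate emotion parameter, only tags embedded in the prompt.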
✍️ Real-Time Transcription: Scribe
Scribe is a brand-new model that offers real-time, high-accuracy transcription in more than 90 languages. A demo combined it with Chrome’s built-in AI features to show its potential for live captioning and instant translation.
🗣️ Part 2: Infinite Custom Voices
The session showcased how ElevenLabs provides a massive library of voices while giving developers the tools to create their own.
Voice Cloning: Users can clone their own voice and use it to generate speech, a feature used by platforms like HeyGen and Tavus.
Iconic Voices Marketplace: A large library of licensed voices, including legendary figures like Judy Garland, Richard Feynman, and Michael Caine, is available for licensing in commercial applications.
✨ Voice Design: This feature is the key to infinite voices. Users can generate a completely new, unique voice simply by describing its characteristics (e.g., “an older woman with a thick southern accent, sweet and sarcastic”).
Voice Changer: An API endpoint allows developers to take an existing audio clip and re-generate it using a different voice while preserving the original speech’s direction and flow.
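As a rough illustration of the Voice Design idea from the list above, the sketch below turns a plain-language description into a request for voice previews. The endpoint path /v1/text-to-voice/create-previews and the field names voice_description and text are assumptions based on the public API and were not shown in the session; the helper name build_voice_design_request and the sample line are hypothetical.

```python
import json
import urllib.request

API_BASE = "https://api.elevenlabs.io/v1"  # assumed base URL


def build_voice_design_request(description: str, sample_text: str,
                               api_key: str) -> urllib.request.Request:
    """Prepare (but do not send) a request for voice previews from a description."""
    if not description.strip():
        raise ValueError("description must not be empty")
    payload = {
        "voice_description": description,  # assumed field name
        "text": sample_text,               # assumed field name
    }
    return urllib.request.Request(
        url=f"{API_BASE}/text-to-voice/create-previews",  # assumed path
        data=json.dumps(payload).encode("utf-8"),
        headers={"xi-api-key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


request = build_voice_design_request(
    description="An older woman with a thick southern accent, sweet and sarcastic",
    sample_text="Well, bless your heart, honey. You really thought that would work?",
    api_key="YOUR_API_KEY",  # placeholder
)
```

Each distinct description yields a different voice, which is what makes the supply of voices effectively unlimited. The request is only built here, not sent, so the sketch runs without credentials.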
🎧 Part 3: Royalty-Free Music & Sound Effects
ElevenLabs is focused on all aspects of AI audio, including generating ambient sound and commercial music.
Sound Effects: Generate any sound effect by description, including “infinite sound effects” that are designed to loop seamlessly (e.g., rain on a tent, ocean waves) for meditation or ambient apps. All sound effects generated are royalty-free for use in your applications.
Music Generation: The music model is billed as an industry first because it is trained exclusively on fully licensed data. This means all music generated can be used in your application without licensing fees or royalties. A demo showed a Jingle Maker that scrapes a website to create a targeted, funky rap jingle.
Stem Separation: For musicians, an API is available to separate an audio file into its individual components, or “stems,” such as vocals/lyrics, drums, guitar, and bass, allowing for post-production editing.
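For the looping sound effects described in the list above, a request might be assembled along these lines. This is a sketch under stated assumptions: the endpoint path /v1/sound-generation and the fields text, duration_seconds, and loop reflect the public API as best understood here, not anything shown in the session, and the helper name build_sound_effect_request is hypothetical.

```python
import json
import urllib.request

API_BASE = "https://api.elevenlabs.io/v1"  # assumed base URL


def build_sound_effect_request(prompt: str, api_key: str,
                               duration_seconds: float = 10.0,
                               loop: bool = True) -> urllib.request.Request:
    """Prepare (but do not send) a sound-effect generation request."""
    payload = {
        "text": prompt,                        # description of the sound
        "duration_seconds": duration_seconds,  # assumed field name
        "loop": loop,                          # assumed flag for seamless looping
    }
    return urllib.request.Request(
        url=f"{API_BASE}/sound-generation",    # assumed path
        data=json.dumps(payload).encode("utf-8"),
        headers={"xi-api-key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


# One request per ambient track for a hypothetical meditation app.
ambient_prompts = ["rain on a tent", "ocean waves"]
pending = [build_sound_effect_request(p, api_key="YOUR_API_KEY") for p in ambient_prompts]

for req in pending:
    print(req.get_method(), req.full_url, json.loads(req.data)["text"])
```

A short looping clip can be played back to back for as long as the app needs, so a few seconds of generated audio can cover an entire ambient session.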