Narrating your webcam with your own voice and Mister T
When I’m trying to get people excited about AI and LLMs, showing them audio and video stuff usually does the trick. One demo I saw early on that blew me away was the ‘narrate your webcam as David Attenborough’ demo.
It shifted my brain the first time I saw it. What’s even more interesting is how little coding effort is required to do something this sophisticated, like using cloned voices to narrate and describe images in a specific tone and language.
I took this demo and modified it to use my own cloned voice, and played around with a few different personas (Mister T!). It’s kind of hard to describe, so if you’re interested, watch the demo above or this video of me playing around with it.
What’s so notable is that all the hard work is done by OpenAI and ElevenLabs. The Python code to do this is ~150 lines and could probably be much shorter.
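To give a sense of how the pieces fit together, here’s a rough sketch of the core loop (not the exact code from my demo): grab a webcam frame with OpenCV, ask an OpenAI vision model to narrate it in a chosen persona, then send that text to ElevenLabs’ text-to-speech API with a cloned voice. The model name, voice ID, and prompt below are placeholders you’d swap for your own.

```python
import base64
import os
import time

import cv2
import requests
from openai import OpenAI

openai_client = OpenAI()  # reads OPENAI_API_KEY from the environment
ELEVENLABS_API_KEY = os.environ["ELEVENLABS_API_KEY"]
VOICE_ID = "your-cloned-voice-id"  # placeholder: the ID of your cloned voice in ElevenLabs


def capture_frame_b64() -> str:
    """Grab one frame from the default webcam and return it as base64-encoded JPEG."""
    cap = cv2.VideoCapture(0)
    ok, frame = cap.read()
    cap.release()
    if not ok:
        raise RuntimeError("Could not read from webcam")
    _, buf = cv2.imencode(".jpg", frame)
    return base64.b64encode(buf).decode("utf-8")


def narrate(image_b64: str) -> str:
    """Ask a vision-capable model to describe the frame in the chosen persona."""
    response = openai_client.chat.completions.create(
        model="gpt-4o",  # any vision-capable model works here
        messages=[
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": "Narrate what you see in the style of a nature documentary, in two sentences."},
                    {"type": "image_url", "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"}},
                ],
            }
        ],
        max_tokens=120,
    )
    return response.choices[0].message.content


def speak(text: str, out_path: str = "narration.mp3") -> None:
    """Send the narration to ElevenLabs text-to-speech and save the audio."""
    resp = requests.post(
        f"https://api.elevenlabs.io/v1/text-to-speech/{VOICE_ID}",
        headers={"xi-api-key": ELEVENLABS_API_KEY},
        json={"text": text, "model_id": "eleven_multilingual_v2"},
        timeout=60,
    )
    resp.raise_for_status()
    with open(out_path, "wb") as f:
        f.write(resp.content)


if __name__ == "__main__":
    while True:
        narration = narrate(capture_frame_b64())
        print(narration)
        speak(narration)
        time.sleep(5)  # pause before narrating the next frame
```

Changing the persona is just a matter of editing the prompt and pointing VOICE_ID at a different cloned voice: Attenborough, Mister T, or yourself.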
LLMs give us an entirely new set of tools to build things. Most people still have no idea what they’re capable of, and that’s what makes this space so exciting to me. This might be a toy example, but what are the non-toys you could build on the same underlying concepts and capabilities? That’s why I love talking about this stuff with people: if I can get more people to see what’s possible, maybe they’ll spot real problems or opportunities in their lives and their jobs where these tools can have an enormous impact.
What are you building with this stuff? Where do you need help? What are you curious about? I’d love to hear from you; it’s an amazing time to be building with AI and LLMs.