Outthinkers

#111—Tobias Dengel: Voice Technology: Unlocking Efficiency and Evolution

February 23, 2024 Outthinker - Tobias Dengel Episode 111

Tobias Dengel is President of WillowTree, a TELUS International Company. WillowTree designs and builds digital experiences for the world’s largest brands and sits at the forefront of the breakthrough in voice technology. Tobias understands voice technology's profound, wide-ranging implications for every industry, including marketing, healthcare, hospitality, manufacturing, media, and more. He counsels leaders in all these fields about how their companies must adapt to the coming age of voice.

He is the co-author, with Karl Weber, of the recent book THE SOUND OF THE FUTURE: The Coming Age of Voice Technology, a deep dive into the sweeping changes we can expect as voice technology gains traction.

Tobias’ insights will really open up your imagination around the future of human-machine communication, particularly around how voice technology, accelerated by recent developments in AI, has the potential to radically alter the way we live and how companies do business.

In this podcast, he shares:

  • How voice technology offers a significant advantage in communication efficiency, and will drastically improve productivity across our lives and many sectors  
  • Why this efficiency applies to humans speaking to machines, and not the other way around
  • How our interactions with machines will shift from unimodal to multimodal, with machines reacting in real time to our requests in multiple formats
  • What individuals can expect to change in everyday tasks and jobs, and where business leaders should look for opportunities to adopt voice technology in their companies
  • How this next technological revolution will mirror the smartphone revolution in many ways, and how it will differ

__________________________________________________________________________________________
Episode Timeline:

00:00—Highlight from today's episode
1:19—Introducing Tobias + the topic of today’s episode
3:09—If you really know me, you know that...
4:19—What's your definition of strategy?
5:21—Why is voice an advantage over other forms of communication?
9:25—What has changed about voice recognition software over time, and how does that lead us to today with Gen AI?
13:14—Could you talk to us about the various modes of communication, particularly humans vs. technology?
15:47—Who do you think will be winning or losing across industries as voice tech takes over?
15:18—What are the first steps someone should take in pursuing new ideas?
18:04—Where should people start to identify where a business might implement voice technology?
27:55—How can people follow you and continue learning from you?

__________________________________________________________________________________________

Additional Resources:

Personal site: https://www.tobiasdengel.com/
LinkedIn: https://www.linkedin.com/in/tobiasdengel
Twitter: twitter.com/tobiasdengel

All content © 2024 Outthinkers.

Thank you to our guests, thank you to our executive producer, Karina Reyes, our editor, Zach Ness, and the rest of the team. If you like what you heard, please follow, download, and subscribe. I'm your host, Kaihan Krippendorff. Thank you for listening.

Follow us at outthinkernetworks.com/podcast

 
 

Kaihan Krippendorff: Tobias, thank you so much for being here. It is great to see you again. And I'm excited to dig in. 

 

Where are you dialing in from? 

 

Tobias Dengel: I'm dialing in from Charlottesville, Virginia, where we are headquartered.

 

Kaihan Krippendorff: Oh, great. I've got so much I wanna cover with you, so I'm gonna jump right in. I'm gonna open with the same two questions I ask all of our guests. The first one is just for us to get to know you a little better.

 

Personally, I should tell our listeners that we went to school together. We're not recording the video to be broadcast, but we do use little clips, and if anyone sees them, they'll see the Penn logo on my shirt. We both went to Penn. So it's really nice to see you again.

 

Tobias Dengel: It's great to be here. 

 

Kaihan Krippendorff: Awesome. So, first question. Complete the sentence for me: "If you really know me, you know that..."

 

Tobias Dengel: I love The Clash.

 

Kaihan Krippendorff: Oh, I love The Clash. Yes. What's your song?

 

Tobias Dengel: You know, there are so many, but I love "Clampdown." I love a lot of the songs on the Sandinista! album. There are some deep tracks there.

 

Kaihan Krippendorff: Gotcha. 

 

Tobias Dengel: "Lose This Skin" is one of my favorite songs on it.

 

Kaihan Krippendorff: Sandinista.

 

Tobias Dengel: I don't know. I'm sure there are lots of theories about how that works in the human brain, but Combat Rock was the first album I ever bought as a kid, and I am a Clash addict to this day.

 

Kaihan Krippendorff: This is a podcast about strategy. You've led businesses and built businesses, especially technology businesses, so you certainly think a lot about strategy. Maybe it's not the central thing that you do, but you are a strategist. So, what's your definition of strategy?

 

Tobias Dengel: So one of my favorite quotes is a Yogi Berra quote, and there are many great Yogi Berra ones. My favorite is: "In theory, there's no difference between theory and practice, but in practice, there is." So to me, a great strategy is where theory meets practice. I'm constantly looking at where the theoretical evolution of technology currently is, but the strategy is how you apply that in practice, for your own business or for your clients.

 

So that meeting of theory and practice is, to me, the center point of a good strategy.

 

Kaihan Krippendorff: I like that. So it's not theory by itself, and it's not practice by itself. It requires having both. Perfect.

 

Love it. So let's talk about voice. Why is voice an advantage? 

 

Tobias Dengel: So the core reason we all wanna use voice interfaces with our devices is that it's 3 times as fast to speak as it is to type, and on a mobile device it might be 5 times as fast. That's why we are so tempted to speak into our smartphones all the time. You see it when you drive down the street. You see it when you walk down the sidewalk.

 

The problem is that it's also much faster, about 3 times as fast, to read than to listen. So the core problem with voice applications so far has been that we've tried to simulate a human conversation, which is actually not very efficient. What we wanna do is speak to machines, but then have the machines either do things or respond graphically so we can read something, etcetera. This multimodal approach to human-machine interaction is where we believe the future is, and it's gonna be the real breakthrough. Once you get your head around that, all of a sudden a lot of things start to open up in terms of how this is all gonna...

 

Kaihan Krippendorff: Well, that kinda blows my mind, because I think we assume most of our interactions are unimodal: if I'm speaking, then I'm receiving voice; if I'm typing, then I'm receiving text. But that makes a lot of sense. Why is this efficiency of communication so important? When I was reading your book, I was thinking that first we were just grunting and using symbols, then we learned verbal language, and then we learned written language. And these surges in human development could correspond with the adoption of new communication modes.

 

Could you talk to us a little bit about that? Why is this efficiency of communication important? 

 

Tobias Dengel: Yeah. So every breakthrough that we look at in communications is about speed. Right? I mean, centuries ago, communication was about as fast as a horse could move across the land. That's how fast information flowed, and then we got wired together, first through the telegraph, etcetera.

 

So everything is about speed of communication, but where human communication with machines has really stagnated is that we're basically still using keyboards that were invented for typewriters 150 years ago. Because they existed, they became the primary way. Now, of course, we also tap and swipe and use mice, but the keyboard is still the primary way, and there's been no evolution in that medium. And if we've learned anything in the last 25 or 30 years of digital, it's that speed always wins. One of my favorite stats is that for every second a web page takes to load after the first 5 seconds, you lose about 20 percent of the audience.

 

So we're just extraordinarily impatient, and the younger generation is even more impatient than you and I are. If you can do anything faster, it's gonna win. When we look at these voice experiences, take something many of us do several times a month, or even a week: trying to figure out what movie to go to. If you go to your app, figure out what movies are playing tonight, and buy a movie ticket, it takes about 2 to 2 and a half minutes on average.

 

If you could ask a device, it takes 4 or 5 seconds, but you don't wanna listen to Siri or Alexa list, you know, movie after movie with 3 showtimes each. You and I experienced that when we were in college; it was called Moviefone. What you want is to see it on the screen and then say, "Alright, give me 2 tickets for Star Wars at 8 PM."

 

So now you've taken something that takes 2 or 3 minutes to do today, and you can do it in 10, 15, 20 seconds. That's always gonna win. That's an interesting consumer example, but start thinking about productivity in the workplace: how much time we all spend typing and how much more efficient that can become. And by the way, this is all powered by gen AI. This is a really exciting time for productivity in the white-collar space, which has lagged.

 

Kaihan Krippendorff: Gotcha. So let's bring in gen AI, because I'm thinking, way back when, I used Dragon NaturallySpeaking. I loaded it onto my computer, and I had the whole setup. What has changed since then? And then let's add gen AI. What has changed since 15 years ago, when we had that voice recognition software?

 

Tobias Dengel: Yeah. So when you think about voice recognition as a kind of human interface with devices, there are 3 things that have to happen. First is transcription. Right? The machine has to turn the sounds it hears into words.

 

The second is intent: what does the user mean by that? And the third is: what's the correct response? All 3 of those, but in particular the first 2, are heavily informed, supported, and pushed forward by large language models. At the end of the day, large language models are primarily just predictors of the next word someone would have said.

 

Right? It's a giant prediction algorithm. So if the transcription today gets 90 percent of the sentence you spoke, the model is incredibly good at saying, "You know what? Given that 90 percent, I can tell you with 99.99 percent accuracy what the complete sentence was." And we're seeing that every day. I don't know if you've noticed, but if you use Alexa or Siri, one of the things that's changed over the last few years is that you say something and then it kinda autocorrects itself.

 

It might take 2 or 3 seconds, but that's typically a large language model being applied to what you said to correct it. One of the big frustrations for all of us has been that at 90 percent accuracy, transcription isn't really good enough. Right? Because then you have to spend all this time correcting it.

 

So large language models and gen AI are gonna make it much more accurate. And then the second piece, as we all know because gen AI is so good at summarizing things, is that figuring out the intent is also gonna get much more accurate. One of the stats Google released a few years ago is that in English, there are 4,000 different ways people set their alarms. You'd think there were 5 ways.

 

Kaihan Krippendorff: That's crazy. 

 

Tobias Dengel: That's the problem of human speech. Right? And that's where these large language models are so good.

 

Kaihan Krippendorff: I see what you're saying. Let me try to rephrase it in my own words. It's hard to get above 90 percent transcription accuracy, but what you really care about is that the machine understands what you mean, your intent. And with large language models, we can extract that intent more accurately even if transcription accuracy only reaches 90 percent. Is that right? Gotcha.

 

And what was the third step? You said the first step was transcription, then intent. What was the third step?

 

Tobias Dengel: The third step is how the machine responds, and that's where we have to graduate from a voice response to a multimodal response. There's a ton of work being done right now in conversational design, but a big piece of it is: what is the optimal response? It means translating the intent of the user, so when they ask what movies are playing tonight, the machine knows, "Alright, this is basically an API call asking me to show movies between, you know, 6 PM and 9 PM," whatever it is. Then it calculates that and figures out the optimal response. So that third piece is also heavily informed by large language models, because some of that will be a fluid response.

 

Right? I think one of the big changes in computing is that this idea of a fixed page as the response is gonna change, and responses are gonna become much more optimized and personalized. And that's, again, an area where these gen AI LLMs are so efficient.
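The three steps Tobias lists (transcription, intent, response) can be illustrated with a toy Python pipeline built around his movie-ticket example. This is a minimal sketch under stated assumptions, not WillowTree's method: transcription is stubbed out, simple keyword rules stand in for the large language models he describes, and the `Intent` type, the function names, and the showtime data are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Intent:
    name: str
    slots: dict = field(default_factory=dict)


# Hypothetical showtime data standing in for a real ticketing API.
SHOWTIMES = {
    "Star Wars": ["18:30", "20:00", "22:15"],
    "Dune": ["17:00", "19:15"],
}


def transcribe(audio: bytes) -> str:
    """Step 1 (stub): a real system would run speech-to-text here."""
    return audio.decode("utf-8")


def parse_intent(text: str) -> Intent:
    """Step 2: map one phrasing onto a structured intent.

    Toy keyword rules; this is the step where the episode says large
    language models help with the many ways people phrase a request.
    """
    words = text.lower()
    if "movie" in words and "tonight" in words:
        # Treat "tonight" as the 6 PM to 9 PM window from the example.
        return Intent("list_showtimes", {"start": "18:00", "end": "21:00"})
    for title in SHOWTIMES:
        if "ticket" in words and title.lower() in words:
            return Intent("buy_tickets", {"title": title})
    return Intent("unknown")


def respond(intent: Intent) -> dict:
    """Step 3: build a multimodal response (short speech plus screen rows)."""
    if intent.name == "list_showtimes":
        start, end = intent.slots["start"], intent.slots["end"]
        rows = [
            (title, time)
            for title, times in SHOWTIMES.items()
            for time in times
            if start <= time <= end  # zero-padded HH:MM strings compare correctly
        ]
        return {"speech": f"Here are {len(rows)} showtimes tonight.", "screen": rows}
    if intent.name == "buy_tickets":
        title = intent.slots["title"]
        return {"speech": f"Which showtime for {title}?", "screen": SHOWTIMES[title]}
    return {"speech": "Sorry, I didn't catch that.", "screen": []}


def handle(audio: bytes) -> dict:
    return respond(parse_intent(transcribe(audio)))


if __name__ == "__main__":
    result = handle(b"What movies are playing tonight?")
    print(result["speech"])
    for row in result["screen"]:
        print(row)
```

The response pairs a short spoken line with rows meant for a screen, reflecting the multimodal point: speak to the machine, then read the answer. Coping with the thousands of phrasings people actually use is where the keyword rules would give way to a language model.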

 

Kaihan Krippendorff: You say multimodal, and I'm thinking of 2 modes: visual or auditory. Are there other modes?

 

Tobias Dengel: Yeah. So, depending on which analysis you look at, somewhere between 50 and 75 percent of communication between humans in a conversation is nonverbal. And we're getting there with machines. Right? Like, if you enter your password incorrectly, your phone vibrates, so that's a haptic mechanic.

 

And when you're talking to Siri, the little line kinda changes color. That's visual, of course, but it's not text. So there's a whole emerging set of communications between humans and devices.

 

And what's really interesting to me is that throughout history, communication has been a call-and-response mechanic. Right? I say something, you respond. You say something, I respond. But with multimodal, you can be talking to your smartphone or your computer, and it's reacting in real time.

 

So now we have emerging concurrent communication, which tightens up the communication cycle even more. As an example, we've worked with one of the large pizza companies on an app where, as you speak, the pizza order is being assembled in the background. So when you have finished speaking, the order is finished. Right?

 

And so...

 

Kaihan Krippendorff: Mhmm. Mhmm. 

 

Tobias Dengel: That's a whole new thing. Right?

 

Kaihan Krippendorff: Right. 

 

Tobias Dengel: Smart communication. So super exciting. 
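The concurrent pizza-ordering idea can be sketched as folding each newly recognized speech fragment into a running order, so a screen could redraw after every fragment instead of waiting for the speaker to finish. This is a hypothetical illustration, not the app described in the episode: the menu vocabulary, the `PizzaOrder` fields, and the keyword matching are assumptions, and a real system would take partial results from a streaming speech recognizer.

```python
from dataclasses import dataclass, field
from typing import Iterable, Iterator, List, Optional

# Hypothetical menu vocabulary; keyword lookup stands in for the
# language model a production system would use.
SIZES = {"small", "medium", "large"}
CRUSTS = {"thin", "pan", "stuffed"}
TOPPINGS = {"pepperoni", "mushrooms", "olives", "onions", "sausage"}


@dataclass
class PizzaOrder:
    size: Optional[str] = None
    crust: Optional[str] = None
    toppings: List[str] = field(default_factory=list)


def update_order(order: PizzaOrder, fragment: str) -> None:
    """Fold one newly recognized speech fragment into the running order."""
    for raw in fragment.lower().split():
        word = raw.strip(",.!?")
        if word in SIZES:
            order.size = word  # a later size replaces an earlier one
        elif word in CRUSTS:
            order.crust = word
        elif word in TOPPINGS and word not in order.toppings:
            order.toppings.append(word)


def assemble_while_speaking(fragments: Iterable[str]) -> Iterator[PizzaOrder]:
    """Yield a snapshot of the order after each fragment, as a UI might redraw it."""
    order = PizzaOrder()
    for fragment in fragments:
        update_order(order, fragment)
        # Copy so that earlier snapshots are not changed by later fragments.
        yield PizzaOrder(order.size, order.crust, list(order.toppings))


if __name__ == "__main__":
    speech = ["I'd like a large", "thin crust pizza with", "pepperoni and mushrooms"]
    for snapshot in assemble_while_speaking(speech):
        print(snapshot)
```

Because the order is updated after every fragment, it is already complete when the last fragment arrives, which is the "when you have finished speaking, the order is finished" behavior described above.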

 

Kaihan Krippendorff: Right. Wow. I wanna ask you a question. I don't remember you writing about this, and I don't know if you have any thoughts on it, but what about nonverbal communication with machines?

 

Like, just direct brain communication with machines? 

 

Tobias Dengel: It's funny. Facebook, now Meta, actually did experiments on this a few years ago where they put nodes into the heads of people who weren't able to speak; that was the use case. And they were able to basically short-circuit the whole need to speak and process thoughts directly into a machine action. So I think that's where we end up in 50, 100, 200 years. At least right now, it requires a probe to be inserted into your brain, which I don't think most of us are ready to do.

 

But that's obviously the perfect state, right? You don't even need to take your thought and translate it into speech. But why not in our lifetime? Yeah. Okay.

 

Kaihan Krippendorff: Okay. So I wanna talk a little bit about the business implications of this. Who do you see winning or losing? It could be industries.

 

I know you write about how, when Apple introduced the iPhone, that led to the disruption of taxi cabs, which is not something you would immediately, linearly predict. So who do you see winning and losing?

 

Tobias Dengel: Yeah. So, as we've gotten used to over the last 20 years, I think the big companies, the hyperscalers, are gonna be huge winners in this. Microsoft, in particular, is very well positioned because they bought Nuance. And Nuance is the parent of Dragon.

 

Kaihan Krippendorff: Oh. Gotcha.

 

Tobias Dengel: So Microsoft is extraordinarily well positioned, because they have the leaders in speech and, obviously, the leaders in gen AI. And they are in the business of providing software, so they're platform-agnostic and can provide technology across the ecosystem. So Microsoft is very well positioned, but so are Amazon, Apple, etcetera. They all have some strengths and weaknesses.

 

Google certainly, too, when it comes to these platforms. I would say it's really hard to predict who the winners and losers in any given industry are gonna be. I would just say that it's a time when innovation is absolutely critical. How apps and websites, but particularly apps, evolve to be voice-first, and how that becomes the first stop for customer service for any company, is really, really important. Think about airlines. Right?

 

Most of our experience with our airline today is through the app. That is gonna get turbocharged through a voice-first experience. Right? If your flight changes, you're gonna wanna talk to your Delta app or your United or American or whatever app and say, "Hey, change my flight, do this," and have that happen in real time in your app instead of having to talk to another human being. And you can apply that example everywhere we interact with human beings whose job is basically to translate a need into a system. That's a lot of what customer service does today, or the people who take your order at a quick-serve restaurant: they're basically just translating your order into a machine, and that step is gonna get eliminated.

 

That's why you see McDonald's, as an example, making huge investments in this space.

 

Kaihan Krippendorff: So you may have already answered this, but walk me through this scenario. I've got a company. I'm an airline. I'm a quick-serve restaurant.

 

I'm a car wash company, I'm a publisher, whatever, and I bring you in and say, "Alright, where in my company should I be looking at leaning into voice or converting to voice?" What's the work plan or the schema you use to identify and prioritize the opportunities?

 

Tobias Dengel: Yeah. So there are several different use cases. In the book, the whole second half is about how to find these use cases and prioritize them. I would put them in a couple of really important categories. One is just pure efficiency: where are humans interacting with a keyboard?

 

Right? And how do we make that more efficient? That's a huge category of use cases. The second is: where are folks in the field, or in their jobs, in a non-ideal place to use a keyboard, yet they have to enter information that way today, which basically causes an interruption? For example, we work with a large beverage company. They've got tens of thousands of people in the field every day delivering sodas and repairing fountain machines, etcetera.

 

They're working with their hands. It's much more efficient for them to interact with the systems through a voice-first experience, whether they're processing orders, ordering parts, finding their next location, or checking how many cases of X they're supposed to deliver, than to interrupt what they're doing by pulling up their app, which is what they do today. One of my favorite examples in the book is Cathay Pacific. They try to turn a plane in 8 minutes. It's gotta get cleaned in 8 minutes.

 

People are rushing through it, but those people see when there's a problem. Right? Broken seat back, 8F. Today, they have to pull up their app, authenticate, blah, blah, blah, which takes some 3 to 4 minutes, and in an 8-minute process, that's a real problem. They just launched a voice-first system where the cleaning crew can report it by voice in real time while they're working.

 

Other obvious examples are in law enforcement, where you want your hands available, warehouse work, and factory work. In retail, there's a big push to have employees be able to check inventory via voice. That's a whole class of use cases. There's also a safety class of use cases, where people are interacting with machines and don't know exactly what's going on. One of the examples we use in the book is Boeing.

 

With the 737 MAX accidents, ultimately the pilots weren't able to interact with a device. They weren't able to cut off the autopilot, and a voice-first experience would have immediately solved that. And that's not science fiction. Both the US and the Russian air forces now have voice-first cockpits that are being tested and/or deployed. So, yeah.

 

Kaihan Krippendorff: One thing that stuck out to me from what I read is that interactions that don't happen frequently, where you might forget what buttons to press, would naturally be susceptible to...

 

Tobias Dengel: Exactly. So that falls into the safety and emergency category, or it falls into the apps we're using every day. The average banking app has over 300 pieces of functionality. How on earth can you efficiently organize 300 pieces of functionality on a phone screen? Right?

 

So if you have to reorder checks, as an example, and I ask you to use your favorite banking app to do that, your blood pressure would probably go up, because you're thinking, "This is gonna be a nightmare to figure out." It's probably in there somewhere, but that's a perfect use case. You just say, "Reorder checks," and the system does it.

 

Kaihan Krippendorff: Beautiful. Love it. I have 15 more questions, but we're at the end of our time with you. So I'm gonna ask one more question, and then I'll ask how people can continue connecting with you. Is there a past technological revolution, change, or introduction that you use as an analogy for what's happening here?

 

Tobias Dengel: To me, the best one is the advent of the smartphone. We talk a little bit about the Internet, but the Internet was such a big change and kinda came completely out of nowhere. The smartphone is a good one because, if you remember, digital on mobile devices was around for a decade or more. Right?

 

We had WAP, which gave us very basic web browsing on phones back in the late nineties. Then we had our whole experience with BlackBerry, which did certain things, but we only used it for certain things because the interface wasn't there. And then, all of a sudden, the iPhone launched with a new interface, and it changed everything. I think we are at that same point in voice right now. The combination of conversational AI and generative AI makes it good, but it's this concept of multimodal being recognized as the right user interface that is gonna drive as profound a change as mobile did. And I think it's easy in retrospect to say, "Oh my gosh, when the iPhone launched, the world changed."

 

But it didn't necessarily seem that way at the time. It felt like, now the iPod has, you know, phone functionality. So I think that's where we are, and that's the best analogy. And just like you said, no one that day said, "Oh my gosh, Uber and Lyft are gonna disrupt the taxi industry," but that took 48 months or whatever it was. Those are the kinds of things that are gonna start happening really quickly.

 

Kaihan Krippendorff: Yeah. It's brilliant. I think you're touching on something I've been thinking a lot about. I know Amy Webb wrote an endorsement of your book.

 

She's been on the podcast. And this is tangential, I guess, but this stuff is already happening; we just haven't named it yet. It's not like you're predicting what's going to happen. It's more recognizing what's already happening. No?

 

Tobias Dengel: Yeah. A hundred percent. It's putting a name on it, categorizing it, and allowing us to think about it as a phenomenon in its own right. Right? And I think that's how we need to think about it: we need to start really focusing on how we can use this technology, how we can use voice in each of our companies, or start a new company. There are gonna be so many opportunities.

 

Kaihan Krippendorff: Awesome. Well, thank you for helping us put a name on it, get our heads around it, and learn where the opportunities and threats are. In addition to buying your book, The Sound of the Future: The Coming Age of Voice Technology, which I highly recommend, how else can people continue to explore this with you, connect with you, and learn with you?

 

Tobias Dengel: Yeah. The easiest way is either through our company website, which is willowtreeapps.com, or on LinkedIn. I'm the only Tobias Dengel on LinkedIn.

 

Kaihan Krippendorff: Really? Okay. 

 

Tobias Dengel: Yep. There's only one of them right now. So that's where I'm most active, and I'd love to connect with folks.

 

Kaihan Krippendorff: Well, Tobias, thank you so much for being here and for the work you did to put this together. I really think you are opening up industries to something that, as you say, is happening now, and allowing us to actually name it, fixate on it, and plan for it. Thank you.

 

Tobias Dengel: Thanks for having me. I love your podcast, and great to connect again. 

 

Kaihan Krippendorff: Yeah. It's great to see you. Thank you. Thank you to our guest. Thank you to our executive producer, Karina Reyes, our editor, Zach Ness, and the rest of the team. 

 

If you like what you heard, please follow, download, and subscribe. I'm your host, Kaihan Krippendorff. Thank you for listening. We'll catch you soon with another episode of Outthinkers.
