
Talking to Google Duplex: Google’s human-like phone AI feels revolutionary






Believe the hype—Google's phone-call bot is every bit as impressive as promised.

NEW YORK—Evidently, I didn't walk into a run-of-the-mill press event. Roughly two months after its annual I/O conference, Google this week invited Ars and several other journalists to the THEP Thai Restaurant in New York City. The company bought out the restaurant for the day, cleared away the tables, and built a little presentation area complete with a TV, loudspeaker, and chairs. Next to the TV was a podium with the Thai restaurant's actual phone—not some new company smartphone, the ol' analogue restaurant line.

We all knew what we were getting into. At I/O 2018, Google shocked the world with a demo of "Google Duplex," an AI system for accomplishing real-world tasks over the phone. The short demo felt like the culmination of Google's various voice-recognition and speech-synthesis capabilities: Google's voice bot could call up businesses and make an appointment on your behalf, all while sounding shockingly similar—some would say deceivingly similar—to a human. Its demo even came complete with artificial speech disfluencies like "um" and "uh."

The short, pre-recorded I/O showcase soon set off a firestorm of debate on the Web. People questioned the ethics of an AI that pretended to be human, wiretap laws were called into question, and some even questioned if the demo was faked. Other than promising Duplex would announce itself as a robot in the future, Google had been pretty quiet about the project since the event.

Then all of a sudden, Google said it was ready to talk more about Duplex. Even better, the company would let me talk directly with the infamous AI. So for an afternoon at least, I wasn't Ron Amadeo, Ars Technica Reviews Editor—I was Ron Amadeo, THEP restaurant employee waiting to field "live" phone calls from a bot.

Talking to Google Duplex
Unfortunately, Google would not let us record the live interactions this week, but it did provide a video we've embedded below. The robo call in the video is, honestly, perfectly representative of what we experienced. But to allay some of the skepticism out there, let's first outline the specifics of how this demo was set up along with what worked and what didn't.

Ironically, the only thing that wasn't working in our demo was the one thing anyone can try today: the Google Assistant. In a consumer Google Duplex interaction, a user would say something like "OK Google, reserve a table for four at the THEP Thai Restaurant at 6pm." From there, the Google Assistant would fire up Duplex and make the call. But in our demo, the call was never initiated with a voice command. Instead, an engineer in the corner of the room silently punched reservation requirements into his computer, and Duplex then took over and called the business.

(Fortunately, voice activation seems like the least important part of Google Duplex. We know the Google Assistant works. We know it can handle voice commands. We know it can start a call with a named business using Google Maps info.)

The THEP restaurant phone proved to very much be a real, live phone line. At one point between demos, the phone unexpectedly started ringing. The Google rep quickly shot a "Wait, did you start a call?" question at the engineer in the corner. After he said no, THEP's owner hurriedly jogged over to the phone to speak to a genuine customer.

During the demonstration period, things went much more according to plan. Over the course of the event, we heard several calls, start to finish, handled over a live phone system. To start, a Google rep went around the room and took reservation requirements from the group, things like "What time should the reservation be for?" or "How many people?" Our requirements were punched into a computer, and the phone soon rang. Journalists—err, restaurant employees—could dictate the direction of the call however they chose. Some put in an effort to confuse Duplex and throw it some curveballs, but this AI worked flawlessly within the very limited scope of a restaurant reservation.



need to keep my day job
In my group, I took the first phone call from Google Duplex. I walked up to the front of the presentation area, picked up the ringing receiver, and the call started on the phone and over the loudspeaker. Listening to recordings of Duplex is one thing, but participating in a call with Google's phone bot (in front of a live audience, no less) is a totally different experience. Immediately, I realized this was much more than I was expecting: Google PR, Google engineers, restaurant staff, and several other journalists were intently watching and listening to me take this call over the speaker. I was nervous. I've never taken a restaurant reservation in my life, let alone one with an audience and an engineering crew monitoring every utterance. And you know what? I sucked at taking this reservation. And Duplex was fine with it.

Duplex patiently waited for me to awkwardly stumble through my first ever table reservation while I sloppily wrote down the time and fumbled through a basic back and forth about Google's reservation for four people at 7pm on Thursday. Today's Google Assistant requires authoritative, direct, perfect speech in order to process a command. But Duplex handled my clumsy, distracted communication with the casual disinterest of a real person. It waited for me to write down its reservation requirements, and when I asked Duplex to repeat things I didn't catch the first time ("A reservation at what time?"), it did so without incident. When I told this robocaller the initial time it wanted wasn't available, it started negotiating times; it offered an acceptable time range and asked for a reservation somewhere in that time slot. I offered seven o'clock and Google accepted.

From the human end, Duplex's voice is absolutely stunning over the phone. It sounds real most of the time, nailing most of the prosodic features of human speech during normal talking. The bot "ums" and "uhs" when it has to recall something a human might have to think about for a minute. It gives affirmative "mmhmms" if you tell it to hold on a minute. Everything flows together smoothly, making it sound like something a generation better than the current Google Assistant voice.

One of the strangest (and most impressive) parts of Duplex is that there isn't a single "Duplex voice." For every call, Duplex would put on a new, distinct personality. Sometimes Duplex came across as male; sometimes female. Some voices were higher and younger sounding; some were nasally, and some even sounded cute.

As impressive as it is to hear a computer realistically replicate human speech, the model that generates these voices, WaveNet (from Google's DeepMind division), is actually holding back in the human mimicry department. DeepMind's blog has already revealed that WaveNet can generate human mouth sounds if it wants to. On the blog, there are demos of it breathing and making lip smack noises between sentences. Duplex doesn't do any of that yet.

During the I/O keynote, Google played a brief, pre-recorded Duplex call. Given that the recording was missing many of the important chunks of a normal business call, many suspected that the demo was heavily edited. The employees never said the business' name, and Google never gave out important identifying information like a phone number. People also took issue with the lack of disclosure that Duplex was a robot, and the lack of a call-recording disclosure would be a violation of the law in many states. I think the simplest explanation for the I/O demo is that Google's call was edited for privacy and brevity, and it was only meant as a teaser. During our time at THEP Thai, all of these concerns were addressed.

Every single call started with something along the lines of, "Hi, I'm calling to make a reservation. I'm Google's automated booking service, so I'll record the call. Can I book a reservation for..." This covered both the "I'm a robot" disclosure and the "this call is being recorded" concerns brought up earlier. Google says it's still working on the exact messaging, but the company always intended to disclose that it was a robot recording the call.

Duplex is fine giving out information, but it's designed to give out only the information the bot is authorized to share. In today's demo, Duplex would clearly, slowly spell out the demo caller's phone number or name when asked. It even had good phone etiquette, saying things like, "The name is Ron, that's R, O, N." At one point, the caller's email was asked for and Duplex responded with "I'm afraid I don't have permission to share my client's email."

This spelling out of names and numbers is the one time Duplex really loses the illusion of sounding human. It's almost like WaveNet didn't practice this part of speech at all, and the service drops into a Speak & Spell mode when it needs to rattle off individual characters. The intonation of each letter or number is all over the place, never flowing with normal beginning and ending tones that a human would use.

Looking back, I also take issue with some of the "personalities" Duplex presented. The Google Assistant presents itself as a happy, professional robot assistant with a bit of a fun streak. It can tell the occasional joke, but the Assistant usually speaks with proper language, good enunciation, and a happy, upbeat attitude. In contrast, Duplex is much more casual. Google basically built a secretary AI with Duplex, but it doesn't speak with the practiced confidence of someone accustomed to making reservations—it often sounds like a teenager ordering a pizza. That's not necessarily how I would want to be represented to a business. The casual attitude can sometimes combine with the occasional intonation glitch and come across as annoyed, tired, disinterested, or sarcastic.

Google actually has a great public example of this on its blog. How do you feel about the voice for this Duplex phone call? To me, this comes across as a nasally teenager with a bit of an attitude problem. The awkwardly loud "yeah" at 12 seconds with a rising intonation feels a little sarcastic, as does the "OK, awesome" and "OK, bye-bye" to end the phone call.

When you talk to someone on the phone, you build a profile of their mood and intent as you speak to them. This is, of course, up to each person's personal interpretation, but the casual nature of the Duplex voices allows it to drop into "sarcasm" considerably faster than something more restricted like the Google Assistant. This is just one of the voice examples, but there were others like a timid, mousey girl that could be misinterpreted, too.

Google said it created a variety of voices to get feedback about what works and what doesn't, but in general, I would request something less casual. Since this voice is Duplex representing me to a business, I would really prefer that it sounded more like a professional, confident secretary or assistant. There's a reason everyone at work puts on a professional tone; there's just less room for misinterpretation. And while the calls are being recorded, right now they're being recorded only for Google's purposes—currently, the company doesn't plan on letting users hear their calls. As a potential user, I would be very interested in listening to at least the first few calls when I had time; I'd want to know how Duplex is representing me to a business. Listening might also be helpful in diagnosing a miscommunication if the business gets something wrong.

Bad restaurateurs, good bots
With a basic call out of the way, Google's restaurant presentation then set about screwing with Google Duplex to try and make it mess up. One person took a call for four guests and intended to end the call with a recap that got the number of people incorrect: "OK, so that's a reservation for six at 8pm?" But Duplex doesn't hesitate and corrects them, "Uh, I need a reservation for four people." The system is designed to do "gentle corrections" like this.

Another person took offense to the idea that a robot was calling them: "Wait, you said you're the Google Assistant and you're calling me?" Again, no hesitation. "I'm Google's automated booking service," Duplex replied. "I'm calling to make a reservation on behalf of [human]."

Another person gave Duplex a normal Google Assistant command and asked what the weather was. "I'm only allowed to book reservations; I'd like to make a reservation at four... ," Duplex responded. For situations like this when the business goes off script, Duplex will give a dismissive answer and try to steer the conversation back to its goal of making a reservation. We were also told that if a business is closed, Duplex can try back again when the place is open.

One thing Duplex didn't seem to do was verify the name of the business it was calling. We were free to make up any name we wanted when we answered the phone, and many didn't say Duplex was calling the THEP Thai restaurant. Presumably behind the scenes, Duplex was told to call THEP, but the bot didn't throw up a red flag when the business it called didn't identify itself. Duplex seems to rely on the Google Maps info being correct in this instance.

The key is the limited scope

As impressive as Duplex was, it had no shortage of things working in its favor during the demo. In terms of sounding human, the restaurant demonstration was all being done with standard telephony gear and the ultra-crappy voice quality that comes with it. So when I say Duplex "sounds human over a phone," that should be taken with the understanding that a phone call is not a great medium over which to judge a voice. The calls were typically grainy, mono, and used a low bitrate, which could be hiding flaws in Duplex's voice. Sometimes there were awkward pauses that were a beat too long, but thanks to latency, that can happen on a phone call with regular humans, too. You're also busy while on a phone call, not necessarily parsing the voice for hints of its computer roots. You can't rewind the phone call and listen to it over and over again the way you can with Web-based content.

If you noticed that there weren't many examples beyond making a reservation, that's because that's all Duplex can do right now. This is really the key to the whole system. Google did not build a general-purpose speech AI; it built something that is laser focused on making a reservation and nothing else. Duplex can't even do reservations at any business—it only supports making reservations at restaurants and hair salons or checking holiday hours. An appointment for something like an oil change would go roughly the same as these other calls, but that's not supported.

Part of the problem is that Google's training doesn't scale as well with Duplex as it does for other AI efforts. Wiretapping laws mean there isn't a treasure trove of millions of phone calls Google can train its AI with. All the training data needs to be made by Google, so the limited scope really helps.

Scott Huffman, the VP of Engineering for the Google Assistant, described Duplex like this: "One thing that makes it work is, in fact, that it is trained on these very narrow tasks. On one hand, a lot can happen in a restaurant reservation conversation, but on the other hand not that many things. So once you've done some calls, you get the heart of it, which is they're going to ask you about the time, the number of people, and all that. Then you can build out from there. With not very much data, you end up having the heart of it, and that's what allows us to build these kind of systems."

So Duplex wouldn't stand a chance at passing a Turing test, since it is only set up to make reservations and can't carry out a general conversation. In the context of a reservation phone call though, it is easily one of the world's most convincing speech AIs.

Similarly, the first public steps for Duplex will be limited. Duplex will start testing calling businesses for holiday hours in "the coming weeks," but Nick Fox, the Google Assistant's VP of product design, admitted the company wasn't sure when a wider rollout for holiday hours would happen. It likely depends on how the testing goes. The holiday hours feature would allow a user to ask Google for a business' holiday hours, and if it didn't know, Duplex could call the business to check. Once the first call goes out to a business, Duplex would then send that information to Google Maps, so queries from everyone else could just ping Google Maps instead of doing another call to the business.
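The "call once, then share the answer" flow described above can be sketched in a few lines. This is only an illustration of the idea, not Google's implementation: the function name `lookup_holiday_hours`, the `shared_listing` dictionary standing in for Google Maps, and the `call_business` callback standing in for a Duplex call are all hypothetical.

```python
from typing import Callable, Dict, Optional


def lookup_holiday_hours(
    business: str,
    shared_listing: Dict[str, str],
    call_business: Callable[[str], Optional[str]],
) -> Optional[str]:
    """Answer from the shared listing if possible; otherwise call once and save.

    shared_listing is a hypothetical stand-in for Google Maps data, and
    call_business is a hypothetical stand-in for placing a Duplex call.
    """
    if business in shared_listing:
        # Someone already asked: no second phone call to the business.
        return shared_listing[business]
    hours = call_business(business)
    if hours is not None:
        # Store the answer so later queries skip the call entirely.
        shared_listing[business] = hours
    return hours
```

Under these assumptions, the first query for a business triggers one call and every later query is served from the stored answer, which is the point of sending the result to Google Maps.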

Reservations for restaurants and haircut appointments will start later this summer but only for a small number of "trusted tester" users. Even then, it's not for all businesses yet, either—Duplex would only handle calls to a limited number of small businesses that are also part of the testing program. Businesses will be a part of this and get controls, too. Google showed a settings screen that would allow businesses to turn off Duplex reservations. Google also mentioned the need to build spam controls into the system preventing it from being turned into a robocaller.

"We're actually quite a long way from launch, that's the key thing to understand," Fox explained at the meeting. "This is super-early technology, somewhere between technology demo and product. We're talking about this way earlier than we typically talk about products."

The reason for the early announcement and testing is feedback. "It's a very new technology and a new concept as well," Fox said. "It's important we think about, not just the technology itself we need to get right, but all the product elements around it: the disclosure we need to get right, the business experience we need to get right, the user experience we need to get right. All of those product pieces and those societal pieces are really important to get right, and the reality is we shouldn't be dreaming all of this stuff up alone in our offices and conference rooms and stuff."

Building Duplex, with human call center backup

Google says that, as of today, four out of five calls for reservations and holiday hours can be carried out autonomously. For the calls that Duplex can't handle itself, the AI bot is trained to gracefully bow out of the conversation and transfer the call. The call doesn't go to your personal phone; instead, it routes to Google's call center of humans that the company has standing by to deal with Duplex issues. This call center is basically Duplex mission control, and it's been instrumental in training the AI.

During the presentation, Google likened the development of Duplex to that of a self-driving car. The AI is trained first with "manual" data from calls performed by a human. Then it moves on to "supervised" calls where humans are listening and ready to "take the wheel" at any time. For reservations and holiday-hours calls, Duplex is at the "Automated with fallback" stage where humans aren't normally on the call. But the call center is standing by if things go wrong anyway.

Again, like self-driving cars, Duplex's experience out in the real world is recorded and sent back to the call center for further training. The humans have tools for marking up and annotating calls, as well as correcting Duplex and nudging it toward being a smarter, more useful call bot.

For now, it sounds like the call center will stick around for public use, too. Any time you ask about the future of Duplex, you'll hear, "we're still experimenting" somewhere in the answer, and Google currently says it will "likely continue to have operators as the technology progresses." Calls will all come from Google's call center rather than from your personal phone connection. That allows businesses to easily identify the call as the Google Assistant, and it allows calls from non-phone hardware like Google Home.

The future: An Assistant interface, feedback, and more use cases
At the THEP event, Google took time to look back and talked a bit about the first Duplex prototype. That was nothing more than a laptop with a phone receiver carefully lined up over the speakers and microphone. We got to hear a recording of one of the even earlier calls (again, we weren't allowed to record this), and it was pretty awful. A very robotic voice stumbled through a phone call with a business, there were awkward pauses, it interrupted the human, it didn't understand a lot of things, and the human had to repeat themselves several times. The audience cringed through the whole thing.

Thanks to the call-center training and a major voice upgrade with WaveNet, Duplex is much better today. Huffman explained that Duplex got up and running faster than you would expect, since it was a combination of already-existing technologies at Google. Voice recognition and transcription have both been at the company for a long time, as has computer generated voice tech. What this initiative really needed was a conversational AI built around the very specific interactions Duplex is expected to have.

If you're one of the lucky few that will get to try Duplex soon, you'll see a Google Assistant interface that looks a lot like you would expect. In the example, the command "OK Google, book a table for two at El Cocotero on Tuesday at 7pm" will be all it takes to get a reservation. Duplex wants to go into a call armed with the ability to do time negotiation if your preferred slot is filled. So after a command like this, the service tells you it will allow for a one-hour bump if your time is filled. You can also specify a time frame in your voice command. Duplex will tell you it's prepared to give your name and number during the call, and it looks like you can even pick from several numbers.
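To make the time negotiation concrete, here is a rough sketch of how a "one-hour bump" could be applied to the slots a business offers. Google did not describe its logic, so everything here is an assumption: the function name `pick_slot`, the symmetric window around the preferred time, and the rule of choosing the closest acceptable slot are all hypothetical.

```python
from datetime import datetime, timedelta
from typing import Iterable, Optional


def pick_slot(
    preferred: datetime,
    offered: Iterable[datetime],
    flexibility: timedelta = timedelta(hours=1),
) -> Optional[datetime]:
    """Return the offered slot closest to the preferred time, or None.

    Assumption: the allowed bump applies both earlier and later than the
    preferred time. A slot outside that window is never accepted.
    """
    in_window = [slot for slot in offered if abs(slot - preferred) <= flexibility]
    if not in_window:
        return None
    return min(in_window, key=lambda slot: abs(slot - preferred))
```

With a preferred time of 7pm, this sketch would take a 6:30pm offer over an 8pm one and would decline a 9pm offer outright, leaving the user to decide what happens next.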

At the end of the call, the Google Assistant will give you a notification letting you know how the call went. If it made an appointment, the time and date will be added to your calendar. You'll also get an email record of the call.

All of the demos for Google Duplex involved making a one- to two-minute call to a small business, and Google even highlighted small business during its presentation. The company says that 60 percent of businesses that rely on appointments don't have automated booking, so they seem like natural use cases for Duplex.

Personally, I don't really mind making a short call to a small business though. What I really want is for Google Duplex to handle a big business call. I want it to deal with a 45-minute conversation with Comcast customer service and be able to cancel the TV service. Have Duplex navigate the touch-tone directory tree, listen to "we're experiencing higher call volume than normal," and shut down counter offers for six months of cheaper service if I don't cancel today. That would be a truly useful service and something I would have no qualms about letting a computer handle.

I asked about calls to big companies. "I would say not yet, but I talked about a number of the pain points and, in general, we see a number of opportunities here," Fox replied. "It's a matter of actually building those... it's this kind of feedback that's super-useful for us, as we want to understand what's useful to people as well, and we'll push where it makes sense." Again, success with Duplex is all about the limited scope, and it needs more training to grow within and beyond that scope.

Speaking of big business, I can easily imagine what will happen when the other side shows interest—what if a company like Comcast or Time Warner gets hold of Google Duplex-style technology for its call centers? Many of these call centers are already doing everything they can to cut costs by outsourcing "tier 1" tech support and customer service to developing countries. And in lieu of comprehensive training, these call-center employees are often instructed to rigidly follow a script. Many big businesses are basically trying to make a human version of a robot right now.

With Duplex or a Duplex-like system in the future, an organization like AT&T could build a robot version of a human and potentially lay off many call-center employees. The AI could handle the customer service or tech support script, telling the customer to power cycle their modem, restart their computer, or whatever other useless suggestions tier 1 tech support offers. You'd never be able to speak to a human again! Our personal assistant robots would simply be talking to the customer service robots over the phone repeatedly. At that point, they might as well be making 56k noises at each other.

Fox didn't rule out selling this kind of tech to call centers, but that's another potential down-the-road addition. "There are companies that do that well," he said. "There are very big and established companies that provide software for call centers. It's not the core of our business. We'll see as we go whether we think if we can help there. But it's not a business that we're in today, and it's a pretty well-served business."

Without being prompted by questions, Google did share what it sees as a number of future use cases for Duplex. It mentioned that, some day, a non-native speaker could use Duplex to make a call in the local language. Google also said speech-impaired people were interested in the technology.

But right now, Google is decidedly taking a "slow and deliberate approach" with Duplex. If the technology graduates following its upcoming trials, perhaps we'll all soon be able to make robo-reservations no matter how much like me (bumbling, inexperienced, nervous) the human on the other line happens to be.

"In general, we are listening pretty closely to feedback that we're seeing," Huffman told the group. "And this feedback in this next phase will be super critical as well to make sure we're getting it right as we go. I don't think we pretend to know all the answers."
