Zoom distance: Researchers estimate physical signals make up 70 per cent of conversation

Travel often demarcates an experience, focusing attention and solidifying work-life boundaries – whether it’s a flight to a conference or a daily commute to the office. As the online world has stripped those rituals away, people have experimented with “fake commutes” (a walk around the house or the block) to trick themselves into a similarly focused mindset.

“For people who live alone, it has been really hard not to be able to hug friends and family…. I’m not sure if [technology] can achieve that 10 years from now, but I hope we can.”

But while technology use is always evolving, the pandemic threw that evolution into warp speed. Zoom reported having 300 million daily meeting participants by June 2020, compared with 10 million in December 2019. In October 2020, Zoom held its annual Zoomtopia conference online-only for the first time; it attracted more than 50,000 attendees, compared with about 500 in 2017.

Some might see this as evidence that the tech is, thankfully, ready to accommodate lockdown-related demands. But on the other side of the coin, people have been feeling exhausted and disrupted.

Humans are adapted to detect a wealth of visual signals during conversations: small twitches, fleeting facial expressions, and acts such as leaning into a conversation or pulling away. Drawing on work dating back to the 1940s and 1950s, researchers have estimated that such physical signals make up 65 to 70 per cent of the “social meaning” of a conversation.

“Humans are pretty bad at interpreting meaning without the face,” says psychologist Rachael Jack of the University of Glasgow, co-author of an overview of how to study the meaning embedded in facial expressions in the Annual Review of Psychology. “Phone conversations can be difficult to coordinate and understand the social messages.”

Image: six 3D synthetic faces showing characteristic expressions, labelled happy, surprise, fear, disgust, anger and sad.

The muscles of the human face contract in characteristic patterns to produce widely recognised signals of emotions, as shown in this image of 3D synthetic faces. In the many video meetings of the Covid era, faces and expressions are more constantly and prominently on display than they would normally be if, say, an individual were quietly and anonymously listening to someone speak in a meeting or classroom.

Being “on all the time” – making sure to appear attentive and interested, and to maintain eye contact – contributes to Zoom fatigue. On the flip side, social messages are harder to transmit using audio alone.

People often try, subconsciously, to carry the visual and physical cues they pick up on in real life over to the screen. In virtual worlds where full-bodied avatars move around a constructed space, for example, work by Stanford University communication researcher Jeremy Bailenson has shown that people intuitively keep their avatars a certain distance apart, mimicking social patterns seen in real life.

The closer the avatars get, the more their users have them avoid direct eye contact, compensating for the invasion of personal space (just as people do in, say, an elevator).

Yet many of the visual and physical signals get mixed or muddled. “It’s a firehose of nonverbal cues, yet none of them mean the thing our brains are trained to understand,” Bailenson said in his keynote. During a video call, for example, people typically look at their screens rather than their cameras, which gives others a false impression of whether they are making eye contact.

The stacking of multiple faces on a screen likewise gives a false sense of who is looking at whom (someone may glance to their left to grab their coffee, but on screen it looks like they’re glancing at a colleague).

And in a video meeting, everyone appears to be looking directly at everyone else. In physical space, by contrast, all eyes are usually on the speaker, leaving most of the audience in relative and relaxed anonymity. “It’s just a mind-blowing difference in the amount of eye contact,” Bailenson said; he estimates that it’s at least 10 times higher in virtual meetings than in person.

Research has shown that the feeling of being watched (even by a static picture of a pair of eyes) causes people to change their behaviour: they act more as they believe they are expected to, more diligently and responsibly. This sounds positive, but it also takes a toll on self-esteem, says Bailenson. In effect, being in a meeting can become something of a performance, leaving the actor feeling drained.

For all these reasons, online video is only sometimes a good idea, experts say. “It’s all contextual,” says Michael Stefanone, a communications expert at the University at Buffalo. “The idea that everyone needs video is wrong.”

Research has shown that when people need to establish a new bond of trust (as new work colleagues or potential dating partners do), “richer” technologies (video, say, as opposed to text) work better. This means, says Stefanone, that video is important for people with no shared history – “zero-history groups” like him and me.

Indeed, although we had exchanged a series of emails before our conversation, I get a different impression of Stefanone over Zoom as he coaxes his young daughter down for a nap while we chat. I instantly feel I know him a little, which makes it feel more natural to trust his expertise. “If you’re meeting someone for the first time, you look for cues of affection, of deception,” he says.

But once a relationship has been established, Stefanone says, visual cues become less important. (“Email from a stranger is a pretty lean experience. Email from my old friend from grade school is a very rich experience. I get a letter from them and I can hear their laughter even if I haven’t seen them in a long time.”)

Visual cues can even become detrimental if the distracting downsides of the firehose effect, alongside privacy issues and the annoyance of even tiny delays in a video feed, outweigh the benefits.

“If I have a class of 150 students, I don’t need to see them in their bedrooms,” says Stefanone. He adds, laughing: “I eliminate my own video feed during meetings, because I find myself just staring at my hair.”

Besides simply turning off video streams now and then, Bailenson supports a high-tech solution: replacing visual feeds with an automated intelligent avatar.

The idea is that your face onscreen is replaced by a cartoon; an algorithm generates facial expressions and gestures that match your words and tone as you speak. If you turn off your camera and get up to make a cup of tea, your avatar stays professionally seated and continues to make appropriate gestures.

(Bailenson demonstrated this during his keynote, his avatar gesturing away as he talked: “You guys don’t know this but I’ve stood up…. I’m pacing, I’m stretching, I’m eating an apple.”) Bailenson was working with the company Loom.ai to develop this particular avatar plug-in for Zoom, but he says that project has since been dropped. “Someone else needs to build one,” he later tells me.

Such solutions could help teachers and lecturers who want visual feedback from their listeners to stay motivated, says Jack, who studies facial communication cues: they would get that feedback without the unnecessary or misleading distractions that often come with “real” images.

  • A Knowable Magazine report
