Does Apple’s double-tap gesture solve the mobile/human interface problem?

Apple this week demonstrated a new gesture to use with the latest Apple Watch — a two-finger tapping motion. These kinds of gestures could represent a whole new way of interacting with computers and other tech devices.

Smartphones are arguably more powerful than PCs. Consider that the iPhone 12, whose Neural Engine is rated at 11 trillion operations per second, outperformed the Cray-2 supercomputer by more than 5,000 times by that measure. And that’s a phone that is now three years old.

Even Samsung argues that modern smartphones are more powerful than PCs. That may not hold for high-end PCs, but top-end smartphones now perform very close to high-volume PCs with integrated graphics.

So why haven’t we ditched PCs for smartphones yet? Because of the human interface. A PC uses a keyboard and a mouse (or trackpad). A smartphone, which dropped the physical keyboard with the arrival of the first iPhone, typically relies on a virtual keyboard and touch, and both take up space on an already much smaller display.

Head-mounted displays might someday solve the screen-size issue, but the lack of a keyboard and mouse for input and navigation remains a big obstacle to retiring the increasingly redundant PC.

This week, Apple launched the iPhone 15 and a new Apple Watch. The Apple Watch has far less screen real estate than the iPhone, but it now comes with a feature that could upend things: a double-tap capability that could open the door to an interface that replaces the keyboard and mouse and works with a head-mounted display like the upcoming Apple Vision Pro.

Let’s explore.

The double tap

The double-tap feature, which has been available for a while as a little-known Accessibility option, lets an Apple Watch user execute a command without touching the watch or phone, simply by tapping the index finger and thumb together twice. The phone isn’t involved (which is also how Microsoft handled a similar problem with HoloLens); instead, the watch detects the tap from wrist movements. For now, the feature is limited to a single gesture, but there’s no reason a wide variety of other gestures couldn’t follow.

Imagine if you could communicate with your device using one hand instead of two; you could create documents at twice the speed you do now (and that’s before AI finishes words and sentences for you, which would be faster still). You would need only a smartphone and a headset, rather than a full-fledged PC, to get your work done.

American Sign Language vs. keyboard and mouse

Years ago, I trained in American Sign Language (ASL), not to communicate with people who have hearing impairments but so I could communicate in loud environments where I was essentially deaf. ASL uses a variety of gestures, most of them one-handed.

This is where it gets interesting.

Typically, you need to sign at 110 to 130 words per minute to hold a conversation, while professional typists type at 43 to 95 words per minute. That was news to me when I first learned it. But think about the implications of moving from typing to sign language: a significant boost in productivity, plus the side benefit of communicating better in noisy environments and with people who have hearing impairments.

For computing tasks, you’d still need something like a mouse. But new head-mounted displays like the Apple Vision Pro have eye tracking, so you can use your eyes to move the “mouse” and the double-tap gesture to select whatever your eye cursor highlights.

ASL vs. voice

People can speak at 110 to 150 words per minute, but at the upper end of that range they become harder to follow. Speech-to-text has been pitched as another way to replace the keyboard on a small device, but you still have to deal with uncomfortable and relatively unattractive accessories that go over your mouth. Even in virtual reality, which makes sense for gaming and collaboration, that makes for an ugly keyboard replacement. And sign language and voice share some of the same problems (think punctuation). (Fortunately, AI tools can now analyze what’s said and add punctuation.)

There may be a way to use the double-tap technology Apple demonstrated this week for some kind of silent voice capture, in which you’d move your lips and tongue without making a sound to create text. But I’m not aware of anyone working on that kind of solution yet.

A game-changing gesture

I think Apple’s double-tap is a potential game changer for how we interact with technology. Even in this initial form, where it does only one thing, it makes the Apple Watch far more useful. And if it can be made to capture more complex gestures, like those required for sign language, it could evolve into the next great human/machine interface. That would finally allow the smartphone to live up to its potential and displace the PC with a far more portable device and experience.