Why the future of Enterprise Voice Collaboration will feel more like Superhuman than Slack.
Brett Bivens | Jan 15
Historically, almost all of the context necessary to understand the workings of a given business in its early days was housed under one roof – the proverbial two people in a garage.
Sketches on a whiteboard, screens on a laptop that could be slid across a desk, and ideas inside the heads of people in the room that were generally conveyed via conversation.
As coordination complexity grows within an organization — more people join the team, a product goes from in development to in the market, or, as is the case in many companies getting off the ground today, the team works in a distributed manner from day one — we tend to shift to communication methods that solve for not having everyone in the same room working on the same problem at the same time.
One of the trade-offs made to accommodate this added complexity is a shift to lower-resolution communication methods with less information density.
This creates near term efficiency by encompassing the broad set of activities a person, team, or company must undertake with each of its stakeholders (less context switching between communication mediums is one example) but diminishes the effectiveness of each specific interaction and tends to cause a company to accumulate communication debt over time (we'll come back to this concept).
Low resolution communication (WhatsApp, email, Slack) — as Alex Danco points out in his great essay that prompted me to write about the topic — can actually serve as a creativity enabler by opening up space for each individual to interpret a message in their own way and act accordingly.
But in the context of most business-related collaboration, ambiguous communication simply creates extra cost: follow-ups to clarify, time spent deciphering sentiment in a hastily typed email, and so on.
In considering the ways that we will work in the future, we tend to think far too linearly and paint what is to come with an overly broad brush — some “thing” will replace Slack which replaced email which replaced the phone.
This line of thinking is understandable, as it arises from our experience, one that saw us facing exponentially increasing communication complexity with only a limited set of tools to navigate it.
Over time, as jobs became more multi-faceted and more business functions were pushed outside the four walls of a single company, the decision tree on how to best handle each individual interaction got more complex:
Internal vs. External
Synchronous vs. Asynchronous
Collaborative vs. Solo
In Person vs. Remote
Planned vs. Unplanned
Formal vs. Informal
In most cases, to restate an earlier point, we solved for this complexity by throwing up our hands and converging on tools that served merely as half-measures (and often added more work in the process!).
This lowest common denominator form of communication — what Kevin Kwok calls “911 for whatever isn’t possible natively in a company’s productivity apps” — grows over time to the point where, to further the analogy, we are hailing an ambulance for a minor headache (or a minor edit request in Google Docs) when the solution should be available natively (or in our own medicine cabinet).
Instead of doing the hard work to thoughtfully design internal workflows (or, to beat the analogy to death, personal fitness habits to keep our health in order), we default to the easy choice in the moment that negatively compounds and gets more challenging to unwind over time.
Companies and individual workers are seeing this collaboration debt build up in real time and pushing back by using products designed with specific workflows or jobs in mind and focused on becoming places where real work gets done.
From The Arc of Collaboration:
As it becomes more clear what are specific functional jobs to be done, we see more specialized apps closely aligned with solving for that specific loop. And increasingly collaboration is built in natively to them. In fact, for many reasons collaboration being natively built into them may be one of the main driving forces behind the venture interest and success in these spaces. As these apps proliferate, there is less and less need to turn to Slack. And Slack becomes more and more about the edge cases that aren’t yet built in.
This forces Slack into a significantly diminished role, and one it is poorly suited to play — that of contextual communication layer connecting all of these disparate apps. With their recent release of Workflow Builder they seem to have acknowledged both their culpability in enabling poor communication hygiene and their need to change, though they likely can't go far enough without totally blowing up the core way their product is used today.
In a paradigm where Slack exists as the organizational center of gravity, being a largely "cool" media platform makes sense as its purpose is as much cultural and collaborative (even if it doesn't pull this off well) as it is executory.
In the coming paradigm, where gravity is unbundled into individual apps, the contextual layer will serve a much more utilitarian purpose. The contextual communication layer won't attempt to be the place "where work gets done" but will instead be a highly efficient vessel through which we gather context, deliver a high density message, and get back to work.
If Slack is the massive sun at the center of the work solar system, the contextual layer will be something akin to teleportation between planets that drops you onto each new rock with the spacesuit you need to survive the climate and the language skills to talk with the aliens (or designers 😉).
Voice and the Contextual Layer
Returning to the title of this post, I believe that the contextual layer bridging all of these workflow-specific products will have voice at its core. And to return to the last paragraph, that is because the contextual communication layer will largely be responsible for helping us deliver high density, high resolution messages that serve as an efficient means to an end (the actual work) and not an end unto itself (which Slack tends to impose on users).
Audio is the best medium for delivering this type of communication.
To pull from Alex Danco's piece once more:
Audio, especially verbal speech, is tremendously high in information content. Most people are unaware of this. We mistakenly think of information as sensory input being thrown at us, usually with a bias towards our visual senses. But information isn’t what we’re told; it’s what we understand. Audio and speech resolve uncertainty and communicate meaning more powerfully than any other format. Audible speech burns hot with information. Intonation, accents, innuendo, vocal phrasing, emphasis, pauses, all communicate far more than a transcript can. Audio is the format for “You all know exactly what I’m talking about, because of the way I’m saying it.” Audio is how you communicate what you really mean, straight into ears, headphones and car radios, intimately and directly. Music is good at this, but speech is even better.
This seamless integration of audio into our workflows hasn't been possible from a technical standpoint until recently (although technical gaps still remain).
AirPod ubiquity, and the user behavior shift that came along with it, is one element of the progress. Another step forward has been the ability to deliver high-quality, low-latency audio in sync with additional "collaboration" functionality (screen-sharing, to use a basic example).
Today, we have reached a point where an audio-first contextual layer in the enterprise is possible. The question that remains is what it will look like.
Tandem: Discord for the Enterprise
In The Arc of Collaboration, Kwok argues that the experience most closely resembling the ideal contextual layer (he calls it the meta-layer) is that of Discord:
Discord is actually two products bundled into one. It is a messaging app that looks akin to Slack. But it is also a meta-layer that runs across all games. Beyond its Slack-like functionality, Discord has functionality like a social graph, seeing what games your friends are playing, voice chat, etc. These have been misunderstood by the market. They aren’t random small features. They are the backbone of a central nervous system.
This makes logical sense and, although I don't know the numbers behind Discord, seems likely to be backed up by engagement data showing that this product and network paradigm is highly engaging and "productive".
This seems to me to be the path being pursued by Tandem, a company that recently exploded out of YC with a large seed round led by Andreessen Horowitz. Tandem talks about itself as a "Virtual Office", moving the communication paradigm from text-based (like Slack) to audio-based. The general idea of Tandem is accurately depicted in the image below: four walls under which all work gets done (bringing us back to the top of the page and the value of housing all company-related "context" under one roof).
Yet as compelling as the Tandem (or another well executed Discord for Enterprise) value proposition is, another recent Andreessen Horowitz investment may actually serve as a better analogy for helping us understand where value will be captured at the contextual audio communication layer.
Superhuman and “Low Gravity” Productivity
When I set out to write about voice collaboration and the contextual communication layer, I certainly didn't intend to end up writing another Superhuman for X post...yet here we are. 😊
For those who follow the collaboration and productivity market, the Superhuman story is well understood. Take a commodity, often free, piece of the work stack and create a premium product that delivers a user experience orders of magnitude better than the alternatives.
Superhuman is certainly a place where work gets done but, even for users whose work revolves around email, it starts from the philosophy that everyone has higher-order tasks worth working on, and it tries to get users back to those tasks as fast as possible.
Today, I have a multitude of free ways to deliver audio (live or asynchronous) to people I work with — Zoom, phone, an audio message on WhatsApp. But the experience remains highly disjointed and, as with email pre-Superhuman, disorganized and prone to the accumulation of communication debt for both sender and receiver over time. Additionally, audio communication in the enterprise tends to exist in service of executing higher order work and is not, itself, "the work".
To me, this calls for a product focused on speed, high density context, and a bias towards being "low gravity".
Superhuman is philosophically low gravity — the opposite of Slack. It wants to deliver a delightful, highly productive experience and then help you move on to something else. Counterintuitively, I imagine the less time users spend in Superhuman, the more likely they are to continue paying for it. This will hold true in voice collaboration as well.
We may not be at the point today where enough users are actively bemoaning the brokenness of audio collaboration in the enterprise in the same way they do email, but that day is not far off. As AirPods and other hearable devices continue to proliferate and our at-work collaboration behavior shifts further and faster in the direction of voice, the company best positioned to capture value from the shift will be the one that, instead of fencing users in, helps return them, faster and better equipped, to the more important and productive parts of their day.