Commercializing Live Captioning for the Web

I led research and design for 3Play Media’s Live Professional Captioning offering, connecting professional captioners with organizations needing high-quality, real-time captions through a two-sided marketplace and a proprietary, home-grown captioning interface.

Note: To comply with my non-disclosure agreement (NDA), I have omitted and obfuscated confidential information in this case study. All information presented in this case study is my own and does not necessarily reflect the views of 3Play Media.

Project Overview

Company

3Play Media

Timeline

Q1 2021 - Q2 2022 (16 months)

Platforms

Desktop, Mobile Web

My Role

Design and Research Lead

Key Collaborators: I worked as a solo user researcher and designer for this project, supporting three developers, a captioning subject matter expert, and a product manager.

A Bit of Background to Start...

About the Company

3Play Media powers many of the web’s key accessibility solutions (think captioning, translation, and audio description) for individuals with disabilities.

Compared to its competitors, 3Play thrives thanks to its loyal contractor base and its investment in technology, which create a scalable, repeatable business model that works across the company’s different verticals.

About Live Captions

Gaps in technology have historically prevented individuals with disabilities from gaining equitable access to content on the web. In particular, the growing popularity of audio and video (podcasts, short-form video, and long-form content) over the past few years has only raised the barrier to access for users with disabilities.

Live captions provide access to real-time, time-based media such as meetings, webinars, and conferences. They primarily serve people with auditory disabilities, though they also help people with cognitive disabilities comprehend content.

Automatic Captions

Generated using artificial intelligence, automatic captions have been around for more than a decade, but they only partially bridge the accessibility gap for people with auditory or cognitive disabilities.

Professional (or human-generated) Captions

Created through the manual input of a human captioner, either via stenography or voice writing, professional captions have historically only been available to broadcast and television customers.

Disrupting a Primitive Market

Our goal was to apply our scalable business model to the world of live professional captioning and web-based meetings, something no competitor was doing at scale.

The onset of the pandemic only accelerated the need for high-accuracy live captioning: at-home quarantine and remote work pushed live meetings and events online, and real-time speech-to-text solutions were not satisfactory for users with auditory disabilities. Additionally, existing human-powered live captioning solutions on the market were not scalable and were extremely “high-touch,” typically used for broadcast television rather than online events.

The Timeline and Process

Our timeline was not linear, but I’ve tried my best to provide a high-level overview of what it looked like and how I was involved in the process.

Discovery and User Research

User research activities included:

  • General web accessibility needs
  • Existing publishing behavior and associated challenges
  • Industry trends and organizational structures
  • Audio and video user engagement analytics tracking

Design and Testing

After collecting sufficient data, I created initial mockups and prototypes based on the needs we heard in our interviews with customers and contractors. We brought these back to the same set of users and conducted a series of formative usability tests to refine the designs.

Closed Beta Launch

Our team launched in a closed beta capacity in October of 2021.

We created a closed feedback loop using surveys in Hotjar to capture frustrations, bugs, and pain points that would inform changes to the design.

Scaling and Growing

As of May 2022, the customer base is growing exponentially. I’ve shifted my focus slightly from designing for customers and contractors to internal stakeholders, creating tools that automate key portions of their workflow to allow them to focus on scaling the platform to reach more people in the future.

Taking a Look at the Competition

We started with a competitive analysis, with the goal of auditing different categories of existing solutions on the market to identify current technologies and buyer criteria.

Professional captioning solutions on the market are expensive due to significant operational overhead. The scheduling process for ordering captions and coordinating event setup is extremely manual; additionally, software solutions are outdated and do not integrate with modern streaming technologies.

✅ Pros

  • High accuracy compared to automatic captions
  • Higher comprehensibility due to support for proper nouns, speaker names, acronyms, and phrases

⚠️ Opportunities for Improvement

  • Stenography requires a very specialized skillset (years of training)
  • High overhead and manual coordination

Automatic captioning solutions are quick and cheap, as many competitors offer free or close-to-free live speech-to-text solutions. Our company already had a live automatic captioning solution, but strategically decided this was a red ocean we did not want to compete in, given the race to the bottom on pricing.

✅ Pros

  • Cheap, low-cost
  • Low latency
  • Most changes can be made through a self-serve UI

⚠️ Opportunities for Improvement

  • Lack of support for proper nouns, speaker names, acronyms, and phrases
  • Low accuracy and comprehensibility

Voice Writing: The Technology Powering Live Captioning

Before working in the world of accessibility, I was curious about the technology that powered captions for so many shows and live news channels on TV, but I had never put much thought into it.

Introducing Voice Writing:

[Illustrations: a headset playing sound, a captioner with a microphone headset, and a live speaker on a laptop]

1. Listen

A captioner listens to the audio of the live stream or video file and keeps track of what has been said and by whom.

2. Caption

The captioner repeats back the speech and inserts punctuation, speaker labels, and other context where applicable.

3. Deliver

Captions are delivered directly back to the stream with a slight latency or to an external screen containing the transcript.
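
To make the delivery step more concrete, here is a minimal sketch of how caption cues might be pushed back to a viewer-facing stream in near real time. It assumes a WebSocket connection, a simple cue format, and a fixed buffering delay; none of this is 3Play’s actual delivery protocol.

```typescript
// Minimal sketch of the "Deliver" step: pushing caption cues to a
// viewer-facing caption window over a WebSocket. The cue shape, endpoint,
// and buffering delay are illustrative assumptions, not 3Play's actual
// delivery protocol.
import WebSocket from "ws";

interface CaptionCue {
  speaker: string;     // speaker label inserted by the captioner
  text: string;        // voice-written, punctuated caption text
  timestampMs: number; // position within the live stream
}

const DELIVERY_DELAY_MS = 3000; // the "slight latency" between speech and caption

class CaptionDelivery {
  constructor(private socket: WebSocket) {}

  // Hold each cue briefly so it lines up with the delayed stream,
  // then forward it to viewers.
  deliver(cue: CaptionCue): void {
    setTimeout(() => this.socket.send(JSON.stringify(cue)), DELIVERY_DELAY_MS);
  }
}

// Hypothetical usage against a made-up caption endpoint:
const socket = new WebSocket("wss://captions.example.com/events/123");
socket.on("open", () => {
  const delivery = new CaptionDelivery(socket);
  delivery.deliver({
    speaker: "Moderator",
    text: "Welcome, everyone, to today's webinar.",
    timestampMs: 12500,
  });
});
```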

The Chicken or The Egg? 🐣

The challenge with scaling a marketplace is figuring out how to sustainably grow each side of the market.

Finding customers was not a challenge initially, as most of our efforts centered on building a UI to enable live captioning delivery. However, once we had trained enough captioners, we needed to scale up demand so that we wouldn’t see too much contractor churn or dissatisfaction.

As a result, this required a lot of pivoting and context switching between different design initiatives to maintain the balance of the market.

[Illustration: an imbalanced scale, with a finger pressing down on one side to rebalance it]

Summary from Months of User Research

I worked with my team to create a research protocol that would help us uncover customer needs, frustrations, pain points, behaviors, aptitudes, attitudes, and current solutions related to the world of live captioning.

Some major takeaways from the research:

1. There are two primary groups of users:

  • account administrators who coordinate events but do not use the 3Play system
  • daily operations specialists who order events through 3Play

2. Customers need upgrade and event completion paths to publish events instantly to their platform of choice with fully accurate transcripts

3. High accuracy is important, and can be achieved with the following event information (see the sketch after this list):

  • acronyms
  • wordlists
  • supporting presentational material
  • captioners who are subject matter experts
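
To illustrate that third takeaway, here’s a hypothetical sketch of the event information a customer might supply up front. The field names and structure are my own assumptions for illustration, not 3Play’s actual ordering API.

```typescript
// Hypothetical shape of the accuracy-boosting information a customer could
// attach to a live event order. Field names are illustrative assumptions.
interface LiveEventOrder {
  eventName: string;
  startTime: Date;
  acronyms: string[];            // e.g. ["WCAG", "ASR"]
  wordlist: string[];            // proper nouns, product names, speaker names
  supportingMaterials: string[]; // links to slides, agendas, or scripts
  preferredExpertise?: string;   // request a captioner who knows the subject
}

const order: LiveEventOrder = {
  eventName: "Quarterly Accessibility Town Hall",
  startTime: new Date("2022-05-10T17:00:00Z"),
  acronyms: ["WCAG", "ASR"],
  wordlist: ["3Play Media", "Captionmax"],
  supportingMaterials: ["https://example.com/town-hall-slides.pdf"],
  preferredExpertise: "web accessibility",
};
```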

Measuring Success – Alignment with Product Goals

At an executive level, scaling live captioning was the most important product initiative. At a product level, the key metrics we cared most about were the total number of live events, the number of contractors trained, the overhead required from internal team members to coordinate live events, and revenue generated.

Based on these metrics, I worked with my team to come up with a series of UX and product-centric metrics that could be directly mapped back to performance and product success.

Metrics:

  • Customer abandonment rate
  • Customer support requests
  • Number of manual interventions by internal stakeholders
  • Contractor NPS and ease of use
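
To make two of these measurable, here’s a small sketch of how they might be computed from raw data. The formulas are the standard ones (NPS is the percentage of promoters minus the percentage of detractors; abandonment rate is abandoned orders divided by started orders); the data shapes are assumptions for illustration.

```typescript
// Contractor NPS from 0-10 survey scores: % promoters (9-10) minus % detractors (0-6).
function netPromoterScore(scores: number[]): number {
  const promoters = scores.filter((s) => s >= 9).length;
  const detractors = scores.filter((s) => s <= 6).length;
  return ((promoters - detractors) / scores.length) * 100;
}

// Customer abandonment rate: orders started but never submitted.
interface EventOrder {
  started: boolean;
  submitted: boolean;
}

function abandonmentRate(orders: EventOrder[]): number {
  const started = orders.filter((o) => o.started);
  const abandoned = started.filter((o) => !o.submitted);
  return started.length === 0 ? 0 : abandoned.length / started.length;
}

console.log(netPromoterScore([10, 9, 7, 3])); // (2 promoters - 1 detractor) / 4 = 25
console.log(
  abandonmentRate([
    { started: true, submitted: true },
    { started: true, submitted: true },
    { started: true, submitted: false },
  ])
); // 1 abandoned of 3 started ≈ 0.33
```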

A Three-Pronged Approach to Live Captioning

The final solution involved three separate interfaces for three different sets of stakeholders. These interfaces worked in conjunction to power the infrastructure behind our live captioning product.

An Unforeseen Opportunity

In February of 2022, 3Play acquired two organizations, Captionmax and National Captioning Canada. With the acquisition, we now had access to dozens of full-time captioners who had never seen our captioning interface before but would have to use it in the future.

Over the next several months, I engaged in ethnographic and generative research, which gave me a better view of existing captioning interfaces and of captioner behavior and preferences, and allowed me to map those back to gaps in our own interfaces.

We used this data to create a survey that helped us identify missing features and table stakes our interface did not yet support, which would be crucial to build out before migrating these contractors to our new interface.

Looking into the Future

The future of live captioning is bright with 3Play’s scalable platform. Watching this product evolve over the past year and a half has been an amazing experience, especially seeing the impact on the individuals who benefit from the service.

Credits and thanks:
Photos from Unsplash. Icons from Flaticon.

Derek Mei

Thanks for checking out my work. If you have questions or want to get in touch with me, please reach out by email or connect with me on LinkedIn!
