I led research and design for 3Play Media’s Live Professional Captioning offering, connecting professional captioners with organizations needing high-quality, real-time captions through a two-sided marketplace and a proprietary, home-grown captioning interface.
Note: To comply with my non-disclosure agreement (NDA), I have omitted and obfuscated confidential information in this case study. All information presented in this case study is my own and does not necessarily reflect the views of 3Play Media.
Company
3Play Media
Timeline
Q1 2020 - Q2 2022 (16 months)
Platforms
Desktop, Mobile Web
My Role
Design and Research Lead
Key Collaborators: I worked as a solo user researcher and designer for this project, supporting three developers, a captioning subject matter expert, and a product manager.
3Play Media powers many of the web's key accessibility solutions (captioning, translation, and audio description) for individuals with disabilities.
Compared to its competitors, 3Play thrives thanks to its loyal contractor base and investment in technology, creating a scalable, repeatable business model that works across the company's different verticals.
Gaps in technology have historically prevented individuals with disabilities from gaining equitable access to content on the web. In particular, the growing popularity of audio and video content (podcasts, short-form video, and long-form content) over the past few years has only raised the barrier to access for users with disabilities.
Live captions provide access to real-time, time-based media such as meetings, webinars, and conferences, primarily for people with auditory disabilities, although they also help people with cognitive disabilities comprehend content.
Automatic captions, generated using artificial intelligence, have been around for more than a decade, but only partially bridge the accessibility gap for people with auditory or cognitive disabilities.
Professional captions, created through the manual input of a human captioner via either stenography or voice writing, have historically only been available to broadcast television customers.
Our goal was to apply our scalable business model to the world of live professional captioning and web-based meetings, something no competitor was doing at scale.
With the onset of the pandemic, at-home quarantine and remote work accelerated the need for high-accuracy live captioning solutions: live meetings and events were suddenly conducted virtually, and real-time speech-to-text solutions were not accurate enough for users with auditory disabilities. Additionally, existing human-powered live captioning solutions in the market were not scalable and were extremely “high-touch”, typically used for broadcast television rather than online events.
Our timeline was not linear, but I’ve tried my best to provide a high-level overview of what it looked like and how I was involved in the process.
User research activities included:
After collecting sufficient data, I created initial mockups and prototypes based on the needs we heard in our interviews with customers and contractors. We brought these back to the same set of users and ran a series of formative usability tests to inform changes to the designs.
Our team launched a closed beta in October 2021.
We created a closed feedback loop using surveys in Hotjar to capture frustrations, bugs, and pain points that would inform changes to the design.
As of May 2022, the customer base is growing rapidly. I’ve shifted my focus slightly from designing for customers and contractors to internal stakeholders, creating tools that automate key portions of their workflow so they can focus on scaling the platform to reach more people in the future.
We started with a competitive analysis, with the goal of auditing the different categories of existing solutions in the market to identify current technologies and buyer criteria.
Professional captioning solutions in the market are expensive due to the large amount of operational overhead. The scheduling process for ordering captions and coordinating event setup is extremely manual; additionally, the software is outdated and does not integrate with modern streaming technologies.
High accuracy compared to automatic captions
Higher comprehensibility due to support of proper nouns, speaker names, acronyms, and phrases
Stenography requires a very specialized skillset (years of training)
High overhead and manual coordination
Automatic captioning solutions are quick and cheap, as many competitors offer free or near-free live speech-to-text solutions. Our company already had a live automatic captioning solution, but decided strategically that this was a red-ocean space we did not want to compete in, given the race to the bottom on pricing.
Cheap, low-cost
Low latency
Most changes can be made through self-serve UI
Lack of support for proper nouns, speaker names, acronyms, and phrases
Low accuracy and comprehensibility
Prior to working in the world of accessibility, I was curious about the technology that powered so many of the captions for shows and live news channels on TV, but had never put much thought into it.
A captioner listens to the stream audio or video file, and keeps in mind what’s been said by whom.
The captioner repeats back the speech and inserts punctuation, speaker labels, and other context where applicable.
Captions are delivered directly back to the stream with a slight latency or to an external screen containing the transcript.
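To make the three steps above more concrete, here is a minimal sketch of how a single voice-written caption could move through such a pipeline. The function names, data shape, and latency figure below are illustrative assumptions for explanation only, not a description of 3Play's actual system.

```python
import time
from dataclasses import dataclass

@dataclass
class CaptionFrame:
    speaker: str      # speaker label added by the captioner
    text: str         # voice-written text with punctuation restored
    timestamp: float  # when the source audio was heard

def voice_write(raw_speech: str, speaker: str) -> CaptionFrame:
    """Step 2 (hypothetical): the captioner re-speaks the audio,
    inserting punctuation and a speaker label."""
    punctuated = raw_speech.strip().capitalize() + "."
    return CaptionFrame(speaker=speaker, text=punctuated, timestamp=time.time())

def deliver(frame: CaptionFrame, latency_seconds: float = 3.0) -> str:
    """Step 3 (hypothetical): captions reach the stream or an external
    transcript view after a short delay."""
    time.sleep(latency_seconds)  # simulated end-to-end latency
    return f"[{frame.speaker}] {frame.text}"

if __name__ == "__main__":
    # Step 1 is the human captioner listening to the live audio;
    # a hard-coded utterance stands in for it here.
    heard = "welcome everyone to today's webinar"
    frame = voice_write(heard, speaker="Host")
    print(deliver(frame, latency_seconds=0.5))
```

In practice, each of these steps is a live, continuous process rather than a single function call, but the sketch captures the core idea: a human in the loop adds the context (punctuation, speaker labels) that automatic captions miss, at the cost of a small delivery delay.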
The challenge with scaling a marketplace is figuring out how to sustainably grow each side of the market.
Finding customers was not a challenge initially, as most of our efforts centered around building a UI to enable live captioning delivery. However, once we had trained enough captioners, we needed to scale up demand so that we wouldn’t see too much churn or dissatisfaction.
This required a lot of pivoting and context switching between different design initiatives to maintain the balance of the marketplace.
I worked with my team to create a research protocol that would help us uncover customer needs, frustrations, pain points, behaviors, aptitudes, attitudes, and current solutions related to the world of live captioning.
Some major takeaways from the research:
1. There are two primary groups of users:
2. Customers need upgrade and event completion paths to publish events instantly to their platform of choice with fully accurate transcripts
3. High accuracy is important, and can be achieved with the following event information:
At an executive level, scaling live captioning was the most important product initiative. At a product level, the key metrics we cared most about were the number of total live events, the number of contractors trained, the overhead required of internal team members to coordinate live events, and revenue generated.
Based on these metrics, I worked with my team to come up with a series of UX and product-centric metrics that could be directly mapped back to performance and product success.
The final solution involved three separate interfaces for three different sets of stakeholders. Together, these interfaces powered the infrastructure behind our live captioning product.
For our customers, I designed a simple, multi-step ordering process and a dashboard providing high visibility into upcoming, in-progress, and previous events that need live captioning. This multi-step process allowed customers to self-serve and self-diagnose issues related to their live captioning events.
I designed the first iterations of our captioning interface to be robust, simple, and scalable. Although we were primarily a Figma shop, I opted to use Adobe XD for the first set of low-fidelity prototypes because of its built-in voice prototyping feature, which made it easy to demo the voice writing technology that would power the captioning.
For our internal operations and support teams, we wanted to provide tools that would help them match jobs to available contractors and see trends at a macro, market-wide level, while also giving them robust tooling to debug and support events at the individual event level.
In February of 2022, 3Play acquired two organizations, Captionmax and National Captioning Canada. With the acquisition, we now had access to dozens of full-time captioners who had never seen our captioning interface before but would have to use it in the future.
Over the next several months, I engaged in ethnographic and generative research, which gave me a better view of existing captioning interfaces and of captioner behaviors and preferences, and allowed me to map those back to gaps in our own interface.
We used this data to create a survey that helped us identify the missing features and table stakes our interface did not yet support, which would be crucial to build before migrating these contractors to our new interface.
The future of live captioning is bright with 3Play’s scalable platform. Watching this product evolve over the past year and a half has been an amazing experience, especially seeing the impact on the individuals who benefit from the service.
Credits and thanks:
Photos from Unsplash. Icons from Flaticon.
Thanks for checking out my work. If you have questions or want to get in touch with me, please reach out by email or connect with me on LinkedIn!
Want to read another case study?
Making Online Media Accessible to All