Several options are available for automatic or machine-generated captions/transcripts for both live and recorded materials. This process is commonly referred to as automatic speech recognition (ASR) and typically has an accuracy rate of 60% to 85%.
ASR solutions are available within an LMS (e.g., Blackboard, Collab, Canvas), video tools (e.g., YouTube, Panopto, Kaltura), and video conferencing tools (e.g., Zoom, Teams, Webex). While this technology is improving quickly, the resulting product, by default, still does not meet the level of accuracy required for academic purposes.
AN ADDITIONAL STEP IS REQUIRED when ASR captions/transcripts are the planned solution for a recorded event. After the event has ended and the recording is ready, the ASR captions/transcripts must be reviewed and edited to meet the required 99% accuracy threshold.
When using vendor-based, professional services (companies that employ human transcriptionists), the resulting product reaches a minimum of 99% accuracy for most recorded materials and often 85% or above for live events. A human transcriptionist is also more likely to understand the context of the event or recording and therefore provide more accurate captions/transcripts.
Recognizing the time involved in this review, the Captioning Project focuses on providing solutions that do not require this additional step. Captions and transcripts provided by professional services are required by contract to achieve an accuracy rate of 99%. Through its combination of central funding and coordinated management, the project aims to remove as many barriers as possible, making it as straightforward as possible for instructors to provide these resources for enhanced engagement in their courses.