<h1>HTML5 Audio Read-Along</h1>

<a href="http://westonruter.github.com/html5-audio-read-along/" title="Jump to live demo"><img src="https://github.com/westonruter/html5-audio-read-along/raw/master/screenshot.png?3" alt="Screenshot of app" width="340"></a>

<p><em>Jump straight to the <a href="http://westonruter.github.com/html5-audio-read-along/">live demo</a>.</em></p>
<p>When I was in college, my most valuable tool for writing papers was a text-to-speech (<abbr title="text-to-speech">TTS</abbr>) program. I could paste in a draft of my paper and it would highlight each word as it was spoken, so I could give my proof-reading eyes a break and do proof-listening while I read along; I caught many mistakes I would have missed. Likewise, to power through course readings, I would copy the material into the TTS program whenever possible and speed up the reading rate; because the words are highlighted, it's easy to re-find your place if you look away and just listen for a while. (I constantly use OS X's selected-text speech feature, but unfortunately it does not highlight words.) A decade after my college days, I would have hoped that such TTS read-alongs had become common on the Web (there is a work-in-progress <a title="chrome.tts Google Chrome Extensions API" href="http://code.google.com/chrome/extensions/tts.html">Chrome API</a> and a <a href="http://lists.w3.org/Archives/Public/public-xg-htmlspeech/2011Feb/att-0022/htmltts-draft.html" title="HTML Text to Speech (TTS) API Specification">W3C draft spec</a> now under development), even as read-along apps for kids' books are prolific in places like the Apple App Store.</p>
<p>I created an initial version of this read-along demo in December 2009, so the passage I chose was naturally the nativity story, specifically from the Gospel of Luke in the <a href="http://www.esv.org/" target="_blank">English Standard Version</a> (<abbr title="English Standard Version">ESV</abbr>). I chose a biblical passage not only in keeping with the Christmas spirit but also because the ESV has an excellent <a href="http://www.esvapi.org/" target="_blank">API</a> which allows both a passage’s text and audio to be queried.
With the text and audio in hand, each of the words in the text had to be time-indexed for its begin time and duration in the corresponding audio. In the past, audio Bibles were divided into chapter segments only and that was as granular as you could go; the ESV team made the <a title="The Development of Verse-Level Audio at the ESV Online Edition" href="http://www.gnpcb.org/esv/share/about/audio/">innovation</a> of taking this granularity down to the verse level. Unfortunately, that granularity is not available at the word level. Therefore, to make this read-along demo work, I manually traversed the audio to find each word’s begin time and duration, and I added these time indices to the word markup as <code>data-begin</code> and <code>data-<abbr title="duration">dur</abbr></code> attributes, akin to SMIL’s <a title="SMIL 3.0 Timing and Synchronization: The begin Attribute" href="http://www.w3.org/TR/SMIL3/smil-timing.html#adef-begin" target="_blank"><code>begin</code></a> and <a title="SMIL 3.0 Timing and Synchronization: dur" href="http://www.w3.org/TR/SMIL3/smil-timing.html#adef-dur" target="_blank"><abbr title="duration"><code>dur</code></abbr></a> attributes. (As an aside, it took me a tedious <em>four hours</em> to manually obtain the time indices for this passage. Because of the pain endured, in 2011 I set out to find a way to automate the process of finding time indices, with some success; the results can be found in my <a href="https://github.com/westonruter/esv-text-audio-aligner">ESV Text/Audio Aligner</a> project.)</p>
<h2>My Wish</h2>
<p>My <strong>ultimate goal</strong> for this demo is to inspire e-book publishers to work toward adding read-along functionality to their applications. Specifically, I have my eye on Amazon here.
I think it is tragic that Amazon now owns Audible and has access to a vast amount of high-quality audio books, yet these remain disconnected from Amazon's vast array of e-books in the Kindle store. Amazon needs to work toward integrating Kindle and Audible into one product. When I purchase a Kindle book there should be an option to also purchase the Audible book as part of a package. Then, instead of only being able to use the Kindle device's TTS to listen to the book (it is frustrating that TTS is only available on the Kindle device and not in any of the Kindle apps), I should be able to listen to the Audible audio book while I am reading the Kindle book, all from the same Kindle app on any supported device, even on the Cloud Reader (to which this demo could be directly applied). Amazon would just have to align their Audible audio books with the respective texts in their Kindle e-books, and there are text-audio alignment tools available for this purpose, as mentioned above.</p>
<p>I imagine a Kindle/Audible app which would allow you to seamlessly switch between audio and visual reading modes. Think of listening to a book on your drive home from work, and then picking up where you left off at home with visual reading. If you're at an unimportant passage and start multitasking (e.g. in the kitchen), you could take your eyes off the screen and easily re-find your place since each word is highlighted as it is spoken. Furthermore, having the text-audio alignment would enable highlights and note-taking while just listening to the audio; there could be an app button, for example, that when pressed would cause the passage being spoken to be highlighted in the Kindle book; likewise, there could be a button to add a voice memo to the book at that point, and it would appear in the Kindle book as a text note via speech recognition/dictation. With integrated audio and text, new modes of reading would be enabled and reading would be much more accessible.
<em>Amazon and other e-book publishers, hear me!</em></p>
<h2>Instructions</h2>
<p>Upon playing the audio, the word in the text corresponding to the one currently being spoken in the audio is highlighted. When you manually adjust the seek position, the word corresponding to each audio position is highlighted; conversely, clicking a word causes the audio to seek to its corresponding position (and double-clicking then starts playback). <em>Thus the text itself serves as an interface for navigating the audio.</em> There is also a <strong>keyboard interface</strong> for navigating the text. Each word in the text is focusable, and upon tabbing to a word you may hit <kbd>Enter</kbd> to seek the audio to that point; there is also a checkbox toggle for whether highlighted words should be auto-focused. Hitting <kbd>Spacebar</kbd> toggles play/pause.</p>
<h2>Browser Support</h2>
<p>The read-along demo works in the latest stable versions of Firefox, Chrome, Safari, and Opera (it may even work in Internet Explorer 9); I've also tested on iPhone and iPad (iOS 5). Safari and Chrome play the MP3 as served from the ESV API. Firefox doesn’t support MP3, so I include an Ogg Vorbis <code>source</code> as well. There is also an 8 kHz WAV fallback. Note that the speech rate control only works in browsers (e.g. Chrome and Safari) that implement the <code>HTMLMediaElement.playbackRate</code> property; detection of <code>playbackRate</code> support currently fails on iOS, so changing the range control there has no effect. Note also that increasing the reading rate decreases the accuracy of the word highlights, since the words are no longer spoken long enough for <code>setTimeout</code> to fire in time.</p>
<h2>Credits</h2>
<p>Demo created by <a href="https://plus.google.com/113853198722136596993" rel="author">Weston Ruter</a> (<a href="https://twitter.com/westonruter">@westonruter</a>), <a href="http://x-team.com/" title="My employer">X-Team</a>.
Code is licensed <a href="http://www.opensource.org/licenses/MIT" rel="license">MIT</a>/<a href="http://www.gnu.org/licenses/gpl.html" rel="license">GPL</a>.</p>
<p>Scripture taken from The Holy Bible, English Standard Version. Copyright ©2001 by <a href="http://www.crosswaybibles.org" target="_blank">Crossway Bibles</a>, a publishing ministry of Good News Publishers. Used by permission. All rights reserved. Data obtained from the <a href="http://www.gnpcb.org/esv/share/services/" target="_blank">ESV Bible Web Service</a>.</p>
<hr>
<p><small>I have put redirects from my blog to GitHub, so the comments on my blog are no longer accessible there. For archive purposes, I've posted the <a href="http://westonruter.github.com/html5-audio-read-along/wordpress-comment-archive.html">raw comments</a> to gh-pages.</small></p>
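<p>As a rough illustration of the time indexing described earlier, here is a minimal sketch of how a word's begin time and duration might be used to find the word being spoken at a given audio position. It is not the demo's actual code: the function name <code>findWordIndexAt</code>, the shape of the <code>words</code> array, and the sample timings are all hypothetical, and the sketch covers only the lookup step, not the <code>setTimeout</code> scheduling mentioned under Browser Support.</p>

```javascript
// Hypothetical word timings in seconds, shaped like values that could be
// read from data-begin / data-dur attributes. The numbers are made up.
const words = [
  { text: "In",    begin: 0.0,  dur: 0.25 },
  { text: "those", begin: 0.25, dur: 0.25 },
  { text: "days",  begin: 0.5,  dur: 0.5 },
  // Assumed pause in the narration between 1.0 and 1.25.
  { text: "a",     begin: 1.25, dur: 0.5 },
];

// Return the index of the word spoken at `time`, or -1 if `time` falls in a
// pause or outside the passage. Binary search; assumes `words` is sorted by
// `begin` and that the word intervals do not overlap.
function findWordIndexAt(words, time) {
  let lo = 0;
  let hi = words.length - 1;
  while (lo <= hi) {
    const mid = (lo + hi) >> 1;
    const word = words[mid];
    if (time < word.begin) {
      hi = mid - 1;          // target is earlier than this word
    } else if (time >= word.begin + word.dur) {
      lo = mid + 1;          // target is at or past the end of this word
    } else {
      return mid;            // begin <= time < begin + dur
    }
  }
  return -1;
}
```

<p>In a browser, a function like this could be called with the audio element's <code>currentTime</code> (for example from a <code>timeupdate</code> or <code>seeked</code> listener) to decide which word to highlight after a manual seek. The half-open interval (begin inclusive, end exclusive) is a design choice that keeps adjacent words from both matching at a shared boundary.</p>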