I have made some JavaScript code to support interactive YouTube transcripts. A working example can be seen here: http://www.erlendthune.com/yt/ytexample.html. Look here for instructions on how to make it work in Canvas or any other web page: PfDK.github.io/readme.md at master · PfDK/PfDK.github.io · GitHub
If anyone has ideas on how to improve the code or wants to help improve it, that would be great!
That's really awesome - if you embed that in a module, etc. does it pick up the Canvas styling?
@phanley ,
Iframes do not have access to the CSS styling of their parent windows. You would need to style it yourself in this case.
That is a good idea, Peter. James, it is only the YouTube video that is in an iframe. The video transcript is part of the Canvas page, so it is possible to use any of the Canvas styling classes. I've added it as an issue in the GitHub project.
That's really neat, @erlend_thune . Nice work! Hey @James , have a look at this! ( @Renee_Carney suggested I reach out to you.)
Nice use of the YouTube API to bring the text off the video and onto the page, @erlend_thune .
I stumbled across one of your other contributions this week as I was trying to get WordPress and H5P to play nice with Canvas. I ended up breaking my xAPI installation before I got to try your script, but thank you for making these available to people.
One way around the duplicated video issue, if that really is an issue, would be to get the ID of the video from the src attribute on the iframe and make everything else generic without tacking on the specific video ID.
Were you envisioning that people use this outside of an iframe? If so, ignore most of what I wrote in the last paragraph.
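The src-attribute idea could be sketched like this (the helper name is hypothetical and not part of the posted script):

```javascript
// Hypothetical helper: pull the YouTube video ID out of an embed URL,
// e.g. the src attribute of an <iframe> already on the page.
function extractVideoId(src) {
  // Embed URLs look like https://www.youtube.com/embed/VIDEO_ID?rel=0
  var match = /youtube\.com\/embed\/([A-Za-z0-9_-]{11})/.exec(src);
  return match ? match[1] : null;
}

// In the browser one could then do something like (sketch):
// var src = document.querySelector('iframe[src*="youtube.com/embed"]').src;
// var videoId = extractVideoId(src);
```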
What is the purpose of the 0:00 at the beginning? It's the only timestamp, and it doesn't change for me.
Moving past the initial "wow, this is really cool and a great idea" reaction, and acknowledging that it's great that this code is available and documented so that people can build off of it, I'm trying to wrap my brain around how it would be used. I said the same thing when Alexa for Canvas came out.
I can see the desire to extract a transcript automatically from YouTube. There has been a push to closed-caption videos, and we finally have someone at our school other than me who is advocating for that. The problem with YouTube transcription is that you lose all sense of the speaker and any paragraph breaks. In other words, you won't get nicely formatted output this way.
The text extracted from YouTube and displayed on the page ran together, which is where the highlighting came into play so that I could follow along and see where the text was.
I don't think this is technically possible through the YouTube API, but you lose the context of paragraphs and author when it is highlighting what is being said. If someone had a nicely formatted transcript that kept the formatting, it might be nice to highlight that text when the video is played. It would probably require the two transcripts to match really closely, though.
But now I'm back to my question of how we will use this. If I were sighted but hearing impaired, I would turn on the closed captioning so that I could watch the video without the highlighting on the text distracting me and causing me to jump back and forth.
If I were visually impaired but could hear, then I would get the audio from the video or from the screen reader that is reading the transcript, but the screen reader isn't able to do anything other than read a single block of text, since it isn't marked up in any way with the speaker or paragraphs. Again, I think that's a limitation of where the text came from.
I think I might have just answered my own question, but I tend to leave things I've written as a trail of my thought process. What I just noticed after writing all this is that blocks of text are highlighted when you mouse over them. I missed that the first time because I didn't mouse over the text; I just loaded the page and played the video. Hovering lets you jump to a specific portion of the video, and I can see the use for that a little better. Videos tend to be too long, and this would help a student jump straight to the relevant portion. So far, this is the best part of this for me.
Maybe you could add a section to the documentation that explains the features, rather than just saying "here's a demo"? That way we know ahead of time what to expect and don't miss the biggest thing because we didn't move our mouse over it.
If there was a way to take a nicely formatted transcript and link it to the YouTube video and apply the interactivity to the formatted text, it would be amazingly awesome, although I'm not sure what benefit it would have over a nicely formatted text transcript with time stamps.
But I'll be the first to admit that I miss what may be obvious to other people. After all, I'm still trying to figure out the purpose of Alexa, and, to a lesser extent, cell phones.
Thank you, James. I will add a better description of the features on the GitHub page! I agree that the yellow highlighting is distracting, and it should at least be made a bit more subtle!
All new educational material in Norway must fulfill the WCAG 2.0 universal design requirements, which is why I started looking into this. I have just taken a MOOC at Coursera and noticed that they had much better support for transcripts than Canvas. They do not, however, highlight the text, probably for the same reasons you describe. But if someone for some reason wanted to only hear the sound of the video and read the transcript on screen while listening, I guess the highlighting could come in handy. A button to turn the highlighting on/off, and/or a parameter in the JavaScript to toggle that functionality, could be a start.
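That on/off idea could be as simple as a flag that the highlighting code checks before touching the page (a sketch; the names are hypothetical, not from the actual script):

```javascript
// Hypothetical toggle for the transcript highlighting feature.
var highlightEnabled = true;

// Wired to a button's click handler; returns the new state.
function toggleHighlight() {
  highlightEnabled = !highlightEnabled;
  return highlightEnabled;
}

// In the player's time-update handler one would then guard the styling:
// if (highlightEnabled) { currentCaptionElement.classList.add('highlight'); }
```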
I am not sure what you mean by a nicely formatted transcript from YouTube. If you want line breaks etc., that is easy to add. YouTube does not support automatic subtitling of Norwegian yet, so we have to write the subtitles manually.
The purpose of the 0:00 timestamp is to be able to jump to the very start of the video. The UX and UI part can surely be improved.
Anyway, I agree that the best part is that you can search the transcript for words you remember from the video and then jump to that part of the video.
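A minimal sketch of that search-and-jump idea, assuming the captions have already been parsed into { start, text } objects (names hypothetical, not from the posted script):

```javascript
// Given parsed captions like [{ start: seconds, text: "..." }, ...],
// find the start time of the first caption containing a search term.
function findCaptionStart(captions, term) {
  term = term.toLowerCase();
  for (var i = 0; i < captions.length; i++) {
    if (captions[i].text.toLowerCase().indexOf(term) !== -1) {
      return captions[i].start;
    }
  }
  return null; // term not found in any caption
}

// With a YouTube IFrame API player object, one could then seek:
// var t = findCaptionStart(captions, "transcript");
// if (t !== null) player.seekTo(t, true);
```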
I am not sure what you mean by a nicely formatted transcript from YouTube.
What I meant was that I didn't think YouTube would identify who the speaker was or put breaks into the text. Other people may be able to speak in full and complete sentences, but I normally have a sentence that takes more than one screen of captions, so you couldn't break just on sentences.
What I consider a nicely formatted transcript would be one that indicates the speaker (if it changes), and has paragraphs with marked-up code, things that identify sound effects, etc. Basically what they describe at W3C Multimedia Accessibility FAQ and the example Podcast: Interview on WCAG 2 that was linked from there. What I was trying to say is that I don't think YouTube can give that level of captioning, but I might be missing something.
That WCAG page says that captioning is essentially a transcript synchronized with the video. That's what yours does and in a way that allows people to do translations on the page if needed, whereas closed captioning wouldn't be as easy to translate.
This would be incredibly helpful for my ESL kids learning English. They can follow along and practice the pronunciation of unfamiliar words as well as hearing natural cadence.
I've been playing around with this script and made a few changes based on my own ideas (mostly making it reusable) and also, sort of, on what @James brought up regarding the formatting. I should probably get back to work for now, but I'd be happy to keep working at it.
Here's basically where I am with it:
https://academicapps.temple.edu/youtubeIT/ytitexample2.html?v=Ux1iQBU09oA
It also works with the original post video:
https://academicapps.temple.edu/youtubeIT/ytitexample2.html?v=Lm0m4VtZ3Us&name=bokmål&lang=no
But I can't figure out how to get it to work with arbitrary videos that have autogenerated captions:
https://academicapps.temple.edu/youtubeIT/ytitexample2.html?v=i1M95njhovw&name=&lang=en
Apologies for the personal video, but it was the only one I had where I was familiar with the captions (since I edited them myself).
Changes:
var page_url = urlObject(window.location.href);
// Read the video ID, caption language and caption track name from the
// page URL, with defaults (these globals are used by youtubeIT.js).
ytvid = page_url.parameters.v || "Ux1iQBU09oA";
lang = page_url.parameters.lang || "en";
vname = page_url.parameters.name || "";
// Define the formatter before passing it along; in the original order
// the assignment had not yet run, so formatter was still undefined.
var formatter = function () {
    // Bold the speaker names at the start of each caption button.
    $('.btnSeek').each(function (i, val) {
        var $p = $(val);
        $p.html(
            $p.html().replace(/^(Zaybee-Wan\:|Darth Paapa\:)/, '<strong>$1</strong>')
        );
    });
};
youTubeIt.youtube = youTube(formatter);
$.getScript("youtubeIT.js");
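As an aside, the query-string parsing above could also be done with the built-in URL API instead of a urlObject helper. A sketch with the same defaults (the function name is my own, not from the script):

```javascript
// Parse the video ID, caption language and caption track name from a
// page URL using the standard URL/URLSearchParams API, with defaults.
function getPlayerConfig(href) {
  var params = new URL(href).searchParams;
  return {
    ytvid: params.get('v') || 'Ux1iQBU09oA',
    lang: params.get('lang') || 'en',
    vname: params.get('name') || ''
  };
}

// In the browser: var cfg = getPlayerConfig(window.location.href);
```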
I think that's it? I have some questions that I'll ask in another post.
That's great, Peter! I'll try to integrate your changes into the GitHub project, unless you want to upload them yourself.
If you use the network inspector, you will see that the following url is called when you turn on CC on the auto-generated CC video (https://www.youtube.com/watch?v=i1M95njhovw). Breaking it up gives:
https://www.youtube.com/api/timedtext?caps=asr&
hl=en_US&
signature=690C7D532D9188E602B2A1D0241BA07009859780.801B7DC1758C282D6C8B30EC16904B8DF4BBD291&
xorp=True&
sparams=asr_langs,caps,v,xorp,expire&
v=i1M95njhovw&
asr_langs=ko,it,de,fr,ja,en,ru,es,pt,nl&
key=yttt1&
expire=1519414320&
kind=asr&
lang=en&
fmt=srv3
Now, removing the signature, for example, results in an error message if you try to fetch that url.
But, if you do the same with one of the videos with a manually added transcript, like this one (PfDK MOOC - Ny videreutdanning i digital kompetanse for lærere - YouTube), you get the following url for the timedtext in the network inspector:
Breaking that up:
https://www.youtube.com/api/timedtext?xorp=True&
key=yttt1&
signature=BE75D143A2A25FB531044D78A14135AEFF3CB56F.37A684BEEF8FE6BBD04F91857DBE2DA387544765&
asr_langs=en,ja,fr,de,ko,it,nl,pt,es,ru&
v=Lm0m4VtZ3Us&
caps=asr&
hl=en_US&
sparams=asr_langs,caps,v,xorp,expire&
expire=1519416041&
lang=no&
name=bokmål&
fmt=srv3
Looks quite similar, but removing the signature and everything else except the v, lang and name parameters for that sake, still gives the subtitles:
https://www.youtube.com/api/timedtext?v=Lm0m4VtZ3Us&lang=no&name=bokm%C3%A5l
So, grabbing the subtitles for ASR (Automatic Speech Recognition) videos behaves differently for some reason.
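For the manually captioned case, the simplified timedtext URL above can be fetched and parsed along these lines (a sketch; the regex parse is my simplification, and in the browser a DOM parser would be more robust; cross-origin restrictions may also apply):

```javascript
// Parse YouTube timedtext XML of the form
// <transcript><text start="0" dur="2">Hei</text>...</transcript>
// into an array of { start, text } cues.
function parseTimedText(xml) {
  var cues = [];
  var re = /<text start="([\d.]+)"[^>]*>([\s\S]*?)<\/text>/g;
  var m;
  while ((m = re.exec(xml)) !== null) {
    cues.push({ start: parseFloat(m[1]), text: m[2] });
  }
  return cues;
}

// Usage in the browser (manually captioned videos only, per the thread):
// fetch('https://www.youtube.com/api/timedtext?v=Lm0m4VtZ3Us&lang=no&name=bokm%C3%A5l')
//   .then(function (r) { return r.text(); })
//   .then(function (xml) { console.log(parseTimedText(xml)); });
```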
I've tried looking into using the Google YouTube Data API for downloading captions (Captions: download | YouTube Data API | Google Developers), but I can't get it to work without asking the user to log in with a Google account!
Also, I guess there is a risk that YouTube will remove support for the timedtext urls, making that approach risky in any case.
One solution could be to store the subtitles on a private server or a service like Dropbox, and download the subtitles from there instead. I tried that here: https://dl.dropboxusercontent.com/s/f1raehbpyabjucg/i1M95njhovw.xml
This html file uses that subtitle file:
https://www.erlendthune.com/yt/ytexample2.html
To make the JavaScript look in Dropbox, I changed/added these lines in the js file:
-> changed: var hrefPrefix = "https://dl.dropbox.com/s/f1raehbpyabjucg/";
-> added: var hrefPostfix = ".xml";
-> changed: var href = hrefPrefix + videoId + hrefPostfix;
I also had to change this:
captionText = captions[i].textContent;
// captionText = captions[i].textContent.replace(/&lt;/g, '<').replace(/&gt;/g, '>');
The last change was because the ASR timedtext for your video contained color coding for some reason!
I guess one could have a parameter in the url indicating where the script should look for the subtitles.
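That idea could look like this (a sketch; the "subs" parameter name and function are hypothetical):

```javascript
// Build the subtitle URL for a video. If the page URL carries a "subs"
// parameter (e.g. a Dropbox folder URL), fetch the XML from there;
// otherwise fall back to YouTube's timedtext endpoint.
function subtitleUrl(href, videoId) {
  var params = new URL(href).searchParams;
  var base = params.get('subs');
  if (base) {
    // Ensure exactly one trailing slash before appending the file name.
    return base.replace(/\/?$/, '/') + videoId + '.xml';
  }
  return 'https://www.youtube.com/api/timedtext?v=' + videoId;
}
```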
I wonder why YouTube makes it so difficult to show a transcript of their videos.
Peter, your additions look really nice in the example! Can you share the complete html and js files you used?
@erlend_thune , I'd love to see your latest modification for auto-generated text. (Complete files as well)
Can we get these both in the github project?
I don't think I've made any modification for auto-generated text? Or do you mean the post where I explain how to look up the subtitle file from Dropbox?
The Dropbox one. I think it would be useful to see the code that extracts the autogenerated text in context.
This is really cool! I actually built a very similar thing with interactive transcripts and also a video menu in React JS. I modelled it after the UI on Lynda.com.
Using React might be helpful because you could use InstructureUI components, and then it would look like Canvas as well. Thought I'd think that out loud because I saw someone ask about the styling of the interface.
Sam, do you have a link for your React-based version?
I'm using React as well and would love to see how it works!
Hey James, I'm happy to share the code with you. It's currently baked into another project. I was planning on making it an npm package but haven't got around to it yet. Are you still after it?
Yeah! That would be great! I've fiddled around with a few versions, but I'd really love to see your version.
I am planning on making some Lynda-style videos, so I made this for my personal site. Here's a link to the React component itself: sam-malcolm-website/YoutubePlaylist.js at admin_section · SamMalcolm/sam-malcolm-website · GitHub. It gets the data from a Mongo database, but Mongo is just housing data that came from the Google API (to avoid hitting Google every time someone visits the site). On line 61 it searches another API to try to find any additional resources via the YouTube video ID; this is in another Mongo collection. The component also checks whether it is a single video or a playlist: if it's a list, it creates a menu of all the items in that list, and if not, it simply widens the transcript and drops the menu.
The server is in Express, and the code for that side is at sam-malcolm-website/api.js at admin_section · SamMalcolm/sam-malcolm-website · GitHub. This contains all of the site's self-called API endpoints. You'll notice on line 256 where I call Google with a playlist (or video) ID and then add it to my MongoDB.
It's fairly complex and baked in at the moment, but I do hope to make a more reusable version soon. Hope this helped somewhat, though!