That's great, Peter! I'll try to integrate your changes into the GitHub project, unless you want to upload them yourself.
If you use the network inspector, you will see that the following URL is called when you turn on CC for the video with auto-generated captions (https://www.youtube.com/watch?v=i1M95njhovw):
Network inspector:
https://www.youtube.com/api/timedtext?caps=asr&hl=en_US&signature=690C7D532D9188E602B2A1D0241BA07009...
Breaking it up gives:
https://www.youtube.com/api/timedtext?caps=asr&
hl=en_US&
signature=690C7D532D9188E602B2A1D0241BA07009859780.801B7DC1758C282D6C8B30EC16904B8DF4BBD291&
xorp=True&
sparams=asr_langs,caps,v,xorp,expire&
v=i1M95njhovw&
asr_langs=ko,it,de,fr,ja,en,ru,es,pt,nl&
key=yttt1&
expire=1519414320&
kind=asr&
lang=en&
fmt=srv3
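The same breakdown can be done in code instead of by hand. This is only a rough sketch: `listQueryParams` is a helper name I made up, and the example URL is a shortened one assembled from a few of the parameters listed above (the signature is left out).

```javascript
// Split a URL's query string into one "key=value" line per parameter.
// listQueryParams is a made-up helper name, not part of the project.
function listQueryParams(urlString) {
  const url = new URL(urlString);
  const lines = [];
  for (const [key, value] of url.searchParams) {
    lines.push(key + "=" + value);
  }
  return lines;
}

// Shortened example URL built from some of the parameters shown above.
const exampleUrl =
  "https://www.youtube.com/api/timedtext?caps=asr&hl=en_US&v=i1M95njhovw&kind=asr&lang=en&fmt=srv3";

console.log(listQueryParams(exampleUrl).join("\n"));
```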
Now, removing the signature, for example, results in an error message when you try to request that URL:
https://www.youtube.com/api/timedtext?caps=asr&hl=en_US&xorp=True&sparams=asr_langs,caps,v,xorp,expi...
But if you do the same with one of the videos with a manually added transcript, like this one (PfDK MOOC - Ny videreutdanning i digital kompetanse for lærere - YouTube), you get this URL for the timedtext in the network inspector:
https://www.youtube.com/api/timedtext?xorp=True&key=yttt1&signature=BE75D143A2A25FB531044D78A14135AE...
Breaking that up:
https://www.youtube.com/api/timedtext?xorp=True&
key=yttt1&
signature=BE75D143A2A25FB531044D78A14135AEFF3CB56F.37A684BEEF8FE6BBD04F91857DBE2DA387544765&
asr_langs=en,ja,fr,de,ko,it,nl,pt,es,ru&
v=Lm0m4VtZ3Us&
caps=asr&
hl=en_US&
sparams=asr_langs,caps,v,xorp,expire&
expire=1519416041&
lang=no&
name=bokmål&
fmt=srv3
It looks quite similar, but even after removing the signature and everything else except the v, lang and name parameters, the URL still returns the subtitles:
https://www.youtube.com/api/timedtext?v=Lm0m4VtZ3Us&lang=no&name=bokm%C3%A5l
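For what it's worth, a URL like that could be built from just the three parameters. A minimal sketch, where `buildTimedTextUrl` is a hypothetical helper name and I'm assuming the endpoint keeps accepting these parameters for manually added transcripts:

```javascript
// Build a minimal timedtext URL from the video id, language and track name.
// buildTimedTextUrl is a hypothetical helper; the name parameter is optional.
function buildTimedTextUrl(videoId, lang, name) {
  const params = new URLSearchParams({ v: videoId, lang: lang });
  if (name) {
    params.set("name", name);
  }
  // URLSearchParams percent-encodes non-ASCII characters such as "å".
  return "https://www.youtube.com/api/timedtext?" + params.toString();
}

console.log(buildTimedTextUrl("Lm0m4VtZ3Us", "no", "bokmål"));
```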
So fetching ASR (Automatic Speech Recognition) subtitles behaves differently for some reason.
I've tried using the YouTube Data API to download captions (Captions: download | YouTube Data API | Google Developers), but I can't get it to work without asking the user to log in with a Google account!
Also, I guess there is a risk that YouTube will remove support for the timedtext URLs, which makes that approach risky in any case.
One solution could be to store the subtitles on a private server, for example Dropbox, and download them from there instead. I tried that here: https://dl.dropboxusercontent.com/s/f1raehbpyabjucg/i1M95njhovw.xml
This HTML file uses that subtitle file:
https://www.erlendthune.com/yt/ytexample2.html
To make the JavaScript look in Dropbox, I changed/added these lines in the JS file:
-> changed: var hrefPrefix = "https://dl.dropbox.com/s/f1raehbpyabjucg/";
-> added: var hrefPostfix = ".xml";
-> changed: var href = hrefPrefix + videoId + hrefPostfix;
I also had to change this:
captionText = captions[i].textContent;
// captionText = captions[i].textContent.replace(/</g, '&lt;').replace(/>/g, '&gt;');
The last change was because the ASR timedtext for your video contained color coding for some reason!
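To show why that matters, here is a small sketch. I'm assuming the commented-out replace turned angle brackets into HTML entities, and the sample string is made up for illustration; it is not copied from the actual ASR file.

```javascript
// Escape angle brackets so markup in a caption is displayed as literal text.
// Assumption: this mirrors what the commented-out replace call did.
function escapeAngleBrackets(text) {
  return text.replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

// Made-up caption text containing color markup.
const sampleCaption = '<font color="#E5E5E5">hello</font>';

// With escaping, the tags show up as text instead of being applied.
console.log(escapeAngleBrackets(sampleCaption));
// Without escaping, the string is passed through unchanged and the browser
// interprets the markup when it is inserted as HTML.
console.log(sampleCaption);
```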
I guess one could have a parameter in the url indicating where the script should look for the subtitles.
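Something along these lines might work. It's only a sketch: the `captionSource` parameter name, the `CAPTION_SOURCES` table and the `resolveCaptionHref` function are all invented here, and the fallback assumes English captions from the timedtext URL.

```javascript
// Known subtitle locations, keyed by the value of a hypothetical
// "captionSource" query parameter. Prefix/postfix are the Dropbox values above.
const CAPTION_SOURCES = new Map([
  ["dropbox", { prefix: "https://dl.dropbox.com/s/f1raehbpyabjucg/", postfix: ".xml" }],
]);

// Decide where to fetch subtitles from, based on the page URL.
function resolveCaptionHref(pageUrl, videoId) {
  const source = new URL(pageUrl).searchParams.get("captionSource");
  const entry = CAPTION_SOURCES.get(source);
  if (entry) {
    return entry.prefix + videoId + entry.postfix;
  }
  // Fallback (assumption): the YouTube timedtext URL with English captions.
  return "https://www.youtube.com/api/timedtext?v=" + encodeURIComponent(videoId) + "&lang=en";
}

console.log(
  resolveCaptionHref("https://www.erlendthune.com/yt/ytexample2.html?captionSource=dropbox", "i1M95njhovw")
);
```

An unknown or missing `captionSource` value simply falls through to the timedtext URL, so existing pages would keep working.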
I wonder why YouTube makes it so difficult to show a transcript of their videos.