Traditou – Chuyun Shen

Traditou

Feb 07, 2023

Introduction

Find Traditou on the Chrome Web Store and check out the repository on GitHub.

Traditou finds the original subtitle files sent from the website’s servers, parses them, and displays the original subtitles along with English translations at the bottom of the video, using Google Translate.

This extension can be used with three French Canadian video streaming websites: Tou.tv, Noovo.ca, and TeleQuebec.

Traditou not only matches words closely, it also transforms TV-style rolling captions (which are visually difficult to follow) into more readable streaming service style subtitles.

I made this extension so I can learn French by watching French Canadian content, while not needing to look up words for every sentence.

More Details on the How:

I first go through the requests that are sent from each website’s front end to its server and filter for requests that contain the keyword “vtt”. VTT (WebVTT) is a common subtitle format for web video. Depending on the website, the server could send one VTT file at the beginning of the video or many VTT files throughout the video.
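As a rough sketch of the filtering step (not the extension’s actual code), the function below picks subtitle URLs out of a list of request URLs. The name collectSubtitleUrls is hypothetical, and how the URLs are observed is left out; in an extension they might come from something like a chrome.webRequest listener, which is an assumption here.

```typescript
// Hypothetical helper: keeps request URLs that mention "vtt", without duplicates.
function collectSubtitleUrls(requestUrls: string[]): string[] {
  const seen = new Set<string>();
  const subtitleUrls: string[] = [];
  for (const url of requestUrls) {
    // Case-insensitive keyword match, as described in the text above.
    const looksLikeVtt = url.toLowerCase().includes("vtt");
    if (looksLikeVtt && !seen.has(url)) {
      seen.add(url);
      subtitleUrls.push(url);
    }
  }
  return subtitleUrls;
}
```

A plain keyword match is deliberately loose: it catches URLs with query strings or odd casing, at the cost of possible false positives that would have to be rejected when the file is parsed.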

After procuring the files, I parse them, converting the rolling style into a regular streaming-service style. You can refer to the gif above for a visual idea. The subtitle files that are sent have a rolling style, with cues formatted like the example below. A cue is a single subtitle block that has one start time, one end time, and a piece of text. In the example, the cues are simplified with letters: t represents a timestamp, while each of the other letters represents a few words. For example, a could represent “They are taking”, and b could represent “the Hobbits to Isengard.”

cue #1: t1 -> t2
a
b
cue #2: t3 -> t4
b
c
cue #3: t5 -> t6
c
d
e
cue #4: t7 -> t8
d
e
f
cue #5: t8 -> t9
e
f
g
cue #6: t9 -> t10
f
g
h

I convert files like the one above into the following:

cue #1: t1 -> t4
a b c
cue #2: t4 -> t8
d e f
cue #3: t8 -> t10
g h
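The conversion can be sketched in two steps: drop the lines each cue repeats from the previous one, then regroup the remaining lines into fixed-size cues. This is a minimal sketch that reproduces the example, not necessarily the extension’s real logic. The Cue shape, the names overlap and unroll, and the choice of three lines per output cue are all assumptions.

```typescript
// Assumed cue shape: times in seconds, text split into lines.
interface Cue {
  start: number;
  end: number;
  lines: string[];
}

// Number of lines shared by the tail of `prev` and the head of `cur`.
function overlap(prev: string[], cur: string[]): number {
  for (let n = Math.min(prev.length, cur.length); n > 0; n--) {
    const tail = prev.slice(prev.length - n);
    if (tail.every((line, i) => line === cur[i])) return n;
  }
  return 0;
}

// Hypothetical conversion from rolling cues to streaming-style cues.
function unroll(cues: Cue[], linesPerCue = 3): Cue[] {
  // Step 1: keep only the lines each cue introduces, tagged with that cue's end time.
  const fresh: { text: string; end: number }[] = [];
  let prev: string[] = [];
  for (const cue of cues) {
    for (const text of cue.lines.slice(overlap(prev, cue.lines))) {
      fresh.push({ text, end: cue.end });
    }
    prev = cue.lines;
  }

  // Step 2: regroup; each new cue starts where the previous one ended.
  const result: Cue[] = [];
  let start = cues[0]?.start ?? 0;
  for (let i = 0; i < fresh.length; i += linesPerCue) {
    const group = fresh.slice(i, i + linesPerCue);
    let end = start;
    for (const item of group) end = item.end; // end time of the group's last line
    result.push({ start, end, lines: group.map((item) => item.text) });
    start = end;
  }
  return result;
}
```

With t1 through t10 set to 1 through 10, the six rolling cues above come out as three cues: a b c (1 to 4), d e f (4 to 8), and g h (8 to 10). One caveat: if a speaker says the same line twice in a row, the overlap check would treat the repeat as rolled-over text and drop it.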

Once I have generated clean subtitle data, I create invisible elements on the page with the French subtitle sentences inside. One interesting thing I noticed is that Chrome translates a page as users scroll through it; to save computing power, it doesn’t translate the whole page at once. So the placement of the invisible elements is important: they have to sit inside the video area. Chrome also does not translate elements that have the CSS style display: none, but opacity: 0 does the trick. I opted for Google’s page translate because it’s free, unlike Google’s Translate API.
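To make the opacity trick concrete, here is a small sketch that builds the markup for such a hidden container as a string. It is an illustration only: the function names, the traditou-hidden class, the data-cue-index attribute, and the positioning styles are assumptions, not taken from the extension.

```typescript
// Escapes characters that would otherwise be read as HTML.
function escapeHtml(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Hypothetical builder: one <p> per subtitle sentence inside an invisible wrapper.
function buildHiddenMarkup(sentences: string[]): string {
  const items = sentences
    .map((sentence, i) => `<p data-cue-index="${i}">${escapeHtml(sentence)}</p>`)
    .join("");
  // opacity: 0 hides the text while keeping it rendered, so page translate
  // still picks it up; display: none would cause it to be skipped.
  const style = "opacity: 0; position: absolute; top: 0; left: 0; pointer-events: none;";
  return `<div class="traditou-hidden" style="${style}">${items}</div>`;
}
```

The resulting string could be added to the element that wraps the video, for example with insertAdjacentHTML, and the translated text read back from each paragraph by its index once Chrome has processed it.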

After the translations are completed, new cues are then created with both the original and translated subtitles. They are then displayed on top of the video with the <track> element.
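One way to picture this step is as generating a new WebVTT document in which each cue carries the French line followed by the English one. The sketch below assumes times in seconds and one translation per cue; SubtitleCue, formatTimestamp, and toDualVtt are hypothetical names, and the extension may well build its cues differently.

```typescript
// Assumed cue shape after the rolling captions have been cleaned up.
interface SubtitleCue {
  start: number; // seconds
  end: number; // seconds
  text: string; // original French text
}

// Formats seconds as a WebVTT timestamp, hh:mm:ss.ttt.
function formatTimestamp(seconds: number): string {
  const ms = Math.round(seconds * 1000);
  const pad = (n: number, width: number) => String(n).padStart(width, "0");
  const h = Math.floor(ms / 3600000);
  const m = Math.floor((ms % 3600000) / 60000);
  const s = Math.floor((ms % 60000) / 1000);
  return `${pad(h, 2)}:${pad(m, 2)}:${pad(s, 2)}.${pad(ms % 1000, 3)}`;
}

// Hypothetical serializer: original text on the first line, translation below it.
function toDualVtt(cues: SubtitleCue[], translations: string[]): string {
  const blocks = cues.map((cue, i) => {
    const english = translations[i] ?? "";
    const text = english ? `${cue.text}\n${english}` : cue.text;
    return `${formatTimestamp(cue.start)} --> ${formatTimestamp(cue.end)}\n${text}`;
  });
  // A WebVTT file starts with the WEBVTT header; blocks are separated by blank lines.
  return ["WEBVTT", ...blocks].join("\n\n") + "\n";
}
```

A string like this could then be wrapped in a Blob with type text/vtt and handed to a <track> element through an object URL, though that wiring is left out here.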

Beyond the main functionality, there are a few small standard features: adjusting the subtitle font size as users resize their browser window, moving the subtitle box up when the video control bar (progress bar) is showing, and toggling between subtitle modes (dual subtitles, French only, or English only).
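Two of these features are simple enough to sketch as pure functions. The mode names, the 2.5% width ratio, and the 12 to 36 pixel limits below are made-up values for illustration, not the extension’s settings.

```typescript
// Assumed names for the three subtitle modes.
type SubtitleMode = "dual" | "french" | "english";

// Picks the text to show for one cue under the current mode.
function cueTextFor(mode: SubtitleMode, french: string, english: string): string {
  switch (mode) {
    case "french":
      return french;
    case "english":
      return english;
    case "dual":
      return `${french}\n${english}`;
  }
}

// Scales the font with the video width, clamped to a readable range.
function fontSizePx(videoWidthPx: number): number {
  const scaled = Math.round(videoWidthPx * 0.025); // assumed ratio
  return Math.min(36, Math.max(12, scaled)); // assumed limits
}
```

In a page, fontSizePx would typically be re-run from a resize handler, and the mode switch would trigger rebuilding the displayed cues.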

Challenges

Every video website implements its video player differently, so I have to write a fair amount of custom code for each one. It is difficult to maintain the project when the websites’ implementations change, and it is time-consuming to add support for new websites.

Chrome’s Manifest V3 is not as forgiving as V2 when it comes to permissions. My interpretation (mostly the opinion of savvy developers on the Internet) is that Google wants to deter the development of a few categories of extensions, namely ad blockers (which cut into Google’s core revenue stream) and userscript managers like Tampermonkey. The impact on Traditou is that, upon installation, Chrome now warns users that Traditou reads their browser history and reads and changes their data, which makes Traditou look like malware. I still haven’t found a way around that.

At the beginning of the project, I was using Google’s page translate to quickly translate the elements that contain subtitle text, instead of processing whole VTT files in batches. This resulted in subtitles that tended to flicker a lot. As I later found out, parsing the VTT files worked much better.

I also haven’t found a way to skip the step where users have to right-click and translate the page.

Finally, the biggest challenge is that French Canadians speak too fast. Even with this extension, I still have trouble following plots…

Name Credit

My boyfriend Hugo’s mom came up with the name Traditou, from traduire direct tout, meaning “translate everything directly.” It is also a pun: tou stands for both Tou.tv and tout, which means “all.” I think it’s pretty clever.

Tools