Sinoscope subtitles, a dialogue aid

September 2, 2015 at 07:52 PM

I am working on a program that accepts an audio file containing Chinese speech alongside a corresponding transcript. The program will display the corresponding character of the spoken syllable for a duration commensurate with that of the utterance. In other words, the character will only be visible while the corresponding syllable is being spoken (essentially acting as one-dimensional subtitles). The temporary and volatile nature of the subtitle reflects these properties of speech (you can neither read ahead nor look back to catch what you've missed, just as in oral conversation). I believe that this system would help students with both character recognition and speech comprehension, as their pairing would be mutually beneficial; think of it as 'training wheels'.
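The display logic described above could be sketched roughly as follows. This is only a minimal illustration, assuming the syllable timings are already known; the alignment of transcript to audio is not shown, and the names `Cue` and `visible_character` and the timing values are hypothetical.

```python
from bisect import bisect_right
from typing import List, Optional, Tuple

# Each cue is (character, start_seconds, end_seconds). The values below are
# invented for illustration; real ones would come from aligning the
# transcript with the audio.
Cue = Tuple[str, float, float]


def visible_character(cues: List[Cue], t: float) -> Optional[str]:
    """Return the character whose syllable is being spoken at time t.

    Assumes cues are sorted by start time and do not overlap.
    Returns None when nothing is being spoken (before, between, or after cues).
    """
    starts = [start for _, start, _ in cues]
    i = bisect_right(starts, t) - 1  # last cue that started at or before t
    if i < 0:
        return None
    char, start, end = cues[i]
    return char if start <= t < end else None


cues = [("你", 0.00, 0.25), ("好", 0.25, 0.60), ("嗎", 0.80, 1.10)]
```

A player would call `visible_character(cues, t)` on every frame with the current playback position, so each character stays on screen only for as long as its syllable lasts.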

Here is a sample in Mandarin that I made as a proof-of-concept video:

https://www.youtube.com/watch?v=q0jX-p-uGkY&feature=youtu.be

September 2, 2015 at 08:22 PM

This looks interesting. It also has the same properties when used with English.

The theory is that if you put up English words in the same place on the screen, you can increase the speed at which you read and understand them.

There are a couple of real-world examples; one, if I remember correctly, is used by Honda in one of their TV ads.

http://creativity-online.com/work/honda-uk-keep-up/39050

Reckons it is 500 words per minute.

The more I think about this, the more I think your idea has serious merit.

September 2, 2015 at 09:07 PM

Thanks for the Honda ad; it really helps support my hypothesis!

While it is possible to implement this system in just about any language, I think it is most compatible with Chinese languages because:

1) Chinese characters have perfectly square dimensions. Most written languages transcribe units of speech as linear strings of glyphs (usually horizontally, but sometimes vertically, as in Mongolian and Manchu). These equilateral shapes are easier for the eye to scan quickly than strings of varying lengths. The monosyllabic words 'I' and 'twelfths' would occupy different amounts of space, thus straining the reader. It would also be necessary to determine whether such strings should be centered on screen or left-justified (or otherwise right-justified for Arabic and Hebrew); the former would encourage the reader to process the entire word as a single graphic unit, whilst the latter would prompt the reader to 'sound-out' the word from its initial letter.

2) Chinese languages have one syllable per one morpheme per one character (with extremely rare exceptions). This sets it apart from Korean Hangeul, which, whilst also squarely proportioned, can have phonemes that bleed into neighboring syllable blocks (e.g. a word spelt 'hag-a' yet pronounced as 'ha-ga'). Even if one were to render Japanese entirely in Kana, it operates on the level of 'mora' rather than syllable, which also disqualifies it (e.g. 'kou-sen' deconstructed as 'ko-u-se-n').

September 2, 2015 at 09:26 PM

I believe it is called the stationary-window condition, and it is the third of three conditions they tried in order to speed up reading:

1) the text appears in the top left of the "window" and new words are added without taking away the old words.

2) the text appears in the centre of the screen but the new word appears next to the old word which disappears once the new word is there.

3) the words appear one after another quickly in the centre of the screen with the old word disappearing as soon as the new word appears.
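One way to picture the difference between the three conditions is to work out what is on screen, and where, just after each new word appears. The sketch below is only my reading of the list above, not the setup used in the original study; the function name `frame` and the character-based offset are assumptions made for illustration.

```python
from typing import List, Tuple


def frame(words: List[str], i: int, condition: int) -> Tuple[str, int]:
    """Return (visible text, horizontal offset in characters) once word i has appeared."""
    if condition == 1:
        # Cumulative: earlier words stay on screen and the text grows from the left.
        return " ".join(words[: i + 1]), 0
    if condition == 2:
        # Moving window: only the new word is visible, placed where it would
        # sit in the full line (next to where the old word was).
        offset = sum(len(w) + 1 for w in words[:i])
        return words[i], offset
    if condition == 3:
        # Stationary window: only the new word is visible, always in the same spot.
        return words[i], 0
    raise ValueError("condition must be 1, 2 or 3")


words = ["keep", "up", "now"]
```

Under this reading, conditions 2 and 3 show the same single word and differ only in whether the eye has to move to find it.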

There is an excerpt here https://books.google.co.uk/books?id=zCxf_LVzXoEC&pg=RA1-PT175&lpg=RA1-PT175&dq=stationary-window+condition&source=bl&ots=RsBBb739GB&sig=EIcXKXt3eFB8GviVIKwNnPOqJko&hl=en&sa=X&ved=0CCEQ6AEwAGoVChMIlarugKPZxwIVDBjbCh1WVA2C#v=onepage&q=stationary-window%20condition&f=false

I agree Chinese does seem well suited to this.

I am interested to see how you get on with this. If there is anything I can do to help, ask, and if I can, I will.

September 3, 2015 at 08:52 PM

Yes agreed, there might be something in this. Going through with increasing speed would be interesting.

September 4, 2015 at 07:50 AM

"These equilateral shapes are easier for the eye to scan quickly than strings of varying lengths. The monosyllabic words 'I' and 'twelfths' would occupy different amounts of space, thus straining the reader. "

Show me the science. Despite common use of the word 'scan', when you actually look at what the eye does, you see brief fixations and jumps. Readers of both Chinese and English will focus on one point and absorb information around that point - not 'scan' a word from start to end - and then jump forward (or maybe back if they need to). Information is actually cut off when the eye is moving. As for straining the reader - well, we all seem to read books without, in the main, developing headaches. And if I was feeling unfair and unscientific, I might even ask you to explain this ;-)

This looks like it might be a fun little tool, IF you can get the characters and utterances lined up in real-world input.

September 4, 2015 at 08:10 AM

Personally I think it would be a much more useful tool if it matched whole words, rather than just individual characters.

September 4, 2015 at 08:13 AM

Or built up from individual characters to phrases.

September 4, 2015 at 07:04 PM

@roddy

I meant 'scan' as in the way one would scan a barcode (a fixed point). I didn't mean to imply that the eye was absorbing information while in motion; I concede that my choice of words may have been misleading. My intended assertion was that this point of fixation is wider in alphabetic scripts, while a Chinese character represents a quadrilaterally symmetrical point. Although our vision is stereoscopic, both eyes nevertheless fixate on a single point, along with its surrounding area in all directions (not just on the sides).

@imron

The concept of a multisyllabic 'word' as distinct from a monosyllabic morpheme (usually assigned to a character) is a relatively recent concept in (written) Chinese. These Chinese 'words' arose from a need for disambiguation in speech (and later for technical and modern nomenclature); they only really proliferated in tandem with vernacular writing.

That aside, flashing polysyllabic words defeats the purpose of this system. Chinese languages are unique in their near-perfect morpheme-syllable ratio (as far as I'm aware), and these pairings are well represented by characters. Aside from the fact that word boundaries are not always clear in Chinese, there is simply no need to visually group the morphemes within a word when the speaker will have to utter two syllables during that short period anyway. Just as two syllables are not vocalised simultaneously, neither is there a need to display two morphemes (and two characters by extension) simultaneously in a system designed to reflect speech as a visual analogue.

September 5, 2015 at 01:14 AM

"they only really proliferated in tandem with vernacular writing."

Yes, which came about because people wanted to write in the language they spoke, rather than a language that they didn't.

"That aside, flashing polysyllabic words defeats the purpose of this system."

Flashing monosyllabic characters will make the system cumbersome because the reality is, the language as used and spoken today is made up of polysyllabic words. Splitting them up will just make it difficult for people to identify word boundaries. I ma gine try ing to read Eng lish split up in to syll a bles. It's significantly more burdensome than reading whole words.

September 5, 2015 at 02:50 AM

"Splitting them up will just make it difficult for people to identify word boundaries."

I agree, and that's by design. This system preserves all of the difficulties of speech comprehension, including undefined word boundaries, except for one: homophony. I do not define word boundaries for the same reason that I do not provide punctuation: it pushes too far into 'reading' territory. By the way, English orthography does not reflect syllable boundaries as clearly as Chinese does.

September 5, 2015 at 03:22 AM

"This system preserves all of the difficulties of speech comprehension"

It actually makes it more difficult because people speak in words, and the pauses and emphasis they make provide clues as to where the word boundaries are. People don't speak in a staccato of monosyllables.

September 5, 2015 at 05:12 AM

If we're talking about Chinese speech, then I cannot detect word boundaries at all unless I am already familiar with the words in question. Tactically placed pauses do allow me to demarcate phrases and clauses without prior acquaintance, but never words. I'd imagine that it is because Chinese languages are tonal whilst my native language is not.

Aside from all that, the other reason is that placing two characters on the screen simultaneously does not force the reader to absorb the characters at the same rate as the speaker is vocalising the corresponding syllables. For example, the user could remain fixated on the complicated 鬱 of 鬱悶, and not have time to move on to the neighboring 悶 before the speaker has finished uttering both syllables. In short, it would be less conducive to reader-speaker synchronisation; I want to ensure that each character is given an amount of time on screen congruent with the duration of its corresponding syllable.

ParkeNYU

Shelley

ParkeNYU

Shelley

Tianjin42

roddy

imron

roddy

ParkeNYU

imron

ParkeNYU

imron

ParkeNYU
