xilus2 (New Members) Posted December 31, 2014 at 07:41 PM
I found this great tool for adding spaces to Chinese text. Written Chinese doesn't use spaces, which makes it very hard for foreigners to read. Check it out: http://cws.acuron.us/ You can copy and paste a news article or any other Chinese text and it will add the spaces.
skylee Posted January 1, 2015 at 04:41 PM
You think it is great? I disagree. There are too many mistakes.
Demonic_Duck Posted January 1, 2015 at 05:08 PM
The following is my personal opinion:
If you don't want to learn to read Chinese, learn pinyin. This is only really recommended for people who don't want or need to interact with the language on a very deep level; perhaps you want to get up to survival level for a brief period of travel.
If you want to learn to read Chinese, you're going to have to do it without spaces sooner or later (and it's going to be sooner, unless you're deliberately shielding yourself from contact with authentic materials). Segmenting text correctly in your mind is a skill, and by having software do it for you, you're depriving yourself of the opportunity to develop that skill. If the software also makes lots of mistakes, you're doubly handicapping yourself.
maomao2014 Posted January 1, 2015 at 07:09 PM
It is strange to read an article with spaces. It is unnecessary.
hedwards Posted January 1, 2015 at 08:40 PM
Maomaoit'snotnecessary,butitgreatlyimprovestheefficienceywithwhichonereads.
Word separation is there because it makes reading a lot quicker: the word breaks only have to be decided once, when the text is written, and the writer will generally know what the words are. Occasionally you'll have some disagreement about when exactly words should be concatenated, hyphenated or left separate, but that tends to happen relatively infrequently. I was reading a book from a hundred years ago and they hyphenated "to-morrow".
What's more, for beginners it's even more important, because they don't necessarily have a large enough vocabulary to know how the words are supposed to be combined.
imron Posted January 2, 2015 at 12:38 AM
"butitgreatlyimprovestheefficienceywithwhichonereads"
Only at a very low level. At least with Chinese, I find that now that I am used to no spaces, having them in slows me down considerably. In any event, I agree with the duck that for Chinese, reading without spaces and learning where to break words is a skill you need to develop.
Regarding the segmenter, it doesn't work for long texts (it truncated the text when I tried it with a short novel) and it takes a long time to segment. Both of these are issues I tried to solve with my own segmenter.
Demonic_Duck Posted January 2, 2015 at 02:56 AM
"Maomaoit'snotnecessary,butitgreatlyimprovestheefficienceywithwhichonereads"
Yes, you are correct, if you're talking about English. I would highly recommend using spaces when working with English.
hedwards Posted January 2, 2015 at 03:49 AM
@imron and Demonic_Duck, it applies to all languages to some degree or another. Chinese probably has less of an issue with it because most words are three or fewer characters long, and the most common words are one or two. But it is still a drag on efficiency in those cases where a character could belong to either of two adjacent words; a space separating them would eliminate the ambiguity and somewhat reduce the amount of work necessary when reading.
Anyway, how native readers and people who have been reading for a long time perceive it is slightly off topic, as the OP is presumably still learning how to read. Until one has an idea of what words to expect in collocations, it's rather tough to know where the word boundaries are if they're not visible.
Demonic_Duck Posted January 2, 2015 at 07:37 AM
Unfortunately, I strongly suspect your hypothesis (that having no spaces is a "drag in efficiency") is untestable. As far as I know, there aren't any living languages which are sometimes written with spaces and sometimes without, though I could be wrong about this. More to the point, even if such languages exist, have studies been done comparing how quickly or efficiently people who have been taught to read with or without spaces can read text in that language?
If you can point to some strong evidence, I'll happily eat my words (with the caveat that if the language studied has an alphabetic script, the same may not necessarily hold true for a syllabic script). Otherwise, your claims are without basis.
imron Posted January 2, 2015 at 08:18 AM
I strongly suspect the point is moot anyway, because whatever theoretical increase in speed you might get from adding spaces to Chinese, the reality of the situation is that the overwhelming majority of Chinese content does not contain spaces, and that situation is unlikely to change any time soon.
What this means is that if you want to get to the level where you can read native content, then you need to learn to spot word boundaries without spaces. The best way is not to have things pre-segmented, but to struggle through trying to make sense of non-segmented text (that whole "trying to make sense" part is where the learning is done).
Demonic_Duck Posted January 2, 2015 at 09:04 AM
Agree 100%; that's what I was trying to get at with my first post in this thread.
renzhe Posted January 2, 2015 at 01:38 PM
Of course people should eventually learn to read unsegmented texts, and the sooner they start getting used to them, the better. But for a beginner, a tool like this might be useful when tackling unfamiliar texts with many unfamiliar words. Written Chinese can be really daunting in the beginning; most of us eventually forget just how daunting it was.
maomao2014 Posted January 2, 2015 at 06:23 PM
@hedwards Segmentation in English does not mean writing words without spaces. When I started to learn English, we segmented sentences into phrases, for example: "John Smith, /our teacher,/ came in/ with a book in his hand."
The problem I found with the software is that it simply splits off two- or three-character words that should stay together. For example:
从 试卷 的 得分 情况 可以 看出 , 考生 文言文 的 断句 能力 较差 , 这 实质上 是 缺乏 文言文 的 语感 。
得分情况 should be together. 断句能力 should be together.
The sentence should be segmented in this way:
从试卷的得分情况/ 可以看出 , 考生文言文的断句能力/ 较差 , 这实质上是 /缺乏/文言文的语感 。
imron Posted January 2, 2015 at 11:13 PM
"The sentence should be segmented in this way"
Sorry, I disagree. The purpose of the segmenter is to split a sentence into words, and the example you gave is quite accurate, I think.
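To make the disagreement above concrete, here is a minimal sketch of one classic dictionary-based approach, greedy forward maximum matching. It is offered only as an illustration: nothing in the thread says that the tool at cws.acuron.us or imron's segmenter works this way, and the names `segment`, `LEXICON` and `max_len` as well as the toy word list are assumptions made up for this example.

```python
def segment(text, lexicon, max_len=4):
    """Greedy forward maximum matching.

    At each position, take the longest substring (up to max_len characters)
    that appears in the lexicon; fall back to a single character otherwise.
    """
    words = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, then shrink.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in lexicon:
                words.append(candidate)
                i += size
                break
    return words


# Hypothetical toy lexicon, just large enough for the example sentence.
LEXICON = {"试卷", "得分", "情况", "可以", "看出", "考生", "文言文", "断句", "能力", "较差"}

print(" ".join(segment("从试卷的得分情况可以看出", LEXICON)))
# prints: 从 试卷 的 得分 情况 可以 看出
```

With this toy lexicon the output is word-level, like the tool's output quoted by maomao2014. Whether 得分情况 comes out as one unit depends entirely on whether the lexicon lists it as an entry: calling `segment` with `LEXICON | {"得分情况"}` keeps it together, which is roughly the coarser, phrase-level grouping maomao2014 was asking for.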
skylee Posted January 3, 2015 at 01:44 AM
I pasted a few paragraphs into the tool and got the following. I have no strong views about people using whatever tools to learn a language. But I think the tools should be effective and accurate. Some might still find this tool useful. I don't. Perhaps this is just a difference in opinion.
我 們熱愛 的 小城 , 出現 了 前所未見 的 撕裂 ; 我 們 建立 起 來 的 良好 警民 關係 , 也 因 對峙 的 政治 局面 、 劍拔 弩張 的 狀態 , 造成 武力 衝突 , 引起 流血 事件 , 已 再 難 修復 。 警察們 的 警棍 , 已 不 再 是 用來對 付 強盜 和 罪犯 。 和平 示威 的 學生 和 市民 , 也 成為 警棍 下 的 受害者 。 警棍 , 就是 代表 著 當權者 的 權力 , 就 像 村上春樹 所 說 代表 政府 的 那 堵 高牆 , 像 「 轟炸機 、 戰車 、 火箭 與白磷彈 」 一 樣 , 手無 寸鐵 的 學生 和 市民 「 被 壓碎 、 燒焦 、 射殺 」 。 再看看 一 位 十四 歲 女孩 在 牆 上 畫 花 的 遭遇 。 她 , 只有 一 枝 粉筆 , 默默 的 在 牆上 繪花 , 表達愛 自由 、 爭取 民主 、 反抗 不 義 的 情操 。 卻被 十四 位 警察 圍困 、 威嚇 、 拘捕 和 扣押 , 若非 市民 團結 起來 的 吶喊 和 支援 , 小 女孩 被 折磨 的 苦難還會繼續 下去 。
PS - the tool has failed to show “我們”、“熱愛”、“劍拔弩張”、“對付”、“表達”、“愛”、“手無寸鐵”、“不義”、“苦難”、“繼續” etc. as separate words.
imron Posted January 3, 2015 at 01:50 AM
That paragraph is significantly less accurate than the example given by maomao.
Demonic_Duck Posted January 3, 2015 at 08:18 AM
Could it be anything to do with simplified/traditional character sets? Perhaps the segmenter has inadequate support for traditional.
To me, the main problem with the site (if I were inclined to want to use that function) is the inordinately long time it takes to work. Imron's software says it can segment a novel in under a second. This site takes at least ten times as long to segment even a short paragraph. I can't imagine that this is accounted for by the difference between online and offline tools, nor by the fact that the output is given as a paragraph with spaces rather than as individual words.
hedwards Posted January 3, 2015 at 08:21 AM
@Demonic, it definitely is testable; it's just that I'm not aware of anybody doing so. And unless the Chinese government starts considering the possibility, the only actual benefit would be beginner materials that are friendlier to the beginner. It would be a bit tough to do a proper double-blind test, but that doesn't make it untestable.
As I've already noted, Chinese is probably less sensitive to the lack of word breaks than a language that uses a more limited set of characters. A large part of the problem in English is that if you have the letters i, t, i and s in sequence, you don't know which of several possibilities it is. It could be "it is", it could be "itis", and it could be a somewhat ungrammatical "I tis." Chinese seems to have fewer possibilities, so it probably does impact efficiency less. That being said, the impact isn't zero, and it's more efficient for the writer to segment things before anyone reads them, as that is done only once, whereas segmenting by the reader is done each time somebody reads the text.
@Maomao, I remember going through that when I started to read German. Segmentation is between words; the stress pattern is something different and somewhat more difficult to pick up. It requires that you not just see the words separately, but consider the next words as you're reading the current word, and consider the intent of the author. Depending on how precisely you modify the stress pattern, you can wind up with a very different sentence. Mandarin, being stress-timed rather than syllable-timed, has similar possibilities as well.
As interesting as this is, I don't think it's terribly productive to argue about as the government is probably not going to recognize the obvious superiority in word breaks, especially for the purpose of boosting literacy for the masses and making it easier for foreigners to learn the language.
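The "itis" ambiguity described above can be sketched in a few lines: given a word list, enumerate every way an unspaced string can be split into known words. This is only an illustration of the ambiguity argument, not a method anyone in the thread proposed; the function name `all_segmentations` and the tiny `VOCAB` set are assumptions chosen to reproduce the three readings mentioned in the post.

```python
def all_segmentations(text, vocabulary):
    """Return every way to split text into a sequence of vocabulary words."""
    if not text:
        return [[]]  # exactly one way to segment the empty string
    results = []
    for end in range(1, len(text) + 1):
        head = text[:end]
        if head in vocabulary:
            # Keep this word and segment the remainder recursively.
            for rest in all_segmentations(text[end:], vocabulary):
                results.append([head] + rest)
    return results


# Hypothetical toy vocabulary covering the readings mentioned in the post.
VOCAB = {"i", "it", "is", "tis", "itis"}

for words in all_segmentations("itis", VOCAB):
    print(" ".join(words))
# prints:
# i tis
# it is
# itis
```

The number of candidate splits a reader (or a segmenter) has to choose between grows with the length of the string and the size of the vocabulary, which is the cost the post attributes to leaving out word breaks.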
Demonic_Duck Posted January 3, 2015 at 08:34 AM
When I said "untestable", I meant untestable in fact, rather than untestable in principle.
"As interesting as this is, I don't think it's terribly productive to argue about as the government is probably not going to recognize the obvious superiority in word breaks, especially for the purpose of boosting literacy for the masses and making it easier for foreigners to learn the language"
Yes, it really is quite astounding that neither the government of the PRC nor the government of the ROC will change their entire writing system based on what some guy on some forum said, especially when word breaks are so obviously superior that their superiority doesn't require any supporting evidence.
歐博思 Posted January 3, 2015 at 08:47 AM
I have the explanation for the delay when using the software: has anyone seen the episode of 爱情公寓 where the character 悠悠 (who doesn't speak Japanese) is able to talk with 关谷's Japanese father because a 字幕组 (subtitle group) is translating for them in real time?
Somebody somewhere is putting spaces in all the wrong places.