What causes so many homophones?

August 5, 2010 at 12:16 AM

In modern Mandarin Chinese, as well as in modern English, there are a lot more homophonous syllables and words than there were in the past. In Russian this tendency also exists, although it is considerably weaker: for example, unstressed sounds [a] and [o] tend to merge. Is this true in other languages? Does anybody know the opposite example, where the sounds, formerly the same, tend to diverge in different words?

Is this homophonization due to the rise of written language, or of the cities, or what? If that was happening in the tribal communities over 100,000's of years, would we lose the ability to speak at all, or would we take a lot longer time to say one word due to many almost-identical syllables in it? Any thoughts?

August 5, 2010 at 01:57 AM

I can recommend a book. Language: Its Structure and Use by Edward Finegan. It doesn't give much history, but it goes over how languages behave. With that information you can imagine how a language might evolve. Creating and splitting homophones both occur naturally. In Chinese, an example of a split is 安 and 焉 which used to sound the same and were used interchangeably.

August 5, 2010 at 04:03 AM

Many times other languages will contribute words that sound similar, but with different spelling and meaning. Even if a word is pronounced considerably differently in the contributing language, it can end up being pronounced the way speakers of the receiving language would see it in their phonetic system. English is abounding with homophones in part because of its history of frequent invasions from mainland European countries - an island that is a sitting duck due to its close proximity to most European coastal nations. Instead of ditching their own language for the ones of the conquerors, the English speaking people would just add some of the invaders' language to their own vocabulary.

Mandarin seems to invite a plethora of homophonic sounds with its fixed number of initials, finals, and tones. A speaker can say "ta", and one would have to pick up from the context if the speaker meant 他 or 她. 四 is considered an unlucky number because it sounds a lot like 死.

August 5, 2010 at 11:36 PM

> Hofmann: "Language: Its Structure and Use" by Edward Finegan.

Looks interesting. Is it a standard textbook for an undergraduate linguistics course? How is the course called?

> Hofmann: an example of a split is 安 and 焉 which used to sound the same

Is this an exception? Why would people suddenly decide to pronounce homophones differently? Perhaps it is an influence of some dialect where the two words had never been homophonous?

Here is one scenario how one pronunciation may split into two: first, a community splits into parts that don't communicate; then, each part develops its own pronunciation of the homophone; finally, the parts unite again and retain both pronunciations, but narrow down their meanings. Kind of like speciation in biology. Perhaps ancient languages had richer pronunciation because communities split and merged a lot more often.

> crazy-meiguoren: English is abounding with homophones in part because of its history of frequent invasions ... the English speaking people would just add some of the invaders' language to their own vocabulary.

To me it seems that frequent invasions work against homophonization. But the opposite may also occur: say, in the 18th-19th centuries the British expanded across the globe, and English came to be spoken by people with diverse linguistic backgrounds. Then subtle differences in pronunciation or grammar may get lost if only a fraction of speakers observe them.

> crazy-meiguoren: Mandarin seems to invite a plethora of homophonic sounds with its fixed number of initials, finals, and tones.

I guess that's a consequence, not a cause of homophonization. If the industrial revolution were to occur a couple of millennia later and the traditional China existed for that much longer, would its language become even more homophonous?

August 6, 2010 at 06:11 AM

It's possible that a language can become more homophonous as time goes on and further advances take place. The more words that are added to any language, the greater the chances of getting homophonous words. An example of an abundance of homophones in Mandarin is the ability to create a complete sentence that is totally homophonic (allowing for tonal changes): 妈妈骂马.

If one wishes to stretch the homophonic concept a little further, one could include multisyllable words that can sound like a phrase of smaller words: "euthanasia" sounds like "youth in Asia". "Assignee" sounds like you're talking about two distinct body parts located directly above and below the thigh.

August 7, 2010 at 03:51 AM

> dreamon: Is it a standard textbook for an undergraduate linguistics course? How is the course called?

("How?" You speak French?) It was used in a linguistics course I took called "Introduction to the Study of Language."

> dreamon: Is this an exception? Why would people suddenly decide to pronounce homophones differently?

I'm not sure, but I would think that merging is much more common than splitting. Also, I don't think it can happen suddenly.

August 7, 2010 at 04:23 PM

I am not sure that the overall number of homonyms actually has an inherent tendency to increase in languages, at least in theory, aside from the fact that the invention of writing and the development of complex cultures have created a greater ability to retain obsolete forms. For example, from what can be revealed from our limited understanding of their phonology, Ancient Chinese, Ancient Egyptian, Ancient Mayan, and Sumerian all seem to have had many homonyms (or at least near homonyms). "Radicals" or "determinatives" were independently invented in all four to distinguish meaning among written words. Increased language contact has also had effects that I describe below and that might also explain why present facts might diverge from theory.

I think that language change, especially on a phonological level, results from the tension between minimizing the effort needed to be understood and maximizing what can be expressed. In other words, what can be easily said is not always easy to understand and what can be readily understood is not always easy to express.

I think that homonyms tend to arise as certain aspects of the phonology are simplified. If understanding is not often compromised, the process can continue. As misunderstanding becomes more frequent, languages tend to use various processes to distinguish the homonyms. For instance:

(1) Affixes can be added to distinguish them. E.g., yǐ vs yǐzi.

(2) Clarifying words can be added. E.g., "hose" vs. "panty hose" or "garden hose." xiàng vs. dàxiàng or yìnxiàng. xǐ vs. xǐzǎo or xǐhuan.

(3) They can be split among different parts of speech. E.g., "rite" (noun), "right" (typically an adjective or adverb), "write" (verb)

(4) The use of one or more of the homonyms may be restricted in some other way. E.g., 觉 jiào can generally not appear in the same place as 教 jiào. The nominal uses of "main" and "mane" are each very restricted (e.g., "Spanish main," "might and main," "horse's mane," and "lion's mane").

(5) The above are often combined, so that it can be a little difficult to construct natural sentences where the homonyms can truly be confused. E.g., "C," "see," "See," and "sea." E.g., "meet," "meat," and "mete." 十，石，实，时，食.

(6) I think homonyms exhibit phonological splitting only relatively rarely. I cannot recall any of the Chinese examples at the moment. The closest English example I can think of is the split of Old English "ān" into Modern English "an," "a," and "one." Also the two pronunciations each of "root" and "route" might conceal a homonym split somewhere.

As these devices become ineffective, one or more homonyms tend to be transformed or to be dropped out of use. The transformations as a whole may tend to create more complicated phonology. For instance, "would not" and "wood not" and "would he" and "woody" are both pairs of homonyms. However, in my dialect of American English "would not" is almost always contracted into "wouldn't," which includes a glottalized syllable I have not encountered in any other language. The first vowel in "Would he" is similarly often omitted, reducing the two morphemes to a sequence of a semiconsonant, an alveolar flap, and a vowel (something like "wri" with all three clearly pronounced). This is again perhaps a unique combination among languages.

Similar processes pull language back and forth between analytic (low morpheme to word ratio) and synthetic (high morpheme to word ratio) characteristics and between agglutinative (easily segmentable morphemes) and fusional (morphemes that are hard to segment). Homonyms are probably more common in languages that are analytical or fusional. Synthetic languages usually require so many morphemes to say anything that there are usually many ways to distinguish similar words. For example, a language with two noun classes (e.g., masculine and feminine) has the opportunity to reduce the number of noun homonyms by half. Languages that have obligatory suffixes that distinguish nouns and verbs can likewise eliminate the possibility of homonyms between these word classes. My limited experience with different languages seems to suggest that fusional languages also tend to create more homonyms, but I am less certain of this.

I would guess that, as cultures have increased communication between them, creolization during language contact has become a fairly consistent pressure on most languages. If this is true, there is probably a worldwide trend from synthetic to analytical languages, since creoles tend to be very analytic. Such a trend, if proven, would tend to support a hypothesis that homonyms are generally increasing. I can think of many languages that have become more analytical over time, but almost none that have become more synthetic. I am less certain about any definite trends between agglutinative and fusional languages, since I can think of examples going both ways.

August 21, 2010 at 05:36 AM

Hello, Altair, sorry for a much delayed response! Your answer is so advanced and detailed that I was at a loss what to say.

Altair> Ancient Chinese, Ancient Egyptian, Ancient Mayan, and Sumerian all seem to have had many homonyms (or at least near homonyms).

As far as I remember reading in Daniel Kane's book, ancient Chinese was much richer in pronunciation, had more consonants, syllable endings etc. Also, if one reads written classical Chinese now, the meaning will be too ambiguous to understand, but apparently it wasn't ambiguous then.

Altair> I think that language change, especially on a phonological level, results from the tension between minimizing the effort needed to be understood and maximizing what can be expressed.

It seems that "minimizing the effort" results in more homonyms, but "maximizing what can be expressed" results in longer words. So, these two aspects do not undo each other, but accumulate different kinds of change. Over time, a language becomes phonologically poor and uniform in style, but acquires lots of combined words, idiomatic expressions, proverbs etc. that extend its vocabulary. Is this true in general, for old languages?

Altair> Similar processes pull language back and forth between analytic (low morpheme to word ratio) and synthetic (high morpheme to word ratio) characteristics and between agglutinative (easily segmentable morphemes) and fusional (morphemes that are hard to segment).

It is hard to imagine a transition from one type to the other, even over thousands of years. For example, Russian heavily relies on word endings, suffixes and prefixes, and Chinese-type words such as "downtown" as well as completely new words such as "google" are rare (typically borrowed). Russian has a lot fewer homonyms perhaps because most words have 2 or more syllables + differ in gender, case, applicable rules, suffixes/prefixes etc. The basic principles of language organization seem just as conserved as the genetic code.

BTW, I wonder how Chinese ended up totally without word inflections. How long ago did it diverge from the Indo-European family? Must be several tens of thousand years ago at least.

Altair> ... since creoles tend to be very analytic. ... I can think of many languages that have become more analytical over time, but almost none that have become more synthetic.

The same seems to me as well. But then how did synthetic languages originate in the first place? Were the driving forces behind language evolution different in the ancient or tribal times? In what way?

Hofmann> ("How?" You speak French?)

Good guess!! Russian, not French. Are you French? I guess I should brush up on my English before daring to face Chinese...

Hofmann> I would think that merging is much more common than splitting. Also, I don't think it can happen suddenly.

If splitting occurs due to an invasion or merger between tribes, it can be sudden. Perhaps that's the only way it can happen, because a slow change in pronunciation would be consistent across the entire population.

August 23, 2010 at 04:04 AM

@dreamon, do you have any examples to illustrate how frequent homophones cause confusion?

I've studied and been immersed in a putonghua environment for too long, long enough for me to forget how it really affects daily conversations. In my experience, Chinese homophones cause a lot of trouble when I am asked how to write a specific personal name. Even a simple name such as 丹丹, 沪生, or 海燕 needs further confirmation so that I can write it correctly. One of my friends was thought to be called "五哥" until two years later, when I helped him reserve an air ticket and found that his name is "武 (surname) 歌 (given name)".

Other than this, homophones don't really cause me much trouble.

August 27, 2010 at 10:39 PM

> dreamon: As far as I remember reading in Daniel Kane's book, ancient Chinese was much richer in pronunciation, had more consonants, syllable endings etc. Also, if one reads written classical Chinese now, the meaning will be too ambiguous to understand, but apparently it wasn't ambiguous then.

Yes, this is correct; however, I think ancient Chinese had fewer acceptable syllables than modern English and like modern English still had many homophones.

> dreamon: It seems that "minimizing the effort" results in more homonyms, but "maximizing what can be expressed" results in longer words.

I think not always. The average English word has apparently been losing a syllable every millennium or so for several millennia. The words have gotten shorter, but some things, like the typical verb phrase, have gotten much longer. Also, "minimizing effort" in one way can increase effort in another way. For instance, English seems to have developed several palatal consonants over the last 2000 years that it did not have before as different phonemes have been squashed together. In other words, the reduction in the number of phonemes in some words ended up creating new ones. The reintroduction of some of these words with "older" pronunciation through Old Norse even created a few new differences in vocabulary that did not exist before: e.g., shirt and skirt; give and if; screech and shriek.

> dreamon: Over time, a language becomes phonologically poor and uniform in style, but acquires lots of combined words, idiomatic expressions, proverbs etc. that extend its vocabulary.

I do not think that there is a general trend for languages to become phonologically poorer. Even in Chinese, the development of tones actually has balanced some of the loss in consonantal diversity. In English, even discounting imported words, I think that the phonetic diversity has increased since the times of Old English.

> dreamon: It is hard to imagine a transition from one type to the other, even over thousands of years. For example, Russian heavily relies on word endings, suffixes and prefixes, and Chinese-type words such as "downtown" as well as completely new words such as "google" are rare (typically borrowed)...The basic principles of language organization seem just as conserved as the genetic code.

Actually, such changes are not really infrequent. It is thought by many experts that Proto-Indo-European had significantly simpler verbs and nouns than is represented in ancient Latin, Greek, and Sanskrit. The Anatolian languages, like Hittite, are thought by many to show an older pattern of how Indo-European languages worked, e.g., with only two genders (animate and neuter), two types of verb conjugations, overlapping case endings, etc. The descendants of Latin, Greek, and Sanskrit have all shown substantial simplification of their declensions and conjugations, showing a return to a simpler pattern.

Old and Middle Egyptian were very synthetic. Late Egyptian seems to have been much more analytic. Coptic, their descendant, seems to be analytical, but also fusional, since it packs all sorts of information into its verbs. At the same time, it completely eliminated adjectives as a part of speech.

Even Russian, which maintains many of the characteristics of synthetic languages like Latin, Greek, and Sanskrit, has had quite a number of innovations: e.g., simplification of the past tenses and modes of the verb and also routine use of verb prefixes. Proto-Indo-European does not seem to have had verb prefixes or a differentiation between perfective and imperfective verbs.

> dreamon: BTW, I wonder how Chinese ended up totally without word inflections. How long ago did it diverge from the Indo-European family? Must be several tens of thousand years ago at least.

According to my understanding, Chinese was never part of the Indo-European family, but rather the Sino-Tibetan family. I do not know very much about this family, but from what I have read, the languages in it generally have characteristics quite different from Indo-European languages. They do not seem to have had grammatical gender, noun declensions, and verb conjugations. Many scholars do believe that Ancient Chinese verbs did occasionally display prefixes and suffixes that have left their mark here and there, but that these affixes were not reflected in the character writing. An example was a causative "s" that can be deduced for many verbs. A remnant of this sort of thing can be found in some words that have alternative tones, depending on their meaning.

As for the lack of word inflections, there really is no clear normal state among languages. Irish inflects its prepositions for person, but Russian does not. Russian has complex noun inflections, but Japanese has none at all. Japanese has a complex array of verb inflections for tense and voice, but Mandarin does not. Navaho has extremely complex verbs and very few pure nouns, let alone noun inflections.

The only thing somewhat distinctive about Chinese is that it seems to be near the extreme analytic end of the spectrum of languages. The nature of analytic languages is that they tend not to have many (or any) inflections, but tend to rely on word order and particles to clarify meaning.

> dreamon: But then how did synthetic languages originate in the first place? Were the driving forces behind language evolution different in the ancient or tribal times? In what way?

This is an interesting subject, but I do not think there is much secure scholarship surrounding it. My own feeling is that synthetic languages tend to arise as phonetic loss reduces the sense that speakers may have of the independence of affixes. As the reduced affixes become phonetically easier to use, they can turn to new uses and start to become incorporated into the words they are attached to. It is not easy to come up with transparent examples, but let me try.

In the ancestor language of Russian, verbs used as predicates did not distinguish gender, but now Russian has borrowed participle forms to express these past tenses and so requires this distinction. Russian verbs now require a distinction between perfective and imperfective so that some verbs have acquired prefixes that are otherwise empty of meaning (e.g., про-читать) and the number of morphemes in those words has increased.

Old English had only the basic forms "it rains" and "it rained." Modern English now requires distinctions between "it rains," "it's raining," "it'll rain," "it'll be raining," "it'd rained," etc. If we ignore the "modern" convention of using spaces between these words, you could argue that English verbs have become more synthetic in how they handle tense. Although the "have" meaning of a phrase like "it has rained" used to be quite transparent, this meaning has become bleached out. There are many English speakers who would not bat an eye upon seeing something like "it might of rained" where the origin of the "of" syllable is completely obscured.

Latin required a distinction in nouns between nominative case, accusative case, and dative case. Spanish, one of its daughter languages, reflects this to some degree in a three-way distinction in third-person pronouns, but not in its nouns. Instead, Spanish has acquired (or adapted) a new obligatory "prefix" to show "accusative case" in animate nouns and a new verbal affix to make clear when a verb governs a dative object. The new Spanish system is somewhat unstable, varying according to dialect, register, and other factors, and is a good example of how language change is not necessarily logical or predictable.

My guess is that these sorts of phenomena can continue unchecked in relatively isolated, culturally uniform communities; however, I don't think this is possible in most cosmopolitan urban environments. I think that these latter environments usually have a high rate of change that fosters the loss of inflections and other synthetic forms. In most languages I am familiar with, it is in rural areas where speech habits tend to be the most conservative and retain the most archaic means of expression. In Arabic, it was the Bedouin that were thought to have the "purest" language, not really the inhabitants of commercial towns like Mecca. In Norway, the "new" language movement looked away from Oslo toward the rural dialects to create a form of expression that was thought to be more pure "Norwegian" and less "Danish." Sardinian dialect retains case endings lost in standard Italian. Relatively isolated Icelandic retains most of the aspects of Old Norse lost in the mainland Scandinavian languages. Standard Irish has lost a lot of the distinctions still made in many of the rural Irish dialects. Some northern English dialects still preserve a form of "thou." In Chinese, it was the communities separated by mountains and rivers that tended to retain the most conservative pronunciation habits.

August 28, 2010 at 03:05 AM

I am not a linguist. But it is quite natural to think that Chinese is a symbolistic language, not a pronunciation-based one like English. Also, because of the symbols, homophones would not confuse a reader or a listener.

Particularly in history, and even now, China has had so many dialects, which are all pronounced differently, although all have used one written language since Qin Shi Huang imposed it by force. So, thinking about the history and development of the Chinese language, I think homophones are very natural for Chinese. Another thing I may say, Chinese language is very rich because of the homophones which means each different characters have their own meaning.

September 13, 2010 at 06:32 AM

Altair> Also, "minimizing effort" in one way can increase effort in another way. For instance, English seems to have developed several palatal consonants over the last 2000 years that it did not have before as different phonemes have been squashed together. In other words, the reduction in the number of phonemes in some words ended up creating new ones.

I heard somewhere (perhaps Wikipedia?) that Chinese tones were a result of two syllables squashed together with the loss of a middle consonant. Is it right? Was 2000-year-old Chinese toneless? Another thing I heard is that measure words are a recent invention.

In Russian, the squashing of phonemes also took place. The letter "ъ" (the hard sign) used to represent a vowel sound, and it still does so in Bulgarian. Now "ъ" means pretty much nothing and is no longer written in most words where it formerly was; only the consonants remain.

Altair> The reintroduction of some of these words with "older" pronunciation through Old Norse even created a few new differences in vocabulary that did not exist before: e.g., shirt and skirt; give and if; screech and shriek.

Yes, this is what happens when a tribe splits, stays split for a few centuries, then merges. BTW, are you saying that "give" and "if" have a common ancestor?? That would be weird. What was its meaning?

Altair> I do not think that there is a general trend for languages to become phonologically poorer. Even in Chinese, the development of tones actually has balanced some of the loss in consonantal diversity.

At least, languages seem to acquire a certain sound style to them. Once a style emerges, all words that do not fit this style change pronunciation. In English and Chinese there is a clearly defined style. Foreign words change pronunciation big time when they are assimilated into English, for example "soviet" for Russian "совет" or [karrAdi] for Japanese "karate". I remember how someone was telling me how he practiced [karrAdi] for quite a while, until I said something like "there is that another martial art called karate, isn't it similar to yours?"

So, this style seems to "tighten up" over time, become more refined and more demanding, more restrictive. The style of the Russian language seems to be a lot less restrictive than the style of the English language, and that in turn less restrictive than the style of the Chinese language.

Altair> It is thought by many experts that Proto-Indo-European had significantly simpler verbs and nouns than is represented in ancient Latin, Greek, and Sanskrit. ... The descendants of Latin, Greek, and Sanskrit have all shown substantial simplification of their declensions and conjugations, showing a return to a simpler pattern.

That brings me back to the original question: what was different a few millennia ago in comparison to the AD centuries? Why was the evolution from Proto-Indo-European to ancient Latin/Greek/Sanskrit going in the direction of more complex grammar, whereas in the modern era the evolution of the most popular languages goes in the opposite direction, towards a simpler grammar? Was there a fundamental difference in the way of social life 6000 yrs ago vs. 2000 yrs ago, that had such effect?

Altair> Even Russian ... has had quite a number of innovations: e.g., simplification of the past tenses and modes of the verb and also routine use of verb prefixes. ... Russian verbs now require a distinction between perfective and imperfective so that some verbs have acquired prefixes that are otherwise empty of meaning (e.g., про-читать) and the number of morphemes in those words has increased.

I don't think that the use of verb prefixes in Russian is part of its grammar, because there are no rules. It is more like a part of the vocabulary. A prefix simply makes a new word by adding its meaning to the meaning of the word. A prefix such as "про-" does not have a stand-alone meaning, but it has a meaning when added to a verb. You can stack up prefixes and suffixes:

читать (imperfective, "to read")

прочитать (perfective, "to complete reading")

прочитывать (imperfective, "to be reading to completion")

допрочитываться (perfective, "to arrive to consequences from continued reading to completion"), e.g. "Ты у меня допрочитываешься!", approx. "If you continue reading (these items to completion), I will punish you!"

But I see what you mean, indeed what begins as a common word pattern may end up as a grammar rule. Thank you for making this clear to me!

Jane_PA> Another thing I may say, Chinese language is very rich because of the homophones which means each different characters have their own meaning.

I am sure it is and can't wait to experience discovering it as my study progresses!

Posters, in order: dreamon, Hofmann, crazy-meiguoren, dreamon, crazy-meiguoren, Hofmann, Altair, dreamon, Zomac, Altair, Jane_PA, dreamon
