Are new characters still being created?

December 1, 2011 at 06:56 PM

I know there are thousands of old characters that are not in use anymore, so probably the answer is no. Why create a new character when there are thousands of old ones to choose from?

Still, I'd like to know if there is a record of the most recent character created?

Do we even have an estimation of the age of some characters?

(I mean, apart from the traditional to simplified process and variants of the same character.)

December 1, 2011 at 07:01 PM

New ones are being created for new elements, for example, and I think for some chemical compounds.

And of course old obsolete ones are being "repurposed", e.g. 囧.

December 1, 2011 at 09:52 PM

I think that as computers become more and more dominant (compared to paper), it will get harder and harder to create totally new characters. The hassle of updating the various encoding standards, making sure that common fonts all have the character added, and so on will present a large obstacle to all but a handful of characters for specialist vocabulary in niche fields.

December 2, 2011 at 05:54 PM

@jbradfor: thanks, I was not aware of the new elements warranting new characters.

@Imron: that's my feeling too... are all the existing characters already in the standards anyway? Didn't some people have to change their surnames because they used un-typeable characters?

December 2, 2011 at 06:00 PM

I agree with imron that it is pretty hard to coin new characters these days because of encoding standards having a fixed inventory of characters. However, things could change in the future if a way of representing characters based on their internal structure were devised. In the current encoding schemes, a character like 淋 is represented by a numeric value completely unrelated to 林, which is also numerically unrelated to 木, but there have been some attempts to design component-based encoding systems where 林 would be represented as (木木). With such a system, 淋 would be something like (氵(木木)), and it would then be possible to coin a new "lin" character with the hand radical (扌林) by encoding it as (扌(木木)). I think such a system would be a more natural representation of how Chinese characters actually work than the current sequences of numeric values, and maybe in a post-Unicode world in the far future this idea will catch on.

I can't remember where I first read about this kind of encoding scheme. After some googling, I found this old paper from 1996 that explains such an approach: http://seba.ulyssis.org/thesis/papers/yeung.cpol97.pdf
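A minimal Python sketch of the component-based idea described above, assuming a character is modelled as nested tuples of components. The names `notation` and `components` are hypothetical and are not taken from the linked paper, and the model only handles side-by-side layouts.

```python
# Hypothetical model: a character is either a single component (a str)
# or a tuple of parts written side by side, left to right.
MU = "木"
LIN = (MU, MU)            # 林 as (木木)
LIN_WATER = ("氵", LIN)   # 淋 as (氵(木木))
LIN_HAND = ("扌", LIN)    # a coined character: hand radical + 林


def notation(node):
    """Render a structure in the bracket notation used in the post."""
    if isinstance(node, str):
        return node
    return "(" + "".join(notation(part) for part in node) + ")"


def components(node):
    """List the leaf components of a structure, left to right."""
    if isinstance(node, str):
        return [node]
    leaves = []
    for part in node:
        leaves.extend(components(part))
    return leaves


print(notation(LIN_WATER))   # (氵(木木))
print(notation(LIN_HAND))    # (扌(木木))
# Related characters share a sub-structure here...
print(LIN_WATER[1] == LIN_HAND[1])
# ...whereas their current code points are unrelated numbers.
print([hex(ord(c)) for c in "木林淋"])
```

Unicode's Ideographic Description Characters (for example ⿰ for a left-to-right layout) describe characters in a similar spirit, but they only describe structure; they do not make an unencoded character displayable.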

December 2, 2011 at 06:39 PM

Imron may be right that in some respects the computer makes it harder to create new characters. On the other hand, in a way the computer makes it easier too. Of course, that depends on what you consider a new character.

In the past, new characters were created by 'random' people, either through writing errors or through conscious assembly to express something new or to simplify something that was previously expressed by more difficult or multiple characters. After creation it was a matter of chance whether it would catch on. Now language is more standardised, and if an official committee decides that a new character is needed it will be implemented in the systems, if only to meet the standards. A simple example is the euro sign. After it was introduced, many existing systems were patched to support it and standards were adapted.

Now, in the computer age only a very limited number of people, those in charge of the standards and the owners of the main OS's, have to be convinced to introduce a new character. A 'simple official decree' will do.

As long as a language is alive and used by people on a daily basis, the language will develop. Changes are gradual and the speed of change may vary, but in the end it will affect every part of the language.

December 2, 2011 at 08:09 PM

In the current encoding schemes, a character like 淋 is represented by a numeric value completely unrelated to 林, which is also numerically unrelated to 木, but there have been some attempts to design component-based encoding systems where 林 would be represented as (木木). With such a system, 淋 would be something like (氵(木木)), and it would then be possible to coin a new "lin" character with the hand radical (扌林) by encoding it as (扌(木木)).

The thing is, it's a two-pronged problem. It's not just the encoding, it's also the font. If the font doesn't have the character then it still won't display. Also, such an encoding system would be inefficient and impractical for storing characters. Currently Unicode uses a maximum of 4 bytes for any character; a common Chinese character typically takes 3 bytes in UTF-8 or 2 bytes in UTF-16. An encoding system such as the one you mentioned could easily double or triple that for some characters.
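The byte counts above can be checked with a short Python sketch. The helper name `encoded_sizes` is made up for illustration, and spelling a character out as a plain sequence of its leaf components is only a rough stand-in for a component-based encoding, which would also need some way to mark structure.

```python
def encoded_sizes(text):
    """Return the number of bytes `text` needs in UTF-8 and UTF-16."""
    return {
        "utf-8": len(text.encode("utf-8")),
        # The "-le" variant is used so no byte order mark is counted.
        "utf-16": len(text.encode("utf-16-le")),
    }


print(encoded_sizes("淋"))          # {'utf-8': 3, 'utf-16': 2}
# Spelling 淋 out as its three leaf components costs three times as much:
print(encoded_sizes("氵木木"))      # {'utf-8': 9, 'utf-16': 6}
# A character outside the BMP (here U+282E2, in CJK Extension B)
# needs 4 bytes in both encodings:
print(encoded_sizes("\U000282E2"))  # {'utf-8': 4, 'utf-16': 4}
```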

Actually, for individual purposes, it's quite easy to create your own character with Unicode. There is a range of several thousand code points set aside for private use in the BMP, and over a hundred thousand outside the BMP, so you can just decide you're going to use code point E001 for your new character and then create a font that has a picture of the character you want at code point E001. Anyone with your font installed who gets Unicode text containing your new code point will see it as your new character. So, for personal or limited use, such a system makes it quite easy to create new characters. For widespread use, however, it would take more work to get widespread agreement and widespread usage of the new font. New characters do keep getting added to the Unicode standard, though these are not 'new' characters; rather, they are rare characters that didn't make it into earlier versions of the standard and are slowly being added.
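As a rough illustration of the private-use ranges, here is a small Python sketch. The range boundaries are the Private Use Areas defined by Unicode; the helper names `is_private_use` and `range_size` are made up for this example. Note that the code only identifies the code point; showing a glyph for it still requires a font that defines one.

```python
import unicodedata

# Private Use Area ranges defined by Unicode (inclusive).
PRIVATE_USE_RANGES = [
    (0xE000, 0xF8FF),      # in the BMP
    (0xF0000, 0xFFFFD),    # plane 15
    (0x100000, 0x10FFFD),  # plane 16
]


def is_private_use(code_point):
    """True if `code_point` lies in one of the Private Use Areas."""
    return any(lo <= code_point <= hi for lo, hi in PRIVATE_USE_RANGES)


def range_size(lo, hi):
    """Number of code points in an inclusive range."""
    return hi - lo + 1


bmp_count = range_size(*PRIVATE_USE_RANGES[0])
other_count = sum(range_size(lo, hi) for lo, hi in PRIVATE_USE_RANGES[1:])
print(bmp_count, other_count)          # 6400 131068

my_char = chr(0xE001)                  # the code point picked in the post
print(is_private_use(ord(my_char)))    # True
print(unicodedata.category(my_char))   # Co (private use)
print(is_private_use(ord("林")))       # False
```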

December 3, 2011 at 03:48 AM

imron wrote:
It's not just the encoding, it's also the font. If the font doesn't have the character then it still won't display.

That’s pretty much the problem I used to face regularly when trying to input rare and non-standard characters (the latter normally being ones found in non-Mandarin dialects). To fix that problem, I have installed three of (what I have found to be) the most comprehensive sets of fonts, i.e.

1. HanNom (Sets A and B)

2. SimSun Founder Extended

3. TW-Kai

The TW-Kai font set is not as comprehensive as (1) and (2), but I like it for its aesthetic appearance - it is the font normally used in Chinese wedding invitation cards. (1) alone probably contains all the characters in (2) and (3) plus more.

However, that still does not solve the problem of how to input the characters (having the font means you can display it, but inputting it is a totally different matter). So, I normally end up going to the Unihan website, and searching for the character (by radical and residual stroke count). That normally does the trick for me.

Actually, I did once have to manually create a character. A friend of mine asked for assistance in drafting the text for his wedding invitation card. The snag was that his given name has a rare character: 木+强 (it’s actually a variant of 弶). Generating a new code for it would not have worked, as the draft softcopy would eventually have had to go to the printing company, who would undoubtedly use the TW-Kai font (it does not have the character - I checked). So, I had to manually create it as a standalone JPEG by merging 木 with 强, plus some sideways ‘compressing’ to get the proportions right.

December 3, 2011 at 04:15 AM

However, that still does not solve the problem of how to input the characters

Most modern IMEs allow you to define an input sequence that maps to a given code point. For example, if you invent a new character that you give the code point E001 and want to pronounce 'xin', then you can go to the settings of your IME and configure it so that E001 is one of the choices for the input sequence xin. Shape-based IMEs also allow this functionality.
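Conceptually, such a user-defined entry is just an extra candidate in a lookup table. The Python sketch below is a toy model only: the `UserDictionary` class, the built-in candidate list and the shape code "rnn" are all invented for illustration and do not correspond to any real IME's settings format.

```python
class UserDictionary:
    """Toy stand-in for an IME's user-defined phrase table."""

    def __init__(self, builtin):
        # Copy the built-in candidates so user entries do not alter them.
        self._table = {seq: list(chars) for seq, chars in builtin.items()}

    def add(self, sequence, char):
        """Offer `char` as an extra candidate for `sequence`."""
        candidates = self._table.setdefault(sequence, [])
        if char not in candidates:
            candidates.append(char)

    def lookup(self, sequence):
        """Return the candidates offered for `sequence`, in order."""
        return list(self._table.get(sequence, []))


ime = UserDictionary({"xin": ["心", "新", "信"]})
ime.add("xin", "\uE001")   # pronunciation-based input
ime.add("rnn", "\uE001")   # a made-up shape-based code
print([hex(ord(c)) for c in ime.lookup("xin")])
```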

March 17, 2012 at 02:47 PM

Sorry for bumping this thread, but some new characters created here in Guangzhou are simplified-looking versions of HK's Cantonese characters. I have never seen any Guangzhou native write in Cantonese by hand, and when most people write on a computer or cellphone they use extremely ad hoc characters, but some Cantonese textbooks I have use these characters. Most cannot be written on a computer.

Looking through my textbook (今日粤语), here are the first couple of characters that pop out:

口+个 (From 嗰)

扌+罗 (From 攞)

口+系 (From 喺)

March 17, 2012 at 05:04 PM

I understand that the characters for concave and convex were "created" recently.

凸 tu convex

凹 ao concave

These look like the concepts being described. I like them for their simplicity and the information they give.

March 18, 2012 at 03:37 PM

Do you have a reference for this?

According to Google Ngrams they appear in some relatively old books:

http://books.google.com/ngrams/graph?content=%E5%87%B8%2C%E5%87%B9&year_start=1800&year_end=2006&corpus=11&smoothing=1

March 18, 2012 at 03:55 PM

I don't have any references, only what my first Chinese teacher told me. I did say in my original post that I wasn't sure. I said "I understand"...

It does seem to be older, but I think the point is that they were created specifically for two concepts. Most characters seem to have developed and been adapted, whereas these were created from scratch for these concepts. I am not sure of any others that are like this... but I am probably wrong :)

Thanks for the info.

March 18, 2012 at 04:10 PM

This word stands for "lift" (i.e. elevator) -> 𨋢 (if you can't see it, it is (車立) combined as one character). In HK the pronunciation is lip1 (Cantonese). I think it is quite good, combining the meaning of standing in a car and also the similarity in pronunciation.

March 18, 2012 at 05:07 PM

This may be of interest - http://www.chinese-tools.com/characters/new.html


edelweis

jbradfor

imron

edelweis

Jose

Silent

imron

Mark Yong

imron

Takeshi

Shelley

BertR

Shelley

skylee

Shelley
