Hofmann Posted April 26, 2009 at 01:11 AM
Certain characters delete everything after them. For example, if I write a post with this character in it, the character and everything after it doesn't get posted.
roddy Posted April 26, 2009 at 01:15 AM (edited)
Ok
Edited April 26, 2009 at 01:34 AM by roddy
roddy Posted April 26, 2009 at 01:16 AM
Ouch, you're right - there should have been more to that post...
roddy Posted April 26, 2009 at 01:59 AM
MySQL (in a utf8_general_ci column, but I also just tried utf8_unicode_ci and utf8_bin) just truncates the text at certain (rare) characters. Same thing for this, adjacent to the one linked above. I'm seeing this on different installs / servers. If anyone's got any ideas, I'd like to hear them. It's not an everyday problem, but it is annoying. Also, if anyone can think of any old posts that might have used similar rarely-seen characters, take a look and check they're all in one piece. If not, I should be able to restore them from a backup. Will move this into the computing section in the hope of attracting some gurus. It would be useful to know the scope of the problem, whether it's restricted to certain MySQL versions, and whether anything can be done (binary columns?).
adrianlondon Posted April 28, 2009 at 12:15 AM
I clicked on the link in the first post and ... in Firefox the "Your Browser" area contains a rectangle with hex (I assume) inside; in IE it's an empty rectangle. So even if you fix the problem, I wouldn't be able to see the characters anyway ;) Is it only me with this issue?
imron Posted April 28, 2009 at 01:53 AM
It's just you. I see the character just fine. You might want to make sure you've got a better Unicode font installed.
adrianlondon Posted April 28, 2009 at 09:58 AM
I have Arial Unicode MS, which I thought contained everything. Apart from the links on this thread, everything else works fine, so I'll leave it as it is and hope the characters talked about on here aren't used frequently ;)
peekay Posted April 28, 2009 at 07:05 PM
Unfortunately this problem reflects a weakness in MySQL 5's Unicode implementation. Currently, MySQL can only store characters within the Basic Multilingual Plane (BMP, also known as Plane 0). Hofmann's character lies outside the BMP, in the Supplementary Ideographic Plane (SIP, or Plane 2). If a character's assigned Unicode codepoint has more than four hex digits, then it's outside the BMP. For example, the character in question has a codepoint of 25D32, so it can't be stored by MySQL. Fortunately, commonly used Chinese characters should all be within the BMP. This problem will be fixed in MySQL 6 (still in alpha). Technically, it's possible to store these characters in a MySQL 5 binary column, but that's not generally recommended (e.g., full-text searches don't work on binary columns). For a more technical description of this issue: the current implementation can only store Unicode characters that encode to three UTF-8 bytes or fewer. As you can see from the page Hofmann linked to, U+25D32 encodes to four UTF-8 bytes (F0 A5 B4 B2). MySQL can't handle the extra byte and truncates the string.
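For anyone who wants to check their own text, here is a rough Python sketch of the rule described above: a character is outside the BMP when its codepoint is above U+FFFF, and such characters take four bytes in UTF-8. The helper names are made up for illustration, and `simulate_truncation` only imitates the symptom reported in this thread (the character and everything after it disappears); it is not MySQL's actual code.

```python
# Illustrative sketch only: the helper names are hypothetical and the
# truncation is a simulation of the reported symptom, not MySQL's code.

BMP_MAX = 0xFFFF  # highest codepoint in the Basic Multilingual Plane


def utf8_length(ch: str) -> int:
    """Number of bytes a single character occupies in UTF-8."""
    return len(ch.encode("utf-8"))


def first_non_bmp_index(text: str) -> int:
    """Index of the first character above U+FFFF, or -1 if there is none."""
    for i, ch in enumerate(text):
        if ord(ch) > BMP_MAX:
            return i
    return -1


def simulate_truncation(text: str) -> str:
    """Drop the first non-BMP character and everything after it."""
    idx = first_non_bmp_index(text)
    return text if idx == -1 else text[:idx]


rare = chr(0x25D32)  # the character from the first post
rare_bytes = " ".join(f"{b:02X}" for b in rare.encode("utf-8"))

print(rare_bytes)            # F0 A5 B4 B2
print(utf8_length(rare))     # 4
print(utf8_length("中"))     # 3 (an ordinary BMP character)
print(repr(simulate_truncation("before " + rare + " after")))  # 'before '
```

Running `first_non_bmp_index` over old posts before they are saved (or over a backup) would be one way to spot text at risk of being cut off.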