roddy Posted December 1, 2006 at 09:25 AM

Many of you will already be aware of Signese.com, the site of Chinese character photos which I've been running for an indeterminate length of time now. Obviously, you are also aware of what a fantastically useful resource it is, providing as it does real-life reading material that has been scientifically proven to be 67% more interesting than what's in your textbooks.

Of course, being photo-based has meant that searching the database is a bit tricky: you can't just type in 禁止 and expect the site to flick through all its photo albums and find the relevant snaps. And if you see a character you don't recognize, you can't copy and paste it out of a photo into your favorite online dictionary. Or you couldn't . . . until now! Every photo is now accompanied by a text version of its content. This means:

a) You can search for photos containing the character or word you are interested in. Search for 禁止 to see what has been forbidden lately. See what real-life examples of the usage of 把 you can find. If necessary, try and find a 厕所 (toilet).
b) If you can't understand something, you can copy and paste the text into whichever application you choose to use.

There are currently just over 1,000 photographs on Signese.com, with over 20,000 characters of text. Not huge, but respectable. Contributors are always welcome, by the way; see the site for details.

Hopefully coming soon: pinyin annotation and translations. Offers of help with this are welcome.
zhwj Posted December 1, 2006 at 09:56 AM

Awesome. How much work was it to do all of the transcriptions? (Also, 2.5 years would put it in development for six months or more before initial release? And what about this?)
roddy Posted December 1, 2006 at 10:06 AM

I'm getting confused over which year it was. March 2005, so that's what, a year and 9 months, give or take. Some of the date fields got messed up when I shifted to WordPress a while back, hence the photos dating back to 1999.

I didn't do the transcriptions myself, and I'm not sure at the moment how long it actually took. I will be finding out, though, as I have to pay for it.
kudra Posted December 1, 2006 at 01:11 PM

OK, so now what am I going to do in all my free time? (cf. this, this, and this.)

On the bright side, I don't have to worry that roddy is generating photos faster than I personally can read and transcribe them, although that is surely still the case.
roddy Posted December 3, 2006 at 06:25 AM

What, have you been transcribing them? Should have told me, could have saved me a few quid.

I'm currently embarking on the next stage, categorizing. I'm using the following schema:

Banners, Slogans: Big red banners, edifying slogans, etc.
Commercial: Adverts, shop fronts, receipts.
Enigmatic: Things that make you go 'why?'
Folk: Scribbled notes, non-official signs, etc.
Food: Menus mostly, some receipts, packaging.
Hutongs: From Beijing's hutongs.
Miscellaneous: Triangular pegs that don't fit in our round holes.
Scriptual: Anything not typeset or hand-printed.
Traditional: Photos featuring traditional characters.

Improvements to this are welcome (preferably now, when I've done 30, rather than when I've finished). Also, I'm quite happy to . . . share . . . this workload with anyone else, so if you want to join in the . . . fun . . . let me know . . .

kudra wrote: "On the bright side, I don't have to worry that roddy is generating photos faster than I personally can read and transcribe them, although that is surely still the case."

Actually, posting has been slower than usual lately, from me at least. However, the number of contributions has risen somewhat to compensate. It's not always obvious which are contributions, as some people email their photos in directly and don't bother to say who they're from.
kudra Posted December 3, 2006 at 10:11 AM

Haha, 不敢当，过奖 (I don't deserve that, you flatter me). In this case it would have been a very few quid indeed for the 3 posts mentioned, and maybe a couple in the Signese comments, which were usually asking for help with my incomplete understanding anyway. Maybe I should have clarified: by free time I meant free time in my next life, where I am an advanced speaker by the age of 20 and retired by 30.

As for categorization, you might run the whole lot through some automatic classifier. I think there are some out there; I don't know if they exist for Chinese. (There are surely other members who are expert at this kind of machine-learning linguistic stuff.) I suppose you could auto-translate them into English, then run them. It might be interesting to compare that classification to the one you get by hand. Of course, if it matches, ....

As far as keeping up, how will the transcription work going forward?
roddy Posted December 3, 2006 at 10:52 AM

I don't think automatic classification would work too well, and although 1,000 photos sounds like a lot, it isn't really. I'd probably spend more time figuring out how to do that than I would just manually popping them all into the correct categories myself.

Transcription work going forward: photos will still go onto the site stark naked, with no explanation or transcription. I and other contributors tend to post by mobile phone, and there's no way I'm tapping the likes of this in on that dinky little keyboard. The idea is that once every few days or once a week I'll go through the as-yet-untranscribed posts and edit in the Chinese, making them searchable. If, as is possible, I fall very far behind, I'm not averse to the idea of paying someone else to do it again.

I'm hoping these changes will increase the number of people using the site, as although I think it's a charming little piece of the internet, visitor numbers have never really taken off. It's a small niche, and it's hard to generate search traffic for something that (up to now) has consisted entirely of pictures. Once categorization is done I'll probably restart an advertising campaign I used to run for it.

kudra wrote: "maybe a couple in the signese comments, which were usually asking for help on my incomplete understanding anyway"

I'd like to see more people doing that; it's nice to know it's at least halfway useful. Another result of the indexing is that I can now work on replacing the comments that disappeared when I moved over to WordPress. What would have been a hellishly dull job is now only a nightmarishly boring one, and it's about a third done already.