js6426 Posted March 16, 2016 at 03:41 AM I have found many frequency lists for the most common Chinese characters, but is there any such list for the most common words (including multiple character words rather than just single characters)? I have had a look around but haven't been able to find anything yet. Thanks
imron Posted March 17, 2016 at 08:38 AM There's not a lot out there. One I can think of off the top of my head is this one. The problem with these types of lists, however, is that they may or may not be relevant to your vocabulary and to the type of content you want to use. The more advanced your level is, the more likely this is to be true. Take the above list, for example: it's generated from film subtitles, so it will be relatively good for words found in dialog and spoken text, but relatively poor for words found in newspaper articles and novels. If you'll excuse the shameless plug, I wrote a tool that lets you generate your own wordlists based on frequency and/or several other metrics, from any piece of Chinese text. It will also keep track of your known vocabulary over time, so you can use it to export the top 10 unknown words from a given article, and so on.
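As a rough illustration of the idea imron describes (ranking the words in a text by frequency and filtering out the ones you already know), here is a minimal Python sketch. It is not imron's tool: the function name `top_unknown_words`, the sample sentence, and the known-word set are all made up for illustration. It also assumes the text has already been split into space-separated words, which real Chinese text would need a word segmenter for.

```python
from collections import Counter


def top_unknown_words(tokens, known_words, n=10):
    """Return the n most frequent tokens that are not in known_words.

    tokens      -- iterable of already-segmented words (an assumption:
                   raw Chinese text has no spaces and must be segmented first)
    known_words -- set of words the learner already knows
    n           -- how many unknown words to return
    """
    # Count only the words the learner does not know yet.
    counts = Counter(t for t in tokens if t not in known_words)
    # most_common returns (word, count) pairs, highest count first.
    return counts.most_common(n)


if __name__ == "__main__":
    # Hypothetical pre-segmented sample text.
    text = "我 喜欢 看 电影 我 喜欢 看 书 他 喜欢 电影"
    known = {"我", "看", "他"}
    for word, count in top_unknown_words(text.split(), known, n=2):
        print(word, count)
```

Running the same function over whatever you are currently reading, with your own known-word set, gives a list tailored to that text rather than to a general corpus.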
iand Posted March 18, 2016 at 03:39 AM https://en.wiktionary.org/wiki/Appendix:Mandarin_Frequency_lists
js6426 Posted March 21, 2016 at 02:01 PM Thanks guys, these are super helpful.
lips Posted March 21, 2016 at 02:13 PM https://en.wiktionary.org/wiki/Appendix:Mandarin_Frequency_lists Thanks, iand, I've been looking for something like this for a while. Noticed something interesting in the first list: 台灣 台湾 is no. 80?
Joana Posted April 7, 2016 at 12:15 AM It's better for you to know the background of those words.
iand Posted April 13, 2016 at 03:31 PM After looking at the list I linked, it's obvious that it was generated from newspapers, and has more formal words that I don't need to worry about yet. For example, 領域, meaning scope or field of operation. Instead, consider the SUBTLEX-CH list, generated from movie and TV subtitles. http://www.ugent.be/pp/experimentele-psychologie/en/research/documents/subtlexch
eddyf Posted April 14, 2016 at 12:52 AM SUBTLEX-CH has its own issues, mainly that the corpus it is based on contained a lot of translated subtitles for American movies and shows. So you'll notice that a ton of transliterations of English proper nouns show up. And who knows what other biases are introduced by the fact that it's all translated material. For example, maybe some common chengyu are ranked really low because they wouldn't be a natural way to translate anything from English. Overall I see this as a serious problem with this particular frequency list. If your goal is just to have a list of words to study, then I think the official HSK vocab lists work very well for that purpose.
imron Posted April 14, 2016 at 01:41 AM "Overall I see this as a serious problem with this particular frequency list." It's a serious problem with any frequency list generated from content you are not currently reading. Even things like the HSK vocab lists are very poor in this regard.
eddyf Posted April 14, 2016 at 03:11 AM Well, some content will be more general and well-rounded than others. To me, a corpus that heavily features translated texts is particularly problematic from a linguistic standpoint, in a way that goes beyond the problem that every corpus will have of not being perfectly tailored to your own interests. As for the HSK list, my opinion is that up through level 5 the words are frequent enough that it's worth it to learn the entire list straight through. At HSK 6 it starts getting murky and it becomes worth it to mine your own vocab from content that you are reading/watching.
imron Posted April 14, 2016 at 03:27 AM "in a way that goes beyond the problem that every corpus will have of not being perfectly tailored to your own interests." Which is why I advocate creating a corpus perfectly tailored to your own interests! (Or, if not your own interests, then at least what you are currently reading.) "At HSK 6 it starts getting murky" I think it gets murkier before that. Yes, the words on the earlier lists have a high frequency in general texts, but there are also plenty of other frequent words that are not on these lists, and which ones those are will change quite significantly depending on what you are reading. If you have a choice between learning words that might be relevant in a few months' time, or words that will be directly relevant that day or in the coming days, then for me it's really a simple choice. You'll end up learning all the HSK vocab eventually, just on a more random schedule.
freipole Posted May 3, 2016 at 07:44 PM "https://en.wiktionar...Frequency_lists Thanks, iand, I've been looking for something like this for a while. Noticed something interesting in the first list: 台灣 台湾 is no. 80?" Academia Sinica (not "Academica" as it is spelled in the Wikipedia article) is a Taiwanese academic institution, so their corpus probably reflects Mandarin as it is spoken and written in Taiwan.
iand Posted May 3, 2016 at 07:46 PM A few minutes' browsing of the list will show that its vocabulary is not only Taiwan-centered but also very old; I think I saw various Soviet references in there.
cingia Posted June 12, 2016 at 05:59 PM I use this: "A Frequency Dictionary of Mandarin Chinese" (Routledge) https://www.routledge.com/A-Frequency-Dictionary-of-Mandarin-Chinese-Core-Vocabulary-for-Learners/Xiao-Rayson-McEnery/p/book/9780415455862 The paperback version is a bit expensive, but it's the best one I have ever seen.
cingia Posted June 12, 2016 at 06:05 PM P.S. Are you sure you want to use a frequency dictionary? It's so boring to nail down all the words in a list. I prefer reading a lot instead of learning words out of context... "enjoy the process"