
HSK 3.0 ... new, new HSK?



Posted

Looks quite a comprehensive document, including character lists, word lists and grammar lists for the nine levels. For levels 1-6 these are split up, one list per level, but for the final three levels (the "new" ones) they're shown grouped together as "7-9". That might suggest the 高级 exam will be similar to how the very first HSK was organised: all 高级 students take the same exam and get graded 7, 8 or 9 (or fail) depending on how well they do.

Edit: or they just haven't got around to splitting up 7-9 yet.

... or there will just be three exam papers (初级、中级、高级)???

Posted

Also "syllable" lists (音节表), pp. 9-14.

 

I'm curious to know how they're going to test these.

Posted
8 minutes ago, mungouk said:

Are those character lists or "syllable" lists — 音节表? (with the first few levels in pinyin)

Looks like, for each level, syllable lists followed by character lists followed by word lists (音节表、汉字表、词汇表).

I would have assumed that for foreign learners, knowing a character would include knowing its pronunciation, but they appear to have broken it down differently.

 

Posted

Also it looks like from level 4 onwards the exams will test translation/interpretation alongside listening, speaking, reading and writing. For example, it seems Level 6 requires oral interpretation of informal language, 非正式场合的口译任务, and Level 9 seems to have a simultaneous translation requirement - 同声传译任务. Interesting!

Posted
29 minutes ago, realmayo said:

Level 9 seems to have a simultaneous translation requirement

What on earth? Into and from what language? He wrote, bemusedly, while the massive PDF downloaded. 

  • Good question! 1
Posted

Ha, they use //s in the readings for splittable verbs just like we do - now maybe people will stop emailing us to ask what’s up with those.

 

Also it seems to be a scanned PDF with a watermark so extracting these is going to be a major pain.

Posted

Oh, interesting they specify characters per minute for listening speeds. Up to a max of 800 [edit: ignore, I misread it]. And also for reading.

 

This simultaneous translation thing, though.... 能够完成正式场合专业内容的同声传译任务 - that's a postgraduate degree in itself. And the logistics. Good God, the logistics. Are they doing both directions? Which languages?

 

21 hours ago, mikelove said:

scanned PDF with a watermark

仅供查阅 indeed. 仅供 very slow scrolling while trying to scan the text for the terms I'm interested in, more like.

 

Posted

In Adobe Acrobat*, make sure Scanned Documents > Settings is set to CHINESE (SIMPLIFIED), then open the PDF and hit "Edit PDF". That converts it to proper searchable and copyable text, but you might have to do it for each page separately.


Sounds like a job for @大块头 ??

* I'm using Acrobat Pro DC 2021, older versions may be different.

  • Like 1
Posted
7 minutes ago, mungouk said:

you might have to do it for each page separately. 

 

It looks like just scrolling through the document does the conversion a page at a time, but on my 3-year-old MacBook Pro this is not very quick.

Posted
50 minutes ago, mikelove said:

extracting these is going to be a major pain.

 

I'm currently converting the PDF to Word using Acrobat Pro but if anyone has a faster computer and wants a race, please feel free.

 

Presumably though they will release an Excel version in good time, like they did for the last set of vocab...?  

 

Posted
24 minutes ago, mungouk said:

converting the PDF to Word using Acrobat Pro

 

Well, it looks like Word choked on it. At least it messed up many pages.

 

Good luck everyone!

Posted

They have that table again on p. 7 of this PDF. None of the totals have changed since they shared the proposed vocabulary list last year, which I'll take as a sign that I can do something during my weekends this spring besides writing flashcards.

 

image.thumb.png.fef742968b5ecf0dcdb1a2fe22cffc13.png

  • Like 1
Posted

This looks pretty clear / regular. I won't have time to try this until later, but if anybody has ImageMagick I'd suggest running the PDF through it to convert the pages into images, chunking them up into smaller images for each column, and removing anything light enough to be a watermark. Then put that all back into a PDF and run it through OCR.
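
The "chunk them up into smaller images for each column" step could be sketched roughly like this. It only computes crop rectangles; the page size, the number of columns and the `overlap` margin are all assumptions for illustration, not values taken from the PDF.

```python
def column_boxes(width, height, n_cols, overlap=0):
    """Return (left, top, right, bottom) crop boxes for n_cols equal-width columns.

    `overlap` widens each box by a few pixels on both sides so that
    characters sitting near a column boundary are not cut in half.
    """
    if n_cols < 1:
        raise ValueError("n_cols must be at least 1")
    boxes = []
    for i in range(n_cols):
        left = max(0, (i * width) // n_cols - overlap)
        right = min(width, ((i + 1) * width) // n_cols + overlap)
        boxes.append((left, 0, right, height))
    return boxes


# Hypothetical page: A4 at 300 dpi is about 2480 x 3508 pixels; 4 columns is a guess.
for box in column_boxes(2480, 3508, 4):
    print(box)
```

Each box could then be handed to whatever cropping tool is used before OCR. Equal-width columns are the simplest possible assumption; real pages may need per-page adjustment.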

  • Like 1
Posted

@mikelove I have Tesseract chewing on the PDF right now. Maybe all that pre-processing won't be necessary?

 

edit: Here's the raw text output. Doing something like what Mike suggested may be the best way of extracting these wordlists.

 

raw_ocr_output.txt
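
For anyone wanting to try the Tesseract route page by page, here is a minimal sketch that only builds the command lines and does not run them. The post doesn't say how the PDF was fed to Tesseract, so this assumes the pages have already been converted to images. The file names are hypothetical, `chi_sim` is Tesseract's Simplified Chinese language pack, and page segmentation mode 6 (a single uniform block of text) is a guess that may not suit multi-column pages.

```python
def tesseract_command(image_path, output_base, lang="chi_sim", psm=6):
    """Build a `tesseract IMAGE OUTPUTBASE -l LANG --psm N` argument list."""
    return ["tesseract", image_path, output_base, "-l", lang, "--psm", str(psm)]


def commands_for_pages(n_pages, prefix="page"):
    """Build one command per page image, named e.g. page-001.png (hypothetical naming)."""
    commands = []
    for n in range(1, n_pages + 1):
        base = f"{prefix}-{n:03d}"
        commands.append(tesseract_command(base + ".png", base))
    return commands


for cmd in commands_for_pages(3):
    print(" ".join(cmd))
```

The argument lists could be passed to `subprocess.run` once Tesseract and the language pack are installed.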

  • Like 3
Posted
2 hours ago, mikelove said:

now maybe people will stop emailing us to ask what’s up with those

Or maybe new people will start asking you now.

 

2 hours ago, roddy said:

This simultaneous translation thing, though.... 能够完成正式场合专业内容的同声传译任务 - that's a postgraduate degree in itself. And the logistics. Good God, the logistics. Are they doing both directions? Which languages?

 

Really interesting, isn't it! If they're trying to test examinees' skills in simultaneous translation, that sounds a little unfair: surely lots of people are likely to be proficient in Chinese but poor at simultaneous translation. But then, in the same way, listening and reading comprehension are exam skills rather than language skills. So maybe it's reasonable? Especially with material limited to just HSK9-level vocabulary and grammar patterns. But it would have to be teachable first, too. And yes, the logistics: regardless of which direction(s), it's hard to see how a teacher in a Chinese university with a class whose students speak a mix of French, English, Vietnamese, Korean and Russian could deal with this.

Posted

Actually, I think the standard foreigner-learn-Chinese degree curriculum includes interpretation skills, so if they're working off that... still odd though. Reading and listening comprehension tasks at least try to mimic real world skills you could reasonably expect to use. 

Posted

OK, ImageMagick does a beautiful job removing the watermark:

 

Quote

convert -density 300 W020210329527301787356.pdf -quality 100 hsk2022.jpg

 

Will convert it to a bunch of JPEGs, then:

 

Quote

mogrify -threshold 70% *.jpg

 

Will remove the watermark. (Make sure you do this in a separate directory or it'll de-watermark your other JPEGs too.) Run those through OCR and you should get relatively clean text.
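
To illustrate why a 70% threshold drops the watermark, here is a minimal pure-Python sketch of thresholding on grayscale values. It is not ImageMagick's implementation, just the general idea: anything brighter than the cutoff goes to white, everything else to black. The sample pixel values are made up.

```python
def threshold(gray_rows, percent=70, max_value=255):
    """Binarize grayscale pixels: brighter than the cutoff -> white, otherwise black."""
    cutoff = max_value * percent / 100  # 70% of 255 is 178.5
    return [
        [max_value if pixel > cutoff else 0 for pixel in row]
        for row in gray_rows
    ]


# Made-up sample: dark ink (20, 30), a light-gray watermark (190, 200), white paper (255).
page = [
    [20, 200, 255],
    [190, 30, 178],
]
print(threshold(page))  # [[0, 255, 255], [255, 0, 0]]
```

A light-gray watermark sits above the cutoff and turns white, while dark text stays black. Note that a mid-gray value just under the cutoff (178 here) turns black, so faint text strokes may thicken or vanish depending on where the cutoff falls.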

 

Quickly ran a test page (page 59) through Pleco OCR and when isolated to single columns it was 100% accurate, so just have to chunk this up into smaller images and do that in bulk.

  • Like 3
Posted

The following also does a good job of removing the watermark.

 

Quote

convert input.png -fuzz 15% -fill white -opaque "#bdbcc0" result.png

 

 

result.png
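
For comparison, the colour-replacement approach can be sketched in pure Python. This is a simplified approximation and not ImageMagick's actual algorithm: it assumes "fuzz" means Euclidean distance in RGB, expressed as a percentage of the cube diagonal, and ImageMagick's exact metric may differ. The sample pixels are made up.

```python
import math


def hex_to_rgb(hex_color):
    """Convert '#rrggbb' to an (r, g, b) tuple of ints."""
    h = hex_color.lstrip("#")
    return tuple(int(h[i:i + 2], 16) for i in (0, 2, 4))


def replace_near_color(pixels, target="#bdbcc0", fuzz_percent=15,
                       fill=(255, 255, 255)):
    """Replace every pixel within the fuzz distance of `target` with `fill`."""
    target_rgb = hex_to_rgb(target)
    max_dist = math.sqrt(3 * 255 ** 2)       # diagonal of the RGB cube
    limit = max_dist * fuzz_percent / 100    # assumed meaning of the fuzz percentage
    return [
        [fill if math.dist(p, target_rgb) <= limit else p for p in row]
        for row in pixels
    ]


# Made-up sample: watermark gray, a slightly different gray, black ink, mid-gray.
row = [(189, 188, 192), (200, 200, 200), (0, 0, 0), (120, 120, 120)]
print(replace_near_color([row]))
```

Unlike thresholding, this leaves everything that isn't close to the watermark colour untouched, so anti-aliased gray edges of the text survive.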

  • Like 2
