chinesemadrush Posted October 30, 2016 at 03:57 PM

Hi everyone,

CC-CEDICT is a Chinese-English dictionary that is available as a downloadable text file here (https://www.mdbg.net/chindict/chindict.php?page=cedict).

Unfortunately, I do not know how to parse the text file into nice columns in Excel. Does anyone know how to do this? I am aware that Excel has a Text to Columns function, but it doesn't seem advanced enough for the file structure used by CC-CEDICT.

Thanks,
Kevin
iand Posted October 30, 2016 at 08:56 PM

I'm pretty sure it's just tab (tab-separated values).
iand Posted October 30, 2016 at 08:57 PM

That should have read tsv
Yadang Posted October 31, 2016 at 05:52 AM

Yeah, it should be TSV, but after importing it into Excel, it looks like it's delimited by spaces, which is problematic, since the pinyin and the definitions contain spaces too. You could probably write a function to split each line based on where the pinyin starts (it is enclosed in square brackets). Let me know if you need help.
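A rough sketch of that bracket-based idea in Python (nobody in the thread posted this; the function name `split_entry` is made up for illustration). It assumes each entry looks like `Traditional Simplified [pin1 yin1] /definition/` and that lines starting with `#` are comments:

```python
def split_entry(line):
    """Split one CC-CEDICT entry at the first bracketed pinyin field.

    Returns (traditional, simplified, pinyin, definition), or None for
    comments, blank lines, and anything that does not fit the assumed layout.
    """
    line = line.strip()
    if not line or line.startswith("#"):
        return None  # blank line or comment header

    open_idx = line.find(" [")             # the pinyin field starts here
    close_idx = line.find("] ", open_idx)  # and ends at the first "] " after it
    if open_idx == -1 or close_idx == -1:
        return None

    headwords = line[:open_idx].split(" ")
    if len(headwords) != 2:
        return None  # expected exactly Traditional and Simplified

    traditional, simplified = headwords
    pinyin = line[open_idx + 2:close_idx]
    definition = line[close_idx + 2:]
    return traditional, simplified, pinyin, definition


print(split_entry("中國 中国 [Zhong1 guo2] /China/"))
```

Splitting at the first bracket only means that any bracketed pinyin appearing later, inside the definition, stays part of the definition field.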
imron Posted October 31, 2016 at 06:42 AM

It's not TSV. The format is specified here. Do you have access to an editor that handles regular expressions? If not, download Notepad++, then:

1. Open the CC-CEDICT file.
2. Open Search -> Replace (Ctrl+H).
3. Set the 'Search Mode' to 'Regular expression'.
4. In the 'Find what' field, type: ^([^ ]+) ([^ ]+) (\[.*\]) (.*)$ (probably best to copy/paste this from this post). This regular expression matches four fields: Traditional, Simplified, Pinyin, Definition.
5. In the 'Replace with' field, type: \1\t\2\t\3\t\4 This replaces each matching line with the individual fields separated by tab characters.
6. Hit Replace All and wait 10-20 seconds, and you should be good to go.

Then just save the file and import it directly into Excel.
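For anyone who would rather script it than use an editor, the same find/replace can be approximated in Python. This is only a sketch under a few assumptions: the function name `cedict_to_tsv` and the file names in the usage comment are hypothetical, the header row is an addition, and the pattern differs slightly from the one above. It uses `[^\]]*` rather than `.*` for the pinyin so the match stops at the first closing bracket (definitions sometimes contain bracketed pinyin of their own), and it drops the brackets from the pinyin column.

```python
import csv
import io
import re

# Traditional, Simplified, [pinyin], then the rest of the line (the definitions).
ENTRY = re.compile(r"^(\S+) (\S+) \[([^\]]*)\] (.*)$")


def cedict_to_tsv(lines, out):
    """Write matching CC-CEDICT entries from `lines` to `out` as tab-separated rows.

    Lines that do not match (for example the '#' comment header) are skipped.
    Returns the number of entries written.
    """
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["traditional", "simplified", "pinyin", "definition"])
    count = 0
    for line in lines:
        match = ENTRY.match(line.rstrip("\r\n"))
        if match is None:
            continue
        writer.writerow(match.groups())
        count += 1
    return count


# Small in-memory demonstration.
sample = ["# CC-CEDICT\n", "中國 中国 [Zhong1 guo2] /China/\n"]
buffer = io.StringIO()
print(cedict_to_tsv(sample, buffer), "entry written")

# Usage on a real file (both file names are assumptions, adjust to your download):
# with open("cedict_ts.u8", encoding="utf-8") as src, \
#      open("cedict.tsv", "w", encoding="utf-8-sig", newline="") as dst:
#     cedict_to_tsv(src, dst)
```

Writing the output as `utf-8-sig` adds a byte order mark, which tends to help Excel recognise the file as UTF-8 so the Chinese characters display correctly.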
chinesemadrush Posted November 6, 2016 at 12:47 PM

Thanks, imron! I will try it out later.