Tuesday, February 9, 2010

Computer assisted Chinese (part III): Dictionary

The second project stems from my deep unhappiness with the Chinese electronic dictionaries that are out there.  Most of them are aimed at Chinese people learning English, not westerners learning Chinese, and they really suck.  The only thing more awful than the software that's generally on them is the horrid dictionary databases in use.  If there are five ways of saying the same thing, how do we know which one is right for the situation?  And which are current and not archaic?

When I've read English dictionaries for Chinese people, the English seems about 100 years old and quite laughable.  I get the same impression when trying to find the word I want in a Chinese-English dictionary: it's not clear which one I want, and when my Chinese friends see my choice, they just tell me "oh, we don't use that word".  Pah.  And don't get me started on how inflexible the searching is.  I should be able to click on a character, break it down into components, then see other words with the same components.  That would help me sort out whether I want 青, 晴, 清, 情, or 请!
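A rough sketch of how that kind of component lookup might work, in C++ since that is the language the DS version would be written in.  Everything here is hypothetical: the ComponentIndex class, its method names, and the makeSampleIndex() helper are made up for illustration, and the tiny decomposition table is typed in by hand.  A real program would need a proper character decomposition data source, which as far as I know is not part of the dictionary data itself.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical index from components to the characters that contain them.
// Characters and components are stored as UTF-8 encoded std::string values.
class ComponentIndex {
public:
    // Record that `character` is built from the given `components`.
    void add(const std::string& character,
             const std::vector<std::string>& components) {
        assert(!character.empty());
        componentsOf_[character] = components;
        for (const std::string& c : components) {
            charactersWith_[c].insert(character);
        }
    }

    // Every registered character that contains `component`.
    std::set<std::string> charactersWith(const std::string& component) const {
        auto it = charactersWith_.find(component);
        return it == charactersWith_.end() ? std::set<std::string>()
                                           : it->second;
    }

    // Every other registered character sharing at least one component
    // with `character`. Empty if `character` was never registered.
    std::set<std::string> related(const std::string& character) const {
        std::set<std::string> result;
        auto it = componentsOf_.find(character);
        if (it == componentsOf_.end()) return result;
        for (const std::string& c : it->second) {
            const std::set<std::string> sharing = charactersWith(c);
            result.insert(sharing.begin(), sharing.end());
        }
        result.erase(character);  // a character is not related to itself
        return result;
    }

private:
    std::map<std::string, std::vector<std::string>> componentsOf_;
    std::map<std::string, std::set<std::string>> charactersWith_;
};

// Hand-written sample data for the characters mentioned above.
ComponentIndex makeSampleIndex() {
    ComponentIndex index;
    index.add("晴", {"日", "青"});
    index.add("清", {"氵", "青"});
    index.add("情", {"忄", "青"});
    index.add("请", {"讠", "青"});
    return index;
}
```

With this sample data, asking for the characters containing 青 gives back all four compounds, and asking for the relatives of 晴 gives the other three.  The two maps trade some memory for fast lookups in both directions, which may or may not be the right trade on a small device.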

When I'm online, I'm constantly using the dictionary at MDBG. It's aimed at Chinese for westerners, and the English descriptions are carefully worded so that in general, it's possible to tell which of the five ways of saying something you actually want. And seeing which HSK level a word is rated at also gives an idea of how common it is.

  http://www.mdbg.net/chindict/
  http://en.wikipedia.org/wiki/HSK_test

This website uses the CC-CEDICT database, which can be downloaded and used for free! Since I like the database so much, I've been dreaming of writing my own Chinese dictionary program which uses that database. And I intend to turn that dream into reality. But in order to do that, I need a platform to run it on.
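The CC-CEDICT download is a plain text file with one entry per line, in the form "Traditional Simplified [pin1 yin1] /gloss 1/gloss 2/", with comment lines starting with "#".  Here is a minimal sketch of reading one such line in C++.  The CedictEntry struct and the parseCedictLine() function are names invented for this example, not part of any existing library, and the parser only assumes the layout just described.

```cpp
#include <cassert>
#include <string>
#include <vector>

// One dictionary entry (hypothetical struct for this sketch).
struct CedictEntry {
    std::string traditional;
    std::string simplified;
    std::string pinyin;                // numbered-tone pinyin, as in the file
    std::vector<std::string> glosses;  // English definitions
};

// Parse one line of the file into `out`.
// Returns false for comments, blank lines, and lines that do not match
// the expected layout; `out` should be ignored in that case.
bool parseCedictLine(const std::string& line, CedictEntry& out) {
    if (line.empty() || line[0] == '#') return false;

    typedef std::string::size_type pos_t;
    const pos_t npos = std::string::npos;

    // Locate the separators: two spaces, the brackets, the first slash.
    const pos_t sp1 = line.find(' ');
    if (sp1 == npos) return false;
    const pos_t sp2 = line.find(' ', sp1 + 1);
    if (sp2 == npos) return false;
    const pos_t lb = line.find('[', sp2 + 1);
    if (lb == npos) return false;
    const pos_t rb = line.find(']', lb + 1);
    if (rb == npos) return false;
    pos_t slash = line.find('/', rb + 1);
    if (slash == npos) return false;
    assert(sp1 < sp2 && sp2 < lb && lb < rb && rb < slash);

    out.traditional = line.substr(0, sp1);
    out.simplified = line.substr(sp1 + 1, sp2 - sp1 - 1);
    out.pinyin = line.substr(lb + 1, rb - lb - 1);

    // Each gloss sits between two consecutive '/' characters.
    out.glosses.clear();
    while (true) {
        const pos_t next = line.find('/', slash + 1);
        if (next == npos) break;
        if (next > slash + 1) {
            out.glosses.push_back(line.substr(slash + 1, next - slash - 1));
        }
        slash = next;
    }
    return !out.glosses.empty();
}
```

Feeding it an illustrative line such as "請 请 [qing3] /to ask/to invite/" fills in the traditional and simplified forms, the pinyin, and two glosses.  Parsing at run time like this is probably too slow for the whole file on a handheld, so converting the database to a pre-indexed binary form ahead of time might be the better design.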

I have looked around, and I've decided to write it for the Nintendo DS. This is a mature platform, with 4 MB of RAM, two nice screens, long battery life, a free toolchain, a GUI library, and as much read-only data as you can fit on an SDHC card. And cheap too: I picked up a grey market one the other day for A$130.

  http://en.wikipedia.org/wiki/Nintendo_DS_homebrew

As far as a user interface goes, I am thinking of making it work somewhat like this program called Pablo:

  http://ehaton.blogspot.com/2007/02/learning-chinese-pablo-my-personal.html
  http://haton.free.fr/chino/pabloscreenshot.jpg

There are also some free handwriting recognition engines which I plan to try, with a view to giving my dictionary handwriting recognition:

  http://www.tegaki.org/
  http://www.kiang.org/jordan/software/hanzilookup/
  http://hanzirecognizer.sourceforge.net/
  http://zinnia.sourceforge.net/

The DS doesn't come with a UI library, so I would have to either write one myself or use someone else's.  I intend to use the "Woopsi" library:

  http://ant.simianzombie.com/?page_id=128

I think my first task will be to port my Chutor program from J2ME to the DS.  As well as being useful in its own right, it will be a good test of the UI library, and of my ability to port Java to C++.

Once that's done, I will move on to the dictionary program.  I'm likely to leave the handwriting recognition until last.
