Daybreakin Things

Filed under In English

Today is one of our national holidays, “Hangul Day”. It celebrates the creation of the Korean alphabet, Hangul. Hangul is a truly wonderful achievement of our nation, which has almost no illiterate people. We even have a saying, “He doesn’t know Giyeok(ㄱ) even with a sickle in front of him” (a sickle is shaped like ㄱ), which means that someone is a fool, because Hangul is so easy that everyone should know it.

However, we have many Hangul-related issues in the computer world, because typical software and operating systems were designed by people who speak only one language, such as Americans. Many foreign computer games, like Supreme Commander, do not support typing Hangul. (Blizzard’s games are notable exceptions.)

Recently, problems with displaying Hangul have become almost negligible, because much software now uses Unicode and almost all operating systems ship with Unicode fonts (although some of them do not look pretty to native Koreans).

Problem 1: Fonts

For Latin characters, major operating systems like Microsoft Windows and Apple Mac OS X share a set of common fonts: Arial, Times New Roman, Courier, Impact, Comic Sans MS, Georgia, Lucida Console, Lucida Sans, Palatino, Trebuchet MS/Helvetica and Tahoma/Geneva. This lets web designs be diverse and beautiful while looking consistent across many different OSs. I think this is why the W3C had not worked actively on a web fonts specification.

But for Korean fonts, the situation is very bad. With the monopoly of Microsoft Windows, the four default fonts of Korean Windows also became a monopoly. They are Gulim(굴림), Dotum(돋움), Batang(바탕) and Gungseo(궁서). The first two are sans-serif, and the latter two are serif. At small sizes like 10pt, they are displayed as bitmaps, because in the past anti-aliasing techniques were not good enough to keep complicated East Asian characters, including Hangul, readable.

The problem is that the most frequently used font, Gulim, is derived from a Japanese font, Naru, and has been criticized for destroying the native geometry and beauty of Hangul. Many web designers don’t like it either, but their only other option is Dotum, which is not pretty either. So it became very common to use images for titles and brochures in order to use commercial fonts like YoonGothic(윤고딕). Another problem is that making a new Korean font requires a huge amount of money and time. You need to design only about 100 characters for a basic Latin font, but for Unicode Korean we need 11,172 + alpha characters. If you want to make it more complete, you also have to include common Chinese characters, which may be thousands more. So many good fonts were developed as commercial, non-free products and never became popular, and consequently web designers couldn’t avoid using images instead of text.
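
To show where the figure 11,172 comes from, here is a small sketch. It assumes the standard Unicode layout of precomposed Hangul syllables (19 initial consonants, 21 vowels, and 27 final consonants plus "no final", starting at U+AC00). The function name `compose` and the index arguments are only illustrative.

```python
# Modern Hangul syllables combine 19 initial consonants, 21 medial vowels,
# and 27 optional final consonants (28 choices counting "no final").
LEADS, VOWELS, TAILS = 19, 21, 28
HANGUL_BASE = 0xAC00  # code point of the first syllable, "가"

def compose(lead: int, vowel: int, tail: int = 0) -> str:
    """Return the precomposed syllable for the given jamo indices."""
    if not (0 <= lead < LEADS and 0 <= vowel < VOWELS and 0 <= tail < TAILS):
        raise ValueError("jamo index out of range")
    return chr(HANGUL_BASE + (lead * VOWELS + vowel) * TAILS + tail)

syllable_count = LEADS * VOWELS * TAILS
print(syllable_count)       # 11172
print(compose(0, 0))        # 가 (first syllable)
print(compose(18, 0, 4))    # 한 (ㅎ + ㅏ + ㄴ)
print(compose(18, 20, 27))  # 힣 (last syllable, U+D7A3)
```

A font designer does not necessarily draw all 11,172 glyphs from scratch, but each combination still needs its own adjusted shape, which is where the cost comes from.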

Font Test

Some sample Hangul fonts

Windows Vista is a remarkable version for Koreans because it was the first to introduce a new default UI font, Malgun Gothic(맑은 고딕), with ClearType enabled. Yes, we finally “graduated” from the bitmap font era. Also, many local governments such as Seoul and major IT companies are developing new vector fonts and releasing them for free to improve their brand images. This will improve the situation, but it is still not as good as having the fonts bundled with operating systems.

For programmers, using separate fonts for English and Korean is very important for readability, because the Korean font that gets auto-selected to go with a monospace Latin font is usually very hard to read. Fortunately, my favorite text editor, gVim, supports this, but the SSH client PuTTY didn’t. So I had to make a patch, dPuTTY.
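
The idea of font separation can be sketched roughly as follows. This is not how gVim or dPuTTY actually implement it; it only illustrates splitting a line into runs by script so that each run can be drawn with its own font. The function names and the default font names are placeholders chosen for this example.

```python
from itertools import groupby

def is_hangul(ch: str) -> bool:
    """True for Hangul syllables, jamo, and compatibility jamo."""
    cp = ord(ch)
    return (0xAC00 <= cp <= 0xD7A3      # precomposed syllables
            or 0x1100 <= cp <= 0x11FF   # Hangul Jamo
            or 0x3130 <= cp <= 0x318F)  # Hangul Compatibility Jamo

def font_runs(text, latin_font="Lucida Console", hangul_font="Malgun Gothic"):
    """Split text into (font, substring) runs, one font per script."""
    runs = []
    for hangul, chars in groupby(text, key=is_hangul):
        runs.append((hangul_font if hangul else latin_font, "".join(chars)))
    return runs

for font, run in font_runs("print('한글') # ok"):
    print(f"{font}: {run!r}")
```

A real terminal also has to keep Hangul glyphs exactly two cells wide so that columns stay aligned, which this sketch ignores.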

Font Separation

Comparison of separated fonts and auto-selected fonts

Problem 2: IME

All major operating systems provide internationalized input through their own IME subsystems. CJK[1] characters in particular need complicated IMEs with automata and dictionaries.
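
To give a feeling for what such an automaton does, here is a deliberately simplified sketch of Hangul composition. It is an assumption-laden toy, not the algorithm of any real IME: it takes jamo one at a time, and it ignores compound vowels (such as ㅘ), compound finals (such as ㄳ), and backspace handling. The names `type_keys` and `compose` are made up for this example.

```python
LEADS = "ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ"
VOWELS = "ㅏㅐㅑㅒㅓㅔㅕㅖㅗㅘㅙㅚㅛㅜㅝㅞㅟㅠㅡㅢㅣ"
TAILS = " ㄱㄲㄳㄴㄵㄶㄷㄹㄺㄻㄼㄽㄾㄿㅀㅁㅂㅄㅅㅆㅇㅈㅊㅋㅌㅍㅎ"  # index 0 = no final

def compose(lead, vowel, tail=""):
    """Build one precomposed syllable from its jamo."""
    t = TAILS.index(tail) if tail else 0
    return chr(0xAC00 + (LEADS.index(lead) * 21 + VOWELS.index(vowel)) * 28 + t)

def type_keys(keys):
    """Feed jamo one at a time and return the resulting text."""
    out = []                  # committed text
    lead = vowel = tail = ""  # the syllable still under composition

    def flush():
        nonlocal lead, vowel, tail
        if lead and vowel:
            out.append(compose(lead, vowel, tail))
        elif lead or vowel:
            out.append(lead or vowel)  # a lone consonant or vowel
        lead = vowel = tail = ""

    for k in keys:
        if k in LEADS:  # a consonant key
            if lead and vowel and not tail and k in TAILS:
                tail = k             # becomes the final consonant for now
            else:
                flush()
                lead = k             # starts a new syllable
        elif k in VOWELS:
            if lead and not vowel:
                vowel = k
            elif tail:
                # The final consonant moves over to start the next syllable.
                moved, tail = tail, ""
                flush()
                lead, vowel = moved, k
            else:
                flush()
                vowel = k            # a vowel with no initial consonant
        else:
            flush()
            out.append(k)            # non-Hangul input passes through
    flush()
    return "".join(out)

print(type_keys("ㅎㅏㄴㄱㅡㄹ"))  # 한글
print(type_keys("ㅎㅏㄴㅏ"))      # 하나
```

The second example shows why the syllable under the cursor keeps changing while you type: ㄴ first attaches to 하 as a final consonant (한) and then jumps to the next syllable when a vowel follows.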

On Microsoft Windows, the operating system offers a set of Input Method APIs. They encapsulate the composition process, so applications only need to know whether composition has begun, is in progress, or has finished. Of course, they have to update their text views or edit controls on those events from the IME.
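
From the application's side, handling these three events might look like the sketch below. The class and method names are hypothetical and not a real API; on Windows the corresponding notifications are the WM_IME_STARTCOMPOSITION, WM_IME_COMPOSITION and WM_IME_ENDCOMPOSITION messages. The brackets in `display()` stand in for however the control highlights the syllable still being composed.

```python
class TextField:
    """A toy edit control that reacts to IME composition events."""

    def __init__(self):
        self.committed = ""  # text already accepted into the field
        self.preedit = ""    # syllable the IME is still composing

    def on_composition_start(self):
        self.preedit = ""

    def on_composition_update(self, text):
        # Redraw the composing syllable in place, inside the text itself.
        self.preedit = text

    def on_composition_end(self, text):
        self.committed += text
        self.preedit = ""

    def display(self):
        # Brackets mark the pre-edit text (e.g. an underline in a real UI).
        return self.committed + (f"[{self.preedit}]" if self.preedit else "")

field = TextField()
field.on_composition_start()
for step in ("ㅎ", "하", "한"):
    field.on_composition_update(step)
    print(field.display())   # [ㅎ], then [하], then [한]
field.on_composition_end("한")
print(field.display())       # 한
```

An application that ignores these events never shows the intermediate states inside its own text area, which leads to the fall-back behaviour.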

If an application does not support IME interaction, Windows falls back like this:

[Flash] /blog/attachment/9747313015.swf

Compare with this native behaviour:

[Flash] /blog/attachment/8816602530.swf

The latter feels much more comfortable for Korean people. Enabling the application to interact with the native IME is very important for internationalization.

There is another important IME problem. NO operating system lets users check the IME state conveniently. Users must look away to the IME toolbar to find out whether the current input mode is Hangul or English. Why haven’t Microsoft or Apple changed the color or shape of the input cursor according to the IME status? I think they simply weren’t aware of this problem, because they don’t use an IME and don’t switch between two languages frequently.

A few programs implemented this feature in the past, but none of them is part of our mainstream computing environment today. Almost all software uses only Windows’ native IME features as provided.

* * *

Truly internationalizing software involves headache-inducing problems in many cases. There are other problems with file encodings, MP3 tag encodings, ANSI applications with AppLocale, and many more. I hope developers who speak Latin-script languages will pay more attention to basic i18n habits. Sometimes I imagine what things would be like if modern computers or operating systems had been designed in Korea. :P

  1. Chinese, Japanese, Korean. It implies that processing these three languages properly is difficult for developers. 

A special Hangul Day post.... in English! -_-; I wonder how many foreign developers will actually read this;;

아침놀, "Make Hangul fonts pretty, come on!" That's exactly right.


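I'm not sure whether changing the cursor's color or shape would be the answer. It would be easy to recognize when the font is large enough, as in this comment box, but it could be hard at small sizes such as in a browser address bar. In the end, there is also the point that users would have to be trained that cursor A is for English input and cursor B is for Hangul input. I think people would get used to it quickly once they tried it, but wouldn't it be a bit scary if things like "web cursors" showed up along with it, just like web fonts...(...)
Rather than making users remember or learn the current IME state, I'm thinking about a way for the IME to understand intelligently and convert the input for them. I think it could judge whether the input is English or Hangul after receiving just the first three or four keystrokes. Of course, this approach would have various problems too.
(The post was in English, but the reply is in Korean...)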


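In my experience, changing the color was definitely effective. For the shape, it's a bit hard to decide how best to change it.
As for automatic IME conversion, if it is at the level of what existing word processors had, I think it's better not to do it. I remember that once, whenever I tried to type a certain word, it kept getting forcibly converted, so I had to type it with a space in the middle and later delete just the space with backspace, that kind of drudgery;; Well, the best thing would be to read the user's mind (...) and handle it accordingly =33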