
Tag Archives: unicode

Converting zenkaku to hankaku

For historical reasons, Chinese, Japanese and Korean word processors allow certain characters (including the Roman alphabet and Arabic numerals) to be entered using wide variants called fullwidth (zenkaku; 全角) characters instead of, or rather in addition to, the ordinary halfwidth (hankaku; 半角) characters used by everyone else.

When preparing Japanese text for translation in CAT tools like OmegaT, it often helps to convert zenkaku characters to their hankaku equivalents. The Japanese version of Microsoft Word has a built-in feature that will do this, but it is slightly annoying because it also converts katakana characters. All I really want to do is convert the non-Japanese characters. Here’s a Perl script I’ve been using to do this inside TextWrangler:

    #!/usr/bin/perl -w
    # File: ZtoH.pl
    # Author: Phil Ronan, japanesetranslator.co.uk
    # Convert zenkaku to hankaku
    # Prepare Japanese UTF-8 plain-text files for translation by
    # converting full-width (zenkaku) characters to their half-width
    # (hankaku) counterparts. Katakana characters are not converted.
    # This script was written for use as a TextWrangler plugin, but
    # can also be used as a command line tool — [More…]
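The excerpt cuts off before the body of the script, so the following is only a rough sketch of the general idea in Python, not the author's Perl code. It relies on the fact that the Unicode fullwidth forms U+FF01 to U+FF5E mirror the printable ASCII range U+0021 to U+007E at a fixed offset of 0xFEE0. The function name `zenkaku_to_hankaku` and the `convert_space` option (which maps the ideographic space U+3000 to an ordinary space) are assumptions made for illustration; whether the original script handles the fullwidth space is not stated. Katakana lie outside the mapped range, so they pass through unchanged.

```python
# Minimal sketch (not the original ZtoH.pl): map fullwidth ASCII variants
# to their halfwidth equivalents and leave everything else alone.

FULLWIDTH_START = 0xFF01  # fullwidth exclamation mark
FULLWIDTH_END = 0xFF5E    # fullwidth tilde
OFFSET = 0xFEE0           # distance between fullwidth forms and ASCII

# Translation table: code point -> code point, for use with str.translate.
_TABLE = {cp: cp - OFFSET for cp in range(FULLWIDTH_START, FULLWIDTH_END + 1)}


def zenkaku_to_hankaku(text: str, convert_space: bool = False) -> str:
    """Convert fullwidth letters, digits and ASCII punctuation to halfwidth.

    Katakana, kana, kanji and Japanese punctuation are not in the table,
    so they are returned unchanged. If convert_space is True (an assumed,
    optional behaviour), the ideographic space U+3000 becomes U+0020.
    """
    table = dict(_TABLE)
    if convert_space:
        table[0x3000] = 0x20
    return text.translate(table)


if __name__ == "__main__":
    # Fullwidth "ABC123" followed by Japanese text that should be untouched.
    print(zenkaku_to_hankaku("\uff21\uff22\uff23\uff11\uff12\uff13のテスト"))
```

A plain translation table is used here rather than `unicodedata.normalize("NFKC", ...)` because NFKC normalization also rewrites other compatibility characters (for example, it turns halfwidth katakana into fullwidth katakana), which goes beyond converting only the non-Japanese characters.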

Posted in Hacks, Translation | Tagged | 2 Comments

Header image: Busy night-time traffic appears as trails of light in this long exposure shot of Akasaka-Mitsuke (赤坂見附) in Tokyo by user DarkFritz at Wikimedia Commons.
