Converting zenkaku to hankaku

For historical reasons, Chinese, Japanese and Korean word processors allow certain characters (including the Roman alphabet and Arabic numerals) to be entered using wide variants called fullwidth (zenkaku; 全角) characters instead of, or rather in addition to, the ordinary halfwidth (hankaku; 半角) characters used by everyone else.

When preparing Japanese text for translation in CAT tools like OmegaT, it often helps to convert zenkaku characters to their hankaku equivalents. The Japanese version of Microsoft Word has a built-in feature that will do this, but it’s a little bit annoying because it also converts katakana characters. All I really want to do is convert the non-Japanese characters.

Here’s a Perl script I’ve been using to do this inside TextWrangler:

#!/usr/bin/perl -w

# File: ZtoH.pl
# Author: Phil Ronan, japanesetranslator.co.uk

# Convert zenkaku to hankaku

# Prepare Japanese UTF-8 plain-text files for translation by
# converting full-width (zenkaku) characters to their half-width
# (hankaku) counterparts. Katakana characters are not converted.

# This script was written for use as a TextWrangler plugin, but
# can also be used as a command line tool -- simply pipe in the
# text you want to convert, and the results will be delivered
# to stdout.

use utf8;
use Encode;
binmode STDOUT, ":utf8";

my $s;

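while (<>) {
  $s = decode_utf8($_);
  # Left-hand lists are the fullwidth forms: U+3000 and U+FF01..U+FF0F,
  # then U+FF10..U+FF3E, then U+FF3F..U+FF5D, U+301C and U+FFE0..U+FFE6.
  $s =~ tr/　！＂＃＄％＆＇（）＊＋，－．／/ !"#$%&'()*+,-.\//;
  $s =~ tr/０-９：；＜＝＞？＠Ａ-Ｚ［＼］＾/0-9:;<=>?@A-Z[\\]^/;
  $s =~ tr/＿｀ａ-ｚ｛｜｝〜￠￡￢￣￤￥￦/_`a-z{|}\~¢£¬¯¦¥₩/;
  print $s;
}
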
(You can download the script here, but you’ll need to rename it to ZtoH.pl before running it. Make sure you save the script using UTF-8 encoding.)
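
The script spells out every character pair by hand. For the ASCII range, the same mapping can also be expressed arithmetically, because the fullwidth forms U+FF01 to U+FF5E each sit exactly 0xFEE0 above their ASCII counterparts, and the ideographic space is U+3000. The following is only a rough sketch of that idea, not the script above: the helper name to_hankaku is made up for illustration, and unlike ZtoH.pl it leaves the wave dash (〜) and the fullwidth currency and symbol characters (U+FFE0 to U+FFE6) untouched.

```perl
use strict;
use warnings;
use utf8;

# Hypothetical helper (not part of ZtoH.pl): convert fullwidth ASCII
# variants to ordinary ASCII. Katakana and other text pass through as-is.
sub to_hankaku {
    my ($text) = @_;
    # U+FF01..U+FF5E are the fullwidth forms of U+0021..U+007E,
    # so subtracting 0xFEE0 yields the halfwidth character.
    $text =~ s/([\x{FF01}-\x{FF5E}])/chr(ord($1) - 0xFEE0)/ge;
    # The ideographic (fullwidth) space becomes an ordinary space.
    $text =~ tr/\x{3000}/ /;
    return $text;
}

binmode STDOUT, ':encoding(UTF-8)';
print to_hankaku("ＯｍｅｇａＴ　２．０　カタカナ"), "\n";   # prints: OmegaT 2.0 カタカナ
```

Note that this version expects already-decoded character strings, so input read from a file or stdin would still need decoding first, as the script does with decode_utf8.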

If you’re using TextWrangler, simply place this file inside your Unix Filters directory (~/Library/Application Support/TextWrangler/Unix Support/Unix Filters). You should then see this script listed under Unix Filters in the #! menu. Update: In more recent versions of TextWrangler, the text filters have been moved to Text » Apply Text Filter.

If you don’t have TextWrangler or you’re running some other system, you can still use this script as long as you have Perl installed. Just pipe your UTF-8 encoded text through it (for example, perl ZtoH.pl < input.txt > output.txt, where the file names are placeholders), and the results will appear on stdout.


2 Responses to Converting zenkaku to hankaku

  1. Bryan S. Carkin says:

    I recently saw a japanese alphabet. Where can I find the whole alphabet?

    B= Tu
    R= Shi
    Y= Fu
    A= Ka
    N= To
