If this sort of text looks familiar, congratulations! Welcome to the world of mojibake! In this tutorial, I will explain what mojibake is, what causes it, and how you can remedy it.
What is mojibake?
If you see gibberish instead of Japanese characters, be it on a webpage, inside a text file, or in file names, you are dealing with mojibake. It's an issue involving how your computer displays text. Images and Flash objects, such as the AAA advertisement in the picture above, are not affected by mojibake. Mojibake is a big problem for UTAU users, but it can affect any program that relies on Japanese file names, such as MikuMikuDance.
Mojibake is the Japanese term for the gibberish you get when your computer can't handle certain special characters. As a UTAU user, you'll most likely only be dealing with mojibake related to Japanese, but it can happen with other languages too.
Important note: Mojibake is an entirely different problem than the issue of missing fonts. If you see �, hexadecimals, boxes, etc., your issue is missing fonts. American versions of Windows XP and Vista do not come with any fonts capable of showing Japanese characters. You can install these fonts simply: Start > Control Panel > Regional and Language Options > Languages. Check the "Install files for East Asian languages" box and press Apply. You may need to restart your computer for the changes to apply. Keep in mind this will not solve mojibake issues; it will only make your computer capable of displaying Japanese text at all.
Cause of mojibake
This section is optional, but it’s good to know. If you want a more technically accurate explanation, see the Wikipedia page on mojibake; this is designed to give a short, easy-to-understand introduction.
Computers were originally designed by speakers of European languages, such as English. Engineers used a certain method, called ASCII, to encode Roman characters into something a computer could understand (binary). However, there's a limit to how many characters can be encoded with ASCII, so as people started using computers in different countries, it became necessary to invent other encoding systems to handle languages that don't use the Roman alphabet, such as Arabic, Japanese, and Russian.
There are many different encoding standards nowadays, such as Unicode UTF-8 and Shift-JIS. A certain string of zeros and ones might represent "あ" when read using Shift-JIS, but when read using Unicode UTF-8, it might be interpreted as "Á&ç".
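If you'd like to see this for yourself, here is a rough illustration in Python. It is only a sketch: the codec names are Python's standard ones, and Windows-1252 is my own pick as a stand-in for a typical Western encoding.

```python
# The two bytes that Shift-JIS uses for the hiragana "あ".
raw = "あ".encode("shift_jis")
print(raw)  # b'\x82\xa0'

# Read back with the encoding it was written in: correct.
print(raw.decode("shift_jis"))  # あ

# Read as Windows-1252 (a common Western encoding): gibberish.
as_western = raw.decode("cp1252")
print(repr(as_western))  # '‚\xa0'

# Read as UTF-8: these two bytes are not even valid UTF-8.
try:
    raw.decode("utf-8")
    valid_utf8 = True
except UnicodeDecodeError:
    valid_utf8 = False
print(valid_utf8)  # False
```

The bytes never change; only the rulebook used to read them does.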
Ideally, a webpage or a text file will contain a string of code that basically says "this is encoded in ASCII/Unicode/Shift-JIS/etc." But sometimes this is missing or not read properly. As a result, your computer might try to read something with the wrong encoding method.
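On a webpage, that declaration is often a meta tag near the top of the HTML. The Python sketch below pulls the declared name out of the raw bytes. The helper name declared_charset is made up for this tutorial, and it only looks at meta tags; real browsers also check HTTP headers and byte-order marks, which this ignores.

```python
import re

# Matches both <meta charset="..."> and the older
# <meta http-equiv="Content-Type" content="text/html; charset=..."> form.
META_CHARSET = re.compile(rb"<meta[^>]+charset=[\"']?([A-Za-z0-9_-]+)", re.IGNORECASE)

def declared_charset(html_bytes):
    """Return the charset named in a <meta> tag, or None if there is none."""
    # Declarations normally sit near the top, so only scan the first 1024 bytes.
    match = META_CHARSET.search(html_bytes[:1024])
    if match is None:
        return None
    return match.group(1).decode("ascii").lower()

page = b'<html><head><meta charset="Shift_JIS"><title>UTAU</title></head></html>'
print(declared_charset(page))  # shift_jis
print(declared_charset(b"<html><head></head></html>"))  # None
```

When the result is None, whatever reads the page has to guess, and that guess is where mojibake tends to creep in.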
How to fix a webpage in mojibake (Windows, Mac, Linux)
Usually, the default setting will work just fine. Mojibake in websites is not a common problem in modern browsers, but as at least one major MMD website I have come across causes American computers to display mojibake, I figured this was necessary to include.
Japanese webpages are usually encoded in either Unicode UTF-8 or Shift-JIS. Sometimes, based on your computer's location, the website will assume you want to read in Unicode when Shift-JIS is being used, or it's an older website and it doesn't declare its encoding properly to your browser. This can be fixed by telling your browser which character encoding method is being used:
Firefox: View>Character Encoding
Safari: View>Text Encoding
Internet Explorer: Right click on the webpage>Encoding>More
Chrome: Chrome menu on the toolbar>Tools>Encoding
You might have to do a bit of guessing to find out how a website is encoded. For Japanese websites, try the following in this order:
3. Unicode UTF-8
4. Autodetect for Japanese (not all browsers have this option)
5. ISO 2022-JP
6. Unicode UTF-16
7. Anything that says Japanese in front of it
Make sure to refresh the page every time you change the encoding method.
Keep in mind you might have to switch the character encoding method back to its default in order for other webpages to be viewed correctly. “文字化け” in Unicode UTF-8 is “譁・ｭ怜喧縺�” in Shift-JIS, so what works on one Japanese webpage might break another Japanese webpage. In fact, the screenshot of VPVP Wiki at the top of this tutorial was created by me purposely choosing the wrong encoding system so I'd have a good screenshot of mojibake on a webpage.
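The same trial-and-error approach can be sketched in Python for text you have saved as raw bytes. The function name guess_decoding and the candidate list are my own assumptions for illustration; reorder the list to match whatever order you prefer to try.

```python
# Candidate encodings, tried in order. This list is an assumption for
# illustration, not a definitive ranking.
CANDIDATES = ["utf-8", "shift_jis", "euc_jp", "iso2022_jp", "utf-16"]

def guess_decoding(raw, candidates=CANDIDATES):
    """Return (encoding, text) for the first candidate that decodes cleanly."""
    for name in candidates:
        try:
            return name, raw.decode(name)
        except UnicodeDecodeError:
            continue  # wrong guess, try the next one
    return None, None

sample = "もじばけ".encode("shift_jis")
print(guess_decoding(sample))  # ('shift_jis', 'もじばけ')
```

A clean decode is no guarantee that the guess was right. Some wrong encodings decode without errors and still produce nonsense, so you still have to look at the result, just like in the browser.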
How to fix mojibake in stuff you downloaded (Mac OSX)
Macs are pretty clever when it comes to Japanese. They do not have a system locale (kind of) and come pre-installed with Japanese fonts, so they are generally capable of handling Japanese quite well. Unfortunately, when things do mess up, it can be a little ornery to fix them.
Usually, the files Macs have trouble with regarding Japanese are files inside of ZIP files and plaintext files. This can cause headaches for a Mac-based UTAU or UTAU-synth user, because readme files are plaintext files and UTAU voicebanks are generally distributed as ZIP files.
.zip (ie Voicebanks)
The way that a Mac unzips ZIP files sometimes incorrectly guesses the method of character encoding. The version of OSX does not seem to matter; I’ve had this issue running 10.4 (Tiger) all the way to 10.9 (Mavericks).
For ZIP files, the simplest way to remedy this is to use a program known as The Unarchiver. This program, which can also open more unusual file compression types (7zip, RAR, etc.), correctly determines the method of character encoding automatically. Once you have installed The Unarchiver, simply control-click on your ZIP file, choose “Open with”, and select The Unarchiver.
You can download it here. And yes, it’s freeware and does not contain viruses: https://itunes.apple.com/us/app/the-unarchiver/id425424353?mt=12
There are other ways of opening ZIP files, but I have had the most success with The Unarchiver regarding Japanese files.
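For the curious, here is a rough Python sketch of the kind of repair involved. It describes Python's own zipfile module, not the Mac's built-in unzipper: zipfile reads a file name as CP437 unless the archive flags it as UTF-8, so a Shift-JIS name comes out garbled. The helper names repair_name and list_repaired_names are made up for this example, and Shift-JIS is assumed as the real encoding.

```python
import zipfile

def repair_name(name, real_encoding="shift_jis"):
    """Undo a CP437 misreading of a file name that was really Shift-JIS."""
    try:
        return name.encode("cp437").decode(real_encoding)
    except (UnicodeEncodeError, UnicodeDecodeError):
        return name  # not a CP437 misreading; leave it alone

def list_repaired_names(zip_path):
    """List entry names in a ZIP, repairing those not flagged as UTF-8."""
    names = []
    with zipfile.ZipFile(zip_path) as archive:
        for info in archive.infolist():
            utf8_flag = info.flag_bits & 0x800  # set when the name is stored as UTF-8
            names.append(info.filename if utf8_flag else repair_name(info.filename))
    return names

# Simulate what a Shift-JIS file name looks like after being read as CP437.
garbled = "あ.wav".encode("shift_jis").decode("cp437")
print(repair_name(garbled))  # あ.wav
```

This only lists the corrected names. Actually extracting the files under those names takes more work, which is why a ready-made tool is the simpler route.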
Protip: You can prevent your own .zip files from becoming mojibake on other people's computers by not using .zip at all. LZH files are specifically designed to handle Japanese text without turning into mojibake, which is why a lot of MMD files from Japanese sites are compressed as LZH files. (They might look like IZH depending on your computer's font, but it's LZH, not IZH.) LZH files require a special program such as 7zip (Windows) or The Unarchiver (Mac OSX) to be opened, so if you do distribute files to a western audience as LZH, expect a barrage of comments asking how to open the file.
.txt (ie readmes)
TextEdit > Preferences > Opening files > Shift JIS
Close the file and re-open it. It should now be in legible Japanese. If not, try other options, such as Unicode.
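If you would rather convert a readme once and be done with it, a few lines of Python can save a UTF-8 copy. This is a minimal sketch: convert_to_utf8 is a made-up helper name, and it assumes the original file really is Shift-JIS. If it is not, Python may raise an error or hand back different gibberish.

```python
from pathlib import Path

def convert_to_utf8(source, target, source_encoding="shift_jis"):
    """Read a text file in source_encoding and write a UTF-8 copy."""
    text = Path(source).read_text(encoding=source_encoding)
    Path(target).write_text(text, encoding="utf-8")
    return text
```

For example, convert_to_utf8("readme.txt", "readme_utf8.txt") leaves the original file untouched and writes the converted copy next to it. Both file names here are placeholders.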
How to fix mojibake in stuff you downloaded (Windows XP, Windows Vista, Windows 7)
Windows computers have a setting known as “locale” that, in part, affects the way Windows handles character encoding. If you bought your computer in the United States, it will probably be set to “EN-US” (English-United States), “United States,” or something similar. Thankfully, this setting can be changed.
Using UTAU on a non-Japanese locale (not recommended)
You might have heard of a program called AppLocale that some people use to get UTAU to run on an English locale. I do not recommend using it, as it takes a while to set up and does not work very well for UTAU. Additionally, it was only designed to be used for Windows Server 2003 and XP. And, to top it off, it can only handle romaji voicebanks. As it works poorly and is essentially a complicated solution to a simple problem, I will not be going over how to use UTAU with AppLocale here.
AppLocale works poorly for UTAU because the main problem with running UTAU on an English locale is a system issue: UTAU looks for Japanese file names, but the file names in the folder you unzipped have turned into mojibake nonsense, so it cannot find the right files. AppLocale affects the program you're running (UTAU), not the files the program requires to run (UTAU voicebanks).
Attempting to use UTAU without AppLocale or changing your system locale (see below) will result in strange errors, usually taking the form of strings of question marks, regardless of the type of voicebank you are using.
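Those question marks can be imitated in Python. This is only an analogy for what Windows does internally, not a reproduction of it: cp1252 is the code page an English locale normally uses, cp932 is the Japanese one, and to_ansi_bytes and the file name are hypothetical.

```python
def to_ansi_bytes(name, code_page):
    """Convert a name to a legacy code page, using '?' for anything that will not fit."""
    return name.encode(code_page, errors="replace")

name = "あ.wav"  # hypothetical voicebank sample name
print(to_ansi_bytes(name, "cp1252"))  # b'?.wav'
print(to_ansi_bytes(name, "cp932"))   # b'\x82\xa0.wav'
```

Under the English code page the Japanese character has nowhere to go, so it collapses into a question mark and the original name is lost. Under the Japanese code page it survives intact.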
Changing the locale to Japanese (recommended)
Changing your system locale to Japanese will not put your entire computer in Japanese. Your computer’s menus and the vast majority of programs will still be English. Please read the notes section below before changing locale.
In order to install UTAU properly on Windows, be it Windows XP, Vista, or 7, your system locale must be set to Japanese before you download the ZIP file containing the UTAU program. If you changed the locale after downloading UTAU, you will still run into issues, such as Defoko's voicebank being in mojibake. If you already downloaded UTAU on a non-Japanese locale, delete it and re-download it after changing to Japanese.
To change system locale, you must be logged in with administrator privileges. In some operating systems, the option to change locale will not even be visible if you are not logged in as an administrator.
In Windows XP:
Start>Control Panel>Regional and Language Options>Advanced
Under the “Language for non-Unicode programs” part, select Japanese. Apply the changes. Allow your computer to restart.
NOTE: Windows XP does not come with Japanese fonts. I did not need the original install disks to install said fonts, but I have been told this is the exception, not the norm. When you change the locale to Japanese, your computer will give you instructions on how to install the fonts, usually referred to as "East Asian Language Packs," if they have not been installed already. Follow those instructions; they'll tell you if you need the disks or not.
In Windows Vista or 7:
Start>Control Panel>Clock, Language, and Region>Administrative>Language for non-Unicode programs>Change system locale
Change it to Japanese, then select Apply. Allow your computer to restart.
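If you happen to have Python installed, here is one rough way to check that the change took effect after the restart. Treat the expected values as assumptions: on Windows, a Japanese system locale usually reports cp932 and an English one cp1252, but Python's UTF-8 mode and other settings can change the answer.

```python
import locale

def preferred_encoding():
    """Return the text encoding Python picks up from the operating system."""
    return locale.getpreferredencoding(False)

print(preferred_encoding())
```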
Notes about changing system locale on any Windows computer
A few programs, such as Sims 2, Sims 3, and Spore, are known to look at system locale and, based on that, assume you speak the same language as the language listed in the locale. This can lead to a few interesting effects; under a Japanese locale, the installer for Sims 3 might run in Japanese, but the actual program will be English. If you are worried about the potential effects, you can set your locale back to its original setting when installing a new program, but as an owner of Spore and Sims 3, I have had no issues.
Under a Japanese locale, \ on your keyboard might type ¥ instead of \. As \ is only ever used in file locations, this is not a very big deal.
Locale does not automatically become changed back if you turn off the computer or if there’s a power outage. If it was set to Japanese last week, unless you changed it, it will still be set to Japanese this week.
If you save a file with a Japanese file name while under Japanese locale, link to it from something, then change to English locale, your link might break. MMD models turn white due to inability to link to the texture files, UTAU voicebanks might not work, etc. It should be fixed automatically once you change back to Japanese locale.
Make sure your system locale is set to Japanese before you download:
1. The UTAU program
2. Any Japanese voicebank
3. Any voicebank that has Japanese characters somewhere in the file name (such as an English voicebank with a Japanese name)
4. Any files relating to MMD (mojibake can cause models to be missing textures, eyes, etc)
5. On second thought, you might as well leave your system locale set to Japanese all the time, unless it causes issues with installing programs (see above).