UTAU is musical voice synthesis software created by Ameya/Ayame. Though its purpose is similar to that of the Vocaloid software created by Yamaha, UTAU runs natively as a stand-alone program on both PC and Mac OS X (see: UTAU-Synth), without requiring a VSTi plugin for third-party software, and is licensed as freeware. The PC version also has a shareware option through donation.
UTAU was developed from the concept of Jinriki ("manual") Vocaloid, a practice in which existing voice samples are spliced, re-assembled and pitched to create a singing voice in software such as Melodyne. Ameya released the initial version of UTAU in March 2008. On May 27th, 2011, the first beta version of UTAU-Synth for Mac was released.
The software comes pre-loaded with a robotic-sounding young female voice, Utane Uta (known as Defoko), created from the AquesTalk TTS program. Unlike Vocaloid, however, the UTAU engine openly accepts any .wav files, allowing its users to create their own "voicebanks" and distribute them online.
UTAU is a Japanese program and does not use Unicode, so the PC version of UTAU requires Japanese locale settings to function properly. The latest release of UTAU (v0.4.18c) has an English interface, and many international users have also created language patches to translate the interface into their own languages.
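The locale requirement can be illustrated with a short Python sketch. It assumes that the Japanese text is stored in the Shift-JIS encoding (the usual legacy encoding on Japanese Windows) and that a Western system would read the same bytes as Windows-1252. Neither encoding is named in the description above, so both are assumptions made for illustration.

```python
# Assumption: Japanese text is stored as Shift-JIS, and a Western
# locale interprets the same bytes as Windows-1252.
lyric = "か"

raw = lyric.encode("shift_jis")        # bytes as they would sit on disk
as_japanese = raw.decode("shift_jis")  # read back under a Japanese locale
as_western = raw.decode("cp1252")      # read back under a Western locale

print(raw)          # the two bytes that represent the character
print(as_japanese)  # the original hiragana
print(as_western)   # unrelated punctuation instead of the hiragana
```

Under these assumptions the same two bytes yield the intended character only when they are read with the Japanese encoding, which is why filenames and aliases can appear garbled without the Japanese locale.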
To produce a singing vocal track, users place notes on a piano-roll interface, and insert phonetic "lyrics" for each note. The .wav samples in an UTAU voicebank can either be accessed directly by the filename (for example, "ka.wav") or by an "alias" set up in the oto.ini configuration file (for example, ka.wav in a Japanese voicebank might also be accessed as hiragana か in the lyric editor). Filenames and/or aliases are usually organized by individual phonemes, as UTAU has no built-in dictionary function like Vocaloid or TTS systems.
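The lookup described above can be sketched in Python. This is only an illustration, not UTAU's actual implementation: it assumes each oto.ini line has the form `filename=alias,...` with numeric timing fields after the alias, and the sample lines and the function name `build_lyric_table` are hypothetical.

```python
# Hypothetical oto.ini content. The "filename=alias,numbers..." layout
# is an assumption made for this sketch.
SAMPLE_OTO = """\
ka.wav=か,10,120,-300,80,30
sa.wav=,5,100,-250,60,20
"""

def build_lyric_table(oto_text):
    """Map each usable lyric string to the .wav file it would trigger."""
    table = {}
    for line in oto_text.splitlines():
        if "=" not in line:
            continue  # skip blank or malformed lines
        filename, _, rest = line.partition("=")
        alias = rest.split(",")[0].strip()
        stem = filename.rsplit(".", 1)[0]
        # Direct access by filename (without the extension).
        table.setdefault(stem, filename)
        # Access through the alias, when one is configured.
        if alias:
            table[alias] = filename
    return table

table = build_lyric_table(SAMPLE_OTO)
print(table["か"])  # alias lookup
print(table["sa"])  # filename lookup, no alias configured
```

In this sketch a lyric such as "ka" or "か" resolves to the same sample, while "sa" is reachable only by its filename because its alias field is empty.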
Users can then modify tempo, pitch, portamento, vibrato, envelope, consonant velocity and many other settings to change the quality and tone of the voice. UTAU saves the user-created files that contain all these settings under the extension .ust, or "UTAU Sequence Text." Users often distribute these .ust files for others to use. UTAU is also capable of importing MIDI files and the .VSQ (Vocaloid Sequence) files of Vocaloid 1 and 2. There is currently no native support for Vocaloid 3's .VSQX filetype, so such files must first be converted using the VSQX->VSQ Job Plugin in Vocaloid or a third-party program such as Cadencii.
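Because a .ust is a text file, its contents can be read with a few lines of Python. The sketch below assumes an INI-like layout with a `[#SETTING]` section, one numbered section per note, and keys such as `Tempo`, `Length`, `Lyric` and `NoteNum`. That layout, the sample text and the function name `parse_ust` are assumptions for illustration and are not taken from the description above.

```python
# Hypothetical .ust content. The section and key names are assumptions.
SAMPLE_UST = """\
[#VERSION]
UST Version1.2
[#SETTING]
Tempo=120.00
[#0000]
Length=480
Lyric=か
NoteNum=60
[#0001]
Length=240
Lyric=sa
NoteNum=62
[#TRACKEND]
"""

def parse_ust(text):
    """Return (settings, notes) as dictionaries of string values."""
    settings, notes, current = {}, [], None
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if line.startswith("[#") and line.endswith("]"):
            name = line[2:-1]
            if name.isdigit():
                current = {}          # a numbered section starts a new note
                notes.append(current)
            elif name == "SETTING":
                current = settings    # project-wide settings such as tempo
            else:
                current = None        # ignore other sections in this sketch
        elif "=" in line and current is not None:
            key, _, value = line.partition("=")
            current[key] = value
    return settings, notes

settings, notes = parse_ust(SAMPLE_UST)
print(settings["Tempo"], [n["Lyric"] for n in notes])
```

Values are kept as strings so that nothing about units or numeric ranges has to be assumed.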
To play a .ust, UTAU runs the .wav samples from a voicebank through a resampling engine. The software comes with a default resampler.exe and resampler.dll, though there are several third-party resamplers that each process the .wav samples differently for a unique sound.
UTAU voicebanks consist of three main filetypes: a set of .wav samples, frequency map files for each corresponding .wav generated by the software resampler, and an oto.ini (or in the case of UTAU-Synth, oto_ini.txt), which is a diagram of each .wav sample's consonant and vowel structure and the text aliases that users configure themselves. Additionally, UTAU voice folders may contain a readme.txt, a character.txt, and/or icons and image files which provide information and visuals for the bank and its corresponding character avatar.
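The folder layout described above can be checked with a small Python sketch. The function name `describe_voicebank` is hypothetical, and the `.frq` extension used to recognise frequency map files is an assumption, since other resamplers may write different files.

```python
from pathlib import Path

def describe_voicebank(folder):
    """Summarise which of the expected voicebank files a folder contains."""
    names = {p.name for p in Path(folder).iterdir() if p.is_file()}
    return {
        # Audio samples recorded for the bank.
        "samples": sorted(n for n in names if n.lower().endswith(".wav")),
        # Assumption: frequency maps carry a .frq extension.
        "frequency_maps": sorted(n for n in names if n.lower().endswith(".frq")),
        # oto.ini for UTAU, oto_ini.txt for UTAU-Synth.
        "config": next((n for n in ("oto.ini", "oto_ini.txt") if n in names), None),
        # Optional descriptive files.
        "extras": sorted(n for n in ("readme.txt", "character.txt") if n in names),
    }
```

Calling `describe_voicebank("path/to/bank")` on a folder returns a dictionary whose `config` entry is `None` when neither configuration file is present, which is a quick way to spot an unconfigured bank.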