01 Feb

Extracting Localization Phrases from our Unity projects

In our Unity projects, we handle localization using the No Such Localization component, as detailed in a previous post. Specifically, we use the phrases version, where phrases act as keys to retrieve strings from a comprehensive table. Switching languages is as simple as changing the table for text (for images, we use a different method).

We maintain a large database of strings and extract a subset tailored to each game’s specific requirements. To do this, we compile a list of all the phrases used by the localization components within a given game. Our process is straightforward: we open each scene in the Unity project, retrieve all localization components, and store their phrases.

The following code demonstrates our approach. We’ve integrated this functionality into a Unity editor menu option, automating the entire process of retrieving phrases and writing them into a text file.

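using System.Collections.Generic;
using System.IO;
using UnityEditor;
using UnityEngine;

// Editor-only utility: keep this file inside an Editor folder so it is excluded from player builds.
public static class IKIGamesEditorTools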
{
    private const string LocalizationMenuPath = "IKIGames/Localization/Extract All Phrases";
    private const string LocalizationFileName = "localization_keys.txt";

    [MenuItem(LocalizationMenuPath)]
    private static void ExtractAllLocalizationPhrasesInAllScenes()
    {
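        // OpenSceneMode.Single replaces whatever is open, so let the user save pending changes first.
        if (!UnityEditor.SceneManagement.EditorSceneManager.SaveCurrentModifiedScenesIfUserWantsTo())
        {
            return;
        }

        HashSet<string> phrases = new HashSet<string>();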
        List<string> scenePaths = GetScenePaths();

        foreach (var scenePath in scenePaths)
        {
            ExtractPhrasesFromScene(scenePath, phrases);
        }

        WritePhrases(phrases, LocalizationFileName);
    }

    private static List<string> GetScenePaths()
    {
        List<string> scenePaths = new List<string>();
        foreach (EditorBuildSettingsScene scene in EditorBuildSettings.scenes)
        {
            scenePaths.Add(scene.path);
        }
        return scenePaths;
    }

    private static void ExtractPhrasesFromScene(string scenePath, HashSet<string> phrases)
    {
        UnityEditor.SceneManagement.EditorSceneManager.OpenScene(scenePath, UnityEditor.SceneManagement.OpenSceneMode.Single);
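        ExtractPhrasesFromLocalizationComponents(phrases);
        // No need to close the new scene since OpenSceneMode.Single opens it as the only active scene
        Debug.Log($"Processed: {scenePath}");
    }

    // Resources.FindObjectsOfTypeAll also returns inactive objects (and loaded assets such as prefabs),
    // so localizers on disabled UI are collected too.
    private static void ExtractPhrasesFromLocalizationComponents(HashSet<string> phrases)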
    {
        foreach (NoSuchStudio.Localization.Localizers.TMProTextLocalizer localizer in Resources.FindObjectsOfTypeAll<NoSuchStudio.Localization.Localizers.TMProTextLocalizer>())
        {
            if (!string.IsNullOrEmpty(localizer.phrase))
            {
                phrases.Add(localizer.phrase);
            }
        }
        foreach (NoSuchStudio.Localization.Localizers.TextLocalizer localizer in Resources.FindObjectsOfTypeAll<NoSuchStudio.Localization.Localizers.TextLocalizer>())
        {
            if (!string.IsNullOrEmpty(localizer.phrase))
            {
                phrases.Add(localizer.phrase);
            }
        }
    }

    private static void WritePhrases(HashSet<string> hashSet, string filename)
    {
        string filePath = Path.Combine(Application.dataPath, filename);
        try
        {
            using (StreamWriter writer = new StreamWriter(filePath))
            {
                foreach (string item in hashSet)
                {
                    writer.WriteLine(item);
                }
            }
            Debug.Log($"Phrases written successfully to: {filePath}");
        }
        catch (System.Exception ex)
        {
            Debug.LogError($"Failed to write phrases to file: {ex.Message}");
        }
    }
}
16 Jan

16 bits COM Oddity

I can’t even pinpoint what a “16 bits COM Oddity” really means, but I think the idea is in there, somehow. Previously, I explained how to code a simple “hello, world” program using the DEBUG tool that shipped with DOS. Revisiting this obsolete knowledge was unexpectedly fun. We’ll retrieve the hexadecimal version of “hello, world” (well, “hello, world!!”) from that post:

EB 13 0D 0A 68 65 6C 6C 6F 2C 20 77 6F 72 6C 64
21 21 0D 0A 24 B4 09 BA 02 01 CD 21 B4 00 CD 21

That’s all we need for our “hello, world!!” binary: exactly 32 bytes. We could create that file byte by byte, but that would be excessive, I think. Let’s use the echo command instead. This is the full command I entered in my Windows 10 cmd.exe prompt:

echo|set /p="Ù‼♪◙hello, world!!♪◙$┤○║☻☺═!1└═!">hello.com

After that you’ll get a 16-bit COM file, hello.com, that displays the “hello, world!!” message. Funny 🙂
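If you’d rather not fight with the shell, the same file can be produced straight from the hex dump. Here is a minimal Python sketch of that alternative. Python is not part of the original trick, and the names build_image and write_com are ones I made up for illustration:

```python
from pathlib import Path

# The hex dump quoted above: two rows of 16 bytes each.
HELLO_HEX = """
EB 13 0D 0A 68 65 6C 6C 6F 2C 20 77 6F 72 6C 64
21 21 0D 0A 24 B4 09 BA 02 01 CD 21 B4 00 CD 21
"""


def build_image() -> bytes:
    """Turn the hex dump into the raw bytes of the COM file."""
    # Strip all whitespace (spaces and newlines) before parsing the hex digits.
    return bytes.fromhex("".join(HELLO_HEX.split()))


def write_com(path: str) -> int:
    """Write the image to `path` and return the number of bytes written."""
    return Path(path).write_bytes(build_image())


if __name__ == "__main__":
    image = build_image()
    print(len(image), "bytes")
    print(image.hex(" "))
```

Calling write_com("hello.com") would write the 32 bytes to disk. The echo approach is more fun, but this is a handy reference to compare its result against.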

What are those weird characters?

First, a little explanation. We want our hello.com file to be, byte after byte, an exact representation of the hexadecimal sequence presented above. We’ll use cmd.exe commands to dump characters into the file and, if we choose our characters carefully so that they match the target hexadecimal values, we’ll end up with the exact representation we’re looking for. For instance, the first two-byte block, EB 13, is the “jmp 115” instruction. Then comes the newline (0D 0A), and so on. If we convert our hexadecimal to decimal, we get:

235  19  13  10 104 101 108 108 111  44  32 119 111 114 108 100
 33  33  13  10  36 180   9 186   2   1 205  33 180   0 205  33
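The conversion is easy to double-check. Here is a small Python sketch (the variable names are hypothetical) that prints the decimal values and works out where the EB 13 jump lands. It assumes the usual COM convention that the file is loaded at offset 100 hex, and that the jump displacement is counted from the byte after the two-byte instruction:

```python
# The hex dump from the post, as one string.
HELLO_HEX = (
    "EB 13 0D 0A 68 65 6C 6C 6F 2C 20 77 6F 72 6C 64 "
    "21 21 0D 0A 24 B4 09 BA 02 01 CD 21 B4 00 CD 21"
)

image = bytes.fromhex(HELLO_HEX.replace(" ", ""))

# Iterating over a bytes object yields each byte as a decimal integer.
decimals = list(image)

# Assumption: COM programs are loaded at offset 0x100.
LOAD_OFFSET = 0x100

# EB is a short jump; the next byte is its displacement,
# measured from the end of the 2-byte instruction.
displacement = image[1]
jump_target = LOAD_OFFSET + 2 + displacement

# Print the decimal table, 16 values per row.
for row in range(0, len(decimals), 16):
    print(" ".join(f"{value:3d}" for value in decimals[row:row + 16]))

print(f"jump target: {jump_target:X}")
```

The target comes out as 115 hex, which is the position of the first B4 byte, right after the “$” that ends the message.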

The first byte in hello.com must be EB, or 235 in decimal. In order to dump our characters from the command line, we’ll convert that decimal value to a character. I’m trying this on a Windows 10 (64-bit) machine, with cmd.exe using code page 850 (Multilingual Latin 1). In that code page, character 235 is Ù. And 19 is ‼. And, luckily, 13 is ♪ and 10 is ◙. Those two characters are especially important because they represent the carriage return and the line feed, respectively, and some shells won’t convert them to characters. Happily, though, cmd.exe with my default code page handles them as we need. To input those characters you can type the usual ALT + decimal value.
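To see the mapping at work, here is a Python sketch that turns the typed characters back into byte values. Two assumptions to flag: Python’s cp850 codec treats bytes 1 to 31 as plain control codes, so the console glyphs for that range are listed by hand (LOW_GLYPHS), and the function names are my own:

```python
# Glyphs the console shows for bytes 0x01..0x1F (classic IBM PC character set).
# Hand-written table: Python's cp850 codec maps these bytes to control codes instead.
LOW_GLYPHS = "☺☻♥♦♣♠•◘○◙♂♀♪♫☼►◄↕‼¶§▬↨↑↓→←∟↔▲▼"

# The string typed between the quotes of the echo command.
TYPED = "Ù‼♪◙hello, world!!♪◙$┤○║☻☺═!1└═!"


def glyph_to_byte(ch: str) -> int:
    """Return the byte value that produces `ch` on a code page 850 console."""
    index = LOW_GLYPHS.find(ch)
    if index != -1:
        # The table starts at byte 1, so shift the index by one.
        # (¶ and § also exist at 0xF4/0xF5 in cp850; this sketch prefers the low byte.)
        return index + 1
    return ch.encode("cp850")[0]


def typed_to_bytes(text: str) -> bytes:
    """Convert a whole typed string into the bytes that end up in the file."""
    return bytes(glyph_to_byte(ch) for ch in text)


if __name__ == "__main__":
    print(typed_to_bytes(TYPED).hex(" "))
```

Run over the command’s string, this gives 32 bytes that match the dump except near the end: the characters 1└ map to 31 C0 rather than B4 00. Presumably that is because a zero byte can’t be typed, and 31 C0 decodes as xor ax, ax, which also leaves AH at zero.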

There are a few important things to notice:
