The copyright situation for this article is unclear. It does not belong to the author of this site. Please see the copyright notice. If you have information about the copyright contact me!

Art of Language Independence

by Ben Greer

Goal:

Although I didn't know it when I started, the reason ScryMUD came into existence was Eric Raymond's concept of Egoboo: "the enhancement of one's reputation among other fans", as he enunciated so powerfully in his landmark paper, The Cathedral and the Bazaar. I wanted as many people as possible to enjoy my game, and their enjoyment and use of the game and its code was all the payment I could ask for. The largest group of people in the game spoke English, but after I released the code, I received a few questions about porting it to other languages. I did not want to fork the code, and because of the nature of text-based games, the vast majority of the code had as its sole purpose the generation of text. I needed a way to generalize this process so that as many languages as possible could be placed into the same code framework that had worked so well for English.

Dictionary Thumping is much more rewarding

To help fill in the blanks.

There are four distinct areas of translation that are needed: the player commands, code-generated messages, world database (DB) entries, and player communications. So far, I have attacked only the code-generated messages, and I see no way to easily deal with player communication, short of some kind of BabelFish utility embedded into the server. World DB language independence is coming soon, as are player commands.

Current Solution:

My solution to code-generated messages has been fairly straightforward. My main method of getting text to a player is through the critter::show(String& buf) method. The interesting part is getting the right string of characters into buf. To do that, I use my Sprintf(String& buf, char* format, ...) method. Its format is similar to sprintf, though less complete, and it is safe from buffer overruns.
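As an illustration of the "safe from buffer overruns" idea, here is a minimal sketch built on the standard library rather than on ScryMUD's own String class. The name safe_sprintf and the use of std::string are assumptions made for this example; the real Sprintf is not shown in this article and may work quite differently.

```cpp
#include <cstdarg>
#include <cstddef>
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical stand-in for a Sprintf-style helper (not ScryMUD's code).
// It formats into a std::string that is sized to fit the result, so the
// output can never run past the end of a fixed-size buffer.
// Returns the number of characters written, or -1 on a formatting error.
int safe_sprintf(std::string& buf, const char* format, ...) {
   va_list args;
   va_start(args, format);

   // First pass: ask vsnprintf how much room the result needs.
   va_list args_copy;
   va_copy(args_copy, args);
   int needed = std::vsnprintf(NULL, 0, format, args_copy);
   va_end(args_copy);

   if (needed < 0) {
      va_end(args);
      buf.clear();
      return -1;
   }

   // Second pass: format into storage of exactly the right size
   // (plus one byte for the terminating NUL).
   std::vector<char> tmp(static_cast<std::size_t>(needed) + 1);
   std::vsnprintf(tmp.data(), tmp.size(), format, args);
   va_end(args);

   buf.assign(tmp.data(), static_cast<std::size_t>(needed));
   return needed;
}
```

Note that this sketch accepts the standard printf conversions (for example %s with a C string), not the %S conversion used in the ScryMUD examples, which takes a pointer to a String.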

For example, you might see code like this:

int hit(int i_th, String& victim, critter& pc) {
   String buf(100);
   critter* foo = pc.getCurRoom()->haveCritNamed(i_th, victim, pc);
   if (foo == NULL) {
      Sprintf(buf, "You don't see %S here.\n", &victim);
      pc.show(buf);
      return -1;
   }
   ...
}

However, what if you are a speaker of a different language? The name of the victim is language independent already (depending on how the World was written at least), but the text surrounding it ("You don't...") is definitely very English! So, we just need a way to get the right format string to feed into Sprintf.

I chose to do this with a (very) large array of arrays of const char*. Here is a snippet of my code (from lang_strings.cc):

const char* language_strings[LS_ENTRIES][LS_PER_ENTRY] = {
 /* CS_NONE */
   {
      "Should Never See This",
      "Nunca Deberias Ver Esto",
      NULL,
      NULL
   },

   ....

 /* CS_YOU_NO_SEE_EM */
   {
      "You don't see %S here.",
      "No ves %S aqui.",
      NULL,
      NULL
   },

   ....
};//language_strings

Now, if I know which index into the language_strings array I need, and I know which language version of that entry, I can simply get it with something like this:

   const char* foo = language_strings[string_entry_idx][language_idx];

Since this is a very large array, I obviously didn't want to use hard-coded indexes. My solution was to create a large enum with an entry for each index into the array. The enum names are shown in comments in the code above. I also created an enum for the languages. Since each critter (PC) knows its preferred language, and the code knows which entry it wants to show the critter, the code becomes fairly simple. I use the cstr method to get the correct string:

// From misc.cc
const char* cstr(CSentryE e, critter& c) {
   return CSHandler::getString(e, c.getLanguageChoice());
}

which calls this method:

//from lang_strings.cc
const char* CSHandler::getString(CSentryE which_string, LanguageE language) {
   const char* retval = language_strings[(int)(which_string)][(int)(language)];
   if (retval) {
      return retval;
   }
   else {
      // Default to English...
      // TODO:  Be smarter here, ie Portuguese defaults to Spanish??
      return language_strings[(int)(which_string)][0];
   }//else
}//getString

This is how the previous example would look coded in a language independent manner:

int hit(int i_th, String& victim, critter& pc) {
   String buf(100);
   critter* foo = pc.getCurRoom()->haveCritNamed(i_th, victim, pc);
   if (foo == NULL) {
      Sprintf(buf, cstr(CS_YOU_NO_SEE_EM, pc), &victim);
      pc.show(buf);
      return -1;
   }
   ...
}

Now, the mapping of enums to the array of entries is extremely fragile! There is no way I could manually keep them straight. Also, I don't want to have to type in those entries and put NULLs in for the untranslated strings. So, I made a code generator. It uses .spec files to create the enumeration and the matching array. Here is the header of one of my translations.spec files. It has comments that explain the supported syntax:

# The syntax is as follows:
# ENUM_NAME LANGUAGE "TEXT" LANGUAGE "TEXT" ... ~
#
# The parser is free form in that white-space does not matter.
# All strings should be in double quotes, and you must terminate
# with a ~ (tilde) as the LANGUAGE token.
#
# NOTE:  Order is not important, and you may abbreviate the language,
#        so long as you have enough significant characters to
#        distinguish amongst them.
#
# Comments must not be in the middle of a definition.  In other words,
# only right before the ENUM name.  Comments must not be longer than
# 200 characters per line!!
#
# Languages currently supported by the code_generation process are:
# English, Spanish, Portugese, Italian

CS_NONE
        eng "Should Never See This"
        spa "Nunca Deberias Ver Esto"
        ~

CS_YOU_NO_SEE_EM
        english "You don't see %S here."
        spanish "No ves %S aqui."
        ~

...

I pass one or more of these files into my code generator and it spits out the lang_strings.* files. The explanation of the code_gen program is best gleaned from the code itself (code_gen.*).
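To give a feel for what such a generator has to do, here is a rough sketch of a parser for the .spec syntax described above. It is not the real code_gen program: the names SpecEntry, match_language, next_token, parse_spec, and emit_row are invented for this example, the language order is assumed from the article, and simplifications are noted in the comments (no escaped quotes inside strings, and comments are skipped wherever a token may start).

```cpp
#include <cctype>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Language names as spelled in the .spec header; the order is an assumption.
const char* const kLanguages[] = { "english", "spanish", "portugese", "italian" };
const int kLanguageCount = 4;

struct SpecEntry {
   std::string enum_name;
   std::vector<std::string> text;  // one slot per language; empty = untranslated
};

// Resolve a full or abbreviated language token to its index.
// Returns -1 if the token matches no language or more than one.
int match_language(const std::string& token) {
   int found = -1;
   for (int i = 0; i < kLanguageCount; ++i) {
      std::string name(kLanguages[i]);
      if (!token.empty() && name.compare(0, token.size(), token) == 0) {
         if (found != -1) return -1;  // ambiguous abbreviation
         found = i;
      }
   }
   return found;
}

// Read the next token, skipping white-space and # comments.
// Returns false at end of input; quoted is true for "double-quoted" text.
bool next_token(const std::string& src, std::size_t& pos,
                std::string& tok, bool& quoted) {
   for (;;) {
      while (pos < src.size() &&
             std::isspace(static_cast<unsigned char>(src[pos]))) ++pos;
      if (pos >= src.size() || src[pos] != '#') break;
      while (pos < src.size() && src[pos] != '\n') ++pos;  // skip comment line
   }
   if (pos >= src.size()) return false;
   tok.clear();
   quoted = (src[pos] == '"');
   if (quoted) {
      ++pos;
      while (pos < src.size() && src[pos] != '"') tok += src[pos++];
      if (pos >= src.size()) throw std::runtime_error("unterminated string");
      ++pos;  // consume the closing quote
   } else {
      while (pos < src.size() &&
             !std::isspace(static_cast<unsigned char>(src[pos]))) tok += src[pos++];
   }
   return true;
}

// Parse: ENUM_NAME LANGUAGE "TEXT" LANGUAGE "TEXT" ... ~
std::vector<SpecEntry> parse_spec(const std::string& src) {
   std::vector<SpecEntry> entries;
   std::size_t pos = 0;
   std::string tok;
   bool quoted = false;
   while (next_token(src, pos, tok, quoted)) {
      SpecEntry e;
      e.enum_name = tok;
      e.text.resize(kLanguageCount);
      for (;;) {
         if (!next_token(src, pos, tok, quoted))
            throw std::runtime_error("missing ~ after " + e.enum_name);
         if (!quoted && tok == "~") break;
         for (std::size_t i = 0; i < tok.size(); ++i)
            tok[i] = static_cast<char>(std::tolower(static_cast<unsigned char>(tok[i])));
         int lang = match_language(tok);
         if (lang < 0) throw std::runtime_error("unknown language: " + tok);
         if (!next_token(src, pos, tok, quoted) || !quoted)
            throw std::runtime_error("expected quoted text in " + e.enum_name);
         e.text[lang] = tok;
      }
      entries.push_back(e);
   }
   return entries;
}

// Emit one row of the string table; untranslated slots become NULL.
std::string emit_row(const SpecEntry& e) {
   std::string out = " /* " + e.enum_name + " */\n   {\n";
   for (std::size_t i = 0; i < e.text.size(); ++i) {
      out += "      ";
      out += e.text[i].empty() ? std::string("NULL") : "\"" + e.text[i] + "\"";
      out += (i + 1 < e.text.size()) ? ",\n" : "\n";
   }
   out += "   },\n";
   return out;
}
```

Calling parse_spec on the contents of a .spec file and then emit_row on each entry would produce rows in the same shape as the language_strings snippet shown earlier. The enum itself could be written out from the same list, in the same order, which is what keeps the two in step.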

This process has proven easy to extend: just add another entry to a .spec file and use the cstr() method in the code. There are some issues when you start dealing with more complex strings with many arguments, but so far I have been able to work around almost all of the problems. Another nice thing is that you can do this a little at a time: if you only know English, just put that in the .spec file. The code will still work for Spanish speakers; it will just show them English until someone gets the Spanish translation in there.

Future:

The other large area of text is the world DB itself. By this I mean the descriptions of the objects, rooms, critters and so on. My current stable branch of ScryMUD does not attempt to solve this problem, but the unstable branch (v3) does address it. ScryMUD v3 attempts to solve this with a method similar to the cstr() approach explained above. However, instead of putting the translations in .spec files, the translations will be saved right beside the other descriptions in the world DB.
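As a rough picture of what "saved right beside the other descriptions" could mean, the sketch below stores each description as a small per-language collection with the same English fallback used for code-generated messages. This is only an illustration: LocalizedText, Room, and the Lang enum are hypothetical names, and the actual v3 data structures are not described in this article.

```cpp
#include <map>
#include <string>

// Hypothetical language enum; ScryMUD's own enum is generated elsewhere.
enum Lang { LANG_ENGLISH = 0, LANG_SPANISH, LANG_PORTUGUESE, LANG_ITALIAN };

// One piece of world text (a name, a room description, ...) together with
// whatever translations exist for it so far.
class LocalizedText {
public:
   void set(Lang lang, const std::string& text) { texts_[lang] = text; }

   // Returns the text in the requested language if present, otherwise the
   // English text, otherwise an empty string.
   const std::string& get(Lang lang) const {
      static const std::string empty;
      std::map<Lang, std::string>::const_iterator it = texts_.find(lang);
      if (it != texts_.end()) return it->second;
      it = texts_.find(LANG_ENGLISH);
      if (it != texts_.end()) return it->second;
      return empty;
   }

private:
   std::map<Lang, std::string> texts_;
};

// A hypothetical room record: every description carries its own translations.
struct Room {
   int number;
   LocalizedText short_desc;
   LocalizedText long_desc;
};
```

A room whose short description has English and Spanish text would then show the Spanish text to a Spanish speaker and the English text to everyone else, so the world can be translated one description at a time.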

Because very few people know more than two languages, I foresee it taking a large number of volunteers to translate the world DB. To facilitate that, I plan on offering a web-based interface to a back-end database (MySQL) so that people can offer their translations from their browser. Then, through magic scripts yet unwritten, that information will be incorporated into the world DB (which, in v3, will finally be in a real database instead of my current flat-file solution). I don't know when I'll get v3 fully functional, but aside from the database code, the language independence code is implemented, if not tested. If you are brave, feel free to pull down a copy and take a peek!!

You can find much more information, along with the latest release or a CVS snapshot, at my website: scry.wanfear.com

Remember the Egoboo: Feedback is always welcome!!