Google Summer of Code 2012 Mozilla

l10n Tool for Standardization of Localization

by Gautam Akiwate for Mozilla

Localization currently lacks standardized translations for commonly used words. Without an exhaustive list of such standardized usage, l10n contributors run into various problems, and new contributors wanting to join the community are affected even more. In a nutshell, the goal of the project is to create an exhaustive database of entries, terms, and words, each with a suggested translation. It should also extend to short phrases and sentences.

The idea is to apply a machine translation (MT) system to the existing localization work. Essentially, scripts will convert the existing localization work into a format suitable for training the MT system. These scripts will extend those of "Transvision", which produces TMX files nightly. From the output of this step, a list of entries, terms, and words with their suggested translations will be created, and corrected manually where needed. The list will then be organized into a database, along with a small web portal that helps l10n contributors find words and their preferred translations easily. Again, this work will use the Transvision portal as a base.

At the outset, the aim is to do this for 4 languages (owing to language restrictions in the initial verification), to be extended later to all languages supported by Mozilla. Finally, the tool's suggestions will be compared for quality against the already localized strings, and the quality score from the MT system will be used to find any inconsistencies in the localized strings. Transvision has already partly achieved a few of the goals listed above; hence, the plan is to leverage Transvision and extend it.
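To make the glossary idea concrete, here is a minimal sketch in Python of how source/target pairs could be read from a TMX file and turned into a table of suggested translations. It is an illustration under stated assumptions, not the project's implementation: plain frequency counting stands in for the MT system, the sample TMX content (including the French strings and `tuid` values) is invented for the example, and the function names `read_pairs`, `build_glossary`, `suggest`, and `inconsistent_terms` are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter, defaultdict

# ElementTree exposes the xml:lang attribute under the XML namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Invented sample data in TMX layout (tmx > body > tu > tuv > seg).
# Real Transvision output may differ in detail; this is an assumption.
SAMPLE_TMX = """<tmx version="1.4">
  <header srclang="en-US" datatype="plaintext" segtype="sentence"
          adminlang="en-US" creationtool="sample"
          creationtoolversion="0" o-tmf="none"/>
  <body>
    <tu tuid="example/a:bookmarks">
      <tuv xml:lang="en-US"><seg>Bookmarks</seg></tuv>
      <tuv xml:lang="fr"><seg>Marque-pages</seg></tuv>
    </tu>
    <tu tuid="example/b:bookmarks">
      <tuv xml:lang="en-US"><seg>Bookmarks</seg></tuv>
      <tuv xml:lang="fr"><seg>Marque-pages</seg></tuv>
    </tu>
    <tu tuid="example/c:bookmarks">
      <tuv xml:lang="en-US"><seg>Bookmarks</seg></tuv>
      <tuv xml:lang="fr"><seg>Signets</seg></tuv>
    </tu>
    <tu tuid="example/d:cancel">
      <tuv xml:lang="en-US"><seg>Cancel</seg></tuv>
      <tuv xml:lang="fr"><seg>Annuler</seg></tuv>
    </tu>
  </body>
</tmx>"""


def read_pairs(tmx_text, source_lang, target_lang):
    """Return (source, target) string pairs for every translation unit
    that has a segment in both requested languages."""
    root = ET.fromstring(tmx_text)
    pairs = []
    for tu in root.iter("tu"):
        segs = {}
        for tuv in tu.findall("tuv"):
            seg = tuv.find("seg")
            if seg is not None:
                segs[tuv.get(XML_LANG)] = "".join(seg.itertext()).strip()
        if source_lang in segs and target_lang in segs:
            pairs.append((segs[source_lang], segs[target_lang]))
    return pairs


def build_glossary(pairs):
    """Map each lower-cased source string to a Counter of its translations."""
    glossary = defaultdict(Counter)
    for source, target in pairs:
        glossary[source.lower()][target] += 1
    return glossary


def suggest(glossary, term):
    """Return the most frequent translation of term, or None if unknown."""
    counts = glossary.get(term.lower())
    return counts.most_common(1)[0][0] if counts else None


def inconsistent_terms(glossary):
    """Return source strings that have more than one distinct translation."""
    return {src: dict(c) for src, c in glossary.items() if len(c) > 1}


pairs = read_pairs(SAMPLE_TMX, "en-US", "fr")
glossary = build_glossary(pairs)
print(suggest(glossary, "Bookmarks"))
print(inconsistent_terms(glossary))
```

In this sketch, a source string with more than one distinct translation is reported as a candidate inconsistency for manual review, and the most frequent translation serves as the suggestion. An MT system's quality score, as the project proposes, would replace the raw counts used here.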