GSoC/GCI Archive
Google Code-in 2013 Apertium

Examples of minimum files where an Apertium language pair messes up wordprocessor formatting (English and Catalan) [1]

completed by: Gabriel Esteban

mentors: Mikel L. Forcada

Sometimes an Apertium language pair takes a valid ODT or RTF source file but delivers an invalid ODT or RTF target file, regardless of translation quality. This can usually be blamed on incorrect handling of superblanks in structural transfer rules. The task:
(1) Select a language pair.
(2) Install Apertium locally from the Subversion repository; install the language pair; make sure that it works.
(3) Download a series of ODT or RTF files for testing purposes. Make sure they open correctly in LibreOffice/OpenOffice.org.
(4) Translate the valid files with the language pair.
(5) Check whether the translated files are also valid ODT or RTF files; select those that aren't.
(6) For each invalid file, find the first source of non-validity and study it. Strip the source file down until you have a small (valid!) source file with some text around the minimum possible example of problematic tags. Save each such file and describe the error.
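The validity check on translated ODT files could be partly automated before opening each file by hand. The sketch below is not part of the original task. It assumes that "valid" can be approximated by two conditions: the file is still a readable ZIP package, and content.xml plus every other XML part inside it is well-formed. The function name check_odt is hypothetical, and a file that passes may still fail to open in LibreOffice/OpenOffice.org, so this only narrows down the candidates.

```python
import io
import zipfile
import xml.etree.ElementTree as ET


def check_odt(data: bytes) -> list[str]:
    """Return a list of problems found in an ODT package.

    An empty list means the basic checks passed; it does not prove
    that a word processor will accept the file.
    """
    try:
        archive = zipfile.ZipFile(io.BytesIO(data))
    except zipfile.BadZipFile as exc:
        return [f"not a ZIP package: {exc}"]

    problems = []
    with archive:
        names = archive.namelist()
        # content.xml holds the document body, where translated text
        # and any misplaced formatting tags end up.
        if "content.xml" not in names:
            problems.append("content.xml: missing")
        for name in names:
            if not name.endswith(".xml"):
                continue
            try:
                ET.fromstring(archive.read(name))
            except ET.ParseError as exc:
                # The parser message includes line and column, which
                # points at the first source of non-validity.
                problems.append(f"{name}: not well-formed ({exc})")
    return problems
```

A call such as check_odt(open("translated.odt", "rb").read()), with a hypothetical file name, returns the list of problems. The position reported for the first parse error is a reasonable starting point when stripping the source file down to a minimal example.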