GSoC/GCI Archive
Google Code-in 2013 Apertium

Examples of minimum files where an Apertium language pair messes up (X)HTML formatting (Catalan and English) [1]

completed by: DarkGaia

mentors: Mikel L. Forcada

Sometimes, an Apertium language pair takes a valid HTML/XHTML source file but delivers an invalid HTML/XHTML target file, regardless of translation quality. This can usually be blamed on incorrect handling of superblanks in structural transfer rules. The task: (1) select a language pair (2) Install Apertium locally from the Subversion repository; install the language pair; make sure that it works (3) download a series of HTML/XHTML files for testing purposes. Make sure they are valid using an HTML/XHTML validator (4) translate the valid files with the language pair (5) check if the translated files are also valid HTML/XHTML files; select those that aren't (6) find the first source of non-validity and study it, and strip the source file until you just have a small (valid!) source file with some text around the minimum possible example of problematic tags; save each such file and describe the error.