Google Code-in 2012 Apertium

Categorise Albanian nouns by inflectional paradigm

completed by: jam0522

mentors: Francis Tyers, Filip Petkovski

This file, sq.xml, contains Albanian nouns. The first ten noun entries can be listed with:

$ grep '<e lm' sq.xml | grep '__n_' | head -10
    <e lm="mal"><i>mal</i><par n="mal__n_m"/></e>
    <e lm="vend"><i>vend</i><par n="mal__n_m"/></e>
    <e lm="terrorizm"><i>terrorizm</i><par n="mal__n_m"/></e>
    <e lm="turizm"><i>turizm</i><par n="mal__n_m"/></e>
    <e lm="komunizm"><i>komunizm</i><par n="mal__n_m"/></e>
    <e lm="zog"><i>zog</i><par n="zog__n_m"/></e>
    <e lm="vajzë"><i>vajz</i><par n="vajz/ë__n_f"/></e>
    <e lm="muze"><i>muze</i><par n="muze__n_m"/></e>
    <e lm="krye"><i>krye</i><par n="muze__n_m"/></e>
    <e lm="edicion"><i>edicion</i><par n="mal__n_m"/></e>


Each entry has three parts:

* lm="" = the lemma of the word,

* <i></i> = the "stem" of the word, and

* <par n=""/> = the paradigm of the word, which supplies the inflectional endings added to the stem.
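As a rough aid for reviewing, the three parts can be pulled out of each entry with a short shell sketch. It assumes every noun entry sits on one line in exactly the layout shown above (lemma, then `<i>`, then `<par>`); entries with extra attributes or a different layout would need a proper XML parser. The function name `noun_table` is hypothetical, not part of Apertium.

```shell
# Hypothetical helper (not part of Apertium): print "lemma<TAB>stem<TAB>paradigm"
# for each noun entry in the dictionary file $1, sorted by paradigm and
# then by lemma, so that nouns sharing a paradigm are listed together.
# Assumes one entry per line in the layout shown above, e.g.
#   <e lm="vajzë"><i>vajz</i><par n="vajz/ë__n_f"/></e>
noun_table() {
  tab=$(printf '\t')
  grep '<e lm' "$1" | grep '__n_' |
    awk -F'"' 'BEGIN { OFS = "\t" }
    {
      # Splitting on double quotes gives:
      #   $2 = lemma, $3 = "><i>STEM</i><par n=", $4 = paradigm
      stem = $3
      sub(/^><i>/, "", stem)           # drop the opening <i>
      sub(/<\/i><par n=$/, "", stem)   # drop "</i><par n="
      print $2, stem, $4
    }' |
    LC_ALL=C sort -t "$tab" -k3,3 -k1,1   # byte-wise order, independent of locale
}
```

Running `noun_table sq.xml` would, for instance, group `krye` and `muze` together under `muze__n_m`, which makes it easier to read down one paradigm at a time.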


The aim of the task is to check that each entry <e> is assigned the correct paradigm <par>, that is, one that generates the right inflected forms of that noun.
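Whether a paradigm is linguistically right needs a speaker's judgement, but one mechanical mismatch can be caught automatically. Assuming the usual Apertium naming convention, in which a slash in the paradigm name marks where the stem ends (so `vajz/ë__n_f` expects stem `vajz` and lemma `vajzë`), the stem plus the text after the slash should spell the lemma, and a paradigm name with no slash should add nothing. The sketch below, with the hypothetical name `check_stems`, prints entries that break this; it also assumes the one-line entry layout shown above.

```shell
# Hypothetical helper: flag noun entries in the file $1 whose stem plus
# the ending named in the paradigm does not spell the lemma.
# Assumes the naming convention that "vajz/ë__n_f" means stem "vajz" +
# ending "ë", and that a paradigm name without "/" adds no ending.
# This only catches mechanical mismatches; it cannot tell whether the
# paradigm is linguistically the right one for the noun.
check_stems() {
  grep '<e lm' "$1" | grep '__n_' |
    awk -F'"' '{
      lemma = $2; par = $4
      stem = $3
      sub(/^><i>/, "", stem)            # drop the opening <i>
      sub(/<\/i><par n=$/, "", stem)    # drop "</i><par n="
      name = par
      sub(/__.*$/, "", name)            # "vajz/ë__n_f" -> "vajz/ë"
      ending = ""
      slash = index(name, "/")
      if (slash > 0) ending = substr(name, slash + 1)
      if ((stem ending) != lemma)
        print "CHECK: lemma=" lemma " stem=" stem " par=" par
    }'
}
```

For the ten entries listed above it would print nothing, since in each of them the stem plus the paradigm's ending matches the lemma.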


This task requires knowledge of Albanian. If you have more questions, come on IRC: #apertium