GSoC/GCI Archive
Google Code-in 2012 Apertium

Extract Armenian noun translations from Wiktionary

completed by: conor-f

mentors: Francis Tyers, Jonathan

Wiktionary has many translations for Armenian nouns. For example, consider the entry:



աստիճան (astič̣an)

  1. degree
  2. extent



The aim of this task is to extract these translations into lttoolbox XML format, as follows:


<e c=""><p><l>աստիճան<s n="n"/><s n="nn"/></l><r>degree<s n="n"/></r></p></e>
<e c=""><p><l>աստիճան<s n="n"/><s n="nn"/></l><r>extent<s n="n"/></r></p></e>
<e c="of stairs"><p><l>աստիճան<s n="n"/><s n="nn"/></l><r>step<s n="n"/></r></p></e>
<e c=""><p><l>աստիճան<s n="n"/><s n="nn"/></l><r>stair<s n="n"/></r></p></e>
<e c=""><p><l>աստիճան<s n="n"/><s n="nn"/><s n="pl"/></l><r>stairs<s n="n"/></r></p></e>
<e c="colloquial"><p><l>աստիճան<s n="n"/><s n="nn"/></l><r>ladder<s n="n"/></r></p></e>
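
As a rough illustration, entries in the shape shown above could be generated with a small helper like the following. This is only a sketch: the function name make_entry and its parameters are hypothetical and not part of lttoolbox, and the default animacy tag "nn" is copied from the example entries, so it has to be set per noun.

```python
from xml.sax.saxutils import escape, quoteattr


def make_entry(lemma, translation, comment="", animacy="nn", plural=False):
    """Build one <e> line in the shape of the sample entries (hypothetical helper)."""
    # Left (Armenian) side: noun tag, animacy tag, and optionally the plural tag.
    left_tags = ["n", animacy] + (["pl"] if plural else [])

    def side(text, tags):
        # Escape XML special characters; lttoolbox writes spaces in lemmas as <b/>.
        body = escape(text).replace(" ", "<b/>")
        return body + "".join('<s n="%s"/>' % tag for tag in tags)

    left = side(lemma, left_tags)
    right = side(translation, ["n"])
    # quoteattr adds the surrounding quotes and escapes the comment text.
    return "<e c=%s><p><l>%s</l><r>%s</r></p></e>" % (quoteattr(comment), left, right)


if __name__ == "__main__":
    print(make_entry("աստիճան", "degree"))
    print(make_entry("աստիճան", "step", comment="of stairs"))
    print(make_entry("աստիճան", "stairs", plural=True))
```

The three calls at the bottom are meant to reproduce the "degree", "step" and "stairs" lines from the sample.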


You will need to look out for:

* translations that are valid only in the plural (these take <s n="pl"/> on the Armenian side, as in the "stairs" entry above)

* comments, which belong in the comment field (the c attribute of <e>)

* animacy, which must be correct on the Armenian side
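
One way to handle the comment point is to split a leading parenthesised note off each definition before building the entry. The sketch below assumes that a definition line looks like "(colloquial) ladder"; that input shape is an assumption about the extracted text, not something Wiktionary guarantees, and the name split_gloss is hypothetical.

```python
import re

# Assumed shape: an optional "(comment)" prefix followed by the gloss itself.
_GLOSS_RE = re.compile(r"^\(([^)]*)\)\s*(.+)$")


def split_gloss(line):
    """Return (comment, gloss) for one definition line (hypothetical helper)."""
    line = line.strip()
    match = _GLOSS_RE.match(line)
    if match:
        return match.group(1).strip(), match.group(2).strip()
    # No leading parenthesised note: the whole line is the gloss.
    return "", line


if __name__ == "__main__":
    for text in ["degree", "(of stairs) step", "(colloquial) ladder"]:
        print(split_gloss(text))
```

The first element of the returned pair would go into the c attribute and the second into the right-hand side of the entry.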


For further information about this task, join us on IRC: #apertium