Google Code-in 2012 Apertium

Extract Armenian adjective translations from Wiktionary

completed by: conor-f

mentors: Francis Tyers, Jonathan

Wiktionary has many translations for Armenian adjectives. For example, consider the page:




ազնիվ (azniv)


  1. honest, honest-minded
  2. fair




The task is to extract these translations into lttoolbox XML format, one entry per Armenian-English pair, as follows:


<e c=""><p><l>ազնիվ<s n="adj"/></l><r>honest<s n="adj"/></r></p></e>
<e c=""><p><l>ազնիվ<s n="adj"/></l><r>honest-minded<s n="adj"/></r></p></e>
<e c=""><p><l>ազնիվ<s n="adj"/></l><r>fair<s n="adj"/><s n="sint"/></r></p></e>
<e c=""><p><l>ազնիվ<s n="adj"/></l><r>straightforward<s n="adj"/></r></p></e>
<e c=""><p><l>ազնիվ<s n="adj"/></l><r>upright<s n="adj"/></r></p></e>
<e c=""><p><l>ազնիվ<s n="adj"/></l><r>straight<s n="adj"/><s n="sint"/></r></p></e>
<e c=""><p><l>ազնիվ<s n="adj"/></l><r>square<s n="adj"/></r></p></e>
<e c=""><p><l>ազնիվ<s n="adj"/></l><r>decent<s n="adj"/></r></p></e>
<e c=""><p><l>ազնիվ<s n="adj"/></l><r>noble<s n="adj"/><s n="sint"/></r></p></e>
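A minimal sketch of how such entries might be generated, assuming the glosses have already been pulled from the Wiktionary page as plain strings with comma-separated translations. The function names `make_entry` and `entries_from_glosses` and the `sint_words` set are hypothetical and are not part of lttoolbox or of the original solution:

```python
from xml.sax.saxutils import escape, quoteattr


def make_entry(armenian, english, sint=False, comment=""):
    """Return one <e> line pairing an Armenian adjective with an English one."""
    # The English side gets an extra <s n="sint"/> tag when requested.
    right_tags = '<s n="adj"/>' + ('<s n="sint"/>' if sint else "")
    return (
        f"<e c={quoteattr(comment)}><p>"
        f'<l>{escape(armenian)}<s n="adj"/></l>'
        f"<r>{escape(english)}{right_tags}</r>"
        "</p></e>"
    )


def entries_from_glosses(armenian, glosses, sint_words=frozenset()):
    """Split each comma-separated gloss and build one entry per translation."""
    entries = []
    for gloss in glosses:
        for word in gloss.split(","):
            word = word.strip()
            if word:
                entries.append(make_entry(armenian, word, sint=word in sint_words))
    return entries


if __name__ == "__main__":
    # sint_words is assumed here; in practice it would come from the analyser.
    for line in entries_from_glosses("ազնիվ", ["honest, honest-minded", "fair"], {"fair"}):
        print(line)
```

Passing a non-empty `comment` places that text in the `c` attribute, which is where any note attached to a translation would go in this sketch.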

You will need to look out for:



* making sure that any comments are put in the comment field (the c="" attribute of each entry)


* ensuring that the '<sint>' tag is added where appropriate; you can retrieve this information from the English morphological analyser.
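The '<sint>' check could be sketched as below, assuming the English analyser prints each word in the Apertium stream format, `^surface/lemma<tag><tag>/...$`. The helper name `is_synthetic_adjective` is hypothetical, and the sample strings are illustrative rather than real analyser output:

```python
import re


def is_synthetic_adjective(lexical_unit):
    """True if any reading in the unit carries both the adj and sint tags."""
    # Drop the ^...$ delimiters, then skip the surface form before the first '/'.
    body = lexical_unit.strip().lstrip("^").rstrip("$")
    readings = body.split("/")[1:]
    for reading in readings:
        tags = re.findall(r"<([^>]+)>", reading)
        if "adj" in tags and "sint" in tags:
            return True
    return False


if __name__ == "__main__":
    # Illustrative inputs only; real output depends on the analyser used.
    for unit in ["^fair/fair<adj><sint>/fair<n><sg>$", "^honest/honest<adj>$"]:
        print(unit, is_synthetic_adjective(unit))
```

Words the analyser does not know have no tagged readings, so this sketch treats them as not taking '<sint>'.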




For further information about this task, join us on IRC: #apertium