GSoC/GCI Archive
Google Code-in 2013 Apertium

Create a program to generate a flex lexer from an XML description

completed by: Dalimil Hájek

mentors: Mikel L. Forcada, Francis Tyers, Kirill Krylov

Given an XML file with definitions like:



  <def-cat n="noun">
    <cat-item tags="n.*"/>
  </def-cat>
  <def-cat n="adjec">
    <cat-item tags="adj.*"/>
    <cat-item tags="vblex.pp.*"/>
  </def-cat>


Create a lexer that looks something like:


^[a-zA-Z]\+<n>\(<[a-zA-Z0-9_]\+>\)/[a-zA-Z]\+\(<[a-zA-Z0-9_]\+>\)$ { return noun; }

^[a-zA-Z]\+<adj>\(<[a-zA-Z0-9_]\+>\)/[a-zA-Z]\+\(<[a-zA-Z0-9_]\+>\)$ { return adjec; }

^[a-zA-Z]\+<vblex><pp>\(<[a-zA-Z0-9_]\+>\)/[a-zA-Z]\+\(<[a-zA-Z0-9_]\+>\)$ { return adjec; }
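
One way to approach the task is sketched below in Python. This is a minimal illustration, not the submitted solution. It assumes the definitions are wrapped in a single root element (called section-def-cats here, an assumption), that each dot-separated part of a tags attribute becomes one <tag>, and that a * part becomes the generic tag group used in the sample rules. The function names and constants are hypothetical. The pattern fragments are copied verbatim from the sample rules above, including the backslash escapes. Real flex treats \+ and \( as literal characters, so the fragments would probably need adjusting before the output could be passed to flex.

```python
import xml.etree.ElementTree as ET

# Pattern fragments copied verbatim from the sample rules in the task text.
LEMMA = r"[a-zA-Z]\+"
ANY_TAG = r"\(<[a-zA-Z0-9_]\+>\)"
TARGET_SIDE = "/" + LEMMA + ANY_TAG + "$"

# Hypothetical input: the sample definitions wrapped in an assumed root element.
SAMPLE = """<section-def-cats>
  <def-cat n="noun">
    <cat-item tags="n.*"/>
  </def-cat>
  <def-cat n="adjec">
    <cat-item tags="adj.*"/>
    <cat-item tags="vblex.pp.*"/>
  </def-cat>
</section-def-cats>"""


def tags_to_pattern(tags):
    """Turn a tags value such as 'vblex.pp.*' into the source-side pattern."""
    parts = []
    for tag in tags.split("."):
        # '*' stands for "any tag"; every other part is a literal <tag>.
        parts.append(ANY_TAG if tag == "*" else "<" + tag + ">")
    return "^" + LEMMA + "".join(parts)


def generate_rules(xml_text):
    """Return one rule per <cat-item>, returning the name of its <def-cat>."""
    root = ET.fromstring(xml_text)
    rules = []
    for def_cat in root.iter("def-cat"):
        name = def_cat.get("n")
        for item in def_cat.iter("cat-item"):
            tags = item.get("tags")
            if not tags:
                continue  # items without a tags attribute are skipped in this sketch
            pattern = tags_to_pattern(tags) + TARGET_SIDE
            rules.append(pattern + " { return " + name + "; }")
    return rules


if __name__ == "__main__":
    for rule in generate_rules(SAMPLE):
        print(rule)
```

Run on the sample definitions, the sketch emits three rules, one per cat-item, in the same shape as the rules listed above. A complete generator would also need to write the surrounding flex file (definitions section, %% separators, and declarations for the returned names), which is left out here.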