Google Summer of Code 2009, The Apache Software Foundation

Adding Unicode Normalization support to Xerces2-J

by Richard Kelly for The Apache Software Foundation

This project will design and implement support for Unicode character normalization and normalization checking in Xerces. Applications that use Xerces will be able to produce fully normalized XML documents and to verify that any XML documents they process are fully normalized. Documents verified as fully normalized can be compared with ordinary string comparison operations, without having to account for the many equivalent forms (with the same meaning) that Unicode allows for the same text.
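The comparison problem described above can be illustrated with the JDK's own `java.text.Normalizer` class (available since Java 6). This is a minimal sketch, not the Xerces API this project proposes: the class name `NormalizationDemo` and the helpers `equalAfterNfc` and `isNfc` are hypothetical. It also checks only Unicode Normalization Form C (NFC); "fully normalized" text as defined by the W3C Character Model adds further conditions on top of NFC, which this sketch does not attempt.

```java
import java.text.Normalizer;

public class NormalizationDemo {
    // Hypothetical helper: compares two strings after converting both to NFC.
    static boolean equalAfterNfc(String x, String y) {
        return Normalizer.normalize(x, Normalizer.Form.NFC)
                .equals(Normalizer.normalize(y, Normalizer.Form.NFC));
    }

    // Hypothetical helper: normalization checking, true if s is already in NFC.
    static boolean isNfc(String s) {
        return Normalizer.isNormalized(s, Normalizer.Form.NFC);
    }

    public static void main(String[] args) {
        // "é" as a single precomposed code point (U+00E9).
        String precomposed = "\u00E9";
        // "e" followed by COMBINING ACUTE ACCENT (U+0301): canonically equivalent.
        String decomposed = "e\u0301";

        // A plain comparison sees different code point sequences.
        System.out.println(precomposed.equals(decomposed));          // false
        // After normalizing both to NFC, the comparison succeeds.
        System.out.println(equalAfterNfc(precomposed, decomposed));  // true
        // Only the precomposed spelling is already in NFC.
        System.out.println(isNfc(precomposed));                      // true
        System.out.println(isNfc(decomposed));                       // false
    }
}
```

Running `main` prints `false`, `true`, `true`, `false`: the two spellings of "é" differ code point by code point, but agree once both are normalized. A document already verified as normalized lets a consumer skip that normalization step before comparing.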