Google Summer of Code 2011 haskell.org

Convert the text package to use UTF-8 internally

by Jasper Van der Jeugt for haskell.org

For Haskell projects handling Unicode text, the text library offers both speed and ease of use. When it was written, benchmarks indicated that UTF-16 would be a good choice for the library's internal encoding. However, these (rather artificial) benchmarks did not take into account the time needed to 1) decode "Real World" data and 2) encode it again when writing it back. I propose to 1) benchmark UTF-8 against UTF-16 on such workloads and 2) convert the library to UTF-8 if it proves faster for "Real World" applications.
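As a minimal sketch of the kind of "Real World" workload meant here, assuming the `decodeUtf8`/`encodeUtf8` functions from `Data.Text.Encoding` in the text package: input arrives as UTF-8 bytes, is decoded into `Text`, processed, and encoded back to UTF-8. With a UTF-16 internal representation, both conversions must transcode every character; with a UTF-8 representation they would reduce largely to validating and copying bytes. The function name `roundTrip` and the use of `T.toUpper` as the processing step are hypothetical illustrations.

```haskell
import qualified Data.ByteString as B
import qualified Data.Text as T
import qualified Data.Text.Encoding as TE

-- Hypothetical round trip: decode UTF-8 input into Text, apply some
-- processing (here, upper-casing), then encode the result back to UTF-8.
-- Both the decode and the encode step are part of the cost that the
-- original UTF-16 benchmarks did not measure.
roundTrip :: B.ByteString -> B.ByteString
roundTrip = TE.encodeUtf8 . T.toUpper . TE.decodeUtf8
```

Timing a function like this over realistic input files, rather than operations on already-decoded `Text`, is one way to capture the encoding overhead described above.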