Google Summer of Code 2012 Berkman Center for Internet & Society at Harvard University

Media Cloud multi-language support via plug-in infrastructure, plus an automatic stemmer and a stopword list generator

by Linas Valiukas for Berkman Center for Internet & Society at Harvard University

I propose adding multi-language support to Media Cloud so that new languages (French, German, ...) can be added as plug-ins: adding a language would require no further modifications to the core code. Adding a language would also be made simpler by automatic (experimental) generators for stopword lists and stemmers.
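
To make the plug-in idea concrete, here is a minimal sketch in Python of how a language could be registered without touching core code, together with a naive stopword generator. It is an illustration only, not Media Cloud's actual design or API: the names `LanguagePlugin`, `register_language`, `process` and `generate_stopwords` are hypothetical, the document-frequency heuristic is one assumed approach to stopword generation, and the "stemmer" in the usage lines is a placeholder that only strips a trailing "s".

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, Iterable, List


@dataclass(frozen=True)
class LanguagePlugin:
    """Hypothetical plug-in: everything the core needs to know about a language."""
    code: str                    # language code, e.g. "fr"
    stopwords: FrozenSet[str]    # words to drop before analysis
    stem: Callable[[str], str]   # maps a word to its stem


# Core-side registry (assumed design): the core only ever looks plug-ins up here.
_REGISTRY: Dict[str, LanguagePlugin] = {}


def register_language(plugin: LanguagePlugin) -> None:
    """Make a language available to the core without editing core code."""
    _REGISTRY[plugin.code] = plugin


def process(text: str, code: str) -> List[str]:
    """Core-side processing: lowercase, split on whitespace, drop stopwords, stem."""
    plugin = _REGISTRY[code]
    tokens = text.lower().split()
    return [plugin.stem(t) for t in tokens if t not in plugin.stopwords]


def generate_stopwords(documents: Iterable[str],
                       min_doc_fraction: float = 0.8) -> FrozenSet[str]:
    """Naive generator (assumed heuristic): keep words that occur in at least
    min_doc_fraction of the documents."""
    docs = list(documents)
    doc_freq: Counter = Counter()
    for doc in docs:
        # Count each word once per document, not once per occurrence.
        doc_freq.update(set(doc.lower().split()))
    threshold = min_doc_fraction * len(docs)
    return frozenset(w for w, n in doc_freq.items() if n >= threshold)


# Usage: build a toy French plug-in from a three-sentence corpus.
corpus = ["le chat dort", "le chien mange", "le soleil brille"]
french = LanguagePlugin(
    code="fr",
    stopwords=generate_stopwords(corpus),
    stem=lambda w: w.rstrip("s"),  # placeholder, not a real stemmer
)
register_language(french)
tokens = process("Le chats dort", "fr")
```

In this sketch, supporting another language means constructing and registering one more `LanguagePlugin`; `process` itself stays unchanged, which is the property the proposal is after.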