Google Summer of Code 2012 Apache Software Foundation

Distributed mailbox indexing over HBase/HDFS

by Mihai Soloi for Apache Software Foundation

Currently, the James mailbox supports email indexing with Lucene. The Lucene directory implementation used for search and indexing relies on relational databases or file-system storage. As the number of indexes grows with the number of clients using the mailbox, indexing performance degrades. An implementation over a NoSQL database such as HBase would address this problem by distributing the indexes and documents across a system designed for large amounts of data.