GSoC/GCI Archive
Google Summer of Code 2009

The Apache Software Foundation

Web Page:

Mailing List: No central list, see the lists of Apache projects at and

The Apache Software Foundation provides support for the Apache community of open-source software projects. The Apache projects are characterized by a collaborative, consensus-based development process, an open and pragmatic software license, and a desire to create high-quality software that leads the way in its field. We consider ourselves not simply a group of projects sharing a server, but rather a community of developers and users.


  • [Apache Mahout] Implement parallel Random/Regression Forest My goal is to add the power of random/regression forests to Mahout. At the end of this summer one should be able to build random/regression forests for large, possibly distributed, datasets, store the forest and reuse it to classify new data. In addition, a demo on EC2 is planned.
  • [Mahout] Distributed Latent Dirichlet Allocation Latent Dirichlet Allocation (Blei et al., 2003) is a powerful learning algorithm for automatically and jointly clustering words into topics and documents into mixtures of topics, and it has been successfully applied to model change in scientific fields over time (Griffiths and Steyvers, 2004; Hall et al., 2008). In this project, I propose to implement distributed LDA using MapReduce, and to investigate extensions of LDA and possibly more efficient algorithms for distributed inference.
  • Add search capability to index/search artifacts in the SCA domain, including the contributions, WSDL/XSDs, Java files, composite files The SCA Domain Manager module provides a web application that allows the administrator to browse through the domain and also add/remove contributions/components/nodes from it. The goal of this project is to implement search functionality for this module so the user can search for the domain artifacts. All searchable text in an SCA domain will be extracted via introspection and indexed using Apache Lucene.
  • Adding Unicode Normalization support to Xerces2-J This project will design and implement support for Unicode character normalization and normalization checking in Xerces. Applications that use Xerces will be able to produce fully normalized XML documents and verify that any XML documents they process are fully normalized. Documents that have been verified to be fully normalized can have string comparison operations performed on them without having to worry about the many possible forms (with the same meaning) allowed by Unicode.
  • Apache Lenya (lenya-xml-diff) The ability to create and display diff files is crucial for most content management systems. Creating the lenya-xml-diff module will allow viewing diffs between various XML documents (XML, XHTML etc.) and sending diffs by email. It will make the user experience more comprehensive and will help to attract new users to the Apache Lenya project.
  • Apache ODE Integration in BPELUnit BPEL is an executable language used in web service compositions, and Apache ODE is a workflow engine capable of executing BPEL processes. BPELUnit is an open source testing framework used to unit test BPEL processes. Currently, BPELUnit is not integrated with ODE to provide deployment support or to measure unit test coverage. The aim of this project is therefore to extend the BPELUnit framework so that it can package and deploy instrumented process models to ODE.
  • Application for derby-testandfix Apache Derby’s unit tests are being ported over to the new JUnit standard. My goal is to help with this task so that in the future, there is only one framework for the whole set of tests. I have previously worked with Apache Derby from a developer’s perspective and I believe this will provide me with great leverage to help this project. My main focus will be the network tests, as these are a showstopper for concurrent testing. When these are converted, a lot of time can be saved with the test runs. Once this is done, I will also be looking at helping with the ongoing bug fixing. I am well acquainted with Derby from my previous experience, and I am also at ease with JUnit. I believe that this, combined with my extensive experience with Java, is the perfect combination to help Derby’s community of developers and users.
  • Camel Dynamic Routes Dynamic routing is considered a necessary messaging mechanism among the enterprise integration patterns (EIP) because it provides flexible and powerful support for building EIP solutions. Currently, most message routing mechanisms don't support run-time route modification, but Camel does: it can change routes at run time using XML. To extend this to more languages, we will add support for other languages, such as Groovy, Python or Scala.
  • Cocoon 3 monitoring Nowadays, the reliability of web applications is very important. The best way to detect failures early is to monitor. Adding a monitoring feature to Cocoon 3 gives developers and administrators a great tool for detecting errors and failures. Using JMX enables us to: automatically report problems as they occur (via notification mechanisms); collect statistics about the system (e.g. load and connections); and inspect the current state of the cache and settings and perform certain operations.
  • Convert current Tomcat valves to Servlet Filters Apache Tomcat is an implementation of the Java Servlet and JavaServer Pages technologies. It is well known and widely used. It is also a component of the JBoss Application Server. My task in this project is to convert the current Tomcat valve-based implementation into Servlet Filters, which are consistent with the Servlet Specification.
  • Convert Derby tests to JUnit and fix Derby bugs I am Eranda Sooriyanbandara, an undergraduate studying for the B.Sc. Engineering degree in the Department of Computer Science and Engineering at the University of Moratuwa, Sri Lanka. Apache Derby is a Java-based database management system that people use for online transaction processing. I decided to enroll in "Derby test and fix" because I like working with databases and have the skills for it. Some of our university projects are also mainly based on databases. Now it's time to do the real task.
  • Create a new user interface for the Apache Qpid JMX Management Console The management console for Qpid’s Java messaging broker lacks usability, making it in many ways no better and often worse than a generic JMX tool like JConsole. Additionally, the implementation is hard to maintain. My aim is to create a new user interface for the console that is more attuned to the operations being undertaken, and more maintainable. In order to do so, some work will be required to refactor the existing console to better separate some of the core functionality from the interface.
  • Devising a strategy to execute cross-partition query in Slice. OpenJPA is a feature-rich implementation of the Java Persistence API. Slice is a distributed persistence module for OpenJPA to enable any OpenJPA module to work seamlessly across horizontally partitioned databases. Presently, there are some restrictions on queries; for example, related tuples must be in the same partition. In this project we aim to design a strategy to overcome these constraints so that we can "join" the relations even when the related records are not in the same partition.
  • Empowering Google Android applications to easily consume business services The Tuscany Project implements the SCA specifications and eases SOA development through its comprehensive infrastructure of components that can be assembled into applications called composites. To make effective use of Tuscany SCA's capabilities, applications should be developed on a fully featured and open platform. The Android mobile platform provides such features and has been targeted to host Tuscany and to demonstrate its capabilities under the "tuscany-host-android" subproject.
  • Enable OSGi features for Harmony JDK OSGi has many advantages thanks to its ability to manage module dependencies and the services it provides. The Apache Harmony JRE implementation has already been designed to be modular, with minimal coupling among its modules. If we could integrate the two, we would bring great flexibility to the development and updating of Harmony. More importantly, Harmony would become more attractive to users because it would be easier to use.
  • Extend the Vysper XMPP Server/Client with the publish-subscribe XEP (XEP-060) This proposal covers the extension of the Vysper XMPP (Jabber) server currently under development on Apache Labs with the XEP-060 "Publish-Subscribe" protocol extension (
  • Generics support for Axis2 code-first development Apache Axis2 has become a very popular framework for implementing web services using Java. Axis2 recently made the move to Java 1.5 from 1.4, to obtain the benefits of the newer JDK. However, it does not yet support a very useful feature introduced in Java 1.5, namely Generics, the desirability of which is made apparent by requests and feedback from the Axis2 community. The project I am proposing will be aimed towards implementing Java Generics support for the Axis2 Web services framework.
  • Implement ANSI/ISO Sequence Generators For Apache Derby This project will add support for ANSI/ISO compliant sequence generators to the Apache Derby RDBMS with the following features. 1. Creating/dropping sequence generators with optional clauses 2. Granting and revoking permissions to users or roles to use a sequence generator 3. Invoking a sequence generator 4. Altering an existing sequence generator. In addition, two new system catalogs to store sequence descriptions and usage permission records will be added, along with upgrade logic for older databases.
  • Implement the SOAP over TCP standard supported by Metro and WCF (via plugin) Implement the SOAP optimized TCP transport developed by Sun and supported by Metro and WCF (via an external plugin). The spec is available at: The optimized transport would support using a stateful Fast Infoset grammar to optimize transmissions and increase performance.
  • Implement the SOAP/JMS specification for CXF CXF supports SOAP over JMS, but it does not meet the current draft specification defined at ( and instead uses some proprietary formats, headers, URL formats, etc. This project would update the SOAP/JMS support in CXF to be completely specification compliant. Upon successful completion of the SOAP/JMS project, CXF will become one of the very first Open Source implementations of the SOAP/JMS specification.
  • Implement WeakReference support in Apache Harmony Concurrent GC Apache Harmony has a concurrent GC (Tick) which performs garbage collection without suspending application threads. Tick does not currently support weak references; it treats all references as strong references, which may cause inefficiency for some applications. I will add this feature. It will differ from the implementation in the gen GC, since consistency must be maintained for the objects. A read barrier on the get method of the reference object will be used, and performance issues will be considered.
  • Implementing SQL Authorization Support for Derby dblook Apache Derby is an open source relational database management system implemented in Java. It integrates well with any Java application and it is based on well-known standards such as Java, JDBC and SQL. SQL authorization support was introduced to Derby starting with version 10.2, but some of the Derby utility tools were not properly updated to cope with the new feature. The objective of this project is to add SQL authorization support to one such tool, namely dblook.
  • Improving Rampart Tests Apache Rampart is the security module of Axis2, which implements the specifications of the WS-Security stack. The objective of this Google Summer of Code project is to improve the tests of Rampart so that they cover all possible scenarios, including negative scenarios and the feature additions which do not have tests at the moment. Tests for binding-level policy configuration, code-generated stubs, secured MTOM and negative scenarios will be implemented under this project.
  • Online Classification and Frequent Pattern Mining using Map-Reduce Last summer I worked on implementing a high precision (Complementary) Naïve Bayes classifier for text data. The model building used Hadoop and scaled well for large datasets like Wikipedia. My proposal has two broad objectives. 1. Create an online + batch classification system for the current NB/CNB Classifier using HBase 2. Implement the Parallel FP Growth algorithm and use it to create a tag-tag relationship from the Wikipedia dataset.
  • Proposal for harmony-classes-selector The main idea is to first collect information from the input and then generate the smallest possible JRE according to the information collected and the dependencies among Harmony classes.
  • RAT 1 Cut&Paste Detector I find this project very interesting. There are already several tools for finding duplicated code (PMD, Simian...), but none of them can check code against the internet. This will be the greatest difference between them and Apache RAT. This is something completely new and I would like to be a part of it.
  • SMS Transport for Apache Axis2 SMS Transport for Apache Axis2 is a project focused on implementing SMPP support for Axis2/Java (it can also be used in Apache Synapse, since Synapse uses the same Axis2 transports), through which Axis2 will be able to communicate with SMSCs (short message service centers) or any other message centers that support SMPP. The project will also enable Axis2/Synapse to communicate with simple GSM modems.
  • SSH and SCP support for Commons-Net Apache Commons-Net has client-side implementations for many protocols, but the Secure SHell protocol is not supported. SSHv2 support will be added through this Summer of Code project. The challenge is a clean and extensible implementation that exposes a suitable API, and integrates well with [net]. This work will in turn be utilized to write SCP client classes, and set the ground for future support for SFTP.
  • Supporting Concurrent Exception Handling at Tuscany SCA The major aim of this work is to provide a fault-tolerance mechanism for systems based on Service Oriented Architecture, especially those developed using the Service Component Architecture. Fault tolerance will be provided using an exception handling mechanism. Such mechanisms, applied in a Service Component Architecture, will address the scenario where many components, working concurrently, exchange asynchronous messages in order to compose a collaborative activity.
  • Web-based BPEL debugger for Apache ODE Apache ODE (Orchestration Director Engine) is a WS-BPEL compliant engine for executing business process workflows. However, ODE currently does not support BPEL debugging, and a web-based BPEL debugger for ODE would be very useful. This project will develop a web-based BPEL debugger that shows a graphical representation of the process model and supports adding breakpoints to activities, modifying variables and managing the debugging process for Apache ODE.
  • Web-based management console for ServiceMix Currently, administration of the ServiceMix Kernel OSGi container can be done through a text-based console. While this is a very powerful tool, we would also like to build a web-based management console to administer the platform. One key requirement for this console is that it allows for easy extension by features being installed on top of Kernel, e.g. ServiceMix 4 installs an NMR and JBI layer on top of Kernel, both of which should be administrable seamlessly through the web UI.
  • WS-Security support for JAX-WS Web Services To integrate and enable the WS-Security features of Apache Axis2 and Apache CXF in Apache Geronimo.