GSoC/GCI Archive
Google Summer of Code 2014

R Project for Statistical Computing

License: Academic Free License 3.0 (AFL 3.0)

Web Page:

Mailing List:

R is a free software environment for statistical computing and graphics. 


  • animint animint is an R package for creating web-based interactive graphics based on ggplot2 syntax.
  • bdvis: Biodiversity data visualizations Package bdvis is already under development and was part of GSoC 2013. The package currently has basic functionality for biodiversity data visualization, but as its user base grows, feature requests are coming in. We propose to add the user-requested functionality and implement some new visualization functions to take bdvis to the next level. We also plan to prepare a detailed vignette and submit the package to CRAN.
  • Dimension Reduction Methods for Multivariate Time Series Multivariate time series are ubiquitous within macroeconomic forecasting. The vector autoregression, the canonical modeling approach, is heavily overparameterized and is intractable in high dimensions. Our project aims to create an easily accessible R package which allows for the estimation of high-dimensional vector autoregressions by incorporating dimension reduction methods from the statistical regularization literature into a multivariate time series setting.
  • Extending mzR The mzR R/Bioconductor package provides a unified application programming interface to the common open and community-driven file formats and parsers available for mass spectrometry data, namely mzXML, mzML and mzData. Currently, mzR provides two back-ends to read mass spectrometry raw data: netCDF and RAMP. The goal is to extend the current useful, yet limited, capabilities of mzR by adding support for the state-of-the-art ProteoWizard project.
  • Generalized Linear Mixed Models via Monte Carlo Likelihood Approximation My R package will fit Generalized Linear Mixed Models via Monte Carlo Likelihood Approximation.
  • GenPCA: A Generalized PCA Toolkit for High-dimensional Data Analysis in R Standard principal component analysis (PCA), a popular tool in traditional multivariate analysis, is no longer suitable for data analysis given the increasing complexity of modern data acquisition. In this project, our goal is to develop an efficient and scalable package for Sparse PCA and Robust PCA, which are two important extensions of standard PCA. We describe the design and the implementation plan of our project, and propose a timeline for its development.
  • Improving the R-interactive-Graphics-via-HTml (RIGHT) Package This project aims to improve the R Interactive Graphics via HTml (RIGHT) package that enables interactive data visualization and analysis to explore data and gain valuable insight. We plan to improve RIGHT in two ways. One is to execute R code for analysis on a server, overlay the results and update them interactively in response to hiding data. The other is to support a ggplot2-like R API to create complex visualizations more easily.
  • Kernel Density Estimation and Nonparametric Discriminant Analysis for FD The analysis of functional data has received considerable attention over recent years. It arises when the response variable of interest can be measured continuously, often over time. The aim of the project is to develop a package in R to produce kernel density estimates and statistical discrimination rules based on samples of functional data. Such functions, which utilise the true nature of functional data, are currently not available.
  • pbdPROF: Profiling Tools for High Performance Computing with R This proposed project will extend the pbdPROF package developed in last year’s GSoC for profiling MPI in R. This year, both the general purpose profiler, PAPI, and a useful portion of one of the largest profilers, TAU, will be added to pbdPROF. Both are state-of-the-art performance analysis tools and will make profiling in R better than ever.
  • PhyloVS: phylogeny-constrained regularization and variable selection Next-generation sequencing has shown that many diseases are associated with disorders of the human microbiome, and human microbiome biomarker discovery has attracted increasing attention. We propose to use a sparse regression model to achieve variable selection while accounting for the phylogenetic relationships among microbial taxa. Our 'PhyloVS' package aims to provide a fast and scalable toolkit to facilitate microbiome biomarker discovery.
  • PortfolioAnalytics_Ross_Bennett This document is my proposal to work on the PortfolioAnalytics R package for Google Summer of Code 2014. This proposal contains information about my background and my plan for what I will accomplish during GSoC 2014 for contributions to PortfolioAnalytics.
  • Proposal_FactorAnalytics_Sangeetha_Srinivasan This is a proposal to improve the functionality, usability, reliability and documentation of the factorAnalytics package.
  • rOptManifold: An R Package for Optimization over Matrix Manifolds Many machine learning and statistical problems boil down to finding an optimizer over a certain manifold, as in Procrustes problems, independent component analysis, and matrix approximation. This project aims to build an R package to solve such optimization problems over a variety of commonly seen matrix manifold structures. Both commonly used and some newly developed algorithms will be implemented in the R package.
  • Spot volatility estimation: Methods and applications Spot volatility is a measure of the fluctuation in returns on financial assets. The optimal estimation of spot volatility is important in high-frequency trading signal generation and risk management. Several spot volatility estimators have been proposed in the academic literature over the last few years. This project aims to implement them in the R package highfrequency.
  • Tools for composite index analysis The existing package CItools includes the necessary tools/functions to construct the composite index CIP. More functions related to multivariate analysis and visualization can be included. In addition, facilities to operate on the user's data and construct other indices, as well as a GUI for non-R users, need to be developed.
  • Tools for pre and post processing of data for Ecological niche models Ecological niche modeling has gained popularity for estimating species distributions. The R package ‘dismo’ implements various niche modeling algorithms. However, processing of input and output data is done outside R, especially in ArcGIS. Package ENMGadgets helps users prepare the data required for niche models within the R environment with a few functions such as DistanceFilter, PCARaster, PCAProjection, etc. We propose to strengthen ENMGadgets by providing a few more functions.
  • Turning R objects into Pandoc's markdown This is a proposal for improving the pander package for markdown rendering of R objects. During this GSoC session I want to focus on 3 things: 1. create new pander methods for R classes that are not yet supported; 2. extend pandoc.table to support configurable width; 3. refactor the existing code base, in particular the brew function, improve the performance of pandoc.table and extend the existing test suite.