Google Summer of Code 2012

Berkman Center for Internet & Society at Harvard University

The Berkman Center's mission is to explore and understand cyberspace; to study its development, dynamics, norms, and standards; and to assess the need or lack thereof for laws and sanctions. We are a research center, premised on the observation that what we seek to learn is not already recorded. Our method is to build out into cyberspace, record data as we go, self-study, and share. Our mode is entrepreneurial nonprofit. Our faculty, fellows, students, and affiliates engage with a wide spectrum of Net issues, including governance, privacy, intellectual property, antitrust, content control, and electronic commerce. Our diverse research interests cohere in a common understanding of the Internet as a social and political space where constraints upon inhabitants are determined not only through the traditional application of law, but, more subtly, through technical architecture ("code").


  • A Distributed Architecture to Stream Twitter and Sina Weibo Microblog Posts: This project comprises two parts. For the first third of the summer, I will implement a core architecture for streaming Twitter data, detecting Tweets related to censorship, and extracting the URLs or domain names that are censored. For the remaining two-thirds of the summer, I will focus on three extension goals. First, I will develop a semi-automated learning algorithm that updates the follow and track parameters on each stream on a daily basis in order to capture more censorship-related Tweets. Second, I will duplicate the Twitter architecture for Sina Weibo. I can implement the technical model quickly, but I will need to consult language experts to ensure that I sample the microblog stream correctly. Third, I will extend Herdict's goal of crowdsourcing censorship monitoring by developing a web form similar to the Herdict Reporter test form, which allows users to test whether sites are censored in their region. The technology stack includes Python, MongoDB, Redis, and Ruby on Rails.
  • An examination of how linkage networks in media sites predict readership levels: My project consists of three parts. The first part adds link tracking between news sites, and the second adds network visualization and analysis to the Media Cloud code. The third part adds code for longitudinal analysis of how hyperlink networks predict changes in the readership trends of various news sites. This will allow for descriptive analysis of how HTML-linked networks change, and give insight into why some blogs and news sites become more popular over time.
  • Check-in Check-out Asset Tracker Plugin for Redmine: The aim of the summer project is to build and enhance the 'AssetTrackerPlugin' for the Redmine project management web application.
  • CiteProc-Ruby: Developing a Citation Style Language (CSL) processor and API in Ruby.
  • Create a jQuery Mobile interface for TagTeam: The aim of the project is to create a mobile interface for the JSON API built into TagTeam, using jQuery Mobile.
  • Data Portraits: The goal of the project is to develop a series of visualizations of people based on their digital data.
  • Fair Use Tool Revamping: Fair Use Tool is an interactive online tool that will be accessible to middle-school students and their teachers. The main goal is to design and code a web application with an interactive, user-friendly interface that teaches, in a playful way, how to use copyrighted content in accordance with fair use policy.
  • Implementation of PageOneX project for an online platform: PageOneX is an innovative approach to the analysis and visualization of front-page newspaper coverage that enables communities and advocacy groups to track particular news threads in an easy, visual way. The project makes it possible to compare two important forms of mass media and social media: newspaper front pages and Twitter.
  • Interface 3D Model Inputs via Kinect to Zeega: This project aims to develop a system with a web interface that can be used to create 3D models of various objects via Kinect and interface them to Zeega through its API. The work has four steps: 1) developing an intuitive web interface with instructions for using the Kinect; 2) scanning objects via Kinect to create 3D models as .ply/.png files; 3) sending the 3D model to MeshLab to clean up the mesh and turn it into a robust, manipulable file; 4) sending the resulting output file to the web interface, which in turn will be exposed to Zeega via its API.
  • Media Cloud multi-language support via plug-in infrastructure, plus an automatic stemmer and a stopword list generator: I propose adding multi-language support to Media Cloud in such a way that new languages (French, German, ...) can be added as plug-ins, so that adding a language requires no further modifications to the core code. Additionally, adding a language would be made simpler by providing automatic (experimental) stopword list and stemmer generators.
  • Paper Machines: My proposal for Paper Machines seeks to render visible the hidden connections within a large textual corpus by interweaving and extending pre-existing textual analysis programs and methodologies. I aim not only to produce scripts and visualizations tailored to Dr. Guldi's data set, but also to develop ways of working that will be useful to other scholars (including myself) interested in visualization as an aid to analysis.
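
As an illustration of the core step in the first project above (detecting censorship-related Tweets and extracting the domain names they mention), a minimal Python sketch might look like the following. It is not taken from the project itself: the keyword list, the URL pattern, and the function names are all hypothetical, and the streaming connection and storage layers are left out.

```python
import re
from urllib.parse import urlparse

# Hypothetical keyword list; the project abstract does not name the actual terms.
CENSORSHIP_KEYWORDS = ("censored", "censorship", "blocked", "firewall")

# Simple pattern for http/https URLs; an assumption, not the project's own rule.
URL_PATTERN = re.compile(r"https?://[^\s]+", re.IGNORECASE)


def is_censorship_related(text):
    """Return True if the post contains any of the illustrative keywords."""
    lowered = text.lower()
    return any(keyword in lowered for keyword in CENSORSHIP_KEYWORDS)


def extract_domains(text):
    """Return the distinct lowercase host names of URLs in the post, in order."""
    domains = []
    for url in URL_PATTERN.findall(text):
        try:
            # Strip trailing punctuation that usually belongs to the sentence.
            host = urlparse(url.rstrip(".,;:!?)")).hostname
        except ValueError:
            continue  # skip malformed URLs
        if host and host not in domains:
            domains.append(host)
    return domains


def reported_domains(posts):
    """Count, per domain, the censorship-related posts that mention it."""
    counts = {}
    for text in posts:
        if not is_censorship_related(text):
            continue
        for domain in extract_domains(text):
            counts[domain] = counts.get(domain, 0) + 1
    return counts
```

Calling `reported_domains` on a batch of post texts yields a dictionary of candidate domains and mention counts, which could then be handed to whatever review or storage step the real architecture uses.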