GSoC/GCI Archive
Google Summer of Code 2009

Berkman Center at Harvard University

The Berkman Center was founded to explore cyberspace, share in its study, and help pioneer its development. We represent a network of faculty, students, fellows, entrepreneurs, lawyers, and virtual architects working to identify and engage with the challenges and opportunities of cyberspace. We investigate the real and possible boundaries in cyberspace between open and closed systems of code, of commerce, of governance, and of education, and the relationship of law to each. We do this through active rather than passive research, believing that the best way to understand cyberspace is to actually build out into it.

Our faculty, fellows, students, and affiliates engage with a wide spectrum of Net issues, including governance, privacy, intellectual property, antitrust, content control, and electronic commerce. Our diverse research interests cohere in a common understanding of the Internet as a social and political space where constraints upon inhabitants are determined not only through the traditional application of law, but, more subtly, through technical architecture ("code").

There are several code development projects at the Center, and some of the opportunities to contribute to these are listed on our ideas page. They include the StopBadware project (conducted in partnership with Google), the Internet & Democracy project (surveying global Internet freedom), Media Cloud (doing automated content analysis and visualization on the news), and Cohort (a Rails-based CRM).


  • Clustering and MediaCloud. I propose two things. First, I would like to develop a suite of clustering algorithms for the MediaCloud project. Second, I would like to use this suite to address a particular problem: determining and comparing the topic clusters for particular media outlets. That is, determining which topics are of greatest interest to, say, Fox News, and how that set of topics differs from that of, say, Daily Kos.
  • Cohort CRM: Extraction of tagging into plugin and further development. During my time with the Berkman Center I will work on the Cohort CRM Ruby on Rails web application. My primary goal will be to extract the existing hierarchical tagging functionality into a plugin, both for reusability and to clean up the application. Since this project is under heavy development, I will also assist my mentor in developing the application as it goes through a couple of releases in the near future.
  • Improvement of System Management and Crawler Components. The system management interface, or 'dashboard', will help the system administrator monitor tasks, notifications, and general system health. The crawler will support discovery of new sources and feeds and will adapt automatically to each feed's update frequency.
  • Media Cloud. Crawling for media content on the web is a special case of web crawling. The semantics of the feed (RSS, Atom, etc.) allow for smarter pagination. Story text can be extracted from a web page in several ways, such as word-density analysis (identifying the section of the page that contains the highest density of words) and page-layout analysis (identifying the section of the page that contains unique content). Pig Latin can be used to provide rich APIs to the users of Media Cloud.
  • Scriptgen Coding Tool Enhancement. The three main tasks under Scriptgen Coding Tool Enhancement are: (1) Amazon Mechanical Turk integration, which I propose to achieve using the Ruby plugin for AWS and the Amazon Mechanical Turk API; (2) automating script generation, which can be achieved using JavaScript frameworks and then graceful degradation; and (3) data reporting, where XML generation would be done natively and the RSRuby plugin would be used for the R language.
  • The StopBadware Project. The StopBadware Project is an effort to regulate cyberspace with the aid of community participation. A site already created for this purpose has just been launched. Additional functionality can be added to this site to help the cause of regulation. In this project, I propose to (1) upgrade the existing forums, (2) create an innovative rating system for posts and community users, and (3) release the existing code as open source.
  • Understanding chilling effects in the blogosphere. Weblogs have become an important form of web media in recent years. The rise of web-based applications has turned end users from mass consumers of information into producers of it. For many people around the world, blogging has become a way to attain media stardom. For some, however, attention comes knocking in the form of a dreaded lawsuit threat that chills legitimate online activity.
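
The word-density analysis mentioned in the Media Cloud abstract can be illustrated with a short sketch. This is not the Media Cloud implementation; it is a minimal example under stated assumptions: "density" is taken to mean words per markup tag inside a block-level element, and the names `BlockCollector`, `extract_story_text`, and `min_words` are hypothetical. It uses only Python's standard `html.parser` module.

```python
from html.parser import HTMLParser

# Assumption: these elements are treated as candidate "sections" of a page.
BLOCK_TAGS = {"div", "article", "section", "td", "main"}
# Text inside these elements is never story text.
SKIP_TAGS = {"script", "style"}


class BlockCollector(HTMLParser):
    """Collect the words and the inner tag count of each block-level element."""

    def __init__(self):
        super().__init__()
        self.blocks = []     # finished blocks
        self.stack = []      # blocks that are currently open
        self.skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1
        elif tag in BLOCK_TAGS:
            self.stack.append({"words": [], "tags": 0})
        elif self.stack:
            # Any other tag counts as markup inside the innermost open block.
            self.stack[-1]["tags"] += 1

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif tag in BLOCK_TAGS and self.stack:
            self.blocks.append(self.stack.pop())

    def handle_data(self, data):
        # Text is attributed to the innermost open block only.
        if self.skip_depth == 0 and self.stack:
            self.stack[-1]["words"].extend(data.split())


def extract_story_text(html, min_words=5):
    """Return the text of the block with the most words per markup tag."""
    collector = BlockCollector()
    collector.feed(html)
    collector.close()
    candidates = [b for b in collector.blocks if len(b["words"]) >= min_words]
    if not candidates:
        return ""
    best = max(candidates, key=lambda b: len(b["words"]) / (1 + b["tags"]))
    return " ".join(best["words"])
```

A navigation block made mostly of links scores low because nearly every word sits in its own tag, while a paragraph of running prose scores high. A real extractor would likely need to combine this with other signals, such as the page-layout analysis described in the same abstract.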