GSoC/GCI Archive
Google Summer of Code 2015

Global Alliance for Genomics & Health

License: Apache License, 2.0

Web Page:

Mailing List:

The Global Alliance for Genomics and Health (Global Alliance) is an international coalition of over 200 institutional members from over 25 countries, dedicated to improving human health by maximizing the potential of genomic medicine through effective and responsible data sharing. The promise of genomic data to revolutionize biology and medicine depends critically on our ability to make comparisons across millions of human genome sequences, but this requires coordination across organizations, methods, diseases, and even countries. The members of the Global Alliance for Genomics and Health are working together to create interoperable approaches and catalyze initiatives that will help unlock the great potential of genomic data.

Since its formation in 2013, the Global Alliance for Genomics and Health has been leading the way in enabling genomic and clinical data sharing. The Alliance’s Working Groups are producing high-impact deliverables to make such responsible sharing possible, such as a Framework for Data Sharing to guide governance and research and a Genomics API to allow for the interoperable exchange of data. The Working Groups are also catalyzing key collaborative projects that aim to share real-world data, such as Matchmaker Exchange, Beacon Project, and BRCA Challenge.


  • Extend Python HGVS and UTA packages: The hgvs package in Python provides an easy-to-use tool for parsing, formatting, and manipulating variants. However, support for manipulating more types of variants is needed, and the package would be more useful if it could perform variant canonicalization. The hgvs package uses the UTA database to map and validate variants. This would be easier if UTA provided a REST interface for queries, which would make the UTA database more convenient to use and reduce the package dependencies of hgvs.
  • Implementing C API for the Global Alliance for Genomics and Health: In this project we want to write C/C++ code that allows GA4GH API functionality to be added directly to any other software or library written in the same language. We will add this code to the existing htslib library, which is used by many open source software projects.
  • Interactive Visualization of Genetic Data: Our DNA is 99.9% identical, but we are all unique. To see why that's the case, GA4GH has implemented an API that allows you to search human genomic data and see the role this 0.1% plays in determining our phenotypes and diseases. Now, we want to make these data more interpretable by building a visualization platform with interactive graphs. The application would allow users to better utilize and generate insights from the variant information available on GA4GH.
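
To give a sense of what "parsing and formatting variants" means in the first project, here is a minimal sketch in plain Python. It is not the hgvs package's API: the names `SimpleSubstitution` and `parse_substitution` are hypothetical, the regular expression is an assumption that covers only simple single-base substitutions, and the accession in the usage note is a made-up placeholder. Real HGVS nomenclature is far richer than this.

```python
import re
from dataclasses import dataclass

# Assumed, simplified pattern: "<accession>:<type>.<position><ref>><alt>".
# It deliberately ignores insertions, deletions, ranges, offsets, and so on.
_SUBSTITUTION = re.compile(
    r"(?P<accession>[A-Za-z]+_?\d+(?:\.\d+)?):"
    r"(?P<kind>[cgmnr])\."
    r"(?P<position>\d+)"
    r"(?P<ref>[ACGT])>(?P<alt>[ACGT])"
)


@dataclass(frozen=True)
class SimpleSubstitution:
    """Hypothetical container for one single-base substitution."""

    accession: str  # reference sequence identifier, version included
    kind: str       # coordinate type, e.g. "c" (coding) or "g" (genomic)
    position: int   # 1-based position of the changed base
    ref: str        # reference base
    alt: str        # alternate base

    def format(self) -> str:
        """Render the variant back into its string form."""
        return f"{self.accession}:{self.kind}.{self.position}{self.ref}>{self.alt}"


def parse_substitution(text: str) -> SimpleSubstitution:
    """Parse a simple substitution string, or raise ValueError if unsupported."""
    match = _SUBSTITUTION.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"not a simple substitution: {text!r}")
    return SimpleSubstitution(
        accession=match["accession"],
        kind=match["kind"],
        position=int(match["position"]),
        ref=match["ref"],
        alt=match["alt"],
    )
```

For example, `parse_substitution("NM_000000.1:c.76A>T")` yields an object whose `position` is the integer 76, and calling `.format()` on it returns the original string. Anything outside the simplified pattern raises `ValueError`, which is roughly where the extra variant types mentioned in the project description would come in.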