GSoC/GCI Archive
Google Summer of Code 2012

PostgreSQL Project

Web Page: http://www.postgresql.org/developer/summerofcode

Mailing List: http://archives.postgresql.org/pgsql-students/

The PostgreSQL Project develops the most advanced relational database in the world. With a long-time feature set incorporating advanced SQL and many unique extensions of the database engine, our database project is seeking students and other contributors who want to extend it in new and novel ways. In addition to the core database, we invite students to apply to work on tools and drivers.

Projects

  • Better indexing for ranges: Range types are a significant new feature of PostgreSQL. Indexing range types is necessary for efficient range searches. The GiST indexing approach currently implemented for 9.2 stores ranges in both internal and leaf page entries. This approach can be very inefficient for highly overlapping ranges with the "@>" and "<@" operators, because the cost of such a search is similar to the cost of a search using the "&&" operator. Mapping ranges into 2D space could handle such cases much more efficiently. This project focuses on implementing GiST and SP-GiST operator classes for range types based on 2D-space mapping.
  • Document Collection Foreign Data Wrapper: The document collection FDW allows users to map an entire directory of documents (e.g. Reuters Corpora RCV1) as a single foreign table in a PostgreSQL database. The FDW supports building an inverted index and postings file in a user-specified location, and then uses the index and postings file to support various information retrieval tasks, such as boolean retrieval and the vector space model (VSM) with tf-idf (term frequency - inverse document frequency) weighting schemes.
  • FDW (Foreign Data Wrapper) that wraps JDBC based on PL/Java: PostgreSQL has numerous FDWs (Foreign Data Wrappers) that connect to different databases. My project aims to create an FDW that wraps JDBC and can be used to access any database reachable through a JDBC URL.
  • Implementing TABLESAMPLE clause for PostgreSQL: TABLESAMPLE is an interesting SQL clause defined in the SQL:2003 standard. An example is SELECT avg(salary) FROM emp TABLESAMPLE SYSTEM (50); It returns a sample of the underlying table whose size depends on the percentage given in parentheses. The SQL standard's definition of the TABLESAMPLE clause can be found at <http://www.neilconway.org/talks/hacking/ottawa/sql_standard.pdf>. Microsoft SQL Server and DB2 have implemented this clause. Querying a sample of a table is a common need in practice. See <www.almaden.ibm.com/cs/people/peterh/idugjbig.pdf>: on pages 1 and 2, the author describes the benefits and usage of a fast sampling method for discovering general trends and patterns in data. It would be useful for PostgreSQL to implement this feature and make it available to users.
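
To make the TABLESAMPLE project more concrete, the following is a minimal Python sketch of what a query such as TABLESAMPLE SYSTEM (50) might do. It assumes SYSTEM sampling works at the page level, keeping or skipping whole pages with the given probability; the project description does not specify this, and the function name, the `emp_pages` data, and the `seed` parameter are all hypothetical illustrations rather than PostgreSQL internals.

```python
import random


def tablesample_system(pages, percent, seed=None):
    """Return the rows of a page-level sample of a table.

    `pages` is a list of pages, each page a list of rows.
    Each page is kept independently with probability percent/100
    (an assumed model of SYSTEM sampling, not PostgreSQL's code).
    """
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    rng = random.Random(seed)
    sampled = []
    for page in pages:
        # Whole pages are kept or skipped; rows are never picked individually.
        if rng.random() * 100 < percent:
            sampled.extend(page)
    return sampled


# Hypothetical "emp" table: 100 pages holding 10 salary values each.
emp_pages = [[30000 + 10 * (p * 10 + r) for r in range(10)] for p in range(100)]

sample = tablesample_system(emp_pages, 50, seed=1)
if sample:
    # Rough analogue of SELECT avg(salary) FROM emp TABLESAMPLE SYSTEM (50);
    print(len(sample), sum(sample) / len(sample))
```

Because whole pages are kept or skipped, the sample size only approximates the requested percentage, and the aggregate is an estimate computed without reading every page.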
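
For the range-indexing project listed above, the idea of mapping ranges into 2D space can be sketched as follows. This is only an illustration, assuming closed, non-empty ranges: a range [lo, hi] becomes the point (lo, hi), and each of the "@>", "<@" and "&&" operators becomes a rectangular region test on that point. The function names are hypothetical and are not part of any PostgreSQL operator class.

```python
def to_point(lo, hi):
    """Map the closed range [lo, hi] to the 2D point (lo, hi)."""
    if lo > hi:
        raise ValueError("lower bound must not exceed upper bound")
    return (lo, hi)


def contains(point, query):
    """stored @> query: the stored range contains the query range."""
    x, y = point
    qlo, qhi = query
    return x <= qlo and y >= qhi


def contained_by(point, query):
    """stored <@ query: the stored range lies inside the query range."""
    x, y = point
    qlo, qhi = query
    return x >= qlo and y <= qhi


def overlaps(point, query):
    """stored && query: the two ranges share at least one value."""
    x, y = point
    qlo, qhi = query
    return x <= qhi and y >= qlo


# Hypothetical stored ranges, mapped to points.
points = [to_point(1, 10), to_point(2, 3), to_point(5, 8), to_point(9, 12)]
query = (4, 6)

print([p for p in points if contains(p, query)])
print([p for p in points if overlaps(p, query)])
print([p for p in points if contained_by(p, (0, 9))])
```

In this picture the region matching "@>" (x <= qlo, y >= qhi) is a subset of the region matching "&&" (x <= qhi, y >= qlo), so an index that partitions the 2D plane has the opportunity to prune more of the search for containment queries than for overlap queries.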