Google Code-in 2014 Wikimedia Foundation

pywikibot WikibaseSearchItemPageGenerator

completed by: m4tx

mentors: Fabian, John Vandenberg

Pywikibot (PWB) is a Python-based framework for writing bots for MediaWiki. See https://www.mediawiki.org/wiki/Manual:Pywikibot for more information. Patches can be submitted via Gerrit (you need a MediaWiki.org account). More documentation on Gerrit can be found at https://www.mediawiki.org/wiki/Manual:Pywikibot/Gerrit. After you have successfully claimed this task in Google Melange, please use the task in Phabricator for communication instead of Google Melange, so that more PWB developers can be reached. General development questions can be asked on the Pywikibot mailing list at https://lists.wikimedia.org/mailman/listinfo/pywikipedia-l and the #pywikibot IRC channel (see https://www.mediawiki.org/wiki/MediaWiki_on_IRC).

Wikidata is a multilingual, collaboratively edited knowledge base that stores structured data in JSON records. The software behind it is a MediaWiki extension called Wikibase. See https://www.wikidata.org/wiki/Wikidata:Introduction and https://www.mediawiki.org/wiki/Wikibase for more information. Its primary use is as a central store of facts that can be used by all wiki projects. Each claim (fact) in Wikidata is essentially of the form property=value. A property may have many values; for example, India 'shares a border' (property P47) with several countries. Some facts need qualifiers that add conditions on when or how the fact is true; for example, India was a member of the United Nations Security Council, but only between 1950 and 1951.
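The claim model described above can be pictured with a small sketch. This is only an illustration of the property=value idea with optional qualifiers, not the Wikibase JSON format or any Pywikibot class: the `Claim` dataclass and the `values_for` helper are hypothetical names, values are given as plain labels rather than item IDs, and only P47 comes from the text above.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Claim:
    """Hypothetical, simplified claim: one property, one value, optional qualifiers."""
    prop: str
    value: str
    qualifiers: Dict[str, str] = field(default_factory=dict)


def values_for(claims: List[Claim], prop: str) -> List[str]:
    """Return every value recorded for a property (a property may have many)."""
    return [claim.value for claim in claims if claim.prop == prop]


# Claims about India. Labels stand in for item IDs in this sketch.
india = [
    Claim('P47', 'Pakistan'),  # P47: shares a border with
    Claim('P47', 'Nepal'),
    # A fact that is only true for a period, so it carries qualifiers.
    Claim('member of', 'United Nations Security Council',
          {'start': '1950', 'end': '1951'}),
]

borders = values_for(india, 'P47')
membership = [c for c in india if c.qualifiers]
```

Here `borders` collects both P47 values, while the Security Council claim is the only one carrying qualifiers.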

This task is to

  1. add a 'wbsearchentities' method to DataSite to wrap the Wikibase API action 'wbsearchentities', which searches for items or properties in a MediaWiki site with the Wikibase Repo extension installed. The list of languages supported by the site should be cached in the DataSite instance and used to validate the language parameter before it is passed to the API.
  2. create a new generator (essentially a list iterator) tentatively named 'WikibaseSearchItemPageGenerator' that follows the Pywikibot pattern for page generators (see pagegenerators.py), and add command line options to allow the generator to be used as a data source for scripts.
  3. investigate using the 'lang' command line option with family to select the language to be used by WikibaseSearchItemPageGenerator when the multilingual family 'wikidata' is being used. Raise bugs for any problems encountered.
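The first two steps could be sketched roughly as follows. This is a stand-alone illustration and not Pywikibot code: `SearchSite`, `search_item_ids`, and the injected `request` and `load_languages` callables are hypothetical names, and the canned response only assumes that the API returns matches under a 'search' key with an 'id' per entry. A fake transport replaces the network so the sketch runs offline.

```python
from typing import Callable, Dict, FrozenSet, Iterator, List, Optional

Request = Callable[[Dict[str, str]], dict]


class SearchSite:
    """Hypothetical stand-in for DataSite; not the real Pywikibot class."""

    def __init__(self, request: Request,
                 load_languages: Callable[[], List[str]]) -> None:
        self._request = request                # performs one API call
        self._load_languages = load_languages  # fetches supported languages
        self._languages: Optional[FrozenSet[str]] = None

    def languages(self) -> FrozenSet[str]:
        # Load the language list once and cache it on the instance.
        if self._languages is None:
            self._languages = frozenset(self._load_languages())
        return self._languages

    def wbsearchentities(self, search: str, language: str,
                         entity_type: str = 'item',
                         limit: int = 10) -> List[dict]:
        # Validate against the cached list before any request is sent.
        if language not in self.languages():
            raise ValueError('unsupported language: %r' % language)
        params = {
            'action': 'wbsearchentities',
            'search': search,
            'language': language,
            'type': entity_type,
            'limit': str(limit),
            'format': 'json',
        }
        return self._request(params).get('search', [])


def search_item_ids(site: SearchSite, text: str,
                    language: str) -> Iterator[str]:
    """Generator in the spirit of a page generator: yields matching IDs."""
    for entry in site.wbsearchentities(text, language):
        yield entry['id']


# Fake transport with canned data, so no network access is needed.
requests_made: List[Dict[str, str]] = []
language_loads: List[int] = []


def fake_request(params: Dict[str, str]) -> dict:
    requests_made.append(params)
    return {'search': [{'id': 'Q668', 'label': 'India'}], 'success': 1}


def fake_languages() -> List[str]:
    language_loads.append(1)
    return ['en', 'de']


site = SearchSite(fake_request, fake_languages)
ids = list(search_item_ids(site, 'India', 'en'))
ids_again = list(search_item_ids(site, 'Indien', 'de'))
```

Injecting the transport keeps the validation and caching logic testable without a live site; in the real task these would presumably go through Pywikibot's own request machinery instead.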

To complete this task, unit tests need to be added to "tests/site_tests.py" and "tests/pagegenerators_tests.py" to invoke the new DataSite method and simulate calling WikibaseSearchItemPageGenerator from the command line.
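A test that simulates the command line might be shaped like the sketch below. It uses only the standard `unittest` module and does not touch Pywikibot: `generator_from_args` is a hypothetical option handler, `-wbsearch` is a made-up option name chosen for illustration, and the search function is replaced by a fake that records its arguments.

```python
import unittest
from typing import Callable, Iterator, List, Optional, Tuple


def generator_from_args(args: List[str],
                        search: Callable[[str, str], List[str]]
                        ) -> Optional[Iterator[str]]:
    """Hypothetical handler: reads '-lang:CODE' and '-wbsearch:TEXT'."""
    lang = 'en'  # assumed default language for this sketch
    text = None
    for arg in args:
        name, _, value = arg.partition(':')
        if name == '-lang':
            lang = value
        elif name == '-wbsearch':
            text = value
    if text is None:
        return None  # no search option given, so no generator
    return iter(search(text, lang))


class TestSearchGenerator(unittest.TestCase):

    def setUp(self) -> None:
        self.seen: List[Tuple[str, str]] = []

    def fake_search(self, text: str, lang: str) -> List[str]:
        # Record the call instead of contacting a site.
        self.seen.append((text, lang))
        return ['Q1', 'Q2']

    def test_option_builds_generator(self) -> None:
        gen = generator_from_args(['-lang:de', '-wbsearch:Indien'],
                                  self.fake_search)
        self.assertEqual(list(gen), ['Q1', 'Q2'])
        self.assertEqual(self.seen, [('Indien', 'de')])

    def test_unrelated_args_give_no_generator(self) -> None:
        self.assertIsNone(generator_from_args(['-lang:de'],
                                              self.fake_search))


suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestSearchGenerator)
result = unittest.TextTestRunner().run(suite)
```

The tests in "tests/pagegenerators_tests.py" would exercise the real generator and option parsing; the point here is only the pattern of feeding argument strings in and checking what the generator yields.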

The Phabricator tasks are https://phabricator.wikimedia.org/T68949 and https://phabricator.wikimedia.org/T71255 .


Always refer to https://www.mediawiki.org/wiki/Google_Code-in_2014#Instructions_for_GCI_students for general information and phabricator.wikimedia.org for information on specific tasks.