Populous

Populous is a generic tool for building ontologies from simple spreadsheet-like templates. The Populous approach is useful when a repeating ontology design pattern emerges that needs to be populated en masse. The use of a simple interface, similar to that of a spreadsheet, means that the templates can be populated by users with little or no knowledge of ontology development. Once these templates are populated, Populous supports transforming the data into an OWL ontology using an expressive pattern language.

Spreadsheets are currently transformed into OWL/RDF using the Ontology Pre-Processing Language v2 (OPPL 2), a powerful scripting language for generating and manipulating OWL axioms. Populous provides a wizard-like interface, found in the "Tools" menu, to map spreadsheet data to variables in OPPL patterns.
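The idea of mapping spreadsheet columns to pattern variables can be sketched in a few lines of Python. This is only an illustration: the pattern text below is a simplified placeholder and not real OPPL syntax, and the column-to-variable mapping and row values are made up for the example.

```python
from string import Template

# Hypothetical, simplified pattern. This is NOT real OPPL syntax,
# only a stand-in that shows variables being filled from columns.
PATTERN = Template("Class: $cell SubClassOf: part_of some $anatomy")

# Assumed mapping from pattern variables to spreadsheet columns.
COLUMN_MAP = {"cell": "A", "anatomy": "E"}


def apply_pattern(rows):
    """Instantiate the pattern once for every fully populated row."""
    axioms = []
    for row in rows:
        bindings = {var: row[col] for var, col in COLUMN_MAP.items()}
        # Skip rows where any mapped cell is empty.
        if all(bindings.values()):
            axioms.append(PATTERN.substitute(bindings))
    return axioms


# Made-up rows, keyed by column letter.
rows = [
    {"A": "bladder_cell", "E": "bladder"},
    {"A": "hepatocyte", "E": "liver"},
    {"A": "neuron", "E": ""},
]
axioms = apply_pattern(rows)
for axiom in axioms:
    print(axiom)
```

Each populated row yields one axiom, which is why the approach suits design patterns that repeat many times.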

Populous is built on top of RightField. RightField can be used to create Excel spreadsheets that have ontology-based restrictions on the allowable values in selected cells. RightField spreadsheets allow scientists to annotate their data using standard terminology from ontologies rather than free-text annotations.

Populous and RightField are both open-source, cross-platform Java applications. They use the Apache POI library for interacting with Microsoft documents and manipulating Excel spreadsheets.

[Figure: The Populous user interface, kupo_populous.png]

Documentation is currently provided by a screencast demo of Populous in action. A set of slides from a recent presentation about Populous, given at SWAT4LS 2010, is available on Nature Precedings. The accompanying files used for the demo are provided in the example folder of the downloaded zip file.

The tutorial below gives a good introduction to Populous.

Prerequisites

In order to follow this tutorial you will need a copy of the latest version of Populous and the associated tutorial material. These can be downloaded from the downloads page at http://code.google.com/p/owlpopulous.

You will need Populous_v1.1-beta.zip and the material in Populous_tutorial_SWAT4LS_2011.zip. We will refer to the root folder of the tutorial material as $FILES from now on.

Task 1: Start Populous and load the initial workbook

  • Start Populous using the appropriate script
  • Open the basic cell type workbook from $FILES/Workbook/cell_types.xls

You need to load some ontologies into Populous before you can begin to apply ontological restrictions to areas of the workbook. You can load ontologies from your local file system or directly from BioPortal.

NOTE: When working with large ontologies you may need to increase the amount of memory allocated to Populous. You can do this in the Populous run scripts by increasing the value of the -Xmx1000M JVM parameter (for example, to -Xmx2000M).
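If you prefer not to edit the run script by hand, the change can be sketched as a small text substitution. The script line used below is an assumption for illustration (the real run scripts may look different, and the jar name is hypothetical); only the -Xmx option itself comes from the note above.

```python
import re


def set_max_heap(script_text, megabytes):
    """Replace the value of every -Xmx<N>M option found in the script text."""
    return re.sub(r"-Xmx\d+[mM]", f"-Xmx{megabytes}M", script_text)


# Hypothetical line from a run script; the jar name is made up.
line = "java -Xmx1000M -jar populous.jar"
print(set_max_heap(line, 2000))
```

Text without an -Xmx option is returned unchanged, so the helper is safe to run over a whole script file.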

Task 2: Load Some Ontologies

  • Load the cell type ontology from $FILES/Ontology/Input_ontology/cl-redux.owl
  • Select/highlight the cells in column A from rows 2 to 51. We want to restrict these to terms from below the "cell" term in the cell type ontology. Select the "cell" class from the CTO and select “subclasses” from the “type of allowed values” list. Select the “Apply” button to apply this restriction to the selected range in the workbook.
  • The selected cells in column A should change to a green background indicating that a validation has been set. Any terms in column A that match terms in the cell type ontology will be highlighted in green text. Any unmatched terms will appear in red.
  • At this point save the workbook with a new name.
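The matched/unmatched distinction described above amounts to checking each cell value against the set of allowed term labels. A minimal sketch follows, assuming a hand-written set of labels in place of the real subclasses of "cell", and assuming case-insensitive matching (how Populous itself compares labels is not stated here).

```python
# Hypothetical labels standing in for the subclasses of "cell".
ALLOWED_TERMS = {"bladder cell", "hepatocyte", "neuron"}


def classify_cells(values, allowed=ALLOWED_TERMS):
    """Return (value, status) pairs, where status is 'matched' or 'unmatched'.

    Matching ignores case and surrounding whitespace (an assumption).
    """
    normalised = {term.strip().lower() for term in allowed}
    result = []
    for value in values:
        status = "matched" if value.strip().lower() in normalised else "unmatched"
        result.append((value, status))
    return result


for value, status in classify_cells(["Hepatocyte", "bladder stem cell", "neuron "]):
    print(f"{value!r}: {status}")
```

Unmatched values are not errors: later in the tutorial they become candidates for new terms in the generated ontology.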

Task 3: Create additional ontological restrictions on the workbook

We want to create some additional restrictions on these cell types. Column D is for asserting superclass information and also contains only cell type terms. Column E is for capturing 'part of' relationships to anatomical terms; we use the UBERON ontology to restrict the valid terms in this column. Columns F and G capture germ line and nucleation information about the cells, and we can use the Phenotype and Trait Ontology (PATO) to restrict the values in these columns. Column H captures biological process terms from the Gene Ontology, which are used to describe the function of these cell types. Columns I and J capture further information about the cells, including the cell lineage and potentiality.

  • We can use a collection of ontologies to restrict the columns to the appropriate ontological terms.
  • Apply the “subclasses” restriction on "cell" from the CTO to column D, rows 2 to 51.
  • Load the UBERON ontology from $FILES/Ontology/Input_ontology/uberon_redux.owl
  • Select column E, rows 2 to 51.
  • Apply a restriction to all subclasses of ‘anatomical entity’
  • Open the phenotype ontology from $FILES/Ontology/Input_ontology/PATO.owl
  • Search the ontologies for “mononucleate”
  • Select the “nucleate quality” class and create a restriction on column G to all subclasses of nucleate quality.
  • Load the Gene Ontology from $FILES/Ontology/Input_ontology/go_daily-termdb.owl and apply a restriction on column H to all subclasses of the ‘biological_process’ term.

With the ontology validation loaded we can begin to modify and add new content to the workbook. We can use the auto-complete function on cells in the workbook to assist us in selecting the right terms. For example, for the bladder cells we can begin to add values in Column E to assert "part of" relations between bladder cells and the bladder anatomical region.
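Auto-completion of this kind can be pictured as a prefix search over the labels allowed in a cell. The sketch below is an assumption about the general technique, not a description of how Populous implements it, and the anatomical labels are made up for the example.

```python
from bisect import bisect_left


def autocomplete(prefix, sorted_terms, limit=5):
    """Return up to `limit` terms starting with `prefix`.

    `sorted_terms` must be a sorted list of lower-case labels.
    """
    prefix = prefix.lower()
    # Binary search for the first term that is >= the prefix.
    start = bisect_left(sorted_terms, prefix)
    matches = []
    for term in sorted_terms[start:]:
        if not term.startswith(prefix):
            break  # sorted order: no later term can match
        matches.append(term)
        if len(matches) == limit:
            break
    return matches


# Made-up labels standing in for subclasses of 'anatomical entity'.
terms = sorted(
    t.lower() for t in ["bladder", "bladder lumen", "blastocyst", "liver", "bone marrow"]
)
print(autocomplete("bla", terms))
```

Keeping the labels sorted means each lookup touches only the matching terms, which matters once an ontology the size of UBERON or GO is loaded.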

Task 4: Working in Excel

Populous-generated templates can be exported to the MS Excel .xls format, so users can populate the template using their favourite spreadsheet tool, such as MS Excel or OpenOffice.

  • Save the Populous workbook.
  • Open the saved file in either MS Excel or OpenOffice
  • The workbook can be modified like any normal spreadsheet. Drop down lists of terms are provided as validations on cells to assist the user.

Task 5: Converting Workbook to OWL

Once users have populated a workbook, we want to return to Populous to A) validate the content, and B) convert the content to an OWL ontology.

  • Return to Populous and open the modified cell types workbook. We will use the OPPL wizard to transform the content into an OWL ontology.
  • Start the OPPL wizard from the Populous “Tools” menu. When prompted about opening a previous workflow, say no.
  • Select the columns from the workbook that you want to transform. For this demo we will select columns A, B, D, E, G and H.
  • Select the rows to convert: set the start row to 2 and the end row to 35. Select Continue.
  • On the next panel we choose the ontologies that will be used to create the new ontology. Any ontology already loaded into Populous will be shown. You will need to add an additional ontology that contains the skeleton ontology that we will be adding new terms to.
  • Select “Load from file…” and choose $FILES/Ontology/Input_ontology/properties_populous_tutorial_SWAT4LS2011.owl
  • You can specify an Ontology IRI for the newly created ontology e.g. http://www.populous.org.uk/swat4lstutorial/cell_types.owl
  • Set the physical URI for where the new ontology will be saved e.g. $FILES/Ontology/Output_ontology/cell_types.owl
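The column and row selection made in the wizard corresponds to slicing a rectangular region out of the sheet. A minimal sketch, assuming the sheet has already been read into a list of rows (the tiny grid here is made up); the helper names are hypothetical.

```python
def column_index(letter):
    """Convert a spreadsheet column letter (A, B, ..., Z, AA, ...) to a 0-based index."""
    index = 0
    for ch in letter.upper():
        index = index * 26 + (ord(ch) - ord("A") + 1)
    return index - 1


def select_cells(grid, columns, start_row, end_row):
    """Pick the given columns from rows start_row..end_row.

    Row numbers are 1-based and inclusive, as shown in the spreadsheet.
    """
    cols = [column_index(c) for c in columns]
    return [[row[c] for c in cols] for row in grid[start_row - 1:end_row]]


# Made-up sheet: row 1 is a header, rows 2 and 3 hold data.
grid = [
    ["cell", "label", "notes"],
    ["bladder cell", "Bladder cell", "x"],
    ["hepatocyte", "Hepatocyte", "y"],
]
print(select_cells(grid, ["A", "B"], 2, 3))
```

Starting at row 2, as in the tutorial, skips the header row so that only data rows reach the patterns.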

The next panel is for adding the OPPL patterns that will be executed to generate the new ontology. The OPPL patterns for this tutorial are in $FILES/oppl_script/

  • Load the following oppl patterns:
  1. $FILES/oppl_script/cell_label.oppl
  2. $FILES/oppl_script/subClassOfCell.oppl
  3. $FILES/oppl_script/part_of_Anatomy.oppl
  4. $FILES/oppl_script/cell_label.oppl
  5. $FILES/oppl_script/phenotypic_quality.oppl
  6. $FILES/oppl_script/go_process.oppl
  • Assuming all patterns are validated in green, select Continue. If a pattern is invalid, check that the OPPL pattern syntax is correct. NOTE: For a pattern to validate at this stage, any OWL entity it refers to, such as a class or object property, must already be loaded into Populous.

The next stage is to map variables in the OPPL patterns to columns in the workbook. Map the columns as follows:

[Figure: mapping of OPPL pattern variables to workbook columns, populous_step5.png]

The next panel deals with new entities. Where possible, Populous will use the correct URI from the imported ontologies when referring to terms from ontologies that are already loaded. However, when new or unknown terms are encountered (i.e. ones highlighted in red), Populous will create a new term. The “New Entities” panel allows you to specify how a new URI will be created. You can specify the base URI, whether to use hash or slash URIs, and an auto-numbering (increment) scheme. You can also specify whether a label should be added using the value in the workbook.

  • Choose to auto generate the id. Check create label and set the new URI value prefix to CTODEV_
  • Select Continue. The OPPL scripts will now be executed against the workbook. Once complete the newly generated ontology will be printed out in Manchester OWL syntax.
  • At this point you can save all the settings used in the ontology generation workflow. This will generate an XML file that can be used again when you want to re-run this workflow.
  • Select Finish to close the OPPL wizard
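The new-entity options (base URI, hash or slash separator, prefix and auto-increment) can be illustrated with a small URI generator. This is a sketch of the general scheme, not Populous's actual implementation: the class name, the zero-padding width and the rule that a repeated label reuses its URI are all assumptions.

```python
from itertools import count


class UriMinter:
    """Mint URIs for new terms: base URI + separator + prefix + running number."""

    def __init__(self, base_uri, prefix="CTODEV_", separator="#", start=1, width=7):
        if separator not in ("#", "/"):
            raise ValueError("separator must be '#' (hash URI) or '/' (slash URI)")
        self.base_uri = base_uri.rstrip("#/")
        self.prefix = prefix
        self.separator = separator
        self.width = width  # zero-padding width (an assumption)
        self._numbers = count(start)
        self._minted = {}  # normalised label -> URI, so repeats reuse one URI

    def mint(self, label):
        """Return the URI for a label, creating a new one on first use."""
        key = label.strip().lower()
        if key not in self._minted:
            number = next(self._numbers)
            self._minted[key] = (
                f"{self.base_uri}{self.separator}{self.prefix}{number:0{self.width}d}"
            )
        return self._minted[key]


minter = UriMinter("http://www.populous.org.uk/swat4lstutorial/cell_types.owl")
for label in ["bladder urothelial cell", "bladder stem cell", "bladder urothelial cell"]:
    print(label, "->", minter.mint(label))
```

Generating opaque numeric identifiers and attaching the workbook value as a label keeps the URI stable even if the human-readable name is later corrected.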

Task 6: Viewing the generated ontology in Protégé

Once the OPPL wizard has run, the newly generated ontology will be in $FILES/Ontology/Output_ontology/cell_types.owl. This ontology can now be opened in Protégé for manual inspection. To view the newly generated cells in context, it is advisable to import all the ontologies from $FILES/Ontology/Input_ontology/*.owl. Once imported, the ontologies can be classified (HermiT is recommended) and we can perform a DL query such as “cell that participates_in some 'cytokine production'”.

The alpha release of the Populous extension (v0.9) is available here for download.

Populous requires Java 1.6.

  1. Unzip the file
  2. Windows users execute run.bat
  3. Mac/Unix users execute run command