Semantic Web Mini-Tutorial: Data integration of life science data

Tabular data to RDF

Raw Data → RDF → Query → Results
Diagram 1: Steps for RDF data querying.

We have created a simple data set to explain the different steps in our approach.

Table 1: Tab-delimited data


chr1   147971248        147972628        0.73
chr1   147972629        147973899        1.9

Using a modified version of the Mapper program, we converted the tab-delimited data files into an RDF format that preserves the table structure, column names, and data types. We anticipate that data providers will eventually describe their data formally in OWL. Because no such schema was available for this data set, we constructed the data model ourselves.

Table 2: Data in RDF


<?xml version="1.0"?>

<!DOCTYPE rdf:RDF [
   <!ENTITY xsd "http://www.w3.org/2001/XMLSchema#">

]>

<rdf:RDF
   xmlns:theirModel="http://staff.science.uva.nl/~lpost/DataModels/TheirENCODEChIPchipDataModel.owl#"
   xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
   xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
   xmlns:xsd="http://www.w3.org/2001/XMLSchema#">

   <theirModel:Datafile rdf:about="encodeSangerChipH3K4me3.txt">
      <theirModel:contains>
         <theirModel:Datafile_row rdf:about="encodeSangerChipH3K4me3.xml/row1">
            <theirModel:chrom>chr1</theirModel:chrom>
            <theirModel:chromStart>147971248</theirModel:chromStart>
            <theirModel:chromEnd>147972628</theirModel:chromEnd>
            <theirModel:score>0.73</theirModel:score>
         </theirModel:Datafile_row>
      </theirModel:contains>
      <theirModel:contains>
         <theirModel:Datafile_row rdf:about="encodeSangerChipH3K4me3.xml/row2">
            <theirModel:chrom>chr1</theirModel:chrom>
            <theirModel:chromStart>147972629</theirModel:chromStart>
            <theirModel:chromEnd>147973899</theirModel:chromEnd>
            <theirModel:score>1.9</theirModel:score>
         </theirModel:Datafile_row>
      </theirModel:contains>
   </theirModel:Datafile>

</rdf:RDF>
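
The conversion from Table 1 to Table 2 can be illustrated with a short sketch. This is not the Mapper program itself; it is a minimal Python stand-in that assumes four columns named after the properties in Table 2 and emits plain literals, as Table 2 does. The function name `rows_to_rdf` and its parameters are hypothetical.

```python
from xml.sax.saxutils import escape, quoteattr

# Namespace and property names are copied from Table 2.
MODEL_NS = "http://staff.science.uva.nl/~lpost/DataModels/TheirENCODEChIPchipDataModel.owl#"
COLUMNS = ["chrom", "chromStart", "chromEnd", "score"]


def rows_to_rdf(tsv_text, file_name, row_base):
    """Render tab-delimited text as RDF/XML shaped like Table 2.

    Hypothetical helper for illustration; not the Mapper program.
    """
    out = [
        '<?xml version="1.0"?>',
        "<rdf:RDF",
        f'   xmlns:theirModel="{MODEL_NS}"',
        '   xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">',
        f"   <theirModel:Datafile rdf:about={quoteattr(file_name)}>",
    ]
    # Skip empty lines so a trailing newline does not produce an empty row.
    data_lines = [line for line in tsv_text.splitlines() if line.strip()]
    for number, line in enumerate(data_lines, start=1):
        values = [value.strip() for value in line.split("\t")]
        if len(values) != len(COLUMNS):
            raise ValueError(
                f"row {number}: expected {len(COLUMNS)} fields, got {len(values)}"
            )
        row_uri = f"{row_base}/row{number}"
        out.append("      <theirModel:contains>")
        out.append(f"         <theirModel:Datafile_row rdf:about={quoteattr(row_uri)}>")
        # One property element per column, named after the column.
        for name, value in zip(COLUMNS, values):
            out.append(f"            <theirModel:{name}>{escape(value)}</theirModel:{name}>")
        out.append("         </theirModel:Datafile_row>")
        out.append("      </theirModel:contains>")
    out += ["   </theirModel:Datafile>", "</rdf:RDF>"]
    return "\n".join(out)


# The two rows of Table 1, written with explicit tab separators.
sample = "chr1\t147971248\t147972628\t0.73\nchr1\t147972629\t147973899\t1.9\n"
rdf_xml = rows_to_rdf(
    sample, "encodeSangerChipH3K4me3.txt", "encodeSangerChipH3K4me3.xml"
)
print(rdf_xml)
```

Each table row becomes one `Datafile_row` resource and each cell becomes one property value, which is what lets the later query address a column by its property name.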

After the data have been transformed to RDF, they are loaded into our Sesame repository (version 1.2.4). We use the SeRQL-S option to run a query that finds all data values associated with the chrom property. In this example, the query simply lists every item that has a chromosome ID.

Table 3: SeRQL-S query


SELECT *
FROM {x} theirModel:chrom {y}

USING NAMESPACE
   theirModel = <http://staff.science.uva.nl/~lpost/DataModels/TheirENCODEChIPchipDataModel.owl#>

You can paste the above query into our repository to try it.

The result can be returned as an HTML page or RDF file. In Sesame, the HTML page gives the results as links to entries in the repository for further data exploration.
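
To make the path expression in Table 3 concrete: `{x} theirModel:chrom {y}` matches every statement whose predicate is `theirModel:chrom` and binds `x` to its subject and `y` to its object. The sketch below imitates that matching in Python over a hand-written list of triples taken from Table 2. It is an illustration only, not how Sesame evaluates SeRQL, and the name `match_predicate` is hypothetical.

```python
# Namespace copied from Table 3.
MODEL_NS = "http://staff.science.uva.nl/~lpost/DataModels/TheirENCODEChIPchipDataModel.owl#"

# Part of the Table 2 data as (subject, predicate, object) triples.
# Subjects are kept as the relative identifiers shown in Table 2; a real
# RDF store would resolve them against a base URI.
TRIPLES = [
    ("encodeSangerChipH3K4me3.xml/row1", MODEL_NS + "chrom", "chr1"),
    ("encodeSangerChipH3K4me3.xml/row1", MODEL_NS + "chromStart", "147971248"),
    ("encodeSangerChipH3K4me3.xml/row1", MODEL_NS + "score", "0.73"),
    ("encodeSangerChipH3K4me3.xml/row2", MODEL_NS + "chrom", "chr1"),
    ("encodeSangerChipH3K4me3.xml/row2", MODEL_NS + "chromStart", "147972629"),
    ("encodeSangerChipH3K4me3.xml/row2", MODEL_NS + "score", "1.9"),
]


def match_predicate(triples, predicate):
    """Return an (x, y) pair for every triple whose predicate equals `predicate`."""
    return [(subj, obj) for subj, pred, obj in triples if pred == predicate]


results = match_predicate(TRIPLES, MODEL_NS + "chrom")
for x, y in results:
    print(x, y)
```

Swapping `chrom` for another property name, such as `score`, selects a different column of the original table in the same way.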