We have created a simple data set to explain the different steps in our
approach.
Table 1: Tab-delimited data
chr1 147971248 147972628 0.73
chr1 147972629 147973899 1.9
Using a modification of the Mapper
program, we converted the tab-delimited data files into an RDF format
that preserves the table structure, column names, and data types.
We anticipate a future scenario in which the data provider already
describes the data formally in OWL. Lacking such a schema, we
constructed the data model ourselves.
Table 2: Data in RDF
<?xml version="1.0"?>
<!DOCTYPE rdf:RDF [
<!ENTITY xsd "http://www.w3.org/2001/XMLSchema#">
]>
<rdf:RDF
xmlns:theirModel="http://staff.science.uva.nl/~lpost/DataModels/TheirENCODEChIPchipDataModel.owl#"
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
xmlns:xsd="http://www.w3.org/2001/XMLSchema#">
<theirModel:Datafile rdf:about="encodeSangerChipH3K4me3.txt">
<theirModel:contains>
<theirModel:Datafile_row rdf:about="encodeSangerChipH3K4me3.xml/row1">
<theirModel:chrom>chr1</theirModel:chrom>
<theirModel:chromStart>147971248</theirModel:chromStart>
<theirModel:chromEnd>147972628</theirModel:chromEnd>
<theirModel:score>0.73</theirModel:score>
</theirModel:Datafile_row>
</theirModel:contains>
<theirModel:contains>
<theirModel:Datafile_row rdf:about="encodeSangerChipH3K4me3.xml/row2">
<theirModel:chrom>chr1</theirModel:chrom>
<theirModel:chromStart>147972629</theirModel:chromStart>
<theirModel:chromEnd>147973899</theirModel:chromEnd>
<theirModel:score>1.9</theirModel:score>
</theirModel:Datafile_row>
</theirModel:contains>
</theirModel:Datafile>
</rdf:RDF>
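The conversion step can be illustrated with a minimal Python sketch. It is not the modified Mapper program itself, whose internals are not described here. It assumes that the four columns map, in order, to the property names used in Table 2, and it copies the row identifiers as they appear there. The function name `rows_to_rdf` and the `file_id` parameter are hypothetical. The sketch emits plain literals only and does not attach data types.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
MODEL_NS = ("http://staff.science.uva.nl/~lpost/DataModels/"
            "TheirENCODEChIPchipDataModel.owl#")

# Assumption: column order follows the property names shown in Table 2.
COLUMNS = ("chrom", "chromStart", "chromEnd", "score")


def rows_to_rdf(text, file_id="encodeSangerChipH3K4me3"):
    """Turn tab-delimited rows into RDF/XML shaped like Table 2 (sketch only)."""
    ET.register_namespace("rdf", RDF_NS)
    ET.register_namespace("theirModel", MODEL_NS)

    root = ET.Element(f"{{{RDF_NS}}}RDF")
    datafile = ET.SubElement(
        root,
        f"{{{MODEL_NS}}}Datafile",
        {f"{{{RDF_NS}}}about": f"{file_id}.txt"},
    )

    for number, line in enumerate(text.strip().splitlines(), start=1):
        values = line.split("\t")
        # One theirModel:contains wrapper per row, as in Table 2.
        contains = ET.SubElement(datafile, f"{{{MODEL_NS}}}contains")
        row = ET.SubElement(
            contains,
            f"{{{MODEL_NS}}}Datafile_row",
            {f"{{{RDF_NS}}}about": f"{file_id}.xml/row{number}"},
        )
        # Each column value becomes one property element on the row.
        for name, value in zip(COLUMNS, values):
            ET.SubElement(row, f"{{{MODEL_NS}}}{name}").text = value

    return ET.tostring(root, encoding="unicode")


# The two rows of Table 1.
SAMPLE = "chr1\t147971248\t147972628\t0.73\nchr1\t147972629\t147973899\t1.9\n"

if __name__ == "__main__":
    print(rows_to_rdf(SAMPLE))
```

Keeping the column names in a single tuple makes the mapping from table columns to RDF properties explicit, which is the part a formal OWL description from the data provider would eventually replace.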
After transforming the data to RDF, we load it into the Sesame repository
(version 1.2.4). We use the SeRQL-S option to query our example data set
for all the data values associated with the chrom property. In this
example, we simply list all items that have a chromosome ID.
Table 3: SeRQL-S query
SELECT *
FROM {x} theirModel:chrom {y}
USING NAMESPACE
theirModel = <http://staff.science.uva.nl/~lpost/DataModels/TheirENCODEChIPchipDataModel.owl#>
You can paste the above query into our repository to try it.
The result can be returned as an HTML page or as an RDF file. In Sesame, the HTML page presents the results as
links to entries in the repository, which allows further data exploration.
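To make the meaning of the query in Table 3 concrete, the following Python sketch emulates its path expression `{x} theirModel:chrom {y}` over a shortened copy of the RDF in Table 2. It does not use Sesame or SeRQL: it walks the XML with the standard library and collects every resource that has a `theirModel:chrom` property. The function name `select_chrom` is hypothetical, and subjects are returned as written in `rdf:about`, without the base-URI resolution a real RDF store would apply.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
MODEL_NS = ("http://staff.science.uva.nl/~lpost/DataModels/"
            "TheirENCODEChIPchipDataModel.owl#")

# Shortened copy of Table 2 (only the chrom and score properties are kept).
RDF_XML = f"""<rdf:RDF xmlns:rdf="{RDF_NS}" xmlns:theirModel="{MODEL_NS}">
  <theirModel:Datafile rdf:about="encodeSangerChipH3K4me3.txt">
    <theirModel:contains>
      <theirModel:Datafile_row rdf:about="encodeSangerChipH3K4me3.xml/row1">
        <theirModel:chrom>chr1</theirModel:chrom>
        <theirModel:score>0.73</theirModel:score>
      </theirModel:Datafile_row>
    </theirModel:contains>
    <theirModel:contains>
      <theirModel:Datafile_row rdf:about="encodeSangerChipH3K4me3.xml/row2">
        <theirModel:chrom>chr1</theirModel:chrom>
        <theirModel:score>1.9</theirModel:score>
      </theirModel:Datafile_row>
    </theirModel:contains>
  </theirModel:Datafile>
</rdf:RDF>"""


def select_chrom(rdf_xml):
    """Return (x, y) pairs for every statement x theirModel:chrom y."""
    root = ET.fromstring(rdf_xml)
    results = []
    for node in root.iter():
        subject = node.get(f"{{{RDF_NS}}}about")
        if subject is None:
            continue  # not a resource description, e.g. a property element
        # Only direct children count as properties of this subject.
        for prop in node.findall(f"{{{MODEL_NS}}}chrom"):
            results.append((subject, prop.text))
    return results


if __name__ == "__main__":
    for x, y in select_chrom(RDF_XML):
        print(x, y)
```

The `Datafile` resource itself has no `chrom` property, so only the two row resources are matched, each paired with its chromosome value.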