The histone code case:
A Semantic Web approach to data integration

In the context of the Virtual Laboratory for e-Science project, we investigate a new integrative approach, based on Semantic Web technology, to elucidate the complex relationship between the 'histone code', DNA sequence and gene expression.

Biological background: the histone code case

Histones are proteins that package DNA into higher-order structures and influence processes such as transcription, repair and replication of DNA. They have been implicated in diseases such as cancer (e.g. Santos-Rosa and Caldas, 2005) and Huntington's disease (e.g. Steffan et al., 2001). Histones can undergo several distinct post-translational chemical modifications, including acetylation, methylation, phosphorylation and ubiquitylation. These modifications can be passed on to subsequent cell generations and are hypothesized to govern the transcriptional state of the genome. Because combinations of histone modifications form patterns that can be recognized or acted on by other proteins, these patterns are believed to form a 'histone code' on top of the DNA code (Peterson and Laniel, 2004; Strahl and Allis, 2000). An intricate relationship between the histone code, transcriptional activity and DNA sequence is suspected but poorly understood. Integrating knowledge and data from sources that cover each of these facets may help us reach a deeper understanding of this relationship.

A Semantic Web approach to data integration

From the computer science point of view, we investigate the application of Semantic Web technology to the integration of heterogeneous data sources in an environment for computational experimentation, an 'e-(bio)science' laboratory. We have defined a strategy for annotating biological data with domain-specific ontologies, which entails integrating data and knowledge and building a histone-code knowledge model. We have built our own ontology for histones, 'HistOn', which covers the necessary concepts and levels of granularity. Semantic Web formats such as the Web Ontology Language (OWL) are flexible enough to allow ontologies to be merged and extended as a computational experiment requires. To test our strategy, we want to see whether it can help to answer questions such as: Is there a relationship between histone modifications and transcription factor binding? To address this question, we have used our approach to integrate two datasets from the UCSC Genome Browser: transcription factor binding sites and the binding density of histone H3 trimethylated at lysine 4 (H3K4me3).
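The integration step can be pictured with a small, stdlib-only Python sketch. It is not the project's implementation: instead of OWL and a triple store, it represents annotated records as plain (subject, predicate, object) tuples, merges the two datasets by set union, and asks which transcription factor binding sites fall inside H3K4me3-enriched regions by testing for interval overlap. All class and property names (e.g. `histon:H3K4me3Region`, `ex:locatedAt`) and the toy coordinates are hypothetical placeholders, not real HistOn terms or UCSC data.

```python
from dataclasses import dataclass

# Hypothetical class and property names standing in for HistOn and other
# ontology terms; the actual HistOn vocabulary is not shown in this text.
TFBS_CLASS = "ex:TranscriptionFactorBindingSite"
H3K4ME3_CLASS = "histon:H3K4me3Region"
TYPE, LOCATED_AT, SCORE = "rdf:type", "ex:locatedAt", "ex:score"


@dataclass(frozen=True)
class Region:
    """Genomic interval, 0-based and half-open as in UCSC BED tracks."""
    chrom: str
    start: int
    end: int

    def overlaps(self, other: "Region") -> bool:
        return (self.chrom == other.chrom
                and self.start < other.end
                and other.start < self.end)


def annotate(records, cls, prefix):
    """Convert (id, region, score) rows into (subject, predicate, object) triples."""
    triples = set()
    for rec_id, region, score in records:
        subject = f"{prefix}{rec_id}"
        triples |= {(subject, TYPE, cls),
                    (subject, LOCATED_AT, region),
                    (subject, SCORE, score)}
    return triples


def instances(graph, cls):
    """Map each subject typed with the given class to its genomic region."""
    subjects = {s for s, p, o in graph if p == TYPE and o == cls}
    return {s: o for s, p, o in graph if s in subjects and p == LOCATED_AT}


def tfbs_in_h3k4me3(graph):
    """Return sorted (tfbs, histone_region) pairs whose intervals overlap."""
    tfbs = instances(graph, TFBS_CLASS)
    marks = instances(graph, H3K4ME3_CLASS)
    return sorted((t, m) for t, tr in tfbs.items()
                  for m, mr in marks.items() if tr.overlaps(mr))


# Toy rows for illustration only; not taken from the UCSC tracks.
tfbs_rows = [("tf1", Region("chr1", 100, 120), 0.9),
             ("tf2", Region("chr1", 500, 515), 0.7)]
mark_rows = [("peak1", Region("chr1", 90, 300), 42.0)]

# Merging the two annotated datasets is a set union of their triples.
graph = (annotate(tfbs_rows, TFBS_CLASS, "tfbs:")
         | annotate(mark_rows, H3K4ME3_CLASS, "h3k4:"))
print(tfbs_in_h3k4me3(graph))  # → [('tfbs:tf1', 'h3k4:peak1')]
```

In an actual Semantic Web setting the same question would more typically be posed as a SPARQL query over an RDF store holding the ontology-annotated data; the tuple representation here only mirrors that data model to show why a shared vocabulary makes the two sources joinable.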


People

Lennart Post
Lennart Post is a PhD student at IBU and in the Nuclear Organisation Group headed by Prof. Dr. Roel van Driel. A biochemist by education, he applies and co-develops e-science technology for the histone code case study.

Dr. M. Scott Marshall
M. Scott Marshall is a postdoctoral researcher at IBU. A computer scientist by education, he develops a 'semantic framework' that underlies a virtual laboratory for integrative bioinformatics. Scott is active on behalf of IBU in the W3C Semantic Web Health Care and Life Sciences Interest Group (HCLSIG).

Marco Roos
Marco Roos is a postdoctoral researcher at IBU. A molecular cytologist by training, he also has additional education in computer science. Drawing on his experience in chromatin research and data integration, he bridges the gap between e-science developments and case studies related to DNA function.

Acknowledgements

This work is carried out in the context of the Virtual Laboratory for e-Science project (www.vl-e.nl). This project is supported by a BSIK grant from the Dutch Ministry of Education, Culture and Science (OC&W) and is part of the ICT innovation program of the Ministry of Economic Affairs (EZ).