Friday, May 08, 2015


KaBOB: ontology-based semantic integration of biomedical databases

The recent KaBOB paper describes how a mashup was created from 14 ontologies and 18 data sources converted to RDF, all loaded into a triplestore that is not public. It is great work: a well-designed mashup based on ontologies and data normalization, a quality standard never really achieved in Bio2RDF's triplestores. But it is not available to the bioinformatics community, and rebuilding it from scratch is a lot of work.

The first step of my hackathon project is to rebuild such a mashup from the same data collection and expose it on the web as linked data. I will use the namespace for it.

In the past I would have created a triplestore for it; Virtuoso can easily handle a 500-million-triple beast. This time I will try a different approach and use Elasticsearch instead, with Kibana as a user interface available at
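To give an idea of what loading RDF into Elasticsearch instead of a triplestore can look like, here is a minimal sketch, not the actual loader used for this project. It assumes each RDF subject becomes one JSON document whose fields are the predicates, and it builds the newline-delimited body expected by the Elasticsearch `_bulk` API. The function name `triples_to_bulk`, the index name `kabob`, the `ex:` identifiers and the short prefixed predicate names are all hypothetical, chosen only for illustration.

```python
import json
from collections import defaultdict


def triples_to_bulk(triples, index_name):
    """Group (subject, predicate, object) triples by subject and return
    an Elasticsearch _bulk request body (newline-delimited JSON).

    Assumption: one document per subject, one multi-valued field per predicate.
    """
    docs = defaultdict(lambda: defaultdict(list))
    for subject, predicate, obj in triples:
        docs[subject][predicate].append(obj)

    lines = []
    for subject, fields in docs.items():
        # Action line: index this document under the subject identifier.
        lines.append(json.dumps({"index": {"_index": index_name, "_id": subject}}))
        # Source line: the document itself.
        lines.append(json.dumps(fields))
    # The bulk body must end with a newline.
    return "\n".join(lines) + "\n"


# Hypothetical sample data, not taken from KaBOB or Bio2RDF.
sample = [
    ("ex:gene1", "rdfs:label", "Gene one"),
    ("ex:gene1", "ex:xref", "ex:protein1"),
    ("ex:protein1", "rdfs:label", "Protein one"),
]

if __name__ == "__main__":
    print(triples_to_bulk(sample, "kabob"), end="")
```

The resulting text can be sent with a POST to the `_bulk` endpoint of an Elasticsearch server. Older Elasticsearch versions also expect a `_type` in the action line, and full predicate URIs contain dots that some versions handle poorly as field names, which is why the sketch uses short prefixed names.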

KaBOB currently imports the following 14 ontologies:

1. Basic Formal Ontology (BFO) [9]
2. BRENDA Tissue / Enzyme Source (BTO) [10]
3. Chemical Entities of Biological Interest (ChEBI) [11] (54,838 from ONTOBEE)
4. Cell Type Ontology (CL) [12]
5. Gene Ontology including biological process, molecular function, and cellular component (GO) [7] (42,807 from ONTOBEE)
6. Information Artifact Ontology (IAO) [6]
7. Protein-Protein Interaction Ontology (MI) [13]
8. Mammalian Phenotype Ontology (MP) [14]
9. NCBI Taxonomy [15]
10. Ontology for Biomedical Investigation (OBI) [16]
11. Protein Modification (MOD) [17]
12. Protein Ontology (PR) [18]
13. Relation Ontology (RO) [19]
14. Sequence Ontology (SO) [8]

KaBOB currently imports data from the following 18 data sources:

1. Database of Interacting Proteins (DIP) [20]
2. DrugBank [21] (19,844 from Bio2RDF)
3. Genetic Association Database (GAD) [22] ()
4. UniProt Gene Ontology Annotation (GOA) [23]
5. HUGO Gene Nomenclature Committee (HGNC) [24] (43,407 from Bio2RDF)
6. HomoloGene [25] (18,712 from Bio2RDF)
7. Human Protein Reference Database (HPRD) [26]
8. InterPro [27] (25,272 from Bio2RDF)
9. iRefWeb [28]
10. Mouse Genome Informatics (MGI) [29] ()
11. miRBase [30]
12. NCBI Gene [31] (47,728 from Bio2RDF)
13. Online Mendelian Inheritance in Man (OMIM) [32] (14,609 from Bio2RDF)
14. PharmGKB [33] ()
15. Reactome [34] ()
16. Rat Genome Database (RGD) [35]
17. Transfac [36]
18. UniProt [37] (124,567)

The number in parentheses (in red) is the number of documents/graphs loaded into Elasticsearch (ES).

Data source :


UniProt :

and the corresponding Bio2RDF SPARQL endpoints.

