KaBOB: ontology-based semantic integration of biomedical databases
http://www.biomedcentral.com/1471-2105/16/126/abstract
The recent KaBOB paper describes how a mashup was created from 14 ontologies and 18 data sources converted to RDF, all loaded into a triplestore that has not been made public. It is great work: a well-designed mashup based on ontologies and data normalization, a quality standard never really reached in Bio2RDF's triplestores. But it is not available to the bioinformatics community, and rebuilding it from scratch is a lot of work.
The first step of my hackathon project is to rebuild such a mashup from the same data collection and expose it on the web as linked data. I will use the kabob.bio2rdf.org namespace for it.
In the past I would have created a triplestore for it; Virtuoso can easily handle a 500-million-triple beast. This time I will try something different and use Elasticsearch instead, with Kibana as the user interface, available at http://melina.bio2rdf.org.
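To make the Elasticsearch idea concrete, here is a minimal Python sketch of how RDF triples could be grouped into one JSON document per subject and serialized for the Elasticsearch bulk API. It is only an illustration under assumptions: the index name "kabob", the function names and the sample triples are all made up, and predicates are written as short prefixed names because full URIs contain dots, which Elasticsearch treats specially in field names.

```python
import json
from collections import defaultdict


def triples_to_documents(triples):
    """Group (subject, predicate, object) triples into one dict per subject.

    Each predicate becomes a field holding the list of its object values.
    """
    docs = defaultdict(lambda: defaultdict(list))
    for subject, predicate, obj in triples:
        docs[subject][predicate].append(obj)
    # Convert the nested defaultdicts to plain dicts for serialization.
    return {subject: dict(props) for subject, props in docs.items()}


def to_bulk_ndjson(documents, index_name="kabob"):
    """Serialize documents as newline-delimited JSON for the bulk API.

    The index name "kabob" is an assumption. Each document is written as
    an action line followed by a source line, and the payload ends with
    a newline. Older Elasticsearch versions also expect a "_type" entry
    in the action line.
    """
    lines = []
    for subject, props in documents.items():
        lines.append(json.dumps({"index": {"_index": index_name, "_id": subject}}))
        lines.append(json.dumps(props))
    return "\n".join(lines) + "\n"


# Made-up sample triples, for illustration only.
sample = [
    ("ex:a", "rdfs:label", "A"),
    ("ex:a", "rdf:type", "ex:Gene"),
    ("ex:b", "rdfs:label", "B"),
]
payload = to_bulk_ndjson(triples_to_documents(sample))
```

The resulting payload could then be posted to the `_bulk` endpoint of an Elasticsearch server. One document per subject is a design choice that keeps each resource searchable as a unit in Kibana, at the cost of losing the graph structure a triplestore would keep.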
KaBOB currently imports the following 14 ontologies:
1. Basic Formal Ontology (BFO) [9]
2. BRENDA Tissue / Enzyme Source (BTO) [10]
3. Chemical Entities of Biological Interest (ChEBI) [11]
(54,838 from ONTOBEE)
4. Cell Type Ontology (CL) [12]
5. Gene Ontology including biological process, molecular function, and cellular component (GO) [7]
(42,807 from ONTOBEE)
6. Information Artifact Ontology (IAO) [6]
7. Protein-Protein Interaction Ontology (MI) [13]
8. Mammalian Phenotype Ontology (MP) [14]
9. NCBI Taxonomy [15]
10. Ontology for Biomedical Investigation (OBI) [16]
11. Protein Modification (MOD) [17]
12. Protein Ontology (PR) [18]
13. Relation Ontology (RO) [19]
14. Sequence Ontology (SO) [8]
KaBOB currently imports data from the following 18 data sources:
1. Database of Interacting Proteins (DIP) [20]
2. DrugBank [21] (19,844 from Bio2RDF)
3. Genetic Association Database (GAD) [22] ()
4. UniProt Gene Ontology Annotation (GOA) [23]
5. HUGO Gene Nomenclature Committee (HGNC) [24] (43,407 from Bio2RDF)
6. HomoloGene [25] (18,712 from Bio2RDF)
7. Human Protein Reference Database (HPRD) [26]
8. InterPro [27] (25,272 from Bio2RDF)
9. iRefWeb [28]
10. Mouse Genome Informatics (MGI) [29] ()
11. miRBase [30]
12. NCBI Gene [31] (47,728 from Bio2RDF)
13. Online Mendelian Inheritance in Man (OMIM) [32] (14,609 from Bio2RDF)
14. PharmGKB [33] ()
15. Reactome [34] ()
16. Rat Genome Database (RGD) [35]
17. Transfac [36]
18. UniProt [37] (124,567)
The numbers in parentheses are the number of documents/graphs loaded into Elasticsearch (ES).
Data sources:
OBO: http://www.ontobee.org/sparql
UniProt: http://beta.sparql.uniprot.org/sparql
and the corresponding Bio2RDF SPARQL endpoints.
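
As a rough idea of how data could be pulled from these endpoints, the Python sketch below builds a standard SPARQL protocol GET request and turns a SPARQL JSON result into triples. It is an assumption about one possible approach, not the code actually used for this project: the function names and the query are hypothetical, and it presumes the query selects the variables ?s, ?p and ?o and that the endpoint can return JSON results.

```python
import json
import urllib.parse
import urllib.request


def build_sparql_request(endpoint, query):
    """Build a GET request following the SPARQL protocol.

    The query goes in the "query" URL parameter and JSON results are
    requested through the Accept header.
    """
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    headers = {"Accept": "application/sparql-results+json"}
    return urllib.request.Request(url, headers=headers)


def bindings_to_triples(result):
    """Convert a SPARQL JSON result into (s, p, o) tuples.

    Assumes the query selected the variables ?s, ?p and ?o.
    """
    triples = []
    for row in result["results"]["bindings"]:
        triples.append((row["s"]["value"], row["p"]["value"], row["o"]["value"]))
    return triples


def fetch_triples(endpoint, query, timeout=30):
    """Send the query to the endpoint and return the resulting triples."""
    request = build_sparql_request(endpoint, query)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return bindings_to_triples(json.load(response))


# Hypothetical query; a real harvest would page through the results.
EXAMPLE_QUERY = "SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 10"
```

Calling `fetch_triples("http://www.ontobee.org/sparql", EXAMPLE_QUERY)` would issue the request against the OBO endpoint listed above, provided it is reachable and accepts the JSON result format.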