It is a good occasion to explore new avenues for exposing RDF biological knowledge in the big data era. So let's try Elasticsearch (https://www.elastic.co/products/elasticsearch):
it is free, it is fast and it scales. This would not be doable without the recent availability of a JSON serialization of RDF, the JSON-LD project (http://json-ld.org/).
I will use the JSON-LD converter written by Peter Ansell, one of the major contributors to Bio2RDF (https://github.com/jsonld-java).
So let's try to load some Bio2RDF triples into Elasticsearch! I have 24 hours to explore this new approach.
Here is what we will try to achieve:
RDF2ES: Bring KaBOB online as RDF REST services using Elasticsearch
- Description. KaBOB is a semantic integration of 18 different biomedically relevant knowledge sources. The linked paper describes processes for instantiating it as RDF, but does not provide a functional implementation. This is likely because of the significant challenges involved in stably hosting a very large SPARQL endpoint. Perhaps SPARQL isn’t the best way to share this content. This project is to figure out a way to make the useful data integration work done in KaBOB available via a set of web services that are both fast and reliable. We are willing to sacrifice some of the flexibility of a full SPARQL endpoint to gain a functional app, perhaps using Elasticsearch.
- First, we will load part of the KaBOB data sources for human into an Elasticsearch cluster (OMIM, GO, ChEBI, DrugBank, OBO ontologies, Reactome, UniProt and Entrez Gene).
- Second, we will build REST services to access it; they will be available for hacking.
- Third, we will explore this data using the Kibana tool.
- Finally, we will illustrate how a Talend workflow consuming RDF data can replace a complex SPARQL query. The querying workflow will be published on myExperiment.
- Input. Instructions for integrating 18 different biological data sources, plus code, at: https://github.com/UCDenver-ccp/datasource https://github.com/drlivingston/kr https://github.com/drlivingston/kabob I will use the Bio2RDF version of the datasets selected for KaBOB.
If someone has access to the KaBOB RDF data itself, we could load it into the Elasticsearch cluster as well.
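To make the loading step concrete, here is a minimal sketch of how JSON-LD documents could be packed into the newline-delimited body that the Elasticsearch `_bulk` endpoint accepts. The index name `bio2rdf`, the helper name `to_bulk_ndjson` and the sample record are assumptions made for illustration; they are not taken from the actual Bio2RDF or KaBOB data. Older Elasticsearch releases also expect a `_type` in the action line, which is left out here.

```python
import json


def to_bulk_ndjson(docs, index_name):
    """Serialize JSON-LD documents into a body for the _bulk endpoint.

    Each document becomes two lines: an action line and the source line.
    """
    lines = []
    for doc in docs:
        # Assumption: the JSON-LD "@id" (the resource URI) is reused as the
        # Elasticsearch document id, so reloading a resource overwrites it.
        action = {"index": {"_index": index_name, "_id": doc["@id"]}}
        lines.append(json.dumps(action))
        lines.append(json.dumps(doc))
    # The bulk body must be terminated by a newline.
    return "\n".join(lines) + "\n"


# Hypothetical, hand-written JSON-LD record (not real Bio2RDF data).
example_docs = [
    {
        "@context": {"label": "http://www.w3.org/2000/01/rdf-schema#label"},
        "@id": "http://bio2rdf.org/example:0001",
        "label": "example resource",
    }
]

bulk_body = to_bulk_ndjson(example_docs, "bio2rdf")
print(bulk_body)
```

The resulting string would be sent as the body of a POST to the cluster's `_bulk` endpoint with the `Content-Type: application/x-ndjson` header. Compacting the JSON-LD with a context first keeps the field names short, which should make the documents easier to query and to browse in Kibana.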
We will try to create a type-ahead user experience over those datasets, a feature that Bio2RDF (bio2rdf.org) has always been missing.
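One possible way to get type-ahead behaviour is a `match_phrase_prefix` query, which matches documents whose field contains the typed words, with the last word treated as a prefix. The sketch below only builds the search body; the field name `label` and the helper `typeahead_query` are assumptions, and the completion suggester would be an alternative worth comparing for speed.

```python
import json


def typeahead_query(prefix, field="label", size=10):
    """Build a search body returning documents whose field starts like prefix."""
    return {
        "size": size,  # number of suggestions to return
        "query": {
            "match_phrase_prefix": {
                # max_expansions caps how many terms the last word expands to.
                field: {"query": prefix, "max_expansions": 50}
            }
        },
    }


# What the client would send after the user has typed "insul".
body = typeahead_query("insul")
print(json.dumps(body, indent=2))
```

This body would be POSTed to the `_search` endpoint of the index on each keystroke, ideally with some debouncing on the client side.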
Finally, we will explore the data visualisation potential of the Kibana tool over Elasticsearch data in JSON-LD format.