Friday, May 08, 2015

Bio2RDF's 10th birthday this year, and I am back on the biohacking road

This weekend is the first biohackathon about BD2K in San Diego:

https://github.com/Network-of-BioThings/nob-hq/wiki/1st-BD2K-3rd-Network-of-BioThings-Hackathon

It is a good occasion to explore a new avenue for exposing RDF biological knowledge in the big data era. So let's try Elasticsearch (https://www.elastic.co/products/elasticsearch).

It is free, it is fast, and it scales. This would not be doable without the recent availability of a JSON serialization of RDF: JSON-LD (http://json-ld.org/).

I will use the JSON-LD converter written by Peter Ansell, one of the major contributors to Bio2RDF (https://github.com/jsonld-java).
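To make the idea of "RDF as JSON" concrete, here is a minimal hand-rolled sketch in Python that groups a few triples into JSON-LD expanded-form node objects. It is not the jsonld-java converter and does not reproduce its API. The triples themselves, the `http://example.org/vocab/Gene` type and the `triples_to_jsonld` helper are assumptions made up for illustration.

```python
import json
from collections import defaultdict

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
RDFS_SEE_ALSO = "http://www.w3.org/2000/01/rdf-schema#seeAlso"


def triples_to_jsonld(triples):
    """Group (subject, predicate, object, object_is_iri) tuples by subject
    and return a list of JSON-LD expanded-form node objects."""
    nodes = defaultdict(dict)
    for subject, predicate, obj, object_is_iri in triples:
        node = nodes[subject]
        node["@id"] = subject
        if predicate == RDF_TYPE:
            # rdf:type is written with the @type keyword in JSON-LD.
            node.setdefault("@type", []).append(obj)
        else:
            # IRIs become {"@id": ...}, literals become {"@value": ...}.
            value = {"@id": obj} if object_is_iri else {"@value": obj}
            node.setdefault(predicate, []).append(value)
    return list(nodes.values())


# Illustrative triples only; they are not taken from a Bio2RDF dump.
example_triples = [
    ("http://bio2rdf.org/geneid:6647", RDF_TYPE,
     "http://example.org/vocab/Gene", True),
    ("http://bio2rdf.org/geneid:6647", RDFS_LABEL,
     "superoxide dismutase 1", False),
    ("http://bio2rdf.org/geneid:6647", RDFS_SEE_ALSO,
     "http://bio2rdf.org/hugo:SOD1", True),
]

jsonld_nodes = triples_to_jsonld(example_triples)
print(json.dumps(jsonld_nodes, indent=2))
```

Each subject ends up as one JSON object, which is the shape a document store can index directly.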

So let's try to load some Bio2RDF triples into Elasticsearch! I have 24 hours to explore this new approach.

Here is what we will try to achieve:

  1. RDF2ES: Bring KaBOB online as RDF REST services using Elasticsearch

    1. Description. KaBOB is a semantic integration of 18 different biomedically relevant knowledge sources. The linked paper describes processes for instantiating it as RDF, but does not provide a functional implementation. This is likely because of the significant challenges involved in stably hosting a very large SPARQL endpoint. Perhaps SPARQL isn't the best way to share this content. This project is to figure out a way to make the useful data integration work done in KaBOB available via a set of web services that are both fast and reliable. We are willing to sacrifice some of the flexibility of a full SPARQL endpoint to gain a functional app, perhaps using Elasticsearch.
      1. First, we will load part of the KaBOB data sources for human into an Elasticsearch cluster (OMIM, GO, ChEBI, DrugBank, OBO ontologies, Reactome, UniProt and Entrez Gene).
      2. Second, we will build REST services to access it; they will be available for hacking.
      3. Third, we will explore this data using the Kibana tool.
      4. Finally, we will illustrate how a Talend workflow consuming RDF data can replace a complex SPARQL query. The querying workflow will be exposed at myExperiment.
    2. Input. Instructions for integrating 18 different biological data sources, plus code, at: https://github.com/UCDenver-ccp/datasource https://github.com/drlivingston/kr https://github.com/drlivingston/kabob I will use the Bio2RDF version of the selected KaBOB datasets.

       If someone has access to the KaBOB RDF data, we could load it into the ES triplestore.
    3. Output. Web services that provide useful answers to questions about genes, biological processes, and diseases. Those REST services will be created the way the Bio2RDF API was: they are generated using the Talend ESB tool (http://bio2rdf.org/test), and the Virtuoso triplestore will be replaced by ES storage.
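As a rough sketch of the loading step in the plan above, the snippet below builds a request body for the Elasticsearch bulk API from a few JSON-LD-style documents. It only builds the newline-delimited payload and does not talk to a cluster. The index name `kabob_human`, the sample documents and the `build_bulk_body` helper are assumptions for illustration; Elasticsearch releases from around 2015 would also expect a `_type` in each action line.

```python
import json


def build_bulk_body(index_name, documents):
    """Return a newline-delimited bulk payload: one action line followed by
    one source line per document, using the "@id" IRI as the document id."""
    lines = []
    for doc in documents:
        action = {"index": {"_index": index_name, "_id": doc["@id"]}}
        lines.append(json.dumps(action))
        lines.append(json.dumps(doc))
    # The bulk API requires the payload to end with a newline.
    return "\n".join(lines) + "\n"


# Illustrative documents only; field names are hypothetical.
sample_docs = [
    {"@id": "http://bio2rdf.org/geneid:6647", "label": "superoxide dismutase 1"},
    {"@id": "http://bio2rdf.org/hugo:SOD1", "label": "SOD1"},
]

bulk_body = build_bulk_body("kabob_human", sample_docs)
print(bulk_body)
```

The resulting string would be POSTed to the cluster's `_bulk` endpoint with the `application/x-ndjson` content type; batching many documents per request is what makes loading a large set of triples practical.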

We will try to create a type-ahead user experience over those datasets, a feature that Bio2RDF has always been missing (bio2rdf.org).
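One possible way to get type-ahead behaviour is Elasticsearch's `match_phrase_prefix` query, which treats the last word typed as a prefix. The sketch below only builds the search request body; the `label` field name and the `typeahead_query` helper are assumptions, and this is not necessarily the approach the hackathon will end up using.

```python
import json


def typeahead_query(field, typed_text, size=10):
    """Build a search body that matches documents whose `field` starts a
    phrase with the words typed so far (last word treated as a prefix)."""
    return {
        "size": size,
        "query": {
            "match_phrase_prefix": {
                field: {"query": typed_text}
            }
        },
    }


# What the user has typed so far in the search box (illustrative).
search_body = typeahead_query("label", "superoxide dis")
print(json.dumps(search_body, indent=2))
```

This body would be sent to an index's `_search` endpoint on each keystroke, returning at most `size` suggestions.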

Finally, we will explore the data visualisation potential of the Kibana tool over Elasticsearch data in JSON-LD format.

Sunday, September 11, 2011

Bio2RDF: moving forward as a community


Last week we held our first virtual meeting towards re-invigorating the Bio2RDF project with a significantly larger and vested community. Following those discussions, we plan to establish three focus groups:

A. policy (information, governance, sustainability, outreach)
B. technical (architecture, infrastructure and RDFization)
C. social (user experience and social networking)

The next step then is for groups to:
1. identify and certify discussion leads (responsibilities: set meeting times and agenda, facilitate and encourage discussion among members, draft reports)
2. identify additional people to recruit from the wider community that would provide additional expertise (interested, but didn't attend the first meeting? sign up now!)
3. extend and prioritize discussion items (what exactly will this group focus its efforts on in the short and long term)
4. identify and assign bite-sized tasks (so we can get things done one step at a time :)
5. collate results and present to the wider community

I suggest that groups self-organize a first meeting in the next two weeks to deal with items 1-4, and either meet again or use the Google documents to collaboratively report findings.

Finally, I'd like for us to hold another meeting at times that are much more accommodating for Europe + North America ;)  Please fill in the Doodle poll (http://www.doodle.com/fsuz6mgs5cztf2e2).
As always, feel free to contact me if you have any questions, and please sign up to the Bio2RDF mailing list for all future discussions.

Wednesday, October 20, 2010

Tuesday, October 05, 2010

Bio2RDF returns to Japan

Bio2RDF is returning to Japan again this year. We will give a talk about Bio2RDF at Biocuration 2010. Biocuration runs from October 11th to October 14th in Odaiba, Tokyo.

Wednesday, February 10, 2010

Bio2RDF Cognoscope presentation at BioHackathon 2010 in Tokyo

François Belleau from the Bio2RDF project was invited, as an early adopter of Semantic Web technology, to present the Bio2RDF project at BioHackathon 2010, the annual event held in Tokyo.

Monday, November 30, 2009

Registry for original provider HTML pages

If you weren't aware, the Bio2RDF project offers both RDF and a service that redirects to HTML, images, or other non-RDF sources that could be useful.

The HTML redirect service is particularly useful, because one can start at the Bio2RDF page and follow a link that looks like "http://bio2rdf.org/html/namespace:identifier" to get to the original provider's web page.
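The link pattern quoted above can be sketched as a small Python helper. The `html_redirect_url` function and the lowercasing of the namespace are assumptions for illustration; only the "http://bio2rdf.org/html/namespace:identifier" shape comes from this post.

```python
BIO2RDF_BASE = "http://bio2rdf.org"


def html_redirect_url(namespace, identifier):
    """Build a link of the form http://bio2rdf.org/html/namespace:identifier."""
    if not namespace or not identifier:
        raise ValueError("both namespace and identifier are required")
    # Registered namespace prefixes are written in lowercase.
    return f"{BIO2RDF_BASE}/html/{namespace.strip().lower()}:{identifier.strip()}"


print(html_redirect_url("geneid", "6647"))
```

Following such a link hands the browser over to the redirect service, which then sends it on to the page registered for that namespace.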

There are currently 142 namespaces that are registered along with HTML pages. Examples of these links are the NextBio page for Amyloid Beta precursor protein (http://bio2rdf.org/nextbio:1445), the NCBI Entrez Geneid page for Superoxide dismutase 1 (http://bio2rdf.org/geneid:6647), the PharmGKB page for Superoxide dismutase 1 (http://bio2rdf.org/pharmgkb:PA334), and the HGNC page for Superoxide dismutase 1 (http://bio2rdf.org/hugo:SOD1).

The list below details the namespace prefixes that are currently registered with Bio2RDF for this service. A full set of details about what services are provided for any particular namespace is provided here, and the entire RDF configuration that makes the Bio2RDF system work is available here (RDF/XML).

aceview, agi_locuscode, arrayexpress, asap, aspgd, aspgd_locus, aspgd_ref, bind, biogrid, biomodels, biopatml, biosystems, brenda, cas, cath, ccds, cdd, cgd, cgsc, chebi, chemidplus, cid, citations, cog, cpath, cpd, dbpedia, dbsnp, ddbj, dictybase, dictybase_trials, dip, doi, dr, drugbank_drugs, ec, echobase, eck, ecogene, embl, ensembl, enzyme, flybase, gdb, genbank, genedb_pfalciparum, genedb_spombe, geneid, gi, gl, go, goa_ref, gopubmed, gr, gr_gene, gr_protein, gr_qtl, gr_ref, h-invdb, h_inv, hgnc, homologene, hpa, hpa_antibody, hprd, huge_navigator, hugo, intact, interpro, ipi, iproclass, isbn, issn, keywords, lifedb, linkedct_trials, ma, mesh, metacyc, mgc, mgi, msdchem, myexp_user, myexp_workflow, nar, ncbi, nextbio, nist_chemistry_webbook, nmrshiftdb_molecule, oclc, omim, pamgo_vmd, path, pathguide, pdb, pdbsum, pfam, pharmgkb, phosphosite, po, prints, prodom, prosite, pseudocap, psimod, pubchem, pubmed, reactome, rebase, refseq, rgd, rn, scop, seed, sgd, sgd_locus, sgd_ref, sgn, sgn_ref, sid, sider_drugs, sider_sideeffects, smart, so, srs, swoogle, symbol, tair_arabidopsis, taxon, taxonomy, tc, tgd_locus, tgd_ref, um-bbd, uniparc, uniprot, uniref, unists, wikipathways, wikipedia, xenbase, zfin


If you know of a biological database that has web pages for its items and is not listed here, feel free to comment about it here or email the group at bio2rdf@googlegroups.com