Wednesday, October 20, 2010

Tuesday, October 05, 2010

Bio2RDF returns to Japan

Bio2RDF is returning to Japan again this year. We will give a talk about Bio2RDF at Biocuration 2010. Biocuration runs from October 11th to October 14th in Odaiba, Tokyo.

Wednesday, February 10, 2010

Bio2RDF Cognoscope presentation at BioHackathon 2010 in Tokyo

François Belleau from the Bio2RDF project was invited, as an early adopter of Semantic Web technology, to present the Bio2RDF project at BioHackathon 2010, the annual event held in Tokyo.

Monday, November 30, 2009

Registry for original provider HTML pages

If you weren't aware, the Bio2RDF project offers both RDF and a service that redirects to HTML, images, or other non-RDF sources that could be useful.

The HTML redirect service is particularly useful because one can start at the Bio2RDF page and follow a link that looks like "http://bio2rdf.org/html/namespace:identifier" to get to the original provider's web page.

There are currently 142 namespaces registered along with HTML pages. Examples of these links are the NextBio page for Amyloid Beta precursor protein (http://bio2rdf.org/nextbio:1445), the NCBI Entrez Geneid page for Superoxide dismutase 1 (http://bio2rdf.org/geneid:6647), the PharmGKB page for Superoxide dismutase 1 (http://bio2rdf.org/pharmgkb:PA334), and the HGNC page for Superoxide dismutase 1 (http://bio2rdf.org/hugo:SOD1).

The list below details the namespace prefixes that are currently registered with Bio2RDF for this service. A full set of details about which services are provided for any particular namespace is available here, and the entire RDF configuration that makes the Bio2RDF system work is available here (RDF/XML).

aceview, agi_locuscode, arrayexpress, asap, aspgd, aspgd_locus, aspgd_ref, bind, biogrid, biomodels, biopatml, biosystems, brenda, cas, cath, ccds, cdd, cgd, cgsc, chebi, chemidplus, cid, citations, cog, cpath, cpd, dbpedia, dbsnp, ddbj, dictybase, dictybase_trials, dip, doi, dr, drugbank_drugs, ec, echobase, eck, ecogene, embl, ensembl, enzyme, flybase, gdb, genbank, genedb_pfalciparum, genedb_spombe, geneid, gi, gl, go, goa_ref, gopubmed, gr, gr_gene, gr_protein, gr_qtl, gr_ref, h-invdb, h_inv, hgnc, homologene, hpa, hpa_antibody, hprd, huge_navigator, hugo, intact, interpro, ipi, iproclass, isbn, issn, keywords, lifedb, linkedct_trials, ma, mesh, metacyc, mgc, mgi, msdchem, myexp_user, myexp_workflow, nar, ncbi, nextbio, nist_chemistry_webbook, nmrshiftdb_molecule, oclc, omim, pamgo_vmd, path, pathguide, pdb, pdbsum, pfam, pharmgkb, phosphosite, po, prints, prodom, prosite, pseudocap, psimod, pubchem, pubmed, reactome, rebase, refseq, rgd, rn, scop, seed, sgd, sgd_locus, sgd_ref, sgn, sgn_ref, sid, sider_drugs, sider_sideeffects, smart, so, srs, swoogle, symbol, tair_arabidopsis, taxon, taxonomy, tc, tgd_locus, tgd_ref, um-bbd, uniparc, uniprot, uniref, unists, wikipathways, wikipedia, xenbase, zfin


If you know of a biological database that has web pages for its items and is not listed here, feel free to comment about it here or email the group at bio2rdf@googlegroups.com.

Monday, September 14, 2009

Linking Open Drug Data wins the Triplify challenge

Congratulations to Kei's group and their Linking Open Drug Data (LODD) project for winning the Triplify challenge.

http://blog.aksw.org/2009/triplification-challenge-2009-winners/

http://triplify.org/files/challenge_2009/LODD.pdf


It is a new contribution to the LOD cloud, and they have linked those new datasets to Bio2RDF and DBpedia URIs. That is the right way to do it!

Sunday, August 16, 2009

HOWTO: Using Bio2RDF

A Bio2RDF URI is formed by taking a datasource and assigning a namespace prefix to it. The prefix is a string that may contain only letters, numbers, the underscore (_), and the hyphen (-). The unique identifier of each object inside the namespace, which acts as the primary key for that object, is then combined with the namespace prefix to make up the Bio2RDF URI, http://bio2rdf.org/namespaceprefix:identifier. As an example, suppose a user wants to find information about Propranolol and knows there is a Wikipedia article about the topic. Since DBpedia mirrors the Wikipedia structure and represents it using RDF, they could go to http://bio2rdf.org/dbpedia:Propranolol.
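The URI pattern above can be sketched in a few lines of Python. This is only an illustration: the function name bio2rdf_uri and the regular expression are assumptions made for this example, derived from the character rule described above, and are not part of the Bio2RDF software.

```python
import re

BASE = "http://bio2rdf.org/"

# Letters, numbers, underscore and hyphen only, following the rule above.
PREFIX_RE = re.compile(r"[A-Za-z0-9_-]+")


def bio2rdf_uri(prefix: str, identifier: str) -> str:
    """Build http://bio2rdf.org/namespaceprefix:identifier (hypothetical helper)."""
    if not PREFIX_RE.fullmatch(prefix):
        raise ValueError(f"invalid namespace prefix: {prefix!r}")
    if not identifier:
        raise ValueError("identifier must not be empty")
    return f"{BASE}{prefix}:{identifier}"


print(bio2rdf_uri("dbpedia", "Propranolol"))
```

Rejecting a bad prefix early keeps malformed URIs from being sent to the resolver at all.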

If the user then wants to find out where the Wikipedia article on Propranolol is referenced in other databases, they can go to http://bio2rdf.org/links/dbpedia:Propranolol (this may take a long time given the number of databases being queried). If they only need to find out where the article is referenced in DrugBank, they can use http://bio2rdf.org/linksns/drugbank_drugs/dbpedia:Propranolol (this should be much quicker because fewer databases are queried).
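The choice between the two link lookups can be expressed as a small helper. This is a sketch only; links_url is a hypothetical name, and the function simply reproduces the two URL shapes shown above.

```python
BASE = "http://bio2rdf.org/"


def links_url(resource, target_namespace=None):
    """Return the /links/ URL, or the narrower /linksns/ URL when a
    target namespace is given (hypothetical helper)."""
    if target_namespace is None:
        # Searches every database, so it may be slow.
        return f"{BASE}links/{resource}"
    # Restricts the lookup to a single namespace.
    return f"{BASE}linksns/{target_namespace}/{resource}"


print(links_url("dbpedia:Propranolol"))
print(links_url("dbpedia:Propranolol", "drugbank_drugs"))
```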

There is also search functionality embedded in the Bio2RDF system. Searches can be conducted on particular namespaces or across the entire Bio2RDF system. If a user wants to search the namespace "chebi" for "propanolol", for instance, they could go to http://bio2rdf.org/searchns/chebi/propanolol. If they then wish to search for "propanolol" across all namespaces, they can go to http://bio2rdf.org/search/propanolol (this may be slow because of the number of databases that are available for search).
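A search URL can be assembled in the same way. In this sketch the name search_url is hypothetical, and percent-encoding the search term is an assumption made here for safety with spaces and punctuation; the post does not say how the resolver treats such characters.

```python
from urllib.parse import quote

BASE = "http://bio2rdf.org/"


def search_url(term, namespace=None):
    """Build a /search/ or /searchns/ URL (hypothetical helper)."""
    # Assumption: percent-encode the term so it is safe inside a URL path.
    encoded = quote(term, safe="")
    if namespace is None:
        return f"{BASE}search/{encoded}"
    return f"{BASE}searchns/{namespace}/{encoded}"


print(search_url("propanolol", "chebi"))
print(search_url("propanolol"))
```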

If a namespace has been configured with the ability to redirect to its original interface, the redirection can be triggered by sending users to http://bio2rdf.org/html/namespace:identifier. For example, a user might be interested in http://bio2rdf.org/drugbank_drugs:DB00571 (the DrugBank identifier for Propranolol) and want to see the original DrugBank interface. They could then go to http://bio2rdf.org/html/drugbank_drugs:DB00571, and their browser would be redirected to the description of that drug on the original DrugBank interface. Although not all namespaces have their original HTML interfaces encoded into the Bio2RDF system, some do, and it is a useful way of getting back to the non-RDF web.

If someone is interested in taking the Bio2RDF RDF versions and using them internally, they can request either of the supported RDF formats (RDF/XML and N3) by adding /rdfxml/ or /n3/ to the front of any of the URLs they desire. Each of the links given in this post requests the Bio2RDF HTML version using /page/, but the same queries can equivalently be requested in RDF/XML or N3, for example http://bio2rdf.org/rdfxml/linksns/drugbank_drugs/dbpedia:Propranolol or http://bio2rdf.org/n3/search/propanolol respectively.

There are also advanced features for people wanting to determine the provenance of particular documents, since RDF doesn't natively support provenance for individual statements when multiple sources are merged into single documents, as Bio2RDF does. If the user wishes to know which sources of information were used in a particular document, they can insert /queryplan/ at the start of the URI to get its provenance information, for example http://bio2rdf.org/queryplan/linksns/drugbank_drugs/dbpedia:Propranolol. This information is returned as a set of objects, including Query Types, Providers and Namespaces, among other things. It can then be used to recreate the exact set of queries, both SPARQL and otherwise, that were used to access the information, as long as the user has access to all of the provider endpoints in the query plan. To replicate the queries, users could perform a SPARQL query on the resulting document such as "SELECT ?endpoint ?query WHERE { ?queryBundle a <http://bio2rdf.org/ns/querybundle:QueryBundle> . ?queryBundle <http://bio2rdf.org/ns/querybundle:hasQueryLiteral> ?query . ?queryBundle <http://bio2rdf.org/ns/querybundle:hasQueryBundleEndpoint> ?endpoint . }". Replaying these queries may not return exactly the same results, because there are also normalisation rules, which require knowledge of the Provider configuration in use (all of which is included in the document). To get these, a more advanced query would be required: one that references the predicate http://bio2rdf.org/ns/querybundle:hasProviderConfigurationUri, which is also attached to each querybundle, to determine which Provider was being used, and the predicate http://bio2rdf.org/ns/provider:needsRdfNormalisation to determine which RDF Normalisation rules were required by that provider configuration.
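The SPARQL query quoted above could be run from Python roughly as follows. This is a sketch under stated assumptions: the function queries_by_endpoint is a name invented here, and the graph argument is assumed to be any object with an rdflib-style query() method that yields (endpoint, query) rows, such as an rdflib Graph into which the queryplan document has been parsed.

```python
QUERY_BUNDLE_NS = "http://bio2rdf.org/ns/querybundle:"

# The same query as quoted in the text, laid out over several lines.
ENDPOINT_QUERY = f"""
SELECT ?endpoint ?query WHERE {{
  ?queryBundle a <{QUERY_BUNDLE_NS}QueryBundle> .
  ?queryBundle <{QUERY_BUNDLE_NS}hasQueryLiteral> ?query .
  ?queryBundle <{QUERY_BUNDLE_NS}hasQueryBundleEndpoint> ?endpoint .
}}
"""


def queries_by_endpoint(graph):
    """Group the query literals found in a queryplan document by endpoint.

    Assumption: graph.query(sparql) returns an iterable of two-item rows,
    as an rdflib Graph does for a two-variable SELECT.
    """
    grouped = {}
    for endpoint, query in graph.query(ENDPOINT_QUERY):
        grouped.setdefault(str(endpoint), []).append(str(query))
    return grouped
```

With rdflib installed, the graph might be loaded with something like rdflib.Graph().parse(url, format="xml") on the /rdfxml/queryplan/ version of a URL. The grouped result shows which queries went to which endpoint, but it does not apply the normalisation rules mentioned above.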

If there are too many results to return in one hit from a particular endpoint, the results given to the user will not be complete. Although there is currently no way of signalling this to users in the RDF document, users can manually inspect the queryplan to determine what the maximum will be. If the number of results is equal to or greater than this number, they can request subsequent offsets using the /pageoffsetNN/ mechanism, where NN is one or more digits indicating which page of results is being requested. /pageoffset32/, for instance, would be interpreted as the 32nd page of results, while /pageoffset1/ is the first page, which is the default if nothing is specified. Each pageoffset may not return the same number of results, for three reasons: the resolution is implemented by distributing queries across endpoints; it is not efficient (or possible in some cases) to query endpoints for the number of results before getting the information; and there is no natural ordering between the results returned by different endpoints. The resolver should be interpreted as returning at least NNNN results from each endpoint where possible, where NNNN is the result limit, and the distinct set of RDF statements that occur in these results is included in the document that is shown to the user. The default limit for the Bio2RDF system is currently 2000, so users who receive 2000 or more results know that they may be able to request the next pageoffset, i.e., /pageoffset2/, etc., to retrieve more results if possible. Some queries may not include the limit as part of the query, and hence they will not return different results for each pageoffset, so users should be careful not to request too many pageoffsets. The HTML interface for paging requests a maximum of 20 pageoffsets if needed, so links to further pageoffsets are not picked up by robots (and /pageoffsetNN/ links should not be followed by robots in any case, as specified in the Bio2RDF robots.txt file).
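The paging rule can be sketched as a small decision function. The names below are hypothetical, the limit of 2000 is the default stated above, and reusing the HTML interface's cap of 20 pages as a stopping point is an assumption made for this example so that a client does not loop forever on queries that ignore the limit.

```python
import re

DEFAULT_LIMIT = 2000  # default result limit stated in the text
MAX_PAGES = 20        # assumption: reuse the HTML interface's cap as a safety stop

PAGEOFFSET_RE = re.compile(r"/pageoffset(\d+)/")


def current_pageoffset(path):
    """Return the page number in a request path; 1 if none is specified."""
    match = PAGEOFFSET_RE.search(path)
    return int(match.group(1)) if match else 1


def next_pageoffset(path, result_count, limit=DEFAULT_LIMIT, max_pages=MAX_PAGES):
    """Return the next page worth requesting, or None to stop.

    A further page is only suggested when the current one returned at
    least `limit` results and the safety cap has not been reached.
    """
    page = current_pageoffset(path)
    if result_count < limit or page >= max_pages:
        return None
    return page + 1


print(next_pageoffset("/links/dbpedia:Propranolol", 2000))
```

Because some queries return the same results for every pageoffset, a real client would probably also want to stop when a page adds no new statements.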

The pageoffset can be combined with the other instructions about the format and whether the query plan is required. The parts appear in the following order, and each is optional except for the query: /FORMAT/queryplan/pageoffsetNN/query. Here /FORMAT/ can be /rdfxml/, /n3/ or /page/, /queryplan/ is used to get the information about how the query would be resolved without performing the query, and the NN in the pageoffset section determines which page to resolve. For example, the HTML version of the queryplan for the 2nd pageoffset of the "linksns/drugbank_drugs/dbpedia:Propranolol" query can be found at http://bio2rdf.org/page/queryplan/pageoffset2/linksns/drugbank_drugs/dbpedia:Propranolol. A known issue is that the links to the RDF/XML and N3 versions at the bottom of the HTML page will request the actual query instead of the queryplan, and they will also omit the pageoffset. This will be fixed in a future version, but a URL constructed in the correct way will still work in the meantime.
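The ordering of the optional parts can be captured in one helper. This is a minimal sketch: request_url and its parameter names are invented for this example, and it only reproduces the /FORMAT/queryplan/pageoffsetNN/query ordering described above.

```python
BASE = "http://bio2rdf.org/"
FORMATS = {"rdfxml", "n3", "page"}  # the three formats named in the text


def request_url(query, fmt=None, queryplan=False, pageoffset=None):
    """Assemble /FORMAT/queryplan/pageoffsetNN/query, with every part
    except the query optional (hypothetical helper)."""
    parts = []
    if fmt is not None:
        if fmt not in FORMATS:
            raise ValueError(f"unsupported format: {fmt!r}")
        parts.append(fmt)
    if queryplan:
        parts.append("queryplan")
    if pageoffset is not None:
        if pageoffset < 1:
            raise ValueError("pageoffset starts at 1")
        parts.append(f"pageoffset{pageoffset}")
    parts.append(query.lstrip("/"))
    return BASE + "/".join(parts)


print(request_url("linksns/drugbank_drugs/dbpedia:Propranolol",
                  fmt="page", queryplan=True, pageoffset=2))
```

Building the URL from named options avoids getting the order of the path segments wrong, which matters because the known issue above means the links on the HTML page cannot be relied on for queryplan or paged requests.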

Because of the way the HTML redirections have been included in the system, requesting the queryplan for the HTML redirection encoded in N3 looks like /n3/queryplan/html/drugbank_drugs:DB00571, since the query in this case is "html/drugbank_drugs:DB00571", and the other parts define the result format and the provenance record being required respectively. The full URL is http://bio2rdf.org/n3/queryplan/html/drugbank_drugs:DB00571.