Thursday, April 02, 2009

RDF use and generation improvements

Version 0.3 of the Bio2RDF Servlet implements true RDF handling in the background. This makes the output consistent and opens the way to supporting additional output formats such as N-Triples and Turtle in the future, although the only output format currently supported is RDF/XML. The Sesame library provides this functionality.

More RDFiser scripts are provided as part of the source distribution, including scripts for Chebi, GO, Homologene, NCBI Geneid, HGNC, OBO and Ecocyc. Guides on the Bio2RDF wiki explain how to use the scripts to regenerate the RDF from future versions of each database.

Live recent network statistics available

The 0.3 releases provide the ability to show live statistics to diagnose some network issues without having to look at log files. The URL is /admin/stats
  • Shows the last time the internal provider blacklist was reset. Because the statistics are cleared every time the blacklist is reset, this indicates how much activity is being displayed. The blacklist exists only to stop further communication with malfunctioning providers.
  • By default shows the IP addresses accessing the server, with the total number and duration of their queries. In low use and private situations it can be configured to also show the queries being performed.
  • Shows the servers which have been unresponsive since the last blacklist reset, including a basic reason such as an HTTP 503 or 400 error.
Version 0.3.2 also provides live blacklisting to block crawlers that regularly use functionality which the Bio2RDF robots.txt file tells them not to use. The thresholds have been set rather high by default, and people who download and install the package and datasets locally can turn this functionality off completely. Specifically, a regular user of the public mirrors should either make no more than 40 requests in each 12 minute statistics period or, if they make more than 40 requests in a period, make sure that more than 25% of those requests are for queries that robots.txt does not restrict. These parameters may change after further investigation. Anyone can access /error/blacklist, even if they are not currently blacklisted, to see a list of the requests from their IP address since the start of the current 12 minute statistics period.
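
One reading of the thresholds described above can be sketched in a few lines of Python. This is only an illustration of the rule as stated in this post, not the Servlet's own code: the function name, the constant names, and the idea of counting "restricted" requests per client are assumptions made for the example.

```python
# Thresholds quoted in the post. The names are hypothetical; the Servlet's
# real configuration keys are not given here.
MAX_REQUESTS_PER_PERIOD = 40       # requests per 12 minute statistics period
MIN_UNRESTRICTED_FRACTION = 0.25   # share of requests not restricted by robots.txt


def within_limits(total_requests: int, restricted_requests: int) -> bool:
    """Return True if a client stays inside the limits described in the post.

    total_requests      -- all requests made in the current statistics period
    restricted_requests -- how many of those hit paths that robots.txt restricts
    """
    # Light users are never affected, whatever they request.
    if total_requests <= MAX_REQUESTS_PER_PERIOD:
        return True
    # Heavy users must keep more than 25% of their requests unrestricted.
    unrestricted = total_requests - restricted_requests
    return unrestricted / total_requests > MIN_UNRESTRICTED_FRACTION
```

Under this reading, a client making 100 requests of which 80 are restricted would be outside the limits, while the same client with only 70 restricted requests would be inside them.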

Support provided for more non-Bio2RDF providers

The 0.3 Bio2RDF Servlet release implements support for more non-Bio2RDF SPARQL endpoints such as LinkedCT, DrugBank, Dailymed, Diseasome, Neurocommons, DBpedia, and Flyted/Flybase.

The relevant namespaces for these inside of Bio2RDF are:
  • DBpedia - dbpedia, dbpedia_property, dbpedia_class
  • LinkedCT - linkedct_ontology, linkedct_intervention, linkedct_trials, linkedct_collabagency, linkedct_condition, linkedct_link, linkedct_location, linkedct_overall_official, linkedct_oversight, linkedct_primary_outcomes, linkedct_reference, linkedct_results_reference, linkedct_secondary_outcomes, linkedct_arm_group
  • Dailymed - dailymed_ontology, dailymed_drugs, dailymed_inactiveingredient, dailymed_routeofadministration, dailymed_organization
  • DrugBank - drugbank_ontology, drugbank_druginteractions, drugbank_drugs, drugbank_enzymes, drugbank_drugtype, drugbank_drugcategory, drugbank_dosageforms, drugbank_targets
  • Diseasome - diseasome_ontology, diseasome_diseases, diseasome_genes, diseasome_chromosomallocation, diseasome_diseaseclass
  • Neurocommons - Uses the equivalent Bio2RDF namespaces, with live owl:sameAs links back to the relevant Neurocommons namespaces. Used for pubmed, geneid, taxonomy, mesh, prosite and go so far
  • Flyted/Flybase - Not converted yet, only direct access provided using search functionalities
Live owl:sameAs references are provided which match the URIs used in SPARQL queries. This keeps the linkages to the original databases without leaving the Bio2RDF database:identifier paradigm, so people who already know the DBpedia (or other) URIs are given the link back to what they know.

Some http://database.bio2rdf.org/database:identifier URIs are produced by the owl:sameAs additions, but these aren't standard, and they are only shown where at least one available SPARQL endpoint still uses them. People should use the http://bio2rdf.org/database:identifier versions when linking to Bio2RDF.
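
For anyone holding a list of the older subdomain-style URIs, a small Python sketch of rewriting them to the recommended form might look like this. It assumes the subdomain form always follows the http://database.bio2rdf.org/database:identifier pattern shown above; the function name and the example identifier are hypothetical.

```python
import re

# Matches http://<subdomain>.bio2rdf.org/<namespace>:<identifier>.
# Assumption: any subdomain of bio2rdf.org in front of a namespace:identifier
# path is one of the non-standard URIs described in the post.
_SUBDOMAIN_URI = re.compile(r"^http://([a-z0-9_]+)\.bio2rdf\.org/([^/:]+):(.+)$")


def canonical_bio2rdf_uri(uri: str) -> str:
    """Rewrite a subdomain-style Bio2RDF URI to http://bio2rdf.org/namespace:identifier.

    URIs that do not match the subdomain pattern are returned unchanged.
    """
    match = _SUBDOMAIN_URI.match(uri)
    if match is None:
        return uri
    _subdomain, namespace, identifier = match.groups()
    return f"http://bio2rdf.org/{namespace}:{identifier}"


# Hypothetical identifier, used only to show the shape of the rewrite.
print(canonical_bio2rdf_uri("http://geneid.bio2rdf.org/geneid:12345"))
```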

Any further contributions to this list, or additions of other datasets which already use Bio2RDF URIs, would be very useful! See the list of namespaces already implemented here.

Provider, query and namespace statistics now available

At the time of posting Bio2RDF supported:
  • 230 namespaces
  • 35 different internal query titles (some of these map to the same URI pattern, so there are not this many URI query options)
  • 140 provider options, including a large number of /html/database:identifier providers which redirect to HTML pages which describe the Bio2RDF Identifier as well as the Bio2RDF SPARQL endpoints
More statistics can be found here

A list of the actual provider URLs mapped back to namespaces and queries can be obtained by downloading the Bio2RDF Servlet and changing a setting in log4j.properties to make the page more verbose. The setting is not turned on for the public mirrors because it would produce a very large file each time.

LSID support for Bio2RDF

From release 0.3.2 of the Bio2RDF Servlet, any URI of the form http://bio2rdf.org/namespace:identifier is also accessible through its equivalent LSID, with http://bio2rdf.org/ as the proxy, at http://bio2rdf.org/urn:lsid:bio2rdf.org:namespace:identifier. The LSID syntax is not available for custom services such as http://bio2rdf.org/links/namespace:identifier or http://bio2rdf.org/search/searchterm.
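
The mapping between the two forms is simple enough to sketch in Python. This is an illustration of the URI pattern given above rather than anything taken from the Servlet; the function name is hypothetical, and rejecting any path containing "/" is an assumed way of excluding the custom services.

```python
PROXY = "http://bio2rdf.org/"
LSID_PREFIX = "urn:lsid:bio2rdf.org:"


def to_lsid_uri(uri: str) -> str:
    """Convert http://bio2rdf.org/namespace:identifier to its proxied LSID form."""
    if not uri.startswith(PROXY):
        raise ValueError(f"not a Bio2RDF URI: {uri}")
    local = uri[len(PROXY):]
    # Custom services such as links/... and search/... have no LSID form.
    # Assumption: a "/" in the path marks such a service.
    if "/" in local or ":" not in local or local.startswith("urn:"):
        raise ValueError(f"no LSID equivalent for: {uri}")
    return PROXY + LSID_PREFIX + local


print(to_lsid_uri("http://bio2rdf.org/pubmed:18472304"))
```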

This will NOT become the standard identifier, but it provides compatibility for users who wish to use LSIDs.

Monday, March 30, 2009

Bio2RDF and Semantic Web Pipes

The Bio2RDF Servlet has been packaged with Semantic Web Pipes. It provides runtime support for pipes, without the designer. Pipes you design at either the public pipes website, or your own pipes webapp, will run inside your Bio2RDF server, providing another method for scripting your queries.

Once you download and install the Servlet you will be able to access the pipes functionality using URLs which look like the following:

http://localhost:8080/pipes/bio2rdf_subject_object_slicing/namespace=keywords/identifier=11

Each of the parameters in the pipe is entered as a "name=value" pair, and the pairs are joined together using "/".
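
As a rough illustration, a URL of this shape could be assembled in Python as follows. The helper name is hypothetical, and the sketch assumes parameter values can be placed in the URL as they are; this post does not say whether values need any escaping.

```python
from typing import Mapping


def pipe_url(base: str, pipe_name: str, params: Mapping[str, str]) -> str:
    """Build a pipes URL: <base>/pipes/<pipe_name>/<name>=<value>/..."""
    # Each parameter becomes one "name=value" path segment, in the order given.
    segments = [f"{name}={value}" for name, value in params.items()]
    return "/".join([base.rstrip("/"), "pipes", pipe_name, *segments])


print(pipe_url(
    "http://localhost:8080",
    "bio2rdf_subject_object_slicing",
    {"namespace": "keywords", "identifier": "11"},
))
```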

Download the latest Bio2RDF Servlet to experiment.

Saturday, March 21, 2009

Bio2RDF's contribution to the GGG is on the map


I am very pleased to see that Bio2RDF's contribution is now on the GGG map of linked data. A big thanks to all the data providers and the active members of the Bio2RDF group. Not all of the SPARQL endpoints we provide are there yet, but it is a great beginning.


Thursday, February 05, 2009

When Bio2RDF meets Taverna

Try this Taverna workflow to explore the possibilities of building a mashup on the fly from Bio2RDF's SPARQL endpoints.

What is known about HIV using Bio2RDF's SPARQL endpoints?

Wednesday, October 29, 2008

Tabulator and Bio2RDF SPARQL endpoints: a new integrated way to surf genomic knowledge

Try the Tabulator generic data browser to surf the 2 billion triples from the Bio2RDF network of SPARQL endpoints. Install the Firefox plugin and start from here:

http://bio2rdf.org/demo

Semantic Web Challenge 2008 participation

The Bio2RDF team is presenting a demo at the Semantic Web Challenge 2008.

http://challenge.semanticweb.org/

Good luck to Marc-Alexandre, Michel and Peter and all the other team members.

The paper is available here.

Bio2RDF Network Of Linked Data

Bio2RDF OWL ontology finally available

This is the new OWL ontology file describing the Bio2RDF project. It is a rough description of most of the types and predicates in the new Bio2RDF network of SPARQL graphs.

http://bio2rdf.org/bio2rdf-2008.owl

A nice way to discover it is by using the Protege tool and the Ontoviz plugin.

Here is what 2 billion triples from 40 datasources look like:

Monday, July 28, 2008

Bio2RDF project's literature review

These links bring you to the Bio2RDF publications currently available.

The presentations are on SlideShare.

Bio2RDF: Towards a mashup to build bioinformatics knowledge systems, published in the Journal of Biomedical Informatics. Its PubMed entry is pubmed:18472304.

The initial version was presented at the WWW2007 HCLS Workshop.

Bio2RDF: A Semantic Web Atlas of Post Genomic Knowledge about Human and Mouse, part of Data Integration in the Life Sciences, 5th International Workshop, DILS 2008, Evry, France, June 25-27, 2008.

And a look in Google Scholar for citations.

Wednesday, July 16, 2008

Sunday, June 08, 2008

A Bio2RDF/Virtuoso demonstration of its SPARQL endpoint will be held at ISMB2008

Technology Track: TT18

How to use Bio2RDF with a Virtuoso Server

Monday, July 21 - 2:45 p.m. - 3:10 p.m.

Room: 701B

Presented by: François Belleau, Centre de Recherche du CHUL (CHUQ), CA

Abstract:
Our vision is to apply semantic web technology to achieve data integration in bioinformatics. In this demo we show how to use a Virtuoso server to store Bio2RDF linked data. We also demonstrate how millions of triples from 30 different data providers can be queried with SPARQL to answer questions.



Presentation invitation: http://www.iscb.org/uploaded/css/25/4095.pdf

http://www.iscb.org/ismb2008/programwebconf.php

Bio2RDF at DILS2008 on June 27th in Evry (Paris)

The presentation about the latest Bio2RDF paper is planned for DILS2008.

DILS2008 Program

We hope to have the opportunity to meet future Bio2RDF users. See you there.

Here is the presentation:

http://www.ncbi.nlm.nih.gov/pubmed/18472304 owl:sameAs http://bio2rdf.org/pubmed:18472304

The Bio2RDF project is now self-conscious.

http://www.ncbi.nlm.nih.gov/pubmed/18472304

also known as

urn:bm:pubmed:18472304

My deep thanks to my master's project supervisors Jean Morissette and Nicole Tourigny, without whom this student project would not have got this far. I would also like to thank Marc-Alexandre Nolin for his precious collaboration, and Philippe Rigault, who made his lab's essential resources available to the project.

Monday, April 21, 2008

Bio2RDF does SPARQL for WWW2008

This is the unofficial presentation of Bio2RDF at Linked Data on the Web (LDOW2008), thanks to Kingsley Idehen of OpenLink Software for his support.

You should read his paper Linked Data Spaces & Data Portability.

Monday, April 14, 2008

65 million triples about Human and Mouse available for download


The Bio2RDF project will be presented at the DILS2008 conference in Paris. The selected paper describes the 65 million triple mashup of post genomic knowledge about Human and Mouse. 30 datasources have been merged together to build this semantic warehouse. The whole graph can be downloaded in N3 format from this address: http://bio2rdf.org/download. The full list of datasources is available at the Banff Manifesto site on Freebase.

Thursday, February 07, 2008

First version of the Bio2RDF knowledge map

We are using the Many Eyes visualization service from IBM to let you interactively explore the knowledge map of the Bio2RDF Atlas about Human and Mouse.