Tuesday, April 21, 2009

2.4 billion triples of Bioinformatics RAW DATA NOW

In his recent talk at TED, Tim Berners-Lee invited data providers to make their data available in RDF format to help build the web of linked data. He asked them to offer RAW DATA NOW.

We fully share this approach in the Bio2RDF community: our goal is to make public datasets from the bioinformatics community available in RDF format via standard SPARQL endpoints (we use Virtuoso server for that). We strongly believe in the semantic web approach to solving scientific problems, but we do not want to wait for data providers to do the RAW DATA conversion job. Converting data to RDF is not fun; we have done a lot of this dirty work, and here are the results for the current Bio2RDF release of 34 data sources.
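For readers who have never queried a SPARQL endpoint, here is a minimal sketch of how a request URL can be built under the standard SPARQL protocol, where the query text travels in the `query` parameter of an HTTP GET. The endpoint address `http://example.org/sparql` is a placeholder and not a real Bio2RDF endpoint, and the function name `build_sparql_url` is made up for this illustration.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Placeholder endpoint (an assumption for illustration, not a real Bio2RDF address).
EXAMPLE_ENDPOINT = "http://example.org/sparql"

# A generic query that asks for any ten triples stored in the endpoint.
EXAMPLE_QUERY = "SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 10"


def build_sparql_url(endpoint, query):
    """Return a GET URL carrying the query in the standard 'query' parameter."""
    return endpoint + "?" + urlencode({"query": query})


url = build_sparql_url(EXAMPLE_ENDPOINT, EXAMPLE_QUERY)
print(url)

# Round trip: decoding the URL gives back the original query text.
decoded = parse_qs(urlsplit(url).query)["query"][0]
print(decoded == EXAMPLE_QUERY)
```

The resulting URL can be fetched with any HTTP client; the result format is usually negotiated through the `Accept` header, and the details depend on the server.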

Our current datasets in N3 format are available here:


We invite semantic search engine providers to index these files.

The way we produce them is documented in the Cookbook section of our wiki at SourceForge:


The current list of SPARQL endpoints in the linked data cloud is hosted here:


Bio2RDF's graph of 2.4 billion triples of linked data represents 51% of the current global linked data graph size.

Finally, this is what this highly connected world of knowledge looks like.

I would like to take this occasion to thank all the enthusiastic biologists and researchers who invest themselves in annotating articles, proteins and gene products. Without this essential work of connecting documents and concepts together, this project would not have been possible.

For the 20th anniversary of the web, I would also like to thank Tim Berners-Lee for his inspiring vision. Bio2RDF may not be the awaited killer app of the life sciences that demonstrates the semantic web's potential, but let's say that it is only the beginning of the linked data cloud built by and for scientists.

The WWW2009 workshop Linked Data on the Web (LDOW2009) was held today, and I would like to say how important the work of this community is. Finally, a last word to congratulate the Virtuoso team, and especially Orri Erling, for their fantastic work on the new Virtuoso 6.0 server, soon to be released. I cannot wait to see Bio2RDF data in this amazing engine.

New graphic representation of Bio2RDF's map

This word net represents the current namespace connections between Bio2RDF SPARQL endpoints. The RDF datasets that were analyzed come from Bio2RDF's download page. These representations were generated with the Many Eyes visualization tools.

Static version.

This graph represents the connections between namespaces in Bio2RDF's network of SPARQL endpoints; the highlighted orange dots correspond to the databases rdfised by Bio2RDF.

Static version.
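To give an idea of what "connections between namespaces" means in practice, here is a rough sketch of how such links could be counted from a list of triples. It is not the script used to produce the graphs above. It assumes URIs of the form `http://bio2rdf.org/namespace:identifier`, and the sample triples, the `xRef` predicate and the function names are invented for the illustration.

```python
from collections import Counter

# Assumed URI pattern: http://bio2rdf.org/<namespace>:<identifier>
PREFIX = "http://bio2rdf.org/"


def namespace_of(uri):
    """Return the namespace part of a Bio2RDF-style URI, or None if it does not match."""
    if not uri.startswith(PREFIX):
        return None
    namespace, sep, _identifier = uri[len(PREFIX):].partition(":")
    return namespace if sep else None


def namespace_links(triples):
    """Count triples whose subject and object belong to two different namespaces."""
    counts = Counter()
    for subject, _predicate, obj in triples:
        source, target = namespace_of(subject), namespace_of(obj)
        if source and target and source != target:
            counts[(source, target)] += 1
    return counts


# Invented sample data, for illustration only.
sample_triples = [
    ("http://bio2rdf.org/geneid:1", "http://example.org/xRef", "http://bio2rdf.org/uniprot:P1"),
    ("http://bio2rdf.org/geneid:2", "http://example.org/xRef", "http://bio2rdf.org/uniprot:P2"),
    ("http://bio2rdf.org/uniprot:P1", "http://example.org/xRef", "http://bio2rdf.org/go:0001"),
    ("http://bio2rdf.org/geneid:1", "http://example.org/label", "a plain literal"),
]

links = namespace_links(sample_triples)
for (source, target), count in sorted(links.items()):
    print(source, "->", target, count)
```

Each pair of namespaces with a non-zero count would become an edge in a graph like the ones above, with the count available as an edge weight.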