bioCADDIE Webinar | The CardioVascular Research Grid

May 14, 2015 - 10:00am PDT

Content Mining of the Bioscience Literature

Abstract

The ContentMine has developed Open tools for mining the scientific and medical literature (full text, figures, images and supplemental data). We have developed a pipeline to cover the whole process of Crawling, Scraping, Normalising and Mining articles and storing/republishing the results. We are now doing this on a daily basis.

The ContentMine is funded by a Fellowship to PMR from the Shuttleworth Foundation. The aims include the creation of subcommunities, and unrestricted dissemination of all materials, code and results (Apache 2, CC-BY and CC0 as appropriate). We intend to generate and publish 100 million facts per year, available for use and re-use. The system is designed to allow anyone to create pluggable resources (code, vocabularies) and to make ContentMining easy and available to anyone. Much of our work is through interactive workshops and we hope to show participants how to start ContentMining. Two of our approaches include downloadable virtual machines and a web service.

Bios

Dr. Peter Murray-Rust is a chemist currently working at the University of Cambridge. As well as his work in chemistry, Dr. Murray-Rust is also known for his support of open access and open data. He leads the team at the ContentMine project, which uses machines to liberate 100,000,000 facts from the scientific literature. After obtaining a Ph.D., he became lecturer in chemistry at the (new) University of Stirling and was first warden of Andrew Stewart Hall of Residence. In 1982 he moved to Glaxo Group Research at Greenford to head Molecular Graphics, Computational Chemistry and later protein structure determination. He was Professor of Pharmacy at the University of Nottingham from 1996 to 2000, setting up the Virtual School of Molecular Sciences. He is now Reader in Molecular Informatics at the University of Cambridge and Senior Research Fellow of Churchill College, Cambridge.

Dr. Murray-Rust's research interests have involved the automated analysis of data in scientific publications, the creation of virtual communities (e.g., the Virtual School of Natural Sciences in the Globewide Network Academy), and the Semantic Web. With Henry Rzepa he has extended this to chemistry through the development of markup languages, especially Chemical Markup Language. He campaigns for open data, particularly in science, and is on the advisory board of the Open Knowledge Foundation and a co-author of the Panton Principles for Open scientific data. Together with a few other chemists he was a founder member of the Blue Obelisk movement in 2005.

See the bioCADDIE website for the latest details.