Period covered: August 2011 to July 2012
The Diffraction Data Deposition Working Group (DDDWG) was created at the Madrid IUCr Congress with the following terms of reference:
It is becoming increasingly important to deposit the raw data from scattering experiments; a lot of valuable information gets lost when only structure factors are deposited. A number of research centres, e.g. synchrotron and neutron facilities, are fully aware of the need and have established detector working groups addressing this issue. The IUCr is the natural organization to lead the development of standards for the representation of data and associated metadata that can lead to the routine deposition of raw data. A Working Group on these matters has thereby been launched by the IUCr Executive Committee, to which the Working Group will report, to be Chaired by Professor John R. Helliwell. Its provisional title is 'Diffraction Data Deposition Working Group of the IUCr'.
Since Madrid, the group has been active in a number of ways:
(1) An enthusiastic and lively inaugural meeting was held at the Madrid Congress; a report is available at viewtopic.php?f=16&t=45
(2) The DDDWG recruited a number of Consultants and an active consultation group, which Commission Chairs have been invited to join or to which they may appoint representatives. The Commission on Biological Macromolecules proposed Dr Tom Terwilliger (TT), their specialist on such matters, as a member of the DDDWG; this appointment was approved by the IUCr Executive Committee.
(3) A set of forums with different access privileges has been set up at forums.iucr.org. Many of the posts in the public forum have been viewed several hundred times.
(4) Contributions to a vigorous discussion on the ccp4bb have been collated and summarised to sample the views of the macromolecular community; see viewtopic.php?f=21&t=79
(5) A preliminary analysis of the feasibility of storing diffraction data images on the IUCr journals platform concluded that this would not be feasible, because of network bandwidth limitations and the heavy investment in infrastructure that would be needed to scale up to host large data sets. This is so even though the actual storage costs per terabyte might be affordable on a per-article basis (i.e. they could be added as a simple one-off fee to the publication costs).
(6) The registration, through the DataCite consortium, of digital object identifiers (DOIs) for raw data sets at Diamond is being monitored.
(7) Discussions have taken place with University of Manchester data archive and repository staff to explore what is required to archive raw data through institutional repositories. The University plans to launch a data archive in September 2012 as an extension of its staff ‘eScholar’ reprint repository. This initiative is driven by the University's sense of the responsibility in effect entrusted to it by the funders of research projects, who take the view that PIs are retaining raw data (although precise definitions of raw data seem to vary).
(8) A paper comparing data sets relating to a group of similar protein metal ligand complexes, from different home laboratory X-ray diffractometers, and using different software packages, is being submitted to Journal of Applied Crystallography. It covers both metadata issues and scientific implications of handling diverse image data sets as well as providing access to eleven ‘raw datasets’ comprising ~35 Gbytes of X-ray diffraction data images.
(9) The imgCIF dictionaries continue to be developed in a way that will facilitate interoperability with NeXus/HDF5 workflows at large Synchrotron Radiation facilities, and imgCIF/CBF is now supported as an image format by all the major vendors.
(10) Workshops and smaller-scale briefing/committee meetings are taking place at the 2012 ACA, ECA and AsCA regional crystallography meetings to review progress and identify additional areas for activity. A poster prepared by JRH and BMcM for the British Crystallographic Association brought the work of the WG to a wider audience.
(11) A trilogy of articles on ‘The Living Publication’ has been published under the auspices of ICSTI by JRH, BMcM and TT. These will become open access on August 1st 2012 (after a three-month embargo). They chart the opportunities, both already in hand and in prospect, for developing the results from a publication, which can thereby be called a living, ever-developing communication on a piece of science rather than a static object.
(12) Close discussion and careful information flow to/from the Protein Data Bank have been a feature of the DDDWG, and detailed queries from senior figures within the USA macromolecular crystallography community have been fielded at times in the past year by JRH, BMcM and TT. TT has also provided detailed input to a USA NSF ‘call for information’ on linking data with literature, in consultation with JRH and BMcM. The PDB archive of processed data is given exemplary mention in a new report by The Royal Society, ‘Science as an open enterprise’.
John R. Helliwell
Chair