Lunchtime Forum on Diffraction Data Deposition at ECM28

This is a public forum that invites community input on strategies and desirable practices in providing open and long-term access to diffraction data sets.

Post by JRH » Tue Aug 20, 2013 8:06 am

ECM28 Lunchtime Open Forum on Diffraction Data Deposition (DDD)
This Forum for open discussion is organised by the DDD Working Group, appointed by the IUCr Executive Committee to define the need for, and the practicalities of, routine deposition of our primary experimental data. It will take the form of a short review of progress during the Working Group's two years of activity since the IUCr Madrid Congress in 2011, including the one-day satellite Workshop subsequently held at ECM27 in Bergen. This short review will be followed by opportunities for input from the community represented at ECM28. For a summary of public input thus far see the IUCr Forum at:-
The date and time for the ECM28 Lunchtime Open Forum on Diffraction Data Deposition are Thursday August 29th 12 noon to 1pm.
The venue is the Warwick University Arts Centre Cinema.

A pdf of the starting presentation for the Lunchtime Forum is attached.

Attachment: Open Forum at ECM28 on Raw data availability.pdf (the starting presentation for the Lunchtime Forum at ECM28)
Re: Lunchtime Forum on Diffraction Data Deposition at ECM28

Post by Brian McMahon » Mon Sep 02, 2013 10:20 am

An Open Forum meeting was held at the 28th European Crystallographic Meeting (University of Warwick, 29 August) to continue the process of public consultation that informs the activities of the DDDWG. Fifteen people were present, including representatives of databases, commercial instrument manufacturers, software developers and practising structural scientists.

John Helliwell gave a necessarily brief report on the progress of the Working Group so far. In changing the wording of proposal 1 from "Authors should provide raw data with a publication..." to "Authors may provide raw data with a publication ...", the IUCr Executive Committee had signalled a more cautious, measured pace towards any eventual community-based requirement to attach raw data, while endorsing the allocation of resources to allow this to be achieved, e.g. within its own journals. A second proposal, that IUCr Commissions define the metadata best suited to characterising experimental data in their fields, was straightforwardly endorsed by the IUCr Executive Committee.

A set of potentially difficult discussion points was proposed by John Helliwell on behalf of the DDD WG, and further points were solicited from the audience; none were added. Thus the main topics for discussion were:

  • Do people actually request access to raw data, or express a wish for it, whether the data are published or unpublished?
  • How long should the raw data be available? Even in perpetuity in the case of publication?
  • After a time period without a publication, should the release of raw data derived from public funding be mandated? Some research fields (e.g. space research) operate such a mandate after 3 years.
  • Might local data archivists, unlike those at a specialised centralised repository, be inexperienced at checking that depositors give all necessary metadata, thus rendering the raw data of limited future use to other researchers?

In the discussion, it was suggested that scientists would be receptive to the provision of a lightweight interface allowing easy annotation of data sets during the process of uploading to a storage service. There seemed to be a strong sense that a community-driven central storage facility would be attractive, but a central metadata registry coupled to distributed storage (possibly using commercial suppliers) was another potentially workable solution. Both the PDB and CCDC would need additional income to allow them to undertake either of these centralised roles. Decoupling metadata management from bulk raw data storage might nevertheless be helpful. On the other hand, the analysis by IUCr journals of the bandwidth limitations involved in transmitting raw data sets suggested that these limitations, rather than the cost of the storage devices holding the data after network transmission, were the more substantive problem with such an approach.
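
As a purely illustrative aid to the "central metadata registry coupled to distributed storage" idea, a minimal sketch in Python might look like the following. Nothing here was presented at the Forum: the class names, the list of required fields in `REQUIRED_FIELDS` and the example URL are all hypothetical assumptions, and a real registry would follow whatever metadata definitions the IUCr Commissions eventually adopt.

```python
from dataclasses import dataclass, field

# Hypothetical required metadata fields (an assumption for illustration only;
# the actual set would be defined by the relevant IUCr Commission).
REQUIRED_FIELDS = ("depositor", "instrument", "wavelength_angstrom", "collection_date")


@dataclass
class RawDataRecord:
    """Metadata for one raw data set, plus pointers to where the bulk data live."""
    dataset_id: str
    metadata: dict
    storage_urls: list = field(default_factory=list)  # distributed storage locations

    def missing_fields(self):
        """Return the required fields that are absent or empty."""
        return [name for name in REQUIRED_FIELDS if not self.metadata.get(name)]


class MetadataRegistry:
    """Central registry holding only metadata; the bulk data stay elsewhere."""

    def __init__(self):
        self._records = {}

    def register(self, record):
        """Accept a record only if every required metadata field is present."""
        missing = record.missing_fields()
        if missing:
            raise ValueError("missing metadata: " + ", ".join(missing))
        self._records[record.dataset_id] = record

    def locate(self, dataset_id):
        """Return the storage locations recorded for a data set."""
        return list(self._records[dataset_id].storage_urls)
```

In this sketch a deposition lacking, say, the wavelength would be rejected by `register` at upload time, which is one way the annotation check could be made at the registry rather than left to a local archivist; `locate` then returns only pointers, so the bulk data never pass through the central service.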

The experience of one recent publication where raw data were referenced from within the paper was encouraging; data sets stored locally and associated with the article of Tanley et al. (2013), J. Appl. Cryst. 46, 108-119 [doi:10.1107/S0021889812044172] had been accessed around 20 times on the home server, and around 250 times (with 12 actual downloads) from a TARDIS mirror (in Australia). A secondary publication arising from downloads of these data had been accepted for JSR and a further paper had subsequently been submitted to Acta Cryst D.

An invitation to argue against the retention of raw data, in principle in perpetuity, raised no dissenting voices. Some discussion touched on an optimum target for realistic retention periods; it was pointed out that in practice this could vary according to national policies (e.g. we were informed that the policy of the UK's EPSRC was retention for 10 years after the last access). There was also general approval for the idea of collating raw data sets for structures that had proved impossible to solve into a repository, as a future resource to be exploited; it was felt that this could be pursued in parallel with the main objective of securing more examples of raw data sets linked with publications.

Perhaps most surprisingly, no one spoke against the posited statement regarding release of publicly funded data after a given time period, i.e. where no publication had resulted. This is a practice adopted by e.g. space science and astronomy, which use three years as the time period. Most interestingly, this procedure is already adopted by the UK National Crystallography Service for the chemical samples that are submitted by its user-customers.

Report by Brian McMahon and John Helliwell, 2 September 2013
Re: Lunchtime Forum on Diffraction Data Deposition at ECM28

Post by Brian McMahon » Wed Jan 15, 2014 9:45 am

Video recordings of the presentations made at the Crystallographic Information and Data Management symposium, also held at ECM28, a few days before the DDDWG meeting, are available from YouTube through the link ... ture=share

Links to the individual presentations are provided below:

I. Standard information exchange formalisms
1. A coherent information flow in crystallography - B. McMahon
2. mmCIF and structural bioinformatics - J. Westbrook
3. pdCIF and the messy world of real data - B. H. Toby
II. Improving the management of experimental data
4. The data explosion and the need to manage diverse data sources in scientific research - S.J. Coles
5. Deposition and use of raw diffraction images - J.R. Helliwell
6. Managing research data for diverse scientific experiments - E. Yang
7. Managing crystallographic data in facilities using integrated CIF, HDF5 and NeXus - H.J. Bernstein
8. Research data management and UK funding policies - S. Hodson
III. The integrity of published information
9. Publication of small-unit-cell structures in Acta Crystallographica - M.A. Hoyland
10. Validating a small-unit-cell structure; understanding checkCIF reports - A. Linden
11. Writing a macromolecular structure paper with publBio - M. Weiss
12. Deposition and validation of macromolecular structures - S. Velankar
IV. Towards ever better science
13. Data quality and the value of structural databases - C. Groom
14. Towards the semantic web of science - P. Murray-Rust
15. Into the future with CIF - N. Spadaccini
