An Open Forum meeting was held at the 28th European Crystallography Meeting (University of Warwick, 29 August) to continue the process of public consultation that informs the activities of the DDDWG. Fifteen people were present, including representatives of databases, commercial instrument manufacturers, software developers and practising structural scientists.
John Helliwell gave a necessarily brief report on the progress of the Working Group so far. By changing the wording of Proposal 1 from "Authors should provide raw data with a publication..." to "Authors may provide raw data with a publication...", the IUCr Executive Committee had signalled an even more cautious, measured progression towards any eventual community-based requirement to attach raw data. It had nevertheless endorsed the allocation of resources to allow this to be achieved, e.g. within its own journals. A second proposal, that IUCr Commissions define the metadata best suited to characterising the experimental data in their fields, was endorsed by the IUCr Executive Committee without difficulty.
John Helliwell proposed a set of potentially difficult discussion points on behalf of the DDDWG and invited the audience to suggest further topics. None were added. The main topics for discussion were therefore:
- Do people actually request access to raw data, or express a wish to have such access, whether the data are published or unpublished?
- How long should raw data be available? In the case of published data, should this be in perpetuity?
- If no publication has resulted after a given period, should the release of raw data derived from public funding be mandated? Some research fields (e.g. space research) operate such a mandate after 3 years.
- Might local data archivists, as opposed to those at a specialised central repository, be inexperienced at checking that depositors supply all the necessary metadata, thus limiting the usefulness of the raw data to other researchers in the future?
In the discussion, it was suggested that scientists would be receptive to a lightweight interface allowing easy annotation of data sets while they are being uploaded to a storage service. There was a strong sense that a community-driven central storage facility would be attractive, but a central metadata registry coupled to distributed storage (possibly using commercial suppliers) was another potentially workable solution. Both the PDB and the CCDC would need additional income to undertake either of these centralised roles. Decoupling metadata management from bulk raw data storage might be helpful; on the other hand, an analysis carried out for the IUCr journals suggested that the bandwidth limitations involved in transmitting raw data sets were a more substantive problem for such an approach than the cost of storing the data once they had been transmitted over the network.
The experience of one recent publication in which raw data were referenced from within the paper was encouraging. The data sets associated with the article by Tanley et al. (2013), J. Appl. Cryst. 46, 108-119 [doi:10.1107/S0021889812044172], which were stored locally, had been accessed around 20 times on the home server and around 250 times (with 12 actual downloads) from a TARDIS mirror in Australia. A secondary publication arising from downloads of these data has been accepted for JSR, and a further paper has since been submitted to Acta Cryst. D.
An invitation to argue against the retention of raw data (in principle, in perpetuity) raised no dissenting voices. Some discussion touched on an optimum target for realistic retention periods; it was pointed out that in practice this could vary according to national policies (we were informed, for example, that the UK EPSRC's policy is retention for 10 years after the last access). There was also general approval for the idea of collecting raw data sets for structures that had proved impossible to solve into a repository, as a resource to be exploited in the future; it was felt that this could be pursued in parallel with the main objective of securing more examples of raw data sets linked with publications.
Perhaps most surprisingly, no one spoke against the proposed statement that publicly funded data should be released after a given time period, i.e. where no publication had resulted. This practice is adopted in fields such as space science and astronomy, which use a period of three years. Most interestingly, this procedure is already followed by the UK National Crystallography Service for the chemical samples submitted by its users.
Report by Brian McMahon and John Helliwell, 2 September 2013