What are the existing requirements for data deposition

This is a public forum that invites community input on strategies and desirable practices in providing open and long-term access to diffraction data sets.

Moderator: Brian McMahon

Post by terwill » Thu Oct 20, 2011 5:52 pm

John Helliwell points out that it might be useful to know what MX crystallographic data researchers in different countries are already expected to deposit or save. He notes that research funding agencies in the UK expect researchers to preserve their raw experimental data for at least 5 years.

Can people comment on what data they are already expected to save in their countries, and what mechanisms they already have for facilitating this (for example the Australian Research Council TARDIS initiative which helps store raw diffraction images)?

Post by fodje » Wed Oct 26, 2011 8:30 pm

Here are the CIHR requirements in Canada; other funding agencies have similar requirements:


Grantees must deposit bioinformatics, atomic and molecular coordinate data into the appropriate public database immediately upon publication of research results (e.g., deposition of nucleic acid sequences into GenBank). Refer to the Annex of this policy for examples of research outputs and the corresponding publicly accessible repository or database. Data retention, as already required by the majority of institutions, is mandated by CIHR. Grantees must retain original data sets arising from CIHR-funded research for a minimum of five years after the end of the grant. This applies to all data, whether published or not. The grantee's institution and Research Ethics Board may have additional policies and practices regarding the preservation, retention and protection of research data that must be respected.

Post by dikay » Sat Nov 19, 2011 4:52 pm

Dear all,

In Germany, the Deutsche Forschungsgemeinschaft (DFG) laid out in 1998 what it considers to be "Good Scientific Practice", and researchers at institutions at least partly funded by the DFG, including universities, are bound to comply with it. The paper (in German and English!) can be downloaded at http://www.dfg.de/download/pdf/dfg_im_p ... s_0198.pdf .

Most relevant is page 55, with recommendation 7 (of 16). I have pasted the full text, including the commentary, below. But let me say that German universities currently do _not_ offer any centralized way of storing primary data. I know of at least one "local" initiative, relevant for universities in Baden-Württemberg, which is considering the Large Scale Data Facility (LSDF); see http://cordis.europa.eu/baden-wuerttemb ... wu_en.html . The last sentence there sounds highly relevant.

Personally, I believe that the importance of storing primary data has been recognized by the high-ranking authorities, but practical implementation in Germany is not very advanced. Thus it is currently the responsibility of each researcher to make sure that primary data are stored for at least 10 years (as per the DFG recommendations); there is little or no "official" support (that I know of) for such an undertaking. I would add that I suspect few researchers are aware of this particular recommendation.

Kay Diederichs

Appendix to this post: From: "Proposals for Safeguarding Good Scientific Practice", Deutsche Forschungsgemeinschaft (1998), pp. 55-56.

Recommendation 7
Primary data as the basis for publications shall be securely stored for ten years in a
durable form in the institution of their origin.

A scientific finding normally is a complex product of many single working steps. In all experimental sciences, the results reported in publications are generated through individual observations or measurements adding up to preliminary findings. Observation and experiment, as well as numerical calculation (used as an independent method or to support data analysis), first produce “data”. The same is true for empirical research in the social sciences.

Experiments and numerical calculations can only be repeated if all important steps are reproducible. For this purpose, they must be recorded.

Every publication based on experiments or numerical simulations includes an obligatory chapter on “materials and methods” summing up these records in such a way that the work may be reproduced in another laboratory. Again, comparable approaches are common in the social sciences, where it has become more and more customary to archive primary survey data sets in an independent institution after they have been analyzed by the group responsible for the survey.

Being able to refer to the original records is a necessary precaution for any group if only for reasons of working efficiency. It becomes even more important when published results are challenged by others.

Therefore every research institute applying professional standards in its work has a clear policy for retaining research records and for the storage of primary data and data carriers, even when this is not obligatory on legal or comparable grounds following regulations laid down e.g. in German laws on medical drugs, on recombinant DNA technology, on animal protection, or in professional codes such as Good Clinical Practice. In the USA it is customary that such policies require the storage of primary data (with the possibility of access by third parties entitled to it):
• in the laboratory of origin
• for eight to ten years after their generation.

In addition these policies regularly provide for the event that the person responsible for generating the data moves to another institution. As a rule, the original records remain in the laboratory of origin, but duplicates may be made or rights of access specified. Experience indicates that laboratories of high quality are able to comply comfortably with the practice of storing a duplicate of the complete data set on which a publication is based, together with the publication manuscript and the relevant correspondence. Space-saving techniques (e.g. diskette, CD-ROM) reduce the necessary effort.

The published reports on scientific misconduct are full of accounts of vanished original data and of the circumstances under which they had reputedly been lost. This, if nothing else, shows the importance of the following statement: The disappearance of primary data from a laboratory is an infraction of basic principles of careful scientific practice and justifies a prima facie assumption of dishonesty or gross negligence.
