Notes on archiving raw crystallographic data and metadata.
Tom Terwilliger, 2012-01-18
Principal reasons for archiving data for a crystallographic experiment in addition to its interpretation as a model
1. Verification that the data are consistent with their interpretation
error and fraud checking
2. Reinterpretation of the data
improving the models
systematic rebuilding of entire database of models
obtaining new biological information
solving previously unsolvable structures
3. Development of improved methods
4. Education
allowing new researchers to see how data are analyzed
What data need to be archived?
1. For verification
sufficient data for validation
possibly also the procedures used, so that the process itself can be validated
2. For reinterpretation and methods development
raw data
metadata on the experiment
What form should archived data and metadata take?
1. Raw images
native
derivatives
alternative wavelengths
alternative crystal forms
2. Processed and output data
unmerged structure factor amplitudes
uncertainties
electron density maps used in map interpretation
3. Data collection metadata
beam wavelength, bandpass, polarization, intensity, size at crystal
detector distance, center, orientation; crystal position and orientation, rotation (angle vs time)
time of start and end of collection, dose
crystal contents, macromolecule sequences and modifications, metals, small molecules, buffers
concentrations/stoichiometries in crystallization
source of materials (e.g. expression system which may leave small molecules attached to purified macromolecules)
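The data collection metadata listed above could be captured in a small structured record. What follows is only a minimal sketch in Python: the class names, field names and units are assumptions made for illustration, not an existing deposition schema, and rotation is simplified to a single rate rather than a full angle-vs-time table.

```python
from dataclasses import dataclass, field, asdict
from typing import List, Optional
import json


@dataclass
class Beam:
    wavelength_angstrom: float
    bandpass: Optional[float] = None                   # fractional bandwidth (assumed convention)
    polarization: Optional[float] = None
    intensity: Optional[float] = None
    size_at_crystal_um: Optional[List[float]] = None   # [horizontal, vertical]


@dataclass
class Geometry:
    detector_distance_mm: float
    beam_center_px: List[float]                        # [x, y] on the detector
    detector_orientation: Optional[str] = None
    crystal_orientation: Optional[str] = None
    rotation_deg_per_s: Optional[float] = None         # simplification of angle vs time


@dataclass
class Sample:
    sequences: List[str] = field(default_factory=list)
    modifications: List[str] = field(default_factory=list)
    metals: List[str] = field(default_factory=list)
    small_molecules: List[str] = field(default_factory=list)
    buffers: List[str] = field(default_factory=list)
    source_of_materials: Optional[str] = None          # e.g. expression system


@dataclass
class CollectionMetadata:
    beam: Beam
    geometry: Geometry
    sample: Sample
    start_time: str                                    # ISO 8601 text
    end_time: str
    dose: Optional[float] = None


def to_json(meta: CollectionMetadata) -> str:
    """Serialize the record so it can be stored next to the raw images."""
    return json.dumps(asdict(meta), indent=2, sort_keys=True)


if __name__ == "__main__":
    # Illustrative values only, not taken from a real experiment.
    example = CollectionMetadata(
        beam=Beam(wavelength_angstrom=0.9795),
        geometry=Geometry(detector_distance_mm=250.0, beam_center_px=[1024.0, 1024.0]),
        sample=Sample(metals=["Se"], source_of_materials="hypothetical expression system"),
        start_time="2012-01-18T09:00:00",
        end_time="2012-01-18T09:30:00",
    )
    print(to_json(example))
```

Fields that are often unknown at deposition time are optional here, so an incomplete record can still be stored and the gaps remain visible as null values.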
4. Related information
intent of the experiment (e.g., SeMet MAD; 2-deriv MIR; MR)
search model for MR
strategy used for structure solution
exact procedure used for structure solution (optional)
software used at each step, all non-default parameters
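Recording "software used at each step, all non-default parameters" amounts to keeping, for each step, only the settings that differ from the program defaults. A minimal sketch in Python, assuming each program's defaults are available as a dictionary; the program name, version and parameter names below are hypothetical:

```python
from dataclasses import dataclass, field
from typing import Any, Dict


def non_default_parameters(used: Dict[str, Any], defaults: Dict[str, Any]) -> Dict[str, Any]:
    """Return the settings in `used` that are absent from, or differ from, `defaults`."""
    return {
        key: value
        for key, value in used.items()
        if key not in defaults or defaults[key] != value
    }


@dataclass
class ProcessingStep:
    name: str                       # e.g. "refinement"
    software: str                   # program name
    version: str
    non_default: Dict[str, Any] = field(default_factory=dict)


def record_step(name: str, software: str, version: str,
                used: Dict[str, Any], defaults: Dict[str, Any]) -> ProcessingStep:
    """Build a provenance entry that stores only the non-default parameters."""
    return ProcessingStep(name, software, version,
                          non_default_parameters(used, defaults))


if __name__ == "__main__":
    # Hypothetical program and parameters, for illustration only.
    defaults = {"cycles": 3, "use_tls": False}
    used = {"cycles": 10, "use_tls": False, "weight": 0.5}
    step = record_step("refinement", "example_refiner", "1.0", used, defaults)
    print(step.non_default)
```

Storing the program version alongside the differences matters because defaults can change between releases; the same non-default list may describe different runs under different versions.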
How can it be verified that all the information needed for reproducibility and interpretation is deposited?
1. Automatic validation of the interpretation
2. Automatic reinterpretation of the data
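Before attempting either automatic check, a deposition could be screened for obviously missing items. The sketch below is a much weaker test than automatic validation or reinterpretation; it only shows how requirements might depend on the stated intent of the experiment. The key names and the required sets are assumptions for illustration.

```python
from typing import Any, Dict, List

# Items assumed to be required for every deposition (hypothetical key names).
REQUIRED_ALWAYS = {"raw_images", "wavelength", "detector_distance", "sequences", "intent"}

# Extra items assumed to be required for particular experiment intents.
REQUIRED_BY_INTENT = {
    "MR": {"search_model"},
    "MAD": {"alternative_wavelengths"},
    "MIR": {"derivatives"},
}

# Values treated as "not supplied".
EMPTY_VALUES = (None, "", [], {})


def missing_items(deposition: Dict[str, Any]) -> List[str]:
    """Return the sorted names of required items that are absent or empty."""
    required = set(REQUIRED_ALWAYS)
    required |= REQUIRED_BY_INTENT.get(deposition.get("intent"), set())
    return sorted(key for key in required if deposition.get(key) in EMPTY_VALUES)


if __name__ == "__main__":
    # Illustrative deposition: an MR experiment with no search model recorded.
    deposition = {
        "raw_images": ["native_0001.img"],
        "wavelength": 1.0,
        "detector_distance": 200.0,
        "sequences": ["MKT"],
        "intent": "MR",
    }
    print(missing_items(deposition))
```

A deposition that passes such a screen may still lack information needed for reproducibility; only re-running the validation or the structure solution from the deposited material tests that.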
Discussion points on archiving raw data
Do we need to document the process used to obtain a model?
Yes -- some aspects of a model are only plausible if the process was carried out properly (e.g., placing ligands only in density). A highly biased model (such as one obtained by iterative selection for low free R and recombination) is too difficult to disentangle afterwards.
No -- any aspect of a model can be verified after the model is complete (by bias-removal techniques)
Is experimental derivative/anomalous data crucial or is the final model sufficient?
Yes -- these data are crucial, particularly at low resolution, where the final model may not be well defined. Even at high resolution, the phase information from derivative/anomalous data may be useful, e.g., for identifying anomalous scatterers and for improving the phases.
No -- at high resolution, once the final model is available it can be validated without the derivative/anomalous data, which are different anyhow. There is no need to re-determine the structure once it has been obtained with high-resolution data.
Is it necessary to document NCS?
No -- NCS relationships can be extracted easily from the coordinates
Is it necessary to document TLS groupings?
Yes -- these are part of the model
Is it necessary to document heavy-atom sites?
No -- these can be obtained easily from LLG maps if data are available
Yes -- the interpretation of what site goes with what atom type is part of the model
Is it necessary to document restraints used (as opposed to coordinates/B/occ)?
No -- these are not part of the model itself, they are only used to generate the model
Yes -- they are essential for evaluating aspects of the model. If a bond length is restrained, its value is affected by the restraint and should not be treated as a new, independent estimate of that bond length.
This forum allows IUCr Commissions, subject experts and invited consultants to provide input to the IUCr Working Group on Diffraction Data Deposition.