This is a summary of some of the unique points made in a discussion on archiving images for PDB depositions, held on the CCP4 bulletin board in Oct-Dec 2012. The IUCr has commissioned the Diffraction Data Deposition Working Group (DDDWG) to consider these issues. There is a public DDDWG web site where the issues are discussed.
Relevant topics on the CCP4 bulletin board include:
- IUCr committees, depositing images
- IUCr discussion forum on diffraction data deposition
- raw data deposition
- raw data deposition (off-list)
- To archive or not to archive, that's the question!
- Archiving for fraud detection
- Archiving Images for PDB Depositions
- Image compression
Unique points made during the discussion
Information contained in raw images but not processed data
- Diffuse solvent contributions, commensurate and incommensurate superstructures, split reflections
- Multiple lattices
- Resolution of the data
Usefulness of raw images in reinterpretation of the data
- The data can be reanalyzed and reinterpreted
- Data can be reprocessed with greater care or skill than was applied the first time
- New biological insight can be obtained by reanalysis
- There are many examples where re-interpretation changes conclusions (somewhat)
- A completely different biological conclusion can be obtained by re-refinement
- Saved images allow systematic re-analysis of all deposited data
Improvements expected by saving raw images
- Better data processing software will certainly be developed and will provide a better data model, allowing better structure models
- Improved methods can be applied for removing ice rings from images
- The raw images are a permanent experimental record
- Raw images make a rigorous check by reviewers possible
- Raw images are highly useful for methods developers
- Raw images allow re-analysis of modulated structures
- Saving raw images is the next logical step after saving structure factors
- Solving structures by completely new methods may require the raw data
- Finding over- or under-merged datasets requires the raw data
Disadvantages of saving raw images
- We may move totally beyond today's crystallography in the future and all this will be moot
- The effort would be better spent to save the crystals instead
- Perhaps we should focus instead on storing DNA
- Maybe the question is not "To archive or not to archive" but "What to archive"
- A view of one who is unconvinced of the utility
- And a rebuttal...
- It takes a lot of effort to store data in an organized way
- Preparing data for standardized deposition is time-consuming
Feasibility and costs of raw data storage
- Estimate of cost of data storage
- The cost of data storage is decreasing by a factor of two every 14 months
- Storage is already being done at synchrotrons
- Only a few datasets need to be saved for each structure, so the amount of storage will not be huge
- Local storage of images is entirely feasible and should be done routinely
- Who will pay for the costs?
- It is important to compare the costs of storage of data with the costs of repeating the experiment
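The claim above that storage cost halves every 14 months can be turned into a rough projection. The following is a minimal sketch, assuming the halving continues at a steady rate (which the discussion does not guarantee); the function name and its parameters are hypothetical and chosen only for illustration.

```python
def relative_storage_cost(months: float, halving_period_months: float = 14.0) -> float:
    """Return the cost per unit of storage after `months`, as a fraction of today's cost.

    Assumes the cost falls by a factor of two every `halving_period_months`
    (14 months is the figure quoted in the discussion; steady halving is an assumption).
    """
    return 0.5 ** (months / halving_period_months)


if __name__ == "__main__":
    # Project the relative cost of storing the same volume of images in later years.
    for years in (1, 5, 10):
        fraction = relative_storage_cost(12 * years)
        print(f"after {years:2d} years: {fraction:.4f} of today's cost per unit")
```

Under this assumption, the cost of keeping a fixed set of images shrinks quickly with time, which is one reason the comparison with the cost of repeating the experiment matters.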
What data should be saved?
- Save raw images through processed data along with metadata in a systematic way
- A systematic scale for data to be saved
- Not-quite-solved datasets would be useful to save in addition to solved datasets
- A focus on the datasets representing deposited structures improves practicality
Document identifiers and standardization
- A practical approach to storage of images is assigning DOIs to data and storing locally (e.g. at synchrotrons)
- DOIs can be assigned locally and saved as part of deposition; this can be tested systematically
- It is important to standardize image headers for reusability
- Making sure data is safe
- Alternative storage locations may be useful
Examples of archiving images and metadata
- The IUCr is collecting examples of data that could be deposited and examining whether archiving as part of publication could be carried out
- TARDIS example of federated data storage of raw images
- The JCSG database is an example of how all this could be done
- A detailed analysis of the practical issues in storing images
- As a first step, we should save unmerged data
- Should release of raw data be compulsory?
- Who will ensure that the images deposited correspond to the structure deposited?
- A repository of data makes it possible to find images when you need them
- Saving images for fraud detection requires knowing the last trusted computer