by Brian McMahon » Mon Jan 23, 2012 10:39 am
[i](Posted on behalf of Tom Terwilliger for technical reasons.)[/i]
This is a summary of some of the unique points made in a discussion on archiving images for PDB depositions, held Oct-Dec 2011 on the [url=https://www.jiscmail.ac.uk/cgi-bin/webadmin?A0=CCP4BB]CCP4 bulletin board[/url]. The IUCr has commissioned the [b]Diffraction Data Deposition Working Group[/b] (DDDWG) to consider these issues. There is a [url=http://forums.iucr.org/viewforum.php?f=21]public DDDWG web site[/url] where issues are discussed.
[i]T. Terwilliger[/i]
[b]Relevant topics on the CCP4 bulletin board include:[/b]
[list]
[*][url=https://www.jiscmail.ac.uk/cgi-bin/webadmin?A1=ind1110&L=CCP4BB#71]IUCr committees, depositing images[/url][/*]
[*][url=https://www.jiscmail.ac.uk/cgi-bin/webadmin?A1=ind1110&L=CCP4BB#72]IUCr discussion forum on diffraction data deposition[/url][/*]
[*][url=https://www.jiscmail.ac.uk/cgi-bin/webadmin?A1=ind1110&L=CCP4BB#126]raw data deposition[/url][/*]
[*][url=https://www.jiscmail.ac.uk/cgi-bin/webadmin?A1=ind1110&L=CCP4BB#127]raw data deposition (off-list)[/url][/*]
[*][url=https://www.jiscmail.ac.uk/cgi-bin/webadmin?A1=ind1110&L=CCP4BB#148]To archive or not to archive, that's the question![/url][/*]
[*][url=https://www.jiscmail.ac.uk/cgi-bin/webadmin?A1=ind1111&L=CCP4BB#17]Archiving for fraud detection[/url][/*]
[*][url=https://www.jiscmail.ac.uk/cgi-bin/webadmin?A1=ind1111&L=CCP4BB#18]Archiving Images for PDB Depositions[/url][/*]
[*][url=https://www.jiscmail.ac.uk/cgi-bin/webadmin?A1=ind1111&L=CCP4BB#56]Image compression[/url][/*]
[/list]
[b]Unique points made during the discussion[/b]
[b]Information contained in raw images but not processed data[/b]
[list]
[*][url=https://www.jiscmail.ac.uk/cgi-bin/webadmin?A2=ind1110&L=CCP4BB&F=&S=&P=185307]Diffuse solvent contributions, commensurate and incommensurate superstructures, split reflections[/url][/*]
[*][url=https://www.jiscmail.ac.uk/cgi-bin/webadmin?A2=ind1110&L=CCP4BB&F=&S=&P=246933]Multiple lattices[/url][/*]
[*][url=https://www.jiscmail.ac.uk/cgi-bin/webadmin?A2=ind1110&L=CCP4BB&F=&S=&P=409739]Resolution of the data[/url][/*]
[/list]
[b]Usefulness of raw images in reinterpretation of the data[/b]
[list]
[*][url=https://www.jiscmail.ac.uk/cgi-bin/webadmin?A2=ind1110&L=CCP4BB&P=R62960&1=CCP4BB&9=A&I=-3&J=on&d=No+Match%3BMatch%3BMatches&z=4]The data can be reanalyzed and reinterpreted[/url][/*]
[*][url=https://www.jiscmail.ac.uk/cgi-bin/webadmin?A2=ind1110&L=CCP4BB&F=&S=&P=417794]Data can be reprocessed with greater care or skill than was applied the first time[/url][/*]
[*][url=https://www.jiscmail.ac.uk/cgi-bin/webadmin?A2=ind1110&L=CCP4BB&F=&S=&P=365596]New biological insight can be obtained by reanalysis[/url][/*]
[*][url=https://www.jiscmail.ac.uk/cgi-bin/webadmin?A2=ind1110&L=CCP4BB&F=&S=&P=422087]There are many examples where re-interpretation changes conclusions (somewhat)[/url][/*]
[*][url=https://www.jiscmail.ac.uk/cgi-bin/webadmin?A2=ind1110&L=CCP4BB&F=&S=&P=396076]A completely different biological conclusion can be obtained by re-refinement[/url][/*]
[*][url=https://www.jiscmail.ac.uk/cgi-bin/webadmin?A2=ind1110&L=CCP4BB&F=&S=&P=246100]Saved images allow systematic re-analysis of all deposited data[/url][/*]
[/list]
[b]Improvements expected by saving raw images[/b]
[list]
[*][url=https://www.jiscmail.ac.uk/cgi-bin/webadmin?A2=ind1110&L=CCP4BB&F=&S=&P=185307]Better data processing software will certainly be developed and will provide a better data model, allowing better structure models[/url][/*]
[*][url=https://www.jiscmail.ac.uk/cgi-bin/webadmin?A2=ind1110&L=CCP4BB&F=&S=&P=246933]Improved methods can be applied for removing ice rings from images[/url][/*]
[*][url=https://www.jiscmail.ac.uk/cgi-bin/webadmin?A2=ind1110&L=CCP4BB&F=&S=&P=240448]The raw images are a permanent experimental record[/url][/*]
[*][url=https://www.jiscmail.ac.uk/cgi-bin/webadmin?A2=ind1110&L=CCP4BB&F=&S=&P=240448]Raw images make a rigorous check by reviewers possible[/url][/*]
[*][url=https://www.jiscmail.ac.uk/cgi-bin/webadmin?A2=ind1110&L=CCP4BB&F=&S=&P=240448]Raw images are highly useful for methods developers[/url][/*]
[*][url=https://www.jiscmail.ac.uk/cgi-bin/webadmin?A2=ind1110&L=CCP4BB&F=&S=&P=329858]Raw images allow re-analysis of modulated structures[/url][/*]
[*][url=https://www.jiscmail.ac.uk/cgi-bin/webadmin?A2=ind1110&L=CCP4BB&F=&S=&P=388870]Saving raw images is the next logical step after saving structure factors[/url][/*]
[*][url=https://www.jiscmail.ac.uk/cgi-bin/webadmin?A2=ind1111&L=CCP4BB&F=&S=&P=15651]Solving structures by completely new methods may require the raw data[/url][/*]
[*][url=https://www.jiscmail.ac.uk/cgi-bin/webadmin?A2=ind1111&L=CCP4BB&F=&S=&P=36275]Finding over- or under-merged datasets requires the raw data[/url][/*]
[/list]
[b]Disadvantages of saving raw images[/b]
[list]
[*][url=https://www.jiscmail.ac.uk/cgi-bin/webadmin?A2=ind1110&L=CCP4BB&F=&S=&P=386987]We may move totally beyond today's crystallography in the future and all this will be moot[/url][/*]
[*][url=https://www.jiscmail.ac.uk/cgi-bin/webadmin?A2=ind1110&L=CCP4BB&F=&S=&P=404903]The effort would be better spent to save the crystals instead[/url][/*]
[*][url=https://www.jiscmail.ac.uk/cgi-bin/webadmin?A2=ind1111&L=CCP4BB&F=&S=&P=1285]Perhaps we should focus instead on storing DNA[/url][/*]
[*][url=https://www.jiscmail.ac.uk/cgi-bin/webadmin?A2=ind1110&L=CCP4BB&F=&S=&P=441194]Maybe the question is not "To archive or not to archive" but "What to archive"[/url][/*]
[*][url=https://www.jiscmail.ac.uk/cgi-bin/webadmin?A2=ind1110&L=CCP4BB&F=&S=&P=443413]A view of one who is unconvinced of the utility[/url][/*]
[*][url=https://www.jiscmail.ac.uk/cgi-bin/webadmin?A2=ind1110&L=CCP4BB&F=&S=&P=452150]And a rebuttal...[/url][/*]
[*][url=https://www.jiscmail.ac.uk/cgi-bin/webadmin?A2=ind1111&L=CCP4BB&F=&S=&P=75414]It takes a lot of effort to store data in an organized way[/url][/*]
[*][url=https://www.jiscmail.ac.uk/cgi-bin/webadmin?A2=ind1110&L=CCP4BB&F=&S=&P=260251]Preparing data for standardized deposition is time-consuming[/url][/*]
[/list]
[b]Feasibility and costs of raw data storage[/b]
[list]
[*][url=https://www.jiscmail.ac.uk/cgi-bin/webadmin?A2=ind1110&L=CCP4BB&F=&S=&P=312350]Estimate of cost of data storage[/url][/*]
[*][url=https://www.jiscmail.ac.uk/cgi-bin/webadmin?A2=ind1110&L=CCP4BB&F=&S=&P=237026]The cost of data storage is decreasing by a factor of two every 14 months[/url][/*]
[*][url=https://www.jiscmail.ac.uk/cgi-bin/webadmin?A2=ind1110&L=CCP4BB&F=&S=&P=320177]Storage is already being done at synchrotrons[/url][/*]
[*][url=https://www.jiscmail.ac.uk/cgi-bin/webadmin?A2=ind1110&L=CCP4BB&F=&S=&P=187707]Only a few datasets will need to be saved for each structure, so the amount of storage will not be huge[/url][/*]
[*][url=https://www.jiscmail.ac.uk/cgi-bin/webadmin?A2=ind1110&L=CCP4BB&F=&S=&P=184537]Local storage of images is entirely feasible and should be done routinely[/url][/*]
[*][url=https://www.jiscmail.ac.uk/cgi-bin/webadmin?A2=ind1110&L=CCP4BB&F=&S=&P=323904]Who will pay for the costs?[/url][/*]
[*][url=https://www.jiscmail.ac.uk/cgi-bin/webadmin?A2=ind1110&L=CCP4BB&F=&S=&P=225031]It is important to compare the costs of storage of data with the costs of repeating the experiment[/url][/*]
[/list]
[b]What data should be saved?[/b]
[list]
[*][url=https://www.jiscmail.ac.uk/cgi-bin/webadmin?A2=ind1110&L=CCP4BB&F=&S=&P=344225]Save raw images through processed data along with metadata in a systematic way[/url][/*]
[*][url=https://www.jiscmail.ac.uk/cgi-bin/webadmin?A2=ind1110&L=CCP4BB&F=&S=&P=354796]A systematic scale for data to be saved[/url][/*]
[*][url=https://www.jiscmail.ac.uk/cgi-bin/webadmin?A2=ind1110&L=CCP4BB&F=&S=&P=246100]Not-quite-solved datasets would be useful to save in addition to solved datasets[/url][/*]
[*][url=https://www.jiscmail.ac.uk/cgi-bin/webadmin?A2=ind1110&L=CCP4BB&F=&S=&P=326700]A focus on the datasets representing deposited structures improves practicality[/url][/*]
[/list]
[b]Document identifiers and standardization[/b]
[list]
[*][url=https://www.jiscmail.ac.uk/cgi-bin/webadmin?A2=ind1110&L=CCP4BB&F=&S=&P=257200]A practical approach to storage of images is assigning DOIs to data and storing locally (e.g. at synchrotrons)[/url][/*]
[*][url=https://www.jiscmail.ac.uk/cgi-bin/webadmin?A2=ind1110&L=CCP4BB&F=&S=&P=363135]DOIs can be assigned locally and saved as part of deposition; this can be tested systematically[/url][/*]
[*][url=https://www.jiscmail.ac.uk/cgi-bin/webadmin?A2=ind1110&L=CCP4BB&F=&S=&P=258954]It is important to standardize image headers for reusability[/url][/*]
[*][url=https://www.jiscmail.ac.uk/cgi-bin/webadmin?A2=ind1110&L=CCP4BB&F=&S=&P=434437]Making sure data is safe[/url][/*]
[*][url=https://www.jiscmail.ac.uk/cgi-bin/webadmin?A2=ind1110&L=CCP4BB&F=&S=&P=438157]Alternative storage locations may be useful[/url][/*]
[/list]
[b]Examples of archiving images and metadata[/b]
[list]
[*][url=https://www.jiscmail.ac.uk/cgi-bin/webadmin?A2=ind1110&L=CCP4BB&F=&S=&P=366234]The IUCr is collecting examples of data that could be deposited and examining whether archiving as part of publication could be carried out[/url][/*]
[*][url=https://www.jiscmail.ac.uk/cgi-bin/webadmin?A2=ind1110&L=CCP4BB&F=&S=&P=393655]TARDIS example of federated data storage of raw images[/url][/*]
[*][url=https://www.jiscmail.ac.uk/cgi-bin/webadmin?A2=ind1111&L=CCP4BB&F=&S=&P=91668]The JCSG database is an example of how all this could be done[/url][/*]
[/list]
[b]Practical issues[/b]
[list]
[*][url=https://www.jiscmail.ac.uk/cgi-bin/webadmin?A2=ind1110&L=CCP4BB&F=&S=&P=425107]A detailed analysis of the practical issues in storing images[/url][/*]
[*][url=https://www.jiscmail.ac.uk/cgi-bin/webadmin?A2=ind1111&L=CCP4BB&F=&S=&P=4084]As a first step, we should save unmerged data[/url][/*]
[*][url=https://www.jiscmail.ac.uk/cgi-bin/webadmin?A2=ind1110&L=CCP4BB&F=&S=&P=401356]Should release of raw data be compulsory?[/url][/*]
[*][url=https://www.jiscmail.ac.uk/cgi-bin/webadmin?A2=ind1110&L=CCP4BB&F=&S=&P=425107]Who will ensure that the images deposited correspond to the structure deposited?[/url][/*]
[*][url=https://www.jiscmail.ac.uk/cgi-bin/webadmin?A2=ind1110&L=CCP4BB&F=&S=&P=445867]A repository of data makes it possible to find images when you need them[/url][/*]
[*][url=https://www.jiscmail.ac.uk/cgi-bin/webadmin?A2=ind1111&L=CCP4BB&F=&S=&P=40497]Saving images for fraud detection requires knowing the last trusted computer[/url][/*]
[/list]
[b]Image compression[/b]
[list]
[*][url=https://www.jiscmail.ac.uk/cgi-bin/webadmin?A2=ind1110&L=CCP4BB&F=&S=&P=308985]Perhaps we should save compressed images[/url][/*]
[*][url=https://www.jiscmail.ac.uk/cgi-bin/webadmin?A2=ind1111&L=CCP4BB&F=&S=&P=71742]Would people use compressed images?[/url][/*]
[*][url=https://www.jiscmail.ac.uk/cgi-bin/webadmin?A2=ind1111&L=CCP4BB&F=&S=&P=81856]No need for compression[/url][/*]
[/list]
[b]Crystallographers' opinions[/b]
[list]
[*][url=https://www.jiscmail.ac.uk/cgi-bin/webadmin?A2=ind1110&L=CCP4BB&F=&S=&P=379444]An informal poll on saving diffraction images[/url][/*]
[/list]