by jamesrhester » Tue Jan 08, 2013 1:36 am
jcbollinger wrote: Thanks, James, for the clarifications.
I don't see any show-stoppers, but I do see some issues that should be considered:
- For the _audit.domain item to be of much use, its values need to be drawn from a list of standard alternatives. Domains could self-select, but if CIF were some day to spread to a large number of domains then the risk of domain collisions would be non-negligible. Thus, a complete solution requires a central registry of CIF domains, yet requiring domain registration limits (slightly) how independent domains can really be.
I think that there should be a simple central registry, run by the IUCr. This should be viewed as a service which allows different domains to avoid collisions, rather than appearing as a requirement on all CIF users. COMCIFS would develop a set of criteria for granting domain status, something along the lines of an application having to come from a credible central body that would undertake to coordinate that domain (such as a learned society). The single requirement would be that no domain is allowed to redefine the '_audit.domain' dataname(s). Those who wish to use CIF with no reference at all to the IUCr would then simply have to avoid a single dataname.
- We should consider how the _audit.domain item interacts with data blocks and save frames. As I presently see it, CIF holds that only an _audit.domain item appearing in the same frame could apply to items in a save frame, and only one appearing in the same block could apply to items appearing directly in a data block. Is that what is wanted?
The relationship of save frames to the datablock has been little discussed, presumably because up until now they have only been used in dictionaries as a convenient encapsulation device. I do discuss some of the issues in the PyCIFRW paper, section 2.3.2 (http://journals.iucr.org/j/issues/2006/04/00/wf5020/index.html#SEC2.3.2). For the present discussion it must be true that the scope of an _audit.domain dataname appearing in the datablock proper would include the contents of save frames, as otherwise the datafile could contain datanames from different domains, which we are trying to avoid. This behaviour is broadly in line with dataname use in DDL2 dictionaries.
If _audit.domain appears in a save frame, and that save frame is then incorporated into a datablock, as the DDLm paper proposes, then the datanames appearing in the save frame must come from the same domain as the datablock datanames if we are to avoid mixing domains. Logically, therefore, all save frames must have the same _audit.domain unless they are simply used as an encapsulation device. To avoid too much program logic tracing the use of save frames through the datafile, I suggest that we specify that the scope of _audit.domain is the entire datablock in which it appears, and recommend that it only appears in the datablock proper. If there were a way in DDLm to limit _audit.domain use to the datablock proper, I would embrace that immediately.
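A minimal sketch of how a reader might apply that scoping rule, assuming a hypothetical in-memory model in which a datablock is a mapping of lowercased datanames to values plus a set of named save frames. The `DataBlock` class, `resolve_domain` function and `MixedDomainError` exception are illustrative names, not part of any existing CIF library, and comparing domain values as plain strings is also an assumption.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

DOMAIN_TAG = "_audit.domain"


@dataclass
class DataBlock:
    """Hypothetical model: top-level items plus named save frames."""
    items: Dict[str, str] = field(default_factory=dict)
    save_frames: Dict[str, Dict[str, str]] = field(default_factory=dict)


class MixedDomainError(ValueError):
    """Raised when one datablock declares more than one domain."""


def resolve_domain(block: DataBlock) -> Optional[str]:
    """Return the one domain that governs the whole datablock.

    The scope of _audit.domain is taken to be the entire datablock,
    save frames included, so any declaration found in a save frame
    must agree with every other declaration in the block.
    Returns None if the block declares no domain at all.
    """
    declared = {}
    if DOMAIN_TAG in block.items:
        declared["<datablock>"] = block.items[DOMAIN_TAG]
    for frame_name, frame_items in block.save_frames.items():
        if DOMAIN_TAG in frame_items:
            declared[frame_name] = frame_items[DOMAIN_TAG]

    distinct = set(declared.values())
    if len(distinct) > 1:
        raise MixedDomainError(f"conflicting {DOMAIN_TAG} values: {declared}")
    return distinct.pop() if distinct else None
```

Treating a conflicting save-frame declaration as an error, rather than letting the datablock value silently win, is one possible design choice; a more lenient reader could instead warn and ignore the save-frame value.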
- Do we really want to have cross-domain dictionary merging as originally suggested? The clarifications present a persuasive argument against it.
No, we definitely do not want cross-domain dictionary merging.
- We should consider whether _audit.domain is itself defined in its own dictionary(-ies), whether it is explicitly or implicitly added to one or more domain dictionaries, or whether its definition is a special case, defined by the standard instead of by any dictionary.
I think that only the latter can work - a separate 'audit' dictionary that is considered common to all domains, and is always implied in any CIF file. This is consistent with the first point above.
- As a related, minor issue, we should consider with what name formalism(s) the item should comply.
I continue to assert that semantically there is no such thing as a DDL1 or DDL2 dataname, as the dataname structure carries no information. Given that DDLm has adopted the DDL2 conventions, I think that a DDL2-style name is adequate. DDL1 programs are perfectly capable of checking and handling datanames that contain a period.
- We should make recommendations as to how to handle CIFs that do not carry the _audit.domain item.
I believe that this is really up to the programmer and particular problem space. For example, the more datanames that are successfully read in, the more likely you are reading a file belonging to your expected domain. I think that all we can provide is a discussion of the issues.
Some choices are:
(i) prompt the user to confirm choice of domain (for interactive programs)
(ii) check for datanames that really should be unique to crystallography
(iii) include a warning in output
(iv) terminate
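The four choices could be sketched roughly as follows. This is only an illustration under stated assumptions: the datablock is assumed to be a mapping with the _audit.domain key already lowercased, the `SIGNATURE_NAMES` set is a placeholder for whatever datanames a given program regards as characteristic of its domain, and `effective_domain`, `MissingDomainPolicy` and `MissingDomainError` are hypothetical names.

```python
import enum
import warnings
from typing import Callable, Mapping, Optional

DOMAIN_TAG = "_audit.domain"

# Placeholder list only: names assumed here to be characteristic of the
# expected domain. A real program would choose its own.
SIGNATURE_NAMES = frozenset({"_cell.length_a", "_cell.length_b", "_cell.length_c"})


class MissingDomainPolicy(enum.Enum):
    PROMPT = "prompt"        # (i) ask the user to confirm the domain
    CHECK_NAMES = "check"    # (ii) look for characteristic datanames
    WARN = "warn"            # (iii) carry on, but warn in the output
    TERMINATE = "terminate"  # (iv) refuse to continue


class MissingDomainError(Exception):
    """Raised when no domain is declared and the policy will not guess."""


def effective_domain(
    items: Mapping[str, str],
    expected: str,
    policy: MissingDomainPolicy,
    confirm: Optional[Callable[[str], bool]] = None,
) -> str:
    """Return the declared domain, or apply the policy if none is declared."""
    declared = items.get(DOMAIN_TAG)
    if declared is not None:
        return declared

    if policy is MissingDomainPolicy.PROMPT:
        # `confirm` stands in for an interactive yes/no prompt.
        if confirm is not None and confirm(expected):
            return expected
        raise MissingDomainError("user did not confirm the domain")

    if policy is MissingDomainPolicy.CHECK_NAMES:
        present = {name.lower() for name in items}
        if present & SIGNATURE_NAMES:
            return expected
        raise MissingDomainError("no characteristic datanames found")

    if policy is MissingDomainPolicy.WARN:
        warnings.warn(f"no {DOMAIN_TAG} found; assuming '{expected}'")
        return expected

    raise MissingDomainError(f"no {DOMAIN_TAG} found")
```

An interactive program might pass something like `confirm=lambda d: input(f"Treat file as {d}? [y/n] ") == "y"`, while a batch tool would more likely use the warn or terminate policy.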