Mixed-discipline data CIFs

Discussion of namespace conventions for community-developed CIF dictionaries.

Re: Mixed-discipline data CIFs

Post by jcbollinger » Thu Aug 08, 2013 8:53 pm

jamesrhester wrote: So in practice Solution 4 should work as follows: Programmer A wishing to emit CIF data files that include moon surface temperatures from the 'astronomy' discipline together with (IUCr dataname) chemical structures found on that moon would create an adapter dictionary that e.g. prefixes all astronomy datanames with '_astro', and then write the output program assuming that dictionary. Programmer B writing a program that wishes to read such files would understand the contents of the adapter dictionary, and that the '_astro' prefix is prepended to astronomy discipline datanames, and thus be able to find the temperatures.


The CIF producer might or might not need to create an adapter dictionary himself. At IUCr or interested-party option, supposing we agree to allow it, some adapter dictionaries could be centrally curated just as other ancillary dictionaries already are now. Still, if the CIF producer did need to create an adapter dictionary himself, one of the alternatives that could be supported is to do so parametrically, rather than as a separate, literal dictionary. More on that below.

On the consumer side there are two main cases:
  • The CIF consumer sees the data name without particular prior knowledge, and wants to know what it means, how to validate it, or whatever. In this case it simply looks up the name in the appropriate local-domain dictionary. Under some circumstances it might be referred from there to a definition (likely of a different name) in some other dictionary. The correct local-domain dictionary is chosen by the mechanisms already defined, or based on a dictionary specification included in CIF form in the file itself.
  • The CIF consumer is looking for a particular datum. That item perforce belongs to the local domain, because there is no provision for direct usage of foreign items. For the consumer to have any reason to expect the item might be present, it must either be relying on a centrally curated adapter dictionary, or it must have a special arrangement with the CIF producer. Either way, it knows what name to look for -- it does not have to try to deduce what local name is mapped to a foreign name of interest.
Nevertheless, this proposal can and should provide for sufficient data to be encoded so that a general CIF consumer can recognize which data names are associated with foreign-domain items, including which foreign domain, which dictionary, and which foreign item name. That is sufficient for a program to locate in a given CIF the item associated with a particular foreign definition, even though it should not usually be necessary to do so.

There are several non-exclusive possibilities for adapter dictionaries that IUCr might choose to support, including:
  • IUCr or any third party creates adapter dictionaries for (a subset of) other domains in complete, standalone form, reserving name prefixes with IUCr for the purpose just like might be done for any other ancillary dictionary in the IUCr domain.
  • IUCr or any third party creates adapter dictionaries for (a subset of) other domains in an indirect form primarily referencing and relying on a foreign dictionary, reserving name prefixes with IUCr for the purpose just like might be done for any other ancillary dictionary in the IUCr domain.
  • Individual CIF producers use [local] names, either with a formal dictionary of their own creation, or else simply with an implicit dictionary.
  • Individual CIF producers express per-CIF adapter dictionaries in a virtual, parameterized, indirect form described by data items included in the CIF and referencing a foreign dictionary (analogous to XML's provisions for binding namespace prefixes to schemas).

jamesrhester wrote:Questions:
1. Are all CIF-writing programs and/or dictionary writers expected to understand that the _astro prefix is reserved? If so, in what way is this different to the current prefix system? Or is the method of translation of datanames specified in each adapter dictionary?


For formally curated adapter dictionaries, the prefix part is not different or separate from the current prefix system. That's part of the point: we use a mechanism we have already established instead of creating a new one. I don't see why it would be useful or desirable to create a separate, parallel mechanism.

For adaptation via [local] data names, all the same provisions apply as for any other such names. Again, nothing fundamentally new is required.

If support for virtual adapter dictionaries were adopted, then the parameters describing those dictionaries would need to include everything necessary to map local data names to those defined in the referenced foreign dictionary. For per-CIF virtual dictionaries it would not be essential to use reserved prefixes. CIF processors wishing to validate data items against such adapter dictionaries would need to be able to understand the indirection involved in order to retrieve the foreign dictionary and map data names into it.

Although the name-mapping mechanism would be expressed either implicitly or explicitly in adapter dictionaries, it's not directly relevant to most use cases, as discussed above. CIF-consuming programs generally should not need to know about it, though the parser might need the information internally, especially if it validates.

jamesrhester wrote: 2. How are the adapter dictionaries constructed? Who does this work and makes sure that it agrees with all other adapter dictionaries that might be relevant to the same datablock?


Whoever wants badly enough to use foreign items will create and maintain appropriate adapter dictionaries, just as dictionary creation works now. However, where complete and direct item-by-item correlation is what is wanted, adapter dictionaries could easily be created via a program, provided only that target dictionaries are written in DDLs that we are prepared to read. If a given target dictionary is not written in a DDL that we are prepared to read then some sort of standalone adapter dictionary is required no matter what.
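As a rough illustration of how such program-generated adapter dictionaries might be produced, here is a minimal Python sketch. It assumes the foreign definition has already been read into a plain dict; the function name `make_adapter_definition` and the dict keys are hypothetical and not part of any existing CIF library. The output imitates the DDL1-style layout used later in this post.

```python
# Hypothetical sketch: emit a DDL1-style adapter definition for one foreign item.
# The input structure (keys "category", "type", "list", "range", "definition")
# is an assumption made for illustration only.

def make_adapter_definition(prefix, foreign_name, foreign_def, source):
    """Return the text of a data block binding one foreign item to a
    prefixed local data name."""
    local_name = "_" + prefix + foreign_name   # e.g. "_astro" + "_lunar_surface_temp"
    lines = [
        "data_" + local_name[1:],              # block code is the name minus the leading "_"
        "    _name                      '%s'" % local_name,
        "    _category                    %s_%s" % (prefix, foreign_def["category"]),
        "    _type                        %s" % foreign_def["type"],
        "    _list                        %s" % foreign_def["list"],
    ]
    if foreign_def.get("range"):
        lines.append("    _enumeration_range           %s" % foreign_def["range"])
    lines += [
        "    _definition",
        ";              " + foreign_def["definition"],
        "               Drawn from item %s of %s." % (foreign_name, source),
        ";",                                   # text-field delimiter must sit in column 1
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    print(make_adapter_definition(
        "astro",
        "_lunar_surface_temp",
        {"category": "lunar", "type": "numb", "list": "no", "range": "0.0:",
         "definition": "The lunar surface temperature, expressed in Kelvin."},
        "IAU-domain core dictionary astro-cif.dic, version 1.0.1",
    ))
```

A real generator would also have to wrap long definition text and guard against definition lines that begin with a semicolon; both are left out of this sketch.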

I'm not sure what concern you see with multiple adapter dictionaries relevant to the same block. At least, I don't see what new problem adapter dictionaries would present in this regard.

jamesrhester wrote: 3. Are all CIF reading programs expected to read the adapter dictionaries in order to understand how to obtain foreign discipline datanames?


Foreign-domain data names are not directly relevant to most CIF consumers under this proposal. That, too, is part of the point. It is the intention that adapter dictionaries should (or at least could) contain all data needed to validate CIFs using the datanames defined therein, either directly or indirectly, just like any other CIF dictionary. Being defined in an adapter dictionary inherently makes data names local, even if they are based on or refer directly to items in a dictionary belonging to a different domain.

If a mechanism for indirect adapter dictionaries were accepted, such as either of the two described above, then CIF readers wishing to validate items drawn indirectly from foreign dictionaries would need to obtain the foreign dictionary and use the defined name mapping to validate local items against it. That information would be carried by the adapter, though programs could conceivably shortcut for centrally-curated adapters. Whether any given program provides such a facility is at the discretion of the author.
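To make the indirection concrete, the following Python sketch shows one way a validating reader might check a local item against a foreign definition. Everything here is an assumption for illustration: the foreign dictionary is represented as an in-memory dict, the name mapping is passed in as a callable, and only a numeric type with an optional range is handled. No existing CIF library API is implied.

```python
import re

# Matches a CIF-style number with an optional standard uncertainty, e.g. "390(2)".
_NUMB = re.compile(r"([+-]?(?:\d+\.?\d*|\.\d+)(?:[eE][+-]?\d+)?)(?:\((\d+)\))?")


def parse_numb(text):
    """Return the numeric part of a value such as '12.345(6)' as a float."""
    match = _NUMB.fullmatch(text)
    if match is None:
        raise ValueError("not a numeric CIF value: %r" % text)
    return float(match.group(1))


def validate_via_adapter(local_name, raw_value, to_foreign, foreign_dict):
    """Validate a local item against the foreign definition it is mapped to.

    to_foreign   -- callable mapping a local data name to a foreign data name
    foreign_dict -- dict of foreign data name -> {"type": ..., "range": (low, high)}
    """
    definition = foreign_dict.get(to_foreign(local_name))
    if definition is None:
        return False                      # adapter refers to an item the foreign dictionary lacks
    if definition["type"] == "numb":
        try:
            number = parse_numb(raw_value)
        except ValueError:
            return False
        low, high = definition.get("range", (None, None))
        if low is not None and number < low:
            return False
        if high is not None and number > high:
            return False
    return True
```

The retrieval of the foreign dictionary itself (by URI, from a local cache, or from a centrally curated copy) is deliberately outside the sketch, since that is where programs could take shortcuts.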

jamesrhester wrote: As you can probably tell I don't think I've completely understood the subtleties of the proposal, so if you have a chance John could you take us through an example usage scenario?


Inasmuch as you may be saying that you perceive flaws in the proposal, please go ahead and lay them out. Though the idea seems good to me at the moment, maybe I'm blind to some egregious shortcoming. Until such time as I am persuaded that there is a better alternative, however, I'll answer criticisms as best I can.

The main potential subtlety I have recognized is already discussed above: that computing the data name by which to look up a foreign-domain item is generally not the problem at hand (though the proposal can nevertheless provide the data needed to do that).

Otherwise, perhaps you are looking for subtleties that aren't there. Certainly, some of the key alternatives for adapter dictionaries would allow nominally interdomain CIFs to be created and used exactly as any CIF is today. Consumers would not necessarily even need to recognize their cross-domain nature. That's one of the objectives of the proposal; to exercise the simplest options, we don't need anything new.

The major underlying principles are these:
  • Because domains are fundamentally organizational units for dictionaries, cross-domain relationships should be handled at dictionary level. Therefore, this is a proposal for binding foreign items into the local domain via dictionaries, not for (directly) referencing foreign-domain items.
  • As much as possible, we should use mechanisms and facilities that already exist instead of creating new ones.
  • Valid uses of the cross-domain support should produce the minimum possible breakage of current CIF software (and in particular, the system should avoid defining standard aliases for local-domain data names).

Additionally, it is desirable that
  • domains have as much independence as possible, including to control which foreign data items, if any, they want to formally support in their CIFs, and that
  • the system not depend on foreign domains to use the same DDLs that the local domain does (though doing so may facilitate supporting inter-domain CIFs).


The very simplest alternative would produce adapter dictionaries containing definitions such as this (please forgive my uncouth use of DDL1):

Code:

data_astro_lunar_surface_temp
    _name                      '_astro_lunar_surface_temp'
    _category                    astro_lunar
    _type                        numb
    _list                        no
    _enumeration_range           0.0:
    _definition
;              The lunar surface temperature, expressed in Kelvin.
               Drawn from item _lunar_surface_temp of IAU-domain core
               dictionary astro-cif.dic, version 1.0.1.
;


At the other extreme, use of a per-CIF virtual adapter dictionary might look something like this:

Code:

data_example
loop_
  _foreign.dictionary_prefix
  _foreign.domain
  _foreign.dictionary_uri
  _foreign.dictionary_expected_version
  _foreign.name_mapping
  'astro' 'IAU' 'http://www.iau.org/cif/astro-cif.dic' '1.0.1' 'remove-prefix'

_astro_lunar.surface_temp 390(2)
_cell.length_a 12.345(6) 


Or even like this:

Code:

data_example
loop_
  _foreign.dictionary_prefix
  _foreign.domain
  _foreign.dictionary_uri
  _foreign.dictionary_expected_version
  _foreign.name_mapping
  'astro::' 'IAU' 'http://www.iau.org/cif/astro-cif.dic' '1.0.1' 'remove-prefix'

_astro::_lunar.surface_temp 390(2)
_cell.length_a 12.345(6) 


There are a variety of intermediate possibilities, all of them mutually compatible in that different alternatives could be used for different foreign dictionaries, or even for the same foreign dictionary in different CIFs.
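For what it's worth, the 'remove-prefix' mapping in the two virtual-adapter examples above could be resolved by a general CIF consumer roughly as in the Python sketch below. The rows of the `_foreign` loop are assumed to have been parsed into dicts already; the key names and the function `resolve_foreign` are hypothetical, and the exact semantics of 'remove-prefix' (strip a leading underscore plus the declared prefix) are my reading of the examples, not a settled definition.

```python
# Hypothetical sketch: map a local data name back to the foreign item it adapts,
# using bindings taken from a parsed _foreign.* loop.

def resolve_foreign(local_name, bindings):
    """Return (domain, dictionary_uri, expected_version, foreign_name) for a
    local data name covered by one of the bindings, or None if it is purely local."""
    for binding in bindings:
        if binding["mapping"] != "remove-prefix":
            continue                          # other mapping schemes are not sketched here
        lead = "_" + binding["prefix"]        # "_astro" or "_astro::"
        if not local_name.startswith(lead):
            continue
        remainder = local_name[len(lead):]
        if remainder.startswith("_"):         # guards against e.g. "_astronomy_x"
            return (binding["domain"], binding["uri"], binding["version"], remainder)
    return None


if __name__ == "__main__":
    bindings = [{
        "prefix": "astro",
        "domain": "IAU",
        "uri": "http://www.iau.org/cif/astro-cif.dic",
        "version": "1.0.1",
        "mapping": "remove-prefix",
    }]
    for name in ("_astro_lunar.surface_temp", "_cell.length_a"):
        print(name, "->", resolve_foreign(name, bindings))
```

The same function handles the 'astro::' variant unchanged, because in both examples the text left after stripping the prefix is itself a complete foreign data name.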
