Comments vs. whitespace

Forum for CIF developers to define an application programming interface for CIF software.

Moderators: Brian McMahon, jcbollinger

jcbollinger
Posts: 57
Joined: Tue Dec 20, 2011 2:41 pm

Post by jcbollinger » Mon Sep 30, 2013 7:09 pm

While working on details of the CIF API's lexical scanner, I have discovered three possible ambiguities with respect to where comments are allowed. Below I document each issue and how I have chosen to handle it in the scanner, but if anyone thinks the specs demand a different interpretation then I am happy to have that discussion here. In that case, I will alter the scanner as needed to conform to the consensus interpretation.

The problem arises because "whitespace" is a defined term in the CIF 2 specs, and its definition does not include comments. Neither the CIF 1.1 specification text nor the CIF 2 changes document discusses where comments may appear, though the formal grammar appended to the CIF 1.1 specs does cover the issue: according to the grammar, comments can appear in most places where whitespace is allowed, pretty much as we would expect (or as I would, anyway). Note, however, that the CIF 1.1 grammar requires comments to be separated from other tokens by spaces, tabs, and/or newlines.

Ambiguity 1
The first ambiguity, then, is whether CIF 2 still requires comments to be separated from other tokens by bona fide whitespace, even though they are not explicitly covered by CIF 2's change 10. I am opting here for consistency with CIF 1.1: a comment must be separated from the preceding token by whitespace, and the newline that ends the comment is taken as separating it from subsequent tokens. I hope this is not controversial. Examples:

Code:

<beginning of file># No preceding whitespace needed because there are no preceding tokens
data_foo # whitespace required before this comment
_name 'value' # whitespace required here, too
_name2
;value 2
; # whitespace also required here
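
For illustration only, the rule above can be sketched as a small predicate. This is not the CIF API scanner's code: the name starts_comment and the choice of space, tab, CR and LF as the whitespace set are assumptions, and the sketch does not track whether the '#' lies inside a quoted string or text field, which a real scanner must do.

```python
# Hypothetical helper, not part of the CIF API.
# Assumption: whitespace means space, tab, carriage return or line feed.
CIF_WHITESPACE = frozenset(" \t\r\n")


def starts_comment(text: str, pos: int) -> bool:
    """Return True if the '#' at text[pos] may begin a comment.

    A comment may start at the very beginning of the input, or directly
    after a whitespace character. Quoted strings and text fields are NOT
    considered here; the caller is assumed to be outside of them.
    """
    if text[pos] != "#":
        return False
    return pos == 0 or text[pos - 1] in CIF_WHITESPACE
```

Under this sketch, the '#' in "data_foo # note" begins a comment, while the '#' in "data_foo#note" does not and would simply be part of the preceding token.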


Ambiguity 2
Guidance from CIF 1 is weaker when it comes to values of type list and table. The changes document says that CIF 2 list elements and table entries must be separated from each other by whitespace. Since comments are not whitespace in the sense of the defined term, there is a question as to whether they may appear inside list and table values. Following general CIF 1 practice and the typical rules of other file formats, I am opting here to allow comments inside list and table values. I think this is useful and also what users will expect. Examples:

Code:

_list1 [ # comment allowed here
  'value1' # and here
  'value2' 'value3'
# and here
  'value4'
  #and here
]

_table1 { #comment allowed here
  'key1':'value1' # and here
  'key2':'value2'
# and here
}
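
As a rough sketch of what allowing comments inside a list means for a scanner, the following reads a flat list of single-quoted values and skips any comments between them. It is an illustration under simplifying assumptions, not the CIF API implementation: the name parse_flat_list is hypothetical, only single-quoted values are handled, nested lists and tables are not, and the checks that elements and comments are properly whitespace-separated are omitted.

```python
def parse_flat_list(src: str) -> list[str]:
    """Collect the single-quoted elements of the first '[...]' in src.

    Comments (from '#' to end of line) between elements are skipped.
    Simplified: no nesting, no other value types, no separation checks.
    """
    i = src.index("[") + 1
    items = []
    while i < len(src):
        ch = src[i]
        if ch in " \t\r\n":
            i += 1                      # skip whitespace
        elif ch == "#":
            nl = src.find("\n", i)      # comment runs to end of line
            i = len(src) if nl == -1 else nl
        elif ch == "]":
            return items                # end of the list value
        elif ch == "'":
            end = src.index("'", i + 1)  # ValueError if unterminated
            items.append(src[i + 1:end])
            i = end + 1
        else:
            raise ValueError(f"unexpected character {ch!r} at offset {i}")
    raise ValueError("unterminated list")
```

Because a quoted value is consumed as a unit, a '#' inside the quotes is kept as part of the value rather than being treated as the start of a comment.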


Ambiguity 3
We have that the delimiting brackets or braces of list and table values do not require whitespace separation from the elements / entries within. Allowing comments in list and table values then raises an additional ambiguity: if a comment appears before any elements of a list or before any entries of a table, does it need to be separated from the opening bracket or brace by whitespace characters, as is required in other contexts where comments appear? By analogy with list elements and table entries, I am opting to not require whitespace separation in such cases. Example:

Code:

_table1
{#no whitespace needed
'key1':'value1' 'key2':[#not needed here, either
]
'key3':[[{}]]
}

Justification for this is weaker, but I think it will be less surprising to users than the alternative.
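
To make the combined rule concrete, here is a minimal sketch of a check that also accepts a comment directly after an opening bracket or brace. The name comment_may_start is hypothetical and this is not the CIF API scanner's code. It assumes the caller has already established that the preceding '[' or '{' was scanned as a list or table delimiter and that the position is not inside a quoted string or text field.

```python
# Hypothetical helper, not part of the CIF API.
WHITESPACE = frozenset(" \t\r\n")   # assumed whitespace characters
OPEN_DELIMITERS = frozenset("[{")   # list and table openers


def comment_may_start(text: str, pos: int) -> bool:
    """Return True if the '#' at text[pos] may begin a comment.

    Allowed at the start of input, after whitespace, or immediately
    after an opening list/table delimiter. Quoting context is assumed
    to have been handled by the caller.
    """
    if text[pos] != "#":
        return False
    if pos == 0:
        return True
    prev = text[pos - 1]
    return prev in WHITESPACE or prev in OPEN_DELIMITERS
```

With this check, "{#no whitespace needed" is accepted, whereas a '#' written directly after a closing quote, as in "'value1'#x", is still rejected.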

If there are any objections to these interpretations, or if anyone would like to discuss them, I would be delighted to hear from you.


John

jamesrhester
Posts: 39
Joined: Mon Sep 19, 2011 8:21 am

Re: Comments vs. whitespace

Post by jamesrhester » Wed Oct 02, 2013 1:02 am

I have no objection to any of these proposals. I think the general rule of thumb would be not to change CIF1 behaviour without good reason, and to be flexible if it does not complicate the parser.
