Whitespace in table entries

Forum for CIF developers to define an application programming interface for CIF software.

Moderators: Brian McMahon, jcbollinger

jcbollinger
Posts: 57
Joined: Tue Dec 20, 2011 2:41 pm

Whitespace in table entries

Post by jcbollinger » Tue Oct 01, 2013 3:44 pm

It is my interpretation of the changes document and of the DDLm working group archives that it is not syntactically correct for whitespace to appear between key and separating colon or between colon and value of a table entry in a CIF 2 document. This is a bit ticklish, however, because the values in table entries can be text blocks, which have whitespace (end-of-line) as part of their delimiter.

It is reasonable to account the (one) end-of-line of a text block's opening delimiter as part of its physical value, just as the opening quote of a quoted value would be accounted as part of that physical value. By that reasoning, text blocks as table values do not inherently violate the restriction against whitespace between colon and value. Nevertheless, that makes the language a bit trickier to parse correctly, and it introduces the possibility of strange behavior. For example:

# valid:
_table1 {'key':
;value1
;}

# invalid:
_table2 {'key':
;value2
;}

Don't be confused if you can't see the difference: that's the point. In fact, the difference is just that in the second table there is a space character after the colon and before the opening newline/semicolon delimiter of the value. (Or at least that's the way I typed it; I think the forum software may be eating the trailing space.)

Also, inasmuch as allowing text blocks as table values rests on the rather fine distinction of counting the newline as part of the delimiter, I think many users will not see why the following is invalid when the _table1 example is valid:

# also invalid
_table3 {'key':
value3
}

Is not the newline between key and value there also a delimiter? We even call such values "whitespace-delimited" in the changes document (though that's not an entirely correct characterization; "bare" would be more accurate).
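To make the strict reading concrete, here is a minimal Python sketch of the check a parser would apply to the text immediately following a table entry's colon. It assumes, as a simplification of my own and not the spec's wording, that a text field is recognized only by a line terminator followed by a semicolon at the start of the next line; the function name value_follows_colon is hypothetical.

```python
def value_follows_colon(rest: str) -> bool:
    """Strict reading: `rest` is the text immediately after a table
    entry's colon. The value must start at once, except that the
    end-of-line opening a text field counts as part of that field's
    opening delimiter."""
    if not rest:
        return False  # nothing at all after the colon
    if rest[0] not in " \t\r\n":
        # A non-whitespace character follows directly; whatever value it
        # begins is left to the rest of the parser to validate.
        return True
    # Check CRLF before LF or CR so a CRLF pair is treated as one terminator.
    for eol in ("\r\n", "\n", "\r"):
        if rest.startswith(eol):
            # Only a text field (';' in column 1) may follow the newline.
            return rest[len(eol):].startswith(";")
    return False  # a space or tab precedes the value


# Text following the colon in each of the three examples.
table1 = "\n;value1\n;}"
table2 = " \n;value2\n;}"  # note the leading space
table3 = "\nvalue3\n}"

for name, rest in [("_table1", table1), ("_table2", table2), ("_table3", table3)]:
    print(name, "valid" if value_follows_colon(rest) else "invalid")
```

Run as written, it reports _table1 as valid and both _table2 and _table3 as invalid, which is exactly the distinction users are unlikely to anticipate.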

Bottom line: the syntax and grammar as I interpret them can be successfully parsed, but the details in this area are likely to surprise users. Although it is not necessary to change anything, we should consider allowing arbitrary whitespace (including comments) between the colon and value in table entries.
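Under that relaxation, a parser would simply skip whitespace and comments after the colon and then parse whatever value it lands on. A minimal Python sketch, assuming (as a simplification) that '#' opens a comment only when it follows whitespace and that a comment runs to the next line terminator; find_value_start is a hypothetical name:

```python
def find_value_start(text: str, pos: int) -> int:
    """Relaxed reading: starting at `pos` (just past a table entry's
    colon), skip spaces, tabs, line terminators, and comments, and return
    the index where the value begins, or -1 if the text runs out first.

    Assumption: '#' opens a comment only after at least one whitespace
    character; a '#' directly after the colon is left for the value parser.
    """
    i, n = pos, len(text)
    while i < n:
        ch = text[i]
        if ch in " \t\r\n":
            i += 1
        elif ch == "#" and i > pos:
            # Skip the comment body; the line terminator is consumed as
            # whitespace on the next pass through the loop.
            while i < n and text[i] not in "\r\n":
                i += 1
        else:
            return i
    return -1


entry = "_table4 {'key': # a comment\n  'value4'}"
start = find_value_start(entry, entry.index(":") + 1)
print(entry[start:])  # 'value4'}
```

With this rule, the _table2 and _table3 examples would also be accepted: after skipping the whitespace, the parser lands on the text field's opening semicolon or on the start of the bare value.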

jamesrhester
Posts: 39
Joined: Mon Sep 19, 2011 8:21 am

Re: Whitespace in table entries

Post by jamesrhester » Wed Oct 02, 2013 1:07 am

I agree with John's proposal to allow whitespace around the full colon in tables. As the key is a delimited string, detecting the end of the key is not dependent on an immediately following full colon, and likewise, there is no ambiguity introduced by allowing whitespace after the full colon.

I would view these issues as being "clarifications" rather than serious fiddling with the specifications. These are akin to the clarifications that often arise when a BNF grammar is being produced.

jcbollinger
Posts: 57
Joined: Tue Dec 20, 2011 2:41 pm

Re: Whitespace in table entries

Post by jcbollinger » Wed Oct 02, 2013 3:26 pm

jamesrhester wrote:I agree with John's proposal to allow whitespace around the full colon in tables. As the key is a delimited string, detecting the end of the key is not dependent on an immediately following full colon, and likewise, there is no ambiguity introduced by allowing whitespace after the full colon.

Hmmm. What I proposed was not whitespace "around" the colon but only whitespace after it, between colon and value. The more general "around" alternative does solve the problems I described, however, and it's what Nick at one point claimed was the original intent. I'll see how that goes.

jcbollinger
Posts: 57
Joined: Tue Dec 20, 2011 2:41 pm

Re: Whitespace in table entries

Post by jcbollinger » Thu Oct 03, 2013 4:02 pm

jcbollinger wrote:The more general "around" alternative does solve the problems I described, however, and it's what Nick at one point claimed was the original intent. I'll see how that goes.

For the curious: I have set up the parser to allow whitespace on both sides of the colon in a table entry, and in the process discovered that although it does solve the problems I described, it wreaks havoc on error recovery code and changes (a bit) what kinds of errors a parser is able to recognize.

I will accept that exchange if it's the extent of what we are willing to do, but I would still prefer that whitespace be allowed only between colon and value, not between key and colon. That would permit the parser to more precisely diagnose cases where the opening brace of a table is omitted, and to more reliably re-synchronize after a table entry corrupted by omission of key, colon, or value. I emphasize that these error diagnosis and recovery issues arise from the characteristics of the CIF language with and without whitespace allowed between key and colon; they are not peculiar to my parser implementation.
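One way to see the diagnostic benefit: if a key must be followed immediately by its colon, a lexer can recognize 'key': as a key token from its characters alone, and a key token found outside any braces points straight at a missing opening brace. The Python sketch below illustrates only that idea. Its token patterns are a deliberate simplification (no triple-quoted strings, text fields, lists, or comments), and find_stray_keys is a hypothetical name.

```python
import re

# Simplified, hypothetical token patterns. Order matters: a quoted string
# immediately followed by ':' is tried first and becomes a key token.
TOKEN = re.compile(
    r"""(?P<key>'[^'\n]*':|"[^"\n]*":)"""
    r"""|(?P<brace>[{}])"""
    r"""|(?P<word>'[^'\n]*'|"[^"\n]*"|[^\s{}]+)"""
)


def find_stray_keys(src: str) -> list[tuple[int, str]]:
    """Return (offset, key) for every key token that appears outside any
    table, which suggests the table's opening brace was omitted."""
    depth = 0
    stray = []
    for m in TOKEN.finditer(src):
        kind, text = m.lastgroup, m.group()
        if kind == "brace":
            # Clamp at zero so an unmatched '}' does not throw off the
            # nesting count for later tables.
            depth = depth + 1 if text == "{" else max(depth - 1, 0)
        elif kind == "key" and depth == 0:
            stray.append((m.start(), text[:-1]))  # drop the colon
    return stray


print(find_stray_keys("_ok {'a':1 'b':2}"))  # []
print(find_stray_keys("_bad 'a':1 'b':2}"))  # [(5, "'a'"), (11, "'b'")]
```

If whitespace were also allowed between key and colon, an input such as _bad 'a' :1} would lex here as ordinary words and this simple check would report nothing; deciding whether a quoted string is a key would then require looking ahead past whitespace and comments.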

In case there is any willingness to consider allowing whitespace between colon and value but forbidding it between key and colon, I also observe that it would be much easier for us to remove constraints later (say in a CIF 2.1) than to add them.

jamesrhester
Posts: 39
Joined: Mon Sep 19, 2011 8:21 am

Re: Whitespace in table entries

Post by jamesrhester » Tue Oct 08, 2013 12:13 am

I have no objection to allowing whitespace only between colon and datavalue. As you note here and elsewhere, it is easy to relax this constraint in the future if necessary.

jcbollinger
Posts: 57
Joined: Tue Dec 20, 2011 2:41 pm

Re: Comments vs. whitespace

Post by jcbollinger » Mon Sep 15, 2014 2:34 pm

irag12 wrote:I agree with John's proposal to allow whitespace around the full colon in tables. As the key is a delimited string, detecting the end of the key is not dependent on an immediately following full colon, and likewise, there is no ambiguity introduced by allowing whitespace after the full colon.


Although I appreciate the support, I feel compelled to observe that my proposal was not to allow whitespace around the colon in table entries, but rather to allow it only after the colon. Well-formed tables can be parsed successfully under any of those syntax rules, but the language is more robust and errors are easier to recognize and diagnose if well-formed table keys can be distinguished from values.
