by jcbollinger » Mon Jan 09, 2012 9:40 pm
Thank you, Herbert, I appreciate your insight.
yayahjb wrote:Having made the mistake of doing a non-validating CIF API and then adding validation to it -- I would say the biggest issues start in the design of the lexer and parser which need to be designed to allow for recovery after an error, rather than a hard abort in order to facilitate reporting of meaningful multiple errors in each pass rather than driving the user nuts with one error at a time. This relates to the need to design in a simple and effective error reporting/logging mechanism. Those are actually fairly general software engineering issues for any language.
That's an excellent point, and one that I'm all too prone to overlook despite having run into it before, both in CIF and in more general contexts. Truly, validation considerations magnify the importance of those issues.
It still seems like we're not quite connecting, however, for it is my fervent hope that the group never has need to discuss the details of a CIF lexer in this forum. The design specifications I am trying to reach, at least initially, will be primarily for the functions and data structures that programs using the API will touch -- that is, the public interface. Logging and error recovery are relevant there, to be sure, but even if we did not address them before considering validation, I don't see there being enough specifications altogether to make additions and changes an onerous task at the point where I would like to take that up.
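To make the error-recovery point concrete at the level of the public interface, here is a minimal sketch of what "report every problem in one pass instead of aborting on the first" could look like to a caller. Everything in it is hypothetical: `parse_items`, `ParseError`, and the `on_error` callback are names invented for illustration, and the toy "name value" line format is a stand-in, not real CIF syntax.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class ParseError:
    """One recoverable problem found during a parse (hypothetical type)."""
    line: int
    message: str


def parse_items(
    text: str,
    on_error: Optional[Callable[[ParseError], None]] = None,
) -> Tuple[Dict[str, str], List[ParseError]]:
    """Parse toy '_name value' lines, collecting errors instead of aborting.

    This is NOT a CIF parser; it only illustrates the recover-and-continue
    pattern and an optional error callback on the public interface.
    """
    items: Dict[str, str] = {}
    errors: List[ParseError] = []
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and comments carry no data
        parts = line.split(None, 1)
        if len(parts) != 2 or not parts[0].startswith("_"):
            err = ParseError(lineno, f"expected '_name value', got {line!r}")
            errors.append(err)
            if on_error is not None:
                on_error(err)  # let the caller log or display it
            continue  # recover: skip the bad line and keep going
        items[parts[0]] = parts[1]
    return items, errors


if __name__ == "__main__":
    sample = "_cell_length_a 5.43\nbogus line\n_cell_length_b\n_cell_length_c 7.1\n"
    parsed, problems = parse_items(sample)
    for p in problems:
        print(f"line {p.line}: {p.message}")
```

The caller gets both the usable data and the full list of problems from a single pass, and a program that wants live logging can supply its own callback without the parser knowing anything about the logging mechanism.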
yayahjb wrote:For CIF in particular, the most important design feature is to provide dictionary support. This has a strong impact on the design of the API because the dictionary format is somewhat different from the data CIF format. Less critical for the small molecule community is the need to cope sensibly with mixed DDL1/DDL2 data -- even if you intend to treat such mixing as an error, the API works better for users if you plan the hooks to tell the user what they did, rather than producing cryptic aborts.
I am very much hoping that we can come up with a design that localizes the differences between dictionary formats to a smallish number of routines for each DDL, but that's beside the point at the moment. Dictionaries and DDLs are purely validation considerations, and I can't see how the public interface for any "core" function would need to differ much with which DDLs were supported by the validation subsystem.
yayahjb wrote:There is more, but I think you get the point -- a design goes better if you plan ahead.
Oh, I certainly get the point. Design is planning ahead, and that's just what I'm trying to do. I spent several years as a professional software architect for a profitable consultancy, and more moonlighting as a freelance designer, so I am no stranger to the concept. I am also, however, trying to make us as productive as possible by keeping our attention as focused at any given time as it is feasible to do. Moreover, I know from repeated, sometimes bitter experience that incremental design stands a far better chance of overall success than does trying to capture an entire system all at one go.
I hope that the limited scope of what I'm trying to do before attending to validation will allay your concerns. If not, then I'm sure I can trust you to continue to make those concerns known to the group.