Scope of the API - features

Forum for CIF developers to define an application programming interface for CIF software.

Postby jcbollinger » Wed Dec 21, 2011 12:24 am

The first question that occurred to me when I was presented with the idea of a standard CIF API was that of scope. What actions and data structures, generally speaking, will the API provide to clients? Here are some of the things that it might provide:

  • Data structures representing CIFs and their components
  • Functions for building, manipulating, and examining in-memory CIF data
  • Functions for reading and writing CIF files
  • Data structures representing CIF dictionaries and their components
  • Functions implementing the CIF dictionary merging protocol
  • Functions or options for validating CIF
Does anyone have other classes of features that we should consider including?

Does anyone want to omit any of the feature groups above, or characterize them differently? In particular, does this API need to address validation? If yes, then does the initial version of the API need to do so, or could that work be deferred to a later version of the API or to a companion API?
jcbollinger
 
Posts: 57
Joined: Tue Dec 20, 2011 3:41 pm

Re: Scope of the API - features

Postby jamesrhester » Fri Dec 23, 2011 1:18 am

I think it would be productive to identify a core set of features to start with and, once that is decided, to tackle the less widely used features. My list of core features in terms of actions would be:

(1) Open, read, write, close a CIF file
(2) Read, write a key-value pair
(3) Open/Create a loop structure
(4) Read/Write packets from a loop structure
(5) Add/remove columns from a loop structure

These features roughly correspond to the first three points on John's list.
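
As a rough illustration of actions (2) through (5), here is a minimal Python sketch of an in-memory data block. Every name in it (`DataBlock`, `Loop`, `set_item`, `create_loop`, `add_packet`, `add_column`, `remove_column`) is hypothetical and chosen only for this example; nothing here has been agreed in the thread, and file reading and writing (action 1) is left out entirely.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Loop:
    """A loop structure: named columns plus packets (rows) of values."""
    columns: list[str]
    packets: list[list[str]] = field(default_factory=list)

    def add_packet(self, values: list[str]) -> None:
        # A packet must supply exactly one value per column.
        if len(values) != len(self.columns):
            raise ValueError("packet length must match the number of columns")
        self.packets.append(list(values))

    def add_column(self, name: str, values: list[str]) -> None:
        # A new column needs one value for each packet already present.
        if len(values) != len(self.packets):
            raise ValueError("need exactly one value per existing packet")
        self.columns.append(name)
        for packet, value in zip(self.packets, values):
            packet.append(value)

    def remove_column(self, name: str) -> None:
        index = self.columns.index(name)  # raises ValueError if absent
        del self.columns[index]
        for packet in self.packets:
            del packet[index]


@dataclass
class DataBlock:
    """A data block holding key-value items and loops (hypothetical layout)."""
    name: str
    items: dict[str, str] = field(default_factory=dict)
    loops: list[Loop] = field(default_factory=list)

    def set_item(self, name: str, value: str) -> None:
        self.items[name] = value

    def get_item(self, name: str) -> str:
        return self.items[name]

    def create_loop(self, columns: list[str]) -> Loop:
        loop = Loop(list(columns))
        self.loops.append(loop)
        return loop
```

Values are kept as plain strings here, which sidesteps the question of how and when they should be interpreted.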
jamesrhester
 
Posts: 39
Joined: Mon Sep 19, 2011 8:21 am

Re: Scope of the API - features

Postby yayahjb » Fri Dec 23, 2011 4:48 am

Validation is difficult to add later if it has not been provided for in the initial design. If the concern is efficiency, I would suggest designing on the basis of a validating parser and then providing the option of a bypass of the validation for efficiency.
yayahjb
 
Posts: 18
Joined: Sun Sep 11, 2011 9:54 pm

Re: Scope of the API - features

Postby rjgildea » Fri Dec 23, 2011 7:49 pm

yayahjb wrote:Validation is difficult to add later if it has not been provided for in the initial design. If the concern is efficiency, I would suggest designing on the basis of a validating parser and then providing the option of a bypass of the validation for efficiency.


It is not clear to me why the validation step should in any way be involved in the parsing step as they are two completely distinct steps. In my opinion, parsing is solely for the purpose of syntax checking and populating the internal data structure representation of the CIF format, whilst validation of the content is functionality that is performed upon that data structure independently of parsing. In my experience it is mostly parsing and building of some internal data structure that is required by an application, with dictionary-based validation used far less frequently.
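
The separation described above might look roughly like the following Python sketch. It handles only a toy subset of CIF (one `_name value` pair per line, with no data block headers, loops, quoted strings, or text fields), and both function names, `parse_items` and `validate_items`, are assumptions made for illustration. The point is only that the validation function takes the already-parsed structure and never touches the text.

```python
def parse_items(text):
    """Syntax only: collect `_name value` pairs without consulting a dictionary."""
    items = {}
    for line_number, line in enumerate(text.splitlines(), start=1):
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # skip blank lines and whole-line comments
        parts = stripped.split(None, 1)
        if len(parts) != 2 or not parts[0].startswith("_"):
            raise SyntaxError(f"line {line_number}: expected '_name value'")
        items[parts[0]] = parts[1]
    return items


def validate_items(items, known_names):
    """Content check, run separately on the structure that parsing produced.

    `known_names` stands in for a dictionary: here it is just a set of
    data names that are considered defined.
    """
    return [f"unknown data name: {name}" for name in items if name not in known_names]
```

An application that only needs the data would call `parse_items` and stop there; one that wants checking would pass the result on to `validate_items`.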
rjgildea
 
Posts: 3
Joined: Fri Dec 23, 2011 7:34 pm

Re: Scope of the API - features

Postby yayahjb » Fri Dec 23, 2011 9:31 pm

Certainly many CIFs can be parsed successfully without recourse to a dictionary. However, there are also CIFs for which parsing without a dictionary can be difficult (e.g. due to confusion between strings and numbers). If we are trying to design a common API to be used by a wide range of applications on a wide range of CIFs, it makes sense to provide the necessary hooks for dictionary use when needed as well as the ability to use the same API without reference to a dictionary when desired. Designing a common reference API for use by the entire community is a different task from designing APIs to serve particular subsets of the community. In the end it may not be possible to provide a single API that satisfies all needs, but I suggest that it is worth considering the possibility of doing so.
yayahjb
 
Posts: 18
Joined: Sun Sep 11, 2011 9:54 pm

Re: Scope of the API - features

Postby rjgildea » Fri Dec 23, 2011 9:49 pm

yayahjb wrote:However, there are also CIFs for which parsing without a dictionary can be difficult (e.g. due to confusion between strings and numbers).


Do you have an example where this is the case? I have yet to see a CIF that can't be parsed using only the formal definition of the syntax. Interpreting the content is another matter; however, I rarely find that programmatic recourse to a dictionary is necessary even for that.
rjgildea
 
Posts: 3
Joined: Fri Dec 23, 2011 7:34 pm

Re: Scope of the API - features

Postby yayahjb » Fri Dec 23, 2011 11:22 pm

The most difficult cases to handle without a dictionary that I encounter are unquoted strings of digits with leading zeros and embedded pluses or hyphens. These could be intended as numbers, as serial numbers in a bibliographic context, or as symmetry operations. Having the dictionary type specified greatly reduces possible confusion in parsing them. However, the point is not whether you or I have particularly troubling cases, but whether the design of the API will allow the API to support a reasonably wide range of application developers, some of whom may be new to CIF and may rely heavily on the API to help them avoid mistakes, and some of whom may be old hands with very clean data and very limited need for support from the API. I believe a good API should support both.
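
To make the ambiguity concrete, here is a small Python sketch of interpreting an unquoted value with and without a declared type. The function `interpret` and its `declared_type` parameter are hypothetical; the type labels `"numb"` and `"char"` are borrowed from DDL1's `_type` attribute purely as an example, and the numeric pattern is deliberately simplified (it ignores standard uncertainties in parentheses, for instance).

```python
import re
from typing import Optional

# Simplified numeric pattern: optional sign, digits with optional decimal
# point, optional exponent. Real CIF numbers may also carry an uncertainty.
_NUMBER = re.compile(r"^[+-]?(\d+\.?\d*|\.\d+)([eE][+-]?\d+)?$")


def interpret(raw: str, declared_type: Optional[str] = None):
    """Interpret an unquoted value, optionally guided by a dictionary type."""
    looks_numeric = _NUMBER.match(raw) is not None
    if declared_type == "char":
        return raw  # the dictionary says text: keep every character
    if declared_type is None and not looks_numeric:
        return raw  # no dictionary and not number-like: treat as text
    if not looks_numeric:
        raise ValueError(f"{raw!r} is declared numeric but does not parse as a number")
    return float(raw)  # guessed or declared numeric
```

Without a declared type, `interpret("0012")` yields the number `12.0` and the leading zeros are gone; with `declared_type="char"` the same input is preserved as the string `"0012"`, which is what a serial number would need.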
yayahjb
 
Posts: 18
Joined: Sun Sep 11, 2011 9:54 pm

Re: Scope of the API - features

Postby jcbollinger » Tue Jan 03, 2012 7:59 pm

yayahjb wrote:Validation is difficult to add later if it has not been provided for in the initial design. If the concern is efficiency, I would suggest designing on the basis of a validating parser and then providing the option of a bypass of the validation for efficiency.

That's a fair consideration, but I think it impacts the API implementation more than the design. Am I mistaken? Before we expend a great deal of energy on what appears to me to be a procedural question, I would like to have an idea of the stakes.

What design-level difficulties might arise from addressing validation after designing at least a rough version of some of the other, more universal API features?

Or else, how might an API design that addresses validation be distinguished from one that addressed only James's list of core features? What makes those distinctions difficult to add after the fact?
jcbollinger
 
Posts: 57
Joined: Tue Dec 20, 2011 3:41 pm

Re: Scope of the API - features

Postby jcbollinger » Tue Jan 03, 2012 8:16 pm

jcbollinger wrote:That's a fair consideration, but I think it impacts the API implementation more than the design.

Immediately after writing that, I realized that there are two separate questions here:
  • I offered the possibility that perhaps validation could be addressed via a companion API, which clearly has implementation considerations.
  • James, as I read him, merely suggested that we focus first on the design of the core features, from which he excludes validation.
I think we can follow James's suggestion while reserving judgement on the other. Indeed, that approach may put us in a better position to decide, later, how validation would best be incorporated. Are there any objections?
jcbollinger
 
Posts: 57
Joined: Tue Dec 20, 2011 3:41 pm

Re: Scope of the API - features

Postby jcbollinger » Mon Jan 09, 2012 4:15 pm

jcbollinger wrote:Are there any objections?

I take most of a week of silence as the absence of objections. I will shortly open one or more new topics dedicated to requirements for the "core" features, and to the extent we can reasonably do so, we will defer discussion of validation details.
jcbollinger
 
Posts: 57
Joined: Tue Dec 20, 2011 3:41 pm

Re: Scope of the API - features

Postby yayahjb » Mon Jan 09, 2012 4:47 pm

As previously noted, I think failing to make allowances for validation in the initial design will greatly increase the difficulty of incorporating it later, while designing to include validation from the start costs very little and, when properly done, can easily be turned off when efficiency or other considerations demand it.
yayahjb
 
Posts: 18
Joined: Sun Sep 11, 2011 9:54 pm

Re: Scope of the API - features

Postby jcbollinger » Mon Jan 09, 2012 8:04 pm

yayahjb wrote:As previously noted, I think failing to make allowances for validation in the initial design will greatly increase the difficulty of incorporating it later, while designing to include validation from the start costs very little and, when properly done, can easily be turned off when efficiency or other considerations demand it.

Indeed you did say so previously, but you did not respond to my request for elaboration. In particular:
jcbollinger wrote:What design-level difficulties might arise from addressing validation after designing at least a rough version of some of the other, more universal API features?

Or else, how might an API design that addresses validation be distinguished from one that addressed only James's list of core features? What makes those distinctions difficult to add after the fact?

Lest there be any confusion, by a "design" I mean roughly function and data type form and behavior specifications, including function prototypes or an equivalent, but excluding implementation code. If we would indeed be risking later difficulties by holding off on validation considerations then I surely want at minimum to understand the risk. I'm not seeing it, however, so please enlighten me.
jcbollinger
 
Posts: 57
Joined: Tue Dec 20, 2011 3:41 pm

Re: Scope of the API - features

Postby yayahjb » Mon Jan 09, 2012 8:43 pm

jcbollinger wrote:
What design-level difficulties might arise from addressing validation after designing at least a rough version of some of the other, more universal API features?

Or else, how might an API design that addresses validation be distinguished from one that addressed only James's list of core features? What makes those distinctions difficult to add after the fact?

Having made the mistake of doing a non-validating CIF API and then adding validation to it -- I would say the biggest issues start in the design of the lexer and parser, which need to be designed to allow for recovery after an error rather than a hard abort, in order to facilitate reporting of multiple meaningful errors in each pass rather than driving the user nuts with one error at a time. This relates to the need to design in a simple and effective error reporting/logging mechanism. Those are actually fairly general software engineering issues for any language. For CIF in particular, the most important design feature is to provide dictionary support. This has a strong impact on the design of the API because the dictionary format is somewhat different from the data CIF format. Less critical for the small molecule community is the need to cope sensibly with mixed DDL1/DDL2 data -- even if you intend to treat such mixing as an error, the API works better for users if you plan the hooks to tell the user what they did, rather than producing cryptic aborts.

There is more, but I think you get the point -- a design goes better if you plan ahead.
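
The error-recovery point can be illustrated with a short Python sketch that records each problem and carries on, instead of aborting at the first one. It works on a toy one-pair-per-line subset of CIF, not real CIF syntax, and the names `Diagnostic` and `scan_items` are assumptions for this example only.

```python
from dataclasses import dataclass


@dataclass
class Diagnostic:
    """One reported problem: where it occurred and what went wrong."""
    line: int
    message: str


def scan_items(text):
    """Return (items, diagnostics); a bad line is reported and skipped, not fatal."""
    items = {}
    diagnostics = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # blank lines and whole-line comments carry no data
        parts = stripped.split(None, 1)
        if not parts[0].startswith("_"):
            diagnostics.append(
                Diagnostic(line_number, f"expected a data name, found {parts[0]!r}"))
            continue  # recover: move on to the next line
        if len(parts) == 1:
            diagnostics.append(
                Diagnostic(line_number, f"data name {parts[0]} has no value"))
            continue
        items[parts[0]] = parts[1]
    return items, diagnostics
```

A caller gets every diagnostic from a single pass and can decide for itself whether any of them should be treated as fatal.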
yayahjb
 
Posts: 18
Joined: Sun Sep 11, 2011 9:54 pm

Re: Scope of the API - features

Postby jcbollinger » Mon Jan 09, 2012 10:40 pm

Thank you, Herbert, I appreciate your insight.
yayahjb wrote:Having made the mistake of doing a non-validating CIF API and then adding validation to it -- I would say the biggest issues start in the design of the lexer and parser, which need to be designed to allow for recovery after an error rather than a hard abort, in order to facilitate reporting of multiple meaningful errors in each pass rather than driving the user nuts with one error at a time. This relates to the need to design in a simple and effective error reporting/logging mechanism. Those are actually fairly general software engineering issues for any language.

That's an excellent point, and one that I'm all too prone to overlook despite having run into it before, both in CIF and in a general context. Truly, validation considerations magnify the importance of those issues.

It still seems like we're not quite connecting, however, for it is my fervent hope that the group never has need to discuss the details of a CIF lexer in this forum. The design specifications I am trying to reach, at least initially, will be primarily for the functions and data structures that programs using the API will touch -- that is, the public interface. Logging and error recovery are relevant there, to be sure, but even if we did not address them before considering validation, I don't see there being enough specifications altogether to make additions and changes an onerous task at the point where I would like to take that up.
yayahjb wrote:For CIF in particular, the most important design feature is to provide dictionary support. This has a strong impact on the design of the API because the dictionary format is somewhat different from the data CIF format. Less critical for the small molecule community is the need to cope sensibly with mixed DDL1/DDL2 data -- even if you intend to treat such mixing as an error, the API works better for users if you plan the hooks to tell the user what they did, rather than producing cryptic aborts.

I am very much hoping that we can come up with a design that localizes the differences between dictionary formats to a smallish number of routines for each DDL, but that's beside the point at the moment. Dictionaries and DDLs are purely validation considerations, and I can't see how the public interface for any "core" function would need to differ much with which DDLs were supported by the validation subsystem.
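
One way to picture localizing the per-DDL differences is a small adapter per DDL behind a shared interface, as in the Python sketch below. It assumes, purely for illustration, that each dictionary definition has already been read into a flat mapping of attribute name to value, and that each definition names a single item (real DDL1 definitions may list several names). The classes `DdlAdapter`, `Ddl1Adapter`, `Ddl2Adapter` and the function `build_type_table` are hypothetical; the attribute names `_name`, `_type`, `_item.name` and `_item_type.code` come from DDL1 and DDL2 respectively.

```python
from typing import Optional, Protocol


class DdlAdapter(Protocol):
    """The few routines that differ from one DDL to another."""

    def item_name(self, definition: dict) -> str: ...

    def item_type(self, definition: dict) -> Optional[str]: ...


class Ddl1Adapter:
    def item_name(self, definition: dict) -> str:
        return definition["_name"]

    def item_type(self, definition: dict) -> Optional[str]:
        return definition.get("_type")


class Ddl2Adapter:
    def item_name(self, definition: dict) -> str:
        return definition["_item.name"]

    def item_type(self, definition: dict) -> Optional[str]:
        return definition.get("_item_type.code")


def build_type_table(definitions, adapter: DdlAdapter) -> dict:
    """DDL-independent code: map each defined data name to its declared type."""
    return {adapter.item_name(d): adapter.item_type(d) for d in definitions}
```

Everything outside the two adapter classes is written once and does not need to know which DDL the dictionary used.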
yayahjb wrote:There is more, but I think you get the point -- a design goes better if you plan ahead.

Oh, I certainly get the point. Design is planning ahead, and that's just what I'm trying to do. I spent several years as a professional software architect for a profitable consultancy, and more moonlighting as a freelance designer, so I am no stranger to the concept. I am also, however, trying to make us as productive as possible by keeping our attention as focused at any given time as it is feasible to do. Moreover, I know from repeated, sometimes bitter experience that incremental design stands a far better chance of overall success than does trying to capture an entire system all at one go.

I hope that the limited scope of what I'm trying to do before attending to validation will allay your concerns. If not, then I'm sure I can trust you to continue to make those concerns known to the group.
jcbollinger
 
Posts: 57
Joined: Tue Dec 20, 2011 3:41 pm

Re: Scope of the API - features

Postby jamesrhester » Wed Jan 18, 2012 12:34 am

yayahjb wrote:As previously noted, I think failing to make allowances for validation in the initial design will greatly increase the difficulty of incorporating it later, while designing to include validation from the start costs very little and, when properly done, can easily be turned off when efficiency or other considerations demand it.


I agree with Herbert insofar as I think it is worth keeping in mind the question "Would this change if we were validating?" as we move forward. I disagree with Herbert insofar as I think the answer is almost always "No" in a well-designed system.
jamesrhester
 
Posts: 39
Joined: Mon Sep 19, 2011 8:21 am

