Requirements for "core" features

Forum for CIF developers to define an application programming interface for CIF software.

Moderators: Brian McMahon, jcbollinger

jcbollinger
Posts: 57
Joined: Tue Dec 20, 2011 2:41 pm

Requirements for "core" features

Post by jcbollinger » Mon Jan 09, 2012 3:42 pm

In the "Scope - Features" topic, James characterized the core features of a CIF API like so:

(1) Open, read, write, close a CIF file
(2) Read, write a key-value pair
(3) Open/Create a loop structure
(4) Read/Write packets from a loop structure
(5) Add/remove columns from a loop structure

Here, I hope to tease those apart a bit to establish some specific initial definitions and requirements for these functionalities. Any requirements that we establish at this point are of course subject to later re-evaluation, but I hope that by agreeing on requirements, we can reduce later conflicts over API details. I guess we'll see how well that turns out.
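
As a point of reference for the discussion, items (2) through (5) can be pictured with a minimal in-memory sketch in Python. This is only an illustration under stated assumptions: the names DataBlock, Loop, add_packet, add_column and so on are hypothetical and are not taken from any existing CIF library, values are kept as plain strings, and file-level operations (item 1) are left out entirely.

```python
class Loop:
    """Hypothetical loop structure: ordered column names plus packets (rows)."""

    def __init__(self, names):
        self.names = list(names)
        self.packets = []  # each packet is a list aligned with self.names

    def add_packet(self, values):
        # (4) write a packet; it must supply one value per column
        if len(values) != len(self.names):
            raise ValueError("packet length does not match column count")
        self.packets.append(list(values))

    def get_packet(self, index):
        # (4) read a packet as a name -> value mapping
        return dict(zip(self.names, self.packets[index]))

    def add_column(self, name, fill="?"):
        # (5) add a column; existing packets receive the CIF "unknown" value
        if name in self.names:
            raise ValueError("duplicate column: " + name)
        self.names.append(name)
        for packet in self.packets:
            packet.append(fill)

    def remove_column(self, name):
        # (5) remove a column and its value from every packet
        i = self.names.index(name)
        del self.names[i]
        for packet in self.packets:
            del packet[i]


class DataBlock:
    """Hypothetical data block holding unlooped items and loops."""

    def __init__(self, code):
        self.code = code
        self.items = {}  # (2) key-value pairs
        self.loops = []  # (3) loop structures

    def set_item(self, name, value):
        self.items[name] = value

    def get_item(self, name):
        return self.items[name]

    def create_loop(self, names):
        loop = Loop(names)
        self.loops.append(loop)
        return loop
```

Whether adding a column should fill existing packets with a placeholder, or require values to be supplied, is exactly the kind of question the requirements would need to settle.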

yayahjb
Posts: 18
Joined: Sun Sep 11, 2011 9:54 pm

Re: Requirements for "core" features

Post by yayahjb » Mon Jan 09, 2012 4:08 pm

I would suggest two separate efforts:

1. A simple event-driven parser/writer for applications requiring a light footprint; and
2. A full stored tree/database based parser for maximal functionality

In either case, we need to be able to simultaneously open multiple CIFs and dictionaries
for read/write/update processing, and we need to have some sort of support for
C, C++, Fortran, Fortran-95, Python and Java, either directly or via wrappers.

Before going much further, I would suggest reviewing the existing APIs to look for useful
features and vexing problems, so this effort represents an overall improvement in the
state of support for CIF in the community. I believe Brian has a fairly good list on his
website we could use as a starting point. Is anyone aware of other CIF APIs that
are not on Brian's list?

One interesting issue that arose in the development of CBFlib was the addition of
easy-to-use higher level functions packaged along with the API. If the "core" is too
minimal, users may have trouble learning to make use of it.

jcbollinger
Posts: 57
Joined: Tue Dec 20, 2011 2:41 pm

Re: Requirements for "core" features

Post by jcbollinger » Mon Jan 09, 2012 6:10 pm

yayahjb wrote:I would suggest two separate efforts:

1. A simple event-driven parser/writer for applications requiring a light footprint; and
2. A full stored tree/database based parser for maximal functionality

That sounds analogous to SAX and DOM parsers in the XML world. Is that the sort of thing you have in mind?
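
To illustrate the distinction, here is a rough Python sketch of the two styles over a deliberately simplified CIF subset. Everything in it is an assumption made for illustration: the function names parse_events and build_tree, the event kinds, and the restriction to whitespace-delimited tokens (no quoted strings, text fields, comments or save frames). The event-driven scanner reports each construct to a callback as it is met; the tree builder is just one particular callback that stores everything.

```python
def _starts_construct(tok):
    # True for tokens that end a loop's value list in this simplified grammar
    low = tok.lower()
    return tok.startswith("_") or low == "loop_" or low.startswith("data_")


def parse_events(text, handler):
    """Scan simplified CIF text and call handler(kind, payload) per construct."""
    tokens = text.split()
    in_block = False
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        if tok.lower().startswith("data_"):
            handler("block", tok[5:])
            in_block = True
            i += 1
        elif not in_block:
            raise ValueError("content before the first data_ header")
        elif tok.lower() == "loop_":
            i += 1
            names = []
            while i < len(tokens) and tokens[i].startswith("_"):
                names.append(tokens[i])
                i += 1
            if not names:
                raise ValueError("loop_ without data names")
            handler("loop", names)
            packet = []
            while i < len(tokens) and not _starts_construct(tokens[i]):
                packet.append(tokens[i])
                i += 1
                if len(packet) == len(names):
                    handler("packet", packet)
                    packet = []
            if packet:
                raise ValueError("incomplete final packet in loop")
        elif tok.startswith("_"):
            if i + 1 >= len(tokens):
                raise ValueError("data name without a value")
            handler("item", (tok, tokens[i + 1]))
            i += 2
        else:
            raise ValueError("unexpected token: " + tok)


def build_tree(text):
    """Stored-tree style: consume the same events and keep everything in memory."""
    tree = {}
    state = {"block": None, "loop": None}

    def handler(kind, payload):
        if kind == "block":
            state["block"] = tree.setdefault(payload, {"items": {}, "loops": []})
        elif kind == "item":
            name, value = payload
            state["block"]["items"][name] = value
        elif kind == "loop":
            state["loop"] = {"names": payload, "packets": []}
            state["block"]["loops"].append(state["loop"])
        elif kind == "packet":
            state["loop"]["packets"].append(payload)

    parse_events(text, handler)
    return tree
```

A light-footprint application would pass its own handler to parse_events and keep only what it needs, while an application wanting random access would call build_tree and work on the result.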
yayahjb wrote:In either case, we need to be able to simultaneously open multiple CIFs and dictionaries
for read/write/update processing,

Agreed.
yayahjb wrote:and we need to have some sort of support for
C, C++, Fortran, Fortran-95, Python and Java, either directly or via wrappers.

By "Fortran" do you mean Fortran 77? I think that might be reasonable, and the other suggested targets definitely so. Lest separate, independent API implementations be required in each language, however, I think that's pointing to a C base implementation with wrappers / bindings for other languages.

My largest concern with supporting Fortran 77 is that it does not have derived data types (though many implementations offer them as an extension). If we stipulate that a requirement to support Fortran 77 will not interfere with using user-defined data types in other languages' API faces, then I will support Fortran 77 as a target for API wrapper specifications.

Nevertheless, although I accept that it is relevant to consider the other-language bindings we will eventually need to define, I intend initially to focus our work as much as possible on the central target, which I'm now supposing will be C.
yayahjb wrote:Before going much further, I would suggest reviewing the existing APIs to look for useful
features and vexing problems, so this effort represents an overall improvement in the
state of support for CIF in the community.

This is just the place to consider such things, and it is not too soon to start doing so. Our consideration of the general requirements is at such an early stage, however, that I don't think it yet needs to be forestalled by analysis and criticism of the existing alternatives.

It will remain an open question, but does anyone have notable features or vexing problems with any of the existing libraries that they would like to raise now?

I shall also solicit comments on the cif-developers list, and perhaps others can suggest where else we should seek opinions. Unless the proposal is that we conduct a de novo analysis instead of merely relying on existing users' experiences?
yayahjb wrote:I believe Brian has a fairly good list on his
website we could use as a starting point. Is anyone aware of other CIF APIs that
are not on Brian's list?

No publicly-available ones spring to mind.
yayahjb wrote:One interesting issue that arose in the development of CBFlib was the addition of
easy-to-use higher level functions packaged along with the API. If the "core" is too
minimal, users may have trouble learning to make use of it.

I am entirely open to considering proposals for such extended features, but at the same time I am wary of falling prey to scope creep. Ideas for higher-level functions are welcome, but I may end up favoring reporting them out for a separate effort to address. Personally, I will have to see how many such ideas there are and how meritorious I judge them.

yayahjb
Posts: 18
Joined: Sun Sep 11, 2011 9:54 pm

Re: Requirements for "core" features

Post by yayahjb » Mon Jan 09, 2012 8:11 pm

Strange to say, fortran 77 is still heavily used. Now that there is an
ISO C-binding spec for fortran 2xxx, C can be used to support fortran
reasonably well, but even in that case there are people who absolutely
refuse to mix C and fortran for sound reasons.

SAX and DOM are good illustrations of the idea of event-driven vs. full stored
tree/database based parser. In the CIF world, CIFtbx is an example of the
former and CBFlib is an example of the latter. It also happens that CIFtbx
is written in Fortran and CBFlib is written in C, is directly callable from
C++, is callable from fortran 2xxx via the ISO C-binding spec, has a heavily used
python wrapper and the beginnings of a java wrapper. The PDB has a large
suite of very useful CIF software. There is much, much more.

The biggest problem with being minimalist and starting from scratch is that
we may end up making things hard for software designers who have existing
CIF apps using the existing packages to move from existing packages to the
new API. I think we should try to make things easy for them, as well
as trying to make things easy for people who are doing an ab initio effort
at using CIF.

Returning to the "core" features. I would suggest considering

(1) Open, read, update, create, close, delete a CIF file with or without supporting dictionaries.
(2) Open, read, update, create, close, delete one or more datablocks within a CIF file
(3) Open, read, update, create, close, delete one or more saveframes within a CIF dictionary
(4) Open, read, update, create, close, delete one or more categories within a datablock or saveframe
(5) Open, read, update, create, close, delete one or more rows or columns within a category
(6) Open, read, update, create, close, delete one or more data values at specified rows and columns
(7) Features to view and control layout of a CIF, including whitespace and comments
(8) Utilities for conversion among CIF and other useful formats, such as XML and HDF5
(9) A CIF editor based on all of the above as a demonstration of the use and capabilities of the API

If the API will not support an editor, it will have trouble supporting a wide range of apps.
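
The containment implied by items (1) through (6) can be pictured roughly as follows. This Python sketch is only a hypothetical illustration: the nested-dictionary layout and the function names read_value, update_value, create_row and delete_row are assumptions made here, save frames are omitted for brevity, and the short category and column names are placeholders rather than real data names.

```python
# Hypothetical nesting: CIF -> data blocks -> categories -> rows -> values.
cif = {
    "demo": {                      # data block
        "atom_site": [             # category, stored as a list of rows
            {"label": "Si1", "fract_x": "0.0"},
            {"label": "Si2", "fract_x": "0.25"},
        ],
    },
}


def read_value(cif, block, category, row, column):
    # (6) read one value at a specified row and column
    return cif[block][category][row][column]


def update_value(cif, block, category, row, column, value):
    # (6) update one value; the column must already exist in that row
    if column not in cif[block][category][row]:
        raise KeyError(column)
    cif[block][category][row][column] = value


def create_row(cif, block, category, values):
    # (5) create a row; it must carry the same columns as the existing rows
    rows = cif[block][category]
    if rows and set(values) != set(rows[0]):
        raise ValueError("row columns do not match the category")
    rows.append(dict(values))


def delete_row(cif, block, category, row):
    # (5) delete a row by position
    del cif[block][category][row]
```

An editor would need at least this granularity, so that changing one value does not require rewriting the whole file.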

jcbollinger
Posts: 57
Joined: Tue Dec 20, 2011 2:41 pm

Re: Requirements for "core" features

Post by jcbollinger » Mon Jan 09, 2012 10:49 pm

yayahjb wrote:Strange to say, fortran 77 is still heavily used. Now that there is an
ISO C-binding spec for fortran 2xxx, C can be used to support fortran
reasonably well, but even in that case there are people who absolutely
refuse to mix C and fortran for sound reasons.

I have seen a lot of Fortran 77, and I have no problem with it. My point about it is that I want to avoid defining seriously non-idiomatic interfaces to other languages for the purpose of easing the definition or implementation of a Fortran 77 interface. It is quite possible to do mixed Fortran / C programming independently of F2K's C interop features (CCP4 has been doing it for years), but in truth, I don't care if someone tasked with the job prefers to implement the proposed Fortran 77 API bindings from scratch instead of just writing wrappers for the C interface I think we have settled on.
yayahjb wrote:SAX and DOM are good illustrations of the idea of event-driven vs. full stored
tree/database based parser.

I'm glad to hear we understand one another.
yayahjb wrote:In the CIF world, CIFtbx is an example of the
former and CBFlib is an example of the latter. It also happens that CIFtbx
is written in Fortran and CBFlib is written in C, is directly callable from
C++, is callable from fortran 2xxx via the ISO C-binding spec, has a heavily used
python wrapper and the beginnings of a java wrapper. The PDB has a large
suite of very useful CIF software. There is much, much more.

Surely CIFlib is among the libraries that we would want to evaluate against our requirements, once they are complete.
yayahjb wrote:The biggest problem with being minimalist and starting from scratch is that
we may end up making things hard for software designers who have existing
CIF apps using the existing packages to move from existing packages to the
new API. I think we should try to make things easy for them, as well
as trying to make things easy for people who are doing an ab initio effort
at using CIF.

I think you mistake my intentions. I do not especially intend minimalism of the API, and we have not even considered whether to start from scratch. I am attempting to lead us through determining our requirements and general specifications for the as-yet vague concept of a "CIF API". I am attempting to employ an incremental approach that I anticipate will work well. How can we decide whether any library out there meets or comes close enough to our requirements when we cannot (yet) document what those requirements are?

jcbollinger
Posts: 57
Joined: Tue Dec 20, 2011 2:41 pm

Re: Requirements for "core" features

Post by jcbollinger » Mon Jan 09, 2012 11:31 pm

yayahjb wrote:Returning to the "core" features. I would suggest considering

(1) Open, read, update, create, close, delete a CIF file with or without supporting dictionaries.
(2) Open, read, update, create, close, delete one or more datablocks within a CIF file
(3) Open, read, update, create, close, delete one or more saveframes within a CIF dictionary
(4) Open, read, update, create, close, delete one or more categories within a datablock or saveframe
(5) Open, read, update, create, close, delete one or more rows or columns within a category
(6) Open, read, update, create, close, delete one or more data values at specified rows and columns
(7) Features to view and control layout of a CIF, including whitespace and comments
(8) Utilities for conversion among CIF and other useful formats, such as XML and HDF5
(9) A CIF editor based on all of the above as a demonstration of the use and capabilities of the API

If the API will not support an editor, it will have trouble supporting a wide range of apps.


I'm about to be rather nitpicky; please don't take it personally. Anyway, one or two of these comments apply similarly to James's list.

Regarding (1):
What is the significance of opening a CIF file that is separate from reading or writing it?
Is there any reason to have a close CIF function, other than that a standalone open CIF function is proposed?
Why does our API need to address deleting a CIF file? Is that not adequately covered by the OS and / or standard library?

Regarding (2):
What is the intended meaning of opening, reading, or closing a data block, independent of file-level operations?
What is the intended meaning of updating a data block, independent of both file-level and lower-level (3 - 6) operations (i.e. what changes)?

Regarding (3):
What is the intended meaning of opening, reading, or closing a save frame, independent of file- and block-level operations?
What is the intended meaning of updating a save frame, independent of both higher-level (1 - 2) and lower-level (4 - 6) operations (again, what changes)?
Does the specification that these operations apply to a CIF dictionary constitute anything more than a simple recognition that only dictionary CIFs, not data CIFs, are permitted to contain save frames?

Regarding (4):
"Category" is a DDL concept, not a CIF concept. Is it intended in this context to mean anything different than what I might describe as a "loop"?
What is the intended meaning of opening or closing a category?
What is the intended meaning of reading or updating a category, independent of the higher-level and lower-level operations?

Regarding (5):
What is the intended meaning of opening or closing rows or columns within a category?

Regarding (6):
What is the intended meaning of opening or closing data values at specified coordinates?
What is the intended meaning of creating or deleting data values at specified coordinates, independent of creating or deleting rows and columns?

Regarding (7):
What makes this consideration independent of the previous ones, especially (1)?

Regarding (8) and (9):
These are not CIF API features at all; they are independent programs. It might be reasonable and even wise for such programs, based on our API, to be distributed with API implementations, but they are in no way appropriate to include in the core API requirements.

General:
How do these requirements address unlooped data? Is that subsumed into the concept of "category"?

yayahjb
Posts: 18
Joined: Sun Sep 11, 2011 9:54 pm

Re: Requirements for "core" features

Post by yayahjb » Tue Jan 10, 2012 12:53 am

Dear John,

This discussion is not converging. You assert you are not being minimalist, but are concerned about "scope creep". This is not some abstract discussion -- it is about the design of a real tool for people to use for real work, and the work they need to do with the tool should determine the scope.

Now to your questions. I will try to be brief by referencing easily available examples.

The meaning of open depends on whether one is doing an event-driven parser or a tree/database-based parser. In an event-driven parser, open is a separate action from reading. See CIFtbx for an example. In a tree/database-based parser, open is combined with read. The distinction for data blocks, save frames, categories, columns, rows and values depends on the services being provided by the API, which have a major impact on what apps can be efficiently supported by the API -- i.e. what tree-traversal tools will be provided. See CIFlib and CBFlib for interesting examples.

The meaning of update is again a service support issue. An update can be simulated by a full CIF read followed by a full CIF write and then a full CIF reread, but many apps, such as editors, require finer-grained services. With DDLm, category (which is essentially a relation) has become the best terminology to use to cover DDL1 and DDL2, especially because DDL1 makes the very strange distinction between looped and unlooped presentations of what is essentially the same information.
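
The point about looped and unlooped presentations can be shown with a small Python sketch. It rests on loudly stated assumptions: the function name category_rows and the dictionary layout are hypothetical, and category membership is decided here by a simple data-name prefix purely for illustration, whereas in practice a dictionary would decide it. Under those assumptions, unlooped items are simply a relation with a single row.

```python
def category_rows(block, prefix):
    """Return one category's rows as a list of dicts, looped or not.

    block is a hypothetical dict with "items" (name -> value) and
    "loops" (list of {"names": [...], "packets": [[...], ...]}).
    """
    for loop in block["loops"]:
        if any(name.startswith(prefix) for name in loop["names"]):
            return [dict(zip(loop["names"], packet)) for packet in loop["packets"]]
    # Unlooped presentation: the matching items form a single row.
    single = {n: v for n, v in block["items"].items() if n.startswith(prefix)}
    return [single] if single else []


looped = {
    "items": {},
    "loops": [{
        "names": ["_cell_length_a", "_cell_length_b"],
        "packets": [["5.43", "5.43"]],
    }],
}
unlooped = {
    "items": {"_cell_length_a": "5.43", "_cell_length_b": "5.43"},
    "loops": [],
}

# Both presentations yield the same one-row relation.
same = category_rows(looped, "_cell_") == category_rows(unlooped, "_cell_")
```

An API built on categories could therefore offer one set of row and column operations for both presentations, leaving the choice of looped or unlooped output to the writer.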

Failing to provide test cases, such as an editor, with the API, would be a major mistake. A new API without complete working examples of its use can be very hard to understand and, even worse, almost impossible to validate on a new platform.
Most importantly, failure to plan specific apps to be implemented as part of the API development is likely to result in major holes in the API. An alternative to developing apps would be to take existing open source apps and provide versions of those apps that have been converted to use this API. Perhaps we should do both.

In an earlier message you said, "Our consideration of the general requirements is at such an early stage, however, that I don't think it yet needs to be forestalled by analysis and criticism of the existing alternatives." I disagree. Nothing is forestalled by looking at the current reality. Most productive science starts with a sound review of the state of the field. Then you look for ways to improve, extend or change it. Reinventing the wheel is usually an inefficient, ineffective and bug-prone way to do science, software engineering and many other things. I always have a hard time getting my students to do their research and planning before they jump in and start implementing. Surely this group has more patience and forethought.

I now have to deal with other urgent matters. I will return to this list in a couple of weeks. Please do not take my silence as agreement or disagreement or lack of concern.

rjgildea
Posts: 3
Joined: Fri Dec 23, 2011 6:34 pm

Re: Requirements for "core" features

Post by rjgildea » Tue Jan 10, 2012 7:42 am

jcbollinger wrote:
yayahjb wrote:I believe Brian has a fairly good list on his
website we could use as a starting point. Is anyone aware of other CIF APIs that
are not on Brian's list?

No publicly-available ones spring to mind.


One piece of software that is missing from the list at http://www.iucr.org/resources/cif/software is our own recently published iotbx.cif that is available as part of the cctbx:

http://dx.doi.org/10.1107/S0021889811041161
http://cctbx.sourceforge.net/iotbx_cif/

jcbollinger
Posts: 57
Joined: Tue Dec 20, 2011 2:41 pm

Re: Requirements for "core" features

Post by jcbollinger » Tue Jan 10, 2012 4:30 pm

I have attempted to distill the requirements that have so far been suggested and that do not seem overly controversial: http://forums.iucr.org/viewtopic.php?f=27&t=73&p=216#p217. I would be obliged if people would consider them and reply with any commentary they have regarding them. For the most part, the requirements relevant to this forum topic (what I have been referring to as the "core" features) are in the inner bullet list.

The requirements are intended to cover the functionality, but not necessarily to correlate directly to API functions. Inasmuch as I intend those requirements in part as a tool to aid in evaluating both existing libraries and possibly de novo design proposals, I hope in this way to minimize bias toward specific implementation choices.

jamesrhester
Posts: 39
Joined: Mon Sep 19, 2011 8:21 am

Re: Requirements for "core" features

Post by jamesrhester » Tue Jan 17, 2012 11:22 pm

I think this list is a good starting point and have nothing to add or subtract from it.
