jamesrhester wrote: I have finally returned to thinking about the CIFAPI, and would like to discuss the possibility that Richard Gildea's iotbx.cif work could form the basis of an API that we can go forward with. To summarise: cctbx contains an iotbx module which reads in CIF files. The parsing is done by C++ code that ANTLR generates from a grammar file closely resembling the CIF BNF. The generated parser does not fix the data structure it builds: it calls C++ virtual functions, which each application must define when it is compiled.
I am singularly uninterested in any approach that fundamentally relies on C++. However, inasmuch as one of the points raised in favor of iotbx.cif is that the data structure it yields is defined by the application, I could accept moving forward on the basis of defining the (standard C) data structure to be produced, and/or a set of (standard C) functions for accessing and manipulating it, per your (1) and (2) below. A working library with iotbx.cif as the back end could serve as a reference implementation.
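To make the "application defines the data structure" idea concrete in standard C, here is a minimal sketch of a callback table standing in for the C++ virtual functions. Every name in it (cif_handler, cif_counts, cif_emit_demo and so on) is hypothetical and is not taken from iotbx.cif; cif_emit_demo is only a stand-in for a real parser and emits the events for one fixed, tiny CIF.

```c
#include <stddef.h>

/* Hypothetical callback table: a parser would call these as it recognises
   constructs, and the application decides what (if anything) to build. */
typedef struct cif_handler {
    void (*begin_block)(void *ctx, const char *block_code);
    void (*add_item)(void *ctx, const char *name, const char *value);
    void (*add_loop)(void *ctx,
                     const char *const *names, size_t n_names,
                     const char *const *values, size_t n_values);
} cif_handler;

/* Example application state: it only tallies what it is shown. */
typedef struct cif_counts {
    size_t blocks;
    size_t items;
    size_t loop_values;
} cif_counts;

static void count_block(void *ctx, const char *block_code) {
    (void)block_code;
    ((cif_counts *)ctx)->blocks++;
}

static void count_item(void *ctx, const char *name, const char *value) {
    (void)name;
    (void)value;
    ((cif_counts *)ctx)->items++;
}

static void count_loop(void *ctx,
                       const char *const *names, size_t n_names,
                       const char *const *values, size_t n_values) {
    (void)names;
    (void)n_names;
    (void)values;
    ((cif_counts *)ctx)->loop_values += n_values;
}

static const cif_handler counting_handler = {
    count_block, count_item, count_loop
};

/* Stand-in for a real parser: reports one block holding one plain item
   and one two-column loop with two rows (values listed row by row). */
static void cif_emit_demo(const cif_handler *h, void *ctx) {
    static const char *const names[] = {
        "_atom_site_label", "_atom_site_fract_x"
    };
    static const char *const values[] = { "C1", "0.25", "O1", "0.50" };

    h->begin_block(ctx, "demo");
    h->add_item(ctx, "_cell_length_a", "5.43");
    h->add_loop(ctx, names, 2, values, 4);
}
```

Calling cif_emit_demo(&counting_handler, &counts) with a zeroed cif_counts leaves one block, one item and four loop values recorded. A different application would pass a different table and build whatever structure it likes from the same events.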
jamesrhester wrote: If we go down this path, the following work appears necessary.
(1) Choosing a standard data structure
(2) Development of the higher-level functions to manipulate and write the data structure, as outlined in our requirements.
(3) Adding CIF 2.0 support - should be relatively easy given the simplicity of the ANTLR grammar files
Items (1) and (2) are surely the priority. Item (3) is less important, because the API doesn't depend on a specific parser implementation, and also because a lot of development and testing can be performed based on v1.1 CIFs. Item (2), of course, is most of what this group is tasked with devising.
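As a starting point for discussing (1) and (2), here is a rough sketch of what a naive standard C data structure and a couple of access functions might look like. All type and function names are hypothetical, and the layout (a flat list of items, each holding one value per loop row) is only an assumption for illustration. It ignores loop grouping, save frames and value typing.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

typedef struct cif_item {
    char *name;       /* data name, e.g. "_cell_length_a" */
    char **values;    /* one entry per loop row; one entry if unlooped */
    size_t n_values;
} cif_item;

typedef struct cif_block {
    cif_item *items;
    size_t n_items;
} cif_block;

/* Duplicate a string using ISO C only (strdup is POSIX). */
static char *cif_strdup(const char *s) {
    size_t n = strlen(s) + 1;
    char *copy = malloc(n);
    if (copy) memcpy(copy, s, n);
    return copy;
}

/* CIF data names compare case-insensitively. */
static int cif_name_eq(const char *a, const char *b) {
    while (*a && *b) {
        if (tolower((unsigned char)*a) != tolower((unsigned char)*b)) return 0;
        a++;
        b++;
    }
    return *a == *b;
}

static cif_item *cif_find(const cif_block *blk, const char *name) {
    size_t i;
    for (i = 0; i < blk->n_items; i++)
        if (cif_name_eq(blk->items[i].name, name)) return &blk->items[i];
    return NULL;
}

/* Append a value to the named item, creating the item on first use.
   Returns 0 on success, -1 on allocation failure. */
static int cif_block_add_value(cif_block *blk, const char *name,
                               const char *value) {
    cif_item *it = cif_find(blk, name);
    char **grown;
    char *copy;

    if (!it) {
        cif_item *items = realloc(blk->items,
                                  (blk->n_items + 1) * sizeof *items);
        if (!items) return -1;
        blk->items = items;
        it = &blk->items[blk->n_items];
        it->name = cif_strdup(name);
        if (!it->name) return -1;
        it->values = NULL;
        it->n_values = 0;
        blk->n_items++;
    }
    copy = cif_strdup(value);
    if (!copy) return -1;
    grown = realloc(it->values, (it->n_values + 1) * sizeof *grown);
    if (!grown) {
        free(copy);
        return -1;
    }
    it->values = grown;
    it->values[it->n_values++] = copy;
    return 0;
}

/* Return the value in the given row, or NULL if the name or row is absent. */
static const char *cif_block_get(const cif_block *blk, const char *name,
                                 size_t row) {
    const cif_item *it = cif_find(blk, name);
    return (it && row < it->n_values) ? it->values[row] : NULL;
}

/* Release everything owned by the block and reset it to empty. */
static void cif_block_free(cif_block *blk) {
    size_t i, j;
    for (i = 0; i < blk->n_items; i++) {
        for (j = 0; j < blk->items[i].n_values; j++)
            free(blk->items[i].values[j]);
        free(blk->items[i].values);
        free(blk->items[i].name);
    }
    free(blk->items);
    blk->items = NULL;
    blk->n_items = 0;
}
```

A block starts out as {NULL, 0}. Adding "C1" and then "O1" under _atom_site_label gives a two-row item, and cif_block_get(&blk, "_ATOM_SITE_LABEL", 1) then returns "O1". Whether callers should ever see the structs, or only the functions, is exactly the kind of question (1) and (2) would need to settle.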
jamesrhester wrote: As a second suggestion, much of the work in (1) and (2) could potentially be avoided by using SQLite to handle the CIF file as an in-memory database. Many manipulation functions then have equivalent SQL expressions. This does, however, create an extra library dependency (670K for libsqlite3 on Linux).
A useful standard for judging footprint would be a flex+bison parser with a naive C data structure.
I am intrigued by the idea of an SQLite-based approach. It could be very powerful, provided that we can be confident of representing all valid CIF instance documents in a useful relational form. To be able to accommodate CIFs written against an arbitrary dictionary or no dictionary at all, the schemas would need to be very simple -- SQL realizations of the CIF Data Model, in fact. I think those would qualify as useful forms. This idea definitely bears investigation.
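To give the "very simple schema" something concrete to argue over, here is one possible shape: a single table with one row per value, keyed by block, data name and loop row. The schema, the table name cif_value and both helper functions are assumptions made up for this sketch. To stay free of any library dependency, the C code only renders SQL text; a real implementation would presumably hand the schema to SQLite and use prepared statements with bound parameters rather than formatting values into strings.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical one-table schema: one row per value.
   Unlooped items would use row_idx 0. */
static const char CIF_SCHEMA[] =
    "CREATE TABLE cif_value ("
    " block TEXT NOT NULL,"
    " name TEXT NOT NULL,"
    " row_idx INTEGER NOT NULL,"
    " value TEXT,"
    " PRIMARY KEY (block, name, row_idx));";

/* Write src into dst as an SQL string literal, doubling embedded single
   quotes. Returns the length written, or -1 if dst may be too small
   (the check is conservative: it always reserves room for a doubled quote). */
static int sql_quote(char *dst, size_t cap, const char *src) {
    size_t n = 0;
    if (cap < 3) return -1;
    dst[n++] = '\'';
    for (; *src; src++) {
        if (n + 3 >= cap) return -1;
        if (*src == '\'') dst[n++] = '\'';
        dst[n++] = *src;
    }
    dst[n++] = '\'';
    dst[n] = '\0';
    return (int)n;
}

/* Render the INSERT statement for one value.
   Returns the statement length, or -1 if any buffer is too small. */
static int cif_insert_sql(char *out, size_t cap, const char *block,
                          const char *name, unsigned long row,
                          const char *value) {
    char qb[128], qn[128], qv[512];
    int n;

    if (sql_quote(qb, sizeof qb, block) < 0 ||
        sql_quote(qn, sizeof qn, name) < 0 ||
        sql_quote(qv, sizeof qv, value) < 0)
        return -1;
    n = snprintf(out, cap,
                 "INSERT INTO cif_value VALUES (%s, %s, %lu, %s);",
                 qb, qn, row, qv);
    return (n < 0 || (size_t)n >= cap) ? -1 : n;
}
```

With rows stored this way, fetching one column of a loop would be a plain SELECT on block and name ordered by row_idx, with no dictionary knowledge needed. What this layout does not capture is which data names belong to the same loop, so a second table or an extra column would probably be needed for that.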