I have completed an initial implementation of the CIF formatter (function cif_write() in the API docs), and I am now working on the parser. A preliminary version of the scanner is in place, and the rest of the parser is roughed in, but there is still a lot of work to do.
The decision to define CIF 2.0 as UTF-8-only both simplifies and complicates the formatter and parser. On one hand, it is a great simplification to avoid encoding detection and adaptation issues. On the other hand, CIF 2.0 is now fundamentally a binary format, even though it can do a pretty good imitation of a text format on many systems. Parsing code that is to be portable to systems whose default character encoding is not ASCII-compatible (e.g. EBCDIC variants) therefore needs to identify all characters by their Unicode code point values. C char literals such as '_' are not portable for this purpose, because they are interpreted according to the compiler's default encoding. I think it is still possible to use standard tools such as lex/flex and yacc/bison, but care is necessary in defining syntax rules and perhaps grammar rules.
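To make the portability point concrete, here is a minimal sketch of matching characters by numeric code point instead of by char literal. The names (CP_UNDERSCORE, starts_data_name(), is_cif_whitespace()) are hypothetical and are not part of the API docs, and the whitespace set is a simplification rather than a statement of the CIF 2.0 grammar. It relies only on the fact that in UTF-8 the code points U+0000 through U+007F are each encoded as a single byte of the same value.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical constants: Unicode code point values written as numbers,
 * so they do not depend on the compiler's execution character set.
 * On an EBCDIC system '_' is not 0x5F, but U+005F in a UTF-8 file
 * is always the single byte 0x5F. */
enum {
    CP_TAB        = 0x09,
    CP_LF         = 0x0A,
    CP_CR         = 0x0D,
    CP_SPACE      = 0x20,
    CP_UNDERSCORE = 0x5F
};

/* Returns 1 if the UTF-8 byte sequence begins with U+005F (underscore),
 * 0 otherwise. The comparison is against the numeric code point,
 * never against the literal '_'. */
static int starts_data_name(const unsigned char *bytes, size_t len)
{
    assert(bytes != NULL);
    return len > 0 && bytes[0] == CP_UNDERSCORE;
}

/* Returns 1 if the code point is one of the whitespace characters
 * assumed for this sketch (space, tab, LF, CR), 0 otherwise. */
static int is_cif_whitespace(unsigned long cp)
{
    return cp == CP_SPACE || cp == CP_TAB || cp == CP_LF || cp == CP_CR;
}
```

A scanner built this way would compare raw input bytes (or decoded code points) against such constants everywhere, and the same approach applies to lex/flex patterns if they are written with numeric escapes rather than literal characters.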
Note also that this binary vs. text distinction can constitute an incompatibility between CIF 1.1 and CIF 2.0, depending on how you interpret the CIF 1.1 specs. If you interpret CIF 1.1 as specifying an encoding-agnostic text format, as some have consistently and persuasively done, then there are systems on which no well-formed, non-trivial CIF 1.1 file is well-formed CIF 2.0 (relative to the current specs as I understand them), and vice versa. I can and will address this issue in the API implementation, so that it is as transparent as possible to users, but if there are any objections or points for discussion then I would prefer to hear them now, before I do the work.
Status update
Forum for CIF developers to define an application programming interface for CIF software.
Moderators: Brian McMahon, jcbollinger