DataScript is a language to describe and manipulate binary data formats as types.

DataScript consists of two components: a constraint-based specification language, which uses DataScript types to describe the physical layout of data, and a language binding, which provides a simple programming interface for scripting binary data. A DataScript compiler generates libraries that are linked with DataScript scripts. In short, DataScript is a yacc/lex for arbitrary binary data.
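This page does not show DataScript syntax or the Java code the compiler generates. The following is therefore only a hypothetical, hand-written illustration of the idea. It describes a record type in prose as "a 16-bit magic number constrained to 0xCAFE, an unsigned 32-bit length, then that many payload bytes". It then shows a Java class of the kind such a compiler might generate, one that enforces the constraints on read and serializes the same layout on write. The class name `Record`, its methods, and the big-endian byte order are assumptions made for this sketch, not DataScript's actual output.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for a class a DataScript-style compiler might generate.
// Layout (assumed, big-endian): u16 magic == 0xCAFE, u32 length, byte[length] payload.
final class Record {
    static final int MAGIC = 0xCAFE;

    private final long length;     // mirrors the on-disk length field
    private final byte[] payload;

    Record(byte[] payload) {
        this.length = payload.length;
        this.payload = payload.clone();
    }

    // Parse one record, checking each constraint of the type description.
    static Record read(ByteBuffer buf) {
        buf.order(ByteOrder.BIG_ENDIAN);
        int magic = Short.toUnsignedInt(buf.getShort());
        if (magic != MAGIC) {
            throw new IllegalArgumentException(
                "constraint violated: magic == 0xCAFE, got 0x" + Integer.toHexString(magic));
        }
        long length = Integer.toUnsignedLong(buf.getInt());
        if (length > buf.remaining()) {
            throw new IllegalArgumentException("constraint violated: length exceeds remaining input");
        }
        byte[] payload = new byte[(int) length];
        buf.get(payload);
        return new Record(payload);
    }

    // Serialize the record using the same layout that read() expects.
    void write(ByteBuffer buf) {
        buf.order(ByteOrder.BIG_ENDIAN);
        buf.putShort((short) MAGIC);
        buf.putInt((int) length);
        buf.put(payload);
    }

    int sizeInBytes() { return 2 + 4 + payload.length; }

    long getLength() { return length; }

    byte[] getPayload() { return payload.clone(); }

    public static void main(String[] args) {
        // Round trip: build a record, write it to a buffer, parse it back.
        Record original = new Record("hello".getBytes(StandardCharsets.US_ASCII));
        ByteBuffer buf = ByteBuffer.allocate(original.sizeInBytes());
        original.write(buf);
        buf.flip();
        Record parsed = Record.read(buf);
        // prints "5 hello"
        System.out.println(parsed.getLength() + " " + new String(parsed.getPayload(), StandardCharsets.US_ASCII));
    }
}
```

The point of the sketch is the division of labor: the layout and its constraints are stated once in the type description, and the generated code is responsible for checking them while parsing. A script that uses the library only sees typed accessors.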

For a description of the idea and some motivation, please read sections 1 and 2 of

"DataScript - A Specification and Scripting Language for Binary Data", Godmar Back, Proceedings of the ACM Conference on Generative Programming and Component Engineering (GPCE 2002), published as LNCS 2487. ACM. Pittsburgh, PA. October 2002. pp. 66-77.

As of May 2003, this is a version of the software that I had not planned to release. However, since people asked for something to play with, and to promote the cause, I decided to put it up on SourceForge. I plan to be very liberal in accepting contributions and in giving write access to the repository.

The current compiler is little more than a proof of concept, if that. It implements all features described in the paper cited above, except for offsets on write. The current Java language binding is ad hoc at best and definitely subject to change.

People can contribute to this project in many ways, such as:


-- Godmar Back


Disclaimer: DataScript is not related to any of the following enterprises or products from other domains:

Last modified June 19, 2003

$Id: datascript.html,v 1.2 2003/06/19 20:59:23 gback Exp gback $