Gazelle

a system for building fast, reusable parsers



Gazelle is a system, currently in development, for parsing languages described by context-free grammars, especially programming languages. It takes inspiration from yacc/bison and ANTLR, but seeks to take their ideas further.



Why another parser generator? What does Gazelle bring to the table?

reusable parsers
Unlike existing parsing tools, Gazelle puts a major focus on making grammars reusable. Grammars written using Gazelle should be usable out of the box for anything from a compiler to syntax highlighting in a text editor. Making grammars reusable will let the open source community cooperate to write and maintain a set of canonical parsers for all popular languages, so that you can use off-the-shelf parsers instead of reinventing the wheel.
language agnostic
Many existing parsing systems are tied to one specific host language (for example, Python has PyParsing and Ruby has Treetop). Grammars written using these systems are useless if you want to invoke the parser from a different language. This hinders reusability and wide adoption.

Gazelle parsers can be used from any language by writing bindings for the Gazelle C Runtime. You only need to write these bindings once; after that, you can parse any language that has a Gazelle grammar. And because the runtime is in C, you get very fast parsing, even if you are parsing from a high-level language like Ruby that wouldn't be a speed demon on its own.
the fastest parsing money can buy
Gazelle uses fast algorithms, keeps its memory footprint minimal, and avoids memory allocation/deallocation wherever possible. There are plans to write a JIT (likely using DynASM) that takes full advantage of SSE, which lets you do things like 16 bytewise comparisons in a single instruction.
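To make the "16 bytewise comparisons in a single instruction" idea concrete, here is a minimal sketch using SSE2 intrinsics. It is not Gazelle code: the function name find_byte is hypothetical, and it only illustrates the kind of scan a JIT could emit, with a scalar fallback for the tail and for compilers that do not target SSE2.

```c
#include <assert.h>
#include <stddef.h>
#if defined(__SSE2__)
#include <emmintrin.h>  /* SSE2 intrinsics */
#endif

/* Hypothetical helper, not part of Gazelle: returns the index of the first
 * occurrence of c in buf[0..len), or len if c does not occur. */
size_t find_byte(const char *buf, size_t len, char c)
{
    size_t i = 0;
    assert(buf != NULL);
#if defined(__SSE2__)
    {
        /* Broadcast the byte we are looking for into all 16 lanes. */
        const __m128i needle = _mm_set1_epi8(c);
        for (; i + 16 <= len; i += 16) {
            /* Unaligned load of 16 input bytes. */
            __m128i chunk = _mm_loadu_si128((const __m128i *)(buf + i));
            /* One instruction compares all 16 bytes against the needle. */
            __m128i eq = _mm_cmpeq_epi8(chunk, needle);
            /* Collapse the 16 lane results into a 16-bit mask. */
            int mask = _mm_movemask_epi8(eq);
            if (mask != 0) {
                size_t bit = 0;
                /* The lowest set bit marks the first matching byte. */
                while (((mask >> bit) & 1) == 0)
                    bit++;
                return i + bit;
            }
        }
    }
#endif
    /* Scalar loop: handles the tail, or everything when SSE2 is unavailable. */
    for (; i < len; i++) {
        if (buf[i] == c)
            return i;
    }
    return len;
}
```

A lexer could use a scan like this to skip to the closing quote of a string literal or to the end of a comment without examining the input one byte at a time.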
integrated lexing and parsing
A Gazelle grammar expresses all necessary lexing and parsing information in a single unified description language. There is no need to create and maintain a separate lexer. Any tricky work you would otherwise need to do in the lexer (for example, using lexer states) is automatically inferred from the grammar by Gazelle.
grammars compile to bytecode, not C
The Gazelle compiler takes the grammar description and compiles it to extremely compact bytecode. The bytecode can then be loaded by the Gazelle runtime (which is written in compact C) to do the actual parsing. At no time during grammar development is a C compiler required.
modular grammars
Embedding one language inside another is quite common these days, especially with templating languages like PHP or RHTML that embed an imperative language inside HTML. Gazelle will let you directly express the idea that RHTML is just HTML with Ruby embedded between special tags. And because Gazelle integrates lexing and parsing, you don't have to worry about writing a lexer smart enough to know when to switch between languages.
lightweight, embeddable compiler
The compiler will be available as a library, just as the runtime is. This means that users of dynamic languages can opt to load the grammar in text form directly, eliminating any separate compile step.
extremely flexible runtime
You want an abstract syntax tree? You got it. You want to do blazingly fast event-based parsing, without incurring the cost of building an AST? No problem. You want to do whitespace-preserving transforms by walking a parse tree? At your service. You want to do syntax highlighting, by getting the grammar data for each character of the input? Sure thing! The runtime gives you just as much support as you need, without getting in your way.
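The event-based style mentioned above can be illustrated with a small, self-contained sketch. Everything here is hypothetical and hand-written for illustration (the event_handlers struct, parse_events, summarize, and the toy grammar of nested integer lists); Gazelle's real runtime API may look quite different. The point is only that the client receives callbacks as the parse proceeds and keeps whatever state it wants, so no tree is ever allocated.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical callback table; not Gazelle's actual API. */
typedef struct {
    void (*start_list)(void *user);
    void (*end_list)(void *user);
    void (*number)(long value, void *user);
} event_handlers;

/* Toy grammar:  value := number | '[' (value (',' value)*)? ']'
 * Returns a pointer just past the parsed value, or NULL on a syntax error. */
static const char *parse_value(const char *p, const event_handlers *h, void *user)
{
    if (isdigit((unsigned char)*p)) {
        long v = 0;
        while (isdigit((unsigned char)*p))
            v = v * 10 + (*p++ - '0');
        h->number(v, user);          /* fire an event instead of building a node */
        return p;
    }
    if (*p != '[')
        return NULL;
    h->start_list(user);
    p++;
    if (*p != ']') {
        for (;;) {
            p = parse_value(p, h, user);
            if (p == NULL)
                return NULL;
            if (*p != ',')
                break;
            p++;                     /* skip the comma and parse the next value */
        }
        if (*p != ']')
            return NULL;
    }
    h->end_list(user);
    return p + 1;
}

/* Returns 1 if the whole string is a valid value, 0 otherwise. */
int parse_events(const char *text, const event_handlers *h, void *user)
{
    const char *end;
    assert(text != NULL && h != NULL);
    end = parse_value(text, h, user);
    return end != NULL && *end == '\0';
}

/* Example client: sums the numbers and tracks nesting depth, with no AST. */
typedef struct { long sum; int depth; int max_depth; } stats;

static void on_start(void *u)
{
    stats *s = u;
    if (++s->depth > s->max_depth)
        s->max_depth = s->depth;
}
static void on_end(void *u) { stats *s = u; s->depth--; }
static void on_number(long v, void *u) { stats *s = u; s->sum += v; }

int summarize(const char *text, stats *out)
{
    const event_handlers h = { on_start, on_end, on_number };
    out->sum = 0;
    out->depth = 0;
    out->max_depth = 0;
    return parse_events(text, &h, out);
}
```

A client that does want a tree could supply handlers that allocate nodes in start_list and number; the parser itself would not need to change.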

Another Angle

There are countless parsing frameworks out there. Too many. Gazelle seeks to bring some unity to this landscape through two main goals:

If these goals are accomplished, then fast, complete, correct parsers for common languages can become a commodity, like they ought to be. No one should have to "roll their own" unless they are parsing something very new or unusual.

Status

Starting with 0.2, I feel that Gazelle is in a state where a person could reasonably tinker with it. It is still missing some major features that prevent it from being useful for real work, and all of the APIs and the language are still subject to change. But I welcome feedback, and I plan to start making regular, incremental releases.

For the most recent status updates, please visit the "Gazelle" category of my blog.

Documentation

The latest version of the manual is available online, and it contains a great deal of information about Gazelle. It is the best place to find out what Gazelle is all about.

You can also check out a visual grammar dump of JSON, which the latest release is capable of producing.

Download

Gazelle is free software, released under the BSD license. The most recent release is 0.2, released on June 29, 2008.

Gazelle uses Git for source control. View the public repository (note: this has moved from repo.or.cz to GitHub).

Contact

For questions or comments, please post to the gazelle-users Google Group. I read and respond to posts made on this list.

Gazelle is written by Joshua Haberman. To reach me directly, email joshua@reverberate.org.


If you were looking for the Gazelle Movie Editor, its homepage is here.
Thanks to the Bradshaw Foundation for the Gazelle pic.