[pypy-dev] Blog / Parser
Toby Watson
toby at thetobe.com
Fri Jan 11 12:19:08 CET 2008
Hi,
I've just read the blog post, "Visualizing a Python tokenizer" and it
reminded me of this:
"OMeta: an object oriented language for pattern matching"
http://www.cs.ucla.edu/~awarth/papers/dls07.pdf
OMeta is an extension and generalisation of the idea of PEGs*. It
provides a nice way to describe a language at the character level
(tokens), at the level of the grammar itself, and in the productions
that build the AST. Finally, the grammars are extensible (possibly from
within the language itself).
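
To make the "one notation for tokens, grammar and AST" point concrete, here
is a minimal PEG-style sketch in plain Python. It is not OMeta and not
anything from PyPy; every name in it (token, seq, choice, many, action,
parse_sum) is made up for illustration. A parser is assumed to be a function
taking (text, pos) and returning (value, new_pos), or None on failure.

```python
import re

# Hypothetical mini-PEG, for illustration only (not OMeta's actual syntax).
# A parser is a function (text, pos) -> (value, new_pos), or None on failure.

def token(pattern):
    """Character level: match a regex, skipping leading whitespace."""
    rx = re.compile(r"\s*(" + pattern + r")")
    def parse(text, pos):
        m = rx.match(text, pos)
        return (m.group(1), m.end()) if m else None
    return parse

def seq(*parsers):
    """Grammar level: every parser must match, one after the other."""
    def parse(text, pos):
        values = []
        for p in parsers:
            r = p(text, pos)
            if r is None:
                return None
            value, pos = r
            values.append(value)
        return values, pos
    return parse

def choice(*parsers):
    """PEG ordered choice: the first alternative that matches wins."""
    def parse(text, pos):
        for p in parsers:
            r = p(text, pos)
            if r is not None:
                return r
        return None
    return parse

def many(parser):
    """Zero or more repetitions (the parser must consume input to match)."""
    def parse(text, pos):
        values = []
        while True:
            r = parser(text, pos)
            if r is None:
                return values, pos
            value, pos = r
            values.append(value)
    return parse

def action(parser, fn):
    """Production level: turn the matched value into an AST node."""
    def parse(text, pos):
        r = parser(text, pos)
        if r is None:
            return None
        value, pos = r
        return fn(value), pos
    return parse

NUMBER = action(token(r"\d+"), lambda s: ("num", int(s)))
NAME = action(token(r"[A-Za-z_]\w*"), lambda s: ("name", s))
ATOM = choice(NUMBER, NAME)
PLUS = token(r"\+")

def fold_sum(value):
    # value is [first_atom, [[plus, atom], ...]]; build a left-leaning tree.
    node, rest = value
    for _plus, rhs in rest:
        node = ("add", node, rhs)
    return node

SUM = action(seq(ATOM, many(seq(PLUS, ATOM))), fold_sum)

def parse_sum(text):
    r = SUM(text, 0)
    if r is None or text[r[1]:].strip():
        raise ValueError("cannot parse %r" % text)
    return r[0]
```

Calling parse_sum("x + 1") yields ("add", ("name", "x"), ("num", 1)). The
tokeniser, the grammar rule and the AST construction are all written with
the same few building blocks, which is roughly the appeal described above.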
The implementation is discussed in "Packrat Parsers Can Support Left
Recursion", which also includes some discussion of performance:
http://www.vpri.org/pdf/packrat_TR-2007-002.pdf
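
For anyone who has not met packrat parsing: the core trick is to memoise
the result of each (rule, position) pair, so backtracking never re-parses
the same input with the same rule. The sketch below shows only that
memoisation, under my own assumptions; it does not implement the
left-recursion support that the paper adds, and the class and rule names
(Packrat, digit, number, expr) are hypothetical.

```python
class Packrat:
    """Toy packrat recogniser: each rule returns the end position or None."""

    def __init__(self, text):
        self.text = text
        self.memo = {}   # (rule name, position) -> result
        self.calls = 0   # number of rule bodies actually executed

    def apply(self, rule, pos):
        # Evaluate a rule at a position at most once; reuse the result after.
        key = (rule, pos)
        if key not in self.memo:
            self.calls += 1
            self.memo[key] = getattr(self, rule)(pos)
        return self.memo[key]

    def digit(self, pos):
        if pos < len(self.text) and self.text[pos].isdigit():
            return pos + 1
        return None

    def number(self, pos):
        # number <- digit+
        end = self.apply("digit", pos)
        if end is None:
            return None
        while True:
            nxt = self.apply("digit", end)
            if nxt is None:
                return end
            end = nxt

    def expr(self, pos):
        # expr <- number '+' expr / number
        end = self.apply("number", pos)
        if end is not None and end < len(self.text) and self.text[end] == "+":
            rest = self.apply("expr", end + 1)
            if rest is not None:
                return rest
        # Second alternative: the number result comes straight from the memo.
        return self.apply("number", pos)
```

With p = Packrat("12+3"), p.apply("expr", 0) returns 4 (the whole input is
consumed), and applying the same rule at the same position again does not
increase p.calls.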
I wonder whether the same idea behind PyPy can be applied to the
grammar: write the parser in some high-level language (a Python version
of OMeta, for instance), which is then transformed by the translator,
compiler, or JIT into something that runs fast.
What could be nice about this is bringing the tokenising and parsing
closer in spirit to the heart of PyPy, writing 'nicer' code, and
providing a (I think tantalising) way to try new syntax going forward.
And there are things to play with on this page:
http://www.cs.ucla.edu/~awarth/ometa/ometa-js/
* Parsing Expression Grammar
With regard to railroad diagrams (I think that's what they're called):
There used to be a script that generated them. It's mentioned at the
top of the Python grammar file, and here:
http://www.python.org/search/hypermail/python-1994q3/0294.html
But I've seen discussion elsewhere that it has been lost :(
How about this? http://www.informatik.uni-freiburg.de/~thiemann/haskell/ebnf2ps/README
cheers,
Toby