Ghadi Shayban | 30 Nov 17:32 2015

Pex, a new PEG parsing library

Here is a compliant JSON parser in 99 LOC, implemented with pex, a new parsing library. [1]

I'm making public an early version of pex [2], a parsing library in the PEG family.  This is alpha software.  For those of you familiar with Lua's LPEG library, this is similar, but with a Clojure-y twist.  Like LPEG, pex uses a virtual machine to parse rules (as opposed to combinators or packrat.)  Pex operates on traditional character types (as opposed to generic sequences/data structures).

Here is another tiny example grammar to match floating point numbers:

(def Number '{number  [digits (? fractional) (? exponent)]
              fractional ["." digits]
              exponent   ["e" (? (/ "+" "-")) digits]
              digits     [(class num) (* (class num))]})

The only other input this particular grammar needs is to let pex know what a `num` character class is.  (There is an interface that you can implement to match things, and several helpers.  I'm planning to have several common ones out of the box.)  Well, you also need to tell the grammar compiler what rule to start with (number).

The grammar format has user defined macros which let you hide a lot of boilerplate, or make higher order rules.  For example, it's very common to chew whitespace after rules, so hiding that is useful.  There are also captures and actions that operate on a virtual "Value Stack".  For example, while parsing a JSON array, you push all the captured values from the array onto the stack, then reduce them into a vector with an action.

It's very early, but pex's completely unoptimized engine can parse a 1.5MB file in ~58ms vs ~9ms for Cheshire/Jackson, which is a handwritten highly-tuned JSON parser with many thousands of lines of code behind it.  I plan on closing that gap by a) implementing some of LPEG's compiler optimizations and b) improving some of the terribly naive impls in the parser. The win here is high expressive power per unit of performance, not raw performance... 

Internally, the grammar data structure is analyzed, compiled into special parsing bytecode, and then subsequently run inside a virtual machine [3].

Hope you can find this useful in your data munging endeavors.  Next up is to make CSV & EDN example parsers, tune the performance, make grammar debugging better, and write more docs & tests.  I encourage any feedback.

PEG mailing list
PEG <at>
Hayden Livingston | 10 Sep 01:15 2015

your favorite parser for java or c++?

I'm looking for a PEG Parser that has semantic actions for things that
require some sort of lookup.
Sérgio Medeiros | 9 Sep 20:57 2015

Re: Left associative left/right recursion?

On Tue, Sep 8, 2015 at 5:43 PM, Nikita Nemkin <nikita <at>> wrote:
> Hi Sérgio,
> thanks for your answer (and your awesome paper).
> The idea and the goal of precedence checks is very clear. I'm just
> missing one detail.
> I'm trying to make the grammar "E = E + E / n" left-associative without
> extra annotations, i.e. by defaulting ALL precedences to 1.
> Is this even possible with alternative "left biased" versions of lvar.4/lvar.5 ?
> Following the example grammar on input "n+n":
> 1. top level E is entered, it seeds recursion memo table
>         L = {(E, "n+n") -> fail}
> 2. top level E executes it's body, the left E fails and we only get n
> as a result.
> 3. top level E memoizes the resulting n with the default precedence 1:
>         L = {(E, "n+n") -> ("+n",E[n], 1)}
> 4. HERE is the issue: top level E tries to grow the recursion bound,
>     it executes it's body again, but the left E fails precedence check!
>     All precedences are 1 and modified lvar.5 applies (k <= k')
>     The bound doesn't grow and we're ruined before we even get to
> right recursive part.

Hi, Nikita.

I think you are right, the change in rules lvar.4 and lvar.5 suggested in
the paper is wrong, it seems to give a parser that always fails when trying
to increase the bound of a left-recursive rule. I am sorry about that. I think
we thought the change was so simple that we did not implement/try it.

> It could work, if top level E started with a lower precision, like 0.
> But then, in a real
> multi-rule grammar it would mean manual precedence assignment...
> If left associativity is impossible without extra annotations (or
> tricky backstage
> precedence assignment algorithms), I'd rather make all precedences explicit.
> The mechanism itself is fully general and really powerful.

If you always want left associativity, one way to achieve this is to
use a map M, where the keys are variables, and the associated value
is a suffix of the input or an index of the input.

The idea is to save in this map the input position X where you first used
variable V. You should keep this information in the map while tyring to
increase the bound of V. This should be done in rules lvar.1 and lvar.2

Given a variable V, to apply rule "lvar.4" in position Y you should see if Y
is not greater than map[V] (actually I think you only need to check
whether map[V] == Y). When Y is greather than map[V], this means
that you are trying to apply a right-recursive incarnation of V.

You should use L as it was used before the introduction of precedence levels.

I am sending attached a draft of this new semantics.
I have tested it :-)

If you also want precedence levels, I think you should adapt rules
lvar.5 and lvar.4, so rule lvar.4 would be applied only when the condition
related to L and the condition related to M are satisfied. The condition
related to L is the same that appears on p.15 (Figure 3).
I hope this works well for you.


P.S.: I am sending this mail to the PEG list too.

PEG mailing list
PEG <at>
Nikita Nemkin | 7 Sep 23:27 2015

Left associative left/right recursion?


I wonder if someone could help me with a certain assertion made in this
paper: (Medeiros, Mascarenhas,
Ierusalimschy. "Left Recursion in Parsing Expression Grammars" v3).

The last paragraph on p.14:

> We could impose left-associativity as the default in case of a grammar that
> mixes left and right recursion just by changing k ≥ k' in rule lvar.4 to k > k'
> and by changing k < k' in rule lvar.5 to k ≤ k'. The disadvantage is that this
> makes specifying precedence levels with right-associative operators a little
> harder

I can't see how it's supposed to work. Consider this grammar:

    E = E[1] "+" E[1] / "n"

where [1] is precedence and the initial precedence is also 1.

Default precedence check is "fail if current rule precedence <= memoized
precedence," leading to right associative parse tree, as expected.

Modified precedence check is "fail if current rule precedence < memoized
precedence" and it does NOT lead to left associativity. Instead, recursive
E[1] invocation always fails due to precedence check, reducing the grammar
to E = "n" "+" "n" / "n".

General algorithm producing left association by default would be much
preferred from a usability perspective, if only it was possible.


PEG mailing list
PEG <at>
Toby Goodwin | 6 Aug 21:08 2015

pacc-0.2 released

Can I join the "new version of <peg parser generator>" party? :-)

A few days ago I released pacc-0.2 which is by far the best version yet,
with new features and several significant bug fixes since the previous
release, as well as improved documentation.

Pacc is a parser generator. It reads a PEG grammar written in a carefully
designed dialect based on Ford's pappy. It writes a packrat parser in C. 

One unique (I believe) feature of pacc is that it supports interactive
use. There is a flag which causes pacc to write two parsers: the usual
one which evaluates complete valid sentences of the target language, and
a second parser which recognises (but does not evaluate) prefixes of
sentences. It is easy to combine these two parsers with a wrapper
program that can read a line from the user, and if necessary prompt for
further input.

Here's a sample session with the trivial pacc-built calculator described
by this grammar[1]. Each complete input is terminated with "=".  The
initial prompt is "$ ", and the continuation prompt is "> ".

  $ 5 + 4 =
  $ 5 +
  > 4 =
  $ 5  + +
  1:5: expected Sum

Pacc's website is

Pacc is also on github at

I will be delighted to answer any questions you may have by private
email, or on github, or here on the list if they are of wider interest
to the PEG community.


Hui Zhou | 3 Aug 23:00 2015

Imperative Parser

Hi PEG list,

Recently I tried writing some parsers using a refactoring tool (a 
meta-layer) of my own. Without a better name, I refer to the result -- 
imperative parser:

I am not really been following up with academic computer science, so only 
afterwards (today) I learned about PEG. My imperative parser shares 
with PEG in that the order of parsing (rules) matters, no ambiguity 
(as a limit of method), and linear complexity. I refer to it 
imperative parser to contrast against traditional parser which starts 
with CFG, therefore fundamentally declarative.

With the help of MyDef (a meta-layer of my own invention), I was able 
to have the code well organized to achieve semantic separation and 
extensibility (IMHO).

The basic algorithm was certainly not new. It is essentially operator 
precedence based shift-reduce parsing for expression and top-down 
context dependent parsing for statements (I am not sure about the 
latter, but it is definitely top-down semantically). However, this 
mixed approach seems novel from my limited search.

Anyway, as I am not really been following academic, I am not exactly 
sure where it stands or whether this is anything new at all. Therefore 
this post to PEG list, seeking feedbacks.

Thanks for attention,


Hui Zhou
Juancarlo Añez | 2 Aug 14:32 2015

New version of Grako (PEG in Python)

There have been several releases of Grako since I last reported here.

This is the latest:

 FileTypePy VersionUploaded onSize
grako-3.6.3.tar.gz (md5) Source 2015-08-01 105KB
  • Downloads (All Versions):
  • 296 downloads in the last day
  • 1233 downloads in the last week
  • 3724 downloads in the last month

    Juancarlo Añez
    PEG mailing list
    PEG <at>
    Roman Redz | 1 Aug 18:15 2015

    New version of "Mouse"

    Just released version 1.7 of PEG parser generator "Mouse": .
    The package contains PEGs for Java 1.8 and C.
    All comments welcome.
    PEG mailing list
    PEG <at>
    Daniel Frey | 1 Aug 16:23 2015

    PEGTL v1.1.0 released

    A new version of the C++11 header-only Parsing Expression Grammar Template Library (PEGTL) has been
    released. You can find the PEGTL at
    Any feedback welcome.
    PEG mailing list
    PEG <at>
    Roman Redz | 16 Jul 15:55 2015

    Two papers

    Just to inform you that my paper "More about converting BNF to PEG" appeared in Fundamenta Informaticae 133, 2-3 (2014), pp. 257-270.
    Another paper, "Cut points in PEG" is due to appear later this year.
    Both are available from .
    PEG mailing list
    PEG <at>
    Henri Tuhola | 29 Apr 00:04 2015

    Is this correct implementation of GLL, or something entirely else?

    Before writing this, I studied various papers about GLL, such as GLL Parsing and "Modelling GLL Parser Implementations". Also read the paper that described parser combinators of this kind and took a peek into djspiewak/gll-combinators that was written in scala.

    I found the project written in scala very hard to read, and the papers weren't any much better. But I understood something about the description.

    I wrote a simple recognizer that appears to work. It prints out state transitions similar to what LR parser might go through with:

    For someone who have actually understood the papers better, is my implementation even near of what GLL is? What is it if it's not?
    PEG mailing list
    PEG <at>