### Re: Breaking the Mizushima(2010) cut operator

Marcus Brinkmann <marcus.brinkmann <at> ruhr-uni-bochum.de>

2014-08-11 22:22:50 GMT

Hi,
I reported this in August 2013 to the grako bug tracker:
https://bitbucket.org/apalala/grako/issue/17/cut-seems-overzealous
I think you will find that your example is similar to mine.
The cut operator in grako was implemented like you describe. But note
that this is ___not___ the Mizushima cut operator. The Mizushima paper is
very careful to limit application of the cut operator to an empty
backtracking stack, which avoids the problem: "Generally, when the
backtracking stack of a parser is empty at position n, the parser can
discard all regions in the memoization table whose column indices are
less than n."
Later the Mizushima paper gives a negative example: "Because the
backtracking stack is not empty and the position 0 is pushed to the
stack, the parser cannot discard space for memoization at this point."
You ignored the condition (empty backtracking stack), and thus your
example showed non-linear behavior. It is an easy mistake to make.
When I reported this to Juancarlo, he told me (again, see the
bugtracker): "When I read Kota's paper I empirically found that deleting
the memoization cache left speed unchanged, but dramatically decreased
memory use, so I left it in in the know that swiping the whole cache was
overdoing it :-"
As I wrote at the time, it seems difficult (at least for me!) to come up
with a proper example that actually shows non-linear behaviour on
arbitrary long input (rather than a small example grammer that only
accepts a finite set of sentences). If you succeed with that, I'd be
interested! I described my difficulties like this: "It's surprisingly
hard to come up with this example, and I can't find an example that has
non-linear runtime on the input, as the cutting rules themselves are
memoized, and thus the memoization deletions are in some sense
"self-healing."" and "I can not prove that this can break the linear
time promise of PEG parsing, and at the moment I suspect that due to the
fact that the number of cuts in any grammar is constant and independent
of the input length, the linear time promise is still preserved.
However, I can't prove that either :P"
Thanks,
Marcus