Vitaly S. Lugovskiy | 6 Sep 14:27 2014

MBase 1.0 released under QPL

MBase is a simple metaprogramming framework for the .NET platform, featuring, among other compiler building blocks, an extensible Packrat+Pratt parser frontend with transparent left-recursion support.


The PEG-related code is in src/l/lib/parsing/ and src/l/lib/pfront/

See a (somewhat outdated) demo here: https://www.meta-alternative.net/pfront.pdf 



mathew palsberg | 4 Sep 19:54 2014

random program/string generator for PEG

Hello,
I recently built a small language with a PEG. I now want to test my language with randomly generated (not necessarily meaningful) programs. Does anybody know of a tool that, given a PEG grammar, generates random sequences that satisfy the grammar?
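
(For illustration, the naive approach would be something like the following minimal sketch — hypothetical Python with a toy grammar hard-coded, not an existing tool. It treats the PEG as a plain CFG and expands nonterminals by picking alternatives at random, with a depth bound so expansion terminates; because ordered choice and predicates are ignored, a generated string is not guaranteed to be accepted by the real PEG parser, so the output would still have to be filtered through the actual parser.)

import random

# nonterminal -> list of alternatives; each alternative is a list of
# terminals (lowercase strings) and nonterminals (uppercase keys)
GRAMMAR = {
    "G": [["A", "+", "B"], ["B"]],
    "B": [["A", "+", "C"], ["C"]],
    "A": [["a"]],
    "C": [["c"]],
}

def generate(symbol, depth=10):
    if symbol not in GRAMMAR:                # terminal: emit it as-is
        return symbol
    alts = GRAMMAR[symbol]
    if depth <= 0:                           # out of budget: prefer the shortest alternative
        alts = sorted(alts, key=len)[:1]
    alt = random.choice(alts)
    return "".join(generate(s, depth - 1) for s in alt)

for _ in range(5):
    print(generate("G"))                     # e.g. 'c', 'a+c', 'a+a+c', ...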

Thanks;

Mayo Kaze | 14 Aug 03:13 2014

Is there any PEG implementation in ActionScript?

Hi mates:

I'd like to ask: is there any PEG implementation in ActionScript? If there isn't, I am planning to create one myself.

Kind regards,
TJ
Marcus Brinkmann | 12 Aug 00:22 2014

Re: Breaking the Mizushima(2010) cut operator

Hi,

I reported this in August 2013 to the grako bug tracker:

https://bitbucket.org/apalala/grako/issue/17/cut-seems-overzealous

I think you will find that your example is similar to mine.

The cut operator in grako was implemented like you describe.  But note 
that this is _not_ the Mizushima cut operator.  The Mizushima paper is 
very careful to limit application of the cut operator to an empty 
backtracking stack, which avoids the problem: "Generally, when the 
backtracking stack of a parser is empty at position n, the parser can 
discard all regions in the memoization table whose column indices are 
less than n."

Later the Mizushima paper gives a negative example: "Because the 
backtracking stack is not empty and the position 0 is pushed to the 
stack, the parser cannot discard space for memoization at this point."

You ignored the condition (empty backtracking stack), and thus your 
example showed non-linear behavior.  It is an easy mistake to make.
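
For concreteness, a small sketch of that guarded pruning (hypothetical Python, not grako's or the paper's actual code; memo keys are assumed to be (rule, position) pairs):

def maybe_prune(memo, backtrack_stack, pos):
    # Discard memo columns to the left of 'pos', but only if no enclosing
    # choice point could still backtrack into them.
    if backtrack_stack:          # some outer alternative may still need those entries
        return memo
    return {key: value for key, value in memo.items() if key[1] >= pos}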

When I reported this to Juancarlo, he told me (again, see the 
bugtracker): "When I read Kota's paper I empirically found that deleting 
the memoization cache left speed unchanged, but dramatically decreased 
memory use, so I left it in in the know that swiping the whole cache was 
overdoing it :-"

As I wrote at the time, it seems difficult (at least for me!) to come up 
with a proper example that actually shows non-linear behaviour on 
arbitrarily long input (rather than a small example grammar that only 
accepts a finite set of sentences).  If you succeed with that, I'd be 
interested!  I described my difficulties like this: "It's surprisingly 
hard to come up with this example, and I can't find an example that has 
non-linear runtime on the input, as the cutting rules themselves are 
memoized, and thus the memoization deletions are in some sense 
"self-healing."" and "I can not prove that this can break the linear 
time promise of PEG parsing, and at the moment I suspect that due to the 
fact that the number of cuts in any grammar is constant and independent 
of the input length, the linear time promise is still preserved. 
However, I can't prove that either :P"

Thanks,
Marcus
Roman Redziejowski | 23 Jul 14:09 2014

Breaking cut operator

I believe Ger Hobbelt discovered a problem.

The "cut point", in the sense of a point after which there is no use of backtracking, is the property of a specific selection operator.

Once B in Ger's examples has successfully consumed A, it indeed does not need to backtrack (as the alternative must fail).
But G in the second example has to backtrack after B fails, even if B passed its cut point. So B has no right to erase the history, because it may be needed by the containing expression.
So the cut points should somehow be considered in the entire hierarchy of parsing expressions.

I recently wrote a note about cut points, to be presented at the CSP workshop this year. It was inspired by a section in the paper by Maidl et al. about producing error messages; apparently the failures in the 'no return' areas defined by cut points are the relevant ones.
What I missed there is the fact that cut points are local.

(My note can be found at http://www.romanredz.se/papers/CSP2014.pdf. Remember: it is preliminary and not peer-reviewed yet.)

Roman
Ger Hobbelt | 10 Jul 16:14 2014

Breaking the Mizushima(2010) cut operator

L.S.,

I'm looking for feedback, i.e. to be corrected on any mistakes, more or less obvious, because I think I ran into an issue and, after having double-checked what I consider to be the cause several times, I can't find any goof(s) in it anymore.

The observation below is written as a fact, but I know sometimes I overlook the obvious elephant in the room. :'-(  And I want to make sure I got this one nailed.

Thanks!

...

# Intro #

Take grammar1:

G = A + B / B               (1)
B = A + C / C               (2)
A = a
C = c 

Matching Inputs:

c                            (1)
a+c                          (2)
a+a+c                        (3)

Application of Mizushima's cut operator rules will *not* add a cut (marked as '#') in rule G(1), as the ExtFirst(e) sets of both the A+B and B choices include the 'a' prefix (full common 'first' prefix: 'a+').
It does add an automatic cut operator to rule B(2), as there the ExtFirst sets of the two choices are disjoint (their intersection is empty):

B = A # + C / C               (2)

And this automatic cut insertion is correct as it does not alter the behaviour of the grammar in any way.

If we were to use, as suggested in the later paper, BITES (Redziejowski, 2011) rather than the FIRST building block used by Mizushima et al. (2010), things don't change: the inferred ExtBites(e) would deliver 'a+' rather than just 'a', but that's about it for this example grammar.
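
For illustration, the insertion criterion used above boils down to a disjointness test on the two choices' 'first' sets. A toy sketch (hypothetical Python, not the paper's actual algorithm):

def can_insert_cut(first_of_choice1, first_of_choice2):
    # A cut after the leading part of choice 1 is safe only if no input that
    # starts choice 1 could also start choice 2.
    return first_of_choice1.isdisjoint(first_of_choice2)

print(can_insert_cut({'a'}, {'a', 'c'}))   # rule G of grammar1: False -> no cut
print(can_insert_cut({'a'}, {'c'}))        # rule B of grammar1: True  -> automatic cut

For rule B of grammar2 below the sets are exactly the same, which is why the cut survives the flip of G's choices even though the surrounding context has changed.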


# A Breaking Grammar #

Take grammar2, a minimal alteration of grammar1 that accepts the same input set ('language') as grammar1:

G = B / A + B               (1)
B = A + C / C               (2)
A = a
C = c 

Matching Inputs:

c                            (1)
a+c                          (2)
a+a+c                        (3)

Note that rule G has 'flipped' its two choices but still parses the same input.
Mizushima's algorithm does not mind, so, once again, only rule B gets a cut:

G = B / A + B               (1)
B = A # + C / C             (2)
A = a
C = c 

as the ExtFirst() sets turn up the same results: B-choice1 = {'a'}, B-choice2 = {'c'}. Also upgrading to Redziejowski's BITES does not change things.


## The Problem ##

However, the cut in B will now kill the memoization cache entry for the initial 'a' parsed by B in the inputs 'a+c' and 'a+a+c'. If we now inspect the latter, the first B choice fails and the parser backtracks to G, where the input is matched by the second choice of G, but this time around without any help from memoization for the initial 'a' that it would otherwise have received in grammar1. 

In other words: for the input 'a+a+c', the second evaluation of the leading matching 'A = a' at input column 0 (now in rule G(1); in grammar1 its second evaluation is in the first choice of rule B(2)) no longer receives memoization support. 
This means that using the cut operator, as placed by the Mizushima/Redziejowski ExtFirst/ExtBites approach, to clear out and reduce the memoization cache footprint unintentionally removes memoization support, and hence the grammar no longer exhibits guaranteed linear-time behaviour.
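
To make the lost memo hit visible, here is a minimal, self-contained sketch (hypothetical Python; not taken from any of the implementations or papers cited here) of a packrat parser for grammar2 in which the cut in rule B prunes the memo table unconditionally, i.e. without the empty-backtracking-stack guard from the Mizushima paper. It counts how often 'A = a' is evaluated at input column 0 for the input 'a+a+c':

class Parser:
    def __init__(self, text):
        self.text = text
        self.memo = {}              # (rule name, position) -> result position or None
        self.a_evals_at_0 = 0       # instrumentation: evaluations of A at column 0

    def memoized(self, name, pos, body):
        key = (name, pos)
        if key in self.memo:
            return self.memo[key]
        result = body(pos)
        self.memo[key] = result
        return result

    def prune(self, pos):
        # The overzealous cut: drop every memo entry left of 'pos',
        # without checking whether the backtracking stack is empty.
        self.memo = {k: v for k, v in self.memo.items() if k[1] >= pos}

    def lit(self, ch, pos):
        return pos + 1 if self.text[pos:pos + 1] == ch else None

    def A(self, pos):
        def body(p):
            if p == 0:
                self.a_evals_at_0 += 1
            return self.lit('a', p)
        return self.memoized('A', pos, body)

    def C(self, pos):
        return self.memoized('C', pos, lambda p: self.lit('c', p))

    def B(self, pos):                        # B = A # '+' C / C
        def body(p):
            q = self.A(p)
            if q is not None:                # first choice: A matched ...
                self.prune(q)                # ... the cut fires and prunes the cache ...
                r = self.lit('+', q)
                return self.C(r) if r is not None else None   # ... and commits to choice 1
            return self.C(p)                 # second choice, only if A did not match
        return self.memoized('B', pos, body)

    def G(self, pos):                        # G = B / A '+' B
        def body(p):
            q = self.B(p)
            if q is not None:
                return q
            q = self.A(p)                    # second choice re-parses A at column 0
            q = self.lit('+', q) if q is not None else None
            return self.B(q) if q is not None else None
        return self.memoized('G', pos, body)

p = Parser('a+a+c')
print(p.G(0))                # 5: the whole input is consumed
print(p.a_evals_at_0)        # 2: 'A' at column 0 was parsed twice

With the unconditional pruning the count comes out as 2. With the guard from the Mizushima paper (prune only when the backtracking stack is empty) the prune inside B at column 0 would be suppressed, because G's second choice is still pending, and the second lookup of A at column 0 would be a memo hit.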


## Side Notes ##

This is the minimal grammar example that I can devise to showcase the issue. 
Having rule G as

G = B / A + B               (1)

looks suspicious to the human eye once you're aware of 'prefix hiding', even though the notation as such does *not* imply prefix hiding, as we're looking at non-terminals here; but that suspicion can take a long time to develop when the construct occurs in a production grammar along the lines of grammar3 below:

G = B / D                   (1)
B = A # + C / C             (2)
D = A + B                   (3)
A = a
C = c 

where rule D 'obfuscates' the 'A +' common prefix of both choices of rule G(1). Of course, in a production grammar, the B(2) and D(3) definitions will very probably be further apart, so the issue would not even be caught indirectly as 'probable trouble' in the visual patterns of the grammar definition itself.
('Probable trouble' being a human brain thing, not something a machine would use. Nor is 'probable trouble' a sure-fire thing; it's just that when you look at grammars long and often enough, you may get the feeling 'something is odd with this one'. And you *may* be right. Then again, you may not be.)


# Conclusion #

As the Mizushima cut operator only looks into the future (ExtFirst) but does not consider the history/context of the rule currently under scrutiny (i.e. rule G's behaviour when we determine that rule B(2) can have a cut inserted), I expect a solution/fix to this issue would be derived from propagating the invoking rule's First/Bites info into the rule under scrutiny, in order to add the outer context to the condition set for the automatic insertion of the cut operator. (To be addressed further in another email.)


# References #

Mizushima et al., 2010: Packrat Parsers can handle practical grammars in mostly constant space (Mizushima, Maeda, Yamaguchi, 2010)

Redziejowski, 2011: BITES instead of FIRST for parsing expression grammar (Redziejowski, 2011) -- 'upgrading' Mizushima's algorithm (ExtFirst -> ExtBites) is already suggested in its Ch. 1, Introduction. 

...

I have searched the net to the best of my ability but wasn't able to find an earlier mention of this issue; if I missed it, I'd love to hear about it and get a reference to it. :-)  (Because if I missed this one, I'm bound to have missed others in that vicinity too.)


Thank you for bearing with me.

 
Met vriendelijke groeten / Best regards,

Ger Hobbelt

--------------------------------------------------
web:    http://www.hobbelt.com/
        http://www.hebbut.net/
mail:   ger <at> hobbelt.com
mobile: +31-6-11 120 978
--------------------------------------------------
Mackenzie High | 10 Jul 10:14 2014

First Official Release of Snowflake: A Graphical PEG Parser

Greetings everyone,

I would like to officially announce the release of the Java-based Snowflake project, which allows users to test PEG grammars on sample input as they write them. In addition, the project is a parser generator. 



I envision that this program would be especially useful for students who are learning about parsing, since it allows one to immediately see the parse-tree that results from a given grammar and sample input. 

I hope someone finds this program useful. I would be happy to hear any of your questions, comments, or bug reports regarding this project. 


Sincerely, 
Mackenzie High


Juancarlo Añez | 5 Jul 16:19 2014

Grako 3.1.0

Grako 3.1.0 has been released.

Grako is a PEG/Packrat parser generator for Python. It takes grammars in a variation of EBNF as input, and either parses in memory or generates a structured parser in Python.

New features in this release:
  • Support for direct and indirect left recursion in grammars.
  • Arguments in grammar rules.
  • Rule inclusion.
  • Rule inheritance.
  • Include files.
  • JSON-compatible abstract syntax trees (AST).
  • AST walkers.
Cheers,

--
Juancarlo Añez
Mathias Doenitz | 26 May 22:04 2014

Re: error handling in PEGs

> hi. That makes sense. It’s basically identifying a no viable alternative situation if the char occurs
> at the start of a rule.
> 
> I think one could get better error messages and recovery using tokens instead of scannerless but one gives
> up composability/modularity. For token errors, it’s easy to just skip over stuff in the lexer. The
> parser need not even see those.
> 
> when using tokens, it seems one could get a better error message like “missing ID, found INT” instead
> of “invalid input ‘3’ expected [a-zA-Z]+” or whatever.

I’m not sure that tokenizing will always result in better error messages.
In fact, there are situations in which tokenizing appears to actually *decrease* error message quality
(for example when there are typos in language keywords).

The approach I outlined for PEGs gives you a list of rule “traces” for an error position whereby each
trace is the stack of rules that failed.
There are several conceivable ways to turn such a list of traces into a proper error message.

Going up the trace to the first parent rule which failed at its first sub-rule (rather than somewhere in the
middle) for reporting an expected alternative is only one option.
You could also allow the user to somehow explicitly mark the rules that are considered to be terminals which
would traditionally be matched by the lexer. (This marking could also be minimally invasive like
ALL-CAPS naming). And if you allow for special constructs in the grammar that give the user even more
control over what and how errors are reported (like in the paper that Sérgio pointed to) the
possibilities multiply.

However, the negative syntactic predicate available in PEGs can pose somewhat of a problem for fully
automatic error reporting. This starts with the determination of the error position (do you count
progress made in a negative syntactic predicate towards the overall progress the parser has made or not?)
and also affects the display of “expected” input (e.g.: “expected anything but FOO or BAR”).

> Single token insertion and deletion is great, although I wonder how well it works with characters
> compared to tokens.

Actually it works well and can be superior to recovery on a token stream.
Again, consider a typo in a keyword (like `publc` rather than `public`). Since recovery for PEGs runs on the
same level that the user usually makes mistakes on (i.e. the character level rather than the token level),
single insertion/deletion/replacement is quite effective in real-world applications, in our experience.

(Forgot to mention before that parboiled doesn’t only try single-char deletion and insertion but also
replacement, which is essentially a deletion followed by an insertion.)
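
As a generic illustration of that repair step (a hedged sketch in Python, not parboiled's implementation), given the error position and the set of characters the parser reported as expected there:

def try_single_char_repair(parses, text, error_pos, expected_chars):
    # parses(s) should return True if the whole string s can be parsed.
    candidates = [text[:error_pos] + text[error_pos + 1:]]               # deletion
    for ch in expected_chars:
        candidates.append(text[:error_pos] + ch + text[error_pos:])      # insertion
        candidates.append(text[:error_pos] + ch + text[error_pos + 1:])  # replacement
    for candidate in candidates:
        if parses(candidate):
            return candidate
    return None   # no single-char fix works: fall back to resynchronisation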

> Yep, if ANTLR cannot repair inside a rule, it consumes tokens until something in what could follow
> appears. Do you consider the actual call stack or the complete follow set? If you have the call stack, you
> can limit the set of things to consume to just what is possible given the input as opposed to giving all input.

parboiled 1.x determines the follower-set by working up through a rule trace, so the follower-set is not
determined through a static analysis but does take the actual input into account.
However, it’s not yet as smart as it could be. Ideally we’d attempt resynchronisation on *all* rule
traces for the current error and choose the best one (i.e. the one that skips the least number of input chars).
IIRC parboiled currently always resynchronises on the first trace only.

Cheers,
Mathias

---
mathias <at> parboiled.org
http://www.parboiled.org

On 26 May 2014, at 18:30, Terence Parr <parrt <at> cs.usfca.edu> wrote:

> 
> On May 26, 2014, at 8:48 AM, Mathias Doenitz <mathias <at> parboiled.org> wrote:
> 
>> Terence,
>> 
>> for error reporting parboiled applies this logic:
>> 
>> 1. Determine error position (first input pos that cannot be parsed), if this is end-of-input we have no
error and are done.
>> 
>> 2. Restart the parser from the very beginning and “watch” the parser try to match the error position.
Record all the rules (rule stacks) that are applied against the error position. From this list of rules we
can construct proper error messages like:
>> 
>>    Invalid input 'x', expected 'f', Digit, hex or UpperAlpha (line 1, column 4):
>>    abcx
>>       ^
>> 
>>    4 rules mismatched at error location:
>>      targetRule / | / "fgh" / 'f'
>>      targetRule / | / Digit
>>      targetRule / | / hex
>>      targetRule / | / UpperAlpha
> 
> hi. That makes sense. It’s basically identifying a no viable alternative situation if the char occurs
at the start of a rule.
> 
> I think one could get better error messages and recovery using tokens instead of scannerless but one gives
up composability/modularity. For token errors, it’s easy to just skip over stuff in the lexer. The
parser need not even see those.
> 
>  when using tokens, it seems one could get a better error message like “missing ID, found INT” instead
of “invalid input ‘3’ expected [a-zA-Z]+” or whatever.
> 
>> Restarting the parser for gathering error info is acceptable in almost all use cases since errors are
infrequent and added overhead here is not a problem.
> 
> I have no problem restarting parsers upon syntax errors; it’s what I recommend people do in ANTLR 4 for
maximum speed. ;)   That way, we can use a faster mechanism for the common case and fail over to a more
expensive one to determine if the syntax error is correct or a result of a weakness in the first strategy.
BTW, I will be presenting the ALL(*) at OOPSLA this year; here’s more on the strategy:
> 
> http://www.antlr.org/papers/allstar-techreport.pdf
> 
> It basically properly builds a parser for any CFG you give it that does not have indirect left recursion.
(ANTLR rewrites direct left-recur under the hood.) No backtracking and it does all analysis on-the-fly,
memoizing lookahead-to-predicted-alternative mappings with DFA.
> 
> I’ve actually been using the supplied parser interpreter these days instead of generating grammars
for instant gratification :) Generating code and compiling is tedious during prototyping.
> 
>> Error recovery:
>> Here we have the following logic:
>> 
>> 1. Try to “remove” the error position from the input and see whether the parser can continue.
>> 2. If not, try to insert a single character that the parser “expects” (see reporting) at the error
position. If this allows us to continue parsing, great.
>> 3. If no single-char fix “repairs” the input we resynchronise the parser in a similar way that ANTLR
does it (IIRC), i.e. we skip ahead in the input until the parser can continue.
> 
> Single token insertion and deletion is great, although I wonder how well it works with characters
compared to tokens.
> 
> Yep, if ANTLR cannot repair inside a rule, it consumes tokens until something in  what could follow
appears. Do you consider the actual call stack or the complete follow set? If you have the call stack, you
can limit the set of things to consume to just what is possible given the input as opposed to giving all input.
> 
> Thanks very much for the summary.
> 
> Ter
Mathias Doenitz | 26 May 22:04 2014

Re: error handling in PEGs

<forgot to CC the list before>
---

Terence,

for error reporting parboiled applies this logic:

1. Determine the error position (the first input pos that cannot be parsed); if this is end-of-input we have no error and are done.

2. Restart the parser from the very beginning and “watch” the parser try to match the error position.
Record all the rules (rule stacks) that are applied against the error position. From this list of rules we
can construct proper error messages like:

   Invalid input 'x', expected 'f', Digit, hex or UpperAlpha (line 1, column 4):
   abcx
      ^

   4 rules mismatched at error location:
     targetRule / | / "fgh" / 'f'
     targetRule / | / Digit
     targetRule / | / hex
     targetRule / | / UpperAlpha

Restarting the parser for gathering error info is acceptable in almost all use cases since errors are
infrequent and added overhead here is not a problem.
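
For illustration, a minimal sketch of this two-pass idea (a hypothetical Python helper, not parboiled's actual code); the parser is assumed to call enter/leave around every rule invocation and note() on every terminal match attempt:

class ErrorRecorder:
    def __init__(self, watch_pos=None):
        self.watch_pos = watch_pos   # None: pass 1; an input position: pass 2
        self.error_pos = -1          # furthest mismatch position (result of pass 1)
        self.traces = []             # rule stacks seen at watch_pos (result of pass 2)
        self.stack = []              # currently active rules

    def enter(self, rule_name):
        self.stack.append(rule_name)

    def leave(self):
        self.stack.pop()

    def note(self, pos, matched):
        if self.watch_pos is None:
            if not matched:                                # pass 1: track the furthest
                self.error_pos = max(self.error_pos, pos)  # position that failed to parse
        elif pos == self.watch_pos:                        # pass 2: record the rule stack
            self.traces.append(" / ".join(self.stack))     # applied against the error position

Run the parser once with ErrorRecorder(); if it fails, run it again with ErrorRecorder(watch_pos=first_run.error_pos) and build the message from the collected traces, along the lines of the report shown above.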

Error recovery:
Here we have the following logic:

1. Try to “remove” the error position from the input and see whether the parser can continue.
2. If not, try to insert a single character that the parser “expects” (see reporting) at the error
position. If this allows us to continue parsing, great.
3. If no single-char fix “repairs” the input we resynchronise the parser in a similar way that ANTLR
does it (IIRC), i.e. we skip ahead in the input until the parser can continue.

Cheers,
Mathias

---
mathias <at> parboiled.org
http://www.parboiled.org

On 26 May 2014, at 17:38, Terence Parr <parrt <at> cs.usfca.edu> wrote:

> Hi Mathias, Could you summarize quickly how it recovers and reports errors? I think the basic strategy
initially was to simply report an error at the token following the deepest correctly matched token, but I
don’t remember anything about recovery.
> 
> PEGs are so beautiful and simple it makes one proud to be a computer scientist! I’ve heard reports
however of people backing off to a more traditional deterministic parser for reasons of side effecting
action execution during the parse and error recovery.  For example, the xtext guys tried to get away from
ANTLR by building  their own PEG based system but error recovery and reporting gave them lots of trouble and
so they returned to ANTLR.
> 
> Ter
> On May 26, 2014, at 8:35 AM, Mathias Doenitz <mathias <at> parboiled.org> wrote:
> 
>> Terence,
>> 
>> parboiled (http://parboiled.org, http://parboiled2.org) does what I’d consider good error reporting.
>> The 1.x version also supports error recovery in a nice fashion.
>> 
>> Cheers,
>> Mathias
>> 
>> ---
>> mathias <at> parboiled.org
>> http://www.parboiled.org
>> 
>> On 26 May 2014, at 17:30, Terence Parr <parrt <at> cs.usfca.edu> wrote:
>> 
>>> Hi. Just curious. Has anyone figured out how to get good error reporting and recovery in PEGs? I.e.,
competitive with deterministic parsers?
>>> 
>>> Terence
>>> 
>>> 
>> 
> 
