Ondřej Bílka | 1 Jan 23:01 2011
Picon

break at grammars

Hello
as I started to LR/RR eliminator I noticed that break is handy

It gave idea to introduce break statement to break iteration.
For example C strings could be parsed as
 '"' ('"' break | '\"' | . )*
whad do you think about it
--

-- 

SCSI's too wide.
Francisco Mota | 2 Jan 00:12 2011
Picon

Re: break at grammars

It might be convenient sometimes, but I don't think it increases the power of PEGs.

You can always rewrite an expression like "(E1 break | E2)*" as "(!E1 E2)* E1?"
I wonder if there are any grammars that can't be expressed without break?

2011/1/1 Ondřej Bílka <neleai <at> seznam.cz>
Hello
as I started to LR/RR eliminator I noticed that break is handy

It gave idea to introduce break statement to break iteration.
For example C strings could be parsed as
 '"' ('"' break | '\"' | . )*
whad do you think about it
--

SCSI's too wide.

_______________________________________________
PEG mailing list
PEG <at> lists.csail.mit.edu
https://lists.csail.mit.edu/mailman/listinfo/peg

_______________________________________________
PEG mailing list
PEG <at> lists.csail.mit.edu
https://lists.csail.mit.edu/mailman/listinfo/peg
Roman Redziejowski | 2 Jan 11:57 2011
Picon

Re: break at grammars

This seems to be another way of writing the "repeat-until" operation.

Apparently, version 4.0 of APG (ABNF Parser Generator)  by Lowell D. Thomas
had this operation in the form *A!B, meaning "repeat matching A until B is found."
It was removed in version 5.0 as being equivalent to *(!B A)B, which is the same
as indicated by Francisco (for some reason APG writes star before expressions).
See http://www.coasttocoastresearch.com/apg/docs/doc50, section 3.2.3.5.

My argument for introducing "repeat-until" would be a very concise implementation
(at least in the style I use in my "Mouse"), much shorter than that of *(!B A)B.
Have been considering syntax such as A**B, but cannot make up my mind.

Happy New Year!
Roman


On 2011-01-02 00:12, Francisco Mota wrote:
It might be convenient sometimes, but I don't think it increases the power of PEGs.
You can always rewrite an expression like "(E1 break | E2)*" as "(!E1 E2)* E1?"
I wonder if there are any grammars that can't be expressed without break?

2011/1/1 Ondřej Bílka <neleai <at> seznam.cz>
Hello
as I started to LR/RR eliminator I noticed that break is handy

It gave idea to introduce break statement to break iteration.
For example C strings could be parsed as
 '"' ('"' break | '\"' | . )*
whad do you think about it
--

SCSI's too wide.

_______________________________________________
PEG mailing list
PEG <at> lists.csail.mit.edu
https://lists.csail.mit.edu/mailman/listinfo/peg

_______________________________________________ PEG mailing list PEG <at> lists.csail.mit.edu https://lists.csail.mit.edu/mailman/listinfo/peg
_______________________________________________
PEG mailing list
PEG <at> lists.csail.mit.edu
https://lists.csail.mit.edu/mailman/listinfo/peg
Peter Goodman | 3 Jan 23:38 2011
Picon

C++ PEG library with left recursion support

Hello,

Over the holidays I have been working on a very simple C++ PEG
implementation. So far, it looks like I have properly handled left
recursion, even in light of right recursion. In fact, the parser
allows one to choose the associativity of right recursive rules in the
presence of left recursion. Here is an example calculator that I have
made using the library: http://codepad.org/YAj17QGg

The goals of the parser include:
 - proper support for left recursion, especially in the presence of
right recursion
 - a "dynamic" feel to the definition of grammars (unfortunately,
getting this goal means doing some annoying stuff like T<'x'>() for a
terminal and E<>() for epsilon).
 - simplicity - this is more of a proof of concept rather than being
meant for widespread use. As such, I make the assumption that the
token stream is actually a pre-allocated array that can be indexed and
that the length of the token stream is known in advance of parsing.

Soon, I hope to make a write-up about how I supported left recursion
and publicly release the code.

Things of note:
 - I have not done any analysis on the runtime performance of my
approach with left recursion, but I expect it to be worst-case O(n^2)
where n is the length of the token stream.

Happy new year,

Peter Goodman,
http://ioreader.com
70 Winston Circle,
Montreal, Quebec
H9S 4X6
Ondřej Bílka | 4 Jan 10:28 2011
Picon

Re: break at grammars

It turned out that we need split break as combination of what i denote cut and stop
When we encounter cut in or we skip subsequent branches so A cut B | C has semantics of A B | ~A C
A branch can be marked by stop and we exit loop if and only if we successfuly took marked branch

We can rewrite ordinary loop A* as (A | stop)

This distinction does not make much sense for PEG but for my beast 

It is nondeterministic topdown parser where we choose lexicographicaly smallest derivation 
with respect to took choices (I introduced break to add or in loop to catch greedy matching)
Condition to make it linear-time is that we limit recursion to left,rigth and inside expressions 
marked as nested. We assume that expression marked as nested is suffix-free. Semanticaly we want capture 
being enclosed by pair tags like for example nested('(' expression ')' )

On Sun, Jan 02, 2011 at 11:57:17AM +0100, Roman Redziejowski wrote:
>    This seems to be another way of writing the "repeat-until" operation.
> 
>    Apparently, version 4.0 of APG (ABNF Parser Generator)  by Lowell D.
>    Thomas
>    had this operation in the form *A!B, meaning "repeat matching A until B is
>    found."
>    It was removed in version 5.0 as being equivalent to *(!B A)B, which is
>    the same
>    as indicated by Francisco (for some reason APG writes star before
>    expressions).
>    See [1]http://www.coasttocoastresearch.com/apg/docs/doc50, section
>    3.2.3.5.
> 
>    My argument for introducing "repeat-until" would be a very concise
>    implementation
>    (at least in the style I use in my "Mouse"), much shorter than that of
>    *(!B A)B.
>    Have been considering syntax such as A**B, but cannot make up my mind.
> 
>    Happy New Year!
>    Roman
> 
>    On 2011-01-02 00:12, Francisco Mota wrote:
> 
>      It might be convenient sometimes, but I don't think it increases the
>      power of PEGs.
>      You can always rewrite an expression like "(E1 break | E2)*" as "(!E1
>      E2)* E1?"
>      I wonder if there are any grammars that can't be expressed without
>      break?
>      2011/1/1 Ondřej Bílka <[2]neleai <at> seznam.cz>
> 
>        Hello
>        as I started to LR/RR eliminator I noticed that break is handy
> 
>        It gave idea to introduce break statement to break iteration.
>        For example C strings could be parsed as
>         '"' ('"' break | '\"' | . )*
>        whad do you think about it
>        --
> 
>        SCSI's too wide.
> 
>        _______________________________________________
>        PEG mailing list
>        [3]PEG <at> lists.csail.mit.edu
>        [4]https://lists.csail.mit.edu/mailman/listinfo/peg
> 
> 
>  _______________________________________________
>  PEG mailing list
>  [5]PEG <at> lists.csail.mit.edu
>  [6]https://lists.csail.mit.edu/mailman/listinfo/peg
> 
> References
> 
>    Visible links
>    1. http://www.coasttocoastresearch.com/apg/docs/doc50
>    2. mailto:neleai <at> seznam.cz
>    3. mailto:PEG <at> lists.csail.mit.edu
>    4. https://lists.csail.mit.edu/mailman/listinfo/peg
>    5. mailto:PEG <at> lists.csail.mit.edu
>    6. https://lists.csail.mit.edu/mailman/listinfo/peg

> _______________________________________________
> PEG mailing list
> PEG <at> lists.csail.mit.edu
> https://lists.csail.mit.edu/mailman/listinfo/peg

--

-- 

kernel panic: write-only-memory (/dev/wom0) capacity exceeded.

_______________________________________________
PEG mailing list
PEG <at> lists.csail.mit.edu
https://lists.csail.mit.edu/mailman/listinfo/peg
Roman Redziejowski | 4 Jan 11:07 2011
Picon

PEG Archives

I wonder what happened to "PEG Archives"?
Since the New Year's day, I get this when I click on the link:

Not Found

The requested URL /pipermail/peg was not found on this server.

Apache/2.2.9 (Debian) mod_ssl/2.2.9 OpenSSL/0.9.8o Server at lists.csail.mit.edu Port 443

Regards
Roman

_______________________________________________
PEG mailing list
PEG <at> lists.csail.mit.edu
https://lists.csail.mit.edu/mailman/listinfo/peg
Peter Goodman | 6 Jan 18:28 2011
Picon

The "Right" Parse Tree

Hello PEG list,

Suppose I have the following PEG:

S -> S S / a.

For the string "aaaa", Warth et al.'s LR support method would yield
the following tree: (a(a(a(aa)))). However, it has been suggested that
the natural/expected tree for this string is really ((((aa)a)a)a).
What does everyone think the parse tree should be for the following
PEG:

S -> S B / a
B -> S

I ask because supporting ((((aa)a)a)a) is, in effect, making a special
case for left recursion in the presence of non-left recursion. But, if
one were to think in terms of the language of B, then the parse of the
above grammar should probably be (a(a(a(aa)))). If this is acceptable,
then it means getting right-leaning trees with left recursion is no
more complicated then adding an intermediary variable. The other side
of things is that (roots of) left recursion should be prioritized over
all else as a means of getting similar trees to what one might expect
from a left-to-right bottom-up parser. What are people's thoughts on
this?

Best Regards,

Peter Goodman,
http://ioreader.com
70 Winston Circle,
Montreal, Quebec
H9S 4X6
Peter Cashin | 6 Jan 21:29 2011
Picon

Re: The "Right" Parse Tree

Hi Peter:

I like the idea of a well defined implicit parse tree.

The three simple cases (left tree, right tree, and flat list) work very well:

sum   = sum '+' int | int

Left recursion is left-associative: 1+2+3+4 => (((1+2)+3)+4)

sum   = int '+' sum | int

Right recursion is right-associative: 1+2+3+4 => (1+(2+(3+4)))

sum   = int ('+' int)*

Repeat generates a simple flat list : 1+2+3+4 => (1+2+3+4)

But I can't find a simple way to define:

sum   = sum '+' sum | int

Perhaps it should be a balance, so that 1+2+3+4 => ((1+2)+(3+4))

But what about 1+2+3? Should that be (1+(2+3)) or ((1+2)+3) or (1+2+3)?

So your question is in my "too hard" basket!

I hope someone else has a better answer....

Cheers,
Peter Cashin


On Fri, Jan 7, 2011 at 6:28 AM, Peter Goodman <peter.goodman <at> gmail.com> wrote:
Hello PEG list,

Suppose I have the following PEG:

S -> S S / a.

For the string "aaaa", Warth et al.'s LR support method would yield
the following tree: (a(a(a(aa)))). However, it has been suggested that
the natural/expected tree for this string is really ((((aa)a)a)a).
What does everyone think the parse tree should be for the following
PEG:

S -> S B / a
B -> S

I ask because supporting ((((aa)a)a)a) is, in effect, making a special
case for left recursion in the presence of non-left recursion. But, if
one were to think in terms of the language of B, then the parse of the
above grammar should probably be (a(a(a(aa)))). If this is acceptable,
then it means getting right-leaning trees with left recursion is no
more complicated then adding an intermediary variable. The other side
of things is that (roots of) left recursion should be prioritized over
all else as a means of getting similar trees to what one might expect
from a left-to-right bottom-up parser. What are people's thoughts on
this?

Best Regards,

Peter Goodman,
http://ioreader.com
70 Winston Circle,
Montreal, Quebec
H9S 4X6

_______________________________________________
PEG mailing list
PEG <at> lists.csail.mit.edu
https://lists.csail.mit.edu/mailman/listinfo/peg

_______________________________________________
PEG mailing list
PEG <at> lists.csail.mit.edu
https://lists.csail.mit.edu/mailman/listinfo/peg
Alex Repain | 6 Jan 21:45 2011
Picon

Re: The "Right" Parse Tree



2011/1/6 Peter Cashin <cashin.peter <at> gmail.com>
Hi Peter:

I like the idea of a well defined implicit parse tree.

The three simple cases (left tree, right tree, and flat list) work very well:

sum   = sum '+' int | int

Left recursion is left-associative: 1+2+3+4 => (((1+2)+3)+4)

sum   = int '+' sum | int

Right recursion is right-associative: 1+2+3+4 => (1+(2+(3+4)))

sum   = int ('+' int)*

Repeat generates a simple flat list : 1+2+3+4 => (1+2+3+4)

But I can't find a simple way to define:

sum   = sum '+' sum | int

Perhaps it should be a balance, so that 1+2+3+4 => ((1+2)+(3+4))

But what about 1+2+3? Should that be (1+(2+3)) or ((1+2)+3) or (1+2+3)?

So your question is in my "too hard" basket!

My intuition to that problem is that as long as left-recursion is present, we should prefer a left tree. Yes, left-recursion comes handy when it comes to clearly express the intent of a grammar, but it should not be a careless choice. I assume that since one used left-recursion, his expectation is a left-associative behaviour, no matter if the rule is also right-recursive or not.

That's where the idea from Peter gets really nice : it can allow one aware dude to use left recursion but to get a right associative behaviour by adding an intermediate rule in it. Some fair hack.

Others have discussed the thing, see Laurence Tratt's  "Direct Left-Recursive Parsing Expression Grammars"
http://tratt.net/laurie/research/publications/html/tratt__direct_left_recursive_parsing_expression_grammars/
The author suggests we fix the behaviour of Wrath et al.'s left-recursion handling algorithm to get it parse to "left trees" when a rule is both left and right recursive.

Alex

 
I hope someone else has a better answer....

Cheers,
Peter Cashin


On Fri, Jan 7, 2011 at 6:28 AM, Peter Goodman <peter.goodman <at> gmail.com> wrote:
Hello PEG list,

Suppose I have the following PEG:

S -> S S / a.

For the string "aaaa", Warth et al.'s LR support method would yield
the following tree: (a(a(a(aa)))). However, it has been suggested that
the natural/expected tree for this string is really ((((aa)a)a)a).
What does everyone think the parse tree should be for the following
PEG:

S -> S B / a
B -> S

I ask because supporting ((((aa)a)a)a) is, in effect, making a special
case for left recursion in the presence of non-left recursion. But, if
one were to think in terms of the language of B, then the parse of the
above grammar should probably be (a(a(a(aa)))). If this is acceptable,
then it means getting right-leaning trees with left recursion is no
more complicated then adding an intermediary variable. The other side
of things is that (roots of) left recursion should be prioritized over
all else as a means of getting similar trees to what one might expect
from a left-to-right bottom-up parser. What are people's thoughts on
this?

Best Regards,

Peter Goodman,
http://ioreader.com
70 Winston Circle,
Montreal, Quebec
H9S 4X6

_______________________________________________
PEG mailing list
PEG <at> lists.csail.mit.edu
https://lists.csail.mit.edu/mailman/listinfo/peg


_______________________________________________
PEG mailing list
PEG <at> lists.csail.mit.edu
https://lists.csail.mit.edu/mailman/listinfo/peg


_______________________________________________
PEG mailing list
PEG <at> lists.csail.mit.edu
https://lists.csail.mit.edu/mailman/listinfo/peg
Peter Cashin | 6 Jan 22:21 2011
Picon

Re: The "Right" Parse Tree

Hi Peter & Alex:


Ah!  Sorry Peter I didn't quite follow you at first...

Thanks to Alex I now think you have suggested a good solution.

So for my little balanced example it is "left first":

       sum = sum '+' sum | int     <=>     sum = sum '+' int  |  int 

"Left first" is also a "left to right" interpretation, so it has a good rationale.

Thanks,

Peter.
_______________________________________________
PEG mailing list
PEG <at> lists.csail.mit.edu
https://lists.csail.mit.edu/mailman/listinfo/peg

Gmane