Rostislav Sacek | 30 Dec 23:59 2013
Picon

[ANN] LPegLJ v.20

I have released version 0.20 of LPegLJ
https://github.com/sacek/LPegLJ/
LPegLJ is an PEG parser based on LPeg v0.12 and all code is implemented in
LuaJIT 2.x.

New features:
- Faster then previous LPegLJ 0.12.
VM engine is about 1.5 times faster, creating patterns is about 4 times
faster.
- Loading and saving patterns
- Support direct and indirect left recursion based on Sergio Medeiros
algorithm
(http://arxiv.org/abs/1207.0443)

Feedback welcome.

All code is MIT licensed.
Rostislav Sacek
Henri Tuhola | 10 Dec 02:24 2013
Picon

Inspiration for recognition based grammars

While I was studying left recursion, I realized that it becomes complex and unpredictable very fast. Allowing left recursion in peg parser is almost like a permission for making it extremely complicated. Suddenly right recursion starts affecting what ends up actually parsed and it's neither nice that the solution is some wrap-around papered over the simple ways to parse with parsing expression grammars.

I noticed another property that isn't emphasized a lot about pegs. If you scan from right to left instead of left to right, the same grammar may match a different language. Regular expressions have same behavior though.

It all gave me a reason to question the sanity in parsing expression grammars. The idea to construct parser from recognition rules sounds intuitive though. Perhaps there's a way to construct recognition based parsing that doesn't depend on direction of evaluation as much.

Human parses the text by looking at some small area at once. Perhaps it would make sense imitating that. Here's some observations:

Even if PEG parsing doesn't force that, it seems preferable to mark some words, such as 'if', 'for' and 'else', as keywords.  "term = identifier"  seems to be less specific than  "statement = if ... else ..."  or  "statement = for ... in" .

Infix expressions alone are unambiguous, and if human doesn't know their precedence then only then they become ambiguous. For example: "4 + 5*3" is easy to understand because precedence is very much standardized at that point. The "4 + 3 << 2" is more ambiguous.

If some pattern is lacking, the pieces that formed parts of that pattern are garbage. For example superfluous '(' and ')' parentheses. Some such garbage patterns cause ambiguous situations as well.

I don't know whether I explore this idea further. It feels more promising because there's chance for better error reporting there. It could also work in noisy settings. Haven't studied whether there's something similar existing already.
_______________________________________________
PEG mailing list
PEG <at> lists.csail.mit.edu
https://lists.csail.mit.edu/mailman/listinfo/peg
Aaron Moss | 26 Nov 23:24 2013
Picon
Picon

Exponential Time Grammars?

Does anyone have a good example of a parsing expression language that 
*requires* exponential time to parse via recursive descent? I'm working 
on a new algorithm, and I want to check my bounds on some tricky cases.

The best I can come up with is this:

S = A !.
A = ( a A b | a A c )?

which will take exponential time on any string of the form a^n c^n, but 
is exactly equivalent to the following grammar, which is linear time on 
all strings, and I'd like a case that isn't (trivially) transformable to 
an easier grammar:

S = A !.
A = ( a A ( b | c ) )?

Thanks,
Aaron
Henri Tuhola | 25 Nov 07:31 2013
Picon

Easy way to parse indented syntax by adding dimension?

Hi again.

You can already parse indentation with PEG by tokenizing step or providing context. But if you treat the input such that it holds two dimensions, shouldn't it be easy to notice that indented block clearly isn't context sensitive after all?

for i in range(6):
    print(i)
    print(i * 2)

There is very clear pattern here, and you can't really parse the indentation around the block any other way. So doesn't that mean it can be done with packrat parser? You only need a certain sort of extra rule for it:

stmt <- 'for' variable 'in' expression >>> stmt

The 'indent' (>>>):

 1. Memorize column index as base-indent. Make sure the line starts with this structure.
 2. Match the head pattern.
 3. Match newline, count spaces until character found. But skip comments.
 4. Fail if less spacing than what column index dictates.
 5. Match body pattern.
 6. Repeat step 3, 4, 5, until first failure, with condition that the spacing must line up such that it forms a block.

This happens within single block, so it doesn't leak state around. I think it's perhaps possible to synthesize a 2-D PEG. If someone figures out a way to do exactly that, you could also try parse:

       y
k = ------
     x**2

or this, if earlier one turns out too insane:

k = y
     ------
     x**2

I read about someone doing parsing on scanned math expressions. So it doesn't sound too impossible to consider that this might work just as well.
_______________________________________________
PEG mailing list
PEG <at> lists.csail.mit.edu
https://lists.csail.mit.edu/mailman/listinfo/peg
Toby Goodwin | 24 Nov 20:31 2013

New release of pacc

I'm delighted to announce a new release of pacc, the packrat PEG parser
generator for C.

  http://static.paccrat.org/release/pacc-0.1.tar.bz2

  SHA256: b9ac6633a499d036944453f8af7f3f3af90a913d2f998664580dbc9bd3597e0f

Highlights of this release include a complete reworking of the "feeding"
interface, improvements to various other interfaces, and many, many bug
fixes. Additionally, the documentation is now reasonably complete.

(Feeding is the feature--unique to pacc as far as I know--that allows
pacc to be used for command line applications, such as shells.)

As always, I would be delighted for any feedback. If you are interested
in adopting pacc for your project, I'm interested in working with you
to make it function as it should!

Toby.
--

-- 
Toby Goodwin
pacc developer
Roman Redz | 21 Nov 22:23 2013
Picon

From EBNF to PEG

My paper "From EBNF to PEG" that builds on ideas from Medeiros has now appeared in Fundamenta Informaticae vol. 128 nr 1-2 (2013) pp. 171 - 191.
A preliminary version is available from http://www.romanredz.se/pubs.htm .
Roman
_______________________________________________
PEG mailing list
PEG <at> lists.csail.mit.edu
https://lists.csail.mit.edu/mailman/listinfo/peg
Henri Tuhola | 21 Nov 01:15 2013
Picon

Debugging and error reporting in a parser?

Hello

How could I extend my simple parser generator such that parsers provide error reports that show precisely what went wrong?

I thought it was a nice idea to write myself a parser generator, specifically tailored for the language I'm going to parse: https://github.com/cheery/kickrat

It was quite fun, clean and exciting to create such library but now I've been prototyping my language for few hours. It was enough for noticing that when I do an error in describing the grammar it is really hard to figure out what's wrong with just plain 'syntax error' -message. Overall it would be nice to get detailed syntax errors for language, even if it's a prototype.
_______________________________________________
PEG mailing list
PEG <at> lists.csail.mit.edu
https://lists.csail.mit.edu/mailman/listinfo/peg
Matthew Goode | 12 Aug 07:28 2013
Picon

New Java PEG/Packrat parser

Hello, I'm pretty much ready to release a parser generator I've been 
working on for a while. It is a packrat inspired parser (though can 
avoid memorising), that handles left recursion indirectly and directly, 
and takes into account option order with left recursion. So, the following

e = { e "+" e } | {e "*" e } | NUMBER

will parse "1+2*3+4" as (1+((2*3)+4) (if specified correctly - see 
documentation). This seems to work okay with indirect recursion as well, 
and with or without memorisation. It does use a tokenisation step, 
because I liked that better, but that may change in future versions. It 
also has extensions to PEG that allow it to match things like Fibonacci 
sequences...

It can be found at http://scratchy.org.nz/austen.php

(It is still a work in a progress, but for the most part is a well 
matured tool - I've been working on and using it for years).

Matthew
Fadel al hassan | 19 Jun 17:35 2013
Picon

PEG Non-Context Free

Dear All, 

My name is Fadel Al-Hassan, I'm using PEG grammar for some linguistic pattern matching tasks.

But, during my work I have noticed that the grammar, mentioned in Dr.Ford paper, for recognizing  (a^n b^n c^n)

D => &(A !b) a* B !.

A => aAb / ""

B => bBc / "" 

will also recognize (a^m b^n c^n where m >= n). Since the first match of A rule can be an empty string.

So as to avoid this I added another predicate for D rule so it  becomes:

D => &(A !a !b) a* B !.

Is this right? or I misunderstood something here?

Yours Sincerely 

_______________________________________________
PEG mailing list
PEG <at> lists.csail.mit.edu
https://lists.csail.mit.edu/mailman/listinfo/peg
Sérgio Medeiros | 19 Jun 00:57 2013
Picon

Re: Size and intersections of the PEG language set

On Mon, Jun 17, 2013 at 6:40 PM, Juancarlo Añez <apalala <at> gmail.com> wrote:
> A question that often arises in Web circles like Slashdot is what is the
> relation of PEG languages to other well known sets like LL, LR, or even CFG
> (and there's LSTAR and ALLSTAR).

Hi, Juancarlo.

I studied the relation between PEGs and CFGs (and regular expressions)
during my PhD. I have shown how to convert right-linear and strong-LL(k)
CFGs to PEGs. The article "On the Relation between Context-Free
Grammars and Parsing Expression Grammars"
(http://arxiv.org/abs/1304.3177) discusses this,
and it also shows how to convert LL-regular CFGs into PEGs.

I think the question about LR CFGs and PEGs is more difficult
to answer because these CFGs can have left-recursive rules, while
the LL(k) can't. We have tried to give a clear semantics for PEGs
with left-recursive rules in http://arxiv.org/abs/1207.0443. After this,
a next step would be to study the relation between LR CFGs and PEGs.

> I think that a PEG grammar is not a CFG grammar in the strict sense because
> of PEG's rule/production ordering.

In http://arxiv.org/abs/1304.3177, we also show that given a grammar G
(without predicates and left recursion), L(PEG) is a subset of L(CFG).

Roman Redziejowski has also several articles that discusses
the relation between the languages of PEGs and CFGs.

> Intuition tells me that the L(PEG) is larger than LL and LR, but I haven't
> found a proof for it.

As Francisco said, it is a good challenge to prove
the exact relation between PEGs and CFGs :-)

Sérgio
Juancarlo Añez | 17 Jun 23:40 2013
Picon

Size and intersections of the PEG language set


A question that often arises in Web circles like Slashdot is what is the relation of PEG languages to other well known sets like LL, LR, or even CFG (and there's LSTAR and ALLSTAR).

I think that a PEG grammar is not a CFG grammar in the strict sense because of PEG's rule/production ordering.

My question is about the set of languages that can be parsed. 

Intuition tells me that the L(PEG) is larger than LL and LR, but I haven't found a proof for it.

Cheers,


--
Juancarlo Añez
_______________________________________________
PEG mailing list
PEG <at> lists.csail.mit.edu
https://lists.csail.mit.edu/mailman/listinfo/peg

Gmane