Jonathan Lee | 8 Feb 19:42

Macro preprocessor

Hello everyone,

I am very new to SableCC, so please bear with me.  I would like to
implement an Erlang parser, and I've written the main grammar.
However, Erlang supports preprocessing.  For example, macros can be
defined and applied as:

  -define(MACRO, "abc").
  f() -> ?MACRO.

I am trying to understand how to create a preprocessor that integrates
with SableCC.  I've looked at the Java 1.5 unicode preprocessor, but
that is much simpler than what I need -- it simply replaces escaped
tokens as they occur.

I've thought of writing a preprocessor grammar, parsing the text once,
modifying the AST, outputting back to a string, and finally reparsing
with the main grammar.  But my gut feel is that that's a poor
approach, especially with the need to output an intermediate string.

Is there a standard way of doing this, or an example of this anywhere?
Is there a simple way to chain a preprocessor and parser together in
SableCC?
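
For simple cases, one alternative to a second grammar is a purely textual pass that runs before the lexer. The sketch below is a minimal illustration, not a SableCC facility: `MacroPreprocessor` and `expand` are hypothetical names, and it assumes object-like macros only (no arguments), one `-define` per line, and no `?NAME` occurrences inside strings or comments. The expanded text could then be handed to the generated lexer through `new PushbackReader(new StringReader(expanded))`.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of SableCC: expands object-like Erlang macros as plain text.
class MacroPreprocessor {
    // Matches a whole line such as:  -define(MACRO, "abc").
    private static final Pattern DEFINE =
        Pattern.compile("^\\s*-define\\(\\s*(\\w+)\\s*,\\s*(.*)\\)\\.\\s*$");
    // Matches a macro use such as:  ?MACRO
    private static final Pattern USE = Pattern.compile("\\?(\\w+)");

    static String expand(String source) {
        Map<String, String> macros = new LinkedHashMap<>();
        List<String> outLines = new ArrayList<>();
        for (String line : source.split("\n", -1)) {
            Matcher def = DEFINE.matcher(line);
            if (def.matches()) {
                macros.put(def.group(1), def.group(2));
                outLines.add("");   // keep an empty line so line numbers stay aligned
                continue;
            }
            Matcher use = USE.matcher(line);
            StringBuffer expanded = new StringBuffer();
            while (use.find()) {
                String body = macros.get(use.group(1));
                // Unknown macros are left untouched for the parser to report.
                String replacement = (body != null) ? body : use.group();
                use.appendReplacement(expanded, Matcher.quoteReplacement(replacement));
            }
            use.appendTail(expanded);
            outLines.add(expanded.toString());
        }
        return String.join("\n", outLines);
    }
}
```

Replacing each `-define` line with an empty line keeps the line numbers in later parser error messages aligned with the original source. Column positions on lines that contain an expanded macro will still shift.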

Thanks for your help!

Jonathan
Emanuele Ianni | 17 Dec 17:04

Thank you Etienne!

Hi Etienne and everybody else! I would like to thank you, Etienne, for your help on this mailing list during the year in which I wrote my thesis. Two days ago I received my Master of Science in computer engineering with first-class honours. I formalized the grammar of a DSL used by my department, and I also implemented an editor with syntax coloring and some other features you usually find in tools like Eclipse. I enclose a video of the demo; everything you see there works on top of SableCC 3.2. I was able to write a single grammar and define several types of Lexer, Parser (I used the filter method -A LOT-) and translation rules (different extensions of the DepthFirstAdapter).

http://dl.dropbox.com/u/14701979/tesi_manu3.mpg

Now I'm applying for a Ph.D. at Inria.fr, fingers crossed!

--
Best regards
Emanuele Ianni

Le informazioni e gli allegati contenuti in questa e-mail sono considerati confidenziali e possono essere riservati. Qualora non foste il destinatario, siete pregati di distruggere questo messaggio e notificarmi il problema immediatamente. In ogni caso, non dovrete spedire a terze parti questa e-mail. Vi ricordo che la diffusione, l'utilizzo e/o la conservazione dei dati ricevuti per errore costituiscono violazione alle disposizioni del D.L. n. 196/2003 denominato "Codice in materia di protezione dei dati personali"
Tale disclaimer non vale nel caso il messaggio sia in una mailing list pubblica.

The information in this e.mail and in any attachements is confidential and may be privileged. If you are not the intended recipient, please destroy this message and notify the sender immediately. You should not retain, copy or use this e.mail for any purpose, nor disclose all or any part of its contents to any other person according to the Italian Legislative Decree n. 196/2003.
This disclaimer should be not considered if the message is on a public mailing list.

_______________________________________________
SableCC-Discussion mailing list
SableCC-Discussion <at> lists.sablecc.org
http://lists.sablecc.org/listinfo/sablecc-discussion
Emanuele Ianni | 3 Dec 21:08

How do you traverse the AST?

Hi! I was curious about how you implement the depth-first traversal used in the tree-walker. I'm asking because I've noticed that, "obviously", the tree-walker visits the deepest nodes first, which is pretty cool. In my case I have an instruction like: showattr("degree") NOT (v.label == 10), and I noticed that it goes first into "degree" and then into 10, which are the deepest grammar rules. Do you order the children in some way? Do you apply some heuristic?
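
As far as the generated code goes, no ordering heuristic should be involved: each `caseXxx` method of `DepthFirstAdapter` calls an `in` hook, applies the walker to the children in the order in which they appear in the production, and then calls an `out` hook. Work done in `out` hooks therefore happens for the deepest nodes first. The sketch below mimics that scheme with hypothetical stand-in classes (`Node`, `Leaf`, `Branch` and `Walker` are not SableCC classes).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-ins for generated AST classes; not SableCC code.
abstract class Node {
    abstract void apply(Walker w);
}

class Leaf extends Node {
    final String text;
    Leaf(String text) { this.text = text; }
    void apply(Walker w) { w.caseLeaf(this); }
}

class Branch extends Node {
    final String name;
    final List<Node> children;
    Branch(String name, Node... children) {
        this.name = name;
        this.children = Arrays.asList(children);
    }
    void apply(Walker w) { w.caseBranch(this); }
}

class Walker {
    final List<String> log = new ArrayList<>();

    void caseBranch(Branch node) {
        log.add("in " + node.name);          // hook called before the children
        for (Node child : node.children) {    // children in declaration order
            child.apply(this);
        }
        log.add("out " + node.name);         // hook called after the children
    }

    void caseLeaf(Leaf node) {
        log.add("leaf " + node.text);
    }
}
```

Walking a tree shaped like the instruction above records the `in` entry of the root first and its `out` entry last, with the leaves for "degree" and 10 logged in between, left to right.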

--
Best regards
Emanuele Ianni

Thomas Schwinge | 21 Nov 17:00

Java 1.5 grammar with CST -> AST; issue with default case in switch statements

Hi!

In my diploma thesis/project I'm using the Java 1.5 grammar as originally
published by Stefan Mandel in
<http://lists.sablecc.org/pipermail/sablecc-user/binrUhxOff8TC.bin> in
message <http://lists.sablecc.org/pipermail/sablecc-user/msg00435.html>,
then picked up by Janus Nielsen in
<http://janusnielsen.wordpress.com/2007/04/18/java-5-grammar-for-sablecc-32/>,
<http://janusnielsen.wordpress.com/2008/06/08/java-5-grammar-for-sablecc-32-now-with-copyright/>,
and, after me noticing that the links are 404 again, published in a
public repository, <http://bitbucket.org/jdn/java5grammar>.

The advantage over the grammar on
<http://sablecc3.sablecc.org/wiki/Java-1.5> is that it already contains
CST -> AST transformation rules.

To make it work with stock SableCC 3.2, I'm using the following tiny
patch on top of this version.  (The change log comment on top of the file
would also need updating.)

--- /home/thomas/tmp/source/sablecc/java5grammar/master/grammar/java5.sablecc	2011-03-23 17:59:45.000000000 +0100
+++ src/ikr/compl/jmm/parser.sablecc	2011-03-23 22:12:36.000000000 +0100
@@ -1491,7 +1491,7 @@
        {super} question super reference_type {-> New wildcard.super(reference_type.type)};

 type_parameters {-> [elements]:type_parameter*} =
-	lt type_parameter_list_gt {-> type_parameter_list_gt.elements};
+	lt type_parameter_list_gt {-> [type_parameter_list_gt.elements]};

 type_parameter_list_gt {-> [elements]:type_parameter*} = 
        {one} type_parameter_gt {-> [type_parameter_gt.type_parameter]} |

So far, so good -- big thanks, Stefan and Janus -- I didn't have to spend
much time on the grammar itself, but could instead concentrate on just
using it.

Today I found an issue with the default case in switch statements:

    switch_statement {-> statement} =
        switch condition switch_block {-> New statement.switch(condition.expression, [switch_block.statement])};

    switch_block {-> statement*} =
        l_brc switch_block_statement_group* switch_label* r_brc
            {-> [switch_block_statement_group.statement, New statement.switch_block([switch_label.expression], [])]} ;

    switch_block_statement_group {-> statement} =
        switch_label+ block_statement+ {-> New statement.switch_block([switch_label.expression], [block_statement.statement])};

    switch_label {-> expression?} =
        {expression} case constant_exp colon {-> constant_exp.expression} |
        {default} default colon {-> Null};

(Please see
<https://bitbucket.org/jdn/java5grammar/src/eb90e50d0139/grammar/java5.sablecc>
for the whole grammar.)

Briefly, this means that a switch label is either a (constant) expression
(which remains in the AST), or the default token (which is mapped to
Null).

Then, if you have:

    switch (x)
      {
        case 10:
        case 100:
          a = 1;
          break;

        default:
          a = 0;
          break;

        case 20:
        case 200:
          a = 2;
          break;
      }

..., you'd get a switch_block_statement_group with a switch_label list of
10, 100, an empty switch_label list (default case), and a switch_label
list of 20, 200.  This is still detectable unambiguously.

But, in the following case:

    switch (x)
      {
        case 10:
        case 100:
          a = 1;
          break;

        case 30:
        default:
        case 300:
          a = 0;
          break;

        case 20:
        case 200:
          a = 2;
          break;
      }

..., the 30 and 300 cases will build a switch_label list, but the default
case is lost: because it is mapped to Null, it is simply not added to
this switch_label list, so it can no longer be detected, unlike the
empty-list case in the previous example.
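
The difference between the two examples can be reproduced outside SableCC. The sketch below is a plain-Java model, not generated code: `toAstLabels` is a hypothetical stand-in for the `[switch_label.expression]` list construction, assuming (as described above) that labels mapped to Null are simply left out. One possible way around the problem, modelled by `toExplicitLabels`, would be to give the default label its own AST alternative instead of mapping it to Null; that is a suggestion, not something the grammar currently does.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java model of the label lists; null stands for the "default" label.
class SwitchLabelModel {
    // Mimics [switch_label.expression]: labels mapped to Null are dropped.
    static List<Integer> toAstLabels(List<Integer> labels) {
        List<Integer> result = new ArrayList<>();
        for (Integer label : labels) {
            if (label != null) {
                result.add(label);
            }
        }
        return result;
    }

    // Alternative: keep an explicit marker so "default" survives in any position.
    static List<String> toExplicitLabels(List<Integer> labels) {
        List<String> result = new ArrayList<>();
        for (Integer label : labels) {
            result.add(label == null ? "default" : "case " + label);
        }
        return result;
    }
}
```

With the first method, a lone default yields an empty list (still recognisable), while `case 30: default: case 300:` yields the same list as `case 30: case 300:`. With the second, the default entry is preserved in both situations.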

Before I begin researching all the theory about CST -> AST
transformations -- any quick suggestion?

Regards,
 Thomas
Stéphane HENRY | 11 Oct 04:20

Precedence keyword

Hi,

Is it possible to use the "Precedence" keyword several times?

I get this message:
Syntax error on unexpected PrecedenceKeyword token "Precedence":
 expecting: EOF.

I use SableCC version 4-beta.4.

Thanks

Stéphane
César | 17 Sep 23:58

Extending Depth First Adapter

Hello.

I'm coding the AST -> IR (tree) phase of my compiler and I'm a little
lost on how to represent the tree and how to extend the adapter.

Do any of you have a code snippet that I could use as a base for my own?

My first idea was to use the return values of the "caseXXX" methods,
but these methods must be void, so I thought of using global variables
instead, but I think that would be a little messy. Any ideas?
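
A common way around void "caseXXX" methods is to store each node's result in a map keyed by the node and to read it back in the parent. The generated `AnalysisAdapter` should already offer `getIn`/`setIn`/`getOut`/`setOut` for this purpose, but please check that against your own generated code. The sketch below shows the idea with hypothetical stand-in classes (`Expr`, `Num`, `Add`, `IrBuilder`) rather than real generated ones, and uses a string as a toy IR.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-ins for generated AST classes.
abstract class Expr {
    abstract void apply(IrBuilder v);
}

class Num extends Expr {
    final int value;
    Num(int value) { this.value = value; }
    void apply(IrBuilder v) { v.caseNum(this); }
}

class Add extends Expr {
    final Expr left, right;
    Add(Expr left, Expr right) { this.left = left; this.right = right; }
    void apply(IrBuilder v) { v.caseAdd(this); }
}

// Visitor with void methods; results travel through the 'out' map instead of return values.
class IrBuilder {
    private final Map<Expr, String> out = new HashMap<>();

    String getOut(Expr node) { return out.get(node); }

    void caseNum(Num node) {
        out.put(node, "CONST(" + node.value + ")");
    }

    void caseAdd(Add node) {
        node.left.apply(this);
        node.right.apply(this);
        // Both children have been visited, so their results are in the map.
        out.put(node, "ADD(" + out.get(node.left) + ", " + out.get(node.right) + ")");
    }
}
```

The map lives inside the visitor instance, so nothing is global, and several walkers can run over the same tree without interfering with each other.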

thanks.
César.
Phuc Luoi | 17 Sep 10:49

A Wrapper for PushbackReader

In some situations, one cannot give the Lexer an instance of PushbackReader.
For example, when writing the SableCC plugin for Netbeans, I can only get an
instance of LexerInput, a predefined class in the Netbeans framework, which
contains a (long) String to be parsed.

In these situations, it would be convenient to have a wrapper class or
interface that provides the methods the generated Lexer needs, for example
"read()" and "unread()". An end user could then implement this class or
interface to fit his situation. A pre-implemented class could simply be a
wrapper around the class PushbackReader.

JavaCC uses a similar approach. It creates the class
"SimpleCharStream", and the generated lexer uses only this class.
An end user can easily reimplement this class to fit his situation.
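
The proposal could look roughly like the sketch below. `CharSource`, `ReaderCharSource` and `StringCharSource` are hypothetical names; the generated Lexer shown elsewhere in this thread takes a PushbackReader, so this only illustrates the shape of the abstraction. As a workaround that needs no change to SableCC, a String can already be handed to the Lexer with `new PushbackReader(new StringReader(text))`.

```java
import java.io.IOException;
import java.io.PushbackReader;

// Hypothetical minimal interface covering what a generated lexer would need.
interface CharSource {
    int read() throws IOException;          // next character, or -1 at end of input
    void unread(int c) throws IOException;  // push one character back
}

// Default implementation: a thin wrapper around PushbackReader.
class ReaderCharSource implements CharSource {
    private final PushbackReader in;
    ReaderCharSource(PushbackReader in) { this.in = in; }
    public int read() throws IOException { return in.read(); }
    public void unread(int c) throws IOException { in.unread(c); }
}

// Implementation backed directly by a String, e.g. text taken from an editor buffer.
class StringCharSource implements CharSource {
    private final String text;
    private int pos = 0;
    StringCharSource(String text) { this.text = text; }

    public int read() {
        return pos < text.length() ? text.charAt(pos++) : -1;
    }

    public void unread(int c) {
        // In this sketch only the character that was just read can be pushed back.
        if (pos == 0 || text.charAt(pos - 1) != c) {
            throw new IllegalStateException("cannot unread " + c);
        }
        pos--;
    }
}
```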

What do you think about this proposal?

Hong Phuc Bui

Translation

Hi sablecc users,

I was wondering if some of you have been involved in projects where you
want to translate the language of your grammar into another, e.g.
translating infix mathematical expressions into reverse Polish notation.
In such cases, I think that token-by-token or production-by-production
transduction is not an option.

What do you think is a good approach for this? Can abstract syntax
trees help [1]?
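
For the infix-to-RPN example, an AST does help: once the parser has resolved precedence into the shape of the tree, RPN is just a post-order walk (operands first, operator last). The sketch below uses hypothetical classes (`Ast`, `NumLit`, `BinOp`, `Rpn`) standing in for what a generated AST and a tree-walker would provide; it is an illustration of the idea, not SableCC code.

```java
// Hypothetical AST for arithmetic expressions.
abstract class Ast {
    abstract void toRpn(StringBuilder out);
}

class NumLit extends Ast {
    final String text;
    NumLit(String text) { this.text = text; }
    void toRpn(StringBuilder out) { out.append(text).append(' '); }
}

class BinOp extends Ast {
    final char op;
    final Ast left, right;
    BinOp(char op, Ast left, Ast right) {
        this.op = op;
        this.left = left;
        this.right = right;
    }
    void toRpn(StringBuilder out) {
        left.toRpn(out);              // post-order: left operand,
        right.toRpn(out);             // then right operand,
        out.append(op).append(' ');   // then the operator
    }
}

class Rpn {
    static String translate(Ast tree) {
        StringBuilder out = new StringBuilder();
        tree.toRpn(out);
        return out.toString().trim();
    }
}
```

For instance, the tree for `(1 + 2) * 3` is `new BinOp('*', new BinOp('+', new NumLit("1"), new NumLit("2")), new NumLit("3"))`, and `Rpn.translate` turns it into `1 2 + 3 *`.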

I'm actually trying to translate treebank [2] sentences into a flat
structure, and the only way I see now is to manually inspect concrete tree
nodes and hard-code rules for each.

Any ideas?

Cheers,

Sebastian

[1] http://nat.truemesh.com/archives/000531.html
[2] http://en.wikipedia.org/wiki/Treebank
Emanuele Ianni | 14 Sep 10:59

Error checking

In my grammar every newline is basically an instruction. Right now, when I pass a script to the parser, it stops after the first error. Is there a way to tell it to go through the whole script and print out all the errors? At the moment I'm doing it with a loop, moving forward and reparsing the rest of the script without the part already analyzed.
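
Since every line is an independent instruction, one variation on the loop described above is to parse each line on its own and collect the failures. The sketch below is only an illustration: `LineParser` is a hypothetical stand-in for code that would build a generated Lexer and Parser for a single line and call `parse()`, and `ErrorCollector` is likewise an invented name.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for "build a Lexer/Parser for this line and call parse()".
interface LineParser {
    void parse(String line) throws Exception;
}

class ErrorCollector {
    // Parses every line separately and returns one message per failing line.
    static List<String> collectErrors(String script, LineParser parser) {
        List<String> errors = new ArrayList<>();
        String[] lines = script.split("\n");
        for (int i = 0; i < lines.length; i++) {
            if (lines[i].trim().isEmpty()) {
                continue; // skip blank lines
            }
            try {
                parser.parse(lines[i]);
            } catch (Exception e) {
                // Report with the 1-based line number of the original script.
                errors.add("line " + (i + 1) + ": " + e.getMessage());
            }
        }
        return errors;
    }
}
```

Because each line is parsed in isolation, any position reported inside the exception message would be relative to that line, which is why the sketch prefixes the script's own line number.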

--
Best regards
Emanuele Ianni

César | 1 Sep 02:38

Ignored Tokens

Hello.

I'm sorry if this isn't the right place to post this question, but I
didn't know where else to ask.

It seems to me that "Ignored Tokens" isn't working, since the tokens are
returned with a trailing space, some space tokens are returned, and
even new lines.

I'm using the following grammar:

------------ grammar -----------
Tokens
    blank = (' ' | 9 | 13 | 10)*;
    public = 'public';
    class = 'class';

Ignored Tokens
    blank;
------------ grammar -----------

and I'm using the following input to test:

------------- input test -----------
public class Factorial{
    public static void main(String[] a){
	    System.out.println(new Fac().ComputeFac(10));
    }
}
------------- input test -----------

I use this java snippet to test the lexer:

---------- main.java --------
Lexer l = new Lexer(new PushbackReader(new InputStreamReader(System.in), 1024));

Token t = l.next();
while (!t.getText().equals("")) {
    System.out.println("|" + t.toString() + "|");
    t = l.next();
}
---------- main.java --------

output from main:

--------------------- output ------------
|public |
|  |
|class |
|  |
[1,14] Unknown token: F

--------------------- output ------------

I discovered the problem when the parser warned that it was expecting
"public" but the lexer returned "public ".

I'll be very grateful for any help.

thanks in advance.
César.
Paul Cockshott | 25 Aug 10:35

Hi, a proposal and a bug report

Hi, I have been using SableCC for some 10 years or so.
I teach compilers at the University of Glasgow and have used SableCC as the basis for my compiler course, with the students preparing a compiler for a simple functional language, Hi, with arrays as first-class citizens, using SableCC.
Since that particular language is no longer being taught (the course having been terminated), it would be possible to upload the grammar and other parts of the compiler to your site without giving answers to the students.

In addition I have used Sable for 4 languages that I have developed as part of my research:

Lino -- a language for parallel processing that I am prototyping on the new Intel SCC chip
MA -- a prototype Matlab to C compiler
ILCG -- a code generator generator that automatically generates java code generators from machine descriptions

In the last few months I have been working on integrating ILCG and SableCC to produce a prototype, which I call ILCG2, that incorporates SableCC 3.2 and ILCG.
This allows you to write a short script file, for example tinyb.ilc.
This specifies a grammar file in Sable, say tinyb.sab, a machine code description file, say Pentium.ilc, and the name of a compiler, which when processed by ILCG2 will generate the following:
1. All the files normally generated by SableCC
2. The Java code for a Pentium code generator
3. The Java code for the main class of the compiler, which invokes the Sable parser, calls a class named Translator, and then invokes the code generator.

The user has to write a class Translator.java which translates the Sable language-dependent abstract syntax tree into the machine-independent abstract syntax tree. The quality of the auto-generated code is competitive with other leading tools.

Using ILCG, my students and I have developed Vector Pascal, a very high-level auto-parallelising Pascal compiler with many of the parallelism features of Fortran 90.

What I propose is that ILCG2, which incorporates SableCC, be made generally available via your SableCC project, since the combination of ILCG and Sable allows the automation of the two hardest parts of compiler construction: code generation and parsing.

The Bug.
In developing the Lino compiler I found that there is a bug in the clone methods generated for list classes.
If you attempt to copy and replace an element from a list, for example in macro expansion, using the standard methods that Sable makes available to rewrite trees, then the original copy is deleted from the list before it is placed elsewhere in the tree.

The University of Glasgow, charity number SC004401
