Sam Harwell | 8 Apr 2010 18:52

New partial methods in the C# 3 generated code

I’ve made a few changes locally in the generated code for the CSharp3 target. Partial methods are new in C# 3 and were added to the language so that generated code can offer extensibility points. If no implementation is provided in another file, the compiler *completely* removes the partial methods and all calls to them in the generated code.

1. The partial methods “void EnterRule(string ruleName, int ruleIndex)” and “void LeaveRule(string ruleName, int ruleIndex)” are generated, and calls to them are placed at the beginning and end of each rule, respectively.

2. For each rule, the partial methods “void Enter[RuleName]()” and “void Leave[RuleName]()” are generated. They are called *outside* of the calls to EnterRule and LeaveRule (Enter[RuleName] before EnterRule, Leave[RuleName] after LeaveRule).

3. The -trace flag no longer affects the generated code. Calls to BaseRecognizer.TraceIn and BaseRecognizer.TraceOut are always included in the generated code, and the methods are now marked [Conditional("ANTLR_TRACE")]. The calls are *inside* the calls to EnterRule and LeaveRule.

4. The partial method “void OnCreated()” is called at the end of the constructor.

5. The partial method “void CreateTreeAdaptor(ref ITreeAdaptor adaptor)” is generated for tree parsers. Implement this method if you want to use a tree adaptor other than CommonTreeAdaptor; there is no more need to place conditional initialization inside your rules.

I’m also open to suggestions for other partial methods to add, which is why I included this on the antlr-interest list.

 

Thanks,

Sam

Terence Parr | 23 Apr 2010 02:59

v4 code gen

hi, started thinking about it...

http://www.antlr.org/wiki/display/~admin/v4+code+generation

Ter
Kay Röpke | 23 Apr 2010 13:11

Re: v4 code gen

hi!

On Apr 23, 2010, at 2:59 AM, Terence Parr wrote:

> hi, started thinking about it...
>
> http://www.antlr.org/wiki/display/~admin/v4+code+generation

while i totally agree with the goal to reduce the number of templates,
nested conditionals get messy, too.
recently i wrote a relatively simple code generator for a custom google
protobuf implementation; here's one of the simpler templates:

buildProtobuf(field, names) ::= <<
<if(field.transient)><! this field is computed from some other field(s), skip it in the merge code !>
// <field;format="variableName"> is transient, no sense in serializing it.
<else>
<if(field.repeated)>
// TODO serialize repeated fields properly, depends on the field type: native types are fine (even if boxed), message types are not fine at all.
<else>
<if(field.messageType)>
<! message type, we should not follow asset links !>
if (has<field;format="methodName">) {
     <names.outerClassName>.<field; format="shortTypeName">.Builder <field;format="variableName">Builder = <names.outerClassName>.<field; format="shortTypeName">.newBuilder();
     <field;format="variableName">Builder.setId(get<field;format="methodName">().getId());
     b.set<field;format="methodName">(<field;format="variableName">Builder);
}
<else>
<! plain native type just call protobuf builder !>
if (has<field;format="methodName">) {
     b.set<field;format="methodName">(get<field;format="methodName">());
}
<endif>
<endif>
<endif>
>>

as you can see, there are only three conditions, but even those make  
it icky to follow already. other parts of the template group are even  
worse, and antlr seems to have even more branches in its codegen.
now, having the multiple templates is not ideal either.
part of the problem stems from not being able to tell which template  
applies in which circumstance.
in the past i've tried to model templates after classes or methods,  
having one template for each variant of the template output, much like  
antlr does it today (although the division in the code isn't as clear  
in antlr3 today).
what i've noticed in that approach is that there are often large  
chunks of text that are common between multiple templates. the next  
step was to factor those common parts out, but unfortunately that  
usually made it almost incomprehensible, too,
not to mention that it very closely ties the template structure to the  
code structure. but that's only going to be a problem if one can  
anticipate major refactorings in the code. we can probably ignore that  
because antlr's code generator is likely
to be pretty stable over time.

i've also noticed that proliferation of setAttribute() calls makes it  
much much harder to follow what's going on, totally agree with passing  
in sensible objects.
most of the time i'm passing in the model objects directly, once in a  
while i'm wrapping several model objects in "view controller" objects,  
just to be able to access related data in the templates without  
requiring me to change my model.

perhaps a sensible approach would be to have a hierarchy of "token ref  
representation" classes, which get instantiated depending on the  
context of that token reference (that would be some decision in a tree  
walker, i guess).
otoh, that would be a 1-to-1 relationship with the number of templates  
again :(
but effectively most of the tokenref templates already are factored  
out a lot, referring to one another and other common elements like  
listLabel.

i think that just by introducing some representation classes the  
templates would become much simpler, for example take the various  
matching templates like lexerStringRef, wildcard et al:
they all have an <if(label)>...<endif> clause. by passing in a label  
representation object, instead of the label string, that could  
collapse down to <label>, pushing some of the logic back into the code  
generator.
it's entirely feasible to have a "null label" that expands to the  
empty string, if there is no label.
i guess that once you start looking for ifs many of them are actually  
of this kind.
another example: wildcardChar vs wildcardCharListLabel. the latter is  
a superset of the former, but now you have two templates instead of  
one, instead of always assuming there is a listlabel, even if that  
might be the null listlabel.
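
A minimal Java sketch of the "null label" idea above. Everything here is an assumption made for illustration: the names Label, NamedLabel, NullLabel and matchToken are hypothetical and are not part of ANTLR or StringTemplate, and the emitted "match(...)" text is only a stand-in for target code. It shows how a template that always expands the label object no longer needs an <if(label)> test.

```java
// Hypothetical sketch: none of these types exist in ANTLR or StringTemplate.
final class LabelDemo {

    /** What the code generator would pass to a template instead of a label string. */
    interface Label {
        /** Text that a bare <label> reference would expand to. */
        String text();
    }

    /** A real label, e.g. the "t" in t=ID; expands to an assignment prefix. */
    static final class NamedLabel implements Label {
        private final String name;

        NamedLabel(String name) { this.name = name; }

        @Override public String text() { return name + " = "; }

        // StringTemplate renders attributes via toString() by default.
        @Override public String toString() { return text(); }
    }

    /** The "null label": always present in the model, expands to the empty string. */
    static final class NullLabel implements Label {
        static final NullLabel INSTANCE = new NullLabel();

        private NullLabel() {}

        @Override public String text() { return ""; }

        @Override public String toString() { return text(); }
    }

    /** Stand-in for one matching template; note the absence of any label test. */
    static String matchToken(Label label, String tokenType) {
        return label.text() + "match(" + tokenType + ");";
    }

    public static void main(String[] args) {
        System.out.println(matchToken(new NamedLabel("t"), "ID")); // t = match(ID);
        System.out.println(matchToken(NullLabel.INSTANCE, "ID"));  // match(ID);
    }
}
```

The decision about whether a label exists moves into the code generator, which picks NamedLabel or NullLabel once; the same trick would apply to a null list label.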

probably i've been chasing template invocation chains for too long ;)
btw, the example i pasted above used template invocations for the  
various cases before, rendering it completely unreadable over time.  
that kind of invalidates my point, because i've gone back to the ifs,  
but i guess that just illustrates that it is a thin line.

cheers,
-k
-- 
Kay Röpke

Jim Idle | 23 Apr 2010 18:33

Re: v4 code gen

Yesterday we were debating what Boolean && and || do to the complexity of templates. My feeling is still that
they reduce template complexity without compromising the model/view separation. I often find
myself with nested ifs that would be consolidated if I had those operators.

Similarly, I wonder whether this could be extended with a few other constructs that are likewise oriented
towards choice/configuration. Can we have a limited form of switch, or do we fall off the end of the map, so that
the model should pick out a template beforehand? The latter, I guess, but does the ensuing complexity of
many templates end up defeating the purpose of MV separation?
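
For comparison, here is a rough Java sketch of the "model picks out a template beforehand" option, using the three conditions from Kay's buildProtobuf template. The class name, the Field stand-in and the four template names are all assumptions invented for this illustration; they do not come from ANTLR, StringTemplate or Kay's code.

```java
// Hypothetical sketch: the class, field, and template names are illustrative only.
final class TemplateChooser {

    /** Minimal stand-in for the field model behind the buildProtobuf template. */
    static final class Field {
        final boolean isTransient; // "transient" is a Java keyword, hence the prefix
        final boolean repeated;
        final boolean messageType;

        Field(boolean isTransient, boolean repeated, boolean messageType) {
            this.isTransient = isTransient;
            this.repeated = repeated;
            this.messageType = messageType;
        }
    }

    /** Same decision order as the nested <if> chain, but made in the model. */
    static String templateFor(Field f) {
        if (f.isTransient) return "transientField";
        if (f.repeated)    return "repeatedField";
        if (f.messageType) return "messageField";
        return "nativeField";
    }

    public static void main(String[] args) {
        System.out.println(templateFor(new Field(false, false, true)));  // messageField
        System.out.println(templateFor(new Field(false, false, false))); // nativeField
    }
}
```

Each of the four templates would then be free of conditionals, at the price of having four templates, which is exactly the trade-off in question.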

Jim

Terence Parr | 24 Apr 2010 18:45

Re: v4 code gen

hi guys. I'm mid thought on this, but I'm thinking of trying out  
something akin to an LLVM IR approach. i think of it as doing for  
source code gen what TWIG/BURG did for assembly generation.

Instead of sharing common code chunks between matchToken,  
matchTokenRoot, matchTokenLeaf, etc... with another template and  
instead of lots of template variants, let's try to identify all the  
common chunks among all templates and identify some common operations.  
Then, we make those operations instructions in an IR.  For example,  
token ref T would become

match T

then t=T would become

t = LT(1)
match T

T^ would be

label42 = LT(1)
match T
root label42

rule ref r would be

r()

r[3,"hi"]

becomes

t1=3
t2="hi"
call r, t1..t2

x=r[34]

becomes

t1=34
rv = call r, t1
x.a = rv.a      ; assume r returns values a and b not single value
x.b = rv.b

etc...

The target developer can combine the operations into x=r(34) or push  
args onto software stack (to avoid pred hoisting issues), etc...

I'm thinking of a typed IR like llvm where we have token, tree,  
string, int etc...

Once we have that, it divorces the grammar-to-code part, though not
the surrounding class and setup stuff.

One could even imagine some symbol table manipulation instructions.

We could interpret this or translate it to source code with 1-to-1
templates for these canonical operations. We could even go straight to
LLVM IR from this ANTLR IR for some serious cranking. heh, that's an
interesting idea.
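
A small Java sketch of how the token-reference examples above might be lowered to such an IR and then mapped 1-to-1 to target text. It is only an illustration under stated assumptions: the Op enum, the Instr class, lowerTokenRef, emit and the emitted target strings are all hypothetical names, and only three of the operations from the examples (label = LT(1), match, root) are modeled.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: the instruction set and all names are illustrative only.
final class IrSketch {

    enum Op { ASSIGN_LT, MATCH, ROOT }

    /** One IR instruction with a single operand (a label or a token type). */
    static final class Instr {
        final Op op;
        final String arg;

        Instr(Op op, String arg) { this.op = op; this.arg = arg; }

        /** Textual IR in the notation used in the examples above. */
        @Override public String toString() {
            switch (op) {
                case ASSIGN_LT: return arg + " = LT(1)";
                case MATCH:     return "match " + arg;
                default:        return "root " + arg;
            }
        }
    }

    /** Lowers T, t=T and T^ to IR; label may be null when the grammar gives none. */
    static List<Instr> lowerTokenRef(String token, String label, boolean isRoot) {
        List<Instr> ir = new ArrayList<>();
        // T^ needs a name to hand to "root", so invent one if necessary.
        String name = (label == null && isRoot) ? "label42" : label;
        if (name != null) ir.add(new Instr(Op.ASSIGN_LT, name));
        ir.add(new Instr(Op.MATCH, token));
        if (isRoot) ir.add(new Instr(Op.ROOT, name));
        return ir;
    }

    /** A 1-to-1 mapping for one made-up target: one output form per op. */
    static String emit(Instr i) {
        switch (i.op) {
            case ASSIGN_LT: return "Token " + i.arg + " = LT(1);";
            case MATCH:     return "match(" + i.arg + ");";
            default:        return "makeRoot(" + i.arg + ");";
        }
    }

    /** Renders a list of instructions as IR text, one string per instruction. */
    static List<String> text(List<Instr> ir) {
        List<String> out = new ArrayList<>();
        for (Instr i : ir) out.add(i.toString());
        return out;
    }

    public static void main(String[] args) {
        // T^ with no explicit label: three IR instructions, each with one target form.
        for (Instr i : lowerTokenRef("T", null, true)) {
            System.out.println(i + "   ->   " + emit(i));
        }
    }
}
```

In this sketch the variants (plain, labeled, root) differ only in which instructions are emitted, so a target would need one output form per instruction rather than one template per variant.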

The good thing about this is that it'd be a well defined interface  
(finally!) for target developers.  We could ALMOST just ask developers  
to identify what assignment, call, hashtable lookup, WHILE, IF looks  
like in their language to get a basic target built pronto.  Beyond  
that we'd need them to identify patterns in the IR to make it higher  
level.  It'll be a balance between high level enough to make it easy  
to map to high level code but low level enough to make it easy to  
share common elements.

[added to wiki]

Ter
