Re: v4 code gen
hi!
On Apr 23, 2010, at 2:59 AM, Terence Parr wrote:
> hi, started thinking about it...
>
> http://www.antlr.org/wiki/display/~admin/v4+code+generation
while i totally agree with the goal to reduce the number of templates,
nested conditionals get messy, too.
recently i wrote a relatively simple code generator for a custom google
protobuf implementation. here's one of the simpler templates:
buildProtobuf(field, names) ::= <<
<if(field.transient)><! this field is computed from some other field(s), skip it in the merge code !>
// <field;format="variableName"> is transient, no sense in serializing it.
<else>
<if(field.repeated)>
// TODO serialize repeated fields properly, depends on the field type: native types are fine (even if boxed), message types are not fine at all.
<else>
<if(field.messageType)>
<! message type, we should not follow asset links !>
if (has<field;format="methodName">) {
<names.outerClassName>.<field; format="shortTypeName">.Builder <field;format="variableName">Builder = <names.outerClassName>.<field; format="shortTypeName">.newBuilder();
<field;format="variableName">Builder.setId(get<field;format="methodName">().getId());
b.set<field;format="methodName">(<field;format="variableName">Builder);
}
<else>
<! plain native type just call protobuf builder !>
if (has<field;format="methodName">) {
b.set<field;format="methodName">(get<field;format="methodName">());
}
<endif>
<endif>
<endif>
>>
as you can see, there are only three conditions, but even those make
it icky to follow already. other parts of the template group are even
worse, and antlr seems to have even more branches in its codegen.
now, having multiple templates is not ideal either.
part of the problem stems from not being able to tell which template
applies in which circumstance.
in the past i've tried to model templates after classes or methods,
having one template for each variant of the template output, much like
antlr does it today (although the division in the code isn't as clear
in antlr3 today).
what i've noticed in that approach is that there are often large
chunks of text that are common between multiple templates. the next
step was to factor those common parts out, but unfortunately that
usually made it almost incomprehensible, too, not to mention that it
very closely ties the template structure to the code structure. but
that's only going to be a problem if one can anticipate major
refactorings in the code. we can probably ignore that because antlr's
code generator is likely to be pretty stable over time.
i've also noticed that a proliferation of setAttribute() calls makes it
much, much harder to follow what's going on. totally agree with passing
in sensible objects.
most of the time i'm passing in the model objects directly. once in a
while i'm wrapping several model objects in "view controller" objects,
just to be able to access related data in the templates without
requiring me to change my model.
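a rough sketch of what such a "view controller" wrapper could look like in java. every name here (FieldModel, NamesModel, FieldView and their getters) is made up for illustration and not taken from the actual generator; the point is only that the wrapper bundles related model objects and exposes derived values as getters, so the model itself stays untouched:

```java
// hypothetical model classes; stand-ins for whatever the real model exposes
final class FieldModel {
    private final String name;
    private final String shortTypeName;

    FieldModel(String name, String shortTypeName) {
        this.name = name;
        this.shortTypeName = shortTypeName;
    }

    String getName() { return name; }
    String getShortTypeName() { return shortTypeName; }
}

final class NamesModel {
    private final String outerClassName;

    NamesModel(String outerClassName) { this.outerClassName = outerClassName; }

    String getOuterClassName() { return outerClassName; }
}

// hypothetical "view controller": wraps two model objects and adds
// derived, template-friendly properties without changing either model
final class FieldView {
    private final FieldModel field;
    private final NamesModel names;

    FieldView(FieldModel field, NamesModel names) {
        this.field = field;
        this.names = names;
    }

    // fully qualified builder type, built from both wrapped objects
    public String getBuilderType() {
        return names.getOuterClassName() + "." + field.getShortTypeName() + ".Builder";
    }

    // field name with the first letter upper-cased, for get/set/has names
    public String getMethodName() {
        String n = field.getName();
        return n.isEmpty() ? n : Character.toUpperCase(n.charAt(0)) + n.substring(1);
    }
}
```

a template handed such an object could then refer to something like <view.builderType> instead of stitching the pieces together itself.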
perhaps a sensible approach would be to have a hierarchy of "token ref
representation" classes, which get instantiated depending on the
context of that token reference (that would be some decision in a tree
walker, i guess).
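as a sketch of that idea, assuming the tree walker knows which kind of grammar it is in: a small factory picks the representation class from the context. the class names, the RefContext enum and the template names returned below are all invented for this example and are not antlr's actual ones:

```java
// hypothetical contexts a token reference can occur in
enum RefContext { PARSER, LEXER, TREE_PARSER }

// hypothetical base class for "token ref representation" objects
abstract class TokenRefRep {
    protected final String token;

    TokenRefRep(String token) { this.token = token; }

    String getToken() { return token; }

    // name of the template that renders this kind of reference (invented names)
    abstract String templateName();
}

final class ParserTokenRefRep extends TokenRefRep {
    ParserTokenRefRep(String token) { super(token); }
    @Override String templateName() { return "parserTokenRef"; }
}

final class LexerTokenRefRep extends TokenRefRep {
    LexerTokenRefRep(String token) { super(token); }
    @Override String templateName() { return "lexerTokenRef"; }
}

final class TreeTokenRefRep extends TokenRefRep {
    TreeTokenRefRep(String token) { super(token); }
    @Override String templateName() { return "treeTokenRef"; }
}

final class TokenRefReps {
    private TokenRefReps() {}

    // the decision the tree walker would make: context in, representation out
    static TokenRefRep forContext(RefContext ctx, String token) {
        switch (ctx) {
            case PARSER:      return new ParserTokenRefRep(token);
            case LEXER:       return new LexerTokenRefRep(token);
            case TREE_PARSER: return new TreeTokenRefRep(token);
            default: throw new IllegalArgumentException("unknown context: " + ctx);
        }
    }
}
```

the sketch also makes the drawback visible: each subclass maps to exactly one template name.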
otoh, that would be a 1-to-1 relationship with the number of templates
again :(
but effectively most of the tokenref templates already are factored
out a lot, referring to one another and other common elements like
listLabel.
i think that just by introducing some representation classes the
templates would become much simpler. for example, take the various
matching templates like lexerStringRef, wildcard et al:
they all have an <if(label)>...<endif> clause. by passing in a label
representation object, instead of the label string, that could
collapse down to <label>, pushing some of the logic back into the code
generator.
it's entirely feasible to have a "null label" that expands to the
empty string, if there is no label.
i guess that once you start looking for ifs many of them are actually
of this kind.
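a minimal sketch of the "null label" idea in java, assuming the template body shrinks to something like <label>match(<token>); and that the label object renders itself through toString(). LabelRep, NamedLabel, NullLabel and MatchEmitter are hypothetical names, and the emitted text is a placeholder rather than what antlr really generates:

```java
// hypothetical label representation; the generator always passes one in
interface LabelRep {
    // text placed in front of the match call; may be empty
    String prefix();
}

// a real label: renders as an assignment prefix
final class NamedLabel implements LabelRep {
    private final String name;

    NamedLabel(String name) { this.name = name; }

    @Override public String prefix() { return name + " = "; }
    @Override public String toString() { return prefix(); }
}

// the "null label": renders as the empty string, so no <if(label)> is needed
final class NullLabel implements LabelRep {
    static final NullLabel INSTANCE = new NullLabel();

    private NullLabel() {}

    @Override public String prefix() { return ""; }
    @Override public String toString() { return ""; }
}

final class MatchEmitter {
    private MatchEmitter() {}

    // stands in for a template body like: <label>match(<token>);
    static String emitMatch(LabelRep label, String token) {
        return label + "match(" + token + ");";
    }
}
```

the decision whether there is a label happens once, in the generator, when it picks NamedLabel or NullLabel.INSTANCE; the rendering side has no branch left.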
another example: wildcardChar vs wildcardCharListLabel. the latter is
a superset of the former, but now you have two templates where one
would do, if you always assumed there is a listlabel, even if that
might be the null listlabel.
probably i've been chasing template invocation chains for too long ;)
btw, the example i pasted above used template invocations for the
various cases before, rendering it completely unreadable over time.
that kind of invalidates my point, because i've gone back to the ifs,
but i guess that just illustrates that it is a thin line.
cheers,
-k
--
Kay Röpke