Chris Butcher (BUNGIE) | 1 Sep 2006 01:25

Re: [Algorithms] Balancing number of draw calls vs ease of lighting/shadowing - is big, solid geometry still the ideal?

Another alternative is that you can build your index buffers dynamically and cache them. You could then
have separate index buffers for "NE wall chunks affected by light 1" and "Entire house".

We did something similar to this in Halo 2, where large dynamic objects have a spatial subdivision scheme
that allows us to render only part of an object for lighting/shadowing. Because this was on Xbox 1, we had to
copy the indices into the push buffer every frame anyway, so it wasn't a massive overhead to build a custom set
of indices depending on which pieces of the object were rendered.

For next-gen consoles, building custom command buffers like this is something that we could well see more
of, particularly since it's an easily parallelisable task, unlike the "main" render thread, which is
inherently serial.
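
A minimal sketch of the "build index buffers dynamically and cache them" idea, in Python for brevity. It is not the Halo 2 implementation: the ChunkedMesh class, the chunk names and the cache key are all hypothetical, and a real engine would write into a GPU index buffer or command buffer rather than a Python list. It assumes one shared vertex buffer with indices stored per spatial chunk as triangle lists.

```python
from typing import Dict, FrozenSet, Iterable, List, Sequence


class ChunkedMesh:
    """One shared vertex buffer; triangle-list indices are stored per spatial chunk."""

    def __init__(self, chunk_indices: Dict[str, Sequence[int]]) -> None:
        self.chunk_indices = {name: list(ix) for name, ix in chunk_indices.items()}
        self._cache: Dict[FrozenSet[str], List[int]] = {}
        self.builds = 0  # number of index buffers actually built (cache misses)

    def index_buffer_for(self, chunks: Iterable[str]) -> List[int]:
        """Return a cached index buffer covering exactly the requested chunks."""
        key = frozenset(chunks)
        cached = self._cache.get(key)
        if cached is None:
            # Concatenate in sorted order so the key alone determines the result.
            cached = [i for name in sorted(key) for i in self.chunk_indices[name]]
            self._cache[key] = cached
            self.builds += 1
        return cached


house = ChunkedMesh({
    "NE": [0, 1, 2, 2, 1, 3],
    "NW": [4, 5, 6, 6, 5, 7],
    "S": [8, 9, 10],
})
light1 = house.index_buffer_for(["NE"])              # "NE wall chunks affected by light 1"
entire = house.index_buffer_for(["NE", "NW", "S"])   # "Entire house"
again = house.index_buffer_for(["S", "NW", "NE"])    # same set, served from the cache
```

Keying the cache on the set of chunks means that any light touching the same chunks reuses the same buffer, regardless of the order in which the chunks were found.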

--
Chris Butcher
Technical Lead
Bungie Studios
butcher <at> bungie.com

-----Original Message-----
From: gdalgorithms-list-bounces <at> lists.sourceforge.net
[mailto:gdalgorithms-list-bounces <at> lists.sourceforge.net] On Behalf Of Gregory Seegert
Sent: Thursday, August 31, 2006 10:04
To: Game Development Algorithms
Subject: Re: [Algorithms] Balancing number of draw calls vs ease of lighting/shadowing - is big, solid
geometry still the ideal?

An approach I've used in the past is to keep the one large vertex
buffer, but arrange the indices in the index buffer spatially such that
subsets of the geometry can be rendered simply by specifying different
indices.  This way you can still render the entire mesh in one call, or

Megan Fox | 1 Sep 2006 01:47

Re: [Algorithms] Balancing number of draw calls vs ease of lighting/shadowing - is big, solid geometry still the ideal?

To be clear, my current systems do this more or less (the tools just
aren't smart enough to generate both a chunk indexing and a merged
indexing from the same assets, yet), but I was specifically concerned
that pre-splitting was a bad idea / going to bite me later in a big
way.  For instance, I wondered if there was some oddity in the way
data was cached that made user-clip-planing a solid object into X
halves / X passes end up being significantly faster than pre-split
geometry.

... but if my current approach is good enough for Halo 2, then by gum,
I'll just leave it lie.  Thanks!

On 8/31/06, Chris Butcher (BUNGIE) <cbutcher <at> microsoft.com> wrote:
> Another alternative is that you can build your index buffers dynamically and cache them. You could then
> have separate index buffers for "NE wall chunks affected by light 1" and "Entire house".
>
> We did something similar to this in Halo 2, where large dynamic objects have a spatial subdivision scheme
> that allows us to render only part of an object for lighting/shadowing. Because this was on Xbox 1 we had to
> copy the indices into the push buffer every frame so it wasn't a massive overhead to build a custom set of
> indices depending on the pieces of the object that were rendered.
>
> For next-gen consoles, building custom command buffers like this is something that we could well see more
> of. Particularly since it's an easily parallelisable task, unlike the "main" render thread which is
> inherently serial.
>
> --
> Chris Butcher
> Technical Lead
> Bungie Studios
> butcher <at> bungie.com

Chris Butcher (BUNGIE) | 1 Sep 2006 03:04

Re: [Algorithms] Balancing number of draw calls vs ease of lighting/shadowing - is big, solid geometry still the ideal?

I'm confused. Your subject line implies that you're worried about the number of draw calls. If you're
"pre-splitting" into multiple meshes then yes, you will increase your draw calls, and this will cost you.
(It'll cost you less on a console platform than on the PC, though.)

However, if you don't actually split the object into multiple meshes, then it can be drawn in its entirety
with a single call.

To more directly answer your question about caching - GPU pipelines are very good at hiding vertex fetch
latency. (Texture fetch latency is harder because you can't look ahead like you can in the index stream.)
So having good locality of vertices within a vertex buffer isn't a huge concern. However, the
pre-transform vertex cache in Xbox 360 is 8KB, organized into 32-byte cache lines. I don't know how big
your vertices are but I bet they are a significant fraction of 32 bytes in size. So you don't have to worry
about fetching a lot of data from memory that you don't end up using.
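
As a rough worked example of that last point: the 8KB and 32-byte figures come from the message above, while everything else here (the function name, the vertex sizes, and the simplification that each distinct cache line is fetched exactly once) is an assumption made only to put numbers on the argument.

```python
CACHE_BYTES = 8 * 1024   # pre-transform vertex cache size quoted above
LINE_BYTES = 32          # cache line size quoted above
CACHE_LINES = CACHE_BYTES // LINE_BYTES


def fetch_efficiency(vertex_size, vertex_ids, line=LINE_BYTES):
    """Fraction of fetched bytes that belong to vertices actually referenced.

    Simplified model: vertices are packed tightly in one buffer and every
    distinct cache line that a referenced vertex overlaps is fetched once.
    """
    unique = set(vertex_ids)
    lines = set()
    for v in unique:
        first = (v * vertex_size) // line
        last = (v * vertex_size + vertex_size - 1) // line
        lines.update(range(first, last + 1))
    return (len(unique) * vertex_size) / (len(lines) * line)


full_line = fetch_efficiency(32, [0, 5, 9])     # vertex fills a line: nothing wasted
scattered = fetch_efficiency(24, [0, 10])       # 24-byte vertices, far apart
tiny = fetch_efficiency(8, [0, 10, 20])         # small vertices, far apart: mostly waste
```

Under this model the cache holds 256 lines, and scattered access only becomes wasteful when a vertex is much smaller than a line.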

--
Chris Butcher
Technical Lead
Bungie Studios
butcher <at> bungie.com

-----Original Message-----
From: gdalgorithms-list-bounces <at> lists.sourceforge.net
[mailto:gdalgorithms-list-bounces <at> lists.sourceforge.net] On Behalf Of Megan Fox
Sent: Thursday, August 31, 2006 16:48
To: Game Development Algorithms
Subject: Re: [Algorithms] Balancing number of draw calls vs ease of lighting/shadowing - is big, solid
geometry still the ideal?

To be clear, my current systems do this more or less (the tools just
aren't smart enough to generate both a chunk indexing and a merged

Megan Fox | 1 Sep 2006 03:50

Re: [Algorithms] Balancing number of draw calls vs ease of lighting/shadowing - is big, solid geometry still the ideal?

The mesh geometry for a given structure resides in a single vertex
buffer.  As I render out the different material zones for that
structure, I use ranges out of the index buffer that map to each zone,
where the vertices used by each range are contiguous.  Were I to chunk
the geometry into bits, it would be by splitting each zone into
sub-regions, each mapping to a chunk of geometry no larger than X size
agreeable to the limitations of my shadow buffered light frustums.  I
would also keep the original ranges around, and use those for full
mesh rendering in cases where the surface wasn't a shadow receiver /
could safely be quite large.

Without creating those sub-regions, the next best option I know is to
render the solid mesh, but split it with clip planes added when it is
determined that the mesh is too large to fit within a single lighting
frustum.  Which would be often, when throwing house and terrain-sized
shadow receivers around.  My confusion was whether this ended up being
cheaper, or if it was effectively the same as using the smaller chunks
(just with a lot of excess data being discarded by the planes).  That
was what I meant with caching: whether or not the identical render
with flip-flopped clip plane and different shadow buffer was in some
way more efficient than just switching out the index buffer for half A
and half B of the mesh.

The purpose of the original email was to find, first and foremost, if
the former was an inherently flawed approach that would land me in
trouble down the road.  Then, to find out if the latter was a better
option and worth the hassle, or if there was some other approach I've
never considered that's much better than either.

Your approach in Halo 2 sounded similar to the former, which suggested

Chris Butcher (BUNGIE) | 1 Sep 2006 04:51

Re: [Algorithms] Balancing number of draw calls vs ease of lighting/shadowing - is big, solid geometry still the ideal?

So when you say "creating sub-regions" you aren't actually changing the structure of your vertex or index
data at all; you are just changing the range of indices within the buffer that is rendered. Like Greg
originally said, that will work fine.
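
To make "changing the range of indices" concrete, here is a small sketch assuming triangle-list indices and sub-regions stored back to back in one shared index buffer. IndexRange, build_ranges and whole_mesh are hypothetical names; in a real renderer the start and count would be passed to whatever indexed draw call the API provides.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class IndexRange:
    start: int   # position of the first index in the shared index buffer
    count: int   # number of indices to draw


def build_ranges(region_indices: Sequence[Sequence[int]]) -> Tuple[List[int], List[IndexRange]]:
    """Pack per-region index lists into one buffer and record each region's range."""
    buffer: List[int] = []
    ranges: List[IndexRange] = []
    for indices in region_indices:
        ranges.append(IndexRange(len(buffer), len(indices)))
        buffer.extend(indices)
    return buffer, ranges


def whole_mesh(ranges: Sequence[IndexRange]) -> IndexRange:
    """Single range covering every region (valid because regions are packed contiguously)."""
    return IndexRange(ranges[0].start, sum(r.count for r in ranges))


index_buffer, regions = build_ranges([
    [0, 1, 2],
    [2, 1, 3, 3, 1, 4],
    [5, 6, 7],
])
everything = whole_mesh(regions)
region2 = index_buffer[regions[2].start:regions[2].start + regions[2].count]
```

The vertex and index data are never rewritten; the full mesh and each sub-region differ only in the start and count handed to the draw.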

There shouldn't be any serious problems with the vertex cache: the default triangle stripper should be
ordering your vertices for good locality anyway, and you will still be rendering the same triangles in the
same order. The only implication will be for triangles that are on the edges of the rendered section, where
the stripper might have assumed that those vertices would be pre-warmed in the cache, and they won't be if
the adjacent section was not drawn. That should only have a trivial impact.
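
The edge effect can be illustrated with a toy cache simulation. This is only a sketch: it assumes a 16-entry FIFO vertex cache and a triangle list (neither is stated above), and the helper names are made up. It draws a band of eight quads whole, then as two halves, and counts how many vertices miss the cache.

```python
from collections import deque


def cache_misses(indices, cache_size=16):
    """Count vertices that miss a FIFO vertex cache holding cache_size entries."""
    cache = deque(maxlen=cache_size)  # appending to a full deque drops the oldest entry
    misses = 0
    for i in indices:
        if i not in cache:
            misses += 1
            cache.append(i)
    return misses


def quad_band(first_quad, last_quad, quads_total=8):
    """Triangle-list indices for quads [first_quad, last_quad) of a one-quad-high band."""
    row = quads_total + 1  # vertices per row
    out = []
    for k in range(first_quad, last_quad):
        tl, tr, bl, br = k, k + 1, row + k, row + k + 1
        out += [tl, tr, bl, tr, br, bl]
    return out


full = cache_misses(quad_band(0, 8))    # whole band in one go
left = cache_misses(quad_band(0, 4))    # first sub-region on its own
right = cache_misses(quad_band(4, 8))   # second sub-region on its own
```

In this toy case the two halves together cost two more misses than the full draw, because the shared column of vertices is no longer pre-warmed when the right half is drawn alone.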

One decision you have to make is what to do in the case where you have sub-regions 0, 1, 2 and only 0 and 2 need to
be rendered. You then have to make two draw calls because the indices are not contiguous. Or, you could
alternatively build a dynamic index buffer that contains region 0, then a degenerate triangle linking
you to region 2. This has the advantage of keeping the number of draw calls down, but it's more CPU work as you
have to either build a new index buffer or push the data into the command buffer directly. I believe we used
this approach on Halo 2 (since you incur the overhead of memcpying indices into the command buffer anyway,
on Xbox 1).
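
Both options can be sketched briefly. This is a hedged illustration, not the Halo 2 code: it assumes each sub-region is a triangle strip stored in one shared index buffer, and the function names are hypothetical. The first function merges visible regions into as few contiguous draws as possible; the second builds one strip by linking regions with degenerate triangles while preserving winding.

```python
def draw_calls_for(visible, ranges):
    """Merge visible regions with adjacent index ranges into (start, count) draws."""
    draws = []
    for r in sorted(visible):
        start, count = ranges[r]
        if draws and draws[-1][0] + draws[-1][1] == start:
            draws[-1] = (draws[-1][0], draws[-1][1] + count)
        else:
            draws.append((start, count))
    return draws


def stitch_strips(strips):
    """Join triangle strips into one strip using degenerate (zero-area) triangles."""
    out = []
    for strip in strips:
        if out:
            out.append(out[-1])    # repeat the last index of the previous strip
            out.append(strip[0])   # repeat the first index of the next strip
            if len(out) % 2 == 1:  # next strip must start at an even position
                out.append(strip[0])  # to keep its original winding
        out.extend(strip)
    return out


def strip_triangles(strip):
    """Expand a strip into wound triangles, skipping degenerate ones."""
    tris = []
    for i in range(len(strip) - 2):
        a, b, c = strip[i], strip[i + 1], strip[i + 2]
        if a == b or b == c or a == c:
            continue
        tris.append((a, b, c) if i % 2 == 0 else (b, a, c))
    return tris


ranges = [(0, 5), (5, 6), (11, 4)]          # (start, count) for sub-regions 0, 1, 2
two_draws = draw_calls_for({0, 2}, ranges)  # regions 0 and 2 are not contiguous
one_draw = draw_calls_for({0, 1, 2}, ranges)

region0, region2 = [0, 1, 2, 3, 4], [10, 11, 12, 13]
stitched = stitch_strips([region0, region2])
```

The stitched strip spends a few extra indices on the link but goes out as a single draw, which is the trade described above.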

It's really up to you to decide whether these approaches are worth it compared to just using clip planes. For
large objects with small lights on them (like terrain) you can potentially have a lot of triangle setup,
vertex transformation and the clip test itself that you can skip compared to the clip plane approach. But
if you're only able to save 30% of the vertices then it's probably not worthwhile. This is why we don't
bother splitting up most of our objects in Halo 2, only the large ones get sub-region visibility.

Whatever you do, make sure you're setting the screenspace viewport bounds as tightly as possible before
you render each light or shadow; GPUs are very fast at throwing away triangles outside the viewport. I
think there's some way you can set Z bounds on the Xbox as well, but I forget.
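
One way to get tight screen-space bounds is to project the corners of the light's bounding volume and take their pixel extents. The sketch below assumes the corners are already in clip space (x, y, z, w), that NDC runs from -1 to 1 with y up, and that pixel rows grow downward; the function name is hypothetical, and the fallback to the full viewport when any corner is behind the camera is a deliberately conservative choice.

```python
import math


def screen_bounds(clip_corners, width, height):
    """Pixel rectangle (x0, y0, x1, y1) enclosing the given clip-space points."""
    if any(w <= 0.0 for (_, _, _, w) in clip_corners):
        # Bounds cross the camera plane: projection is unreliable, use the full viewport.
        return (0, 0, width, height)

    xs = [x / w for (x, _, _, w) in clip_corners]
    ys = [y / w for (_, y, _, w) in clip_corners]

    # Map NDC [-1, 1] to pixels; y is flipped because pixel rows grow downward.
    x0 = (min(xs) * 0.5 + 0.5) * width
    x1 = (max(xs) * 0.5 + 0.5) * width
    y0 = (0.5 - max(ys) * 0.5) * height
    y1 = (0.5 - min(ys) * 0.5) * height

    def clamp(value, upper):
        return max(0, min(upper, value))

    # Round outward so the rectangle never cuts into the projected bounds.
    return (clamp(math.floor(x0), width), clamp(math.floor(y0), height),
            clamp(math.ceil(x1), width), clamp(math.ceil(y1), height))


corners = [(-1.0, 0.0, 1.0, 2.0), (1.0, 0.0, 1.0, 2.0),
           (-1.0, 1.0, 1.0, 2.0), (1.0, 1.0, 1.0, 2.0)]
rect = screen_bounds(corners, 640, 480)
behind = screen_bounds([(0.0, 0.0, 0.0, -1.0)], 640, 480)
```

The resulting rectangle would then be set as the viewport or scissor region before drawing that light or shadow.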

- b


Brian Karis | 2 Sep 2006 02:58

Re: [Algorithms] Radiosity Normal Mapping

I was hoping after seeing this that I would not run into this too.
Once I got mine up and going I found the same exact thing that you
did. Lights on flat surfaces display those 3 wedges. Upon doing some
quick math on paper it seems that what I was seeing was correct.
Vectors of the same angle to a surface are not the same intensity for
all angles around the normal, thus it shows up with those wedges. For
right now I just have it tweaked a bit to make it more even but it
still isn't correct. I put a power factor in the transform into the
basis. I also added a biased dot product multiply of the light
direction and the normal so there was a less harsh falloff as a
surface went edge on to the light.
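
The paper math can be reproduced numerically. The sketch below uses the three Half-Life 2 basis vectors and assumes each basis direction stores a clamped dot product with the light direction (a common setup, but an assumption about this particular implementation). For a flat normal the three basis weights are equal, so the result is proportional to the sum of the three clamped terms. Function names are made up for the example.

```python
import math

# Half-Life 2 tangent-space basis; z is the unperturbed surface normal.
HL2_BASIS = (
    (math.sqrt(2.0 / 3.0), 0.0, 1.0 / math.sqrt(3.0)),
    (-1.0 / math.sqrt(6.0), 1.0 / math.sqrt(2.0), 1.0 / math.sqrt(3.0)),
    (-1.0 / math.sqrt(6.0), -1.0 / math.sqrt(2.0), 1.0 / math.sqrt(3.0)),
)


def light_dir(elevation_deg, azimuth_deg):
    """Unit vector toward the light; elevation is measured up from the surface."""
    e, a = math.radians(elevation_deg), math.radians(azimuth_deg)
    return (math.cos(e) * math.cos(a), math.cos(e) * math.sin(a), math.sin(e))


def flat_surface_intensity(direction, clamp=True):
    """Lighting on a flat surface: equal-weight sum of the per-basis terms.

    Dividing by sqrt(3) makes the unclamped result equal plain N.L.
    """
    total = 0.0
    for basis in HL2_BASIS:
        d = sum(x * y for x, y in zip(direction, basis))
        total += max(0.0, d) if clamp else d
    return total / math.sqrt(3.0)


low_0 = flat_surface_intensity(light_dir(20, 0))      # light near the surface
low_60 = flat_surface_intensity(light_dir(20, 60))    # same elevation, rotated 60 degrees
low_120 = flat_surface_intensity(light_dir(20, 120))  # rotated 120 degrees
high_0 = flat_surface_intensity(light_dir(60, 0))     # light well above the surface
high_60 = flat_surface_intensity(light_dir(60, 60))
```

At 20 degrees of elevation the intensity changes with azimuth and repeats every 120 degrees, which gives three wedges. At 60 degrees no term is clamped and the variation disappears, consistent with the artifact showing up when a light is close to the surface.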

If anyone has any info on how to make this effect go away or not show
up so bad when a light is close to the surface, I too would appreciate
the help. The only thing left I was thinking of trying was to
transform the incoming light into SH and then use the matrix in the NMPRT
paper to transform it to the HL2 basis. I thought maybe putting it into
SH would smooth it out a bit and perhaps would account for the
integral over the hemisphere instead of just sampling the 3 basis
directions but I don't expect it to work any better.

Thanks,
Brian

On 8/13/06, Joe Meenaghan <joe <at> gameinstitute.com> wrote:
>
>
>
>
> Hi Everyone,

Eric Haines | 3 Sep 2006 19:32

[Algorithms] Triangle strips still useful?

Is stripification still useful when rendering meshes? That is, is it 
worth turning a mesh into a set of triangle strips?

We're starting to write a 3rd edition of "Real-Time Rendering" (shhh, 
it's a secret... not really). I'm working on the Polygonal Techniques 
chapter, and we have a few pages on stripification, the problem of 
turning a mesh into a set of triangle strips. On the PC, at least, using 
index buffers with vertex buffers seems to make more sense, assuming the 
index buffer isn't goofy and that neighboring triangles in the list 
normally share some vertices.

Anyway, my question is whether stripification is something we can mostly 
cut from the next edition. In one sense stripification could give you 
lists of triangles to use to order your index buffers. But it's kind-of 
overkill for this, and most index buffers that get formed usually have 
sharing of some sort in them. We'll still cover triangle strips 
themselves, of course: they certainly have their use, e.g. drawing thick 
lines or other data where an index buffer is superfluous.

Thoughts?

Eric

_______________________________________________
GDAlgorithms-list mailing list

Aras Pranckevicius | 3 Sep 2006 20:12

Re: [Algorithms] Triangle strips still useful?

> Anyway, my question is whether stripification is something we can mostly
> cut from the next edition. In one sense stripification could give you
> lists of triangles to use to order your index buffers. But it's kind-of
> overkill for this, and most index buffers that get formed usually have
> sharing of some sort in them. We'll still cover triangle strips
> themselves, of course: they certainly have their use, e.g. drawing thick
> lines or other data where an index buffer is superfluous.

On a PC and most other hardware, much more important is vertex cache
optimization. You optimize for the vertex cache first, _then_ you can
try strips that follow this optimized order. If you get fewer indices
in the end, then maybe use the strips. Strips make more sense on
hardware that supports "restart strip" indices. But still, vertex
cache is the first optimization.
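
A small sketch of the "fewer indices" comparison, assuming a strip format with a restart marker. The value 0xFFFF is only an assumed marker for 16-bit indices (the actual value depends on the API and hardware), and strip_to_list is a hypothetical helper. It expands a strip with restarts into the equivalent triangle list so the two index counts can be compared.

```python
RESTART = 0xFFFF  # assumed restart marker for 16-bit indices


def strip_to_list(strip, restart=RESTART):
    """Expand a triangle strip (with restart markers) into a triangle list."""
    tris, run = [], []
    for idx in list(strip) + [restart]:  # trailing sentinel flushes the final run
        if idx != restart:
            run.append(idx)
            continue
        for i in range(len(run) - 2):
            a, b, c = run[i], run[i + 1], run[i + 2]
            if a != b and b != c and a != c:
                # Odd triangles in a strip have flipped winding; swap to restore it.
                tris.extend((a, b, c) if i % 2 == 0 else (b, a, c))
        run = []
    return tris


strip = [0, 1, 2, 3, RESTART, 4, 5, 6]
as_list = strip_to_list(strip)
```

Here the strip needs 8 indices against 9 for the list. With long runs the saving grows, and with many short runs the restart markers eat it up, so it is worth measuring on the cache-optimized order rather than assuming.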

TomF has some good info on this topic:
http://tomsdxfaq.blogspot.com/2005_12_01_tomsdxfaq_archive.html#113546436585770597

Also check out the DirectX mailing list archives from December 2005,
subject "stripping":
http://discussms.hosting.lsoft.com/SCRIPTS/WA-MSD.EXE?A1=ind0512c&L=directxdev#11
and http://discussms.hosting.lsoft.com/SCRIPTS/WA-MSD.EXE?A1=ind0512d&L=directxdev#9

-- 
Aras Pranckevicius
work: www.unity3d.com
home: nesnausk.org/nearaz


Stephan Rose | 3 Sep 2006 20:49

Re: [Algorithms] Triangle strips still useful?

That actually reminds me of a question I've had for a while.

When using transformed vertices, is there any point in worrying about using
an index list or the vertex cache? Remember, the vertices are already
transformed so the hardware doesn't need to do it.

I do a lot of 2D Rendering and it isn't uncommon for me to dump 18,000 or
more complex primitives at the video card to render which can easily total
over a few hundred thousand vertices. Right now I am just using a
non-indexed triangle list.

Since I am primarily CPU Bound I haven't really worried about trying out
index lists yet.
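
For a sense of scale, here is a hedged back-of-the-envelope comparison for quad-like 2D primitives. The 28-byte vertex (position with RHW, colour, one UV set), the 50,000-quad batch and the function names are all assumptions made for illustration, not figures from this message, and the example only compares data volume, not where the actual bottleneck lies.

```python
def quad_batch_bytes(num_quads, vertex_bytes, index_bytes=2):
    """Bytes written per batch: (non-indexed triangle list, indexed triangle list)."""
    non_indexed = num_quads * 6 * vertex_bytes                   # 2 triangles x 3 vertices
    indexed = num_quads * (4 * vertex_bytes + 6 * index_bytes)   # 4 vertices + 6 indices
    return non_indexed, indexed


def quad_indices(num_quads):
    """Index pattern for quads stored as 4 consecutive vertices each."""
    out = []
    for q in range(num_quads):
        v = 4 * q
        out += [v, v + 1, v + 2, v + 2, v + 1, v + 3]
    return out


non_indexed, indexed = quad_batch_bytes(50_000, 28)
pattern = quad_indices(2)
```

Since the index pattern for quads never changes, it can be built once and reused, so the per-frame CPU work would be writing four vertices per quad instead of six. With 16-bit indices a single batch is limited to 65,536 vertices.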

Stephan

-----Original Message-----
From: gdalgorithms-list-bounces <at> lists.sourceforge.net
[mailto:gdalgorithms-list-bounces <at> lists.sourceforge.net] On Behalf Of Aras
Pranckevicius
Sent: Sunday, September 03, 2006 20:13
To: Game Development Algorithms
Subject: Re: [Algorithms] Triangle strips still useful?

> Anyway, my question is whether stripification is something we can mostly
> cut from the next edition. In one sense stripification could give you
> lists of triangles to use to order your index buffers. But it's kind-of
> overkill for this, and most index buffers that get formed usually have
> sharing of some sort in them. We'll still cover triangle strips
> themselves, of course: they certainly have their use, e.g. drawing thick

Eric Haines | 3 Sep 2006 22:33

Re: [Algorithms] Triangle strips still useful?

Aras,

Excellent info, thanks! Lots to chew on in Tom's blog, but just the sort of thing I want. I was wondering if people were using the Hoppe Hilbert curve research - looks like they do. I'll be interested to hear answers from others to see if people actually go this route or not.

Eric


Aras Pranckevicius wrote:
> > Anyway, my question is whether stripification is something we can mostly
> > cut from the next edition. In one sense stripification could give you
> > lists of triangles to use to order your index buffers. But it's kind-of
> > overkill for this, and most index buffers that get formed usually have
> > sharing of some sort in them. We'll still cover triangle strips
> > themselves, of course: they certainly have their use, e.g. drawing thick
> > lines or other data where an index buffer is superfluous.
>
> On a PC and most other hardware, much more important is vertex cache
> optimization. You optimize for the vertex cache first, _then_ you can
> try strips that follow this optimized order. If you get less indices
> in the end, then maybe use the strips. Strips make more sense on
> hardware that supports "restart strip" indices. But still, vertex
> cache is the first optimization.
>
> TomF has some good info on this topic:
> http://tomsdxfaq.blogspot.com/2005_12_01_tomsdxfaq_archive.html#113546436585770597
>
> Also check out DirectX mailing list archives from 2005 december,
> subject "stripping":
> http://discussms.hosting.lsoft.com/SCRIPTS/WA-MSD.EXE?A1=ind0512c&L=directxdev#11
> and http://discussms.hosting.lsoft.com/SCRIPTS/WA-MSD.EXE?A1=ind0512d&L=directxdev#9
_______________________________________________
GDAlgorithms-list mailing list
GDAlgorithms-list <at> lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/gdalgorithms-list
Archives:
http://sourceforge.net/mailarchive/forum.php?forum_id=6188
