Hartmut Kaiser | 1 Sep 01:03 2009
Picon

RE: [Fwd: [Boost-users] [boost] Formal Review: Boost.Polygon starts today August 24, 2009]

> Martin Davis wrote:
> > Hmmm... GEOS comes off rather badly compared to GGL.  Is that because
> >  of memory access issues?  Or perhaps the fact that less code is
> > inlined?
> 
> It's hard to judge, but I'm quite sure inlining is only a small and
> minor optimisation available.
> GEOS and GGL follow completely different programming paradigms.
> GGL is strongly based on static polymorphism resolved and calculated in
> compile-time. This increases changes that compilers will apply finest
> possible optimisations.

>From what I've seen in GEOS and GGL, GEOS is mostly relying on dynamic
memory allocation, where GGL tries to avoid that. GEOS relies on runtime
polymorphism, where GGL relies on compile time (static polymorphism). GEOS
is built using Java-ish constructs, where GGL is build the C++ way...

All of this influences the runtime behavior, but there might be more
reasons. Please note, I didn't do any thorough analysis, all of this is just
guessing.

Regards Hartmut
Barend Gehrels | 2 Sep 01:21 2009
Picon

Re: [Fwd: [Boost-users] [boost] Formal Review: Boost.Polygon starts today August 24, 2009]


Hartmut Kaiser wrote:
Martin Davis wrote:
Hmmm... GEOS comes off rather badly compared to GGL. Is that because of memory access issues? Or perhaps the fact that less code is inlined?
It's hard to judge, but I'm quite sure inlining is only a small and minor optimisation available. GEOS and GGL follow completely different programming paradigms. GGL is strongly based on static polymorphism resolved and calculated in compile-time. This increases changes that compilers will apply finest possible optimisations.
>From what I've seen in GEOS and GGL, GEOS is mostly relying on dynamic memory allocation, where GGL tries to avoid that. GEOS relies on runtime polymorphism, where GGL relies on compile time (static polymorphism). GEOS is built using Java-ish constructs, where GGL is build the C++ way... All of this influences the runtime behavior, but there might be more reasons. Please note, I didn't do any thorough analysis, all of this is just guessing.
I do not know GEOS very well but I think that Hartmut is right here. GGL uses the std-library for storage of coordinates. If I'm right, GEOS calls a memory allocation for every point (of a polygon). And indeed GGL uses static polymorphism. The consequence of this is also that there is no polygon.area() function, but an area(polygon) function, taking any polygon which fulfills the polygon concept.

Regards, Barend


_______________________________________________
geos-devel mailing list
geos-devel <at> lists.osgeo.org
http://lists.osgeo.org/mailman/listinfo/geos-devel
Barend Gehrels | 2 Sep 01:31 2009
Picon

Re: [Fwd: [Boost-users] [boost] Formal Review: Boost.Polygon starts today August 24, 2009]


Martin Davis wrote:
> Ok, I can see that.  So in other words, GEOS uses a dynamic coordinate 
> access paradigm, which gives flexibility to access different data 
> structures, but can't be optimized by the compiler?
It is not only the compiler. Dynamic memory allocations are relatively 
slow. Allocating 1000 times a point is slower than allocating once 1000 
points.
>
> Is this the reason for the performance difference for *all* the other 
> libraries which beat it in peformance?  Or maybe some of them *don't* 
> provide the dynamic data structure wrapper, and hence also can be 
> optimized by the compiler (but thus they are less adaptable for use 
> with external data structures).
I don't know this for sure but I think most allocate for the whole 
polygon or linestring at once. GPC is e.g. C, not C++.
>
> I presume it would be a big job to convert GEOS to a template-based 
> paradigm?  
Probably. What would be possible and feasable is adapting GEOS's 
datastructures (polygon) to GGL's concepts and then call e.g. GGL's 
intersection. It would at least be a nice experiment. You still have the 
dynamic memory then, but you can see which part is the algorithm and 
which part is the memory access

> It's somewhat annoying that the problem of efficient memory access and 
> compiler optimization is quite orthogonal to the actual geometric 
> algorithms, and yet it seems difficult to express the algorithms in a 
> sufficiently abstract way to allow optimizations to take place.
I don't see this. Using the std-library, both access to a vector and 
(temporary) storage using a vector or deque are usually quite fast.

Regards, Barend
Martin Davis | 2 Sep 01:57 2009
Picon

Re: [Fwd: [Boost-users] [boost] Formal Review: Boost.Polygon starts today August 24, 2009]

One thing to make clear - GEOS does not alway require allocations to be 
done pointwise.  The CoordinateSequence interface and various subclasses 
provide the option of allocating all storage for a list of points in a 
single allocation.

I would certainly be curious to see how the algorithmic component of 
GEOS compares to GGL (or any other library).  The JTS/GEOS predicate and 
overlay algorithms were developed with the primary goal being first 
generality and then performance.  So no doubt there's better ways of 
doing things. 

In the GEOS algorithms at least I don't see any way of avoiding dynamic 
allocation, since there's no way to predict a priori how many line 
segment intersections will be found, or how what the structure of the 
output geometry is.  Does GGL avoid this problem in some way?

Barend Gehrels wrote:
>
>
> Martin Davis wrote:
>> Ok, I can see that.  So in other words, GEOS uses a dynamic 
>> coordinate access paradigm, which gives flexibility to access 
>> different data structures, but can't be optimized by the compiler?
> It is not only the compiler. Dynamic memory allocations are relatively 
> slow. Allocating 1000 times a point is slower than allocating once 
> 1000 points.
>>
>> Is this the reason for the performance difference for *all* the other 
>> libraries which beat it in peformance?  Or maybe some of them *don't* 
>> provide the dynamic data structure wrapper, and hence also can be 
>> optimized by the compiler (but thus they are less adaptable for use 
>> with external data structures).
> I don't know this for sure but I think most allocate for the whole 
> polygon or linestring at once. GPC is e.g. C, not C++.
>>
>> I presume it would be a big job to convert GEOS to a template-based 
>> paradigm?  
> Probably. What would be possible and feasable is adapting GEOS's 
> datastructures (polygon) to GGL's concepts and then call e.g. GGL's 
> intersection. It would at least be a nice experiment. You still have 
> the dynamic memory then, but you can see which part is the algorithm 
> and which part is the memory access
>
>> It's somewhat annoying that the problem of efficient memory access 
>> and compiler optimization is quite orthogonal to the actual geometric 
>> algorithms, and yet it seems difficult to express the algorithms in a 
>> sufficiently abstract way to allow optimizations to take place.
> I don't see this. Using the std-library, both access to a vector and 
> (temporary) storage using a vector or deque are usually quite fast.
>
> Regards, Barend
>
>
>
> _______________________________________________
> geos-devel mailing list
> geos-devel <at> lists.osgeo.org
> http://lists.osgeo.org/mailman/listinfo/geos-devel
>

--

-- 
Martin Davis
Senior Technical Architect
Refractions Research, Inc.
(250) 383-3022
Barend Gehrels | 2 Sep 11:39 2009
Picon

Re: [Fwd: [Boost-users] [boost] Formal Review: Boost.Polygon starts today August 24, 2009]


Martin Davis wrote:
> One thing to make clear - GEOS does not alway require allocations to 
> be done pointwise.  The CoordinateSequence interface and various 
> subclasses provide the option of allocating all storage for a list of 
> points in a single allocation.
OK, sorry for my misunderstanding.

I just debugged the Geos area routine, what could then be the reason 
that it is relatively slow? It might be that it is accessed as a vector, 
indeed, but not as an iterator, so getting points using "return 
(*vect)[pos]" is not efficient as it could be using iterators. Besides 
that it is probably no problem to make the getAt function inline.

>
> I would certainly be curious to see how the algorithmic component of 
> GEOS compares to GGL (or any other library). 

The GGL overlay algorithm will be described in a paper. It is a modern 
variant of the classic Weiler-Atherton algorithm.

> The JTS/GEOS predicate and overlay algorithms were developed with the 
> primary goal being first generality and then performance.  So no doubt 
> there's better ways of doing things.
Actually the GGL algorithm is generic as well, in the sense that it is 
able to overlay polygons and polygon/rectangle (clipping) with the same 
algorithm. The only difference is getting the intersections, which is 
more efficient for the rectangle case. Also intersection/union are 
implemented in one algorithm.

> In the GEOS algorithms at least I don't see any way of avoiding 
> dynamic allocation, since there's no way to predict a priori how many 
> line segment intersections will be found, or how what the structure of 
> the output geometry is.  Does GGL avoid this problem in some way?
Yes. It will be described, but in short: we don't insert the found 
intersection points into the polygons (as done in many algorithms, but I 
don't know about GEOS) but register them separately. That saves a copy, 
plus many inserts (shifts). However, it requires somewhat more 
bookkeeping. So intersected polygons are assembled in the end, in 
between there are no large pieces of memory allocation.

By the way, everyone is of course invited to check the geos comparison 
source files, it might be that I've done something not in the most 
efficient way. It is the intention to have it as efficient as possible, 
of course. Sources are at boost SVN now: 
http://svn.boost.org/svn/boost/sandbox/ggl/other/comparisons/

Regards, Barend
strk | 2 Sep 11:53 2009
Picon

Re: [Fwd: [Boost-users] [boost] Formal Review: Boost.Polygon starts today August 24, 2009]

On Wed, Sep 02, 2009 at 11:39:32AM +0200, Barend Gehrels wrote:

> I just debugged the Geos area routine, what could then be the reason 
> that it is relatively slow? It might be that it is accessed as a vector, 
> indeed, but not as an iterator, so getting points using "return 
> (*vect)[pos]" is not efficient as it could be using iterators. Besides 
> that it is probably no problem to make the getAt function inline.

getAt is a virtual function, can't be inlined.
Templates would make it a lot better, but far away 
from 1:1 mapping with Java.

Beside, how do you handle memory management in GGL ?

--strk;

 Free GIS & Flash consultant/developer      ()  ASCII Ribbon Campaign
 http://foo.keybit.net/~strk/services.html  /\  Keep it simple! 
Barend Gehrels | 2 Sep 13:45 2009
Picon

Re: [Fwd: [Boost-users] [boost] Formal Review: Boost.Polygon starts today August 24, 2009]


I just debugged the Geos area routine, what could then be the reason that it is relatively slow? It might be that it is accessed as a vector, indeed, but not as an iterator, so getting points using "return (*vect)[pos]" is not efficient as it could be using iterators. Besides that it is probably no problem to make the getAt function inline.
getAt is a virtual function, can't be inlined.
Right, I debugged in CoordinateArraySequence, where it is not denoted virtual (the overloaded version is...), but I now saw that its parent indeed is virtual.

A (probably) working way to have less virtual calls (for area) then would be changing this:
        double bx=ring->getAt(i).x;
        double by=ring->getAt(i).y;
        double cx=ring->getAt(i+1).x;
        double cy=ring->getAt(i+1).y;

to this: Coordinate const& c = ring->getAt(i + 1 );
double cx = c.x, cy = c.y;
and have a previous_coordinate (b) for the getAt(i). Would save 3 virtual calls + 3 times non-iterator-based vector access... I didn't try it.

This just in search of what would be the bottle neck, because the area algorithm itself is extremely simple.



Templates would make it a lot better, but far away from 1:1 mapping with Java.
Sure, though Java also have templates now

Beside, how do you handle memory management in GGL ?
1) Using the std:: library. Where it is an optional template parameter, so you can have polygon<point> but also polygon<point, std::deque> for if you decide that you want a deque instead of a vector
2) Using concepts, actually a polygon or linestring can be anything as long as it fullfiles the concept. So having an iterator (for linestring that is enough), having traits te denote exterior/interior ring handling. In this way our library does not even now what is below the surface, it just uses it.

Regards, Barend

_______________________________________________
geos-devel mailing list
geos-devel <at> lists.osgeo.org
http://lists.osgeo.org/mailman/listinfo/geos-devel
Mateusz Loskot | 2 Sep 17:07 2009
Picon

Re: [Fwd: [Boost-users] [boost] Formal Review: Boost.Polygon starts today August 24, 2009]

Barend Gehrels wrote:
> 
>>> I just debugged the Geos area routine, what could then be the reason
>>> that it is relatively slow? It might be that it is accessed as a
>>> vector, indeed, but not as an iterator, so getting points using
>>> "return (*vect)[pos]" is not efficient as it could be using
>>> iterators. Besides that it is probably no problem to make the getAt
>>> function inline.
>>>     
>>
>> getAt is a virtual function, can't be inlined.
>>   
> Right, I debugged in CoordinateArraySequence, where it is not denoted
> virtual (the overloaded version is...), but I now saw that its parent
> indeed is virtual.
> 
> A (probably) working way to have less virtual calls (for area) then
> would be changing this:
>        double bx=ring->getAt(i).x;
>        double by=ring->getAt(i).y;
>        double cx=ring->getAt(i+1).x;
>        double cy=ring->getAt(i+1).y;

It would be not a bad idea to help compiler to optimise as much as
possible:

double const cx = ...

Most people think it's too trivial to influence generated code,
but it may really help.

> to this:
> Coordinate const& c = ring->getAt(i + 1 );
> double cx = c.x, cy = c.y;

Best option possible.

> and have a previous_coordinate (b) for the getAt(i). Would save 3
> virtual calls + 3 times non-iterator-based vector access... I didn't try
> it.

Anyway, all the small things that "seem" trivial, make a big difference
in fact.

>> Templates would make it a lot better, but far away from 1:1 mapping
>> with Java.
>>   
> Sure, though Java also have templates now
> 
>> Beside, how do you handle memory management in GGL ?
>>   
> 1) Using the std:: library. Where it is an optional template parameter,
> so you can have polygon<point> but also polygon<point, std::deque> for
> if you decide that you want a deque instead of a vector

Plus, you can use your own containers or you can use your own allocators
with standard containers. It's not uncommon situation when specialised
allocators are really a good idea.

> 2) Using concepts, actually a polygon or linestring can be anything as
> long as it fullfiles the concept. So having an iterator (for linestring
> that is enough), having traits te denote exterior/interior ring
> handling. In this way our library does not even now what is below the
> surface, it just uses it.

This is an important detail.

One of the major feature of GGL is that it's extensible and
applicable to user-defined types that conform to concepts and contracts
defined by GGL. It is realised with use of abstractions
like iterator, range, collection which are orthogonal to specific types.

This makes it possible to design set of geometry or geometry-like
types or managed with sophisticated allocators, memory pools, etc.
and use them with GGL.

For example, it would be possible to specialise ggl::polygon with
stxxl::vector [1] and apply GGL algorithms on such container capable to
store very large number of elements.

[1] http://stxxl.sourceforge.net/

There are much more powerful options available like use of
concept of views [2], iterator adaptors [3], etc.

[2] http://www.zib.de/weiser/vtl/
[3] http://www.boost.org/doc/libs/1_39_0/libs/iterator/doc/index.html

Imagine calculation of length or transformation of coordinate of N
linestrings, both of M points, in single-pass using zip_iterator :-)

Best regards,
--

-- 
Mateusz Loskot, http://mateusz.loskot.net
Charter Member of OSGeo, http://osgeo.org
Martin Davis | 2 Sep 18:00 2009
Picon

Re: [Fwd: [Boost-users] [boost] Formal Review: Boost.Polygon starts today August 24, 2009]


> The GGL overlay algorithm will be described in a paper. It is a modern 
> variant of the classic Weiler-Atherton algorithm.
Have you done anything to make the algorithm robust?  My impression of 
this algorithm is that it might be more complex to make robust (although 
perhaps you can handle that in the intersection determination).
>
>> The JTS/GEOS predicate and overlay algorithms were developed with the 
>> primary goal being first generality and then performance.  So no 
>> doubt there's better ways of doing things.
> Actually the GGL algorithm is generic as well, in the sense that it is 
> able to overlay polygons and polygon/rectangle (clipping) with the 
> same algorithm. The only difference is getting the intersections, 
> which is more efficient for the rectangle case. Also 
> intersection/union are implemented in one algorithm.
The JTS/GEOS generality extends to handling all combinations of 
polygons, lines and points. 

As you say, rectangles are special cases of polygons which allow faster 
intersection detection, and which don't have any self-intersections, so 
don't need an initial topology building step.  JTS/GEOS has some 
optimized code for rectangle handling, but there's still opportunity for 
further optimization.
>
>> In the GEOS algorithms at least I don't see any way of avoiding 
>> dynamic allocation, since there's no way to predict a priori how many 
>> line segment intersections will be found, or how what the structure 
>> of the output geometry is.  Does GGL avoid this problem in some way?
> Yes. It will be described, but in short: we don't insert the found 
> intersection points into the polygons (as done in many algorithms, but 
> I don't know about GEOS) but register them separately. That saves a 
> copy, plus many inserts (shifts). However, it requires somewhat more 
> bookkeeping. So intersected polygons are assembled in the end, in 
> between there are no large pieces of memory allocation.

JTS/GEOS uses the same approach. As you say, it's more efficient to 
collect the intersection points first and then create the split edges.

--

-- 
Martin Davis
Senior Technical Architect
Refractions Research, Inc.
(250) 383-3022
Martin Davis | 2 Sep 18:11 2009
Picon

Re: [Fwd: [Boost-users] [boost] Formal Review: Boost.Polygon starts today August 24, 2009]


Barend Gehrels wrote:
>  
>> Templates would make it a lot better, but far away 
>> from 1:1 mapping with Java.
>>   
> Sure, though Java also have templates now
I'm no expert in this, but my understanding is that Java templates are 
purely for type management, and don't really provide much in the way of 
compile-time optimization (since they don't change the underlying memory 
allocation model of Java).  So it's not clear to me that rewriting JTS 
to use templates would provide optimal code via a direct port to C++.  
It might get closer, I guess, but maybe there'd still be some semantic 
rewriting required for maximum optimization. But I'd be happy for 
someone to prove this conjecture wrong  8^)

--

-- 
Martin Davis
Senior Technical Architect
Refractions Research, Inc.
(250) 383-3022

Gmane