Lin Xu | 5 Dec 2002 21:01
Favicon

large size hash-tables and CPU time


We are running ACL 6.2 on Sun OS and wonder what we are wrong..

Our implementation relies on storing data in hash-tables.  We do not
know a priori the number of data items to be stored (could vary from
20 data items to 60,000 even more).

We noticed (with profiler) that, with a fixed hash-table size, much
effort is spent on (re-)hashing.  So, we tried to declare the size of
the hashtable as some function of the input parameters.  We use the
following rehash arguments :rehash-threshold 0.9 :rehash-size 1.5.

When we run our experiments, the CPU time performance (measured with
the function time, w/o accounting for the effort spent on gc) is very
erratic: sometimes the fixed size hash-table shows a better
performance sometimes the opposite is true... 

This is a serious problem as it is preventing us from drawing any
conclusions WRT to the `algorithmic' benefits of our research.

Are we doing anything wrong?  how can we improve our implementation in
order to have a more `stable' results?

Thank you for any pointer.

Lin XU

                                                                                 
               .'*                                                              
            :`...' `.,'  '                                                      
(Continue reading)

Marc LeBrun | 5 Dec 2002 23:18

Re: large size hash-tables and CPU time

I don't know the details of ACL's hash tables specifically, but I've 
implemented a few of my own, so perhaps I can suggest something.

First, to avoid the rehash costs you should make the table big to start 
with, so it will basically never have to rehash.  I would suggest making it 
twice the largest expected number of elements, 120K.  This is not really a 
very big number on today's systems.  A reasonable hash table implementation 
(which I assume ACL's is) probably only uses a small number of words per 
entry: one for the pointer to the object, one for the key and at most a few 
more for caching or bookkeeping.

More generally I would suggest using a threshold of 0.5 (or less).  This 
means the table rehashes when half full.  This assures that the number of 
expected collisions will be very low.  When you crowd the table with 
thresholds like 0.9, then, as it gets full, a greater percentage of the 
probes will miss and have to reprobe multiple times.  This can have a big 
unpredictable impact on performance.  For similar reasons I would never use 
a growth factor of less than 2.  Otherwise, in a big run, you will just 
wind up rehashing multiple times to get the table big enough and sparse enough.

Beyond that, system effects (such as inter-process locking to make the hash 
table thread safe) can have a big performance impact.  Look for, and try to 
reduce, extraneous sources of degradation from factors like this.

Lastly, instead of using system-supplied general purpose hash tables, 
consider implementing your own (using arrays etc), specifically tuned to 
your application (eg with a fast special purpose hash function).  Recently 
I did this in a large commercial system, and found that my hash tables 
metered over 100 times faster than the Microsoft system map classes we had 
been using!
(Continue reading)

eric | 6 Dec 2002 00:10

large size hash-tables and CPU time

Lin Xu writes:
 > 
 > We are running ACL 6.2 on Sun OS and wonder what we are wrong..
 > 
 > Our implementation relies on storing data in hash-tables.  We do not
 > know a priori the number of data items to be stored (could vary from
 > 20 data items to 60,000 even more).
 > 
 > We noticed (with profiler) that, with a fixed hash-table size, much
 > effort is spent on (re-)hashing.  So, we tried to declare the size of
 > the hashtable as some function of the input parameters.  We use the
 > following rehash arguments :rehash-threshold 0.9 :rehash-size 1.5.

Are you using EQ or EQUAL hashtables?  If EQ hashtables, it might
be you are losing due to gsgc.  EQ hashtables hash on address
and therefore have to be rehashed whenever the elements are
moved due to gsgc.  That would also explain the erratic behavior
you mention.

/Eric M/
Reasoning, Inc

Shannon Spires | 6 Dec 2002 01:25
Picon

Re: large size hash-tables and CPU time

> We noticed (with profiler) that, with a fixed hash-table size, much
> effort is spent on (re-)hashing.  So, we tried to declare the size of
> the hashtable as some function of the input parameters.  We use the
> following rehash arguments :rehash-threshold 0.9 :rehash-size 1.5.
> 
> When we run our experiments, the CPU time performance (measured with
> the function time, w/o accounting for the effort spent on gc) is very
> erratic: sometimes the fixed size hash-table shows a better
> performance sometimes the opposite is true... 

Are you using eq hash tables? Typically, eq hash tables have to be rehashed whenever
the gc runs. This may also be true on eql hash tables depending on what they
actually contain as keys.

In addition, hash table traversal with maphash and looping on hashtables may require
locking or copying the whole table. Probably best to avoid these if you're trying to
get reproducible timings.

I don't know anything about the specifics of ACL's hash tables, but these are a couple
of issues I've run into with various CL implementations.

--

-- 
Shannon Spires
svspire <at> sandia.gov

feilong | 6 Dec 2002 18:48
Picon

Definition of mapcar

Hallo,
Ich have tried to write the definition of  mapcar, but  I  don't 
 succeed,  because  funktion  mapcar  can  take  any  numbers of 
 arguments and the recursion procedure is very comlicated. For  example :
 >(mapcar #'+ '(1 2 3))
 >6
 >(mapcar #'+  '(1 2 3)  '(24 3)  '(89 43))
 >(114 48)

feilong
Thank you

Paul Dietz | 6 Dec 2002 20:19

Re: Definition of mapcar

feilong wrote:
> 
> Hallo,
> Ich have tried to write the definition of  mapcar, but  I  don't
>  succeed,  because  funktion  mapcar  can  take  any  numbers of
>  arguments and the recursion procedure is very comlicated. For  example :
>  >(mapcar #'+ '(1 2 3))
>  >6
>  >(mapcar #'+  '(1 2 3)  '(24 3)  '(89 43))
>  >(114 48)

(mapcar #'+ '(1 2 3)) ==> (1 2 3), not 6

See the CL Hyperspec for a thorough definition of mapcar.
http://www.franz.com/support/documentation/6.2/ansicl/dictentr/mapcmapc.htm

	Paul

Matthew Danish | 11 Dec 2002 16:37
Picon

funcalling foreign function pointers

I'm trying to emulate the behavior of LispWork's
DEFINE-FOREIGN-FUNCALLABLE [1] in Allegro CL 6.2.  The current
definition I have is

(def-foreign-funcallable int-caller :int ((x :int)))

  ==>

(defun int-caller (fptr x)
  (system::ff-funcall fptr ':int ':int))

But this segv's.  I obtained system::ff-funcall from the macroexpansion
of def-foreign-call exhibited in the manual, but there are a few
differences I'd like to make a note of first:

;; This taken from def-foreign-call macroexpansion
(system::ff-funcall
  (load-time-value
    (excl::determine-foreign-address '("foo" :language :c)
                                     2
                                     nil))
  '(:int (integer * *))
  x
  '((* :char)
    (array character (*)))
  y
  '(:int (integer * *)))

As you can see, there is a lot more information here.  The conversion of 
:int to '(:int (integer * *)) is done by the def-foreign-call macro, for
(Continue reading)


Gmane