1 Nov 2004 12:21
Re: Hash table performance relative to association lists
Nicolas Neuss <Nicolas.Neuss <at> iwr.uni-heidelberg.de>
2004-11-01 11:21:03 GMT
2004-11-01 11:21:03 GMT
Hello, the problem is the same with CMUCL. Looking at the code there, I see that GETHASH is not short, not optimized, and has to use WITHOUT-GCING. Another overhead of hash-tables via lists is that the default size is quite large (65 for CMUCL, other values can be given, but later in the code a minimal size of 37 is ensured). I once tried to change something there, but my change broke CMUCL and I refrained. Faster and (by default) smaller hash-tables would be really nice to have in both SBCL and CMUCL. Nicolas. P.S.: Such an improvement might speed up my programs noticeably. Therefore, I would like to help a little. If someone manages to improve this in SBCL/CMUCL significantly (say such that list and hash-table access are at equal speed for about 10-20 items and that the default size is also in this range), I would like to offer a Amazon gift certificate of 50 EUR value. ------------------------------------------------------- This SF.Net email is sponsored by: Sybase ASE Linux Express Edition - download now for FREE LinuxWorld Reader's Choice Award Winner for best database on Linux. http://ads.osdn.com/?ad_id=5588&alloc_id=12065&op=click
by removing
:SB-UNICODE in customize-target-features.lisp.
I have only tested this on x86/Linux; it is entirely possible that I
have goofed in merging things (for the PPC) or just implementing them
blind (for the other platforms); I warmly encourage testing of simple
buildability earlier rather than later in the development cycle -- we
still have 18 whole days to go before freezing this month's sbcl, so
there is plenty of time to deal with problem reports as long as we get
them early enough.
There are some known problems. Firstly, as I mentioned on this list
before (without anyone coming to offer a strong opinion, as I recall)
the sb-md5 contrib fails its tests. The problem lies in the
MD5SUM-SEQUENCE function operating on a string: there is an underlying
assumption that the character sequence in memory has an identical
octet layout as the same sequence of characters in a file; this
assumption does not hold with :SB-UNICODE enabled. (And indeed it
doesn't even hold in the old world for high-bit characters, which are
typically encoded in utf-8 on disk but in latin1 in memory.) It is
likely that there is third-party software (such as SB-SHA1, Blowfish
and the like) which is similarly afflicted by this change.
Secondly, there are performance implications in all these changes.
Nothing should be too drastically pessimized, but if there is
something which has degraded performance it would be good to know

RSS Feed