5 Aug 2007 00:20
patch for amd64 asm string functions
Blair Sadewitz <blair.sadewitz <at> gmail.com>
2007-08-04 22:20:31 GMT
I'd like to hear opinions on the following:

I've been using the amd64 string functions (src/common/lib/libc/arch/x86_64/string/) modified by <fuyuki <at> hadaly.org> (see netbsd-bugs in January) for months now without incident. The only times I backed the patch out were while troubleshooting, to make sure it wasn't the cause of a problem. I have an EM64T processor, and after rebuilding the tree with -mtune=nocona and applying this patch, the system is noticeably faster. I have confirmed the results he reported in his emails to netbsd-bugs using his memcpy benchmark (see: <http://www.hadaly.org/fuyuki/>).

Also, when we move to gcc 4.2, we should probably build the tree with -mtune=generic, which tunes fairly for both AMD and Intel processors. Until then, according to what I've read on the GCC lists and elsewhere, the best thing to do is use -mtune=nocona, as the performance hit for AMD processors is negligible (they do, after all, have to compete, and that means running code optimized for Intel processors well). EM64T processors, on the other hand, pay a substantial price (up to 20% loss in some benchmarks I've seen) for -mtune=k8, which is what we get from -march=k8, our default. I don't think there's much of a difference between -march=nocona and -mtune=nocona anyway, especially now that Intel has cloned AMD's instruction set more faithfully.

The only thing I changed from the author's original patch is adding an #ifdef _KERNEL...#endif at lines 62 and 81 of memset.S, as the kernel doesn't do huge memcpy operations AFAIK, and so using the hints there has virtually no chance of being productive and a substantial chance of being counterproductive.
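For anyone who wants to check this sort of claim on their own hardware, a rough memcpy throughput test can be sketched in C along the following lines. This is not the benchmark from the hadaly.org page; the function name memcpy_mb_per_sec and its parameters are made up for illustration, and it assumes a POSIX system that provides clock_gettime() with CLOCK_MONOTONIC.

```c
#define _POSIX_C_SOURCE 199309L  /* for clock_gettime() under strict -std modes */

#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/*
 * Hypothetical helper (not part of the patch or of the original benchmark):
 * copy `size` bytes `iterations` times with memcpy() and return the observed
 * throughput in MB/s, or -1.0 on bad arguments, allocation failure, a failed
 * clock read, a copy mismatch, or an elapsed time too small to measure.
 */
double memcpy_mb_per_sec(size_t size, int iterations)
{
    if (size == 0 || iterations <= 0)
        return -1.0;

    unsigned char *src = malloc(size);
    unsigned char *dst = malloc(size);
    if (src == NULL || dst == NULL) {
        free(src);
        free(dst);
        return -1.0;
    }

    /* Touch both buffers first so page faults are not part of the timing. */
    memset(src, 0xA5, size);
    memset(dst, 0, size);

    struct timespec t0, t1;
    volatile unsigned char sink = 0;
    int clock_ok = (clock_gettime(CLOCK_MONOTONIC, &t0) == 0);

    for (int i = 0; i < iterations; i++) {
        memcpy(dst, src, size);
        /* Read one byte back so the copy cannot be discarded as dead code. */
        sink ^= dst[(size_t)i % size];
    }

    clock_ok = clock_ok && (clock_gettime(CLOCK_MONOTONIC, &t1) == 0);
    (void)sink;

    int copy_ok = (memcmp(src, dst, size) == 0);
    free(src);
    free(dst);

    if (!clock_ok || !copy_ok)
        return -1.0;

    double secs = (double)(t1.tv_sec - t0.tv_sec)
                + (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
    if (secs <= 0.0)
        return -1.0;

    return ((double)size * (double)iterations) / (1024.0 * 1024.0) / secs;
}
```

Calling something like memcpy_mb_per_sec(1 << 20, 1000) before and after applying the patch, with several buffer sizes, should give a rough comparison. Note that GCC may expand memcpy inline rather than call the libc routine, so building with -fno-builtin-memcpy is probably needed for the numbers to reflect the assembly version at all.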