Neil Booth | 1 Sep 03:43 2007

Re: accuracy of "long double"

Matthias Drochner wrote:-

> 
> neil <at> daikokuya.co.uk said:
> > extern int a[USHRT_MAX * 0 - 1 < 0 ? 1: -1]; 
> 
> Yep, you are right.
> (I first checked DEC OSF/1 which is usually very correct wrt standards
> and found that it has the same problem.)
> Same for UCHAR8_MAX et al...
> 
> > I put in a PR for this a long time ago.
> 
> I see -- lib/31306.
> I've just rebuilt my system with fixed headers and didn't see any fallout.
> So I'll commit fixes in a minute.

Looks great, thanks a lot!

Neil.
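The quoted check relies on C's integer promotions. The standard requires USHRT_MAX to have the type that an unsigned short gets after promotion, which is int whenever short is narrower than int (as on NetBSD/amd64). If a header spells the limit as an unsigned literal, `USHRT_MAX * 0 - 1` is computed in unsigned arithmetic, is never negative, and the array size becomes -1, which fails to compile. A minimal sketch of the same check, assuming a 16-bit short and a 32-bit int; the BAD_/GOOD_ macros are hypothetical definitions for illustration, not NetBSD's actual headers:

```c
#include <limits.h>

/* Compile-time check from the thread: a negative array size is an error,
   so this declaration only compiles if USHRT_MAX has a signed (promoted) type. */
extern int ushrt_max_is_int[USHRT_MAX * 0 - 1 < 0 ? 1 : -1];

/* Hypothetical header definitions, for illustration only. */
#define BAD_USHRT_MAX  0xffffU /* unsigned int: the wrong type */
#define GOOD_USHRT_MAX 0xffff  /* int: the promoted type of unsigned short */

/* Each returns 1 if `max * 0 - 1` is negative, i.e. the macro is signed. */
int bad_max_is_signed(void)  { return BAD_USHRT_MAX * 0 - 1 < 0; }
int good_max_is_signed(void) { return GOOD_USHRT_MAX * 0 - 1 < 0; }
```

With the unsigned definition the expression wraps to UINT_MAX, so the same array-size test would reject it at compile time.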

Martti Kuparinen | 10 Sep 11:19 2007

double fault in 4.0_RC1

Hi,

One of our servers (Dell PE1800) running NetBSD/amd64 4.0_RC1 has been very unstable
since the 3.1.1 to 4.0_RC1 upgrade ("upgrade" is the wrong word; it was installed
from scratch). We quite often see "fatal double fault in supervisor mode"
followed by "trap 13". The whole machine hangs, and no trace appears on the
console even though we see "Begin traceback..."

What makes it harder is that the kernel debugger (Ctrl+Alt+Esc) does not work
at all, even though this is the diff against GENERIC:

Any ideas?

Martti

--- GENERIC     2007-09-06 11:04:40.000000000 +0300
+++ D146  2007-09-10 08:08:09.000000000 +0300
@@ -24,7 +24,7 @@

  #ident                 "GENERIC-$Revision: 1.120.2.12 $"

-maxusers       32              # estimated number of users
+maxusers       128             # estimated number of users

  # delay between "rebooting ..." message and hardware reset, in milliseconds
  #options       CPURESET_DELAY=2000
@@ -87,12 +87,12 @@
  # Because gcc omits the frame pointer for any -O level, the line below
  # is needed to make backtraces in DDB work.
  #

Blair Sadewitz | 11 Sep 11:08 2007

optimizations [for non-debugging] amd64 kernels

Taking a cue from FreeBSD, I built a kernel with the following added
to COPTS on amd64:

optimizations:

-finline-limit=8000 --param inline-unit-growth=100 --param
large-function-growth=1000 -frename-registers

safety/paranoia:

-mno-mmx -mno-sse -mno-sse2 -mno-sse3 -mno-3dnow -msoft-float -mfpmath=387

One can also add -fno-asynchronous-unwind-tables, but [unless the
compiler does something wrong] IIRC these tables don't actually get
loaded into memory.

On EM64T machines, -march=nocona is also a win.

I haven't done any formal benchmarks, but the difference seems clear
when compiling packages.

--Blair

Hubert Feyrer | 11 Sep 11:08 2007

Re: optimizations [for non-debugging] amd64 kernels

On Tue, 11 Sep 2007, Blair Sadewitz wrote:
> I haven't done any formal benchmarks, but the difference seems clear
> when compiling packages.

Now - speed, time, diskspace, ...?
Got numbers?

  - Hubert

Hubert Feyrer | 11 Sep 11:09 2007

Re: optimizations [for non-debugging] amd64 kernels

On Tue, 11 Sep 2007, Hubert Feyrer wrote:
> Now - speed, time, diskspace, ...?

Doh, s/Now/How/

Blair Sadewitz | 11 Sep 13:09 2007

Re: optimizations [for non-debugging] amd64 kernels

On 9/11/07, Hubert Feyrer <hubert <at> feyrer.de> wrote:
> On Tue, 11 Sep 2007, Hubert Feyrer wrote:
> > Now - speed, time, diskspace, ...?
>
> Doh, s/Now/How/
>

Oh, speed/time.  It's a lot faster.  My GENERIC.MP kernel was built
with -march=nocona, so I know it's not that alone.  When I get a
chance, I'll time some compile jobs.

Also, at:

http://bahar.aydogan.net/~blair/amd64-string.diff

is an enhancement for x86_64 memcpy/bzero/bcopy functions in
common/libc.  This is authored by fuyuki <at> hadaly.org and is a slight
modification of the latest version (<see
http://www.hadaly.org/fuyuki>) of what was originally posted in a PR
back around Jan/Feb.
I changed the size given to the cmpq instruction right below the
remark on non-temporal hints to a quarter of my CPU's cache size
(2MB/4).  I'm not sure what it should be by default.  Also, I added
the #ifndef _KERNEL, as AFAIK the kernel doesn't copy such long
strings.  I've been using this for ~6 mos now with no ill effects
insofar as I can tell.

I shared this with christos <at>  about 6-8 weeks ago,
and he said that it looked good to him.  I posted it to the list also,
but there was no response.
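The threshold described above (the cmpq against a quarter of the cache size) decides when the routine switches to non-temporal stores, which write around the cache so that a very large copy does not evict data already cached. The sketch below models only that decision in C; the diff itself is assembly, and the enum, the function name and the choice of >= rather than > are assumptions for illustration:

```c
#include <stddef.h>

/* Hypothetical names; the real routine is assembly. */
enum copy_strategy { COPY_CACHED, COPY_NONTEMPORAL };

/* Assumed policy: copies of at least cache_bytes / 4 use non-temporal
   stores so they do not flush the existing cache contents; smaller
   copies use ordinary cached stores. */
enum copy_strategy pick_copy_strategy(size_t len, size_t cache_bytes)
{
    size_t threshold = cache_bytes / 4;   /* 2MB / 4 = 512KB in this setup */

    return len >= threshold ? COPY_NONTEMPORAL : COPY_CACHED;
}
```

Passing the cache size as a parameter reflects the open question of what the default should be; a real build would need to pick or probe it.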

Andrew Doran | 11 Sep 13:25 2007

Re: optimizations [for non-debugging] amd64 kernels

On Tue, Sep 11, 2007 at 07:09:31AM -0400, Blair Sadewitz wrote:

> Also, at:
> 
> http://bahar.aydogan.net/~blair/amd64-string.diff
> 
> is an enhancement for x86_64 memcpy/bzero/bcopy functions in
> common/libc.  This is authored by fuyuki <at> hadaly.org and is a slight
> modification of the latest version (<see
> http://www.hadaly.org/fuyuki>) of what was originally posted in a PR
> back around Jan/Feb.
...
> I'd appreciate it if someone who actually knew x86_64 assembly would
> take a look at this and/or if others would test it so we could get it
> in the tree at some point.

The setup and teardown for stos/movs/cmps are really expensive and for small
strings (like under 512 bytes) you're better off with really simple loops
using the arithmetic instructions.

Andrew
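The point about setup cost can be sketched as a size-based dispatch: below a cut-over (512 bytes here, taken from the rough figure above) copy with a plain load/store loop, and only above it pay the setup cost of a string instruction. This is a minimal sketch, not the NetBSD implementation; hypothetical_memcpy and its helpers are illustrative names, and on compilers other than GCC-compatible ones targeting x86-64 the string instruction is replaced by a call to memcpy:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Cut-over from the rough figure in the thread; the best value is CPU-specific. */
#define SMALL_COPY_MAX 512

/* Simple loop: 8 bytes at a time with plain loads and stores, then the tail. */
static void copy_small(unsigned char *d, const unsigned char *s, size_t n)
{
    while (n >= 8) {
        uint64_t w;
        memcpy(&w, s, 8);   /* fixed-size memcpy is typically a single load */
        memcpy(d, &w, 8);   /* ... and a single store */
        d += 8; s += 8; n -= 8;
    }
    while (n--)
        *d++ = *s++;
}

/* Large copies: use the string instruction, whose setup cost is now amortized. */
static void copy_large(unsigned char *d, const unsigned char *s, size_t n)
{
#if defined(__x86_64__) && defined(__GNUC__)
    __asm__ volatile ("rep movsb"
                      : "+D"(d), "+S"(s), "+c"(n)
                      :
                      : "memory");
#else
    memcpy(d, s, n);        /* portable stand-in for the string instruction */
#endif
}

void *hypothetical_memcpy(void *dst, const void *src, size_t n)
{
    if (n < SMALL_COPY_MAX)
        copy_small(dst, src, n);
    else
        copy_large(dst, src, n);
    return dst;
}
```

The right cut-over depends on the CPU and should come from measurement rather than from this constant.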

Thor Lancelot Simon | 11 Sep 22:14 2007

Re: >100K interrupts/s on IOAPIC 0 Pin 9

On Tue, Sep 11, 2007 at 05:28:02PM +0200, Edgar Fuß wrote:

[reformatted to not have lines hundreds of characters long; please,
 it's hard to read your email if you don't break lines before 80
 characters!]
>
> I have a freshly installed 4.0_RC2/i386 system where the CPU spends
> more than 84% in interrupt mode when the machine is idle.   With
> sysstat vm, I notice ~106000 interrupts on ioapic0 pin 9. On the
> other hand, dmesg doesn't show anything interrupting on that pin.
> Any idea how to track this down? I've tracked down some interrupt
> problems on an amd64 system some months ago, but forgotten most
> of it in the meantime.  Please cc: me since I'm not subscribed to
> port-i386, only port-amd64.

This is almost certainly an issue with the ACPI system-controller
interrupt, which is misconfigured at boot time by many systems.
There have been several efforts to add code to NetBSD to fix this
but I don't believe any have been committed as each turned out to
be not quite right in some way.

A workaround may be to turn off ACPI if you have it turned on, or
on if you have it turned off.

-- 
  Thor Lancelot Simon	                                     tls <at> rek.tjls.com

  "The inconsistency is startling, though admittedly, if consistency is to
   be abandoned or transcended, there is no problem."	      - Noam Chomsky


Edgar Fuß | 12 Sep 18:18 2007

Re: >100K interrupts/s on IOAPIC 0 Pin 9

> This is almost certainly an issue with the ACPI system-controller
> interrupt, which is misconfigured at boot time by many systems.
What exactly do you mean by ``at boot time''? Do you mean it's
misconfigured by the BIOS (or that the ACPI information given by the BIOS
is wrong), or is it misconfigured by NetBSD?

> A workaround may be to turn off ACPI if you have it turned on, or
> on if you have it turned off.
Thanks for that hint. GENERIC.NOACPI doesn't exhibit the problem.

> There have been several efforts to add code to NetBSD to fix this
> but I don't believe any have been committed as each turned out to
> be not quite right in some way.
Any way I can help to improve that? Anything I can test?

David Laight | 16 Sep 00:44 2007

Re: optimizations [for non-debugging] amd64 kernels

On Tue, Sep 11, 2007 at 12:25:49PM +0100, Andrew Doran wrote:
> On Tue, Sep 11, 2007 at 07:09:31AM -0400, Blair Sadewitz wrote:
> 
> > Also, at:
> > 
> > http://bahar.aydogan.net/~blair/amd64-string.diff
> > 
> > is an enhancement for x86_64 memcpy/bzero/bcopy functions in
> > common/libc.  This is authored by fuyuki <at> hadaly.org and is a slight
> > modification of the latest version (<see
> > http://www.hadaly.org/fuyuki>) of what was originally posted in a PR
> > back around Jan/Feb.
> ...
> > I'd appreciate it if someone who actually knew x86_64 assembly would
> > take a look at this and/or if others would test it so we could get it
> > in the tree at some point.
> 
> The setup and teardown for stos/movs/cmps are really expensive and for small
> strings (like under 512 bytes) you're better off with really simple loops
> using the arithmetic instructions.

Worse still, 'rep movsd' falls foul of the Athlon 'store-load' optimiser
when the source and destination addresses are separated by a multiple of
some (relatively small) power of 2 - as they would be for kernel COW.
The code must do 'load, load, store, store' to avoid this.

OTOH ISTR the latest Intel CPU has an optimiser for 'rep movsl' that
performs adequately aligned copies using cache line read-writes.
It might also have fast setup for them ....
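The suggested ordering, loading two words before storing either, can be sketched in C as below, together with a check for the problematic case where source and destination are a multiple of some power of two apart. The power of two is left as a parameter because the message does not give its value; for page-aligned COW copies, any power of two up to the page size divides the distance. The function names are hypothetical, and since a C compiler may reorder or vectorize these statements, code that must guarantee the instruction order would be written in assembly:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: nonzero if dst and src are a multiple of pow2 bytes
   apart (pow2 must be a power of two). Unsigned wraparound keeps this
   correct when dst < src. */
int same_offset_mod(const void *dst, const void *src, size_t pow2)
{
    uintptr_t diff = (uintptr_t)dst - (uintptr_t)src;

    return (diff & (uintptr_t)(pow2 - 1)) == 0;
}

/* Hypothetical copy loop in "load, load, store, store" order. */
void copy_ll_ss(uint64_t *dst, const uint64_t *src, size_t nwords)
{
    size_t i = 0;

    for (; i + 2 <= nwords; i += 2) {
        uint64_t a = src[i];       /* load  */
        uint64_t b = src[i + 1];   /* load  */
        dst[i] = a;                /* store */
        dst[i + 1] = b;            /* store */
    }
    if (i < nwords)                /* odd word left over */
        dst[i] = src[i];
}
```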


