KettosAllampolgarsag | 3 Nov 13:12 2004

Extend toward it a protecting arm!


Extend toward it a protecting arm!

The line above is from the Himnusz, the Hungarian national anthem. If its poet, Ferenc Kölcsey, were alive today, he could not get ahead
in his mother tongue, could not conduct his official business in Hungarian, could enter Hungary only
by showing 500 euros, could spend only 90 days here in any half year, and if for some reason he overstayed
that limit, the immigration authorities would ban him from the country for a long time. More>>> http://www.kettosallampolgarsag.hu/index.php?name=altema&to_include=credo.php

What are we voting on?

In the referendum, initiated with nearly half a million supporting signatures, we have to decide
on the following question:
"Do you want the National Assembly to pass a law under which Hungarian citizenship is granted by
preferential naturalization, upon request, to any person who declares themselves to be of Hungarian
nationality, does not live in Hungary, and is not a Hungarian citizen, and who proves their Hungarian
nationality with the "Hungarian Certificate" under Section 19 of Act LXII of 2001 or in another way
specified in the law to be enacted?" More>>>  http://www.kettosallampolgarsag.hu/index.php?name=nepsz

HUNGARIAN CITIZENSHIP FOR EVERY HUNGARIAN!
http://www.kettosallampolgarsag.hu/

Markus W Kilbinger | 4 Nov 17:06 2004

2.0_RC4 and -current instability, data corruption and system hang ups

Hi!

After playing about 2 weeks with my 'new' qube2 I couldn't get rid of
the following problem with -current (2.99.10 as of the last days) and
2.0_RC4 kernels/userlands (cross compiled on NetBSD/i386):

- Corrupted files/data streams mostly at 32 byte boundaries (and for
  32 bytes length), quite randomly spread, for writing and reading
  files.

  This only seems to happen under heavy load, especially when combined
  with different I/O media (e.g. ATA and network). A simple example I
  am seeing at the moment is when comparing 'identical' copies of my
  cobalt 2.0_RC4 ISO image (about 109 MB): I copied my original
  netbsd-cobalt.iso file multiple times under different names into the
  same filesystem on my qube2's hard disk and compared them
  (cmp -l [version1] [version2]) -> I see constant (happened during
  writing) and randomly appearing (during reading) diffs of these
  files, mostly 32 bytes long!?

  I hope it's not an issue of my hardware, pkgsrc/sysutils/memtester
  at least passed multiple times w/o any error... (how to selectively
  test ata i/o?)

  Does anybody else see this problem?

  Are there known issues with mips/cobalt pmap/uvm/ubc stuff?

- Sudden hangups under heavier load, mostly under continuous i/o
  traffic (ata and/or network, beyond 100 mb data volume). There is no
  [...]
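
The raw `cmp -l` output described above lists one differing byte per line, which makes the 32-byte pattern hard to see. A minimal sketch of a helper that groups differing bytes into runs and flags the ones that start on a 32-byte boundary and span a whole number of 32-byte units follows. The function names `diff_runs` and `summarize` are hypothetical, and the 32-byte granularity is taken from the report, not from any tool's documentation.

```python
CACHE_LINE = 32  # bytes; the granularity reported in the message above


def diff_runs(path_a, path_b, chunk_size=1 << 16):
    """Return (offset, length) runs where the two files differ.

    Only the common prefix is compared if the files differ in size.
    """
    runs = []
    run_start = None  # offset where the current run of differences began
    offset = 0        # file offset of the first byte of the current chunk
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        while True:
            a = fa.read(chunk_size)
            b = fb.read(chunk_size)
            n = min(len(a), len(b))
            if n == 0:
                break
            for i in range(n):
                if a[i] != b[i]:
                    if run_start is None:
                        run_start = offset + i
                elif run_start is not None:
                    # First matching byte after a run: close the run.
                    runs.append((run_start, offset + i - run_start))
                    run_start = None
            offset += n
    if run_start is not None:
        runs.append((run_start, offset - run_start))
    return runs


def summarize(runs, line=CACHE_LINE):
    """Tag each run with whether it is aligned to, and a multiple of, `line`."""
    return [(off, length, off % line == 0 and length % line == 0)
            for off, length in runs]
```

Calling `summarize(diff_runs("copy1.iso", "copy2.iso"))` on two copies would give one tuple per damaged region. Note that a corrupted line can contain bytes that happen to match the original, so a single bad 32-byte line may show up as several shorter runs.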

Manuel Bouyer | 4 Nov 17:24 2004

Re: 2.0_RC4 and -current instability, data corruption and system hang ups

On Thu, Nov 04, 2004 at 05:06:05PM +0100, Markus W Kilbinger wrote:
> Hi!
> 
> After playing about 2 weeks with my 'new' qube2 I couldn't get rid of
> the following problem with -current (2.99.10 as of the last days) and
> 2.0_RC4 kernels/userlands (cross compiled on NetBSD/i386):
> 
> - Corrupted files/data streams mostly at 32 byte boundaries (and for
>   32 bytes length), quite randomly spread, for writing and reading
>   files.

Could be a cache sync issue. What CPU is there in this box ?
Also what devices do you use for disk and network I/O ?

-- 
Manuel Bouyer <bouyer <at> antioche.eu.org>
     NetBSD: 26 ans d'experience feront toujours la difference
--

Markus W Kilbinger | 4 Nov 17:33 2004

Re: 2.0_RC4 and -current instability, data corruption and system hang ups

>>>>> "Manuel" == Manuel Bouyer <bouyer <at> antioche.lip6.fr> writes:

    >> - Corrupted files/data streams mostly at 32 byte boundaries
    >>   (and for 32 bytes length), quite randomly spread, for writing
    >>   and reading files.

    Manuel> Could be a cache sync issue. What CPU is there in this box
    Manuel> ? Also what devices do you use for disk and network I/O ?

dmesg excerpt:

  total memory = 256 MB
  avail memory = 246 MB
  mainbus0 (root)
  com0 at mainbus0 addr 0x1c800000 level 3: st16650a, working fifo
  com0: console
  cpu0 at mainbus0: QED RM5200 CPU (0x28a0) Rev. 10.0 with built-in FPU Rev. 10.0
  cpu0: 32KB/32B 2-way set-associative L1 Instruction cache, 48 TLB entries
  cpu0: 32KB/32B 2-way set-associative write-back L1 Data cache
  panel0 at mainbus0 addr 0x1f000000
  gt0 at mainbus0 addr 0x14000000
  pci0 at gt0
  pci0: i/o space, memory space enabled, rd/line, wr/inv ok
  [...]
  tlp0 at pci0 dev 7 function 0: DECchip 21143 Ethernet, pass 4.1
  tlp0: interrupting at level 1
  tlp0: Ethernet address 00:10:e0:00:c3:c6
  lxtphy0 at tlp0 phy 1: LXT970 10/100 media interface, rev. 3
  lxtphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
  [...]

Manuel Bouyer | 4 Nov 19:10 2004

Re: 2.0_RC4 and -current instability, data corruption and system hang ups

On Thu, Nov 04, 2004 at 05:33:25PM +0100, Markus W Kilbinger wrote:
> >>>>> "Manuel" == Manuel Bouyer <bouyer <at> antioche.lip6.fr> writes:
> 
>     >> - Corrupted files/data streams mostly at 32 byte boundaries
>     >>   (and for 32 bytes length), quite randomly spread, for writing
>     >>   and reading files.
> 
>     Manuel> Could be a cache sync issue. What CPU is there in this box
>     Manuel> ? Also what devices do you use for disk and network I/O ?
> 
> dmesg excerpt:
> 
>   total memory = 256 MB
>   avail memory = 246 MB
>   mainbus0 (root)
>   com0 at mainbus0 addr 0x1c800000 level 3: st16650a, working fifo
>   com0: console
>   cpu0 at mainbus0: QED RM5200 CPU (0x28a0) Rev. 10.0 with built-in FPU Rev. 10.0
>   cpu0: 32KB/32B 2-way set-associative L1 Instruction cache, 48 TLB entries
>   cpu0: 32KB/32B 2-way set-associative write-back L1 Data cache

32B lines; this matches what you've reported.

>   [...]
> 
> 'tlp0' and 'viaide0' are all onboard devices.

tlp0 should have bus_dma related bugs, as it's known to work on alpha,
sparc64, and other hardware where bus_dma_sync() isn't a NOP.
viaide doesn't use custom DMA callbacks, so it should be safe too.
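
To make the cache-line argument concrete, here is a toy model (not NetBSD code) of a write-back data cache with 32-byte lines. It only illustrates why a single missed write-back would appear as a 32-byte, 32-byte-aligned block of stale data to anything that reads RAM directly, such as a DMA-capable device. The class `ToyWriteBackCache` and all other names are hypothetical; it says nothing about where the actual bug is.

```python
LINE = 32  # bytes per L1 data cache line, as in the dmesg above


class ToyWriteBackCache:
    """Hypothetical model: CPU stores land in cache lines, not in RAM."""

    def __init__(self, memory):
        self.memory = memory  # bytearray standing in for RAM
        self.dirty = {}       # line index -> bytearray(LINE) not yet in RAM

    def cpu_write(self, addr, data):
        """Store bytes through the cache, allocating lines on demand."""
        for i, byte in enumerate(data):
            line, off = divmod(addr + i, LINE)
            if line not in self.dirty:
                start = line * LINE
                self.dirty[line] = bytearray(self.memory[start:start + LINE])
            self.dirty[line][off] = byte

    def write_back(self, skip=()):
        """Flush dirty lines to RAM, except the line indices in `skip`."""
        for line in sorted(self.dirty):
            if line in skip:
                continue  # models a write-back that never happened
            start = line * LINE
            self.memory[start:start + LINE] = self.dirty.pop(line)


def stale_offsets(ram, intended):
    """Offsets where RAM does not hold what the CPU meant to write."""
    return [i for i, (r, w) in enumerate(zip(ram, intended)) if r != w]


def demo():
    ram = bytearray(256)
    cache = ToyWriteBackCache(ram)
    payload = bytes([0xAA]) * 256
    cache.cpu_write(0, payload)
    cache.write_back(skip={3})  # line 3 covers bytes 96..127
    return stale_offsets(ram, payload)
```

In this model `demo()` reports exactly the 32 offsets of the skipped line, which is the shape of damage described in the original report.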

Markus W Kilbinger | 4 Nov 19:38 2004

Re: 2.0_RC4 and -current instability, data corruption and system hang ups

>>>>> "Manuel" == Manuel Bouyer <bouyer <at> antioche.eu.org> writes:

    Manuel> Could be a cache sync issue. What CPU is there in this box
    Manuel> ? Also what devices do you use for disk and network I/O ?

    >> cpu0 at mainbus0: QED RM5200 CPU (0x28a0) Rev. 10.0 with built-in FPU Rev. 10.0
    >> cpu0: 32KB/32B 2-way set-associative L1 Instruction cache, 48 TLB entries
    >> cpu0: 32KB/32B 2-way set-associative write-back L1 Data cache

    Manuel> 32B lines; this matches what you've reported.

Does this help (anybody) to understand what's going wrong here then?

    >> 'tlp0' and 'viaide0' are all onboard devices.

    Manuel> tlp0 should have bus_dma related bugs, as it's known to
    Manuel> work on alpha, sparc64, and other hardware where
    Manuel> bus_dma_sync() isn't a NOP.

That means the tlp driver is buggy in cobalt mips machines?

Besides performance issues, pure tlp traffic (e.g. routing) doesn't
seem to be problematic (but I didn't test this explicitly)...

During my testing I once put in a 3Com 3c905B NIC, which was
detected and configured properly (ex0), but the performance was so bad
(20-30 kB/sec) that I gave up on it.

    Manuel> viaide doesn't use custom DMA callbacks, so it should be
    Manuel> safe too.

Manuel Bouyer | 4 Nov 19:43 2004

Re: 2.0_RC4 and -current instability, data corruption and system hang ups

On Thu, Nov 04, 2004 at 07:38:38PM +0100, Markus W Kilbinger wrote:
> >>>>> "Manuel" == Manuel Bouyer <bouyer <at> antioche.eu.org> writes:
> 
>     Manuel> Could be a cache sync issue. What CPU is there in this box
>     Manuel> ? Also what devices do you use for disk and network I/O ?
> 
>     >> cpu0 at mainbus0: QED RM5200 CPU (0x28a0) Rev. 10.0 with built-in FPU Rev. 10.0
>     >> cpu0: 32KB/32B 2-way set-associative L1 Instruction cache, 48 TLB entries
>     >> cpu0: 32KB/32B 2-way set-associative write-back L1 Data cache
> 
>     Manuel> 32B lines; this matches what you've reported.
> 
> Does this help (anybody) to understand what's going wrong here then?

Probably a bug in the cache management routines, but I can't tell you more.
I'm not familiar with this part of the code.

> 
>     >> 'tlp0' and 'viaide0' are all onboard devices.
> 
>     Manuel> tlp0 should have bus_dma related bugs, as it's known to
>     Manuel> work on alpha, sparc64, and other hardware where
>     Manuel> bus_dma_sync() isn't a NOP.
> 
> That means the tlp driver is buggy in cobalt mips machines?

No, sorry, there is a missing "no" here. As far as I know, the tlp driver
should be fine.


Markus W Kilbinger | 4 Nov 19:55 2004

Re: 2.0_RC4 and -current instability, data corruption and system hang ups

>>>>> "Manuel" == Manuel Bouyer <bouyer <at> antioche.eu.org> writes:

    Manuel> Could be a cache sync issue. What CPU is there in this box
    Manuel> ? Also what devices do you use for disk and network I/O ?
    >> 
    >> >> cpu0 at mainbus0: QED RM5200 CPU (0x28a0) Rev. 10.0 with built-in FPU Rev. 10.0
    >> >> cpu0: 32KB/32B 2-way set-associative L1 Instruction cache, 48 TLB entries
    >> >> cpu0: 32KB/32B 2-way set-associative write-back L1 Data cache
    >> 
    Manuel> 32B lines; this matches what you've reported.
    >> 
    >> Does this help (anybody) to understand what's going wrong here then?

    Manuel> Probably a bug in the cache management routines, but I
    Manuel> can't tell you more. I'm not familiar with this part of the code.

Maybe somebody else within these lists can help here?

    Manuel> tlp0 should have bus_dma related bugs, as it's known to
    Manuel> work on alpha, sparc64, and other hardware where
    Manuel> bus_dma_sync() isn't a NOP.
    >> 
    >> That means the tlp driver is buggy in cobalt mips machines?

    Manuel> No, sorry, there is a missing "no" here. As far as I know,
    Manuel> the tlp driver should be fine.

Ah, ok (at least some kind of de-confusion for me ;-)).

Thanks for your comments!

David Brownlee | 5 Nov 12:27 2004

Re: 2.0_RC4 and -current instability, data corruption and system hang ups

 	Is there anyone who would be able to seriously look at this
 	on NetBSD but does not have a machine? If so, speak up, and
 	we should find a way to get a machine to you...

-- 
 		David/absolute       -- www.NetBSD.org: No hype required --

Simon Burge | 10 Nov 01:05 2004

Re: data corruption (Re: Binding more than one IP to a NIC)

[ CC'd to port-mips as well ]

Markus W Kilbinger wrote:

> >>>>> "Izumi" == Izumi Tsutsui <tsutsui <at> ceres.dti.ne.jp> writes:
> 
> First: Thanks for all infos!
> 
>     Izumi> Well, AFAIK the problem on cobalt PCI implementation
>     Izumi> affects memory mapped PCI devices (like siop), so I'm not
>     Izumi> sure if it fixes your "data corruption" problem. (patch
>     Izumi> attached, including MI changes filed in kern/27423)
> 
> No, the data corruption problem remains, but my qube2 became much more
> stable with your PCI-related patches! Besides the data corruption
> problems, I now 'only' see panics of the following kind under heavy
> load:
> 
>   trap: TLB miss (load or instr. fetch) in kernel mode
>   status=0x2403, cause=0x8808, epc=0x80229214, vaddr=0xc874e000
>   pid=196 cmd=ttcp usp=0x7fffda10 ksp=0xc8749b08
>   Stopped in pid 196.1 (ttcp) at  netbsd:r5k_pdcache_wb_range_32+0x58: cache   0x19,0x1a0(a0)
>   db> t
>   r5k_pdcache_wb_range_32+58 (c874de60,c874e3e0,5ea,5ea) ra 8022e468 sz 0
>   8022e3d8+90 (c874de60,c874e3e0,5ea,5ea) ra 0 sz 0
>   User-level: pid 196.1
>   db>

I'm a little rusty on cache ops...
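
The numbers in the quoted panic can be cross-checked with a little arithmetic. The sketch below rests on assumptions that the trace itself does not guarantee: that the first value on the ddb trace line (0xc874de60) is still the content of register a0 at the faulting instruction, that the third value (0x5ea) is a byte count, and that the page size is 4 KB. ddb's argument display on MIPS is not always reliable, so treat this as a plausibility check, not a diagnosis.

```python
PAGE_SIZE = 4096  # assumption: 4 KB pages on this port
LINE = 32         # L1 data cache line size from the dmesg earlier in the thread

FAULT_VADDR = 0xC874E000  # vaddr= in the trap message
A0_ASSUMED = 0xC874DE60   # first value on the ddb trace line, assumed to be a0
DISPLACEMENT = 0x1A0      # from the faulting insn "cache 0x19,0x1a0(a0)"
LEN_ASSUMED = 0x5EA       # third traced value, read here as a byte count


def cache_op_target(base, displacement):
    """Address a MIPS 'cache op,disp(base)' instruction operates on."""
    return base + displacement


def crosses_page(start, length, page_size=PAGE_SIZE):
    """True if [start, start + length) touches more than one page."""
    return start // page_size != (start + length - 1) // page_size


def report():
    target = cache_op_target(A0_ASSUMED, DISPLACEMENT)
    return {
        "target": hex(target),
        "target_is_fault_vaddr": target == FAULT_VADDR,
        "fault_on_page_boundary": FAULT_VADDR % PAGE_SIZE == 0,
        "start_line_aligned": A0_ASSUMED % LINE == 0,
        "length_decimal": LEN_ASSUMED,
        "range_crosses_page": crosses_page(A0_ASSUMED, LEN_ASSUMED),
    }
```

Under those assumptions the cache instruction's target equals the faulting address, which is the first byte of a new page, and a range of 0x5ea (1514) bytes starting at 0xc874de60 does extend into that page. That would be consistent with the write-back loop reaching a page with no valid TLB mapping, but it does not by itself say why the mapping is missing.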

