Adam Hamsik | 1 Mar 2010 08:52

Re: nfe problem on dom0 kernel 5.99.24 [need help]


On Feb,Saturday 27 2010, at 10:23 AM, Jean-Yves Migeon wrote:

> On 02/27/10 04:56, Adam Hamsik wrote:
>> Hi Manuel,
>> On Feb,Friday 26 2010, at 12:14 PM, Manuel Bouyer wrote:
>> 
>>> On Fri, Feb 26, 2010 at 04:44:21AM -0600, Sam Fourman Jr. wrote:
>>>>> If possible, boot with a xen-debug kernel, and post the xm dmesg. Failing to
>>>>> create DMA memory is likely to be due to an error returned by hypervisor.
>>>>> 
>> [snip]
>>
>> Do you have any ideas on how this can be debugged further?
>> 
>> Regards
>> 
>> Adam.
> 
> decrease_reservation fails for a page when the page does not belong to
> the guest, or when the args passed to decrease_reservation() are invalid
> (wrong order, wrong extent), which is unlikely here.
> 
> Please try with rev 1.18 of xen/x86/xen_bus_dma.c; hopefully the error message will be more meaningful.
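
The failure mode described in the quoted explanation can be illustrated with
a toy model. This is only a sketch under stated assumptions, not Xen's code:
it assumes the hypercall returns the number of extents it managed to release
and stops at the first extent it cannot release. The function name, the frame
numbers and the set-based ownership check are all hypothetical. Under that
reading, a return value of 0 for a one-extent request means nothing was
released.

```python
def decrease_reservation(owned_frames, extent_starts, extent_order=0):
    """Toy model: release extents in order, return how many were released.

    owned_frames  -- set of frame numbers the guest owns (hypothetical model)
    extent_starts -- first frame number of each extent to give back
    extent_order  -- each extent covers 2**extent_order frames
    """
    if extent_order < 0:
        return 0  # invalid argument: nothing is released
    pages_per_extent = 1 << extent_order
    done = 0
    for start in extent_starts:
        frames = range(start, start + pages_per_extent)
        # A frame that does not belong to the guest makes the extent fail.
        if not all(frame in owned_frames for frame in frames):
            break
        owned_frames.difference_update(frames)
        done += 1
    return done


guest_frames = {0x100, 0x101, 0x102}                  # made-up frame numbers
ok = decrease_reservation(guest_frames, [0x100])      # frame is owned
failed = decrease_reservation(guest_frames, [0x999])  # frame is not owned
print(ok, failed)
```

A caller that asks for one extent would treat any result other than 1 as a
failure, which is how the "boundary check" path would end up reporting an
error.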

Here is updated dmesg with your latest change.

nfe0: interrupting at ioapic0 pin 22, event channel 7
nfe0: Ethernet address 00:1a:92:c0:d4:b2
_xen_bus_dmamem_alloc_range boundary check
_bus_dmamem_alloc_range == uvm_pglistalloc

Jukka Marin | 1 Mar 2010 12:06

Re: amd64 + Xen + arcmsr = crash

On Fri, Feb 26, 2010 at 01:48:01AM +0100, Jean-Yves Migeon wrote:
> >I would like to replace the NetBSD i386 dom0 with amd64 port to be able to
> >access 8 GB of RAM.  However, amd64+xen does not work well with the Areca
> >controller - most of NetBSD builds make the system panic at a random point.
> 
> What are the panic messages, and which "random points"?

I'll post the panic message as soon as I can.  The system ran one build
without problems, but during the next build, the system crashed while it
was executing the last command below:

--- insn-recog.o ---
--- dependall-compat ---
--- dependall-lib ---
--- tests.po ---
#   compile  libatf-c++/tests.po

"Random points" means that the build process can crash the system at a random
time (not during the same command every time).

  -jm

haad | 1 Mar 2010 13:56

Re: nfe problem on dom0 kernel 5.99.24 [need help]

There is also another very strange thing which may be related to this
nfe behaviour.
If I try to unpack any tar.gz, it fails with strange errors.  See
attached output.
However, if I boot a GENERIC kernel everything works as expected. We
have checked the machine's memory with memtest several times and the
memory seems to work properly.

Does this make any sense to anyone?

pkgsrc/archivers/libarchive/files/libarchive/test/test_write_format_tar_ustar.c
pkgsrc/archivers/libarchive/files/libarchive/test/test_write_open_memory.c
pkgsrc/archivers/libarchive/files/libarchive/test/test_acl_freebsd.c
pkgsrc/archivers/libarchive/files/libarchive/test/test_compat_bzip2.c
pkgsrc/archivers/libarchive/files/libarchive/test/test_compat_bzip2_1.tbz.uu
pkgsrc/archivers/libarchive/files/libarchive/test/test_compat_bzip2_2.tbz.uu
pkgsrc/archivers/libarchive/files/libarchive/test/test_compat_cpio.c
pkgsrc/archivers/libarchive/files/libarchive/test/test_compat_cpio_1.cpio.uu
pkgsrc/archivers/libarchive/files/libarchive/test/test_compat_gtar_1.tar.uu
pkgsrc/archivers/libarchive/files/libarchive/test/test_compat_gzip_2.tgz.uu
pkgsrc/archivers/libarchive/files/libarchive/test/test_compat_lzma.c
pkgsrc/archivers/libarchive/files/libarchive/test/test_compat_lzma_1.tlz.uu
pkgsrc/archivers/libarchive/files/libarchive/test/test_compat_lzma_2.tlz.uu
pkgsrc/archivers/libarchive/files/libarchive/test/test_compat_lzma_3.tlz.uu
pkgsrc/archivers/libarchive/files/libarchive/test/test_compat_solaris_tar_acl.c
pkgsrc/archivers/libarchive/files/libarchive/test/test_compat_solaris_tar_acl.tar.uu
pkgsrc/archivers/libarchive/files/libarchive/test/test_compat_xz.c
gzip: pkgsrc/archivers/libarchive/files/libarchive/test/test_compat_xz_1.txz.uu
error writing to
outputpkgsrc/archivers/libarchive/files/libarchive/test/test_extattr_freebsd.c

Christoph Egger | 1 Mar 2010 14:57

netbsd crash


Hi!

When I boot a Xen/amd64 Dom0 it crashes
at

sys/arch/x86/pci/pci_machdep.c:348

pc_make_tag is a wild pointer.

Christoph

Jean-Yves Migeon | 2 Mar 2010 00:30

Re: nfe problem on dom0 kernel 5.99.24 [need help]

On 03/01/10 13:56, haad wrote:
> There is also another very strange thing which may be related to this
> nfe behaviour.
> If I try to unpack any tar.gz, it fails with strange errors.  See
> attached output.
> However, if I boot a GENERIC kernel everything works as expected. We
> have checked the machine's memory with memtest several times and the
> memory seems to work properly.
>
> Does this make any sense to anyone?

Same as before: check whether any warning shows up in dmesg or xm dmesg.

Other than that, I can't really say :/

-- 
Jean-Yves Migeon
jeanyves.migeon <at> free.fr

Jean-Yves Migeon | 2 Mar 2010 01:28

Re: nfe problem on dom0 kernel 5.99.24 [need help]

On 03/01/10 08:52, Adam Hamsik wrote:
 >>>>>> If possible, boot with a xen-debug kernel, and post the xm dmesg. Failing to
 >>>>>> create DMA memory is likely to be due to an error returned by hypervisor.
 >>>>>>
 >>> [snip]
 >>>
 >>> Do you have any ideas on how this can be debugged further?
 >>>
 > _xen_alloc_contig ->  uvm_pglistalloc == 0
 > xen_alloc_contig: XENMEM_decrease_reservation failed: err 0 (pa 0x208f000 mfn 0x209d70)
 > _xen_alloc_contig 12
 >[snip]
 > _xen_alloc_contig ->  uvm_pglistalloc
 > _xen_alloc_contig ->  uvm_pglistalloc == 0
 > xen_alloc_contig: XENMEM_decrease_reservation failed: err 0 (pa 0x2093000 mfn 0xce0f3)
 > _xen_alloc_contig 12
 > nfe1: could not create DMA map
 > nfe1: could not allocate Tx ring

I'll have a look at the hypervisor's code tomorrow.

How much RAM do you have for this host?

Could you post the result obtained with the attached patch please?


Sam Fourman Jr. | 2 Mar 2010 01:32

Re: nfe problem on dom0 kernel 5.99.24 [need help]

>
> I'll have a look tomorrow at hypervisor's code.
>
> How much RAM do you have for this host?
I can answer this one: it is option #6.

banner=Welcome to NetBSD
banner==================
banner=
banner=Please choose an option from the following menu:
menu=Boot normally:boot netbsd
menu=Boot single-user:boot netbsd -s
menu=Boot backup kernel:boot onetbsd
menu=Drop to boot prompt:prompt
menu=Boot Xen with 512MB for dom0:load /netbsd-XEN3_DOM0 console=pc;multiboot /xen-debug dom0_mem=512M
menu=Development  NetBSD kernel:load /netbsd-XEN3_DOM0-testing console=pc;multiboot /xen-debug dom0_mem=512M
menu=Boot Xen with 2048MB for dom0:load /XEN3_DOM0_20091215 console=pc;multiboot /xen dom0_mem=2048M

timeout=5
default=6
clear=1
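
To see which entry "option #6" refers to, the menu lines of the boot.cfg
above can be counted mechanically. The following is a minimal sketch, assuming
that `default=` is a 1-based index into the `menu=` lines in file order and
that each menu entry sits on a single line; the helper names are made up for
illustration.

```python
import re

BOOT_CFG = """\
menu=Boot normally:boot netbsd
menu=Boot single-user:boot netbsd -s
menu=Boot backup kernel:boot onetbsd
menu=Drop to boot prompt:prompt
menu=Boot Xen with 512MB for dom0:load /netbsd-XEN3_DOM0 console=pc;multiboot /xen-debug dom0_mem=512M
menu=Development  NetBSD kernel:load /netbsd-XEN3_DOM0-testing console=pc;multiboot /xen-debug dom0_mem=512M
menu=Boot Xen with 2048MB for dom0:load /XEN3_DOM0_20091215 console=pc;multiboot /xen dom0_mem=2048M
timeout=5
default=6
clear=1
"""


def parse_boot_cfg(text):
    """Split a boot.cfg into (label, command) menu entries and other settings."""
    menu, settings = [], {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        if key == "menu":
            # The label ends at the first colon; the rest is the command list.
            label, _, command = value.partition(":")
            menu.append((label, command))
        else:
            settings[key] = value
    return menu, settings


def default_entry(text):
    """Return the (label, command) pair selected by default= (assumed 1-based)."""
    menu, settings = parse_boot_cfg(text)
    return menu[int(settings["default"]) - 1]


label, command = default_entry(BOOT_CFG)
match = re.search(r"dom0_mem=(\S+)", command)
dom0_mem = match.group(1) if match else None
print(label, dom0_mem)
```

Under those assumptions the default entry is the development kernel booted
with xen-debug and dom0_mem=512M. Note that this is the memory given to dom0,
not the total RAM installed in the host.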

> Could you post the result obtained with the attached patch please?
>
> --
> Jean-Yves Migeon
> jeanyves.migeon <at> free.fr

Dustin Marquess | 2 Mar 2010 18:27

File-backed VMs under RAIDframe RAID-1

All,

I have a question regarding using RAIDframe RAID-1 & Xen.

Months ago, on one setup, a Linux VM that was P2V'd from a vendor
install (CentOS) got completely corrupted.  I wrote it off as a
vendor problem.

Today my Oracle DBA was complaining that Oracle died with 600 errors.
I logged on to the Xen dom0 and noticed that RAIDframe had kicked a
drive out:

Feb 24 03:37:23 fxnpvm02 /netbsd: wd0a: error reading fsbn 394348511 of 394348511-394348607 (wd0 bn 394348574; cn 391218 tn 13 sn 11), retrying
Feb 24 03:37:23 fxnpvm02 /netbsd: wd0: (uncorrectable data error)
Feb 24 03:37:23 fxnpvm02 /netbsd: ahcisata0 port 2: device present, speed: 3.0Gb/s
Feb 24 03:37:25 fxnpvm02 /netbsd: wd0a: error reading fsbn 394348511 of 394348511-394348607 (wd0 bn 394348574; cn 391218 tn 13 sn 11), retrying
Feb 24 03:37:25 fxnpvm02 /netbsd: wd0: (uncorrectable data error)
Feb 24 03:37:26 fxnpvm02 /netbsd: ahcisata0 port 2: device present, speed: 3.0Gb/s
Feb 24 03:37:28 fxnpvm02 /netbsd: wd0a: error reading fsbn 394348511 of 394348511-394348607 (wd0 bn 394348574; cn 391218 tn 13 sn 11), retrying
Feb 24 03:37:28 fxnpvm02 /netbsd: wd0: (uncorrectable data error)
Feb 24 03:37:28 fxnpvm02 /netbsd: ahcisata0 port 2: device present, speed: 3.0Gb/s

Dustin Marquess | 2 Mar 2010 19:13

Re: File-backed VMs under RAIDframe RAID-1

I forgot to mention that the dom0 is running FFSv2 + WAPBL.  It is
pinned to the first CPU and has 256MB RAM allocated to it.  The domU
has the other 3 CPUs and has 7.5 GB RAM + 2 bridged network
interfaces.  The domU is paravirt, not HVM.

-Dustin

Thor Lancelot Simon | 2 Mar 2010 19:46

Re: File-backed VMs under RAIDframe RAID-1

On Tue, Mar 02, 2010 at 11:27:30AM -0600, Dustin Marquess wrote:
> All,
> 
> I have a question regarding using RAIDframe RAID-1 & Xen.
> 
> Months ago, on one setup, a Linux VM that was P2V'd from a vendor
> install (CentOS) got completely corrupted.  I wrote it off as a
> vendor problem.
> 
> Today my Oracle DBA was complaining that Oracle died with 600 errors.
> I logged on to the Xen dom0 and noticed that RAIDframe had kicked a
> drive out:
> 
> Feb 24 03:37:23 fxnpvm02 /netbsd: wd0a: error reading fsbn 394348511 of 394348511-394348607 (wd0 bn 394348574; cn 391218 tn 13 sn 11), retrying
> Feb 24 03:37:23 fxnpvm02 /netbsd: wd0: (uncorrectable data error)

On a read error, RAIDframe *should* be returning you the data from the
other member of the RAID set, for a RAID-1 set.  But if there's any chance
the drive returned zeroes or bad data before it started actually returning
error... ugh.
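
The distinction drawn above, between a reported read error and silently bad
data, can be sketched with a toy mirror. This is not RAIDframe's code: the
`Disk` class, the `raid1_read` function and the block contents are all
hypothetical, and the model only assumes that a RAID-1 read falls back to the
other member when a member reports an error.

```python
class ReadError(Exception):
    """Raised when a disk reports an unrecoverable read error."""


class Disk:
    def __init__(self, blocks, bad=(), corrupt=None):
        self.blocks = dict(blocks)
        self.bad = set(bad)                 # blocks that report a read error
        self.corrupt = dict(corrupt or {})  # blocks that silently return wrong data

    def read(self, block):
        if block in self.bad:
            raise ReadError(block)
        # A silently corrupted block is returned without any error.
        return self.corrupt.get(block, self.blocks[block])


def raid1_read(members, block):
    """Return the block from the first member that does not report an error."""
    for disk in members:
        try:
            return disk.read(block)
        except ReadError:
            continue  # fall back to the next mirror member
    raise ReadError(block)


contents = {0: b"data0", 1: b"data1"}
wd0 = Disk(contents, bad={0}, corrupt={1: b"\x00" * 5})
wd1 = Disk(contents)

reported = raid1_read([wd0, wd1], 0)  # wd0 reports an error, wd1 serves the read
silent = raid1_read([wd0, wd1], 1)    # wd0 returns zeroes and nobody notices
print(reported, silent)
```

In this model block 0 comes back intact from the second member, while block 1
comes back as zeroes, because a plain mirror has nothing to compare the data
against when no error is reported.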

Thor

