Tobias Diedrich | 1 Jun 2008 10:48

kvm: unable to handle kernel NULL pointer dereference

Hi,

I get the following Oops when trying to start qemu-kvm
(Debian/unstable kvm package version 60+dfsg-1) on my system:

BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
IP: [<ffffffff8021d44f>] svm_vcpu_run+0x34/0x351
PGD 5aed6067 PUD 0 
Oops: 0000 [1] PREEMPT 
CPU 0 
Modules linked in: radeon drm snd_emu10k1_synth snd_emux_synth snd_seq_virmidi snd_seq_midi_emul snd_emu10k1 snd_seq_midi snd_rawmidi snd_ac97_codec emu10k1_gp snd_hda_intel ac97_bus snd_util_mem k8temp gameport snd_hwdep forcedeth pata_amd
Pid: 3125, comm: kvm Not tainted 2.6.26-rc4 #28
RIP: 0010:[<ffffffff8021d44f>]  [<ffffffff8021d44f>] svm_vcpu_run+0x34/0x351
RSP: 0018:ffff81006ff9bc38  EFLAGS: 00010046
RAX: ffff810049aca040 RBX: 00000000fffffffc RCX: 0000000000000000
RDX: ffff810049aca040 RSI: ffff81005a8aa000 RDI: ffff810049aca040
RBP: ffff81006ff9bc88 R08: 0000000000000002 R09: 0000000000000001
R10: ffffffff804237e5 R11: ffff81006ff9bc88 R12: ffff810049aca040
R13: 0000000000000000 R14: ffff81005a8aa000 R15: 000000000000ae80
FS:  00007f43c02946e0(0000) GS:ffffffff808bc000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000000008 CR3: 000000006ff21000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process kvm (pid: 3125, threadinfo ffff81006ff9a000, task ffff81005aec1340)
Stack:  ffff81006ff9bc68 ffff810049aca040 ffff810049aca040 ffff8100491d40a8
 ffff810049aca040 00000000fffffffc ffff810049aca040 0000000000000000
 ffff81005a8aa000 000000000000ae80 ffff81006ff9bcc8 ffffffff8020fa41
Call Trace:
 [<ffffffff8020fa41>] kvm_arch_vcpu_ioctl_run+0x46a/0x6df

Avi Kivity | 1 Jun 2008 11:21

Re: [patch 00/12] fake ACPI C2 emulation v2

Marcelo Tosatti wrote:
> Addressing comments on the previous patchset, follows:
>
> - Same fake C2 emulation
> - /dev/pmtimer
> - Support for multiple IO bitmap pages + userspace interface
> - In-kernel ACPI pmtimer emulation
>
> Tested with Linux and WinXP guests. Also tested migration.
>   

Do you have any performance numbers, comparing qemu/kernel/passthrough?

[Real review will be delayed as I am travelling; will try to do as much 
as I can]

--

-- 
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo <at> vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Andrea Arcangeli | 1 Jun 2008 16:32

Re: Benchmarking on CentOS 5

On Fri, May 30, 2008 at 07:30:41PM +0200, Farkas Levente wrote:
> without crash) and the performance is similar (ie 80-90%) to the 
> development version. unfortunately it seems currently there is no such 

I hope you didn't reach this conclusion about different performance
between enterprise kernel host and mainline kernel host because I
didn't answer promptly to your previous email, sorry!

With regard to your performance question, preempt notifier emulation
really shouldn't be noticeable in real life. It is only worth running
the fastest code if you're doing pure benchmarking, in which case you
want to avoid any unnecessary exception and heavyweight exit (that
should reduce the time to schedule by a couple thousand cycles, I
guess, but re-schedule events aren't so frequent and this is a
per-task per-cpu breakpoint with my latest emulation logic taking
advantage of the __switch_to internals).

I can hardly measure any difference here between SLES10 SP2 host and
mainline host with a simple dd from pagecache to /dev/null. The change
between enterprise kernel and current mainline kernel may be more
significant for the final performance than the impact of the preempt
notifier emulation. In any case the difference should be much less
than 5% as far as preempt notifier emulation is concerned. It should
be a couple of usec lost every couple of msec or so.
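
As a rough sanity check of the estimate above, the arithmetic can be sketched as follows. This is only an illustration: it assumes "a couple" means 2 in both cases, and the function name is made up for this note, not taken from any KVM code.

```python
def overhead_fraction(lost_usec: float, interval_msec: float) -> float:
    """Fraction of CPU time lost if `lost_usec` microseconds are spent
    once every `interval_msec` milliseconds."""
    return lost_usec / (interval_msec * 1000.0)


# Assumption: "a couple" == 2 for both the cost and the interval.
estimate = overhead_fraction(lost_usec=2, interval_msec=2)
print(f"{estimate:.2%} of CPU time")
```

Under those assumed figures the loss is about a tenth of a percent, which is consistent with the "much less than 5%" claim.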

Thomas Gleixner | 1 Jun 2008 18:34

Re: [patch 01/12] expose ACPI pmtimer to userspace (/dev/pmtimer)

On Thu, 29 May 2008, Marcelo Tosatti wrote:
> KVM wishes to allow direct guest access to the ACPI pmtimer. In that
> case QEMU/KVM has to read the current value for migration, so the proper
> syncing can be done on the destination.

I don't understand from the above which problem you are trying to
solve. Which pmtimer is read out, the one of the host (physical
hardware) or the one of the guest (emulated hardware)? What is synced
at the destination?

> This patch will not register the device if the chipset has an unreliable
> timer.

Can we please keep that code inside of drivers/clocksource/acpi_pm.c
without creating a new disconnected file in drivers/char?

Btw, depending on the use case we might as well have a sysfs entry for that.

> +static ssize_t pmtimer_read(struct file *file, char __user *buf, size_t count,
> +			    loff_t *ppos)
> +{
> +	int ret;
> +	__u32 value;
> +
> +	ret = -EINVAL;
> +	if (count < sizeof(u32))
> +		goto out;

    		return -EINVAL;
> +

Anthony Liguori | 1 Jun 2008 18:56

Re: [patch 01/12] expose ACPI pmtimer to userspace (/dev/pmtimer)

Thomas Gleixner wrote:
> Can we please keep that code inside of drivers/clocksource/acpi_pm.c
> without creating a new disconnected file in drivers/char ?
>
> Btw, depending on the use case we might as well have a sysfs entry for that.

I think sysfs would actually make a lot of sense for this.

Regards,

Anthony Liguori

Marcelo Tosatti | 1 Jun 2008 19:56

Re: [patch 01/12] expose ACPI pmtimer to userspace (/dev/pmtimer)

On Sun, Jun 01, 2008 at 06:34:27PM +0200, Thomas Gleixner wrote:
> On Thu, 29 May 2008, Marcelo Tosatti wrote:
> > KVM wishes to allow direct guest access to the ACPI pmtimer. In that
> > case QEMU/KVM has to read the current value for migration, so the proper
> > syncing can be done on the destination.
> 
> I don't understand from the above which problem you are trying to
> solve. Which pmtimer is read out, the one of the host (physical
> hardware) or the one of the guest (emulated hardware) ? What is synced
> at the destination ?

Problem is this:

We want to allow guests to directly access the host's pmtimer (by using
the I/O bitmap feature in VMX/SVM hardware). The advantage of doing it
is that no VMExits are necessary for guest pmtimer reads (which happen
often if we inform the guest that ACPI C1 state is supported, or if the
workload is gettimeofday() intensive).

If you migrate such a guest that has direct (i.e. non-virtualized, using
the physical hardware) pmtimer access to a different host (destination),
you need to save the current host pmtimer value at the time of migration
so that you can either emulate it with a proper offset or synchronize
(wait for the destination host's real hardware pmtimer value to be in
sync before actually resuming guest execution).
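
The two options described above (emulate with an offset, or wait until the destination counter is in sync) come down to the same modular subtraction. The sketch below is illustrative only and is not the code from the patch set: it assumes a 24-bit counter, ignores the time that passes while the guest is being transferred, and all function names are hypothetical. The 3.579545 MHz rate is the ACPI PM timer frequency.

```python
PMTMR_HZ = 3_579_545          # ACPI PM timer frequency in Hz
PMTMR_MASK = (1 << 24) - 1    # assumption: 24-bit counter (some chipsets use 32)


def counter_delta(saved_src: int, dst_now: int) -> int:
    """Ticks by which the saved source value is ahead of the destination
    counter, modulo the counter width."""
    return (saved_src - dst_now) & PMTMR_MASK


def emulated_read(dst_now: int, offset: int) -> int:
    """Option 1: emulate reads by adding a fixed offset to the
    destination host's counter."""
    return (dst_now + offset) & PMTMR_MASK


def ticks_to_seconds(ticks: int) -> float:
    """Option 2: how long to wait for the destination counter to reach
    the saved value before resuming the guest."""
    return ticks / PMTMR_HZ


# Example values (made up): source counter was close to wrapping.
offset = counter_delta(saved_src=0xFFFFF0, dst_now=0x000010)
print(hex(emulated_read(0x000010, offset)))   # value the guest sees at resume
print(ticks_to_seconds(offset))               # wait time for the sync option
```

With a 24-bit counter the sync option never has to wait longer than one full wrap, roughly 4.7 seconds, but the emulation option gives up the exit-free direct access that motivated the change.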

> > This patch will not register the device if the chipset has an unreliable
> > timer.
> 
> Can we please keep that code inside of drivers/clocksource/acpi_pm.c

Thomas Gleixner | 1 Jun 2008 20:17

Re: [patch 01/12] expose ACPI pmtimer to userspace (/dev/pmtimer)

On Sun, 1 Jun 2008, Marcelo Tosatti wrote:
> On Sun, Jun 01, 2008 at 06:34:27PM +0200, Thomas Gleixner wrote:
> 
> A sysfs entry sounds fine and much simpler. Should probably be a generic
> clocksource interface (so userspace can read any available clocksource)
> rather than acpi_pm specific.

Agreed.

> >   	return clocksource_acpi_pm.read == acpi_pm_read;
> > 
> > So we don't need reliable_pmtimer at all.
> 
> For KVM's use case, we'd rather not allow direct pmtimer access if the
> host has an unreliable (buggy) chipset.

well, "return clocksource_acpi_pm.read == acpi_pm_read;" is supposed
to do that just without an additional variable "reliable_pmtimer" :)

> But then, I doubt any of those older affected chipsets have HW
> virtualization support, so it shouldnt be an issue.

It's exactly one old crappy chipset, which definitely has no HW virt
support, and therefore we can just use read_pmtmr() w/o checking whether
it is reliable or not.

Thanks,
	tglx

Tan, Li | 2 Jun 2008 03:25

RE: [PATCH] [RFC][PATCH] kvm: kvmtrace: kvmtrace_format for supporting big_endian

Yes, this way we only need to handle 2 situations instead of 3.
Tan Li
-----Original Message-----
From: kvm-owner <at> vger.kernel.org [mailto:kvm-owner <at> vger.kernel.org] On Behalf Of Avi Kivity
Sent: May 28, 2008 19:17
To: Tan, Li
Cc: kvm <at> vger.kernel.org
Subject: Re: [PATCH] [RFC][PATCH] kvm: kvmtrace: kvmtrace_format for supporting big_endian

Tan, Li wrote:
> According to http://docs.python.org/lib/module-struct.html
> Character  Byte order     Size and alignment
> @          native         native
> =          native         standard
> <          little-endian  standard
>   

Oh okay.  You are relying on the user to supply the reverse flag.

But you don't need to do that.  You can start with the formats as "<I" 
and similar.  Read the magic word.  If it mismatches, switch to ">I" and 
start again.

This way you get autodetection of the format, and shorter code as well.
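
A minimal sketch of the autodetection described above, using only the standard struct module. The magic value and the record layout are placeholders chosen for illustration; the real kvmtrace header constant and format may differ.

```python
import struct

# Hypothetical magic word; substitute the real trace-file constant.
MAGIC = 0x12345678


def detect_byte_order(header: bytes, magic: int = MAGIC) -> str:
    """Return '<' or '>' depending on which byte order makes the first
    four bytes of `header` decode to `magic`."""
    for prefix in ("<", ">"):
        (value,) = struct.unpack(prefix + "I", header[:4])
        if value == magic:
            return prefix
    raise ValueError("magic word not found in either byte order")


def read_record(data: bytes, order: str, offset: int = 4) -> tuple:
    """Decode a made-up record of two unsigned 32-bit fields that
    follows the magic word, using the detected byte order."""
    return struct.unpack_from(order + "II", data, offset)


# Usage: a big-endian file is recognised without any user-supplied flag.
sample = struct.pack(">III", MAGIC, 1, 2)
order = detect_byte_order(sample)
print(order, read_record(sample, order))
```

This only works if the magic word is not a byte-wise palindrome, since such a value would decode identically in both orders.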

--

-- 
Do not meddle in the internals of kernels, for they are subtle and quick to panic.


Amit Shah | 2 Jun 2008 08:46

[PATCH 1/1] KVM/userspace: Support for assigning PCI devices to guest

From: Or Sagi <ors <at> tutis.com>
From: Nir Peleg <nir <at> tutis.com>
From: Amit Shah <amit.shah <at> qumranet.com>
From: Glauber de Oliveira Costa <gcosta <at> redhat.com>

We can assign a device from the host machine to a guest. The original code comes
from Neocleus.

A new command-line option, -pcidevice, is added.
For example, to invoke it for an Ethernet device sitting at
PCI bus:dev.fn 04:08.0 with host IRQ 18, use this:

-pcidevice Ethernet/04:08.0-18
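
To make the argument layout explicit, here is a small sketch that splits such a string into its parts. It is not the parser from this patch: it assumes the form name/bus:dev.fn-irq, with bus and dev in hexadecimal (as lspci prints them) and the IRQ in decimal, and the type and function names are invented for this example.

```python
import re
from typing import NamedTuple


class PciAssignment(NamedTuple):
    name: str
    bus: int
    dev: int
    fn: int
    irq: int


# Assumed layout: <name>/<bus>:<dev>.<fn>-<irq>
_PATTERN = re.compile(
    r"^(?P<name>[^/]+)/"
    r"(?P<bus>[0-9a-fA-F]{1,2}):(?P<dev>[0-9a-fA-F]{1,2})\.(?P<fn>[0-7])"
    r"-(?P<irq>\d+)$"
)


def parse_pcidevice(arg: str) -> PciAssignment:
    """Split a -pcidevice style argument into its fields."""
    m = _PATTERN.match(arg)
    if m is None:
        raise ValueError(f"malformed -pcidevice argument: {arg!r}")
    return PciAssignment(
        name=m["name"],
        bus=int(m["bus"], 16),   # assumption: hex, as in lspci output
        dev=int(m["dev"], 16),   # assumption: hex, as in lspci output
        fn=int(m["fn"]),
        irq=int(m["irq"]),       # assumption: decimal host IRQ number
    )


print(parse_pcidevice("Ethernet/04:08.0-18"))
```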

The host Ethernet driver must be removed before doing the passthrough.

If kvm uses the in-kernel irqchip, interrupts are routed to
the guest via the kvm module (accompanying kernel changes are necessary).
If -no-kvm-irqchip is used, the 'irqhook' module, also included here,
must be used.

Signed-off-by: Amit Shah <amit.shah <at> qumranet.com>
---
 Makefile                  |   10 +-
 irqhook/Kbuild            |    3 +
 irqhook/Makefile          |   25 ++
 irqhook/irqhook_main.c    |  215 ++++++++++++++
 kernel/Makefile           |    2 +
 libkvm/libkvm-x86.c       |    9 +-
 libkvm/libkvm.h           |   16 +

Han, Weidong | 2 Jun 2008 09:18

RE: [PATCH 1/1] KVM/userspace: Support for assigning PCI devices to guest

Amit Shah wrote:
> From: Or Sagi <ors <at> tutis.com>
> From: Nir Peleg <nir <at> tutis.com>
> From: Amit Shah <amit.shah <at> qumranet.com>
> From: Glauber de Oliveira Costa <gcosta <at> redhat.com>
> 
> We can assign a device from the host machine to a guest. The original
> code comes 
> from Neocleus.
> 
> A new command-line option, -pcidevice is added.
> For example, to invoke it for an Ethernet device sitting at
> PCI bus:dev.fn 04:08.0 with host IRQ 18, use this:
> 
> -pcidevice Ethernet/04:08.0-18
> 
> The host ethernet driver is to be removed before doing the
> passthrough. 

This operation is rough. Do you have any plan to improve it? For
example, by not loading drivers at host boot for the devices that will
be assigned to guests, or ideally by unbinding the devices from their
drivers dynamically before assignment.

Randy (Weidong)

> 
> If kvm uses the in-kernel irqchip, interrupts are routed to
> the guest via the kvm module (accompanied kernel changes are
> necessary). 

