Grant Grundler | 1 Aug 2008 18:12

Re: ccio_mark_invalid(): would it have to clear a bit or byte?

On Mon, Jul 21, 2008 at 02:51:08PM +0000, Joel Soete wrote:
> Hello all,
>
> given this comment:
>  * Given a virtual address (vba, arg2) and space id, (sid, arg1),
>  * load the I/O PDIR entry pointed to by pdir_ptr (arg0). Each IO Pdir
>  * entry consists of 8 bytes as shown below (MSB == bit 0):
...
> and also this:
>         while (byte_cnt > 0) {
>                 /* clear I/O Pdir entry "valid" bit first */
>                 ((unsigned char *) pdir_ptr)[7] = 0;
>
> So if I understand correctly, the 'Valid' field of a pdir entry is a single bit, but 
> the code clears a whole byte?

That's just a convenient way to clobber the bit we care about.
The fact that the rest of the pdir remains available is irrelevant
except for debugging (when we might dump IO Pdir to see the history.)

>
> What about coding something like:
> #define PTE_VALID_BIT_MASK      0xfffffffffffffffeULL	
>
> 		*pdir_ptr &= PTE_VALID_BIT_MASK;

That's a load/modify/store of a 64-bit value.
That's substantially more instructions than a single byte store.

> Wouldn't that do better what the comment says it does?
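
To make the trade-off in this exchange concrete, here is a minimal, host-independent sketch in C. It is not the ccio-dma driver code. The entry is modelled as eight bytes in big-endian order (PA-RISC is big-endian, so byte [7] holds the least significant bits), and the position of the "valid" bit is an assumption inferred from the mask proposed above. The function names and PDIR_VALID_BIT are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define PDIR_ENTRY_BYTES 8
/* Assumption taken from the proposed mask 0xfffffffffffffffeULL:
 * "valid" is the least significant bit of the 64-bit entry. */
#define PDIR_VALID_BIT 0x1ULL

/* Read the entry as one big-endian 64-bit value. */
uint64_t pdir_load(const uint8_t entry[PDIR_ENTRY_BYTES])
{
    uint64_t v = 0;
    for (int i = 0; i < PDIR_ENTRY_BYTES; i++)
        v = (v << 8) | entry[i];
    return v;
}

/* Write a 64-bit value back in big-endian byte order. */
void pdir_store(uint8_t entry[PDIR_ENTRY_BYTES], uint64_t v)
{
    for (int i = PDIR_ENTRY_BYTES - 1; i >= 0; i--) {
        entry[i] = (uint8_t)(v & 0xff);
        v >>= 8;
    }
}

/* Variant quoted from the driver: a single byte store.
 * Clears "valid" together with the other seven bits of byte [7]. */
void mark_invalid_byte_store(uint8_t entry[PDIR_ENTRY_BYTES])
{
    entry[7] = 0;
}

/* Variant proposed in the thread: load, mask, store.
 * Clears only the "valid" bit and preserves everything else. */
void mark_invalid_masked(uint8_t entry[PDIR_ENTRY_BYTES])
{
    assert(entry != 0);
    pdir_store(entry, pdir_load(entry) & ~PDIR_VALID_BIT);
}

/* Non-zero if the entry is still marked valid. */
int pdir_is_valid(const uint8_t entry[PDIR_ENTRY_BYTES])
{
    return (int)(pdir_load(entry) & PDIR_VALID_BIT);
}
```

Both variants leave the entry invalid. They differ only in whether the remaining bits of byte [7] survive, which per the reply above matters only when a stale entry is dumped for debugging.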

Helge Deller | 2 Aug 2008 00:15

Re: [PATCH] fix unwind crash - was: Re: 2.6.26 kernel crash

Hi Kyle,

I verified that my attached patch fixes the kernel panic.
Testcase is here: http://gsyprf10.external.hp.com/~deller/crash.tgz
Could you please apply the patch?
Signed-off-by: Helge Deller <deller <at> gmx.de>

Thanks,
Helge

PS:
arch/parisc/kernel/unwind.c, line 225 looks kinda fishy as well:
225: info->prev_ip = *(unsigned long *)(info->prev_sp - RP_OFFSET);

PPS:
Instead of a kernel panic (which is really annoying since you need to 
reboot the machine) I now get, as expected, a user fault:

do_page_fault() pid=1846 command='a.out' type=6 address=0x87802043
vm_start = 0x407ff000, vm_end = 0x40802000

      YZrvWESTHLNXBCVMcbcbcbcbOGFRQPDI
PSW: 00000000000001001111111100001111 Not tainted
r00-03  0004ff0f 407c9f6b 406fbfe3 00012b00
r04-07  fb4ec308 000125b8 407fd534 000e6ba8
r08-11  fb4ec014 00000001 0001264a 000d3b60
r12-15  00000000 000d3b5c 000db4c8 000b0000
r16-19  000d06a0 000b0000 ffffffff 23882000
r20-23  406fc15f 406fc138 87802042 00012d80
r24-27  407fd534 000125b8 407fd534 000125b8

Grant Grundler | 2 Aug 2008 01:31

Re: Cupertino test ring problem?

On Thu, Jul 31, 2008 at 10:26:24AM -0700, Rick Jones wrote:
...
> I can confirm that gsyprf3 was/is not set to autoboot.  I can also state 
> that it cannot successfully boot.  During POST it spits out FRU problem 
> messages, and during OS boot it emits boatloads of Segmentation Fault output while it 
> tries to boot, and ends up in busybox.

The Debian kernel didn't find its root disk; an older kernel I built booted fine.
It's up and I restored the default to a kernel that boots. I can test
other Debian kernels on other machines and update gsyprf3 once those
are proven to work.

Rick and I swapped gsyprf3 (1.5 GHz prototype CPUs and a pre-production
motherboard and case) for a production unit (1.3 GHz CPUs and 8 GB of RAM).

> I'm not sure that gsyprf11 (pa) is connected to the external net.

It wasn't, and the default Debian kernel didn't boot either.
I've set the default kernel to jda's 2.6.22.19 patched kernel.

> I tried swapping the cables on gsyprf10 (the lp1000r).  I have to see if I 
> can find the old monitor and keyboard to see what its boot state happens to 
> be.

Again, a bad 2.6.24 Debian kernel. Rebooting to an older (2.6.21) Debian
kernel worked. I updated the machine but am not willing to try a newer
kernel on this box unless I'm sitting in front of it with a console.

> Grant - wrt times, I'm here all week, from about 0930 to about 0330 each 
> day.  I'd be around later but this week I'm playing single-parent :)

Grant Grundler | 2 Aug 2008 01:32

Re: Cupertino test ring problem?

On Tue, Jul 29, 2008 at 02:26:03PM -0600, Matthew Wilcox wrote:
> On Tue, Jul 29, 2008 at 01:03:10PM -0700, Rick Jones wrote:
> > A couple of the rx2600's were not powered-up, so I've powered them up. 
> > One of the rp34xx's does not seem "happy" and will need further 
> > diagnosis.  If there are other things still "not right" with the ring, 
> > please feel free to let me know the specifics.
> 
> I can't connect to port 22 on any of gsyprf3, gsyprf10 or gsyprf11.

They should all be working now.

thanks,
grant

> 
> -- 
> Intel are signing my paycheques ... these opinions are still mine
> "Bill, look, we understand that you're interested in selling us this
> operating system, but compare it to ours.  We can't possibly take such
> a retrograde step."
> --
> To unsubscribe from this list: send the line "unsubscribe linux-parisc" in
> the body of a message to majordomo <at> vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html


Joel Soete | 2 Aug 2008 09:59

Re: ccio_mark_invalid(): would it have to clear a bit or byte?


Grant Grundler wrote:
> On Mon, Jul 21, 2008 at 02:51:08PM +0000, Joel Soete wrote:
>> Hello all,
>>
>> given this comment:
>>  * Given a virtual address (vba, arg2) and space id, (sid, arg1),
>>  * load the I/O PDIR entry pointed to by pdir_ptr (arg0). Each IO Pdir
>>  * entry consists of 8 bytes as shown below (MSB == bit 0):
> ...
>> and also this:
>>         while (byte_cnt > 0) {
>>                 /* clear I/O Pdir entry "valid" bit first */
>>                 ((unsigned char *) pdir_ptr)[7] = 0;
>>
>> So if I understand correctly, the 'Valid' field of a pdir entry is a single bit, but 
>> the code clears a whole byte?
> 
> That's just a convenient way to clobber the bit we care about.
> The fact that the rest of the pdir remains available is irrelevant
> except for debugging (when we might dump IO Pdir to see the history.)
> 
OK, my worry was that the other bits of this byte are related to the DMA behaviour of this U2 (Prefetch, Update,
Lock, SafeDMA).

>> What about coding something like:
>> #define PTE_VALID_BIT_MASK      0xfffffffffffffffeULL	
>>
>> 		*pdir_ptr &= PTE_VALID_BIT_MASK;
> 

Randolph Chung | 3 Aug 2008 16:26

Re: [PATCH] fix unwind crash - was: Re: 2.6.26 kernel crash

(Apologies for my earlier HTML mail; resent in a more proper format.)

Helge, your patch doesn't look quite right.

The kernel unwinder is only supposed to be called for kernel
addresses. Kyle says he thinks he knows what is wrong so he's going to
poke at it.

Maybe we can make the kernel unwinder more robust against invalid
addresses passed to it though.
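
As an illustration of what "more robust against invalid addresses" could mean for a load like the one at unwind.c line 225, here is a user-space sketch that range-checks the slot before reading it. It is not the kernel unwinder and not the patch under discussion: struct stack_range, read_saved_rp() and the RP_OFFSET_SKETCH value are hypothetical names and placeholders chosen for this example only.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Placeholder offset for this sketch; the kernel's RP_OFFSET may differ. */
#define RP_OFFSET_SKETCH 20u

/* Hypothetical description of the memory the unwinder may read: [low, high). */
struct stack_range {
    uintptr_t low;
    uintptr_t high;
};

/* Fetch the saved return pointer at (prev_sp - RP_OFFSET_SKETCH).
 * Returns 1 and fills *rp if the whole slot lies inside the range,
 * returns 0 without touching memory otherwise. */
int read_saved_rp(const struct stack_range *stk, uintptr_t prev_sp,
                  unsigned long *rp)
{
    uintptr_t slot;

    assert(stk != 0 && rp != 0);

    /* Reject ranges too small to hold one slot. */
    if (stk->high < stk->low || stk->high - stk->low < sizeof(*rp))
        return 0;

    /* Reject stack pointers for which the subtraction would wrap. */
    if (prev_sp < RP_OFFSET_SKETCH)
        return 0;

    slot = prev_sp - RP_OFFSET_SKETCH;

    /* The full slot must fit between low and high. */
    if (slot < stk->low || slot > stk->high - sizeof(*rp))
        return 0;

    memcpy(rp, (const void *)slot, sizeof(*rp));
    return 1;
}
```

With a check of this kind, a bogus prev_sp makes the unwind stop early rather than fault. How the valid range would be obtained in the kernel is left open here.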

thanks,
randolph

On Tue, Jul 29, 2008 at 1:31 PM, Helge Deller <deller <at> gmx.de> wrote:
>
> I narrowed it down to the problematic code path, and I assume the attached patch might fix it. The problem is that
> I can't test without my test case, which is on gsyprf10 (which is unreachable right now), so this patch
> is currently an RFC...
>
> Helge
>
> Signed-off-by: Helge Deller <deller <at> gmx.de>
>
>
> Helge Deller wrote:
>>
>> On Sunday 20 July 2008, Helge Deller wrote:
>>>
>>> While debugging some user-space stuff I just faced this 32bit kernel crash (2.6.26):

Grant Grundler | 4 Aug 2008 08:20

Re: ccio-dma: is issue could be related to too much io_tlb entries?

On Thu, Jul 24, 2008 at 02:13:55PM +0100, Joel Soete wrote:
> Hello Grant, Kyle, et al.,
> 
> IIRC the number of io_tlb entries on this u2/uturn IOA is 256?

ISTR that u2 and uturn have different numbers of IO TLB entries.
But I don't recall how many exactly. I need the ERSs to look that up.

> Because the issue occurs only when I do a lot of I/O on a SCSI disk (sometimes a mapping
> request reaches 128 pages), the idea was that it could exceed the number of
> iotlb entries.
> 
> So I turned on some STATs (just used_pages) and grabbed the following data:
> [snip]
> IO PDIR size    : 131072 bytes (16384 entries)
> IO PDIR entries : 16384 total  170 used (16214 free, 1%)
> Resource bitmap : 2048 bytes (16384 pages)
>   Bitmap search : 36221/36430/38793 (min/avg/max CPU Cycles)
> 
> IO PDIR size    : 131072 bytes (16384 entries)
> IO PDIR entries : 16384 total  235 used (16149 free, 1%)
> Resource bitmap : 2048 bytes (16384 pages)
>   Bitmap search : 36220/36346/37806 (min/avg/max CPU Cycles)
> 
> IO PDIR size    : 131072 bytes (16384 entries)
> IO PDIR entries : 16384 total  718 used (15666 free, 4%)
> Resource bitmap : 2048 bytes (16384 pages)
>   Bitmap search : 36222/36342/38472 (min/avg/max CPU Cycles)
> 
> ## issue occurs just when I ready above message.
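
The figures in these statistics can be cross-checked with a little arithmetic. The sketch below assumes the 8-byte entry size quoted from the ccio comment earlier in this archive, one bitmap bit per entry, and a percentage that is truncated rather than rounded; the function names are hypothetical and this is not the driver's reporting code.

```c
#include <assert.h>
#include <stddef.h>

/* "Each IO Pdir entry consists of 8 bytes" per the quoted ccio comment. */
#define IOPDIR_ENTRY_BYTES 8u

/* Number of entries in a PDIR of the given size. */
size_t pdir_entries(size_t pdir_bytes)
{
    assert(pdir_bytes % IOPDIR_ENTRY_BYTES == 0);
    return pdir_bytes / IOPDIR_ENTRY_BYTES;
}

/* Size of a resource bitmap holding one bit per entry (assumption). */
size_t bitmap_bytes(size_t entries)
{
    return (entries + 7) / 8;
}

/* Usage percentage, truncated toward zero (assumption that matches
 * the 1%, 1% and 4% shown in the samples). */
unsigned used_percent(size_t used, size_t total)
{
    assert(total > 0 && used <= total);
    return (unsigned)(used * 100 / total);
}
```

For the samples above, pdir_entries(131072) gives 16384 and bitmap_bytes(16384) gives 2048, matching the reported sizes. These counters describe PDIR occupancy only; they say nothing about the number of IO TLB entries, which the reply notes would need to be looked up in the ERS.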

Thibaut VARENE | 4 Aug 2008 10:59

Re: Cupertino test ring problem?

On Sat, Aug 2, 2008 at 1:31 AM, Grant Grundler
<grundler <at> parisc-linux.org> wrote:

> It wasn't, and the default Debian kernel didn't boot either.
> I've set the default kernel to jda's 2.6.22.19 patched kernel.

Is there a source tarball of that, or a list of applied patches? I
would very much like to use that "homebrew" kernel on my cluster as
well, until newer kernels are proven reliable enough...

Thx

T-Bone

John David Anglin | 4 Aug 2008 17:34

Re: Cupertino test ring problem?

> Is there a source tarball of that, or a list of applied patches? I
> would very much like to use that "homebrew" kernel on my cluster as
> well, until newer kernels are proven reliable enough...

I built a tar ball, linux-2.6.22.19-jda.tar.gz.  It is in my home directory
on gsyprf11.  The source is also there.  The most important patch is
compat_sys_getdents.d from Kyle.  The base is linux-2.6.22.19.tar.bz2
from kernel.org.  The .config file was derived from some earlier config
file on this machine (can't remember which).

Dave
--

-- 
J. David Anglin                                  dave.anglin <at> nrc-cnrc.gc.ca
National Research Council of Canada              (613) 990-0752 (FAX: 952-6602)

Thibaut VARENE | 4 Aug 2008 17:39

Re: Cupertino test ring problem?

On Mon, Aug 4, 2008 at 5:34 PM, John David Anglin
<dave <at> hiauly1.hia.nrc.ca> wrote:
>> Is there a source tarball of that, or a list of applied patches? I
>> would very much like to use that "homebrew" kernel on my cluster as
>> well, until newer kernels are proven reliable enough...
>
> I built a tar ball, linux-2.6.22.19-jda.tar.gz.  It is in my home directory
> on gsyprf11.  The source is also there.  The most important patch is
> compat_sys_getdents.d from Kyle.  The base is linux-2.6.22.19.tar.bz2
> from kernel.org.  The .config file was derived from some earlier config
> file on this machine (can't remember which).

Thanks a lot, I'll fetch that later on!

T-Bone

