Mark Harvey | 1 Jul 01:34 2008

Re: disk kicked out of RAID -> tgtd segmentation fault

Hi Tomasz,

I've also noticed there is only one tgtd process when running it in
the foreground.

Running in the foreground has been the only way I've been successful
in collecting core files in the past.

I still do not (read: never actually spent time looking) understand
what the difference is between foreground mode and why there are two
processes in background mode.

I would have to assume that there is some sort of race condition or
handshake issue between the two processes within your test scenario.

Sorry I can't be of much help.

Cheers
Mark

On Mon, Jun 30, 2008 at 7:00 PM, Tomasz Chmielewski <mangoo@...> wrote:
> Mark Harvey schrieb:
>> My 2c worth.
>>
>> Try running the tgtd in 'foreground' mode (after setting "ulimit -c
>> unlimited").
>>
>> You will then get a core file which should be a little easier to work
>> with (vs gdb on a running tgtd instance).
>> e.g.

Ming Zhang | 2 Jul 16:09 2008

Re: disk kicked out of RAID -> tgtd segmentation fault

On Tue, 2008-07-01 at 09:34 +1000, Mark Harvey wrote:
> Hi Tomasz,
> 
> I've also noticed there is only one tgtd process when running it in
> the foreground.
> 
> Running in the foreground has been the only way I've been successful
> in collecting core files in the past.

In the script, before you start tgtd, adding "ulimit -c unlimited" might help.
Alternatively, change the tgtd code to call setrlimit().

> 
> I still do not (read: never actually spent time looking) understand
> what the difference is between foreground mode and why there are two
> processes in background mode.
> 
> I would have to assume that there is some sort of race condition or
> handshake issue between the two processes within your test scenario.
> 
> Sorry I can't be of much help.
> 
> Cheers
> Mark
> 
> On Mon, Jun 30, 2008 at 7:00 PM, Tomasz Chmielewski <mangoo@...> wrote:
> > Mark Harvey schrieb:
> >> My 2c worth.
> >>
> >> Try running the tgtd in 'foreground' mode (after setting "ulimit -c

Richard Sharpe | 2 Jul 23:01 2008

A curious observation with iSCSI and SCSI tgt ...

Hi,

I am testing some configurations I have lying around here at work while waiting for our new hardware to arrive.

I set up one system, a 1U with 2GB of memory and a Xeon (speed not really relevant to this) and GigE as a target and set up a virtual disk via iSCSI.

Then on another system, an old DELL, I set up the initiator, and ran:

   dd if=/dev/zero of=/dev/sda1 bs=1024 count=1000000

and I got a throughput of 10.5-10.8 MB/s while consuming around 10-12% of the CPU on the target (shown via top).

That number looked suspiciously like 100BaseT numbers to me, and lspci told me it was.

So, I shifted to another system with GigE and ran the same tests. This time around, tgtd on the target machine was consuming 25-30% CPU with spikes up to 99%, but the throughput stayed around 10.2 MB/s.

I am going to pull down the kernel profiling tool to see what is going on, but I wonder if anyone else knows off the top of their heads?

_______________________________________________
Stgt-devel mailing list
Stgt-devel@...
https://lists.berlios.de/mailman/listinfo/stgt-devel
ronnie sahlberg | 3 Jul 02:58 2008

Re: disk kicked out of RAID -> tgtd segmentation fault

Hi Tomasz

I had no problems running tgtd under gdb. Just let it start first
and fork(), then run "ps aux | grep tgtd" and "gdb -p <PID>" to attach
to each of the two processes.

What appears to happen is that the task has already been removed from
struct scsi_cmd *cmd->c_hlist, so that c_hlist is actually a completely
empty list: next == prev == NULL.

Thus the list_del() helper causes a SEGV since it assumes that the
list can never be empty and that we can always
dereference the next/prev pointers.

Tomasz, can you try the patch below the gdb backtrace?
It prevents the SEGV for me.

This solves one of the bugs: that list_del() gets a SEGV when the
list is empty.
There is probably another bug somewhere as well, where tgtd has lost
track of which tasks are active and has forgotten that this task
has already been deleted/removed from the list, thus causing it to
call list_del() for a task that is not on the list.
I.e. the task is referenced from several places, and when it was
deleted, tgtd previously removed it from this list but forgot to remove
it from some other list/place.
I have no idea where that bug is.

regards
ronnie s

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fc9c63056e0 (LWP 4124)]
0x0000000000412ed4 in __list_del (prev=0x0, next=0x0) at list.h:79
79              next->prev = prev;
(gdb) bt
#0  0x0000000000412ed4 in __list_del (prev=0x0, next=0x0) at list.h:79
#1  0x0000000000412ea3 in list_del (entry=0x642868) at list.h:85
#2  0x000000000041363d in cmd_hlist_remove (cmd=0x642860) at target.c:308
#3  0x0000000000414d92 in __cmd_done (target=0x0, cmd=0x642860) at target.c:862
#4  0x0000000000414fab in target_cmd_done (cmd=0x642860) at target.c:906
#5  0x0000000000406faf in iscsi_free_cmd_task (task=0x6427a0)
    at iscsi/iscsid.c:1081
#6  0x0000000000403613 in conn_close (conn=0x63dc88) at iscsi/conn.c:112
#7  0x000000000040ca19 in iscsi_tcp_event_handler (fd=14, events=5,
    data=0x63dc88) at iscsi/iscsi_tcp.c:166
#8  0x0000000000411513 in event_loop () at tgtd.c:251
#9  0x00000000004118e2 in main (argc=1, argv=0x7fffce311678) at tgtd.c:355

diff --git a/usr/list.h b/usr/list.h
index 4d76057..39222ab 100644
--- a/usr/list.h
+++ b/usr/list.h
@@ -82,6 +82,9 @@ static inline void __list_del(struct list_head * prev, struct list_head * next)

 static inline void list_del(struct list_head *entry)
 {
+       if ((entry->prev == NULL) && (entry->next == NULL)) {
+               return;
+       }
        __list_del(entry->prev, entry->next);
        entry->next = entry->prev = NULL;
 }

On Mon, Jun 30, 2008 at 7:05 PM, Tomasz Chmielewski <mangoo@...> wrote:
> Tomasz Chmielewski schrieb:
>
> (...)
>
>> initiator# iptables -I INPUT -s <target IP> -p tcp --sport 3260 -j DROP
>>
>>
>> After a while, you will see that only one tgtd process is running, whereas
>> the second has crashed.
>
> (...)
>
>> The above is valid with tgt-20080527, I'm just about to try tgt-20080629.
>
> It still crashes with tgt-20080629.
>
>
> --
> Tomasz Chmielewski
> http://wpkg.org
>
Richard Sharpe | 3 Jul 21:32 2008

should backed_file_open be called from dtd_load_unload, or ...

Hi,

In looking at implementing some aspects of SSC in concert with MMC, and especially considering things like MAM and how to handle FILE marks, GAPs, etc., it seems to me that one model would be to keep a separate file for certain things, or even a database of sorts keyed off a token associated with a slot.

One way to support this more flexibly would be to give struct device_type_template separate backed_file_open and backed_file_close methods and to call these from dtd_load_unload.

If the particular device is happy with the default backed_file_open request, it could simply initialize its device_type_template with this function, otherwise it could override the default with its own implementation ...

Does anyone have any comments on this?

ronnie sahlberg | 7 Jul 03:12 2008

[PATCH 1/1] list.h: prevent a SEGV if we try to clear a list that is already empty

Please apply.

I have reproduced the SEGV that was reported when I/O has failed/been aborted.
This patch prevents the SEGV from occurring in this situation.

It does not, however, address the root cause: why tgtd tries to clear a
list that is already (and should know is already?) empty.

ronnie sahlberg

From 9fce02d67ea0369a4c070e3559c0c812a728a914 Mon Sep 17 00:00:00 2001
From: Ronnie Sahlberg <ronniesahlberg@...>
Date: Mon, 7 Jul 2008 11:06:11 +1000
Subject: [PATCH] Make sure the ->next/prev pointers in the list head
are non-NULL
 before we dereference them.

Signed-off-by: Ronnie Sahlberg <ronniesahlberg@...>
---
 usr/list.h |    3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/usr/list.h b/usr/list.h
index 4d76057..39222ab 100644
--- a/usr/list.h
+++ b/usr/list.h
@@ -82,6 +82,9 @@ static inline void __list_del(struct list_head * prev, struct list_head * next)

 static inline void list_del(struct list_head *entry)
 {
+	if ((entry->prev == NULL) && (entry->next == NULL)) {
+		return;
+	}
 	__list_del(entry->prev, entry->next);
 	entry->next = entry->prev = NULL;
 }
--

-- 
1.5.4.3
ronnie sahlberg | 7 Jul 04:55 2008

[PATCH 1/1] RESEND list.h prevent SEGV if list is already empty

Please apply.

Resending modified patch that passes checkpatch.pl

This patch prevents the SEGV that is triggered when an I/O is aborted/timed out.

From 4b42e7be6012d2b7b7e119ee7dd806e0bb1732cd Mon Sep 17 00:00:00 2001
From: Ronnie Sahlberg <ronniesahlberg@...>
Date: Mon, 7 Jul 2008 12:53:13 +1000
Subject: [PATCH] Make sure that ->prev and ->next are non-NULL
 before we call __list_del() and dereference them.

Signed-off-by: Ronnie Sahlberg <ronniesahlberg@...>
---
 usr/list.h |    3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/usr/list.h b/usr/list.h
index 4d76057..12e6b06 100644
--- a/usr/list.h
+++ b/usr/list.h
@@ -82,6 +82,9 @@ static inline void __list_del(struct list_head * prev, struct list_head * next)

 static inline void list_del(struct list_head *entry)
 {
+	if ((entry->prev == NULL) && (entry->next == NULL))
+		return;
+
 	__list_del(entry->prev, entry->next);
 	entry->next = entry->prev = NULL;
 }
--

-- 
1.5.4.3
Eli Dorfman | 8 Jul 13:53 2008

Re: [RFC] target configuration tool

Hi,

This configuration tool looks like a good start.
Can you add it to tgt/scripts so that we and others can start
working on and improving it?

Thanks,
Eli
Doron Shoham | 8 Jul 14:52 2008

[PATCH] Define a larger SCSI_SN_LEN

Hi,
Today, SCSI_SN_LEN is defined as only 8 characters.
I want to define a larger SCSI_SN_LEN in order to
export the real device's serial number.
This is required for using dm-multipath:
when several targets export the same device to the initiator, we
will be able to use dm-multipath on the initiator.

Thanks,
Doron

Define a larger SCSI_SN_LEN in order to
export the real device's serial number.

Signed-off-by: Doron Shoham <dorons@...>
---
 usr/tgtd.h |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/usr/tgtd.h b/usr/tgtd.h
index 62e3821..2f53fb8 100644
--- a/usr/tgtd.h
+++ b/usr/tgtd.h
@@ -5,7 +5,7 @@
 #include "scsi_cmnd.h"

 #define SCSI_ID_LEN		24
-#define SCSI_SN_LEN		8
+#define SCSI_SN_LEN		32

 #define VENDOR_ID_LEN		8
 #define PRODUCT_ID_LEN		16
--

-- 
1.5.2
FUJITA Tomonori | 9 Jul 07:26 2008

Re: disk kicked out of RAID -> tgtd segmentation fault

On Mon, 30 Jun 2008 10:54:48 +0200
Tomasz Chmielewski <mangoo@...> wrote:

> Tomasz Chmielewski schrieb:
> > ronnie sahlberg schrieb:
> >> Hi Tomasz,
> >>
> >> I could not get that configuration to work.
> >>
> >> Can you please provide more detailed instructions exactly how to set
> >> up hosts A B and C
> >> so I can try to reproduce it.
> >>
> >> Please provide the exact commandline for each and every command I need
> >> to run on the three hosts and Ill try to
> >> reproduce it under gdb.
> > 
> > A faulty RAID is just one way to crash tgtd.
> > 
> > A simpler one is to just block the traffic between the target and the 
> > initiator - just login to the target, make sure there is some iSCSI 
> > traffic between the target and the initiator, then block incoming iSCSI 
> > traffic on the initiator with:
> > 
> > initiator# iptables -I INPUT -s <target IP> -p tcp --sport 3260 -j DROP
> > 
> > 
> > After a while, you will see that only one tgtd process is running, 
> > whereas the second has crashed.
> 
> Note - the above seems to be valid if:
> 
> - there are two initiators connected (from different IPs), perhaps more
> - there is traffic from these two initiators
> - we block traffic on one of these initiators
> 
> 
> I couldn't reproduce the issue with only one initiator connected.

Can you provide the detailed configuration?

Do you mean:

1. there are three machines, say A, B, and C.

2. you run tgtd on A and setup one target in tgtd.

3. B and C work as initiators. They connect to A, so the target on A
has two sessions.

Then you block the traffic between A and B, and tgtd on A dies?

Right?

I think that the output of tgtadm will enable us to understand your
configuration easily.

Thanks,
