1 Aug 2010 12:57
sw raid array completely hungs during verify in 2.6.32
Michael Tokarev <mjt <at> tls.msk.ru>
2010-08-01 10:57:56 GMT
Hello.

This is the second time we have come across this issue since switching from 2.6.27 to 2.6.32 about 3 months ago. At some point, an md-raid10 array hangs - that is, all processes that try to access it, for either reading or writing, hang forever.

Here's a typical set of messages found in kern.log:

 INFO: task oracle:7602 blocked for more than 120 seconds.
 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
 oracle        D ffff8801a8837148     0  7602      1 0x00000000
  ffffffff813bc480 0000000000000082 0000000000000000 0000000000000001
  ffff8801a8b7fdd8 000000000000e1c8 ffff88003b397fd8 ffff88003f47d840
  ffff88003f47dbe0 000000012416219a ffff88002820e1c8 ffff88003f47dbe0
 Call Trace:
  [<ffffffffa018e8ae>] ? wait_barrier+0xee/0x130 [raid10]
  [<ffffffff8104f570>] ? default_wake_function+0x0/0x10
  [<ffffffffa0191852>] ? make_request+0x82/0x5f0 [raid10]
  [<ffffffffa007cb2c>] ? md_make_request+0xbc/0x130 [md_mod]
  [<ffffffff810c4722>] ? mempool_alloc+0x62/0x140
  [<ffffffff8117d26f>] ? generic_make_request+0x30f/0x410
  [<ffffffff8112eee4>] ? bio_alloc_bioset+0x54/0xf0
  [<ffffffff8112e28b>] ? __bio_add_page+0x12b/0x240
  [<ffffffff8117d3cc>] ? submit_bio+0x5c/0xe0
  [<ffffffff811313da>] ? dio_bio_submit+0x5a/0x90
  [<ffffffff81131d63>] ? __blockdev_direct_IO+0x5a3/0xcd0
Moving BIO_RW_AHEAD back to bit 1 might be a better solution but I'm
afraid that would cause more confusion downstream.  This patch
updates READA and SWRITE to match BIO_RW_AHEAD and should also appear
in -stable releases. The next patch will create bio_types.h and
define all constants in terms of BIO_RW_*.
Thanks.
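
As a rough illustration of why defining the request constants in terms of the BIO_RW_* bit positions helps, here is a minimal sketch. All SKETCH_* names and the bit numbers are hypothetical and chosen only for the example; they are not the kernel's actual definitions or values. The point is only that a constant derived from the bit enum follows the bit when it moves, while a hard-coded number silently stops matching.

```c
#include <assert.h>   /* static_assert (C11) and assert */

/* Hypothetical bit positions for illustration; NOT the real kernel values. */
enum sketch_bio_rw_bits {
    SKETCH_BIO_RW       = 0,  /* direction bit: 0 = read, 1 = write */
    SKETCH_BIO_RW_AHEAD = 4   /* read-ahead hint, assumed moved away from bit 1 */
};

/* Constants derived from the bit positions, so they cannot drift apart. */
#define SKETCH_READ   0u
#define SKETCH_WRITE  (1u << SKETCH_BIO_RW)
#define SKETCH_READA  (SKETCH_READ | (1u << SKETCH_BIO_RW_AHEAD))

/* A stale hard-coded value that still assumes the read-ahead hint is bit 1. */
#define SKETCH_READA_STALE 2u

/* Returns 1 if the read-ahead bit is set in rw, 0 otherwise. */
static inline int sketch_is_readahead(unsigned int rw)
{
    return (int)((rw >> SKETCH_BIO_RW_AHEAD) & 1u);
}

/* Compile-time check that the derived constant tracks the bit position. */
static_assert(SKETCH_READA == (1u << SKETCH_BIO_RW_AHEAD),
              "SKETCH_READA must match SKETCH_BIO_RW_AHEAD");
```

With these assumed values, sketch_is_readahead(SKETCH_READA) reports the hint, while sketch_is_readahead(SKETCH_READA_STALE) does not, which is the kind of mismatch the patch description refers to.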