rgmanager is jammed
2012-05-25 16:20:43 GMT
I am in the process of upgrading one of our clusters from RHEL 6.1 to 6.2. It's an 8-node cluster, and I'm doing it one node at a time: stop all cluster resources, cman, rgmanager et al., run yum update, reboot, then move on to the next node. The first node went fine. On the second one, rgmanager started but doesn't seem to connect to the other nodes. I found this in dmesg:

INFO: task rgmanager:2901 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
rgmanager D 0000000000000000 0 2901 2900 0x00000080
 ffff880667299d48 0000000000000082 0000000000000000 ffff8806656aa318
 ffff88066729c378 0000000000000001 ffff880665bb31b0 00007fffc6c6fa20
 ffff88066635a678 ffff880667299fd8 000000000000f4e8 ffff88066635a678
Call Trace:
 [<ffffffff814ee6fe>] __mutex_lock_slowpath+0x13e/0x180
 [<ffffffff814ee59b>] mutex_lock+0x2b/0x50
 [<ffffffffa02c192c>] dlm_new_lockspace+0x3c/0xa30 [dlm]
 [<ffffffff8115f74c>] ? __kmalloc+0x20c/0x220
 [<ffffffffa02ca94d>] device_write+0x30d/0x7d0 [dlm]
 [<ffffffff8105ea30>] ? default_wake_function+0x0/0x20
 [<ffffffff8120c646>] ? security_file_permission+0x16/0x20
 [<ffffffff81176918>] vfs_write+0xb8/0x1a0
 [<ffffffff810d4932>] ? audit_syscall_entry+0x272/0x2a0
 [<ffffffff81177321>] sys_write+0x51/0x90
 [<ffffffff8100b0f2>] system_call_fastpath+0x16/0x1b

I tried rebooting, but the shutdown stalled while stopping rgmanager. I fenced the node; same outcome.
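For anyone following the same rolling procedure, the per-node steps described above might look roughly like the sketch below. It assumes the stock RHEL 6 init scripts (rgmanager, gfs2, clvmd, cman) and the stop order Red Hat documents for RHEL 6 clusters (rgmanager first, cman last); the `run` and `upgrade_node` function names and the DRY_RUN switch are hypothetical, not part of any Red Hat tooling.

```shell
#!/bin/sh
# Hypothetical helper (not from the original post): per-node steps of a
# rolling RHEL 6.1 -> 6.2 cluster upgrade. Source this file, then call
# `upgrade_node`. With DRY_RUN=1 it only prints the commands it would run.

# Run a command, or just print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

upgrade_node() {
    # Stop cluster services in reverse start order: rgmanager first (it
    # holds DLM lockspaces), then GFS2 mounts, clvmd, and finally cman.
    for svc in rgmanager gfs2 clvmd cman; do
        # Skip services whose init script is not installed on this node.
        if [ "${DRY_RUN:-0}" = "1" ] || [ -x "/etc/init.d/$svc" ]; then
            run service "$svc" stop || return 1
        fi
    done
    run yum -y update || return 1
    run reboot
}
```

Running `DRY_RUN=1 upgrade_node` first prints the command sequence without touching the node, which is a cheap way to confirm the order before doing it for real.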