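Because this problem starts with a memory error, it is easy to assume the system is short of memory. Memory utilization in this environment was indeed very high, but the root cause turned out not to be a memory shortage.
First, the first few LVs were added without trouble and the later ones failed, yet adding LVs to other VGs then succeeded. At that point I began to suspect a bug.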
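Second, a PowerHA installation running "bare" (with no fixes applied) is more likely to hit bugs, so check the PowerHA fix level.
Sure enough, this was essentially an unpatched PowerHA environment, so running into a bug is not surprising.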
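One way to look at the installed level is sketched below. It assumes the cluster software is installed under the usual cluster.es.* fileset names; show_powerha_level is only an illustrative helper name, and the fallback branch exists so the snippet does nothing harmful on a non-AIX host.

```shell
# Sketch: list installed PowerHA filesets and their levels on this node.
# Assumption: the filesets follow the cluster.es.* naming pattern.
show_powerha_level() {
    if command -v lslpp >/dev/null 2>&1; then
        # lslpp -l prints fileset name, level and state
        lslpp -l "cluster.es.*"
    else
        echo "lslpp not found; this does not look like an AIX host"
    fi
}

show_powerha_level
```

Run it on every node of the cluster and compare the levels, since the nodes may not be at the same fix level.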
Third, since a bug is suspected, let's find something convincing to back that up, as shown below:
IV36992: CLPASSWDREMOTE CORE DUMPS DUE TO MEMORY FAULT
A fix is available
Error description
The clpasswdremote utility is core dumping due to segmentation fault.
The problem occurs when the user is missing in /etc/passwd in one of the nodes in the cluster.
The cspoc.log will log the following:
[========== C_SPOC COMMAND LINE ==========]
/usr/es/sbin/cluster/sbin/cl_chpasswd -cspoc-f -r -cspoc -grg1 test3
hacmp13: success: /usr/es/sbin/cluster/etc/clpasswd/usr_bin_passwd.orig test3
hacmp14: FAILED: eval clpasswdremote -u test3 -p 'raCYOSMwhoJU.' -f 2 -l 0
hacmp14: cexec[54]: 3735594 Memory fault(coredump)
hacmp14: RETURN_CODE=139
hacmp14: cl_rsh had exit code =139, see cspoc.log and/or clcomd.log for more information
The error report will log a CORE DUMP error with the following stack trace:
main 94
main 88
__start 6C
The following symptom code is logged as well:
PIDS/5765E6200 LVLS/520 PCSS/SPI2 FLDS/clpasswdr SIG/11 FLDS/main VALU/94 FLDS/__start
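As a reading aid for the log above: RETURN_CODE=139 and SIG/11 describe the same event, because a shell conventionally reports a process killed by signal N as exit status 128 + N. A minimal sketch of that arithmetic follows; decode_exit_code is a hypothetical helper name, and the 128 + N convention is assumed to apply to the shell that produced the log.

```shell
# Sketch: turn an exit status into a short description.
# Assumption: statuses above 128 mean "killed by signal (status - 128)".
decode_exit_code() {
    rc=$1
    if [ "$rc" -gt 128 ]; then
        echo "killed by signal $((rc - 128))"
    else
        echo "exited with status $rc"
    fi
}

decode_exit_code 139   # prints: killed by signal 11
```

Signal 11 is SIGSEGV, which matches the "Memory fault(coredump)" message.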
Local fix
Ensure that the user exists in /etc/passwd file in all of the nodes in the cluster.
Problem summary
If a user is not created using cspoc so that it exists on all nodes in the cluster, then if you try to change that user's password cluster wide using cspoc, clpasswdremote will core dump on nodes where the user is not configured.
The smit output will look like:
Changing password for "tstuser"
hack2: cexec 54 : 8781858 Memory fault(coredump)
Problem conclusion
A check was added to clpasswdremote to avoid attempting to change the password on a node where the user is not defined.
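Until the fix is installed, the local fix from the APAR can be checked by hand. The sketch below tests whether a user has an entry in a passwd-format file; user_in_passwd is a hypothetical helper name, and the user name test3 is taken from the log excerpt in the APAR. It only inspects the local node, so it has to be run on each cluster node in turn.

```shell
# Sketch: return 0 if the user has an entry in the given passwd-format file.
# The second argument defaults to /etc/passwd on the local node.
user_in_passwd() {
    user=$1
    passwd_file=${2:-/etc/passwd}
    # Compare the first colon-separated field exactly, so "test" does not match "test3"
    awk -F: -v u="$user" '$1 == u { found = 1 } END { exit !found }' "$passwd_file"
}

if user_in_passwd test3; then
    echo "test3 is defined on this node"
else
    echo "test3 is missing on this node"
fi
```

If any node reports the user as missing, create the user there before changing the password through C-SPOC.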