Strange Case 2: Memory core dump when adding an Oracle tablespace through PowerHA


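To add a new tablespace for Oracle through PowerHA, we used C-SPOC to add LVs online. Adding them to Datavg went smoothly at first: the first 10 LVs were created successfully. But when we went on to add more LVs, an error appeared: memory core dump.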

[Screenshot: error output]

1. Is memory running out? A quick check showed it was indeed somewhat tight:

hostA:

[Screenshot: memory usage on hostA]

hostB:

[Screenshot: memory usage on hostB]


Answer from myciciy (IT consultant at a fintech company):

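Because this problem starts with a memory error, it easily creates the impression that memory is insufficient. Memory utilization in the environment really was high, but the root cause turned out not to be a lack of memory.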

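First, the first few new LVs were created without trouble and only the later ones failed, while adding LVs to other VGs still succeeded. At that point I began to suspect a bug.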

Second, an unpatched PowerHA installation is relatively likely to run into bugs, so check the PowerHA fix level:

[Screenshot: PowerHA fix level]


Sure enough, this was essentially an unpatched PowerHA environment, so running into a bug is no surprise.

Third, if a bug is suspected, find something convincing to back it up, such as the following APAR:

IV36992: CLPASSWDREMOTE CORE DUMPS DUE TO MEMORY FAULT

A fix is available

Obtain the fix for this APAR.

Error description

The clpasswdremote utility is core dumping due to a segmentation fault. The problem occurs when the user is missing in /etc/passwd on one of the nodes in the cluster.

The cspoc.log will log the following:

[========== C_SPOC COMMAND LINE ==========]
/usr/es/sbin/cluster/sbin/cl_chpasswd -cspoc -f -r -cspoc -grg1 test3
hacmp13: success: /usr/es/sbin/cluster/etc/clpasswd/usr_bin_passwd.orig test3
hacmp14: FAILED: eval clpasswdremote -u test3 -p 'raCYOSMwhoJU.' -f 2 -l 0
hacmp14: cexec[54]: 3735594 Memory fault(coredump)
hacmp14: RETURN_CODE=139
hacmp14: cl_rsh had exit code =139, see cspoc.log and/or clcomd.log for more information

The error report will log a CORE DUMP error with the following stack trace:

main 94
main 88
__start 6C

The following symptom code is logged as well:

PIDS/5765E6200 LVLS/520 PCSS/SPI2 FLDS/clpasswdr SIG/11 FLDS/main VALU/94 FLDS/__start
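RETURN_CODE=139 in the cspoc.log excerpt and SIG/11 in the symptom code describe the same event. Under the usual ksh/bash convention, an exit status above 128 means the process was terminated by signal number (status minus 128); 139 minus 128 is 11, which is SIGSEGV, the segmentation fault named in the error description. A minimal Python sketch of that decoding, assuming the 128-plus-signal convention; the helper name is hypothetical:

```python
import signal


def decode_shell_status(rc: int) -> str:
    """Interpret a shell exit status under the 128 + signal-number convention."""
    if rc > 128:
        signum = rc - 128
        try:
            # signal.Signals maps a number to its name on the current platform
            return f"killed by signal {signum} ({signal.Signals(signum).name})"
        except ValueError:
            # Number has no named signal on this platform
            return f"killed by signal {signum}"
    return f"exited with status {rc}"


print(decode_shell_status(139))  # killed by signal 11 (SIGSEGV)
```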

Local fix

Ensure that the user exists in the /etc/passwd file on all of the nodes in the cluster.
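The local fix amounts to confirming that the user is defined on every node. A minimal Python sketch, assuming the /etc/passwd contents of each node have already been collected into a dict keyed by node name (how they are gathered is left out). The helper names and the sample passwd lines are made up for illustration, reusing the node and user names from the log above, and users defined only outside /etc/passwd (for example in LDAP) are not covered:

```python
from __future__ import annotations


def users_in_passwd(passwd_text: str) -> set[str]:
    """Return login names from /etc/passwd-format text (first ':'-separated field)."""
    names = set()
    for line in passwd_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        names.add(line.split(":", 1)[0])
    return names


def nodes_missing_user(user: str, passwd_by_node: dict[str, str]) -> list[str]:
    """List (sorted) the nodes whose passwd text does not define the user."""
    return sorted(
        node for node, text in passwd_by_node.items()
        if user not in users_in_passwd(text)
    )


# Hypothetical sample data: test3 exists on hacmp13 but not on hacmp14
example = {
    "hacmp13": "root:!:0:0::/:/usr/bin/ksh\ntest3:*:204:1::/home/test3:/usr/bin/ksh\n",
    "hacmp14": "root:!:0:0::/:/usr/bin/ksh\n",
}
print(nodes_missing_user("test3", example))  # ['hacmp14']
```

Any node this reports is one where a cluster-wide password change for that user would hit the bug.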

Problem summary

If a user is not created using C-SPOC so that it exists on all nodes in the cluster, then if you try to change that user's password cluster-wide using C-SPOC, clpasswdremote will core dump on nodes where the user is not configured.

The smit output will look like:

Changing password for "tstuser"
hack2: cexec 54 : 8781858 Memory fault(coredump)

Problem conclusion

A check was added to clpasswdremote to avoid attempting to change the password on a node where the user is not defined.
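The fix described in the conclusion is a guard of the form "look the user up first, and skip the node if it is not there". A minimal Python sketch of that pattern using the standard `pwd` module, whose `getpwnam` raises `KeyError` for unknown users. This only illustrates the idea and is not IBM's actual code; the function names are hypothetical:

```python
import pwd


def user_defined_locally(user: str) -> bool:
    """Return True if the local user database resolves this login name."""
    try:
        pwd.getpwnam(user)  # raises KeyError when the user is not defined
        return True
    except KeyError:
        return False


def change_password_guarded(user: str) -> str:
    """Hypothetical per-node step: skip the node when the user is missing."""
    if not user_defined_locally(user):
        return f"skipped: {user} is not defined on this node"
    # The real password change would happen here; it is deliberately omitted.
    return f"would change password for {user}"


print(change_password_guarded("no_such_user_example"))
```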

Banking · 2017-03-30