Author: mxin (辛旻), 2017-10-17 12:45
Senior Engineer, Shanghai Baosight Software Co., Ltd. (上海宝信软件股份有限公司)

AIX system hangs and stops serving, dump reports KERNEL_ABEND: how to resolve it


Symptom: The AIX 6 system hung and stopped serving.

After a forced hardware power-off and restart, KERNEL_ABEND errors were found in the dump and in errpt.

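Conclusion: While the system was running, the IBM system monitoring program triggered the V_FREESCB bug that APAR IZ93856 fixes for AIX 6.1.

Recommendations:
1. Stop the IBM system monitoring program (recommended).
2. Upgrade the bos.mp64 fileset from 6.1.6.1 to 6.1.6.15 (not recommended).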

Evidence:

1. errpt error report

LABEL: KERNEL_ABEND
IDENTIFIER: 0975DD6C

Date/Time: Mon Aug 28 14:48:18 2017
Sequence Number: 2568183
Machine Id: 00C68D064C00
Node Id: XXXXXXXX
Class: S
Type: PERM
WPAR: Global
Resource Name: ABEND

Description
KERNEL ABNORMALLY TERMINATED

Probable Causes
SOFTWARE PROGRAM

Failure Causes
SOFTWARE PROGRAM

Recommended Actions
OBTAIN DUMP
CONTACT APPROPRIATE SERVICE REPRESENTATIVE

Detail Data
ABEND CODE
EEEE 0000 9672 7028
ABEND DETAIL
0000 0000 00ED BB2D
ABEND CALLER
0000 0000 000F 2B84
MACHINE STATE REGISTER, MSR
8000 0000 0002 9032

2. kdb analysis of the dump
sip570av[/viobak]#./kdb dump unix
dump mapped from @ 700000000000000 to @ 7000008c89d4e2f
.......
(16)> stat
SYSTEM_CONFIGURATION:
CHRP_SMP_PCI POWER_PC POWER_6 machine with 32 available CPU(s) (64-bit registers)

SYSTEM STATUS:
sysname... AIX
nodename.. XXXXXXXXX
release... 1
version... 6
build date Aug 27 2010
build time 17:03:39
label..... 1034A_61L
machine... 00C68D064C00
nid....... C68D064C
time of crash: Mon Aug 28 14:28:03 2017
age of system: 355 day, 22 hr., 43 min., 2 sec.
xmalloc debug: enabled
FRRs active... 0
FRRs started.. 0

CRASH INFORMATION:
CPU 16 CSA F1000815B043BD00 at time of crash, error code for LEDs: 70000000
pvthread+225B00 STACK:
[0001BF00]abend_trap+000000 ()
[000F2B80]v_freescb+001260 (??, ??)
[002D279C]v_freewseg+0003DC (??, ??)
[0005E73C].backt+000100 ()
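
As a cross-check between the two pieces of evidence, the ABEND CALLER value in the errpt entry can be compared with the v_freescb frame in the kdb stack. The sketch below is illustrative Python and not part of the original analysis. The helper names `parse_hex_words` and `symbol_offset` are made up for this example, and the code assumes that the bracketed kdb value is the address that corresponds to `symbol+offset`. Under that assumption, ABEND CALLER lands 4 bytes past the v_freescb+001260 frame, so it is still inside v_freescb.

```python
def parse_hex_words(text: str) -> int:
    """Turn an errpt detail value such as '0000 0000 000F 2B84' into an int."""
    return int(text.replace(" ", ""), 16)


def symbol_offset(address: int, frame_address: int, frame_offset: int) -> int:
    """Offset of `address` from the start of the symbol a kdb frame belongs to.

    Assumption: frame_address == symbol_start + frame_offset.
    """
    symbol_start = frame_address - frame_offset
    return address - symbol_start


# Values copied from the errpt entry and the kdb stack above.
abend_caller = parse_hex_words("0000 0000 000F 2B84")
frame_address = 0x000F2B80   # [000F2B80]
frame_offset = 0x001260      # v_freescb+001260

offset = symbol_offset(abend_caller, frame_address, frame_offset)
print(f"ABEND CALLER = v_freescb+{offset:06X}")
```

If the two values did not resolve to the same function, the errpt entry and the dump would probably describe different events and the stack would need a closer look.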

3. Fix description on the IBM website
http://www-01.ibm.com/support/docview.wss?uid=isg1IZ93856

IZ93856: SYSTEM ASSERTS IN V_FREESCB() IF SCB_VPAGES IS NOT 0 APPLIES TO AIX 6100-06
A fix is available

Obtain the fix for this APAR.
APAR status

Closed as program error.

Error description

System asserts while freeing a segment if virtual page count
for the segment is not 0. When this happens, the stack will be
similar to:

(0)> f
pvthread+024700 STACK:
[0001BF00]abend_trap+000000 ()
[00039BDC]v_freescb+00121C (??, ??)
[0003178C]v_freewseg+0003EC (??, ??)
[00060A7C].backt+000100 ()

Local fix

Problem summary

****************************************************************
* USERS AFFECTED:
* Users of the 6100-06 Technology Level with the bos.mp64
* fileset below the level of 6.1.6.4.
****************************************************************
* PROBLEM DESCRIPTION:
* System asserts while freeing a segment if virtual page count
* for the segment is not 0. When this happens, the stack will
* be similar to:
* (0)> f
* pvthread+024700 STACK:
*  0001BF00 abend_trap+000000 ()
*  00039BDC v_freescb+00121C (??, ??)
*  0003178C v_freewseg+0003EC (??, ??)
*  00060A7C .backt+000100 ()
****************************************************************
* RECOMMENDATION:
* Install APAR IZ93856.
****************************************************************

Problem conclusion

The v_mvfork() function should perform its own local count of
the pinned pages in a SCB instead of relying on scb_nppages
which could be updated concurrently by the fast short-term
unpin code at the same time.

Temporary fix

*********
* HIPER *
*********

Comments

6100-03 - use AIX APAR IZ94002
6100-04 - use AIX APAR IZ93256
6100-05 - use AIX APAR IZ92186
6100-06 - use AIX APAR IZ93856
7100-00 - use AIX APAR IZ95252

APAR Information

APAR number: IZ93856
Reported component name: AIX 610 STD EDI
Reported component ID: 5765G6200
Reported release: 610
Status: CLOSED PER
PE: NoPE
HIPER: YesHIPER
Submitted date: 2011-01-31
Closed date: 2011-01-31
Last modified date: 2012-05-18

APAR is sysrouted FROM one or more of the following:

IZ92186
APAR is sysrouted TO one or more of the following:

Fix information

Fixed component name: AIX 610 STD EDI
Fixed component ID: 5765G6200

Applicable component levels

R610 PSY U833973 UP11/05/13 I 1000

PTF to Fileset Mapping

U833973 bos.mp64 6.1.6.15

4. Fileset level on the current system
bos.mp64 6.1.6.1 C F Base Operating System 64-bit
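
Whether the installed level falls in the affected range can be checked mechanically by comparing the dotted fileset levels field by field. This is a minimal Python sketch, not the author's procedure. The function names are illustrative, and the level strings are the ones quoted in this article: installed 6.1.6.1, affected below 6.1.6.4 according to the APAR text, and 6.1.6.15 from the PTF to fileset mapping.

```python
def parse_level(level: str) -> tuple:
    """Split a fileset level such as '6.1.6.15' into a tuple of integers."""
    return tuple(int(part) for part in level.split("."))


def is_affected(installed: str, affected_below: str) -> bool:
    """True if the installed level is below the first unaffected level."""
    return parse_level(installed) < parse_level(affected_below)


installed = "6.1.6.1"        # bos.mp64 level on this system
affected_below = "6.1.6.4"   # "below the level of 6.1.6.4" in the APAR text
ptf_level = "6.1.6.15"       # U833973 bos.mp64 level from the PTF mapping

print("affected:", is_affected(installed, affected_below))
print("PTF level is newer than installed:",
      parse_level(ptf_level) > parse_level(installed))
```

The fields are compared as integers on purpose: a plain string comparison would rank "6.1.6.15" below "6.1.6.4".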


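Comments (3)

wuwenpin (Software Development Engineer, Nanjing), 2017-10-21 18:04
Learned a lot. Thanks for sharing!

buxl2012 (System Operations Engineer, linux aix), 2017-10-21 16:40
How do you do the analysis with kdb? Could you explain? Thanks.

mxin (辛旻), replying to @buxl2012, 2017-10-30 13:23
Other people's blogs cover this (王巧雷's, I think). Try searching for it.

IBM_2012 (System Operations Engineer, Beijing), 2017-10-19 09:34
Learned a lot. Thanks for sharing!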