4DA8FE60 Memory related DR operation failed

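Hardware: p570; OS: AIX 6.1.
This host is the Oracle Data Guard (DG) standby. Redo log apply is very slow, roughly one log every 5 minutes, and more than 500 logs are still waiting to be applied. After a restart the apply rate jumps immediately to one log every few tens of seconds, but after about 10 minutes it slows down again. Checking the AIX error log, I found an error.
topas shows that memory is fully used, and lsps shows paging space at 50% usage.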

What is causing this? Could it be a bug? Can any experts help?
Here is the error:
4DA8FE60   1014235514 T S DR_KER_MEM     Memory related DR operation failed

# oslevel -s
6100-01-09-1015
# errpt -aj 4DA8FE60
---------------------------------------------------------------------------
LABEL:          DR_DMA_MIGRATE_FAIL
IDENTIFIER:     4DA8FE60

Date/Time:       Wed Oct 15 09:58:28 CST 2014
Sequence Number: 487
Machine Id:      00CEDC054C00
Node Id:         r3dg
Class:           S
Type:            TEMP
WPAR:            Global
Resource Name:   DR_KER_MEM      

Description
Memory related DR operation failed

Probable Causes
DMA activity to memory being removed

Failure Causes
DMA specific memory migration failed

        Recommended Actions
        Quiesce the device causing DMA to the memory

Detail Data
Return Code
           0           2
Memory Address
0000 0004 7AA2 C000
Hypervisor return code
          -4
LIOBN
FFFF FFFF
DMA Address
0C00 0000 0000 0004 7AA2 C000

What is going on here? Have I hit a bug?

2 answers from peers

abit2007 · Systems Engineer · Outsourced O&M
1. Tune the memory parameters.
2. Increase the paging space size, ideally placing it on storage disks.
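Before and after enlarging paging space, it is worth tracking how full it is (the question reports 50%). A minimal Python sketch, assuming the usual two-line `lsps -s` layout with the size reported in MB; the 30% alert threshold is a hypothetical value, not a recommendation from the answer.

```python
import re

# Hypothetical alert threshold; choose one that fits your own environment.
PAGING_ALERT_PCT = 30


def paging_usage(lsps_s_output):
    """Return (total_mb, percent_used) parsed from `lsps -s` output.

    Assumption: the two-line AIX layout with the size in MB, e.g.
        Total Paging Space   Percent Used
              4096MB               50%
    """
    lines = [ln for ln in lsps_s_output.splitlines() if ln.strip()]
    if len(lines) < 2:
        raise ValueError("expected a header line and a data line")
    m = re.match(r"\s*(\d+)MB\s+(\d+)%", lines[1])
    if m is None:
        raise ValueError("unrecognised lsps -s data line: " + lines[1])
    return int(m.group(1)), int(m.group(2))


def paging_is_high(lsps_s_output, limit=PAGING_ALERT_PCT):
    """True when paging space use meets or exceeds `limit` percent."""
    _, used = paging_usage(lsps_s_output)
    return used >= limit
```

On the AIX host the input could come from `subprocess.run(["lsps", "-s"], capture_output=True, text=True).stdout`; the 4096MB figure in the docstring is illustrative only.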
Internet Services · 2014-10-15
fly0176 · IT Consultant · Rising Star
1. The system logged error 268DA6A3:
$ errpt -aj 268DA6A3
---------------------------------------------------------------------------
LABEL:          DR_DMA_MAPPER_FAIL
IDENTIFIER:     268DA6A3
Date/Time:       Thu Sep 18 18:29:00 GMT+08:00 2014
Sequence Number: 12463954
Machine Id:      00F7025F4C00
Class:           S
Type:            TEMP
WPAR:            Global
Resource Name:   DR_KER_MEM     
Description
Memory related DR operation failed
Probable Causes
DMA Mapper DR handler failure
Failure Causes
DMA specific memory mapper failed
        Recommended Actions
        Try DR operation on other memory resources
Detail Data
Return Code
           1          -1
Memory Address
0000 0007 E84C 0000
Handler Address
0000 0000 0413 0840
Module Name
/usr/lib/drivers/pci/pci_busdd
---------------------------------------------------------------------------
LABEL:          DR_DMA_MAPPER_FAIL
IDENTIFIER:     268DA6A3
Date/Time:       Thu Sep 18 18:28:00 GMT+08:00 2014
Sequence Number: 12463947
Machine Id:      00F7025F4C00
Class:           S
Type:            TEMP
WPAR:            Global
Resource Name:   DR_KER_MEM     
Description
Memory related DR operation failed
Probable Causes
DMA Mapper DR handler failure
Failure Causes
DMA specific memory mapper failed
        Recommended Actions
        Try DR operation on other memory resources
Detail Data
Return Code
           1          22
Memory Address
0000 0006 72B5 0000
Handler Address
0000 0000 0443 3680
Module Name
/usr/lib/drivers/headd
2. The cluster CRSD process reported errors and eventually terminated:
2014-09-18 18:28:05.716: [  CRSEVT][16919]32CAAMonitorHandler :: 0:Could not execute /oracle/product/10.2/crs/bin/racgwrap(check) for ora.vip
category: 1234, operation: scls_process_spawn, loc: read_pipe, OS error: 12, other: EOF on read pipe
2014-09-18 18:28:05.750: [  CRSAPP][16919]32CheckResource error for ora.vip error code = -1
……
[  OCRAPI][3368]procr_ctx_set_invalid_no_abort: ctx set to invalid
2014-09-18 18:36:51.127: [ CSSCLNT][11080]clsssRecvMsg: comm error received, comrc 11, con (114b682f0), msg (114b63150), msgl 144
2014-09-18 18:36:51.136: [ CSSCLNT][7994]clsssRecvMsg: comm error received, comrc 11, con (1134b5890), msg (1135b99f0), msgl 144
2014-09-18 18:36:51.178: [ CSSCLNT][7994]clssgsGGetStatus:  communications failed (0/3/324770392)
2014-09-18 18:36:51.178: [ CSSCLNT][7994]clssgsGGetStatus: returning 8
2014-09-18 18:36:51.156: [ CSSCLNT][11080]clssgsGGetStatus:  communications failed (0/3/0)
2014-09-18 18:36:51.178: [ CSSCLNT][11080]clssgsGGetStatus: returning 8
2014-09-18 18:36:51.218: [  CRSEVT][11080]32Error in clssgsgrpstat rc =8
2014-09-18 18:36:51.232: [    CRSD][7994][PANIC]32 termination by CSS, ret=
2014-09-18 18:36:51.241: [    CRSD][7994]32Done.
3. The database alert log showed status 12 errors:
Thu Sep 18 18:30:35 2014
Process startup failed, error stack:
Thu Sep 18 18:30:35 2014
Errors in file /oracle/admin/orcl/bdump/orcl1_psp0_11469288.trc:
ORA-27300: OS system dependent operation:fork failed with status: 12
ORA-27301: OS failure message: Not enough space
ORA-27302: failure occurred at: skgpspawn3
4. Node 2 then restarted node 1 and took over node 1's VIP.
5. After node 1 restarted, the database alert log reported:
AUTO SGA: Disabling background sga auto-tuning.
Thu Sep 18 18:47:46 2014
Error 0 in kwqmnpartition(), aborting txn
Thu Sep 18 18:47:49 2014
ORA-376 encountered when generating server alert SMG-4120
Thu Sep 18 18:47:50 2014
Errors in file /oracle/admin/orcl/bdump/orcl1_smon_10551752.trc:
ORA-01595: error freeing extent (12) of rollback segment (133))
ORA-00376: file 2 cannot be read at this time
ORA-01110: data file 2: '/dev/vx/rdsk/vgorc/lvorcl_undotbs1_1'
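After running recover datafile, the database recovered.
Analysis:
An ORA-2730* status 12 failure is mainly caused by one of two situations:
1. Server resources are exhausted, such as memory, swap (paging) space, or some other resource. On some systems the nproc or maxnproc parameter is set too low and must be raised (see document 579365_1).
2. AIX needs fix IV37048 installed (IV37048 CIFS_FS LEAVES BEHIND DEFUNCT KERNEL PROCESSES) (see MOS Doc ID 1541121.1).
  In this case, the server may show the following symptoms:
  - AIX becomes unusable
  - System commands, such as ps, return fork or malloc errors
  - Connections to the database fail
  - The command line hangs
  - Zombie (defunct) processes appear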
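The "status: 12" in ORA-27300 is the operating-system errno returned by the failed fork(). A minimal Python sketch (a hypothetical helper, not part of Oracle or AIX tooling) that maps such a code to its symbolic name: on AIX, as on Linux, 12 is ENOMEM, and AIX's message text for it is "Not enough space", which matches the ORA-27301 line. The text returned by `os.strerror` depends on the platform the script runs on.

```python
import errno
import os


def describe_status(code):
    """Map a numeric OS status (e.g. ORA-27300 "status: 12") to its errno name."""
    name = errno.errorcode.get(code, "UNKNOWN")
    # os.strerror gives the local platform's wording for the same code.
    return f"{code} = {name} ({os.strerror(code)})"


print(describe_status(12))
```

Reading the code as ENOMEM points toward the first cause above (memory or paging space exhaustion) rather than a process-count limit, which would normally surface as EAGAIN.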
System Integration · 2014-10-15

Asked by: yujin2010good, Systems Engineer, large retail company

Question status
  • Posted: 2014-10-15
  • Last answered: 2014-10-15