Internet services · Power servers · AIX 6

AIX error log entry 4DA8FE60

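While reviewing the logs during a routine inspection I found error 4DA8FE60. What causes it? The error details are below; if they are not detailed enough, I can post more.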
LABEL:          DR_DMA_MIGRATE_FAIL
IDENTIFIER:     4DA8FE60

Date/Time:       Tue Aug  6 08:20:00 CST 2013
Sequence Number: 15460
Machine Id:      00C38CF54C00
Node Id:         p550a
Class:           S
Type:            TEMP
WPAR:            Global
Resource Name:   DR_KER_MEM      

Description
Memory related DR operation failed

Probable Causes
DMA activity to memory being removed

Failure Causes
DMA specific memory migration failed

        Recommended Actions
        Quiesce the device causing DMA to the memory

Detail Data
Return Code
           0           2
Memory Address
0000 0003 7392 0000
Hypervisor return code
          -4
LIOBN
FFFF FFFF
DMA Address
0C00 0000 0000 0003 7392 0000
---------------------------------------------------------------------------
LABEL:          DR_DMA_MIGRATE_FAIL
IDENTIFIER:     4DA8FE60

Date/Time:       Wed Jul 17 10:45:00 CST 2013
Sequence Number: 15458
Machine Id:      00C38CF54C00
Node Id:         p550a
Class:           S
Type:            TEMP
WPAR:            Global
Resource Name:   DR_KER_MEM      

Description
Memory related DR operation failed

Probable Causes
DMA activity to memory being removed

Failure Causes
DMA specific memory migration failed

        Recommended Actions
        Quiesce the device causing DMA to the memory

Detail Data
Return Code
           0           2
Memory Address
0000 0000 E582 6000
Hypervisor return code
          -4
LIOBN
FFFF FFFF
DMA Address
0C00 0000 0000 0000 E582 6000
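
To judge whether an entry like this is a one-off or recurring, it can help to summarize the detailed report by label and date. The following is a minimal Python sketch, assuming the text was saved from the detailed report (for example `errpt -a -j 4DA8FE60 > errpt.txt`). The function names `parse_errpt` and `count_by_label` are illustrative, and `SAMPLE` is a trimmed copy of the two entries above.

```python
import re
from collections import Counter

# Entries in the detailed report are separated by a line of dashes.
SEPARATOR = re.compile(r"^-{10,}[ \t]*$", re.MULTILINE)
# Header fields of interest; each is "Name:   value" on a single line.
FIELD = re.compile(
    r"^(LABEL|IDENTIFIER|Date/Time|Sequence Number|Type):[ \t]*(.*?)[ \t]*$",
    re.MULTILINE,
)

# Trimmed copy of the two entries posted above (illustration only).
SAMPLE = """\
LABEL:          DR_DMA_MIGRATE_FAIL
IDENTIFIER:     4DA8FE60

Date/Time:       Tue Aug  6 08:20:00 CST 2013
Sequence Number: 15460
Type:            TEMP
---------------------------------------------------------------------------
LABEL:          DR_DMA_MIGRATE_FAIL
IDENTIFIER:     4DA8FE60

Date/Time:       Wed Jul 17 10:45:00 CST 2013
Sequence Number: 15458
Type:            TEMP
"""


def parse_errpt(text):
    """Split detailed errpt text into entries and extract the header fields."""
    entries = []
    for chunk in SEPARATOR.split(text):
        fields = dict(FIELD.findall(chunk))
        if "IDENTIFIER" in fields:  # skip empty chunks
            entries.append(fields)
    return entries


def count_by_label(entries):
    """Count how many entries were logged per error label."""
    return Counter(entry.get("LABEL", "?") for entry in entries)


if __name__ == "__main__":
    parsed = parse_errpt(SAMPLE)
    for label, count in count_by_label(parsed).items():
        print(label, count)
    for entry in parsed:
        print(entry["Sequence Number"], entry["Date/Time"])
```

Two entries three weeks apart, as here, suggest an occasional event rather than a continuously repeating one.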

7 answers

午夜幽魂 · Systems operations engineer, 计算机有限公司
I found two explanations. I am not sure whether they will help you; having looked at them, they do not seem to match your case exactly, so treat this as reference only: http://www-01.ibm.com/support/docview.wss?uid=isg1IY89339

First, gxibdd is part of the "Infiniband Logical HCA Runtime". So on that system the error should have been caused by a DR (dynamic relocation) attempt involving the Infiniband Host Channel Adapter (HCA) iba0.
#lsdev -Cc adapter
iba0 Available U789D.001.DQD31K6-P1-C9 InfiniBand host channel adapter
Listing the adapters on the system shows that an iba0 device is indeed present.
#lsattr -El iba0
dll_32_name /usr/lib/IbGxLib/libGxIb.so N/A True
dll_64_name /usr/lib/IbGxLib/libGxIb.so64 N/A True
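The Galaxy IB (HCA) driver has a known limitation: it does not support DR operations.
In this case the DR operation was probably triggered by an internal VMM mechanism. Starting with AIX 5300-04, on POWER5+ and POWER6 servers AIX uses two page sizes, 4k and 64k. The VMM manages one pool of 64k pages and one pool of 4k pages. When one pool is exhausted, the VMM can borrow memory from the other pool, either by splitting one 64k page into sixteen 4k pages or by coalescing sixteen 4k pages into one 64k page.
In the latter case, the sixteen 4k pages being coalesced must be physically contiguous. To find sixteen physically contiguous free pages, the VMM may have to migrate the contents of some 4k pages elsewhere in memory, freeing their original locations so that sixteen physically contiguous free pages can be assembled.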
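
The coalescing step can be illustrated with a toy model. This is a minimal Python sketch, not AIX's actual VMM algorithm: the function `coalesce_group`, the frame-number sets, and the rule that a single DMA-blocked frame fails the whole attempt are assumptions made for illustration.

```python
PAGE_4K = 4 * 1024
PAGE_64K = 64 * 1024
FRAMES_PER_64K = PAGE_64K // PAGE_4K  # sixteen 4k frames make one 64k page


def coalesce_group(start, in_use, dma_blocked):
    """Try to free the sixteen contiguous 4k frames beginning at `start`.

    in_use      -- frame numbers whose contents would have to be migrated
    dma_blocked -- hypothetical: frames used for DMA by a driver that
                   cannot handle the migration request
    Returns ("ok", frames_to_migrate) or ("failed", blocking_frame).
    """
    to_migrate = []
    for frame in range(start, start + FRAMES_PER_64K):
        if frame in dma_blocked:
            # The driver returns an error, so the whole attempt is abandoned.
            return ("failed", frame)
        if frame in in_use:
            to_migrate.append(frame)
    return ("ok", to_migrate)


if __name__ == "__main__":
    # Frames 33 and 40 are in use but can be moved out of the group.
    print(coalesce_group(32, {33, 40}, set()))
    # Frame 20 is pinned for DMA by a driver that rejects the request.
    print(coalesce_group(16, {17, 20}, {20}))
```

In this model a failed attempt corresponds to the point at which an error entry would be logged. Nothing is lost, the pages simply stay where they are.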

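This is done through DR (memory dynamic relocation). Because a page may be in use by a driver performing DMA, the VMM calls every device driver registered on the system. The HCA driver does not know how to handle this DR operation, so it returns an error. The VMM then aborts the DR operation and records the error in the AIX errpt.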

With the cause understood, we can conclude that a DR_DMA_MAPPER_FAIL error whose Module Name is /usr/lib/drivers/gxibdd can be ignored.
System integration · 2013-08-07
jackyduys · Project manager, 苏源
OP, was there a final conclusion?
System integration · 2014-07-28
huangrq_cn · Storage architect, 中投科信
I just ran into this problem myself; this was useful to read.
Internet services · 2014-07-09
yxdongzhiwen · Systems operations engineer, 神州数码信息服务股份有限公司
Please check whether this is a P6 machine and whether its firmware level is lower than EL350_107_038.
System integration · 2013-11-13
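
The firmware check suggested above can be scripted once the installed level is known (on AIX, `lsmcode -c` reports it). This is a minimal Python sketch, assuming the usual level layout of a two-letter family prefix, a three-digit release, a three-digit service pack and an optional three-digit last-disruptive level. The helper names are illustrative, and the installed level `EL350_085_038` in the usage lines is a made-up example, not taken from the asker's machine.

```python
import re

# e.g. "EL350_107_038" or "01EL350_107_038"; the last group is optional.
LEVEL = re.compile(r"^(?:01)?([A-Z]{2})(\d{3})_(\d{3})(?:_(\d{3}))?$")


def parse_level(level):
    """Return (family_prefix, release, service_pack) for a firmware level string."""
    match = LEVEL.match(level.strip())
    if match is None:
        raise ValueError("unrecognized firmware level: %r" % level)
    return match.group(1), int(match.group(2)), int(match.group(3))


def is_older(installed, reference):
    """True if `installed` is below `reference` within the same firmware family."""
    inst_family, inst_release, inst_sp = parse_level(installed)
    ref_family, ref_release, ref_sp = parse_level(reference)
    if inst_family != ref_family:
        raise ValueError("levels belong to different firmware families")
    return (inst_release, inst_sp) < (ref_release, ref_sp)


if __name__ == "__main__":
    reference = "EL350_107_038"   # level named in the answer above
    installed = "EL350_085_038"   # hypothetical installed level
    print(is_older(installed, reference))
```

Only release and service pack are compared; the last-disruptive field describes the update path, not how recent the level is.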
yxdongzhiwen · Systems operations engineer, 神州数码信息服务股份有限公司
EL350_107_038
06/06/11

Impact:  Availability        Severity:  ATT
New Features and Functions

Support for the attachment of a System Director Management Console (SDMC).
System firmware changes that affect all systems

PARTITION-DEFERRED:  A problem was fixed that prevented virtual LANs (VLANs) in a VIOS with partition ID of 1 from being displayed as bootable devices in the system management services (SMS) menus.
A problem was fixed that prevented a hardware management console (HMC) from being permanently disconnected using the Advanced System Management Services (ASMI) menus.
A problem was fixed that prevented the timed-power-on command from turning the system back on if the service processor's clock was adjusted to an earlier time.  Adjustment of the service processor's clock could have been done through the operating system or the Advanced System Management Interface (ASMI).  This problem could occur during the fall when clocks are set back when daylight saving time ends, for example.
A problem was fixed that caused certain service processor error log entries with a severity of "predictive", and a failing subsystem of "service processor firmware", to be erroneously converted to "informational".
A problem was fixed that caused the HMC2 port on the advanced system management interface (ASMI) to erroneously default to static IP addressing instead of dynamic.
A problem was fixed that caused a firmware installation to fail with SRC B181EF7C.
A problem was fixed that prevented processor resources from being moved to another partition by a DLPAR (dynamic LPAR) operation.
A problem was fixed that prevented partitions from booting.
The firmware was enhanced to list the attached devices when viewing the adapter information for a partition profile on the HMC GUI.
A problem was fixed that could cause the target partition to crash after a successful P6 to P7 partition migration.  Possible AIX error log entries include:  label: DSI_PROC, resource:  SYSVMM, with description: "DATA STORAGE INTERRUPT, PROCESSOR".  Other partition-related crash descriptors may also be logged.
A problem was fixed that could cause AIX error log entries following a successful partition migration.  Possible AIX error log entries include: label: RTAS_ERROR, resource: sysplanar0, with description: "INTERNAL ERROR CODE".  Other errors may also be logged.
A problem was fixed that caused a partition to crash with SRC BA330002 after several concurrent installations of system firmware, or partition migrations, without a reboot.
A problem was fixed that caused multiple DR_DMA_MIGRATE_FAIL entries in the AIX error log.
A problem was fixed that caused the installation of some versions of Linux to fail.
A problem was fixed that caused a partition migration or partition hibernation operation to hang with the partition left in the "suspending" state.
The firmware was enhanced to log SRC B1768B76 as informational instead of unrecoverable.
System integration · 2013-11-13
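午夜幽魂 · Systems operations engineer, 计算机有限公司
This looks like a memory-related error. What machine is this? Check whether ASMI shows any errors, and check whether the memory capacity has dropped.
The entry is of type TEMP, so if it is not being logged continuously it should be fine.
System integration · 2013-08-07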
zwz99999 · Systems engineer, dcits
It looks like a minor memory issue; a temporary error like this can be ignored.
System integration · 2013-08-07

Asked by

finalsylph
Systems engineer, pharmaceutical industry
Areas of expertise: servers, storage, PowerHA

Question status

  • Posted: 2013-08-07
  • Last answered: 2014-07-28