AIX system suddenly reporting errors 3D32B80D and 3C81E43F?

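Environment:
Software: Oracle 11.2.0.4 RAC, OS AIX 6107, HACMP 5.5
Hardware: 2 Power servers (minicomputers), 2 storage arrays, 2 networks
Architecture: the Oracle RAC ASM disk groups are built as redundant disk groups mirrored across the two storage arrays; the Oracle RAC voting disk/OCR uses HACMP shared volume groups, on which disk heartbeating is configured.
Problem: recently, errors 3D32B80D and 3C81E43F have been reported about once every 1 to 2 weeks.
The error entries are as follows: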
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
LABEL:          TS_NIM_ERROR_STUCK_

IDENTIFIER:     3D32B80D

Date/Time:       Wed Mar 16 03:08:20 CST 2022
Sequence Number: 50626
Machine Id:      00F83B6A4C00
Node Id:         newhisvhfs1
Class:           S
Type:            PERM
WPAR:            Global
Resource Name:   topsvcs

Description
NIM thread blocked

Probable Causes
A thread in a Topology Services Network Interface Module (NIM) process
was blocked
Topology Services NIM process cannot get timely access to CPU
The system clock was set forward

User Causes
Excessive memory consumption is causing high memory contention
Excessive disk I/O is causing high memory contention
The system clock was manually set forward

        Recommended Actions
        Examine I/O and memory activity on the system
        Reduce load on the system
        Tune virtual memory parameters
        Call IBM Service if problem persists

Failure Causes
Excessive virtual memory activity prevents NIM from making progress
Excessive disk I/O traffic is interfering with paging I/O

        Recommended Actions
        Examine I/O and memory activity on the system
        Reduce load on the system
        Tune virtual memory parameters
        Call IBM Service if problem persists

Detail Data
DETECTING MODULE
rsct,nim_control.C,1.39.1.41,7916
ERROR ID
6BUfAx.YECAW/uyc01cU08....................
REFERENCE CODE

Thread which was blocked
receive thread
Interval in seconds during which process was blocked
       27900
Interface name
rhdisk33

LABEL:          TS_NIM_ERROR_STUCK_
IDENTIFIER:     3D32B80D

Date/Time:       Wed Mar 16 03:08:20 CST 2022
Sequence Number: 50625
Machine Id:      00F83B6A4C00
Node Id:         newhisvhfs1
Class:           S
Type:            PERM
WPAR:            Global
Resource Name:   topsvcs
Description
NIM thread blocked

Probable Causes
A thread in a Topology Services Network Interface Module (NIM) process
was blocked
Topology Services NIM process cannot get timely access to CPU
The system clock was set forward

User Causes
Excessive memory consumption is causing high memory contention
Excessive disk I/O is causing high memory contention
The system clock was manually set forward

        Recommended Actions
        Examine I/O and memory activity on the system
        Reduce load on the system
        Tune virtual memory parameters
        Call IBM Service if problem persists

Failure Causes
Excessive virtual memory activity prevents NIM from making progress
Excessive disk I/O traffic is interfering with paging I/O

        Recommended Actions
        Examine I/O and memory activity on the system
        Reduce load on the system
        Tune virtual memory parameters
        Call IBM Service if problem persists

Detail Data
DETECTING MODULE
rsct,nim_control.C,1.39.1.41,7916
ERROR ID
6BUfAx.YECAW/fd/01cU08....................
REFERENCE CODE

Thread which was blocked
receive thread
Interval in seconds during which process was blocked
       27899
Interface name
rhdisk32

LABEL:          TS_NIM_ERROR_STUCK_
IDENTIFIER:     3D32B80D

Date/Time:       Wed Mar 16 03:08:20 CST 2022
Sequence Number: 50624
Machine Id:      00F83B6A4C00
Node Id:         newhisvhfs1
Class:           S
Type:            PERM
WPAR:            Global
Resource Name:   topsvcs

Description
NIM thread blocked

Probable Causes
A thread in a Topology Services Network Interface Module (NIM) process
was blocked
Topology Services NIM process cannot get timely access to CPU
The system clock was set forward

User Causes
Excessive memory consumption is causing high memory contention
Excessive disk I/O is causing high memory contention
The system clock was manually set forward

        Recommended Actions
        Examine I/O and memory activity on the system
        Reduce load on the system
        Tune virtual memory parameters
        Call IBM Service if problem persists

Failure Causes
Excessive virtual memory activity prevents NIM from making progress
Excessive disk I/O traffic is interfering with paging I/O

        Recommended Actions
        Examine I/O and memory activity on the system
        Reduce load on the system
        Tune virtual memory parameters
        Call IBM Service if problem persists

Detail Data
DETECTING MODULE
rsct,nim_control.C,1.39.1.41,7916
ERROR ID
6BUfAx.YECAW/eBh.1cU08....................
REFERENCE CODE

Thread which was blocked
receive thread
Interval in seconds during which process was blocked
       27900
Interface name
en15
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
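To compare the three entries side by side, a short Python sketch can pull the blocked interface and interval out of the `errpt -a` text. It assumes the output was saved from something like `errpt -a -j 3D32B80D`; `parse_nim_stuck`, `summarize` and `SAMPLE` are hypothetical names used only for illustration. Applied to the entries above, it lines up the two disk heartbeat NIMs (rhdisk33, rhdisk32) and the en15 network NIM, all reporting about 27,900 s (roughly 7.75 hours) of blocking at the same second.

```python
import re
from collections import defaultdict

# Trimmed copy of the three errpt entries above (assumption: in practice you
# would read the saved output of `errpt -a -j 3D32B80D` from a file).
SAMPLE = """\
LABEL:          TS_NIM_ERROR_STUCK_
IDENTIFIER:     3D32B80D
Date/Time:       Wed Mar 16 03:08:20 CST 2022
Interval in seconds during which process was blocked
       27900
Interface name
rhdisk33

LABEL:          TS_NIM_ERROR_STUCK_
IDENTIFIER:     3D32B80D
Date/Time:       Wed Mar 16 03:08:20 CST 2022
Interval in seconds during which process was blocked
       27899
Interface name

rhdisk32

LABEL:          TS_NIM_ERROR_STUCK_
IDENTIFIER:     3D32B80D
Date/Time:       Wed Mar 16 03:08:20 CST 2022
Interval in seconds during which process was blocked
       27900
Interface name
en15
"""


def parse_nim_stuck(errpt_text):
    """Return (date, interface, blocked_seconds) for each entry that has all three."""
    entries = []
    # Each `errpt -a` entry begins with a "LABEL:" line; skip text before the first.
    for block in re.split(r"^LABEL:", errpt_text, flags=re.M)[1:]:
        date = re.search(r"^Date/Time:\s+(.+)$", block, re.M)
        # \s+ also skips blank lines, as in the rhdisk32 entry.
        secs = re.search(r"process was blocked\s+(\d+)", block)
        iface = re.search(r"Interface name\s+(\S+)", block)
        if date and secs and iface:
            entries.append((date.group(1).strip(), iface.group(1), int(secs.group(1))))
    return entries


def summarize(entries):
    """Group entries by timestamp and print each blocked NIM with its interval in hours."""
    by_time = defaultdict(list)
    for when, iface, secs in entries:
        by_time[when].append(f"{iface} {secs} s ({secs / 3600:.2f} h)")
    for when, items in by_time.items():
        print(f"{when}: {len(items)} NIM(s) blocked -> " + "; ".join(items))


if __name__ == "__main__":
    summarize(parse_nim_stuck(SAMPLE))
```

Because the same interval is reported by NIMs on different media (two disks and one Ethernet adapter), it may be worth weighing it against the "system clock was set forward" probable cause listed in the entry itself, and not only against I/O or memory load.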

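The cluster heartbeat status is shown below.
Both the network heartbeats and the disk heartbeats show missed beats (packet loss):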
/#lssrc -ls topsvcs
Subsystem         Group            PID     Status
 topsvcs          topsvcs          7143780 active
Network Name   Indx Defd  Mbrs  St   Adapter ID      Group ID
net_ether_01_0 [ 0] 2     2     S    10.10.5.114     10.10.5.115
net_ether_01_0 [ 0] en17             0x419b608b      0x419b65c8
HB Interval = 1.000 secs. Sensitivity = 10 missed beats
Missed HBs: Total: 15 Current group: 5
Packets sent    : 17778053 ICMP 8 Errors: 0 No mbuf: 0
Packets received: 23130783 ICMP 51 Dropped: 0
NIM's PID: 5308798
net_ether_02_0 [ 1] 2     2     S    10.10.10.3      10.10.10.4
net_ether_02_0 [ 1] en15             0x419b608d      0x419b65cc
HB Interval = 1.000 secs. Sensitivity = 10 missed beats
Missed HBs: Total: 32 Current group: 15
Packets sent    : 17778315 ICMP 8 Errors: 0 No mbuf: 0
Packets received: 23111569 ICMP 52 Dropped: 0
NIM's PID: 6225934
diskhb_0       [ 2] 2     2     S    255.255.10.1    255.255.10.3
diskhb_0       [ 2] rhdisk33         0x8129fcfe      0x819b65cd
HB Interval = 2.000 secs. Sensitivity = 4 missed beats
Missed HBs: Total: 1727 Current group: 982
Packets sent    : 8471066 ICMP 0 Errors: 0 No mbuf: 0
Packets received: 8931503 ICMP 0 Dropped: 0
NIM's PID: 7864588
diskhb_1       [ 3] 2     2     S    255.255.10.0    255.255.10.2
diskhb_1       [ 3] rhdisk32         0x8129fcff      0x819b65ce
HB Interval = 2.000 secs. Sensitivity = 4 missed beats
Missed HBs: Total: 1716 Current group: 982
Packets sent    : 8470911 ICMP 0 Errors: 0 No mbuf: 0
Packets received: 8931415 ICMP 0 Dropped: 0
NIM's PID: 5963920
  2 locally connected Clients with PIDs:
haemd(7930352) hagsd(7274814)
  Fast Failure Detection available but off.
  Dead Man Switch Enabled:
     reset interval = 1 seconds
     trip  interval = 20 seconds
  Client Heartbeating Disabled.
  Configuration Instance = 4
  Daemon employs no security
  Segments pinned: Text Data.
  Text segment size: 862 KB. Static data segment size: 1497 KB.
  Dynamic data segment size: 5953. Number of outstanding malloc: 167
  User time 1246 sec. System time 988 sec.
  Number of page faults: 148. Process swapped out 0 times.
  Number of nodes up: 2. Number of nodes down: 0.
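To compare the four heartbeat networks in the `lssrc -ls topsvcs` output, a minimal Python sketch can extract the adapter, heartbeat interval, sensitivity and missed-beat counters per network. The regular expressions assume the exact line layout shown above; `parse_topsvcs`, `report` and `SAMPLE` are hypothetical names. Because the disk NIMs heartbeat every 2 s and the Ethernet NIMs every 1 s, the sketch also divides missed beats by packets sent as a rough normalization. In this snapshot the disk heartbeat networks have more than a hundred times as many misses per packet sent as the Ethernet networks.

```python
import re

# Network blocks copied from the `lssrc -ls topsvcs` output above (client and
# daemon lines omitted). In practice, pass the live command output instead.
SAMPLE = """\
net_ether_01_0 [ 0] 2     2     S    10.10.5.114     10.10.5.115
net_ether_01_0 [ 0] en17             0x419b608b      0x419b65c8
HB Interval = 1.000 secs. Sensitivity = 10 missed beats
Missed HBs: Total: 15 Current group: 5
Packets sent    : 17778053 ICMP 8 Errors: 0 No mbuf: 0
net_ether_02_0 [ 1] 2     2     S    10.10.10.3      10.10.10.4
net_ether_02_0 [ 1] en15             0x419b608d      0x419b65cc
HB Interval = 1.000 secs. Sensitivity = 10 missed beats
Missed HBs: Total: 32 Current group: 15
Packets sent    : 17778315 ICMP 8 Errors: 0 No mbuf: 0
diskhb_0       [ 2] 2     2     S    255.255.10.1    255.255.10.3
diskhb_0       [ 2] rhdisk33         0x8129fcfe      0x819b65cd
HB Interval = 2.000 secs. Sensitivity = 4 missed beats
Missed HBs: Total: 1727 Current group: 982
Packets sent    : 8471066 ICMP 0 Errors: 0 No mbuf: 0
diskhb_1       [ 3] 2     2     S    255.255.10.0    255.255.10.2
diskhb_1       [ 3] rhdisk32         0x8129fcff      0x819b65ce
HB Interval = 2.000 secs. Sensitivity = 4 missed beats
Missed HBs: Total: 1716 Current group: 982
Packets sent    : 8470911 ICMP 0 Errors: 0 No mbuf: 0
"""

# Second line of each network block: "<network> [ n] <adapter>  0x<id> ..."
ADAPTER = re.compile(r"^(\S+)\s+\[\s*\d+\]\s+(\S+)\s+0x")
HB = re.compile(r"HB Interval = ([\d.]+) secs\. Sensitivity = (\d+)")
MISSED = re.compile(r"Missed HBs: Total: (\d+) Current group: (\d+)")
SENT = re.compile(r"Packets sent\s*:\s*(\d+)")


def parse_topsvcs(text):
    """Return {network: stats dict} for each heartbeat network in the output."""
    nets, current = {}, None
    for line in text.splitlines():
        m = ADAPTER.match(line)
        if m:  # start collecting stats for this network
            current = nets.setdefault(m.group(1), {"adapter": m.group(2)})
            continue
        if current is None:
            continue
        m = HB.search(line)
        if m:
            current["interval_s"] = float(m.group(1))
            current["sensitivity"] = int(m.group(2))
        m = MISSED.search(line)
        if m:
            current["missed_total"] = int(m.group(1))
            current["missed_current"] = int(m.group(2))
        m = SENT.search(line)
        if m:
            current["sent"] = int(m.group(1))
    return nets


def report(nets):
    """Print one line per network with a rough missed-per-million-sent figure."""
    for name, s in nets.items():
        per_million = s["missed_total"] / s["sent"] * 1e6
        print(f"{name:<15}{s['adapter']:<10}interval={s['interval_s']:g}s "
              f"sensitivity={s['sensitivity']} missed={s['missed_total']} "
              f"(current group {s['missed_current']}, ~{per_million:.1f} per 1e6 sent)")


if __name__ == "__main__":
    report(parse_topsvcs(SAMPLE))
```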

The system nmon logs show no obvious I/O or memory anomalies at the time the problem occurred.
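Since every NIM reported being blocked for about 27,900 s, one further check in the same nmon file is whether the snapshot timestamps themselves stay evenly spaced around the time of the errors. The Python sketch below assumes nmon's usual `ZZZZ,Tnnnn,HH:MM:SS,DD-MON-YYYY` record layout; `zzzz_times`, `snapshot_gaps` and the `factor` threshold are hypothetical. A large forward gap or a backward step in these times would be consistent with the clock-change cause listed in the errpt entry; evenly spaced snapshots would not by themselves rule anything out.

```python
import sys
from datetime import datetime, timedelta


def zzzz_times(lines):
    """Parse nmon snapshot timestamps from ZZZZ records (assumed layout
    ZZZZ,Tnnnn,HH:MM:SS,DD-MON-YYYY); other records are ignored."""
    stamps = []
    for line in lines:
        parts = line.strip().split(",")
        if len(parts) >= 4 and parts[0] == "ZZZZ":
            # .title() turns "16-MAR-2022" into "16-Mar-2022" for %b.
            stamps.append(datetime.strptime(f"{parts[2]} {parts[3].title()}",
                                            "%H:%M:%S %d-%b-%Y"))
    return stamps


def snapshot_gaps(stamps, factor=1.5):
    """Return (before, after, delta) for consecutive snapshots whose spacing
    exceeds factor x the median spacing, or that go backwards in time."""
    if len(stamps) < 3:
        return []
    pairs = list(zip(stamps, stamps[1:]))
    deltas = sorted(b - a for a, b in pairs)
    typical = deltas[len(deltas) // 2]  # median snapshot interval
    return [(a, b, b - a) for a, b in pairs
            if b - a > typical * factor or b < a]


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python nmon_gaps.py <file.nmon>  (hypothetical script name)
    with open(sys.argv[1]) as f:
        for before, after, delta in snapshot_gaps(zzzz_times(f)):
            print(f"{before} -> {after}: {delta}")
```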

Answer by zwz99999 (system engineer, dcits):

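1. Check whether the NTP clock is behaving normally.
2. Check for network communication problems, especially on the HA network and the heartbeat network.
3. Check disk I/O usage and whether any disk has bad sectors; check the storage.
4. Check how file systems are distributed across the disks, and whether any file system is so heavily used that a single disk is overloaded.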
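For item 1, besides checking the time daemon itself (for example with `lssrc -ls xntpd`), one way to catch the wall clock being stepped is to sample wall-clock time alongside a monotonic clock: if their deltas disagree by more than a few seconds, the system time was changed. This is a minimal Python sketch, not an RSCT or NTP tool; `watch_clock` and its parameters are hypothetical, and it relies on `time.monotonic()` being unaffected by system clock updates, as the Python documentation states. The clock and sleep functions are injectable so the logic can be tested without waiting.

```python
import time


def watch_clock(samples=5, interval=1.0, tolerance=2.0,
                sleep=time.sleep, wall=time.time, mono=time.monotonic):
    """Return (wall_time, skew_seconds) for each sampling interval in which the
    wall clock moved more than `tolerance` seconds away from the monotonic clock.
    Positive skew means the wall clock jumped forward."""
    jumps = []
    prev_wall, prev_mono = wall(), mono()
    for _ in range(samples):
        sleep(interval)
        now_wall, now_mono = wall(), mono()
        # Both deltas should match unless someone (or NTP) stepped the clock.
        skew = (now_wall - prev_wall) - (now_mono - prev_mono)
        if abs(skew) > tolerance:
            jumps.append((now_wall, skew))
        prev_wall, prev_mono = now_wall, now_mono
    return jumps
```

For example, running `watch_clock(samples=3600, interval=1.0)` in the background for an hour would record any clock step larger than 2 seconds during that period; a slewing NTP daemon adjusts the clock gradually and would normally not trigger it.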

System Integration · 2022-03-23
