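Current environment: one database, two RAC setups, four instances in total. Host A went down because of a system dump, and host B went down right after it. There are no hardware errors in the error log. The dump-related log entries are below.
Error log on host A (hzypa):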
3C81E43F 0621000016 P U topsvcs Late in sending heartbeat
A6DF45AA 0620000916 I O RMCdaemon The daemon is started.
67145A39 0620000716 U S SYSDUMP SYSTEM DUMP
F48137AC 0620000516 U O minidump COMPRESSED MINIMAL DUMP
225E3B63 0620000516 T S PANIC SOFTWARE PROGRAM ABNORMALLY TERMINATED
9DBCFDEE 0620000916 T O errdemon ERROR LOGGING TURNED ON
90EDB0A5 0619235916 P S topsvcs Dead Man Switch being allowed to expire.
BA6A5ED2 0619231616 I S rmt6 CONFIGURATION MISMATCH
# errpt -aj 3C81E43F
---------------------------------------------------------------------------
LABEL: TS_LATEHB_PE
IDENTIFIER: 3C81E43F
Date/Time: Tue Jun 21 00:00:07 GMT+08:00 2016
Sequence Number: 490017
Machine Id: 00CC37E64C00
Node Id: hzypa
Class: U
Type: PERF
WPAR: Global
Resource Name: topsvcs
Resource Class: NONE
Resource Type: NONE
Location:
Description
Late in sending heartbeat
Probable Causes
Heavy CPU load
Severe physical memory shortage
Heavy I/O activities
Failure Causes
Daemon can not get required system resource
Recommended Actions
Reduce the system load
Detail Data
DETECTING MODULE
rsct,bootstrp.C,1.215.1.10,5366
ERROR ID
6zESUw.5A/OL/lXb.t2pZ8....................
REFERENCE CODE
A heartbeat is late by the following number of seconds
8
#
#
# errpt -aj 67145A39
---------------------------------------------------------------------------
LABEL: DUMP_STATS
IDENTIFIER: 67145A39
Date/Time: Mon Jun 20 00:07:04 GMT+08:00 2016
Sequence Number: 489865
Machine Id: 00CC37E64C00
Node Id: hzypa
Class: S
Type: UNKN
WPAR: Global
Resource Name: SYSDUMP
Description
SYSTEM DUMP
Probable Causes
UNEXPECTED SYSTEM HALT
User Causes
SYSTEM DUMP REQUESTED BY USER
Recommended Actions
PERFORM PROBLEM DETERMINATION PROCEDURES
Failure Causes
UNEXPECTED SYSTEM HALT
Recommended Actions
PERFORM PROBLEM DETERMINATION PROCEDURES
Detail Data
DUMP DEVICE
/dev/lg_dumplv
DUMP SIZE
1773494272
TIME
Sun Jun 19 23:59:26 2016
DUMP TYPE (1 = PRIMARY, 2 = SECONDARY)
1
DUMP STATUS
0
ERROR CODE
0000 0000 0000 0000
DUMP INTEGRITY
Compressed dump - Run dmpfmt with -c flag on dump after uncompressing.
FILE NAME
PROCESSOR ID
0
#
LABEL: TS_DMS_EXPIRING_EM
IDENTIFIER: 90EDB0A5
Date/Time: Sun Jun 19 23:59:16 GMT+08:00 2016
Sequence Number: 489861
Machine Id: 00CC37E64C00
Node Id: hzypa
Class: S
Type: PEND
WPAR: Global
Resource Name: topsvcs
Description
Dead Man Switch being allowed to expire.
If a TS_DMS_RESTORED_TE error appears after this, that will indicate this
condition has been recovered from. Otherwise, a DMS-triggered node failure
should be expected to occur after the time indicated in the Detail Data.
Probable Causes
Topology Services has detected blockage that puts it in danger of suffering
a sundered network. This is due to all viable NIM processes experiencing
blockage, or the daemon's main thread being hung for too long.
User Causes
Excessive I/O load is causing high I/O interrupt traffic
Excessive memory consumption is causing high memory contention
Recommended Actions
Reduce application load on the system
Change (relax) Topology Services tunable parameters
Call IBM Service if problem persists
Failure Causes
Problem in Operating System prevents processes from running
Excessive I/O interrupt traffic prevents processes from running
Excessive virtual memory activity prevents Topology Services from making progress
Recommended Actions
Examine I/O and memory activity on the system
Reduce load on the system
Change (relax) Topology Services tunable parameters
Call IBM Service if problem persists
Detail Data
DETECTING MODULE
rsct,nim_control.C,1.39.1.34,4890
ERROR ID
6Z0PvE0I3gNL/wtD/t2pZ8....................
REFERENCE CODE
Time remaining until DMS triggers (in msec)
10000
DMS trigger interval (in msec)
20000
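The DMS entry says a DMS-triggered node failure should be expected after the time given in its Detail Data. That prediction can be checked against the DUMP_STATS entry on the same host. The sketch below only does the timestamp arithmetic. It assumes that the DUMP_STATS `TIME` field, which shows no timezone, uses the same local time (GMT+08:00) as the `Date/Time` fields.

```python
from datetime import datetime, timedelta

# Values copied from host A's (hzypa) errpt entries in the post.
dms_logged = datetime(2016, 6, 19, 23, 59, 16)   # TS_DMS_EXPIRING_EM Date/Time
time_remaining = timedelta(milliseconds=10000)   # "Time remaining until DMS triggers (in msec)"
# DUMP_STATS Detail Data TIME; assumed to be the same local time (GMT+08:00).
dump_time = datetime(2016, 6, 19, 23, 59, 26)

# When the DMS entry predicts the node failure.
expected_trigger = dms_logged + time_remaining

print(expected_trigger)               # 2016-06-19 23:59:26
print(expected_trigger == dump_time)  # True
```

The two times match to the second. This is consistent with the dump on host A having been caused by the Topology Services dead man switch expiring, rather than by an unrelated crash.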
---------------------------------------------------------------------------
LABEL: SC_TAPE_ERR7
IDENTIFIER: BA6A5ED2
Date/Time: Sun Jun 19 23:16:41 GMT+08:00 2016
Sequence Number: 489860
Machine Id: 00CC37E64C00
Node Id: hzypa
Class: S
Type: INFO
WPAR: Global
Resource Name: rmt6
Description
CONFIGURATION MISMATCH
Probable Causes
CONFIGURATION
CONFIGURATION PARAMETER MISMATCH
DEVICE CONFIGURATION DATABASE
Failure Causes
SOFTWARE DEVICE DRIVER
Recommended Actions
VERIFY SYSTEM CONFIGURATION IS VALID
CORRECT CONFIGURATION
REFER TO PRODUCT DOCUMENTATION FOR ADDITIONAL INFORMATION
Detail Data
SENSE DATA
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0001
0007 7298 0000 0000 007B 0C00 0007 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0019
Error log on host B (hzypb):
A6DF45AA 0620001416 I O RMCdaemon The daemon is started.
67145A39 0620001216 U S SYSDUMP SYSTEM DUMP
F48137AC 0620001116 U O minidump COMPRESSED MINIMAL DUMP
AB59ABFF 0620001116 U U LIBLVM Remote node Concurrent Volume Group fail
9DBCFDEE 0620001416 T O errdemon ERROR LOGGING TURNED ON
AB59ABFF 0620000016 U U LIBLVM Remote node Concurrent Volume Group fail
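The two summaries above are not in strict time order. Their TIMESTAMP column is `MMDDhhmmYY` and only has minute resolution. The sketch below merges both hosts' summary lines into one timeline. It assumes the two hosts' clocks are in sync. Also, the minidump/SYSDUMP lines carry the time they were logged after reboot, not the crash time: host A's DUMP_STATS detail shows `TIME Sun Jun 19 23:59:26 2016`, while its summary line says 00:07.

```python
from datetime import datetime

# errpt summary lines copied from the post. Columns:
# IDENTIFIER TIMESTAMP(MMDDhhmmYY) TYPE CLASS RESOURCE_NAME DESCRIPTION
SUMMARY = {
    "A": """\
3C81E43F 0621000016 P U topsvcs Late in sending heartbeat
A6DF45AA 0620000916 I O RMCdaemon The daemon is started.
67145A39 0620000716 U S SYSDUMP SYSTEM DUMP
F48137AC 0620000516 U O minidump COMPRESSED MINIMAL DUMP
225E3B63 0620000516 T S PANIC SOFTWARE PROGRAM ABNORMALLY TERMINATED
9DBCFDEE 0620000916 T O errdemon ERROR LOGGING TURNED ON
90EDB0A5 0619235916 P S topsvcs Dead Man Switch being allowed to expire.
BA6A5ED2 0619231616 I S rmt6 CONFIGURATION MISMATCH
""",
    "B": """\
A6DF45AA 0620001416 I O RMCdaemon The daemon is started.
67145A39 0620001216 U S SYSDUMP SYSTEM DUMP
F48137AC 0620001116 U O minidump COMPRESSED MINIMAL DUMP
AB59ABFF 0620001116 U U LIBLVM Remote node Concurrent Volume Group fail
9DBCFDEE 0620001416 T O errdemon ERROR LOGGING TURNED ON
AB59ABFF 0620000016 U U LIBLVM Remote node Concurrent Volume Group fail
""",
}


def parse_line(line):
    """Split one errpt summary line; the timestamp has minute resolution."""
    ident, stamp, _etype, _eclass, resource, desc = line.split(maxsplit=5)
    return datetime.strptime(stamp, "%m%d%H%M%y"), ident, resource, desc


def merged_timeline(summaries):
    """Return (time, host, identifier, resource, description) rows sorted by time.

    sorted() is stable, so entries from the same minute keep the order in
    which they appear in the input (host A's lines before host B's).
    """
    rows = []
    for host, text in summaries.items():
        for line in text.splitlines():
            if line.strip():
                when, ident, resource, desc = parse_line(line)
                rows.append((when, host, ident, resource, desc))
    return sorted(rows, key=lambda row: row[0])


timeline = merged_timeline(SUMMARY)
for when, host, ident, resource, desc in timeline:
    print(when.strftime("%m-%d %H:%M"), host, ident, resource, desc)
```

On the merged view, host B's first LVM_GS_RLEAVE (00:00) comes right after host A's DMS entry (23:59). Host A's dump and reboot records (00:05 to 00:09) come next, followed by host B's own dump records (00:11 to 00:14).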
# errpt -aj AB59ABFF
---------------------------------------------------------------------------
LABEL: LVM_GS_RLEAVE
IDENTIFIER: AB59ABFF
Date/Time: Mon Jun 20 00:11:12 GMT+08:00 2016
Sequence Number: 371773
Machine Id: 00CC37F64C00
Node Id: hzypb
Class: U
Type: UNKN
WPAR: Global
Resource Name: LIBLVM
Resource Class: NONE
Resource Type: NONE
Location:
Description
Remote node Concurrent Volume Group failure detected
Probable Causes
Remote node Concurrent Volume Group forced offline
Failure Causes
Remote node left VGSA/VGDA groups due to failure
Recommended Actions
Examine error log on identified remote node
Detail Data
Remote Node Name
hzypa
Volume Group ID
00CC 37E6 0000 4C00 0000 0141 0A08 A77F
MAJOR/MINOR DEVICE NUMBER
002C 0000
SENSE DATA
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
---------------------------------------------------------------------------
LABEL: LVM_GS_RLEAVE
IDENTIFIER: AB59ABFF
Date/Time: Mon Jun 20 00:00:42 GMT+08:00 2016
Sequence Number: 371771
Machine Id: 00CC37F64C00
Node Id: hzypb
Class: U
Type: UNKN
WPAR: Global
Resource Name: LIBLVM
Resource Class: NONE
Resource Type: NONE
Location:
Description
Remote node Concurrent Volume Group failure detected
Probable Causes
Remote node Concurrent Volume Group forced offline
Failure Causes
Remote node left VGSA/VGDA groups due to failure
Recommended Actions
Examine error log on identified remote node
Detail Data
Remote Node Name
hzypa
Volume Group ID
00CC 37E6 0000 4C00 0000 0141 0A08 A77F
MAJOR/MINOR DEVICE NUMBER
002C 0000
SENSE DATA
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
---------------------------------------------------------------------------
LABEL: LVM_GS_RLEAVE
IDENTIFIER: AB59ABFF
Date/Time: Mon Jun 20 00:00:42 GMT+08:00 2016
Sequence Number: 371770
Machine Id: 00CC37F64C00
Node Id: hzypb
Class: U
Type: UNKN
WPAR: Global
Resource Name: LIBLVM
Resource Class: NONE
Resource Type: NONE
Location:
Description
Remote node Concurrent Volume Group failure detected
Probable Causes
Remote node Concurrent Volume Group forced offline
Failure Causes
Remote node left VGSA/VGDA groups due to failure
Recommended Actions
Examine error log on identified remote node
Detail Data
Remote Node Name
hzypa
Volume Group ID
00CC 37E6 0000 4C00 0000 0141 0A0E B016
MAJOR/MINOR DEVICE NUMBER
002D 0000
SENSE DATA
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
---------------------------------------------------------------------------
LABEL: LVM_GS_RLEAVE
IDENTIFIER: AB59ABFF
Date/Time: Mon Jun 20 00:00:42 GMT+08:00 2016
Sequence Number: 371769
Machine Id: 00CC37F64C00
Node Id: hzypb
Class: U
Type: UNKN
WPAR: Global
Resource Name: LIBLVM
Resource Class: NONE
Resource Type: NONE
Location:
Description
Remote node Concurrent Volume Group failure detected
Probable Causes
Remote node Concurrent Volume Group forced offline
Failure Causes
Remote node left VGSA/VGDA groups due to failure
Recommended Actions
Examine error log on identified remote node
Detail Data
Remote Node Name
hzypa
Volume Group ID
00CC 37E6 0000 4C00 0000 0141 0A0B BD73
MAJOR/MINOR DEVICE NUMBER
002E 0000
SENSE DATA
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
---------------------------------------------------------------------------
LABEL: LVM_GS_RLEAVE
IDENTIFIER: AB59ABFF
Date/Time: Mon Jun 20 00:00:41 GMT+08:00 2016
Sequence Number: 371768
Machine Id: 00CC37F64C00
Node Id: hzypb
Class: U
Type: UNKN
WPAR: Global
Resource Name: LIBLVM
Resource Class: NONE
Resource Type: NONE
Location:
Description
Remote node Concurrent Volume Group failure detected
Probable Causes
Remote node Concurrent Volume Group forced offline
Failure Causes
Remote node left VGSA/VGDA groups due to failure
Recommended Actions
Examine error log on identified remote node
Detail Data
Remote Node Name
hzypa
Volume Group ID
00CC 37E6 0000 4C00 0000 0141 0A08 A77F
MAJOR/MINOR DEVICE NUMBER
002C 0000
SENSE DATA
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
---------------------------------------------------------------------------
LABEL: LVM_GS_RLEAVE
IDENTIFIER: AB59ABFF
Date/Time: Mon Jun 20 00:00:41 GMT+08:00 2016
Sequence Number: 371767
Machine Id: 00CC37F64C00
Node Id: hzypb
Class: U
Type: UNKN
WPAR: Global
Resource Name: LIBLVM
Resource Class: NONE
Resource Type: NONE
Location:
Description
Remote node Concurrent Volume Group failure detected
Probable Causes
Remote node Concurrent Volume Group forced offline
Failure Causes
Remote node left VGSA/VGDA groups due to failure
Recommended Actions
Examine error log on identified remote node
Detail Data
Remote Node Name
hzypa
Volume Group ID
00CC 37E6 0000 4C00 0000 0141 0A0B BD73
MAJOR/MINOR DEVICE NUMBER
002E 0000
SENSE DATA
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
---------------------------------------------------------------------------
LABEL: LVM_GS_RLEAVE
IDENTIFIER: AB59ABFF
Date/Time: Mon Jun 20 00:00:41 GMT+08:00 2016
Sequence Number: 371766
Machine Id: 00CC37F64C00
Node Id: hzypb
Class: U
Type: UNKN
WPAR: Global
Resource Name: LIBLVM
Resource Class: NONE
Resource Type: NONE
Location:
Description
Remote node Concurrent Volume Group failure detected
Probable Causes
Remote node Concurrent Volume Group forced offline
Failure Causes
Remote node left VGSA/VGDA groups due to failure
Recommended Actions
Examine error log on identified remote node
Detail Data
Remote Node Name
hzypa
Volume Group ID
00CC 37E6 0000 4C00 0000 0141 0A0E B016
MAJOR/MINOR DEVICE NUMBER
002D 0000
SENSE DATA
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
---------------------------------------------------------------------------
LABEL: LVM_GS_RLEAVE
IDENTIFIER: AB59ABFF
Date/Time: Wed Jun 1 00:00:42 GMT+08:00 2016
Sequence Number: 370819
Machine Id: 00CC37F64C00
Node Id: hzypb
Class: U
Type: UNKN
WPAR: Global
Resource Name: LIBLVM
Resource Class: NONE
Resource Type: NONE
Location:
Description
Remote node Concurrent Volume Group failure detected
Probable Causes
Remote node Concurrent Volume Group forced offline
Failure Causes
Remote node left VGSA/VGDA groups due to failure
Recommended Actions
Examine error log on identified remote node
Detail Data
Remote Node Name
hzypa
Volume Group ID
00CC 37E6 0000 4C00 0000 0141 0A08 A77F
MAJOR/MINOR DEVICE NUMBER
002C 0000
SENSE DATA
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
---------------------------------------------------------------------------
LABEL: LVM_GS_RLEAVE
IDENTIFIER: AB59ABFF
Date/Time: Wed Jun 1 00:00:42 GMT+08:00 2016
Sequence Number: 370818
Machine Id: 00CC37F64C00
Node Id: hzypb
Class: U
Type: UNKN
WPAR: Global
Resource Name: LIBLVM
Resource Class: NONE
Resource Type: NONE
Location:
Description
Remote node Concurrent Volume Group failure detected
Probable Causes
Remote node Concurrent Volume Group forced offline
Failure Causes
Remote node left VGSA/VGDA groups due to failure
Recommended Actions
Examine error log on identified remote node
Detail Data
Remote Node Name
hzypa
Volume Group ID
00CC 37E6 0000 4C00 0000 0141 0A0E B016
MAJOR/MINOR DEVICE NUMBER
002D 0000
SENSE DATA
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
---------------------------------------------------------------------------
LABEL: LVM_GS_RLEAVE
IDENTIFIER: AB59ABFF
Date/Time: Wed Jun 1 00:00:42 GMT+08:00 2016
Sequence Number: 370817
Machine Id: 00CC37F64C00
Node Id: hzypb
Class: U
Type: UNKN
WPAR: Global
Resource Name: LIBLVM
Resource Class: NONE
Resource Type: NONE
Location:
Description
Remote node Concurrent Volume Group failure detected
Probable Causes
Remote node Concurrent Volume Group forced offline
Failure Causes
Remote node left VGSA/VGDA groups due to failure
Recommended Actions
Examine error log on identified remote node
Detail Data
Remote Node Name
hzypa
Volume Group ID
00CC 37E6 0000 4C00 0000 0141 0A08 A77F
MAJOR/MINOR DEVICE NUMBER
002C 0000
SENSE DATA
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
---------------------------------------------------------------------------
LABEL: LVM_GS_RLEAVE
IDENTIFIER: AB59ABFF
Date/Time: Wed Jun 1 00:00:42 GMT+08:00 2016
Sequence Number: 370816
Machine Id: 00CC37F64C00
Node Id: hzypb
Class: U
Type: UNKN
WPAR: Global
Resource Name: LIBLVM
Resource Class: NONE
Resource Type: NONE
Location:
Description
Remote node Concurrent Volume Group failure detected
Probable Causes
Remote node Concurrent Volume Group forced offline
Failure Causes
Remote node left VGSA/VGDA groups due to failure
Recommended Actions
Examine error log on identified remote node
Detail Data
Remote Node Name
hzypa
Volume Group ID
00CC 37E6 0000 4C00 0000 0141 0A0E B016
MAJOR/MINOR DEVICE NUMBER
002D 0000
SENSE DATA
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
---------------------------------------------------------------------------
LABEL: LVM_GS_RLEAVE
IDENTIFIER: AB59ABFF
Date/Time: Wed Jun 1 00:00:42 GMT+08:00 2016
Sequence Number: 370815
Machine Id: 00CC37F64C00
Node Id: hzypb
Class: U
Type: UNKN
WPAR: Global
Resource Name: LIBLVM
Resource Class: NONE
Resource Type: NONE
Location:
Description
Remote node Concurrent Volume Group failure detected
Probable Causes
Remote node Concurrent Volume Group forced offline
Failure Causes
Remote node left VGSA/VGDA groups due to failure
Recommended Actions
Examine error log on identified remote node
Detail Data
Remote Node Name
hzypa
Volume Group ID
00CC 37E6 0000 4C00 0000 0141 0A0B BD73
MAJOR/MINOR DEVICE NUMBER
002E 0000
SENSE DATA
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
---------------------------------------------------------------------------
LABEL: LVM_GS_RLEAVE
IDENTIFIER: AB59ABFF
Date/Time: Wed Jun 1 00:00:42 GMT+08:00 2016
Sequence Number: 370814
Machine Id: 00CC37F64C00
Node Id: hzypb
Class: U
Type: UNKN
WPAR: Global
Resource Name: LIBLVM
Resource Class: NONE
Resource Type: NONE
Location:
Description
Remote node Concurrent Volume Group failure detected
Probable Causes
Remote node Concurrent Volume Group forced offline
Failure Causes
Remote node left VGSA/VGDA groups due to failure
Recommended Actions
Examine error log on identified remote node
Detail Data
Remote Node Name
hzypa
Volume Group ID
00CC 37E6 0000 4C00 0000 0141 0A0B BD73
MAJOR/MINOR DEVICE NUMBER
002E 0000
SENSE DATA
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
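Host B's LVM_GS_RLEAVE entries differ only in time, sequence number, Volume Group ID and device number. The sketch below groups them by volume group so the pattern is easier to see. The data is copied from the entries above. Reading `MAJOR/MINOR DEVICE NUMBER` as "major word, then minor word" is an assumption based on the field label.

```python
from collections import defaultdict

# Volume Group IDs exactly as printed in host B's (hzypb) errpt entries.
VG_A77F = "00CC 37E6 0000 4C00 0000 0141 0A08 A77F"
VG_B016 = "00CC 37E6 0000 4C00 0000 0141 0A0E B016"
VG_BD73 = "00CC 37E6 0000 4C00 0000 0141 0A0B BD73"

# (Sequence Number, Date/Time, Volume Group ID, MAJOR/MINOR DEVICE NUMBER)
RLEAVE = [
    (371773, "2016-06-20 00:11:12", VG_A77F, "002C 0000"),
    (371771, "2016-06-20 00:00:42", VG_A77F, "002C 0000"),
    (371770, "2016-06-20 00:00:42", VG_B016, "002D 0000"),
    (371769, "2016-06-20 00:00:42", VG_BD73, "002E 0000"),
    (371768, "2016-06-20 00:00:41", VG_A77F, "002C 0000"),
    (371767, "2016-06-20 00:00:41", VG_BD73, "002E 0000"),
    (371766, "2016-06-20 00:00:41", VG_B016, "002D 0000"),
    (370819, "2016-06-01 00:00:42", VG_A77F, "002C 0000"),
    (370818, "2016-06-01 00:00:42", VG_B016, "002D 0000"),
    (370817, "2016-06-01 00:00:42", VG_A77F, "002C 0000"),
    (370816, "2016-06-01 00:00:42", VG_B016, "002D 0000"),
    (370815, "2016-06-01 00:00:42", VG_BD73, "002E 0000"),
    (370814, "2016-06-01 00:00:42", VG_BD73, "002E 0000"),
]


def summarize(entries):
    """Group entries by Volume Group ID: major number, entry count, days seen."""
    by_vg = defaultdict(lambda: {"major": None, "count": 0, "days": set()})
    for _seq, stamp, vgid, devno in entries:
        major_hex, _minor_hex = devno.split()
        info = by_vg[vgid]
        info["major"] = int(major_hex, 16)  # assumes the first word is the major number
        info["count"] += 1
        info["days"].add(stamp.split()[0])  # keep only the date part
    return dict(by_vg)


summary = summarize(RLEAVE)
for vgid, info in sorted(summary.items(), key=lambda kv: kv[1]["major"]):
    print(f"major {info['major']}  VG {vgid}  entries={info['count']}  days={sorted(info['days'])}")
```

Under that reading, the entries cover three concurrent volume groups (major numbers 44, 45 and 46). All three were also reported as left by hzypa on Jun 1 at 00:00:42, so host A's error log for that date may be worth checking as well.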