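Environment:
This DB2 database is a DB2 DPF environment running on AIX servers. It consists of five servers: 110, 111, 112, 113, and 114. Server 110 is the administration node and runs on its own physical host; 111-114 are the data nodes. 111 and 113 are two logical hosts (LPARs) carved from one physical host; 112 and 114 are likewise two LPARs on one physical host.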
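Operations before the failure:
On Thursday, a hardware component (a CPU voltage regulator) was replaced on 111 and 113, which required stopping and restarting the DB2 database.
At 8 p.m. I ran db2stop on 110 to stop the instance. The host team then shut down the hosts and replaced the component; the work involved GPFS stop/start, mount, and unmount operations. They finished a little after 10 p.m. and brought the hosts back up, after which I ran db2start. Once the database was up, I checked that all nodes were healthy, that connections worked, and that the table spaces were normal.
At around 1 a.m. the application team reported SQL1299N errors when accessing many tables. On site, I found that the 8 nodes node33-node40 on host 114 were gone (stopped). I restarted the instance with db2stop/db2start and the database returned to normal.
I do not know what caused this, and I suspect it is related to the hardware replacement. The main diagnostic log entries are below. Could anyone help analyze what caused the instance to go down on a single node? Thanks!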
................................
2019-07-12-01.05.02.216277+480I15516677A581 LEVEL:Error
PID :3409250 TID :14138 PROC:db2sysc37
INSTANCE:xjbiinst NODE:037 DB :
APPHDL :37-420 APPID:*N37.xjbiinst.190711170502
AUTHID :BIWH
EDUID :14138 EDUNAME:db2agent(instance)37
FUNCTION:DB2UDB,basesysutilities,sqleagnt_sigsegvh,probe:5
MESSAGE: Error in agent servicing application with INBOUND SEQUENCE NUMBER:
DATA#1:Hexdump,2bytes
0x078000000172632C:0000 ..
2019-07-12-01.05.02.217085+480I15517259A1223 LEVEL:Error
PID :3409250 TID :14138 PROC:db2sysc37
INSTANCE:xjbiinst NODE:037 DB :
APPHDL :37-420 APPID:*N37.xjbiinst.190711170502
AUTHID :BIWH
EDUID :14138 EDUNAME:db2agent(instance)37
FUNCTION:DB2UDB,basesysutilities,sqleagnt_sigsegvh,probe:8
MESSAGE: Error in agent servicing application with AUTHORIZATION ID:
DATA#1:Hexdump,129bytes
0x0780000001725689:42495748202020200000000000000000 BIWH ........
0x0780000001725699:00000000000000000000000000000000 ................
0x07800000017256A9:00000000000000000000000000000000 ................
0x07800000017256B9:00000000000000000000000000000000 ................
0x07800000017256C9:00000000000000000000000000000000 ................
0x07800000017256D9:00000000000000000000000000000000 ................
0x07800000017256E9:00000000000000000000000000000000 ................
0x07800000017256F9:00000000000000000000000000000000 ................
0x0780000001725709:00 .
...............................
2019-07-12-01.05.02.219602+480I15519653A498 LEVEL:Severe
PID :3409250 TID :14138 PROC:db2sysc37
INSTANCE:xjbiinst NODE:037 DB :
APPHDL :37-420 APPID:*N37.xjbiinst.190711170502
AUTHID :BIWH
EDUID :14138 EDUNAME:db2agent(instance)37
FUNCTION:DB2UDB,DRDAApplicationServer,sqljsSignalHandler,probe:10
MESSAGE: DIA0505I Execution of a component signal handling function has begun.
........................................................
2019-07-12-01.05.08.281697+480I15537945A405 LEVEL:Event
PID :2098662 TID :1 PROC:db2vend(PDVendorProcess-258
INSTANCE:xjbiinst NODE:000
EDUID :1 EDUNAME:db2vend(PDVendorProcess-258
FUNCTION:DB2UDB,traceservices,pdInvokeCalloutScript,probe:20
STOP : Completed invoking /db2home/xjbiinst/sqllib/bin/db2cos_trap
2019-07-12-01.05.08.282258+480I15538351A626 LEVEL:Error
PID :3409250 TID :14138 PROC:db2sysc37
INSTANCE:xjbiinst NODE:037 DB :
APPHDL :37-420 APPID:*N37.xjbiinst.190711170502
AUTHID :BIWH
EDUID :14138 EDUNAME:db2agent(instance)37
FUNCTION:DB2UDB,RAS/PDcomponent,pdVendorCallWrapper,probe:285
MESSAGE: ZRC=0x870F0057=-2029060009=SQLO_TIMEOUT "Operation timed out"
DIA8578C A timeout occurred while waiting on a semaphore.
DATA#1:String,46bytes
Warning: PD Vendor Process has reached timeout
2019-07-12-01.05.08.283685+480E15538978A1234 LEVEL:Critical
PID :3409250 TID :14138 PROC:db2sysc37
INSTANCE:xjbiinst NODE:037 DB :
APPHDL :37-420 APPID:*N37.xjbiinst.190711170502
AUTHID :BIWH
EDUID :14138 EDUNAME:db2agent(instance)37
FUNCTION:DB2UDB,opersystemservices,sqloEDUCodeTrapHandler,probe:90
MESSAGE: ADM14011C A critical failure has caused the following type of error:
"Trap". The DB2 database manager cannot recover from the failure.
First Occurrence Data Capture (FODC) was invoked in the following
mode: "Automatic". FODC diagnostic information is located in the
following directory:
"/db2diag/db2dump/FODC_Trap_2019-07-12-01.05.02.015729_0037/".
DATA#1:Signal Number Recieved, 4 bytes
11
DATA#2:Siginfo,64bytes
0x0A00000002409DC0:0000000B000000000000003200000000 ...........2....
0x0A00000002409DD0:00000000000000005D28BD9F00000008 ........](......
0x0A00000002409DE0:00000000000000000000000000000000 ................
0x0A00000002409DF0:00000000000000000000000000000000 ................
...........................................
2019-07-12-01.15.56.517541+480I15668263A508 LEVEL:Error
PID :3670780 TID :66260 PROC:db2sysc33
INSTANCE:xjbiinst NODE:033 DB :XJBIDB
APPHDL :0-3550 APPID:10.238.100.181.60854.190711170821
AUTHID :ODS
EDUID :66260 EDUNAME:db2agntp(XJBIDB)33
FUNCTION:DB2UDB,bufferdistserv,sqlkdReceiveData,probe:5
RETCODE:ZRC=0x81580016=-2124939242=SQLKD_NODE_FAILURE
"Mapping for SQLKF_NODE_FAILED"
......................................................................
2019-07-12-01.15.56.913498+480I15784791A549 LEVEL:Error
PID :2622644 TID :52890 PROC:db2sysc34
INSTANCE:xjbiinst NODE:034 DB :XJBIDB
APPHDL :0-3374 APPID:10.238.100.180.47058.190711170215
AUTHID :ODS
EDUID :52890 EDUNAME:db2agntp(XJBIDB)34
FUNCTION:DB2UDB,databaseutilities,DIAG_ERROR,probe:0
DATA#1:String,114bytes
LOADID: 95352.2019-07-12-01.13.21.268758.0 (24;4416)
, -2124939242, 0, Detected in file: sqlusSARouter.C, Line: 204
2019-07-12-01.15.56.914379+480I15785341A550 LEVEL:Error
PID :2622644 TID :51348 PROC:db2sysc34
INSTANCE:xjbiinst NODE:034 DB :XJBIDB
APPHDL :0-3551 APPID:10.238.100.181.60855.190711170821
AUTHID :ODS
EDUID :51348 EDUNAME:db2agntp(XJBIDB)34
FUNCTION:DB2UDB,databaseutilities,DIAG_ERROR,probe:0
DATA#1:String,115bytes
LOADID: 64130.2019-07-12-01.09.47.701472.0 (10;17085)
, -2124939242, 0, Detected in file: sqlusSARouter.C, Line: 204
2019-07-12-01.15.57.022485+480E15785892A548 LEVEL:Severe
PID :2491092 TID :258 PROC:db2wdog37
INSTANCE:xjbiinst NODE:037
EDUID :258 EDUNAME:db2wdog37
FUNCTION:DB2UDB,basesysutilities,sqleWatchDog,probe:20
MESSAGE: ADM0503C An unexpected internal processing error has occurred. All
DB2 processes associated with this instance have been shut down.
Diagnostic information has been recorded. Contact IBM Support for
further assistance.
2019-07-12-01.15.57.024626+480E15786441A424 LEVEL:Error
PID :2491092 TID :258 PROC:db2wdog37
INSTANCE:xjbiinst NODE:037
EDUID :258 EDUNAME:db2wdog37
FUNCTION:DB2UDB,basesysutilities,sqleWatchDog,probe:21
DATA#1:ProcessID,4bytes
3409250
DATA#2:Hexdump,8bytes
0x0A000000003FDF30:000001010000000B ........
2019-07-12-01.15.57.025100+480I15786866A390 LEVEL:Severe
PID :2491092 TID :258 PROC:db2wdog37
INSTANCE:xjbiinst NODE:037
EDUID :258 EDUNAME:db2wdog37
FUNCTION:DB2UDB,basesysutilities,sqleCleanupResources,probe:5
DATA#1:Hexdump,4bytes
0x0A000000003FDE90:00000101 ....
2019-07-12-01.15.57.025519+480I15787257A455 LEVEL:Warning
PID :2491092 TID :258 PROC:db2wdog37
INSTANCE:xjbiinst NODE:037
EDUID :258 EDUNAME:db2wdog37
FUNCTION:DB2UDB,routine_infrastructure,sqlerKillAllFmps,probe:5
MESSAGE: Bringing down all db2fmp processes as part of db2stop
DATA#1:Hexdump,4bytes
0x0A000000003FDC10:00000000
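
To help read the records above, here is a small Python sketch that cross-checks two things using only values copied from the log. First, it checks that the decimal code -2124939242 in the LOAD records on node 34 is the signed 32-bit form of ZRC 0x81580016 (SQLKD_NODE_FAILURE), which node 33 reported. Second, it measures how much time passed between the first trap record on node 37 and the ADM0503C record from db2wdog37. The helper names zrc_to_signed and parse_diag_timestamp are made up for this illustration and are not part of any DB2 tool. The timestamp parser assumes the first 26 characters of a record header are the timestamp and ignores the +480 timezone offset.

```python
from datetime import datetime


def zrc_to_signed(zrc_hex: str) -> int:
    """Interpret a 32-bit ZRC hex string as a signed two's-complement integer."""
    value = int(zrc_hex, 16)
    if value >= 0x80000000:
        value -= 0x100000000
    return value


def parse_diag_timestamp(stamp: str) -> datetime:
    """Parse a db2diag record timestamp such as 2019-07-12-01.05.02.216277.

    Assumption: the timezone offset (+480) has already been cut off.
    """
    return datetime.strptime(stamp, "%Y-%m-%d-%H.%M.%S.%f")


# Values copied from the log excerpt above.
node_failure = zrc_to_signed("0x81580016")   # SQLKD_NODE_FAILURE on node 33
sqlo_timeout = zrc_to_signed("0x870F0057")   # SQLO_TIMEOUT on node 37

first_trap = parse_diag_timestamp("2019-07-12-01.05.02.216277")
watchdog_stop = parse_diag_timestamp("2019-07-12-01.15.57.022485")
gap = watchdog_stop - first_trap

if __name__ == "__main__":
    print("SQLKD_NODE_FAILURE as signed int:", node_failure)
    print("SQLO_TIMEOUT as signed int:", sqlo_timeout)
    print("Time from first trap record to ADM0503C:", gap)
```

If the arithmetic holds, the LOAD errors on node 34 carry the same node-failure return code that node 33 logged. That would be consistent with these records being a consequence of the trap on node 37 rather than a separate failure, but the excerpt alone does not prove it.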