Internet Services: Db2 DPF instance crash

Why did the instance on a single physical node go down under DPF?

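Environment:
This is a DB2 DPF environment on AIX midrange servers with five servers: 110, 111, 112, 113 and 114. Server 110 is the management node and sits alone on its own physical host; 111 to 114 are data nodes. 111 and 113 are two logical hosts (LPARs) carved from one physical host, and 112 and 114 are likewise two LPARs sharing one physical host.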

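Operations before the failure:
On Thursday a hardware part (the CPU voltage regulator) was replaced on 111 and 113, which required stopping and restarting DB2.
At 8 p.m. I ran db2stop on 110 to stop the instance, and the server team then shut down the hosts to replace the parts; this involved stopping and starting GPFS as well as mount/umount operations. The work finished a little after 10 p.m. and the hosts were started again. Once they were up I ran db2start, and after the database came up I checked that all nodes were normal, that connections worked, and that all tablespaces were normal.

Around 1 a.m. the application side reported SQL1299N errors when accessing many tables. On site I found that the eight partitions node33 to node40 on host 114 were gone (stopped). I restarted the instance with db2stop/db2start and the database returned to normal. The cause is still unclear; I suspect it is related to the hardware replacement. The main diagnostic log entries are below. Could someone help analyze why the instance went down on this single node? Thanks!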
................................
2019-07-12-01.05.02.216277+480I15516677A581     LEVEL:Error
PID    :3409250             TID :14138      PROC:db2sysc37
INSTANCE:xjbiinst            NODE:037        DB  :        
APPHDL :37-420              APPID:*N37.xjbiinst.190711170502
AUTHID :BIWH   
EDUID  :14138               EDUNAME:db2agent(instance)37
FUNCTION: DB2 UDB, base sys utilities, sqleagnt_sigsegvh, probe:5
MESSAGE : Error in agent servicing application with INBOUND SEQUENCE NUMBER:
DATA #1 : Hexdump, 2 bytes
0x078000000172632C:0000                                      ..

2019-07-12-01.05.02.217085+480I15517259A1223    LEVEL:Error
PID    :3409250             TID :14138      PROC:db2sysc37
INSTANCE:xjbiinst            NODE:037        DB  :        
APPHDL :37-420              APPID:*N37.xjbiinst.190711170502
AUTHID :BIWH   
EDUID  :14138               EDUNAME:db2agent(instance)37
FUNCTION: DB2 UDB, base sys utilities, sqleagnt_sigsegvh, probe:8
MESSAGE : Error in agent servicing application with AUTHORIZATION ID:
DATA #1 : Hexdump, 129 bytes
0x0780000001725689:42495748202020200000000000000000   BIWH   ........
0x0780000001725699:00000000000000000000000000000000   ................
0x07800000017256A9:00000000000000000000000000000000   ................
0x07800000017256B9:00000000000000000000000000000000   ................
0x07800000017256C9:00000000000000000000000000000000   ................
0x07800000017256D9:00000000000000000000000000000000   ................
0x07800000017256E9:00000000000000000000000000000000   ................
0x07800000017256F9:00000000000000000000000000000000   ................
0x0780000001725709:00                                        .
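The Hexdump fields in these records use an `address:hex-bytes   ascii` layout. As a small illustration (the helper `decode_hexdump` is hypothetical, not a DB2 utility), this Python sketch rebuilds the raw bytes of the AUTHID dump above: the nine lines yield exactly the 129 bytes the record announces, and the first four bytes are the authorization ID BIWH, followed by blank and NUL padding.

```python
def decode_hexdump(lines):
    """Concatenate the hex column of db2diag-style hexdump lines into bytes.

    Assumes each line looks like '0xADDRESS:HEXBYTES   ascii', as in the
    excerpt; the ASCII preview column after the hex digits is ignored.
    """
    out = bytearray()
    for line in lines:
        _, rest = line.split(":", 1)   # drop the '0x...' address prefix
        hex_part = rest.split()[0]     # first token is the hex column
        out += bytes.fromhex(hex_part)
    return bytes(out)


# The AUTHORIZATION ID dump from the record above, copied verbatim.
authid_dump = [
    "0x0780000001725689:42495748202020200000000000000000   BIWH   ........",
    "0x0780000001725699:00000000000000000000000000000000   ................",
    "0x07800000017256A9:00000000000000000000000000000000   ................",
    "0x07800000017256B9:00000000000000000000000000000000   ................",
    "0x07800000017256C9:00000000000000000000000000000000   ................",
    "0x07800000017256D9:00000000000000000000000000000000   ................",
    "0x07800000017256E9:00000000000000000000000000000000   ................",
    "0x07800000017256F9:00000000000000000000000000000000   ................",
    "0x0780000001725709:00                                        .",
]

raw = decode_hexdump(authid_dump)
print(len(raw))                                       # 129
print(raw.rstrip(b"\x00").decode("ascii").rstrip())   # BIWH
```

The same helper works on any other Hexdump field in the excerpt, for example the Siginfo dump in the trap record.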

...............................

2019-07-12-01.05.02.219602+480I15519653A498     LEVEL:Severe
PID    :3409250             TID :14138      PROC:db2sysc37
INSTANCE:xjbiinst            NODE:037        DB  :        
APPHDL :37-420              APPID:*N37.xjbiinst.190711170502
AUTHID :BIWH   
EDUID  :14138               EDUNAME:db2agent(instance)37
FUNCTION: DB2 UDB, DRDA Application Server, sqljsSignalHandler, probe:10
MESSAGE : DIA0505I Execution of a component signal handling function has begun.

........................................................

2019-07-12-01.05.08.281697+480I15537945A405     LEVEL:Event
PID    :2098662             TID :1          PROC:db2vend(PDVendorProcess-258
INSTANCE:xjbiinst            NODE:000
EDUID  :1                   EDUNAME:db2vend(PDVendorProcess-258
FUNCTION: DB2 UDB, trace services, pdInvokeCalloutScript, probe:20
STOP    : Completed invoking /db2home/xjbiinst/sqllib/bin/db2cos_trap

2019-07-12-01.05.08.282258+480I15538351A626     LEVEL:Error
PID    :3409250             TID :14138      PROC:db2sysc37
INSTANCE:xjbiinst            NODE:037        DB  :        
APPHDL :37-420              APPID:*N37.xjbiinst.190711170502
AUTHID :BIWH   
EDUID  :14138               EDUNAME:db2agent(instance)37
FUNCTION: DB2 UDB, RAS/PD component, pdVendorCallWrapper, probe:285
MESSAGE : ZRC=0x870F0057=-2029060009=SQLO_TIMEOUT "Operation timed out"
         DIA8578C A timeout occurred while waiting on a semaphore.
DATA #1 : String, 46 bytes
Warning: PD Vendor Process has reached timeout

2019-07-12-01.05.08.283685+480E15538978A1234    LEVEL:Critical
PID    :3409250             TID :14138      PROC:db2sysc37
INSTANCE:xjbiinst            NODE:037        DB  :        
APPHDL :37-420              APPID:*N37.xjbiinst.190711170502
AUTHID :BIWH   
EDUID  :14138               EDUNAME:db2agent(instance)37
FUNCTION: DB2 UDB, oper system services, sqloEDUCodeTrapHandler, probe:90
MESSAGE : ADM14011C A critical failure has caused the following type of error:
         "Trap". The DB2 database manager cannot recover from the failure.
         First Occurrence Data Capture (FODC) was invoked in the following
         mode: "Automatic". FODC diagnostic information is located in the
         following directory:
         "/db2diag/db2dump/FODC_Trap_2019-07-12-01.05.02.015729_0037/".
DATA #1 : Signal Number Recieved, 4 bytes
11
DATA #2 : Siginfo, 64 bytes
0x0A00000002409DC0:0000000B000000000000003200000000   ...........2....
0x0A00000002409DD0:00000000000000005D28BD9F00000008   ........](......
0x0A00000002409DE0:00000000000000000000000000000000   ................
0x0A00000002409DF0:00000000000000000000000000000000   ................
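Both the DATA #1 value and the first word of the Siginfo dump say the agent received signal 11. The sketch below, using only the Python standard library, reads that word and names the signal. It assumes the dump is big-endian (as on AIX/POWER) and that siginfo_t begins with si_signo; that is the conventional layout, but the log itself does not state it. Signal 11 is SIGSEGV (segmentation violation) on both AIX and Linux, which fits the handler name sqleagnt_sigsegvh in the first records of the excerpt.

```python
import signal

# First 16 bytes of the Siginfo dump from the ADM14011C record above.
siginfo_head = bytes.fromhex("0000000B000000000000003200000000")

# Assumption: the first big-endian 32-bit word is si_signo.
si_signo = int.from_bytes(siginfo_head[:4], "big")

print(si_signo)                        # 11
print(signal.Signals(si_signo).name)   # SIGSEGV
```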

...........................................

2019-07-12-01.15.56.517541+480I15668263A508     LEVEL:Error
PID    :3670780             TID :66260      PROC:db2sysc33
INSTANCE:xjbiinst            NODE:033        DB  :XJBIDB
APPHDL :0-3550              APPID:10.238.100.181.60854.190711170821
AUTHID :ODS    
EDUID  :66260               EDUNAME:db2agntp(XJBIDB)33
FUNCTION: DB2 UDB, bufferdistserv, sqlkdReceiveData, probe:5
RETCODE : ZRC=0x81580016=-2124939242=SQLKD_NODE_FAILURE
         "Mapping for SQLKF_NODE_FAILED"

......................................................................

2019-07-12-01.15.56.913498+480I15784791A549     LEVEL:Error
PID    :2622644             TID :52890      PROC:db2sysc34
INSTANCE:xjbiinst            NODE:034        DB  :XJBIDB
APPHDL :0-3374              APPID:10.238.100.180.47058.190711170215
AUTHID :ODS    
EDUID  :52890               EDUNAME:db2agntp(XJBIDB)34
FUNCTION: DB2 UDB, database utilities, DIAG_ERROR, probe:0
DATA #1 : String, 114 bytes
LOADID: 95352.2019-07-12-01.13.21.268758.0 (24;4416)
 , -2124939242, 0, Detected in file:sqlusSARouter.C, Line:204

2019-07-12-01.15.56.914379+480I15785341A550     LEVEL:Error
PID    :2622644             TID :51348      PROC:db2sysc34
INSTANCE:xjbiinst            NODE:034        DB  :XJBIDB
APPHDL :0-3551              APPID:10.238.100.181.60855.190711170821
AUTHID :ODS    
EDUID  :51348               EDUNAME:db2agntp(XJBIDB)34
FUNCTION: DB2 UDB, database utilities, DIAG_ERROR, probe:0
DATA #1 : String, 115 bytes
LOADID: 64130.2019-07-12-01.09.47.701472.0 (10;17085)
 , -2124939242, 0, Detected in file:sqlusSARouter.C, Line:204
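db2diag prints a ZRC both in hex and as a signed 32-bit decimal, so the two forms can be cross-checked. In this sketch (both helper names are hypothetical) the decimal -2124939242 carried by the LOAD records on node 034 converts back to 0x81580016, the SQLKD_NODE_FAILURE code that sqlkdReceiveData logged on node 033. That suggests the LOAD failures are a consequence of a failed partition rather than a separate problem.

```python
def zrc_to_signed(zrc_hex: str) -> int:
    """Interpret a ZRC hex string such as '0x81580016' as a signed 32-bit int."""
    value = int(zrc_hex, 16)
    return value - (1 << 32) if value & 0x80000000 else value


def signed_to_zrc(value: int) -> str:
    """Format a signed 32-bit return code as an 8-digit ZRC hex string."""
    return f"0x{value & 0xFFFFFFFF:08X}"


print(zrc_to_signed("0x870F0057"))   # -2029060009 (SQLO_TIMEOUT)
print(zrc_to_signed("0x81580016"))   # -2124939242 (SQLKD_NODE_FAILURE)
print(signed_to_zrc(-2124939242))    # 0x81580016, as in the LOAD records
```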

2019-07-12-01.15.57.022485+480E15785892A548     LEVEL:Severe
PID    :2491092             TID :258        PROC:db2wdog37
INSTANCE:xjbiinst            NODE:037
EDUID  :258                 EDUNAME:db2wdog37
FUNCTION: DB2 UDB, base sys utilities, sqleWatchDog, probe:20
MESSAGE : ADM0503C An unexpected internal processing error has occurred. All
         DB2 processes associated with this instance have been shut down.
         Diagnostic information has been recorded. Contact IBM Support for
         further assistance.

2019-07-12-01.15.57.024626+480E15786441A424     LEVEL:Error
PID    :2491092             TID :258        PROC:db2wdog37
INSTANCE:xjbiinst            NODE:037
EDUID  :258                 EDUNAME:db2wdog37
FUNCTION: DB2 UDB, base sys utilities, sqleWatchDog, probe:21
DATA #1 : Process ID, 4 bytes
3409250
DATA #2 : Hexdump, 8 bytes
0x0A000000003FDF30:000001010000000B                       ........

2019-07-12-01.15.57.025100+480I15786866A390     LEVEL:Severe
PID    :2491092             TID :258        PROC:db2wdog37
INSTANCE:xjbiinst            NODE:037
EDUID  :258                 EDUNAME:db2wdog37
FUNCTION: DB2 UDB, base sys utilities, sqleCleanupResources, probe:5
DATA #1 : Hexdump, 4 bytes
0x0A000000003FDE90:00000101                                 ....

2019-07-12-01.15.57.025519+480I15787257A455     LEVEL:Warning
PID    :2491092             TID :258        PROC:db2wdog37
INSTANCE:xjbiinst            NODE:037
EDUID  :258                 EDUNAME:db2wdog37
FUNCTION: DB2 UDB, routine_infrastructure, sqlerKillAllFmps, probe:5
MESSAGE : Bringing down all db2fmp processes as part of db2stop
DATA #1 : Hexdump, 4 bytes
0x0A000000003FDC10:00000000     
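Because Error, Severe, Event, Warning and Critical records from several partitions are interleaved, it helps to list only the most serious entries and the partition each came from. The following is a rough Python sketch, not a replacement for the db2diag tool's own filtering options. It assumes records are separated by blank lines and carry `LEVEL:` and `NODE:` fields as in this excerpt; the sample is a trimmed copy of three records from above.

```python
import re

# Matches 'LEVEL:Severe' or 'LEVEL: Severe' on a record's header line.
LEVEL_RE = re.compile(r"LEVEL:\s*(\w+)")
# Matches the partition number, e.g. 'NODE:037' or 'NODE : 037'.
NODE_RE = re.compile(r"NODE\s*:\s*(\d+)")


def parse_records(text):
    """Split db2diag-style text on blank lines into (timestamp, level, node).

    Blocks without a LEVEL field (such as '....' elision lines) are skipped.
    """
    records = []
    for block in re.split(r"\n\s*\n", text.strip()):
        level = LEVEL_RE.search(block)
        if not level:
            continue
        node = NODE_RE.search(block)
        timestamp = block.split()[0]
        records.append((timestamp, level.group(1), node.group(1) if node else None))
    return records


sample = """\
2019-07-12-01.05.08.283685+480E15538978A1234    LEVEL:Critical
PID    :3409250             TID :14138      PROC:db2sysc37
INSTANCE:xjbiinst            NODE:037        DB  :

2019-07-12-01.15.56.517541+480I15668263A508     LEVEL:Error
INSTANCE:xjbiinst            NODE:033        DB  :XJBIDB

2019-07-12-01.15.57.022485+480E15785892A548     LEVEL:Severe
INSTANCE:xjbiinst            NODE:037
"""

for ts, level, node in parse_records(sample):
    if level in ("Severe", "Critical"):
        print(ts[:26], level, node)
```

Run against the trimmed sample, only the two node 037 records (Critical and Severe) are printed.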


1 peer answer

tongshuai · Database Engineer · Beijing Xinshu Technology Co., Ltd.

Focus on this log entry:

2019-07-12-01.05.08.282258+480I15538351A626     LEVEL:Error
PID    :3409250             TID :14138      PROC:db2sysc37
INSTANCE:xjbiinst            NODE:037        DB  :
APPHDL :37-420              APPID:*N37.xjbiinst.190711170502
AUTHID :BIWH
EDUID  :14138               EDUNAME:db2agent(instance)37
FUNCTION: DB2 UDB, RAS/PD component, pdVendorCallWrapper, probe:285
MESSAGE : ZRC=0x870F0057=-2029060009=SQLO_TIMEOUT "Operation timed out"
         DIA8578C A timeout occurred while waiting on a semaphore.
DATA #1 : String, 46 bytes
Warning: PD Vendor Process has reached timeout

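Judging from the log, the database timed out while waiting on a system semaphore, and this could bring the database down. However, the log excerpt here is limited, so this cannot be confirmed.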
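To put the entries in order, the timestamps quoted in the question can be converted into offsets from the trap, whose time 01.05.02.015729 appears in the FODC directory name. In this sketch the SQLO_TIMEOUT entry lands about 6.3 seconds after the trap, i.e. after trap handling had already started, and the watchdog's ADM0503C shutdown of node 037 follows about 655 seconds (roughly 11 minutes) after the trap. This only reorders what the log already shows; it does not by itself identify the root cause.

```python
from datetime import datetime

# db2diag timestamp layout, with the '+480' UTC offset stripped off.
FMT = "%Y-%m-%d-%H.%M.%S.%f"

# Timestamps copied from the log excerpt in the question.
events = [
    ("trap on node 037 (FODC directory name)", "2019-07-12-01.05.02.015729"),
    ("PD vendor process timeout (SQLO_TIMEOUT)", "2019-07-12-01.05.08.282258"),
    ("ADM14011C critical trap record", "2019-07-12-01.05.08.283685"),
    ("SQLKD_NODE_FAILURE seen on node 033", "2019-07-12-01.15.56.517541"),
    ("ADM0503C watchdog shutdown on node 037", "2019-07-12-01.15.57.022485"),
]

start = datetime.strptime(events[0][1], FMT)
for label, ts in events:
    delta = datetime.strptime(ts, FMT) - start
    print(f"+{delta.total_seconds():9.3f}s  {label}")
```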

Internet Services · 2019-07-17 · 1522 views · answer invited by db2haodb


Question status

  • Published: 2019-07-17
  • Followers: 2
  • Question views: 2255
  • Latest answer: 2019-07-17