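On the evening of November 12, 2013, I received an alert and logged on to the database server at 23:55 to check the logs. The database is DB2 V9.5 FP8 on 64-bit Windows (NT64):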
2013-11-12-20.03.36.703000+480 I165895F562 LEVEL: Error
PID : 6036 TID : 764 PROC : db2syscs.exe
INSTANCE: DB2 NODE : 000 DB : XXXX
APPHDL : 0-36672 APPID: 192.168.100.5.22281.13111120121
AUTHID : DB2ADMIN
EDUID : 764 EDUNAME: db2agent (WZHDB2)
FUNCTION: DB2 UDB, database utilities, DIAG_ERROR, probe:0
DATA #1 : String, 121 bytes
LOADID: 764.2013-11-12-20.02.43.062002.0 (2;6936)
, -2029060031, 0000000000000000, Detected in file:sqluvld.C, Line:8024
2013-11-12-20.03.36.703000+480 I166459F1035 LEVEL: Error
PID : 6036 TID : 764 PROC : db2syscs.exe
INSTANCE: DB2 NODE : 000 DB : XXXX
APPHDL : 0-36672 APPID: 192.168.100.5.22281.13111120121
AUTHID : DB2ADMIN
EDUID : 764 EDUNAME: db2agent (WZHDB2)
FUNCTION: DB2 UDB, database utilities, sqlulSndEndMsg, probe:8027
MESSAGE : ZRC=0x8015006D=-2146107283=SQLU_CA_BUILT
"SQLCA has already been built"
DATA #1 : LOADID, PD_TYPE_LOADID, 49 bytes
LOADID: 764.2013-11-12-20.02.43.062002.0 (2;6936)
DATA #2 : SQLCA, PD_DB2_TYPE_SQLCA, 136 bytes
sqlcaid : SQLCA sqlcabc: 136 sqlcode: -2044 sqlerrml: 1
sqlerrmc: 3
sqlerrp : SQLUVLD
sqlerrd : (1) 0x8015006D (2) 0x00000000 (3) 0x00000000
(4) 0x00000000 (5) 0x00000000 (6) 0x00000000
sqlwarn : (1) (2) (3) (4) (5) (6)
(7) (8) (9) (10) (11)
sqlstate:
2013-11-12-20.03.36.718000+480 I167496F471 LEVEL: Error
PID : 6036 TID : 5360 PROC : db2syscs.exe
INSTANCE: DB2 NODE : 000
EDUID : 5360 EDUNAME: db2lbm2
FUNCTION: DB2 UDB, database utilities, DIAG_ERROR, probe:0
DATA #1 : String, 153 bytes
LOADID: 764.2013-11-12-20.02.43.062002.0 (2;6936)
Error writing message to queue. , -2029060031, 0000000000000003, Detected in file:sqluvld.C, Line:16827
2013-11-12-20.03.36.718000+480 I167969F516 LEVEL: Severe
PID : 6036 TID : 764 PROC : db2syscs.exe
INSTANCE: DB2 NODE : 000 DB : XXXX
APPHDL : 0-36672 APPID: 192.168.100.5.22281.13111120121
AUTHID : DB2ADMIN
EDUID : 764 EDUNAME: db2agent (WZHDB2)
FUNCTION: DB2 UDB, database utilities, sqlulSndEndMsg, probe:8027
MESSAGE : DIA0001E An internal error occurred. Report the following error code
The log shows that at 20:03 the database hit an error during a load operation and then tried to send the error information to the message queue.
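To see at a glance which severity levels and functions appear in an excerpt like the one above, the records can be split on their timestamp header lines. The following is only a minimal sketch in Python: it assumes the flattened layout shown in this post (header line with timestamp, record ID and LEVEL, then a FUNCTION line), and the names `parse_records` and `SAMPLE` are made up for illustration, not part of any DB2 tool.

```python
import re
from collections import Counter

# A record header looks like:
# 2013-11-12-20.03.36.703000+480 I165895F562 LEVEL: Error
HEADER = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2}-\d{2}\.\d{2}\.\d{2}\.\d{6})[+-]\d+\s+"
    r"(?P<id>\S+)\s+LEVEL:\s*(?P<level>\w+)"
)
FUNCTION = re.compile(r"^FUNCTION:\s*(?P<func>.+)$")


def parse_records(text):
    """Return one dict per record with its timestamp, level and function."""
    records = []
    current = None
    for line in text.splitlines():
        line = line.strip()
        header = HEADER.match(line)
        if header:
            current = {"ts": header["ts"], "level": header["level"], "function": None}
            records.append(current)
            continue
        func = FUNCTION.match(line)
        if func and current is not None and current["function"] is None:
            current["function"] = func["func"]
    return records


# Two records copied from the excerpt above (other fields omitted).
SAMPLE = """\
2013-11-12-20.03.36.703000+480 I165895F562 LEVEL: Error
FUNCTION: DB2 UDB, database utilities, DIAG_ERROR, probe:0
2013-11-12-20.03.36.718000+480 I167969F516 LEVEL: Severe
FUNCTION: DB2 UDB, database utilities, sqlulSndEndMsg, probe:8027
"""

records = parse_records(SAMPLE)
level_counts = Counter(r["level"] for r in records)
for r in records:
    print(r["ts"], r["level"], r["function"])
```

Counting by level or by function over the whole file makes it easier to spot when the load errors start piling up.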
2013-11-12-20.14.53.343000+480 I389071F597 LEVEL: Error
PID : 6036 TID : 764 PROC : db2syscs.exe
INSTANCE: DB2 NODE : 000 DB : XXXX
APPHDL : 0-36672 APPID: 192.168.100.5.22281.13111120121
AUTHID : DB2ADMIN
EDUID : 764 EDUNAME: db2agent (WZHDB2)
FUNCTION: DB2 UDB, database utilities, DIAG_ERROR, probe:0
DATA #1 : String, 156 bytes
LOADID: 764.2013-11-12-20.14.53.343000.0 (2;6936)
Failed to lock table and fix TCB , -2147221458, 0000000000000000, Detected in file:sqluTarget.C, Line:3025
2013-11-12-20.14.53.343000+480 I389670F491 LEVEL: Severe
PID : 6036 TID : 764 PROC : db2syscs.exe
INSTANCE: DB2 NODE : 000 DB : XXXX
APPHDL : 0-36672 APPID: 192.168.100.5.22281.13111120121
AUTHID : DB2ADMIN
EDUID : 764 EDUNAME: db2agent (WZHDB2)
FUNCTION: DB2 UDB, database utilities, sqluRegisterLoadStart, probe:2436
MESSAGE : Load Error: Attempt to lock table/fix tcb unsuccessful
2013-11-12-20.14.53.343000+480 I390163F465 LEVEL: Severe
PID : 6036 TID : 764 PROC : db2syscs.exe
INSTANCE: DB2 NODE : 000 DB : XXXX
APPHDL : 0-36672 APPID: 192.168.100.5.22281.13111120121
AUTHID : DB2ADMIN
EDUID : 764 EDUNAME: db2agent (WZHDB2)
FUNCTION: DB2 UDB, database utilities, sqluLoadPartition, probe:3568
MESSAGE : Load Error: Error loading table.
2013-11-12-20.14.53.343000+480 I390630F598 LEVEL: Error
PID : 6036 TID : 764 PROC : db2syscs.exe
INSTANCE: DB2 NODE : 000 DB : XXXX
APPHDL : 0-36672 APPID: 192.168.100.5.22281.13111120121
AUTHID : DB2ADMIN
EDUID : 764 EDUNAME: db2agent (WZHDB2)
FUNCTION: DB2 UDB, database utilities, DIAG_ERROR, probe:0
DATA #1 : String, 157 bytes
LOADID: 764.2013-11-12-20.14.53.343000.0 (2;6936)
Error acquiring partition resources. , -2147221458, 0000000000000000, Detected in file:sqluvtld.C, Line:921
2013-11-12-20.14.53.343000+480 I391230F468 LEVEL: Warning
PID : 6036 TID : 764 PROC : db2syscs.exe
INSTANCE: DB2 NODE : 000 DB : XXXX
APPHDL : 0-36672 APPID: 192.168.100.5.22281.13111120121
AUTHID : DB2ADMIN
EDUID : 764 EDUNAME: db2agent (WZHDB2)
FUNCTION: DB2 UDB, relation data serv, sqlrrbck, probe:100
MESSAGE : SQLEU_FLAG1_FORCE_RBK is set, doing rollback
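Later log entries show that, because the first load had failed, the subsequent attempts to restart the load could not lock the table and therefore failed as well.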
2013-11-13-00.20.35.937000+480 I897148F442 LEVEL: Severe
PID : 6036 TID : 3660 PROC : db2syscs.exe
INSTANCE: DB2 NODE : 000
EDUID : 3660 EDUNAME: db2ipccm
FUNCTION: DB2 UDB, common communication, sqlccipc_process_conn, probe:1
RETCODE : ZRC=0x870F0041=-2029060031=SQLO_QUE_NOT_SENT "Message Not Sent"
DIA8557C No message was sent using the message queue.
2013-11-13-00.20.35.953000+480 E898846F807 LEVEL: Critical
PID : 6036 TID : 3660 PROC : db2syscs.exe
INSTANCE: DB2 NODE : 000
EDUID : 3660 EDUNAME: db2ipccm
FUNCTION: DB2 UDB, base sys utilities, sqle_panic, probe:10
MESSAGE : ADM14001C An unexpected and critical error has occurred: "Panic".
The instance may have been shutdown as a result. "Automatic" FODC
(First Occurrence Data Capture) has been invoked and diagnostic
information has been recorded in directory
"X:DB2PROFSDB2FODC_Panic_2013-11-13-00.20.35.953000". Please look
in this directory for detailed evidence about what happened and
contact IBM support if necessary to diagnose the problem.
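At 00:20 on the 13th an exception was thrown: the database could not use the message queue, which brought the database down.
db2ipccm manages local connections to the database; running netstat at that point showed that the TCP connections on the machine had been lost.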
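The link between the load errors and the crash is easier to see once the return codes are compared. The db2ipccm record reports ZRC=0x870F0041 (SQLO_QUE_NOT_SENT), and the load records at 20:03 carry the decimal value -2029060031, which is the same 32-bit code read as a signed integer. A small Python sketch of that conversion (the helper names are my own, for illustration only):

```python
def zrc_to_signed(zrc_hex):
    """Interpret a 32-bit ZRC given as hex text as a signed integer."""
    value = int(zrc_hex, 16)
    return value - (1 << 32) if value & 0x80000000 else value


def signed_to_zrc(code):
    """Convert a signed 32-bit return code back to its hex form."""
    return "0x{:08X}".format(code & 0xFFFFFFFF)


# Code from the db2ipccm record at 00:20
print(zrc_to_signed("0x870F0041"))   # -2029060031
# Decimal value from the load records at 20:03
print(signed_to_zrc(-2029060031))    # 0x870F0041
# Code from the sqlulSndEndMsg record
print(zrc_to_signed("0x8015006D"))   # -2146107283
```

So the "message not sent" condition already appears in the load records at 20:03, more than four hours before the panic.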
http://www-01.ibm.com/support/docview.wss?uid=swg1IC69885
The load operation failed abnormally and produced too many error messages, so the message queue memory ran out, which brought the database instance down.
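Because this database is already at 9.5 FP8, which is later than the FP7 fix released by IBM, the problem was instead avoided by changing the program to specify CPU_PARALLELISM 1 for load and by enlarging the message queue with db2set DB2NTMEMSIZE=QUE:67108864.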
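For reference, the two changes can be written out as follows. This is only a sketch in Python that assembles the command text: the data file name, the table name, the DEL file type and the INSERT mode are placeholders I assume for illustration, since the real load statement is not shown in this post. Only CPU_PARALLELISM 1 and the DB2NTMEMSIZE value come from the fix described above.

```python
QUE_BYTES = 67108864  # value used with DB2NTMEMSIZE above


def db2set_command(que_bytes):
    """Text of the registry change that enlarges the message queue memory."""
    return "db2set DB2NTMEMSIZE=QUE:{}".format(que_bytes)


def load_command(data_file, table, cpu_parallelism=1):
    """Text of a LOAD statement limited to the given CPU parallelism.

    OF DEL and INSERT are assumed here; adjust to the real load job.
    """
    return "LOAD FROM {} OF DEL INSERT INTO {} CPU_PARALLELISM {}".format(
        data_file, table, cpu_parallelism
    )


# 67108864 bytes is 2**26, i.e. 64 MiB
print(QUE_BYTES // (1024 * 1024))
print(db2set_command(QUE_BYTES))
# Hypothetical file and table names
print(load_command("data.del", "MYSCHEMA.MYTABLE"))
```

Registry variables set with db2set normally take effect only after the instance is restarted, so the queue size change should be planned together with a db2stop and db2start.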