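Reposting a friend's failure logs. The fault is in the DB2 database behind TSM, and it left the TSM Server unusable. It has happened twice. Below are the db2diag log excerpts from the key points in time. Could the experts here help diagnose what the problem is?
=======================November 28, 2012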
2012-11-28-02.44.51.832182+480 E31751444A416 LEVEL: Error (OS)
PID : 655716 TID : 1 PROC : dsmserv
INSTANCE: tsminst1 NODE : 000
EDUID : 1
FUNCTION: DB2 UDB, Client-side (app) latches, sqloxltc_app, probe:15
MESSAGE : ZRC=0x83000016=-2097151978
CALLED : OS, -, unspecified_system_function
OSERR : EINVAL (22) "A system call received a parameter that is not valid."
2012-11-28-02.44.51.832591+480 I31751861A283 LEVEL: Severe
PID : 655716 TID : 1 PROC : dsmserv
INSTANCE: tsminst1 NODE : 000
EDUID : 1
FUNCTION: DB2 UDB, base sys utilities, sqleAttachCtx, probe:10
RETCODE : ZRC=0x83000016=-2097151978
2012-11-28-02.44.51.832988+480 E31752145A750 LEVEL: Critical
PID : 655716 TID : 1 PROC : dsmserv
INSTANCE: tsminst1 NODE : 000
EDUID : 1
FUNCTION: DB2 UDB, SQO Memory Management, sqloDiagnoseFreeBlockFailure, probe:10
MESSAGE : ADM14001C An unexpected and critical error has occurred: "Panic".
The instance may have been shutdown as a result. "Automatic" FODC
(First Occurrence Data Capture) has been invoked and diagnostic
information has been recorded in directory
"/home/tsminst1/sqllib/db2dump/". Please look in this directory for
detailed evidence about what happened and contact IBM support if
necessary to diagnose the problem.
2012-11-28-02.44.51.841687+480 E31752896A1303 LEVEL: Severe
PID : 655716 TID : 1 PROC : dsmserv
INSTANCE: tsminst1 NODE : 000
EDUID : 1
FUNCTION: DB2 UDB, SQO Memory Management, sqloDiagnoseFreeBlockFailure, probe:999
MESSAGE : Memory validation failure, diagnostic file dumped.
DATA #1 : String, 28 bytes
Corrupt pool free tree node.
DATA #2 : File name, 28 bytes
655716.1.mem_diagnostics.txt
CALLSTCK:
[0] 0x0900000001166CD4 pdLog + 0xA8
[1] 0x09000000012D93C0 diagnoseMemoryCorruptionAndCrash__13SQLO_MEM_POOLFUlCPCc + 0x278
[2] 0x09000000012D9070 diagnoseMemoryCorruptionAndCrash__13SQLO_MEM_POOLFUlCPCc@glue3B0 + 0x78
[3] 0x09000000019C7E20 .allocateMemoryBlock.fdpr.clone.0__13SQLO_MEM_POOLFCUlUlT1UiT1PPvPP17SqloChunkSubgroupPP12SMemLostNodeCP12SMemLogEvent + 0x10
[4] 0xFFFFFFFFFFFFFFFC ?unknown + 0xFFFFFFFF
[5] 0x09000000019DEEDC sqlogmblkEx + 0x604
[6] 0x09000000019AB568 CLI_memAllocFromPool__FP13SQLO_MEM_POOLPPviP19CLI_ERRORHEADERINFOPcT3 + 0xA0
[7] 0x090000000176EAF0 CLI_errAllocateAdditionalErrorBlocks__FP21CLI_ERRORCONTROLBLOCK + 0x28
[8] 0x0900000001895D20 CLI_errStoreError__FiP19CLI_ERRORHEADERINFOlT1Uc + 0x64C
[9] 0x09000000011B6028 SQLFreeStmt2__FP17CLI_STATEMENTINFOsUcT3P5sqlcaP19CLI_ERRORHEADERINFO + 0x174
2012-11-28-02.44.51.842089+480 I31754200A1946 LEVEL: Warning
PID : 655716 TID : 1 PROC : dsmserv
INSTANCE: tsminst1 NODE : 000
EDUID : 1
FUNCTION: DB2 UDB, SQO Memory Management, sqlogmblkEx, probe:1000
MESSAGE : ZRC=0x820F0002=-2112946174=SQLO_INV_MEM "Invalid memory addr"
DIA8561C A invalid memory block was encountered.
DATA #1 : String, 43 bytes
Memory management block allocation failure.
DATA #2 : Codepath, 8 bytes
6:20:28
DATA #3 : Memory pool handle pointer, PD_TYPE_MEM_POOL_HANDLE_PTR, 8 bytes
0xdb22fff88fff22b8
DATA #4 : Requested size, PD_TYPE_MEM_REQUESTED_SIZE, 8 bytes
160
DATA #5 : Adjusted block size, PD_TYPE_MEM_ADJUSTED_SIZE, 8 bytes
192
DATA #6 : Options for requested block, PD_TYPE_GET_MEM_OPTIONS, 4 bytes
0x00000000
DATA #7 : Pointer to address that will be set by new allocation, PD_TYPE_PTR_TO_ADDRESS_OUT, 8 bytes
0x0fffffffffffdb60
DATA #8 : File name, PD_TYPE_OSS_MEM_FILE_NAME, 8 bytes
clierr.C
DATA #9 : Line of code, PD_TYPE_OSS_MEM_LINE_NUM, 8 bytes
743
DATA #10: Resource binding pointer, PD_TYPE_RESOURCE_BINDING_PTR, 8 bytes
0x0000000000000000
CALLSTCK:
[0] 0x0900000001166CD4 pdLog + 0xA8
[1] 0x0900000001878B9C pdLog@glue22F + 0x254
[2] 0x090000000111D2B0 sqlogmblkEx + 0x5A8
[3] 0x09000000019AB568 CLI_memAllocFromPool__FP13SQLO_MEM_POOLPPviP19CLI_ERRORHEADERINFOPcT3 + 0xA0
[4] 0x090000000176EAF0 CLI_errAllocateAdditionalErrorBlocks__FP21CLI_ERRORCONTROLBLOCK + 0x28
[5] 0x0900000001895D20 CLI_errStoreError__FiP19CLI_ERRORHEADERINFOlT1Uc + 0x64C
[6] 0x09000000011B6028 SQLFreeStmt2__FP17CLI_STATEMENTINFOsUcT3P5sqlcaP19CLI_ERRORHEADERINFO + 0x174
[7] 0x090000000180C4C8 CLI_allocCacheStmt__FP15CLI_CONNECTINFOP17CLI_STATEMENTINFOP19CLI_ERRORHEADERINFO + 0x8C
[8] 0x090000000180C28C CLI_allocCacheStmt__FP15CLI_CONNECTINFOP17CLI_STATEMENTINFOP19CLI_ERRORHEADERINFO@glueD4 + 0x7C
[9] 0x09000000017CE150 SQLFreeHandle + 0xFC
2012-11-28-02.46.42.004314+480 I31756147A555 LEVEL: Event
PID : 52625636 TID : 25446 PROC : db2sysc 0
INSTANCE: tsminst1 NODE : 000 DB : TSMDB1
APPHDL : 0-59 APPID: *LOCAL.DB2.121120105450
AUTHID : TSMINST1
EDUID : 25446 EDUNAME: db2stmm (TSMDB1) 0
FUNCTION: DB2 UDB, access plan manager, sqlra_resize_pckcache, probe:150
CHANGE : APM : Package Cache : FROM "194754396" : TO "155849605" : success
IMPACT : None
DATA #1 : String, 29 bytes
Package Cache Resized (bytes)
2012-11-28-02.46.42.022386+480 I31756703A496 LEVEL: Event
PID : 52625636 TID : 25446 PROC : db2sysc 0
INSTANCE: tsminst1 NODE : 000 DB : TSMDB1
APPHDL : 0-59 APPID: *LOCAL.DB2.121120105450
AUTHID : TSMINST1
EDUID : 25446 EDUNAME: db2stmm (TSMDB1) 0
FUNCTION: DB2 UDB, config/install, sqlfLogUpdateCfgParam, probe:20
CHANGE : STMM CFG DB TSMDB1: "Pckcachesz" From: "49018"
To: "39226"
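To make the sequence of events in the excerpt above easier to follow, here is a minimal Python sketch that pulls the timestamp, record ID, and severity out of each record header line. The regular expression and the `parse_header` helper are my own illustration, not part of any DB2 tool, and the sketch assumes the trailing `+480` is the UTC offset in minutes.

```python
import re
from datetime import datetime, timedelta, timezone

# Header lines look like:
#   2012-11-28-02.44.51.832182+480 E31751444A416 LEVEL: Error (OS)
# Assumption: "+480" is the offset from UTC in minutes (here UTC+8).
HEADER = re.compile(
    r"^(\d{4}-\d{2}-\d{2}-\d{2}\.\d{2}\.\d{2}\.\d{6})"  # timestamp
    r"([+-]\d+)\s+"                                      # UTC offset, minutes
    r"(\S+)\s+"                                          # record ID
    r"LEVEL:\s*(.+?)\s*$"                                # severity level
)


def parse_header(line):
    """Return (datetime, record_id, level) for a header line, else None."""
    m = HEADER.match(line.strip())
    if m is None:
        return None
    stamp, offset_min, record_id, level = m.groups()
    tz = timezone(timedelta(minutes=int(offset_min)))
    when = datetime.strptime(stamp, "%Y-%m-%d-%H.%M.%S.%f").replace(tzinfo=tz)
    return when, record_id, level


if __name__ == "__main__":
    sample = [
        "2012-11-28-02.44.51.832182+480 E31751444A416 LEVEL: Error (OS)",
        "PID : 655716 TID : 1 PROC : dsmserv",
        "2012-11-28-02.44.51.832988+480 E31752145A750 LEVEL: Critical",
    ]
    for line in sample:
        parsed = parse_header(line)
        if parsed is not None:
            when, record_id, level = parsed
            print(when.isoformat(), record_id, level)
```

Run against the full log, this gives a quick timeline of severities, which helps separate the `dsmserv` client-side records at 02:44 from the `db2stmm` resize events two minutes later.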
====================December 1, 2012, 12:50
2012-12-01-12.51.14.333252+480 I32508705A2657 LEVEL: Error (OS)
PID : 48365582 TID : 13449 PROC : db2sysc 0
INSTANCE: tsminst1 NODE : 000
EDUID : 13449 EDUNAME: db2pclnr (TSMDB1) 0
FUNCTION: DB2 Common, OSSe, ossErrorIOAnalysis, probe:100
CALLED : OS, -, aio_return
OSERR : ENOSPC (28) "No space left on device"
DATA #1 : String, 132 bytes
A total of 4 analysis will be performed :
- User info
- ulimit info
- Target file info
- File system
Target file handle = 419
DATA #2 : String, 188 bytes
Real user ID of current process = 1001
Effective user ID of current process = 1001
Real group ID of current process = 1001
Effective group ID of current process = 1001
DATA #3 : String, 379 bytes
Current process limits (unit in bytes except for nofiles) :
mem (S/H) = unlimited / unlimited
core (S/H) = unlimited / unlimited
cpu (S/H) = unlimited / unlimited
data (S/H) = unlimited / unlimited
fsize (S/H) = unlimited / unlimited
nofiles (S/H) = unlimited / unlimited
stack (S/H) = unlimited / unlimited
rss (S/H) = unlimited / unlimited
DATA #4 : String, 41 bytes
current sbrk(0) value: 0x0000000128153ce0
DATA #5 : String, 267 bytes
Target File Information :
Size = 306578948096
Link = No
Reference path = N/A
Type = 0x8000
Permissions = rw-------
UID = 1001
GID = 1001
Last modified time = 1354337474
DATA #6 : String, 383 bytes
File System Information of the target file :
Block size = 4096 bytes
Total size = 2684354560000 bytes
Free size = 501161984 bytes
Total # of inodes = 112599
FS name = N/A
Mount point = /home/tsminst1
FSID (major,minor)= 5, 2
FS type name = jfs2
DIO/CIO mount opt = None
Device type = N/A
FS type = 0x6
CALLSTCK:
[0] 0x090000000488E268 pdOSSeLoggingCallback + 0x34
[1] 0x0900000000C349E4 oss_log__FP9OSSLogFacUiN32UlN26iPPc + 0x1C4
[2] 0x0900000000C34EC0 ossLogSysRC + 0xA0
[3] 0x0900000000C5444C ossErrorIOAnalysis__FCPC21OSSErrorAnalysisParam + 0xC8C
[4] 0x0900000000C564E8 ossErrorAnalysis@AF12_1 + 0x48
[5] 0x0900000005D75284 sqloSystemErrorHandler + 0x3DC
[6] 0x090000000476A4C0 sqloSystemErrorHandler@glue7CF + 0x124
[7] 0x0900000004793598 sqloCheckIoResults__20SQLO_LIO_HANDLE_DATAFUliP11SQLO_IO_REQ@glue3A6 + 0x298
[8] 0x090000000532F124 sqloLioAIOCollect__20SQLO_LIO_HANDLE_DATAFUlP23SQLO_LIO_COLLECT_STATUSPP11SQLO_IO_REQ13SQLO_LIO_PORT + 0x4C0
[9] 0x0900000008571194 sqloLioCollectNBlocks + 0x108
2012-12-01-12.51.14.333724+480 E32511363A781 LEVEL: Error
PID : 48365582 TID : 11650 PROC : db2sysc 0
INSTANCE: tsminst1 NODE : 000
EDUID : 11650 EDUNAME: db2pclnr (TSMDB1) 0
FUNCTION: DB2 UDB, buffer pool services, sqlbClnrAsyncWriteCompletion, probe:0
MESSAGE : ADM6017E The table space "LGTMPTSP" (ID "7") is full. Detected on
container
"/home/tsminst1/db2data/tsminst1/NODE0000/TSMDB1/T0000007/C0000000.TM
P" (ID "0"). The underlying file system is full or the maximum
allowed space usage for the file system has been reached. It is also
possible that there are user limits in place with respect to maximum
file size and these limits have been reached.
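The raw byte counts in the ENOSPC record are hard to read at a glance, so here is a small Python sketch that converts them into friendlier units. The numbers are copied from DATA #5 and DATA #6 above. The constant names and the `summarize` helper are hypothetical, and the UTC+8 offset is an assumption taken from the `+480` in the record timestamps.

```python
from datetime import datetime, timedelta, timezone

# Values copied verbatim from the ENOSPC record (DATA #5 and DATA #6).
FS_TOTAL_BYTES = 2684354560000      # "Total size" of /home/tsminst1
FS_FREE_BYTES = 501161984           # "Free size" of /home/tsminst1
TARGET_FILE_BYTES = 306578948096    # "Size" of the target file
LAST_MODIFIED_EPOCH = 1354337474    # "Last modified time" of the target file

GIB = 1024 ** 3
MIB = 1024 ** 2
LOCAL_TZ = timezone(timedelta(minutes=480))  # assumption: +480 means UTC+8


def summarize():
    """Convert the raw figures into GiB/MiB, a percentage, and local time."""
    return {
        "fs_total_gib": FS_TOTAL_BYTES / GIB,
        "fs_free_mib": FS_FREE_BYTES / MIB,
        "fs_free_pct": 100.0 * FS_FREE_BYTES / FS_TOTAL_BYTES,
        "target_file_gib": TARGET_FILE_BYTES / GIB,
        "last_modified": datetime.fromtimestamp(LAST_MODIFIED_EPOCH, tz=LOCAL_TZ),
    }


if __name__ == "__main__":
    for name, value in summarize().items():
        print(f"{name}: {value}")
```

By these figures, the file system holding /home/tsminst1 has well under 1 GiB free out of roughly 2.5 TiB, and the target file alone is several hundred GiB. That is consistent with the ADM6017E message reporting that the file system under the LGTMPTSP container is full.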