Hardware · Production fault · Minicomputer

A puzzling minicomputer fault. Really puzzling, really puzzling

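On July 30 the customer's AIX server reported disk errors: hdisk1, the mirror disk in rootvg, was in the MISSING state and the LVs were out of sync. Details: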

# lsvg rootvg
VOLUME GROUP:       rootvg                   VG IDENTIFIER:  000c861c0000d60000000115133343e9
VG STATE:           active                   PP SIZE:        128 megabyte(s)
VG PERMISSION:      read/write               TOTAL PPs:      1092 (139776 megabytes)
MAX LVs:            256                      FREE PPs:       268 (34304 megabytes)
LVs:                12                       USED PPs:       824 (105472 megabytes)
OPEN LVs:           11                       QUORUM:         1
TOTAL PVs:          2                        VG DESCRIPTORS: 3
STALE PVs:          1                        STALE PPs:      61
ACTIVE PVs:         1                        AUTO ON:        yes
MAX PPs per VG:     32512
MAX PPs per PV:     1016                     MAX PVs:        32
LTG size (Dynamic): 256 kilobyte(s)          AUTO SYNC:      no
HOT SPARE:          no                       BB POLICY:      relocatable
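The lsvg summary already shows the degraded mirror: only 1 of the 2 PVs is active and 61 PPs are stale. As a rough aid for reading such output, here is a Python sketch that pulls those fields out of lsvg text and lists what looks wrong. The function names are illustrative; it assumes AIX's English two-column lsvg layout and reads only the first token after each label.

```python
import re

# Labels read from `lsvg <vgname>` output (assumes the English two-column layout).
KEYS = ["PP SIZE", "TOTAL PPs", "FREE PPs", "USED PPs",
        "TOTAL PVs", "ACTIVE PVs", "STALE PVs", "STALE PPs"]


def parse_lsvg(text: str) -> dict:
    """Return {label: first token of its value} for every label in KEYS."""
    values = {}
    for key in KEYS:
        # A label is followed by a colon, blanks, then the value (e.g. "128").
        m = re.search(re.escape(key) + r":[ \t]+(\S+)", text)
        if m:
            values[key] = m.group(1)
    return values


def health_report(text: str) -> list:
    """List the problems a degraded mirror leaves visible in `lsvg` output."""
    v = {k: int(x) for k, x in parse_lsvg(text).items()}
    problems = []
    if v["FREE PPs"] + v["USED PPs"] != v["TOTAL PPs"]:
        problems.append("FREE PPs + USED PPs does not equal TOTAL PPs")
    if v["ACTIVE PVs"] < v["TOTAL PVs"]:
        problems.append(f"only {v['ACTIVE PVs']} of {v['TOTAL PVs']} PVs active")
    if v["STALE PPs"] > 0:
        mb = v["STALE PPs"] * v["PP SIZE"]  # PP SIZE is reported in megabytes
        problems.append(f"{v['STALE PPs']} stale PPs ({mb} MB) on {v['STALE PVs']} PV(s)")
    return problems


SAMPLE = """\
VG STATE:           active                   PP SIZE:        128 megabyte(s)
VG PERMISSION:      read/write               TOTAL PPs:      1092 (139776 megabytes)
MAX LVs:            256                      FREE PPs:       268 (34304 megabytes)
LVs:                12                       USED PPs:       824 (105472 megabytes)
TOTAL PVs:          2                        VG DESCRIPTORS: 3
STALE PVs:          1                        STALE PPs:      61
ACTIVE PVs:         1                        AUTO ON:        yes
"""

if __name__ == "__main__":
    for problem in health_report(SAMPLE):
        print(problem)
```

With this sample it prints two findings: only 1 of 2 PVs active, and 61 stale PPs (7808 MB, since each PP is 128 MB) on 1 PV.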

# lsvg -p rootvg
rootvg:
PV_NAME           PV STATE          TOTAL PPs   FREE PPs    FREE DISTRIBUTION
hdisk0            missing           546         134         25..00..00..00..109
hdisk1            active            546         134         45..03..00..00..86
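In lsvg -p, FREE DISTRIBUTION splits each PV's free PPs across five regions of the disk (outer edge, outer middle, center, inner middle, inner edge), so the five numbers should add up to FREE PPs. A small Python sketch, with hypothetical function names, that flags any PV whose state is not active and checks that sum; it assumes every PV name starts with hdisk.

```python
def parse_lsvg_p(text: str) -> list:
    """Turn the data rows of `lsvg -p <vgname>` into dicts."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        # Data rows have exactly five fields and start with a PV name (assumed hdiskN).
        if len(fields) == 5 and fields[0].startswith("hdisk"):
            pv, state, total, free, dist = fields
            rows.append({
                "pv": pv,
                "state": state,
                "total": int(total),
                "free": int(free),
                # "25..00..00..00..109" -> [25, 0, 0, 0, 109]
                "regions": [int(n) for n in dist.split("..")],
            })
    return rows


def pv_problems(rows: list) -> list:
    """Report PVs that are not active or whose region counts do not add up."""
    problems = []
    for r in rows:
        if r["state"] != "active":
            problems.append(f"{r['pv']} is {r['state']}")
        if sum(r["regions"]) != r["free"]:
            problems.append(f"{r['pv']}: regions sum to {sum(r['regions'])}, not {r['free']}")
    return problems


SAMPLE = """\
rootvg:
PV_NAME           PV STATE          TOTAL PPs   FREE PPs    FREE DISTRIBUTION
hdisk0            missing           546         134         25..00..00..00..109
hdisk1            active            546         134         45..03..00..00..86
"""

if __name__ == "__main__":
    print(pv_problems(parse_lsvg_p(SAMPLE)))
```

For this listing it reports only that hdisk0 is missing; both distributions add up to 134.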

# lsvg -l rootvg
rootvg:
LV NAME             TYPE       LPs     PPs     PVs  LV STATE      MOUNT POINT
hd5                 boot       1       2       2    closed/syncd  N/A
hd6                 paging     32      64      2    open/syncd    N/A
hd8                 jfs2log    1       2       2    open/stale    N/A
hd4                 jfs2       16      32      2    open/stale    /
hd2                 jfs2       160     320     2    open/stale    /usr
hd9var              jfs2       16      32      2    open/stale    /var
hd3                 jfs2       64      128     2    open/stale    /tmp
hd1                 jfs2       40      80      2    open/syncd    /home
hd10opt             jfs2       64      128     2    open/syncd    /opt
fwdump              jfs2       16      32      2    open/syncd    /var/adm/ras/platform
loglv00             jfslog     1       2       2    open/syncd    N/A
lv00                jfs        1       2       2    open/syncd    /var/adm/csd
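In lsvg -l, PPs divided by LPs gives the number of copies of each LV (2 for every LV here), and an LV STATE ending in stale marks LVs whose mirror copies have diverged. The Python sketch below, with illustrative names, lists those LVs together with a syncvg -l command for each. syncvg is the AIX command for resynchronizing stale partitions, but a resync only makes sense once both PVs are active again, so treat the printed commands as a checklist rather than something to run blindly.

```python
def parse_lsvg_l(text: str) -> list:
    """Turn the data rows of `lsvg -l <vgname>` into dicts."""
    lvs = []
    for line in text.splitlines():
        f = line.split()
        # Columns: LV NAME, TYPE, LPs, PPs, PVs, LV STATE, MOUNT POINT.
        # The header has more than 7 fields, so it is skipped automatically.
        if len(f) == 7 and f[2].isdigit() and f[3].isdigit():
            lvs.append({"lv": f[0], "type": f[1], "lps": int(f[2]),
                        "pps": int(f[3]), "state": f[5], "mount": f[6]})
    return lvs


def stale_report(lvs: list) -> list:
    """One line per stale LV: name, copy count and a resync command to consider."""
    lines = []
    for lv in lvs:
        if lv["state"].endswith("/stale"):
            copies = lv["pps"] // lv["lps"]
            lines.append(f"{lv['lv']} ({lv['mount']}): {copies} copies -> syncvg -l {lv['lv']}")
    return lines


SAMPLE = """\
rootvg:
LV NAME             TYPE       LPs     PPs     PVs  LV STATE      MOUNT POINT
hd5                 boot       1       2       2    closed/syncd  N/A
hd6                 paging     32      64      2    open/syncd    N/A
hd8                 jfs2log    1       2       2    open/stale    N/A
hd4                 jfs2       16      32      2    open/stale    /
hd2                 jfs2       160     320     2    open/stale    /usr
hd9var              jfs2       16      32      2    open/stale    /var
hd3                 jfs2       64      128     2    open/stale    /tmp
hd1                 jfs2       40      80      2    open/syncd    /home
hd10opt             jfs2       64      128     2    open/syncd    /opt
fwdump              jfs2       16      32      2    open/syncd    /var/adm/ras/platform
loglv00             jfslog     1       2       2    open/syncd    N/A
lv00                jfs        1       2       2    open/syncd    /var/adm/csd
"""

if __name__ == "__main__":
    print("\n".join(stale_report(parse_lsvg_l(SAMPLE))))
```

On the July 30 listing this reports hd8, hd4, hd2, hd9var and hd3.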

# lspv -l hdisk0
hdisk0:
LV NAME               LPs     PPs     DISTRIBUTION          MOUNT POINT
hd5                   1       1       01..00..00..00..00    N/A
hd6                   32      32      28..04..00..00..00    N/A
hd8                   1       1       00..00..01..00..00    N/A
hd4                   16      16      00..00..01..15..00    /
hd2                   160     160     41..16..64..39..00    /usr
hd9var                16      16      00..00..01..15..00    /var
hd3                   64      64      00..23..40..01..00    /tmp
hd1                   40      40      00..00..01..39..00    /home
hd10opt               64      64      00..63..01..00..00    /opt
loglv00               1       1       00..01..00..00..00    N/A
fwdump                16      16      15..01..00..00..00    /var/adm/ras/platform
lv00                  1       1       00..01..00..00..00    /var/adm/csd

# lspv -l hdisk1
hdisk1:
LV NAME               LPs     PPs     DISTRIBUTION          MOUNT POINT
hd5                   1       1       01..00..00..00..00    N/A
hd6                   32      32      00..32..00..00..00    N/A
hd8                   1       1       00..00..01..00..00    N/A
hd4                   16      16      00..00..16..00..00    /
hd2                   160     160     00..00..92..68..00    /usr
hd9var                16      16      00..16..00..00..00    /var
hd3                   64      64      64..00..00..00..00    /tmp
hd1                   40      40      00..40..00..00..00    /home
hd10opt               64      64      00..00..00..41..23    /opt
loglv00               1       1       00..01..00..00..00    N/A
fwdump                16      16      00..16..00..00..00    /var/adm/ras/platform
lv00                  1       1       00..01..00..00..00    /var/adm/csd
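Since every LV in rootvg has exactly two copies (PPs = 2 × LPs in lsvg -l), a complete mirror should show the same PP count for each LV on hdisk0 and hdisk1, and each disk's total should equal TOTAL PPs minus FREE PPs from lsvg -p (546 - 134 = 412). A Python sketch of that cross-check on the two lspv -l listings; the function names are hypothetical and it assumes mount points contain no spaces.

```python
def parse_lspv_l(text: str) -> dict:
    """Map LV name -> PPs on this PV, from `lspv -l hdiskN` output."""
    pps = {}
    for line in text.splitlines():
        f = line.split()
        # Columns: LV NAME, LPs, PPs, DISTRIBUTION, MOUNT POINT
        if len(f) == 5 and f[1].isdigit() and f[2].isdigit():
            pps[f[0]] = int(f[2])
    return pps


def mirror_check(a_text: str, b_text: str, used_per_pv: int) -> list:
    """Compare per-LV PP counts on two PVs and their totals against used_per_pv."""
    a, b = parse_lspv_l(a_text), parse_lspv_l(b_text)
    issues = [f"{lv}: {a.get(lv, 0)} PPs vs {b.get(lv, 0)} PPs"
              for lv in sorted(set(a) | set(b)) if a.get(lv, 0) != b.get(lv, 0)]
    for label, pps in (("first PV", a), ("second PV", b)):
        if sum(pps.values()) != used_per_pv:
            issues.append(f"{label}: {sum(pps.values())} PPs used, expected {used_per_pv}")
    return issues


HDISK0 = """\
hdisk0:
LV NAME               LPs     PPs     DISTRIBUTION          MOUNT POINT
hd5                   1       1       01..00..00..00..00    N/A
hd6                   32      32      28..04..00..00..00    N/A
hd8                   1       1       00..00..01..00..00    N/A
hd4                   16      16      00..00..01..15..00    /
hd2                   160     160     41..16..64..39..00    /usr
hd9var                16      16      00..00..01..15..00    /var
hd3                   64      64      00..23..40..01..00    /tmp
hd1                   40      40      00..00..01..39..00    /home
hd10opt               64      64      00..63..01..00..00    /opt
loglv00               1       1       00..01..00..00..00    N/A
fwdump                16      16      15..01..00..00..00    /var/adm/ras/platform
lv00                  1       1       00..01..00..00..00    /var/adm/csd
"""

HDISK1 = """\
hdisk1:
LV NAME               LPs     PPs     DISTRIBUTION          MOUNT POINT
hd5                   1       1       01..00..00..00..00    N/A
hd6                   32      32      00..32..00..00..00    N/A
hd8                   1       1       00..00..01..00..00    N/A
hd4                   16      16      00..00..16..00..00    /
hd2                   160     160     00..00..92..68..00    /usr
hd9var                16      16      00..16..00..00..00    /var
hd3                   64      64      64..00..00..00..00    /tmp
hd1                   40      40      00..40..00..00..00    /home
hd10opt               64      64      00..00..00..41..23    /opt
loglv00               1       1       00..01..00..00..00    N/A
fwdump                16      16      00..16..00..00..00    /var/adm/ras/platform
lv00                  1       1       00..01..00..00..00    /var/adm/csd
"""

if __name__ == "__main__":
    print(mirror_check(HDISK0, HDISK1, 546 - 134) or "layout consistent")
```

For these two listings the check finds nothing: both disks carry 412 PPs with identical per-LV counts, so the allocation is complete and the problem lies in the stale state of the copies, not in missing ones.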

# lsdev -Cc disk
hdisk0 Available 03-08-00-5,0 16 Bit LVD SCSI Disk Drive
hdisk1 Available 03-08-00-8,0 16 Bit LVD SCSI Disk Drive
hdisk2 Available 05-08-01     1722-600 (600) Disk Array Device
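The errpt entries identify disks by physical location code (for example U787B.001.DNWC063-P1-T14-L4-L0), while lsdev shows the logical location code (03-08-00-5,0). A Python sketch that matches the two by SCSI ID and LUN. It assumes that the logical code follows the AA-BB-CC-S,L pattern used for parallel-SCSI disks and that the trailing -Lx-Ly of the physical code carries the same SCSI ID and LUN; that mapping has not been checked against IBM documentation for this model, and the helper names are hypothetical. Under that reading, -L8-L0 maps to hdisk1 (8,0), while -L4-L0 matches neither address in this listing.

```python
import re

# Assumed formats (not verified for every model):
#   logical  (lsdev):  AA-BB-CC-S,L          e.g. 03-08-00-8,0   -> SCSI ID 8, LUN 0
#   physical (errpt):  ...-T14-L<id>-L<lun>  e.g. ...-T14-L8-L0  -> SCSI ID 8, LUN 0
LOGICAL = re.compile(r"^\w\w-\w\w-\w\w-(\w+),(\w+)$")
PHYSICAL = re.compile(r"-L(\w+)-L(\w+)$")


def lsdev_addresses(lsdev_text: str) -> dict:
    """Map (scsi_id, lun) -> hdisk name for lines of `lsdev -Cc disk`."""
    table = {}
    for line in lsdev_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        m = LOGICAL.match(fields[2])
        if m:  # entries such as hdisk2 (05-08-01) have no S,L part and are skipped
            table[(m.group(1), m.group(2))] = fields[0]
    return table


def disk_for_location(location: str, lsdev_text: str):
    """Return the hdisk whose SCSI address matches an errpt Location, or None."""
    m = PHYSICAL.search(location.strip())
    if not m:
        return None
    return lsdev_addresses(lsdev_text).get((m.group(1), m.group(2)))


LSDEV = """\
hdisk0 Available 03-08-00-5,0 16 Bit LVD SCSI Disk Drive
hdisk1 Available 03-08-00-8,0 16 Bit LVD SCSI Disk Drive
hdisk2 Available 05-08-01     1722-600 (600) Disk Array Device
"""

if __name__ == "__main__":
    for loc in ("U787B.001.DNWC063-P1-T14-L4-L0", "U787B.001.DNWC063-P1-T14-L8-L0"):
        print(loc, "->", disk_for_location(loc, LSDEV))
```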


Error log:


LABEL:          DISK_ERR4
IDENTIFIER:     49A83216

Date/Time:       Sat Jul 30 14:41:53 BEIST 2011
Sequence Number: 11206
Machine Id:      000C861CD600
Node Id:         p55A1
Class:           H
Type:            TEMP
Resource Name:   hdisk1         
Resource Class:  disk
Resource Type:   scsd
Location:        U787B.001.DNWC063-P1-T14-L4-L0
VPD:            
        Manufacturer................IBM   H0
        Machine Type and Model......HUS103073FL3800
        FRU Number..................03N5262     
        ROS Level and ID............52505152
        Serial Number...............00594D14
        EC Level....................H17923D   
        Part Number.................26K5573     
        Device Specific.(Z0)........000004129F00013E
        Device Specific.(Z1)........RPQR        
        Device Specific.(Z2)........0068
        Device Specific.(Z3)........06283
        Device Specific.(Z4)........0001
        Device Specific.(Z5)........22
        Device Specific.(Z6)........H17923D   
        Device Specific.(BR)........H0

Description
DISK OPERATION ERROR

Probable Causes
MEDIA
DASD DEVICE

User Causes
MEDIA DEFECTIVE

Recommended Actions
FOR REMOVABLE MEDIA, CHANGE MEDIA AND RETRY
PERFORM PROBLEM DETERMINATION PROCEDURES

Failure Causes
MEDIA
DISK DRIVE

Recommended Actions
FOR REMOVABLE MEDIA, CHANGE MEDIA AND RETRY
PERFORM PROBLEM DETERMINATION PROCEDURES

Detail Data
PATH ID
           0
SENSE DATA
0A04 0000 2A00 0370 11A0 0000 0800 0000 0102 0000 7000 0B00 0000 0018 0000 0000
4703 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 00CD 0000
---------------------------------------------------------------------------
LABEL:          LVM_IO_FAIL
IDENTIFIER:     E86653C3

Date/Time:       Sat Jul 30 14:41:46 BEIST 2011
Sequence Number: 11205
Machine Id:      000C861CD600
Node Id:         p55A1
Class:           H
Type:            PERM
Resource Name:   LVDD            
Resource Class:  NONE
Resource Type:   NONE
Location:        

Description
I/O ERROR DETECTED BY LVM

Probable Causes
POWER, DRIVE, ADAPTER, OR CABLE FAILURE

Recommended Actions
RUN DIAGNOSTICS AGAINST THE FAILING DEVICE

Detail Data
PHYSICAL VOLUME DEVICE MAJOR/MINOR
8000 0016 0000 0000
ERROR CODE AS DEFINED IN sys/errno.h
           5
BLOCK NUMBER
              53219736
LOGICAL VOLUME DEVICE MAJOR/MINOR
8000 0032 0000 0006
PHYSICAL BUFFER TRANSACTION TIME
                     1
RESIDUAL COUNT
                  4096
NUMBER OF BLOCKS
                  4096
I/O TYPE
USER DATA     
SENSE DATA
0000 0000 0001 9608 000C 861C 0000 D600 0000 0131 79C5 FEC7 000C 861C 9091 4E0D
0000 0000 0000 0000
---------------------------------------------------------------------------
LABEL:          DISK_ERR2
IDENTIFIER:     16F35C72

Date/Time:       Sat Jul 30 14:41:46 BEIST 2011
Sequence Number: 11204
Machine Id:      000C861CD600
Node Id:         p55A1
Class:           H
Type:            PERM
Resource Name:   hdisk1         
Resource Class:  disk
Resource Type:   scsd
Location:        U787B.001.DNWC063-P1-T14-L4-L0
VPD:            
        Manufacturer................IBM   H0
        Machine Type and Model......HUS103073FL3800
        FRU Number..................03N5262     
        ROS Level and ID............52505152
        Serial Number...............00594D14
        EC Level....................H17923D   
        Part Number.................26K5573     
        Device Specific.(Z0)........000004129F00013E
        Device Specific.(Z1)........RPQR        
        Device Specific.(Z2)........0068
        Device Specific.(Z3)........06283
        Device Specific.(Z4)........0001
        Device Specific.(Z5)........22
        Device Specific.(Z6)........H17923D   
        Device Specific.(BR)........H0

Description
DISK OPERATION ERROR

Probable Causes
DASD DEVICE

Failure Causes
DISK DRIVE
DISK DRIVE ELECTRONICS

Recommended Actions
PERFORM PROBLEM DETERMINATION PROCEDURES

Detail Data
PATH ID
           0
SENSE DATA
0A04 0000 2A00 032C 1198 0000 0800 0000 0102 0000 7000 0B00 0000 0018 0000 0000
4703 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 00A6 0009

Diagnostic Analysis
Diagnostic Log sequence number: 117
Resource tested: hdisk1
Resource Description: 16 Bit LVD SCSI Disk Drive
Location:  U787B.001.DNWC063-P1-T14-L4-L0
SRN:   2643-140
Description:  Error log analysis indicates poor signal quality.
Possible FRUs:
    n/a              FRU: n/a                  n/a            
                     SCSI cable/backplane
    sisscsia0        FRU: 10N6472              U787B.001.DNWC063-P1
                     PCI-X Dual Channel Ultra320 SCSI Adapter
    hdisk1           FRU: 03N5262              U787B.001.DNWC063-P1-T14-L4-L0
                     16 Bit LVD SCSI Disk Drive
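The SENSE DATA dumps can be decoded by hand. In the July 30 dumps, byte 4 holds a SCSI opcode (0x2A, WRITE(10)) followed by the rest of the CDB, and byte 20 holds 0x70, the start of fixed-format SCSI sense data; both offsets are read off these particular dumps and are assumptions, not a documented AIX layout. The Python sketch below (hypothetical names) decodes the DISK_ERR2 dump. The CDB's LBA, 0x032C1198 = 53219736, is the same as the BLOCK NUMBER in the LVM_IO_FAIL entry; the transfer length is 8 blocks, i.e. 4096 bytes with 512-byte sectors, in line with the 4096 figures in LVM_IO_FAIL. The sense data is sense key 0x0B (ABORTED COMMAND) with ASC/ASCQ 47/03 (INFORMATION UNIT iuCRC ERROR DETECTED), a transfer CRC error on the SCSI bus rather than a media error, which is consistent with the diag finding "poor signal quality" (SRN 2643-140). The July 30 DISK_ERR4 dump decodes the same way with a different LBA.

```python
# Offsets observed in the AIX scsd errpt dumps above; assumptions, not documented.
CDB_OFFSET = 4
SENSE_OFFSET = 20

OPCODES = {0x28: "READ(10)", 0x2A: "WRITE(10)"}
SENSE_KEYS = {0x0: "NO SENSE", 0x1: "RECOVERED ERROR", 0x2: "NOT READY",
              0x3: "MEDIUM ERROR", 0x4: "HARDWARE ERROR", 0x5: "ILLEGAL REQUEST",
              0x6: "UNIT ATTENTION", 0xB: "ABORTED COMMAND"}
# A tiny subset of the SPC additional-sense-code table.
ASC_ASCQ = {(0x47, 0x00): "SCSI PARITY ERROR",
            (0x47, 0x03): "INFORMATION UNIT iuCRC ERROR DETECTED"}


def decode(dump: str) -> dict:
    """Decode the CDB and, if present, fixed-format sense data from a hex dump."""
    raw = bytes.fromhex("".join(dump.split()))
    cdb = raw[CDB_OFFSET:CDB_OFFSET + 10]            # 10-byte READ/WRITE CDB
    out = {
        "op": OPCODES.get(cdb[0], f"0x{cdb[0]:02X}"),
        "lba": int.from_bytes(cdb[2:6], "big"),       # bytes 2-5: logical block address
        "blocks": int.from_bytes(cdb[7:9], "big"),    # bytes 7-8: transfer length
    }
    sense = raw[SENSE_OFFSET:SENSE_OFFSET + 18]
    # Response codes 0x70/0x71 (ignoring the VALID bit) mean fixed-format sense.
    if len(sense) == 18 and (sense[0] & 0x7F) in (0x70, 0x71):
        key = sense[2] & 0x0F
        asc, ascq = sense[12], sense[13]
        out["sense_key"] = SENSE_KEYS.get(key, f"0x{key:X}")
        out["asc_ascq"] = ASC_ASCQ.get((asc, ascq), f"{asc:02X}/{ascq:02X}")
    return out


# First two rows of the DISK_ERR2 SENSE DATA (sequence number 11204).
DISK_ERR2 = """
0A04 0000 2A00 032C 1198 0000 0800 0000 0102 0000 7000 0B00 0000 0018 0000 0000
4703 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
"""

if __name__ == "__main__":
    print(decode(DISK_ERR2))
```

When byte 20 is not 0x70 or 0x71 (as in the July 31 DISK_ERR4 dumps, which are READ(10) commands), the sketch returns only the CDB fields.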

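My first suspicion was a failed disk. I used rmlvcopy to remove the LV copies on hdisk1, removed hdisk1 from rootvg, deleted the hdisk1 device, pulled the disk and inserted a new one. The new disk was recognized normally, so I went to add it to rootvg and rebuild the mirror, but adding it to rootvg failed with what was essentially an I/O error; forcing the add produced the same error. As a test I then tried to create a VG on the new disk, and that also failed with an I/O error. Since the disk was brand new and certainly held no VG information or data, I concluded that the disk itself should be fine.

Because the new disk sat in the same slot as the failed one, I suspected the disk backplane. To check, I moved the new disk to other slots. This machine has a 4-slot, non-mirrored backplane: the second slot from the left holds hdisk0, which had reported no errors, and the other three slots were empty. Whichever of the remaining slots the disk went into, creating a VG or adding it to a VG failed with an I/O error. So the backplane looked like the culprit.

Fortunately the customer had another machine of the same model that could be shut down for a few days, so as an emergency measure we moved its backplane into the failed machine. After booting we ran diag; when it reached the disks it reported a missing disk path and recommended verifying the disk. After the disk was verified, diag passed cleanly and disk operations succeeded again, so we rebuilt the rootvg mirror. At that point the local disks were once again hdisk0 and hdisk1 in rootvg, mirroring each other.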
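The swap procedure described above (remove the mirror copies, drop the disk from rootvg, delete the device, replace it, then add it back and re-mirror) can be written out as an ordered list of AIX commands. The Python sketch below only builds and prints that list; it executes nothing. The function name is hypothetical, unmirrorvg rootvg hdisk1 would be a one-command alternative to the per-LV rmlvcopy calls, and it assumes the replacement disk comes back under the same hdisk name after cfgmgr, which is not guaranteed. The bosboot and bootlist steps are the usual follow-up for a rootvg mirror and are not mentioned in the original post.

```python
def rootvg_disk_swap_plan(bad_pv: str, good_pv: str, mirrored_lvs: list,
                          vg: str = "rootvg") -> list:
    """Return the ordered commands for replacing one disk of a mirrored rootvg.

    Dry run only: nothing is executed. Assumes the new disk is configured
    under the same name as the old one after cfgmgr.
    """
    plan = [f"rmlvcopy {lv} 1 {bad_pv}" for lv in mirrored_lvs]  # drop the copy on bad_pv
    plan += [
        f"reducevg {vg} {bad_pv}",          # remove the PV from the volume group
        f"rmdev -dl {bad_pv}",              # delete the device definition
        "# physically replace the disk here",
        "cfgmgr",                           # discover the new disk
        f"extendvg {vg} {bad_pv}",          # the step that failed with an I/O error
        f"mirrorvg {vg} {bad_pv}",          # re-create the second copy of every LV
        f"bosboot -ad /dev/{bad_pv}",       # write a boot image to the new disk
        f"bootlist -m normal {good_pv} {bad_pv}",
    ]
    return plan


LVS = ["hd5", "hd6", "hd8", "hd4", "hd2", "hd9var", "hd3",
       "hd1", "hd10opt", "fwdump", "loglv00", "lv00"]

if __name__ == "__main__":
    print("\n".join(rootvg_disk_swap_plan("hdisk1", "hdisk0", LVS)))
```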


The puzzling part happened on July 31

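On the morning of July 31 the customer reported that the server was responding extremely slowly. topas showed hdisk1 at 100% busy and hdisk0 at 0%. When I got on site I found hdisk0 in the missing state, yet the error log contained no errors for hdisk0 at all; every error was for hdisk1. The disk states and error log were as follows: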

# lsvg rootvg
VOLUME GROUP:       rootvg                   VG IDENTIFIER:  000c861c0000d60000000115133343e9
VG STATE:           active                   PP SIZE:        128 megabyte(s)
VG PERMISSION:      read/write               TOTAL PPs:      1092 (139776 megabytes)
MAX LVs:            256                      FREE PPs:       268 (34304 megabytes)
LVs:                12                       USED PPs:       824 (105472 megabytes)
OPEN LVs:           11                       QUORUM:         1
TOTAL PVs:          2                        VG DESCRIPTORS: 3
STALE PVs:          1                        STALE PPs:      61
ACTIVE PVs:         1                        AUTO ON:        yes
MAX PPs per VG:     32512
MAX PPs per PV:     1016                     MAX PVs:        32
LTG size (Dynamic): 256 kilobyte(s)          AUTO SYNC:      no
HOT SPARE:          no                       BB POLICY:      relocatable

# lsvg -p rootvg
rootvg:
PV_NAME           PV STATE          TOTAL PPs   FREE PPs    FREE DISTRIBUTION
hdisk0            active            546         134         25..00..00..00..109
hdisk1            active            546         134         45..03..00..00..86

# lsvg -l rootvg
rootvg:
LV NAME             TYPE       LPs     PPs     PVs  LV STATE      MOUNT POINT
hd5                 boot       1       2       2    closed/syncd  N/A
hd6                 paging     32      64      2    open/stale    N/A
hd8                 jfs2log    1       2       2    open/stale    N/A
hd4                 jfs2       16      32      2    open/stale    /
hd2                 jfs2       160     320     2    open/stale    /usr
hd9var              jfs2       16      32      2    open/stale    /var
hd3                 jfs2       64      128     2    open/stale    /tmp
hd1                 jfs2       40      80      2    open/stale    /home
hd10opt             jfs2       64      128     2    open/stale    /opt
fwdump              jfs2       16      32      2    open/stale    /var/adm/ras/platform
loglv00             jfslog     1       2       2    open/syncd    N/A
lv00                jfs        1       2       2    open/stale    /var/adm/csd

# lspv -l hdisk0
hdisk0:
LV NAME               LPs     PPs     DISTRIBUTION          MOUNT POINT
hd5                   1       1       01..00..00..00..00    N/A
hd6                   32      32      28..04..00..00..00    N/A
hd8                   1       1       00..00..01..00..00    N/A
hd4                   16      16      00..00..01..15..00    /
hd2                   160     160     41..16..64..39..00    /usr
hd9var                16      16      00..00..01..15..00    /var
hd3                   64      64      00..23..40..01..00    /tmp
hd1                   40      40      00..00..01..39..00    /home
hd10opt               64      64      00..63..01..00..00    /opt
loglv00               1       1       00..01..00..00..00    N/A
fwdump                16      16      15..01..00..00..00    /var/adm/ras/platform
lv00                  1       1       00..01..00..00..00    /var/adm/csd

# lspv -l hdisk1
hdisk1:
LV NAME               LPs     PPs     DISTRIBUTION          MOUNT POINT
hd5                   1       1       01..00..00..00..00    N/A
hd6                   32      32      00..32..00..00..00    N/A
hd8                   1       1       00..00..01..00..00    N/A
hd4                   16      16      00..00..16..00..00    /
hd2                   160     160     00..00..92..68..00    /usr
hd9var                16      16      00..16..00..00..00    /var
hd3                   64      64      64..00..00..00..00    /tmp
hd1                   40      40      00..40..00..00..00    /home
hd10opt               64      64      00..00..00..41..23    /opt
loglv00               1       1       00..01..00..00..00    N/A
fwdump                16      16      00..16..00..00..00    /var/adm/ras/platform
lv00                  1       1       00..01..00..00..00    /var/adm/csd

# lsdev -Cc disk
hdisk0 Available 03-08-00-5,0 16 Bit LVD SCSI Disk Drive
hdisk1 Available 03-08-00-8,0 16 Bit LVD SCSI Disk Drive
hdisk2 Available 05-08-01     1722-600 (600) Disk Array Device


Error log:

LABEL:          DISK_ERR4
IDENTIFIER:     49A83216

Date/Time:       Sun Jul 31 14:06:12 BEIST 2011
Sequence Number: 13222
Machine Id:      000C861CD600
Node Id:         p55A1
Class:           H
Type:            TEMP
Resource Name:   hdisk1         
Resource Class:  disk
Resource Type:   scsd
Location:        U787B.001.DNWC063-P1-T14-L8-L0
VPD:            
        Manufacturer................IBM   H0
        Machine Type and Model......HUS103073FL3800
        FRU Number..................03N5262     
        ROS Level and ID............52505152
        Serial Number...............00594D14
        EC Level....................H17923D   
        Part Number.................26K5573     
        Device Specific.(Z0)........000004129F00013E
        Device Specific.(Z1)........RPQR        
        Device Specific.(Z2)........0068
        Device Specific.(Z3)........06283
        Device Specific.(Z4)........0001
        Device Specific.(Z5)........22
        Device Specific.(Z6)........H17923D   
        Device Specific.(BR)........H0

Description
DISK OPERATION ERROR

Probable Causes
MEDIA
DASD DEVICE

User Causes
MEDIA DEFECTIVE

Recommended Actions
FOR REMOVABLE MEDIA, CHANGE MEDIA AND RETRY
PERFORM PROBLEM DETERMINATION PROCEDURES

Failure Causes
MEDIA
DISK DRIVE

Recommended Actions
FOR REMOVABLE MEDIA, CHANGE MEDIA AND RETRY
PERFORM PROBLEM DETERMINATION PROCEDURES

Detail Data
PATH ID
           1
SENSE DATA
0A08 0000 2800 03EE C400 0000 4000 0000 0200 0200 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 00E0 0008
---------------------------------------------------------------------------
LABEL:          SCSI_ERR10
IDENTIFIER:     0BA49C99

Date/Time:       Sun Jul 31 14:06:12 BEIST 2011
Sequence Number: 13221
Machine Id:      000C861CD600
Node Id:         p55A1
Class:           H
Type:            TEMP
Resource Name:   scsi0           
Resource Class:  driver
Resource Type:   sisscsi
Location:        U787B.001.DNWC063-P1-T14
VPD:            
        ROM Level.(alterable).......050A008a
        Product Specific.(CC).......570B

Description
SCSI BUS ERROR

Probable Causes
CABLE
CABLE TERMINATOR
DEVICE
ADAPTER

Failure Causes
CABLE LOOSE OR DEFECTIVE
DEVICE
ADAPTER

Recommended Actions
PERFORM PROBLEM DETERMINATION PROCEDURES
CHECK CABLE AND ITS CONNECTIONS

Detail Data
SENSE DATA
0000 0800 1900 0120 0108 0000 0301 0000 050A 008A 0000 0100 570B 0000 0000 0000
0408 0000 0080 00D0 0000 2400 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0107 0000 D000 100E 0107 2800 03EE C400 0000 4000 0000 0000 8028 00D1 4000 0000
0000 0000 0406 0000 00CC DDDD AABB CCCC 0000 2400 4900 2400 0006 5C00 8300 0000
4002 0000 8000 0000 2860 0000 0000 13EE 0000 0000 0000 0000 8C00 000E 0000 0000
0000 0000 2C22 2800 03EE C400 0000 4000 0000 0000 0000 001E 0000 0001 0000 0800
0022 10A4 0000 0000 0000 0000 0000 0412 9F00 013E 4942 4D20 2020 4830 4855 5331
3033 3037 3346 4C33 3830 3020 5250 5152 3030 3539 3444 3134 0000 0000
---------------------------------------------------------------------------
LABEL:          SCSI_ERR10
IDENTIFIER:     0BA49C99

Date/Time:       Sun Jul 31 14:06:12 BEIST 2011
Sequence Number: 13220
Machine Id:      000C861CD600
Node Id:         p55A1
Class:           H
Type:            TEMP
Resource Name:   scsi0           
Resource Class:  driver
Resource Type:   sisscsi
Location:        U787B.001.DNWC063-P1-T14
VPD:            
        ROM Level.(alterable).......050A008a
        Product Specific.(CC).......570B

Description
SCSI BUS ERROR

Probable Causes
CABLE
CABLE TERMINATOR
DEVICE
ADAPTER

Failure Causes
CABLE LOOSE OR DEFECTIVE
DEVICE
ADAPTER

Recommended Actions
PERFORM PROBLEM DETERMINATION PROCEDURES
CHECK CABLE AND ITS CONNECTIONS

Detail Data
SENSE DATA
0000 0800 1900 0120 0108 0000 0301 0000 050A 008A 0000 0100 570B 0000 0000 0000
0408 0000 0080 00D0 0000 7E00 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0107 0000 D000 100E 0107 2800 03EE C400 0000 4000 0000 0000 8028 00B7 4000 0000
0000 0000 0406 0000 00CC DDDD AABB CCCC 0000 7E00 4900 8000 0006 0000 8300 0000
4002 0000 8000 0000 0860 0000 0000 13EE 0000 0000 0000 0000 8C00 000E 0000 0000
0000 0000 2C22 2800 03EE C400 0000 4000 0000 0000 0000 001E 0000 0001 0000 0800
0022 10A4 0000 0000 0000 0000 0000 0412 9F00 013E 4942 4D20 2020 4830 4855 5331
3033 3037 3346 4C33 3830 3020 5250 5152 3030 3539 3444 3134 0000 0000
---------------------------------------------------------------------------
LABEL:          SCSI_ERR10
IDENTIFIER:     0BA49C99

Date/Time:       Sun Jul 31 14:06:12 BEIST 2011
Sequence Number: 13219
Machine Id:      000C861CD600
Node Id:         p55A1
Class:           H
Type:            TEMP
Resource Name:   scsi0           
Resource Class:  driver
Resource Type:   sisscsi
Location:        U787B.001.DNWC063-P1-T14
VPD:            
        ROM Level.(alterable).......050A008a
        Product Specific.(CC).......570B

Description
SCSI BUS ERROR

Probable Causes
CABLE
CABLE TERMINATOR
DEVICE
ADAPTER

Failure Causes
CABLE LOOSE OR DEFECTIVE
DEVICE
ADAPTER

Recommended Actions
PERFORM PROBLEM DETERMINATION PROCEDURES
CHECK CABLE AND ITS CONNECTIONS

Detail Data
SENSE DATA
0000 0800 1900 0120 0108 0000 0301 0000 050A 008A 0000 0100 570B 0000 0000 0000
0408 0000 0080 00D0 0000 3400 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0107 0000 D000 100E 0107 2800 03EE C400 0000 4000 0000 0000 8028 00F1 4000 0000
0000 0000 0406 0000 00CC DDDD AABB CCCC 0000 3400 4900 3400 0006 4C00 8300 0000
4002 0000 8000 0000 2860 0000 0000 13EE 0000 0000 0000 0000 8C00 000E 0000 0000
0000 0000 2C22 2800 03EE C400 0000 4000 0000 0000 0000 001E 0000 0001 0000 0800
0022 10A4 0000 0000 0000 0000 0000 0412 9F00 013E 4942 4D20 2020 4830 4855 5331
3033 3037 3346 4C33 3830 3020 5250 5152 3030 3539 3444 3134 0000 0000
---------------------------------------------------------------------------
LABEL:          SCSI_ERR10
IDENTIFIER:     0BA49C99

Date/Time:       Sun Jul 31 14:06:11 BEIST 2011
Sequence Number: 13218
Machine Id:      000C861CD600
Node Id:         p55A1
Class:           H
Type:            TEMP
Resource Name:   scsi0           
Resource Class:  driver
Resource Type:   sisscsi
Location:        U787B.001.DNWC063-P1-T14
VPD:            
        ROM Level.(alterable).......050A008a
        Product Specific.(CC).......570B

Description
SCSI BUS ERROR

Probable Causes
CABLE
CABLE TERMINATOR
DEVICE
ADAPTER

Failure Causes
CABLE LOOSE OR DEFECTIVE
DEVICE
ADAPTER

Recommended Actions
PERFORM PROBLEM DETERMINATION PROCEDURES
CHECK CABLE AND ITS CONNECTIONS

Detail Data
SENSE DATA
0000 0800 1900 0120 0108 0000 0301 0000 050A 008A 0000 0100 570B 0000 0000 0000
0408 0000 0080 00D0 0000 7E00 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0107 0000 D000 100E 0107 2800 03EE C400 0000 4000 0000 0000 8028 00AE 4000 0000
0000 0000 0406 0000 00CC DDDD AABB CCCC 0000 7E00 0000 0000 0006 1000 8300 0000
4002 0000 8000 0000 0860 0000 0000 13EE 0000 0000 0000 0000 8C00 000E 0000 0000
0000 0000 2422 2800 03EE C400 0000 4000 0000 0000 0000 001E 0000 0001 0000 0800
0022 10A4 0000 0000 0000 0000 0000 0412 9F00 013E 4942 4D20 2020 4830 4855 5331
3033 3037 3346 4C33 3830 3020 5250 5152 3030 3539 3444 3134 0000 0000
---------------------------------------------------------------------------
LABEL:          DISK_ERR4
IDENTIFIER:     49A83216

Date/Time:       Sun Jul 31 14:06:10 BEIST 2011
Sequence Number: 13217
Machine Id:      000C861CD600
Node Id:         p55A1
Class:           H
Type:            TEMP
Resource Name:   hdisk1         
Resource Class:  disk
Resource Type:   scsd
Location:        U787B.001.DNWC063-P1-T14-L8-L0
VPD:            
        Manufacturer................IBM   H0
        Machine Type and Model......HUS103073FL3800
        FRU Number..................03N5262     
        ROS Level and ID............52505152
        Serial Number...............00594D14
        EC Level....................H17923D   
        Part Number.................26K5573     
        Device Specific.(Z0)........000004129F00013E
        Device Specific.(Z1)........RPQR        
        Device Specific.(Z2)........0068
        Device Specific.(Z3)........06283
        Device Specific.(Z4)........0001
        Device Specific.(Z5)........22
        Device Specific.(Z6)........H17923D   
        Device Specific.(BR)........H0

Description
DISK OPERATION ERROR

Probable Causes
MEDIA
DASD DEVICE

User Causes
MEDIA DEFECTIVE

Recommended Actions
FOR REMOVABLE MEDIA, CHANGE MEDIA AND RETRY
PERFORM PROBLEM DETERMINATION PROCEDURES

Failure Causes
MEDIA
DISK DRIVE

Recommended Actions
FOR REMOVABLE MEDIA, CHANGE MEDIA AND RETRY
PERFORM PROBLEM DETERMINATION PROCEDURES

Detail Data
PATH ID
           1
SENSE DATA
0A08 0000 2800 03FA C880 0000 3800 0000 0200 0200 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 00E0 0007
---------------------------------------------------------------------------
LABEL:          SCSI_ERR10
IDENTIFIER:     0BA49C99

Date/Time:       Sun Jul 31 14:06:10 BEIST 2011
Sequence Number: 13216
Machine Id:      000C861CD600
Node Id:         p55A1
Class:           H
Type:            TEMP
Resource Name:   scsi0           
Resource Class:  driver
Resource Type:   sisscsi
Location:        U787B.001.DNWC063-P1-T14
VPD:            
        ROM Level.(alterable).......050A008a
        Product Specific.(CC).......570B

Description
SCSI BUS ERROR

Probable Causes
CABLE
CABLE TERMINATOR
DEVICE
ADAPTER

Failure Causes
CABLE LOOSE OR DEFECTIVE
DEVICE
ADAPTER

Recommended Actions
PERFORM PROBLEM DETERMINATION PROCEDURES
CHECK CABLE AND ITS CONNECTIONS

Detail Data
SENSE DATA
0000 0800 1900 0120 0108 0000 0301 0000 050A 008A 0000 0100 570B 0000 0000 0000
0408 0000 0080 00D0 0000 6C00 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0107 0000 D000 100E 0107 2800 03FA C880 0000 3800 0000 0000 8028 00D4 4000 0000
0000 0000 0406 0000 00CC DDDD AABB CCCC 0000 6C00 4900 6C68 0006 0398 8300 0000
4002 0000 8000 0000 0860 0000 0000 13EE 0000 0000 0000 0000 8C00 000E 0000 0000
0000 0000 2C22 2800 03FA C880 0000 3800 0000 0000 0000 001E 0000 0001 0000 0800
0022 10A4 0000 0000 0000 0000 0000 0412 9F00 013E 4942 4D20 2020 4830 4855 5331
3033 3037 3346 4C33 3830 3020 5250 5152 3030 3539 3444 3134 0000 0000
---------------------------------------------------------------------------
LABEL:          SCSI_ERR10
IDENTIFIER:     0BA49C99

Date/Time:       Sun Jul 31 14:06:10 BEIST 2011
Sequence Number: 13215
Machine Id:      000C861CD600
Node Id:         p55A1
Class:           H
Type:            TEMP
Resource Name:   scsi0           
Resource Class:  driver
Resource Type:   sisscsi
Location:        U787B.001.DNWC063-P1-T14
VPD:            
        ROM Level.(alterable).......050A008a
        Product Specific.(CC).......570B

Description
SCSI BUS ERROR

Probable Causes
CABLE
CABLE TERMINATOR
DEVICE
ADAPTER

Failure Causes
CABLE LOOSE OR DEFECTIVE
DEVICE
ADAPTER

Recommended Actions
PERFORM PROBLEM DETERMINATION PROCEDURES
CHECK CABLE AND ITS CONNECTIONS

Detail Data
SENSE DATA
0000 0800 1900 0120 0108 0000 0301 0000 050A 008A 0000 0100 570B 0000 0000 0000
0408 0000 0080 00D0 0000 4000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0107 0000 D000 100E 0107 2800 03FA C880 0000 3800 0000 0000 8028 00F3 4000 0000
0000 0000 0406 0000 00CC DDDD AABB CCCC 0000 4000 4900 4000 0006 3000 8300 0000
4002 0000 8000 0000 0860 0000 0000 13EE 0000 0000 0000 0000 8C00 000E 0000 0000
0000 0000 2422 2800 03FA C880 0000 3800 0000 0000 0000 001E 0000 0001 0000 0800
0022 10A4 0000 0000 0000 0000 0000 0412 9F00 013E 4942 4D20 2020 4830 4855 5331
3033 3037 3346 4C33 3830 3020 5250 5152 3030 3539 3444 3134 0000 0000
---------------------------------------------------------------------------
LABEL:          LVM_SA_STALEPP
IDENTIFIER:     EAA3D429

Date/Time:       Sun Jul 31 14:03:49 BEIST 2011
Sequence Number: 13214
Machine Id:      000C861CD600
Node Id:         p55A1
Class:           S
Type:            UNKN
Resource Name:   LVDD            

Description
PHYSICAL PARTITION MARKED STALE

Detail Data
PHYSICAL VOLUME DEVICE MAJOR/MINOR
8000 0016 0000 0001
PHYSICAL PARTITION NUMBER (DECIMAL)
                   225
LOGICAL VOLUME DEVICE MAJOR/MINOR
8000 000A 0000 0008
SENSE DATA
000C 861C 0000 D600 0000 0115 1333 43E9 000C 861C 6239 85B5 0000 0000 0000 0000
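With this many entries it helps to summarize errpt -a output mechanically. The Python sketch below (hypothetical names) splits errpt -a text on the dashed separator lines, pulls out LABEL, Date/Time, Resource Name, Location, the VPD Serial Number and the PHYSICAL VOLUME DEVICE MAJOR/MINOR words, and counts labels per resource. Counted by hand, the full July 31 listing above holds six SCSI_ERR10 entries on scsi0 and two DISK_ERR4 entries on hdisk1 between 14:06:10 and 14:06:12, plus one LVM_SA_STALEPP at 14:03:49. Extracting the VPD also makes cross-day comparison easy: the hdisk1 records of July 30 and July 31 both carry Serial Number 00594D14, at locations ending -L4-L0 and -L8-L0 respectively. The major/minor decoding rests on an assumption (a 64-bit device number whose top bit is a format flag, major number in the rest of the high word, minor number in the low word), so 8000 0016 0000 0001 reads as major 22, minor 1; which hdisk that is should be confirmed with ls -l /dev/hdisk* rather than taken from this decode.

```python
import re
from collections import Counter

SEPARATOR = re.compile(r"^-{20,}[ \t]*$", re.M)
FIELDS = {
    "label": r"^LABEL:[ \t]+(\S+)",
    "time": r"^Date/Time:[ \t]+(.+?)[ \t]*$",
    "resource": r"^Resource Name:[ \t]+(\S+)",
    "location": r"^Location:[ \t]+(\S+)",      # empty for LVDD entries, so no match
    "serial": r"Serial Number\.+(\S+)",
    "pv_dev": r"PHYSICAL VOLUME DEVICE MAJOR/MINOR[ \t]*\n([0-9A-F ]+)",
}


def parse_errpt(text: str) -> list:
    """Split `errpt -a` output into one dict per entry."""
    entries = []
    for chunk in SEPARATOR.split(text):
        entry = {}
        for name, pattern in FIELDS.items():
            m = re.search(pattern, chunk, re.M)
            if m:
                entry[name] = m.group(1).strip()
        if "label" in entry:
            entries.append(entry)
    return entries


def dev_numbers(words: str) -> tuple:
    """'8000 0016 0000 0001' -> (22, 1), under the assumption stated above."""
    value = int("".join(words.split()), 16)
    return (value >> 32) & 0x7FFFFFFF, value & 0xFFFFFFFF


def summarize(entries: list) -> Counter:
    """Count entries per (label, resource) pair."""
    return Counter((e["label"], e.get("resource", "?")) for e in entries)


# Abbreviated copies of three entries from the logs above.
SAMPLE = """\
LABEL:          DISK_ERR2
Date/Time:       Sat Jul 30 14:41:46 BEIST 2011
Resource Name:   hdisk1
Location:        U787B.001.DNWC063-P1-T14-L4-L0
        Serial Number...............00594D14
---------------------------------------------------------------------------
LABEL:          DISK_ERR4
Date/Time:       Sun Jul 31 14:06:12 BEIST 2011
Resource Name:   hdisk1
Location:        U787B.001.DNWC063-P1-T14-L8-L0
        Serial Number...............00594D14
---------------------------------------------------------------------------
LABEL:          LVM_SA_STALEPP
Date/Time:       Sun Jul 31 14:03:49 BEIST 2011
Resource Name:   LVDD
PHYSICAL VOLUME DEVICE MAJOR/MINOR
8000 0016 0000 0001
"""

if __name__ == "__main__":
    entries = parse_errpt(SAMPLE)
    for (label, resource), n in summarize(entries).items():
        print(f"{n:3d}  {label:<16} {resource}")
    for e in entries:
        if "serial" in e:
            print(e["time"], e["resource"], e["location"], e["serial"])
        if "pv_dev" in e:
            print(e["label"], "PV major/minor:", dev_numbers(e["pv_dev"]))
```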

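Because the customer considered the fault serious, they first wanted to back up the critical data in rootvg, so we left the machine alone that morning. But it was so slow that it eventually crashed on its own.

After the reboot the disk states had changed again: hdisk0, which had been missing in the morning, was now active, and apart from the LVs being out of sync there were no other problems. Once the machine was up, the customer backed up the critical data and built an environment on another machine to carry the business for now. I have not yet worked on the failed machine itself. I would like to discuss with everyone what caused this fault and how to fix it.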

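Reply from zhshen21 (Engineer, 北京万维美思科技有限公司; IT distribution/reseller), 2011-07-31:
LLZ, please repost this. It's such a mess, who is going to read it carefully? You're an engineer and you can't even put a post together?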