ITW: Invalid Transmission Words
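Invalid transmission words (ITWs) generally indicate transmission errors on a Fibre Channel link. A transmission word is a 40-bit unit made up of four 10-bit transmission characters, because Fibre Channel traditionally uses 8b/10b encoding. Plugging or unplugging a fiber cable can cause the ITW counter to rise. If the counter keeps rising on a link that is in normal use and has not been touched, however, the link itself is probably faulty and needs to be investigated.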
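To make the structure of a transmission word concrete, here is a minimal Python sketch that splits a 40-bit word into its four 10-bit characters and applies one necessary (but not sufficient) validity check: every valid 8b/10b character contains 4, 5, or 6 one-bits. The function names and the bit ordering (first character in the most significant bits) are assumptions made for illustration; a real decoder also checks the code tables and running disparity.

```python
def split_transmission_word(word):
    """Split a 40-bit word into four 10-bit characters.

    Assumption: the first character sits in the most significant bits.
    """
    if not 0 <= word < (1 << 40):
        raise ValueError("a transmission word is exactly 40 bits")
    return [(word >> shift) & 0x3FF for shift in (30, 20, 10, 0)]


def plausible_character(char10):
    """Necessary condition only: 8b/10b characters have 4, 5, or 6 one-bits."""
    return bin(char10).count("1") in (4, 5, 6)


def obviously_invalid(word):
    """True if at least one character cannot be a valid 8b/10b character."""
    return not all(plausible_character(c) for c in split_transmission_word(word))


# Four balanced characters pass the coarse check.
good_word = int("0101010101" * 4, 2)
# A character made of ten one-bits can never be valid.
bad_word = int("1111111111" + "0101010101" * 3, 2)

print(obviously_invalid(good_word), obviously_invalid(bad_word))
```

A word that passes this check can still be invalid, so the sketch only shows what kind of thing the ITW counter is counting; it is not how the switch hardware detects errors.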
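MAPS (Monitoring and Alerting Policy Suite) is a health monitor for storage area networks (SANs), supported in Fabric OS 7.2.0 and later. It provides health monitoring and proactive alerting, which helps administrators spot potential faults early and deal with them.
The following is a real-world case: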
GL8510_001:FID128:admin> switchshow -slot 3 |grep 22
34 3 2 312200 id N16 Online FC F-Port 50:06:0e:80:07:2f:35:40
166 3 22 31a640 id N16 Online FC F-Port 10:00:00:10:9b:58:af:b5
GL8510_001:FID128:admin> nodefind 31a640
Local:
Type Pid COS PortName NodeName SCR
N 31a640; 3;10:00:00:10:9b:58:af:b5;20:00:00:10:9b:58:af:b5; 0x00000003
SCR: Fabric-Detected Nx-Port-Detected
Fabric Port Name: 20:a6:88:94:71:43:b6:47
Permanent Port Name: 10:00:00:10:9b:58:af:b5
Device type: Physical Unknown(initiator/target)
Port Index: 166
Share Area: Yes
Redirect: No
Partial: No
LSAN: No
Slow Drain Device: No
Device link speed: 16G
Connected through AG: No
Real device behind AG: No
FCoE: No
Aliases: EQUHST00005723_H1
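The suspected fault point is the port with PID 31a640 (slot 3, port 22). The commands above list the details of that port and of the device attached to it.
Use mapsdb --show all to check the events MAPS has recorded. The output is as follows: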
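The PID shown by switchshow and nodefind is a 24-bit Fibre Channel address made up of three bytes: Domain ID, Area ID, and Port ID. The short Python sketch below splits it apart. The function name is hypothetical. In this particular output the area byte happens to equal the Port Index (0xa6 = 166 and 0x22 = 34), but that mapping depends on the switch's addressing mode, so it should not be taken as a general rule.

```python
def decode_pid(pid_hex):
    """Split a 24-bit Fibre Channel address into its three bytes."""
    value = int(pid_hex, 16)
    if not 0 <= value <= 0xFFFFFF:
        raise ValueError("a PID is a 24-bit value")
    return {
        "domain": (value >> 16) & 0xFF,  # switch Domain ID
        "area": (value >> 8) & 0xFF,     # Area ID
        "port": value & 0xFF,            # Port ID (AL_PA on loop devices)
    }


# The two PIDs from the switchshow output above.
print(decode_pid("31a640"))  # domain 49, area 166, port 64
print(decode_pid("312200"))  # domain 49, area 34, port 0
```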
GL8510_001:FID128:admin> mapsdb --show all
1 Dashboard Information:
=======================
DB start time: Thu Aug 1 00:36:48 2019
Active policy: pab_cus_policy
Configured Notifications: RASLOG,SNMP,FENCE,SW_CRITICAL,SW_MARGINAL,SFP_MARGINAL
Fenced Ports : None
Decommissioned Ports : None
Fenced circuits : N/A
Quarantined Ports : None
Top Zoned PIDs : 0x3194c0(45) 0x3184c0(45) 0x31a4c0(38) 0x31b4c0(38) 0x311000(29)
2 Switch Health Report:
=======================
Current Switch Policy Status: HEALTHY
3.1 Summary Report:
===================
Category |Today |Last 7 days |
Port Health |No Errors |Out of operating range |
BE Port Health |No Errors |No Errors |
Extension GE Port Health |No Errors |No Errors |
Fru Health |In operating range |Out of operating range |
Security Violations |No Errors |No Errors |
Fabric State Changes |No Errors |Out of operating range |
Switch Resource |In operating range |In operating range |
Traffic Performance |In operating range |In operating range |
Extension Health |Not applicable |Not applicable |
Fabric Performance Impact|Out of operating range |Out of operating range |
3.2 Rules Affecting Health:
===========================
Category(Violation Count)|RepeatCount|Rule Name |Execution Time |Object |Triggered Value(Units)|
Port Health(9) |3 |defALL_OTHER_F_PORTSITW_40 |04/27/22 22:39:53|F-Port 3/22 |65 ITWs |
| | | |F-Port 3/22 |92 ITWs |
| | | |F-Port 3/22 |42 ITWs |
|4 |defALL_OTHER_F_PORTSITW_21 |04/27/22 22:39:53|F-Port 3/22 |65 ITWs |
| | | |F-Port 3/22 |92 ITWs |
| | | |F-Port 3/22 |32 ITWs |
| | | |F-Port 3/22 |25 ITWs |
|2 |defALL_OTHER_F_PORTSITW_21 |04/26/22 01:39:30|F-Port 3/22 |22 ITWs |
| | | |F-Port 3/22 |25 ITWs |
Fru Health(24) |2 |defALL_PORTSSFP_STATE_OUT |04/28/22 22:26:28|U-Port 3/22 |OUT |
| | | |U-Port 3/22 |OUT |
|2 |defALL_PORTSSFP_STATE_IN |04/28/22 22:27:00|U-Port 3/22 |IN |
| | | |U-Port 3/22 |IN |
|2 |defALL_PORTSSFP_STATE_IN |04/27/22 22:28:43|U-Port 3/22 |IN |
| | | |U-Port 3/22 |IN |
|2 |defALL_PORTSSFP_STATE_OUT |04/27/22 22:28:26|U-Port 3/22 |OUT |
| | | |U-Port 3/22 |OUT |
|8 |defALL_PORTSSFP_STATE_IN |04/22/22 19:43:00|U-Port 12/28 |IN |
| | | |U-Port 12/4 |IN |
| | | |U-Port 11/28 |IN |
| | | |U-Port 11/4 |IN |
| | | |U-Port 4/28 |IN |
Fabric State Changes(1) |1 |defSWITCHFLOGI_6 |04/24/22 20:09:30|Switch |8 Logins |
Fabric Performance Impact|2 |Tempr_ALL_HOST_PORTSTX_95 |04/29/22 05:14:46|F-Port 2/33 |95.72 % |
(106) | | | | | |
| | | |F-Port 2/33 |96.88 % |
|1 |Tempr_ALL_HOST_PORTSTX_95 |04/29/22 04:38:39|F-Port 2/33 |100.00 % |
|6 |defALL_PORTS_IO_LATENCY_CLE|04/29/22 04:32:22|E-Port 8/2 |IO_LATENCY_CLEAR |
| |AR | | | |
| | | |E-Port 8/0 |IO_LATENCY_CLEAR |
| | | |E-Port 5/3 |IO_LATENCY_CLEAR |
| | | |E-Port 5/2 |IO_LATENCY_CLEAR |
| | | |E-Port 5/1 |IO_LATENCY_CLEAR |
|5 |defALL_PORTS_IO_PERF_IMPACT|04/29/22 04:30:22|E-Port 8/2 |IO_PERF_IMPACT |
| |_UNQUAR | | | |
| | | |E-Port 5/2 |IO_PERF_IMPACT |
| | | |E-Port 8/0 |IO_PERF_IMPACT |
| | | |E-Port 5/3 |IO_PERF_IMPACT |
| | | |E-Port 5/1 |IO_PERF_IMPACT |
|1 |Tempr_ALL_HOST_PORTSTX_95 |04/29/22 01:02:52|F-Port 4/10 |95.24 % |
4 History Data:
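The output is long, so we focus on the following parts:
1. Configured Notifications: RASLOG,SNMP,FENCE,SW_CRITICAL,SW_MARGINAL,SFP_MARGINAL
This line lists the actions MAPS is configured to take when a rule is violated, including FENCE, SW_CRITICAL, SW_MARGINAL, and SFP_MARGINAL. It is a list of configured actions rather than a record of state changes, so on its own it does not show that a port was fenced or that an SFP was marked marginal. The Fenced Ports line in fact reads None. What matters here is that SFP-related alerting is enabled.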
2. Summary Report
Over the last 7 days, Port Health, Fru Health, Fabric State Changes, and Fabric Performance Impact are all reported as Out of operating range, meaning that the monitored values moved outside the acceptable range. Taken together, these entries point to a likely port fault.
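3. Next, look at the Rules Affecting Health section, which lists the violations in more detail. Under Port Health, the entries are mostly ITW errors on port 3/22, with the counter values shown in the Triggered Value column. Further down are SFP IN and OUT state changes on the same port, together with some performance-related entries from the same period whose impact is negligible. Overall, we concluded that a faulty SFP on this port caused the ITW errors and the related alerts.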
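The manual reading in points 2 and 3 can also be done with a short script when the dashboard output is long. The Python sketch below flags the Summary Report categories that are Out of operating range for the last 7 days and totals the ITW violations per port. The function names are hypothetical, and the parsing assumes the pipe-separated layout shown in the output above; other Fabric OS versions may format the dashboard differently.

```python
import re
from collections import defaultdict

# A few rows copied from the mapsdb --show all output above.
SAMPLE = """\
Port Health |No Errors |Out of operating range |
BE Port Health |No Errors |No Errors |
Fru Health |In operating range |Out of operating range |
Port Health(9) |3 |defALL_OTHER_F_PORTSITW_40 |04/27/22 22:39:53|F-Port 3/22 |65 ITWs |
| | | |F-Port 3/22 |92 ITWs |
| | | |F-Port 3/22 |42 ITWs |
|4 |defALL_OTHER_F_PORTSITW_21 |04/27/22 22:39:53|F-Port 3/22 |65 ITWs |
| | | |F-Port 3/22 |92 ITWs |
| | | |F-Port 3/22 |32 ITWs |
| | | |F-Port 3/22 |25 ITWs |
|2 |defALL_OTHER_F_PORTSITW_21 |04/26/22 01:39:30|F-Port 3/22 |22 ITWs |
| | | |F-Port 3/22 |25 ITWs |
"""

# Matches the Object and Triggered Value cells of an ITW row.
ITW_ROW = re.compile(r"\|\s*([A-Z]+-Port\s+\d+/\d+)\s*\|\s*(\d+)\s+ITWs\s*\|")


def out_of_range_categories(text):
    """Return categories whose 'Last 7 days' cell is Out of operating range."""
    flagged = []
    for line in text.splitlines():
        cells = [cell.strip() for cell in line.split("|")]
        if len(cells) >= 3 and cells[2] == "Out of operating range":
            flagged.append(cells[0])
    return flagged


def itw_totals(text):
    """Count ITW violations and sum their triggered values per port."""
    totals = defaultdict(lambda: {"events": 0, "itws": 0})
    for line in text.splitlines():
        match = ITW_ROW.search(line)
        if match:
            port = " ".join(match.group(1).split())
            totals[port]["events"] += 1
            totals[port]["itws"] += int(match.group(2))
    return dict(totals)


print(out_of_range_categories(SAMPLE))
print(itw_totals(SAMPLE))
```

On the sample rows, every ITW violation lands on F-Port 3/22, and the number of matched rows equals the violation count of 9 shown next to Port Health, which is consistent with the conclusion drawn above.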
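Based on the analysis above, the fault originated at port 3/22. After the SFP on port 3/22 was replaced and the port was kept under observation, no further errors were reported and the problem was resolved.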