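The production data center has several VIOS environments that had run without trouble for one to two years. When a problem like this appears out of nowhere, the first thing to suspect is a recent change. Reviewing recent changes turned up two: PowerVC had been newly deployed, and VIOS had been patched from VIOS 2.1 to VIOS 2.2.3.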
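First, reboot the VIOS partition and, before memory runs out again, quickly check which process is using the memory.
The top consumer was vio_daemon. After watching for a while, I saw that it ate up all the memory in short order.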
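One way to do that quick check is sketched below. It assumes `ps aux`-style output, where the fourth column is %MEM; the helper name `top_mem` is made up for illustration and is not from the original troubleshooting session.

```shell
# Print the header line plus the N processes with the highest %MEM,
# reading `ps aux`-style output from stdin (column 4 is %MEM there).
# N defaults to 5. top_mem is a hypothetical helper name.
top_mem() {
    n="${1:-5}"
    input=$(cat)
    # Keep the header so the columns stay readable.
    printf '%s\n' "$input" | head -n 1
    # Sort the remaining lines numerically, descending, on column 4.
    printf '%s\n' "$input" | tail -n +2 | sort -k4,4nr | head -n "$n"
}
```

From the root shell, `ps aux | top_mem 10` would list the ten largest memory consumers. Running it a few times in a row shows whether one process keeps growing.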
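Second, with the culprit found, what does vio_daemon actually do? I asked the IBM 800 support line, and IBM replied asking me to collect some system information:
1. The output of ioslevel
2. The contents of /etc/security/limits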
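After I sent these back, IBM told me I had hit a bug.
The VIOS version and the stack = -1 setting in /etc/security/limits matched the signature of this bug exactly.
This bug is actually avoidable. When setting up AIX, most of us habitually change every value in /etc/security/limits to -1. In most cases that causes no trouble, but on this VIOS version it readily triggers the problem.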
default:
fsize = -1
core = -1
cpu = -1
data = -1
rss = -1
stack = -1
nofiles = -1
The problem can be due to a known issue in VIOS 2.2.3.0 through 2.2.3.3 with vio_daemon having a memory leak that was fixed at 2.2.3.4 with IV64508, or it could be due to incorrect VIOS settings.
Answer
To check your VIOS level, as padmin, run:
$ ioslevel
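The version test can be scripted. The sketch below checks whether a level string falls in the 2.2.3.0 through 2.2.3.3 range that the note ties to the memory leak. The function name `in_leak_range` is hypothetical, and the code assumes the level is a plain dotted numeric string such as 2.2.3.4.

```shell
# Return 0 if the given level string is within 2.2.3.0 .. 2.2.3.3,
# the range named for the vio_daemon leak fixed by IV64508.
# in_leak_range is a hypothetical helper; input is assumed to be
# dotted decimal, e.g. "2.2.3.1".
in_leak_range() {
    # Split the argument on dots into positional parameters.
    old_ifs=$IFS
    IFS=.
    set -- $1
    IFS=$old_ifs
    [ "${1:-0}" -eq 2 ] && [ "${2:-0}" -eq 2 ] && [ "${3:-0}" -eq 3 ] &&
        [ "${4:-0}" -ge 0 ] && [ "${4:-0}" -le 3 ]
}
```

As padmin, something like `in_leak_range "$(ioslevel)" && echo "level is in the leak range"` would flag an affected level. A level outside the range does not rule out the settings problem described next.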
If your VIOS level is 2.2.3.4 or higher, the problem may be due to the VIOS having incorrect system settings in /etc/security/limits. If the "stack" size is set to "unlimited" (stack = -1), this exposes a condition where the system can be allowed to pin as much stack as desired, causing vio_daemon to consume a lot of memory.
$ oem_setup_env
# vi /etc/security/limits    -> check the default stanza
default:
fsize = -1
core = -1
cpu = -1
data = -1
rss = -1
stack = -1
nofiles = -1
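Instead of opening the file in vi, the default stanza can be read non-interactively. This is a minimal sketch, assuming the file uses the "name = value" layout shown above, with spaces around the equals sign and each stanza header alone on its line. The helper name `default_limit` is invented for this example.

```shell
# Print the value of one attribute from the "default:" stanza of a
# limits-style file. Usage: default_limit FILE ATTR
# Assumes "attr = value" lines with spaces around "=".
default_limit() {
    awk -v attr="$2" '
        # A single word ending in ":" starts a new stanza.
        NF == 1 && $1 ~ /:$/ { in_default = ($1 == "default:"); next }
        # Inside default:, print the first matching attribute and stop.
        in_default && $1 == attr && $2 == "=" { print $3; exit }
    ' "$1"
}
```

From the root shell, `default_limit /etc/security/limits stack` printing -1 would match the condition described above.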
In some cases, the issue with vio_daemon consuming high memory is noticed after a VIOS update to 2.2.3.X. However, a VIOS update will NOT change these settings. It is strongly recommended not to modify these default values, as doing so is known to cause unpredictable results. Below is an example of the default values:
default:
fsize = 2097151
core = 2097151
cpu = -1
data = 262144
rss = 65536
stack = 65536
nofiles = 2000
To correct the problem, change the settings back to the "default" values. Then reboot the VIOS at your earliest convenience.
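Editing the file by hand works. As an alternative sketch, the function below only prints one chsec command per attribute, using the default values listed above, so they can be reviewed before anything is changed. It assumes the AIX chsec command is available in the root shell reached through oem_setup_env; the function name `print_restore_cmds` is hypothetical.

```shell
# Emit (do not run) one chsec command per attribute of the default
# stanza, using the default values quoted in the note above.
# print_restore_cmds is a hypothetical helper name.
print_restore_cmds() {
    for pair in fsize=2097151 core=2097151 cpu=-1 data=262144 \
                rss=65536 stack=65536 nofiles=2000; do
        printf 'chsec -f /etc/security/limits -s default -a %s\n' "$pair"
    done
}
```

Calling `print_restore_cmds` and checking the output before running the lines as root keeps the change deliberate. Back up /etc/security/limits first, and the reboot is still required afterwards.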