在一台zabbix被监控服务器上启动zabbix_agent,发现进程无法启动,10050端口没有起来?

在一台zabbix被监控服务器上(64位centos6.8系统,64G内容)启动zabbix_agent,发现进程无法启动,10050端口没有起来!

启动zabbix_agent进程没有报错,但10050端口没有正常启动起来。
[root@ctl ~]# /usr/local/zabbix/sbin/zabbix_agentd
[root@ctl ~]# ps -ef|grep zabbix_agent
root 27506 27360 0 11:07 pts/5 00:00:00 grep --color zabbix
[root@ctl etc]# lsof -i:10050

查看/usr/local/zabbix/logs/zabbix_agentd.log日志,发现报错如下:
................
27667:20161027:111554.851 cannot allocate shared memory of size 657056: [28] No space left on device
27667:20161027:111554.851 cannot allocate shared memory for collector
..............

2回答

zhuhaiqiangzhuhaiqiang  项目经理 , 银行
yinxinchpps2000赞同了此回答
这是因为内核对share memory的限制造成的。 处理过程记录:[root@ctl logs]# ipcs -l ------ Shared Memory Limits --------max number of segments = 4096max seg size (kbytes) = 1940588max total shared memory (kbytes) = 8388608min seg size (bytes) = 1 ------ Se...显示全部

这是因为内核对share memory的限制造成的。

处理过程记录:
[root@ctl logs]# ipcs -l

------ Shared Memory Limits --------
max number of segments = 4096
max seg size (kbytes) = 1940588
max total shared memory (kbytes) = 8388608
min seg size (bytes) = 1

------ Semaphore Limits --------
max number of arrays = 128
max semaphores per array = 250
max semaphores system wide = 32000
max ops per semop call = 100
semaphore max value = 32767

------ Messages: Limits --------
max queues system wide = 32768
max size of message (bytes) = 65536
default max size of queue (bytes) = 65536

从上面命令结果可以看到:
max total shared memory设置的是2M,max seg size设置的是8M,这显然不够allocate(分配)zabbix_agent启动所使用的内存。

查看目前的共享内存设置,
[root@ctl logs]# sysctl -a|grep shm
kernel.shmmax = 1987162112
kernel.shmall = 2097152
kernel.shmmni = 4096
kernel.shm_rmid_forced = 0
vm.hugetlb_shm_group = 0

其中kernel.shmall代表总共能分配的共享内存,这里是2G,kernel.shmax代表单个段能allocate的内存(以字节为单位),这里是2M,所以肯定有问题!

然后查看/etc/sysctl.conf
[root@ctl logs]# cat /etc/sysctl.conf
........
kernel.shmall = 2097152
kernel.shmmax = 1987162112

显然在sysctl.conf文件里设置的kernel.shamll和kernel.shmmax参数的值小了。


本机是64位的centos 6.8系统,64G内存,查看其它同系统的被监控服务器发现:
[root@bastion-IDC ~]# cat /etc/sysctl.conf
........
kernel.shmmax = 68719476736
kernel.shmall = 4294967296

[root@ctl logs]# ipcs -l

------ Shared Memory Limits --------
max number of segments = 4096
max seg size (kbytes) = 67108864
max total shared memory (kbytes) = 17179869184
min seg size (bytes) = 1

------ Semaphore Limits --------
max number of arrays = 128
max semaphores per array = 250
max semaphores system wide = 32000
max ops per semop call = 100
semaphore max value = 32767

------ Messages: Limits --------
max queues system wide = 32768
max size of message (bytes) = 65536
default max size of queue (bytes) = 65536

即64位的centos6系统(64G)的上面两个参数的默认值是64G和4G,设置的都是系统能识别的最大内存。

现在只需要在本机调大这两个参数值即可解决问题!
[root@ctl logs]# cat /etc/sysctl.conf
........
kernel.shmmax = 68719476736
kernel.shmall = 4294967296
kernel.msgmnb = 65536
kernel.msgmax = 65536

执行sysctl -p生效
[root@ctl logs]# sysctl -p

再次查看发现已经修改成功了!
[root@ctl logs]# sysctl -a|grep shm
kernel.shmmax = 68719476736
kernel.shmall = 4294967296
kernel.shmmni = 4096
kernel.shm_rmid_forced = 0
vm.hugetlb_shm_group = 0
[root@ctl logs]# ipcs -l

------ Shared Memory Limits --------
max number of segments = 4096
max seg size (kbytes) = 67108864
max total shared memory (kbytes) = 17179869184
min seg size (bytes) = 1

------ Semaphore Limits --------
max number of arrays = 128
max semaphores per array = 250
max semaphores system wide = 32000
max ops per semop call = 100
semaphore max value = 32767

------ Messages: Limits --------
max queues system wide = 32768
max size of message (bytes) = 65536
default max size of queue (bytes) = 65536

最后重新启动zabbix,发现10050端口顺利启动了:
[root@ctl ~]# /usr/local/zabbix/sbin/zabbix_agentd
[root@ctl logs]# ps -ef|grep zabbix
zabbix 27776 1 0 11:22 ? 00:00:00 /usr/local/zabbix/sbin/zabbix_agentd
zabbix 27777 27776 0 11:22 ? 00:00:00 /usr/local/zabbix/sbin/zabbix_agentd: collector [idle 1 sec]
zabbix 27778 27776 0 11:22 ? 00:00:00 /usr/local/zabbix/sbin/zabbix_agentd: listener #1 [waiting for connection]
zabbix 27779 27776 0 11:22 ? 00:00:00 /usr/local/zabbix/sbin/zabbix_agentd: listener #2 [waiting for connection]
zabbix 27780 27776 0 11:22 ? 00:00:00 /usr/local/zabbix/sbin/zabbix_agentd: listener #3 [waiting for connection]
zabbix 27781 27776 0 11:22 ? 00:00:00 /usr/local/zabbix/sbin/zabbix_agentd: active checks #1 [idle 1 sec]
root 28188 27360 0 11:48 pts/5 00:00:00 grep --color zabbix
[root@ctl logs]# lsof -i:10050
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
zabbix_ag 27776 zabbix 4u IPv4 112357384 0t0 TCP :zabbix-agent (LISTEN)
zabbix_ag 27777 zabbix 4u IPv4 112357384 0t0 TCP
:zabbix-agent (LISTEN)
zabbix_ag 27778 zabbix 4u IPv4 112357384 0t0 TCP :zabbix-agent (LISTEN)
zabbix_ag 27779 zabbix 4u IPv4 112357384 0t0 TCP
:zabbix-agent (LISTEN)
zabbix_ag 27780 zabbix 4u IPv4 112357384 0t0 TCP :zabbix-agent (LISTEN)
zabbix_ag 27781 zabbix 4u IPv4 112357384 0t0 TCP
:zabbix-agent (LISTEN)
[root@ctl logs]#

总结:
其实不止是zabbix程序启动会碰到这个问题,很多程序出现此错误也能使用该方法解决,就是因为内核对资源的限制问题。

收起
 2020-01-08
浏览287
priestpriest  系统架构师 , None
zhuhaiqiang赞同了此回答
上面已经提示: [28] No space left on device 释放磁盘空间试下显示全部

上面已经提示: [28] No space left on device
释放磁盘空间试下

收起
 2020-01-16
浏览229

提问者

泊涯系统测试工程师, 高伟达公司

问题状态

  • 发布时间:2020-01-08
  • 关注会员:3 人
  • 问题浏览:844
  • 最近回答:2020-01-16