(Environment: CentOS 7 on a physical machine, installed with packstack --allinone)
2018-05-28 09:56:12.008 11023 ERROR nova.conductor.manager [req-a79e4003-9882-42e7-950b-266f84f9cdf8 d5c313f02ed3427f8e6630bb7653f5c8 cd72b3875ecc4e05923e2ae0b625012f - default default] Failed to schedule instances: NoValidHost_Remote: No valid host was found.
Traceback (most recent call last):
File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/server.py", line 226, in inner
return func(*args, **kwargs)
File "/usr/lib/python2.7/site-packages/nova/scheduler/manager.py", line 139, in select_destinations
raise exception.NoValidHost(reason="")
NoValidHost: No valid host was found.
2018-05-28 09:56:12.008 11023 ERROR nova.conductor.manager Traceback (most recent call last):
2018-05-28 09:56:12.008 11023 ERROR nova.conductor.manager File "/usr/lib/python2.7/site-packages/nova/conductor/manager.py", line 1116, in schedule_and_build_instances
2018-05-28 09:56:12.008 11023 ERROR nova.conductor.manager instance_uuids, return_alternates=True)
2018-05-28 09:56:12.008 11023 ERROR nova.conductor.manager File "/usr/lib/python2.7/site-packages/nova/conductor/manager.py", line 716, in _schedule_instances
2018-05-28 09:56:12.008 11023 ERROR nova.conductor.manager return_alternates=return_alternates)
2018-05-28 09:56:12.008 11023 ERROR nova.conductor.manager File "/usr/lib/python2.7/site-packages/nova/scheduler/utils.py", line 726, in wrapped
2018-05-28 09:56:12.008 11023 ERROR nova.conductor.manager return func(*args, **kwargs)
2018-05-28 09:56:12.008 11023 ERROR nova.conductor.manager File "/usr/lib/python2.7/site-packages/nova/scheduler/client/__init__.py", line 53, in select_destinations
2018-05-28 09:56:12.008 11023 ERROR nova.conductor.manager instance_uuids, return_objects, return_alternates)
2018-05-28 09:56:12.008 11023 ERROR nova.conductor.manager File "/usr/lib/python2.7/site-packages/nova/scheduler/client/__init__.py", line 37, in __run_method
2018-05-28 09:56:12.008 11023 ERROR nova.conductor.manager return getattr(self.instance, __name)(*args, **kwargs)
2018-05-28 09:56:12.008 11023 ERROR nova.conductor.manager File "/usr/lib/python2.7/site-packages/nova/scheduler/client/query.py", line 42, in select_destinations
2018-05-28 09:56:12.008 11023 ERROR nova.conductor.manager instance_uuids, return_objects, return_alternates)
2018-05-28 09:56:12.008 11023 ERROR nova.conductor.manager File "/usr/lib/python2.7/site-packages/nova/scheduler/rpcapi.py", line 158, in select_destinations
2018-05-28 09:56:12.008 11023 ERROR nova.conductor.manager return cctxt.call(ctxt, 'select_destinations', **msg_args)
2018-05-28 09:56:12.008 11023 ERROR nova.conductor.manager File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/client.py", line 174, in call
2018-05-28 09:56:12.008 11023 ERROR nova.conductor.manager retry=self.retry)
2018-05-28 09:56:12.008 11023 ERROR nova.conductor.manager File "/usr/lib/python2.7/site-packages/oslo_messaging/transport.py", line 131, in _send
2018-05-28 09:56:12.008 11023 ERROR nova.conductor.manager timeout=timeout, retry=retry)
2018-05-28 09:56:12.008 11023 ERROR nova.conductor.manager File "/usr/lib/python2.7/site-packages/oslo_messaging/_drivers/amqpdriver.py", line 559, in send
2018-05-28 09:56:12.008 11023 ERROR nova.conductor.manager retry=retry)
2018-05-28 09:56:12.008 11023 ERROR nova.conductor.manager File "/usr/lib/python2.7/site-packages/oslo_messaging/_drivers/amqpdriver.py", line 550, in _send
2018-05-28 09:56:12.008 11023 ERROR nova.conductor.manager raise result
2018-05-28 09:56:12.008 11023 ERROR nova.conductor.manager NoValidHost_Remote: No valid host was found.
2018-05-28 09:56:12.008 11023 ERROR nova.conductor.manager Traceback (most recent call last):
2018-05-28 09:56:12.008 11023 ERROR nova.conductor.manager
2018-05-28 09:56:12.008 11023 ERROR nova.conductor.manager File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/server.py", line 226, in inner
2018-05-28 09:56:12.008 11023 ERROR nova.conductor.manager return func(*args, **kwargs)
2018-05-28 09:56:12.008 11023 ERROR nova.conductor.manager
2018-05-28 09:56:12.008 11023 ERROR nova.conductor.manager File "/usr/lib/python2.7/site-packages/nova/scheduler/manager.py", line 139, in select_destinations
2018-05-28 09:56:12.008 11023 ERROR nova.conductor.manager raise exception.NoValidHost(reason="")
2018-05-28 09:56:12.008 11023 ERROR nova.conductor.manager
2018-05-28 09:56:12.008 11023 ERROR nova.conductor.manager NoValidHost: No valid host was found.
2018-05-28 09:56:12.008 11023 ERROR nova.conductor.manager
2018-05-28 09:56:12.008 11023 ERROR nova.conductor.manager
2018-05-28 09:56:12.089 11023 WARNING nova.scheduler.utils [req-a79e4003-9882-42e7-950b-266f84f9cdf8 d5c313f02ed3427f8e6630bb7653f5c8 cd72b3875ecc4e05923e2ae0b625012f - default default] Failed to compute_task_build_instances: No valid host was found.
Traceback (most recent call last):
File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/server.py", line 226, in inner
return func(*args, **kwargs)
File "/usr/lib/python2.7/site-packages/nova/scheduler/manager.py", line 139, in select_destinations
raise exception.NoValidHost(reason="")
NoValidHost: No valid host was found.
: NoValidHost_Remote: No valid host was found.
2018-05-28 09:56:12.092 11023 WARNING nova.scheduler.utils [req-a79e4003-9882-42e7-950b-266f84f9cdf8 d5c313f02ed3427f8e6630bb7653f5c8 cd72b3875ecc4e05923e2ae0b625012f - default default] [instance: 80df0e95-b96e-4cf1-ad69-cb628a679bde] Setting instance to ERROR state.: NoValidHost_Remote: No valid host was found.
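The error shows that Nova failed to schedule the instance. I suggest turning on debug logging for nova-scheduler (set debug = True in the [DEFAULT] section of /etc/nova/nova.conf and restart the openstack-nova-scheduler service) and then reading the nova-scheduler log (/var/log/nova/nova-scheduler.log on a packstack install).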
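Once debug logging is on, a quick way to find the relevant scheduler lines is to filter the log by the request ID from the conductor error (req-a79e4003-9882-42e7-950b-266f84f9cdf8 above). Below is a minimal Python 3 sketch of that idea. It assumes the request ID is carried into the scheduler's log lines, which is normally the case because the request context travels with the RPC call; the function names are hypothetical, and a plain grep does the same job.

```python
import re
from typing import Iterable, Iterator, List

# A request ID is "req-" followed by a UUID, as in the conductor error above.
REQ_ID_RE = re.compile(
    r"req-[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}"
)


def extract_request_ids(text: str) -> List[str]:
    """Return the distinct request IDs in `text`, in order of first appearance."""
    ids: List[str] = []
    for match in REQ_ID_RE.finditer(text):
        if match.group(0) not in ids:
            ids.append(match.group(0))
    return ids


def lines_for_request(lines: Iterable[str], request_id: str) -> Iterator[str]:
    """Yield the log lines that mention `request_id`, without trailing newlines."""
    for line in lines:
        if request_id in line:
            yield line.rstrip("\n")


def scan_log(path: str, request_id: str) -> List[str]:
    """Read a log file (e.g. the nova-scheduler log) and return the matching lines."""
    with open(path, encoding="utf-8", errors="replace") as fh:
        return list(lines_for_request(fh, request_id))
```

For example, scan_log("/var/log/nova/nova-scheduler.log", "req-a79e4003-9882-42e7-950b-266f84f9cdf8") returns the scheduler's own view of the failed request, including the filter results discussed below.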
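OpenStack Nova's classic scheduling algorithm is the Filter scheduler. It first runs a series of filters that remove hosts unable to satisfy the resource request. For example, if you request an instance with 32 GB of RAM and three hosts A, B and C have 16 GB, 32 GB and 64 GB of free memory respectively, RamFilter removes host A and leaves B and C. After all filters have run, the remaining hosts are scored by weighers; for instance, the more free memory a host has, the higher its score. Finally, the scheduler takes the top few hosts (say the top 5) and picks one of them at random as the target host. It chooses randomly among the top few instead of always taking the single best host in order to add some randomness and keep instances from piling up on one host.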
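The filter, weigh, pick sequence can be sketched in a few lines of Python. This is a simplified model, not Nova's code: Host, Request, ram_filter and ram_weigher are hypothetical stand-ins for RamFilter and the RAM weigher, and real Nova also normalizes weights and applies per-weigher multipliers. As far as I know, the size of the random subset is the host_subset_size option (in the [filter_scheduler] section in recent releases) and defaults to 1, so the random choice among the top hosts only happens when it is raised above 1.

```python
import random
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence


@dataclass
class Host:
    name: str
    free_ram_mb: int


@dataclass
class Request:
    ram_mb: int


# Hypothetical filter modelled on RamFilter: keep hosts with enough free RAM.
def ram_filter(host: Host, req: Request) -> bool:
    return host.free_ram_mb >= req.ram_mb


# Hypothetical weigher modelled on the RAM weigher: more free RAM, higher score.
def ram_weigher(host: Host) -> float:
    return float(host.free_ram_mb)


def select_host(
    hosts: Sequence[Host],
    req: Request,
    filters: Sequence[Callable[[Host, Request], bool]],
    weighers: Sequence[Callable[[Host], float]],
    subset_size: int = 1,
    rng: Optional[random.Random] = None,
) -> Optional[Host]:
    rng = rng or random.Random()
    # 1. Filtering: each filter drops the hosts that cannot satisfy the request.
    candidates: List[Host] = list(hosts)
    for host_filter in filters:
        candidates = [h for h in candidates if host_filter(h, req)]
    if not candidates:
        return None  # the situation Nova reports as NoValidHost
    # 2. Weighing: score each remaining host and sort best first.
    ranked = sorted(candidates, key=lambda h: sum(w(h) for w in weighers), reverse=True)
    # 3. Pick at random among the top `subset_size` hosts to spread instances out.
    return rng.choice(ranked[:subset_size])
```

With the A/B/C example and a 32 GB request, subset_size=1 always returns C, subset_size=5 returns B or C, and a 128 GB request returns None because every host is filtered out.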
Here is a diagram from the official documentation:
For more on the Filter scheduler algorithm, see the official documentation: Compute schedulers
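With this background the approach is clear: we need to find out which filter caused scheduling to fail, and that is visible in the log.
For example, consider the following log output: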
Filtering removed all hosts for the request with reservation ID 'r-wyf01la7' and instance ID '4035e2b9-1ccc-454e-b5a5-e0b785e78c92'. Filter results: ['RetryFilter: (start: 3, end: 3)', 'AvailabilityZoneFilter: (start: 3, end: 3)', 'RamFilter: (start: 3, end: 3)', 'DiskFilter: (start: 3, end: 0)']
There are 0 hosts available but 1 instances requested to build. select_destinations /usr/lib/python2.7/dist-packages/nova/scheduler/filter_scheduler.py:87
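Look at the start and end values in Filter results: start is the number of hosts that entered a filter and end is the number left after it. If end is smaller than start, that filter rejected some hosts, and the filter at which end drops to 0 is the one that eliminated the last candidates. In the example above, DiskFilter has start 3 and end 0, so scheduling failed because of DiskFilter: none of the hosts has enough disk space for the requested instance.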
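Reading start and end by eye works for a single line; with many requests a small parser helps. A minimal sketch, assuming the Filter results format shown above (other Nova releases may word the message differently); the function names are hypothetical:

```python
import re
from typing import List, Optional, Tuple

# One entry of the "Filter results" list, e.g. "DiskFilter: (start: 3, end: 0)".
ENTRY_RE = re.compile(r"(\w+): \(start: (\d+), end: (\d+)\)")


def parse_filter_results(line: str) -> List[Tuple[str, int, int]]:
    """Return (filter_name, start, end) tuples from a 'Filter results' log line."""
    _, found, tail = line.partition("Filter results:")
    if not found:
        return []
    return [(name, int(start), int(end)) for name, start, end in ENTRY_RE.findall(tail)]


def rejecting_filters(results: List[Tuple[str, int, int]]) -> List[Tuple[str, int]]:
    """Filters that removed at least one host, with how many hosts each removed."""
    return [(name, start - end) for name, start, end in results if end < start]


def culprit(results: List[Tuple[str, int, int]]) -> Optional[str]:
    """The filter that removed the last remaining hosts (its end count is 0)."""
    for name, start, end in results:
        if start > 0 and end == 0:
            return name
    return None
```

For the log line above, rejecting_filters reports DiskFilter removing 3 hosts and culprit returns "DiskFilter".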
To confirm that a lack of resources really is the cause, use the nova CLI to check resource usage on each compute host:
int32bit #nova hypervisor-list
+--------------------------------------+---------------------+-------+---------+
| ID | Hypervisor hostname | State | Status |
+--------------------------------------+---------------------+-------+---------+
| eadbbc25-9695-46d4-8553-3b0b9968a6a6 | devstack | up | enabled |
+--------------------------------------+---------------------+-------+---------+
int32bit #nova hypervisor-show eadbbc25-9695-46d4-8553-3b0b9968a6a6
+---------------------------+------------------------------------------+
| Property | Value |
+---------------------------+------------------------------------------+
| cpu_info_arch | x86_64 |
| cpu_info_features | ["pge", "avx", "clflush", "sep", |
| | "syscall", "vme", "tsc", "xsave", |
| | "cmov", "fpu", "pat", "lm", "msr", |
| | "3dnowprefetch", "nx", "fxsr", "sse4.1", |
| | "pae", "sse4.2", "pclmuldq", "mmx", |
| | "osxsave", "cx8", "mce", "de", "rdtscp", |
| | "ht", "pse", "pni", "abm", "rdseed", |
| | "popcnt", "mca", "apic", "sse", |
| | "invtsc", "lahf_lm", "aes", "sse2", |
| | "hypervisor", "ssse3", "cx16", "pse36", |
| | "mtrr", "movbe", "rdrand"] |
| cpu_info_model | Westmere |
| cpu_info_topology_cells | 1 |
| cpu_info_topology_cores | 2 |
| cpu_info_topology_sockets | 1 |
| cpu_info_topology_threads | 1 |
| cpu_info_vendor | Intel |
| current_workload | 0 |
| disk_available_least | 2 |
| free_disk_gb | 17 |
| free_ram_mb | 3439 |
| host_ip | 10.0.2.15 |
| hypervisor_hostname | devstack |
| hypervisor_type | QEMU |
| hypervisor_version | 2008000 |
| id | eadbbc25-9695-46d4-8553-3b0b9968a6a6 |
| local_gb | 17 |
| local_gb_used | 0 |
| memory_mb | 3951 |
| memory_mb_used | 512 |
| running_vms | 0 |
| service_disabled_reason | None |
| service_host | devstack |
| service_id | 75a4597d-5c5a-4f9b-b9cc-9d85dfdc56b9 |
| state | up |
| status | enabled |
| vcpus | 2 |
| vcpus_used | 0 |
+---------------------------+------------------------------------------+
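To check by hand whether a flavor fits, you can feed the hypervisor-show values into simplified versions of the CPU, RAM and disk checks. This is a rough sketch, not Nova's code: the allocation ratios are parameters (16.0, 1.5 and 1.0 for CPU, RAM and disk were the long-standing defaults; check cpu_allocation_ratio, ram_allocation_ratio and disk_allocation_ratio in your nova.conf), and it assumes the scheduler treats the smaller of free_disk_gb and disk_available_least as free disk, which is my understanding of how Nova builds its host state. The function names are hypothetical.

```python
from typing import Dict, Optional


def parse_property_table(text: str) -> Dict[str, str]:
    """Parse a `| Property | Value |` table (as printed by nova hypervisor-show) into a dict.

    A row with an empty Property cell continues the previous value, as with
    the multi-line cpu_info_features entry.
    """
    props: Dict[str, str] = {}
    last_key: Optional[str] = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line.startswith("|"):
            continue  # skip +----+ borders and blank lines
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        if len(cells) < 2 or cells[0] == "Property":
            continue  # malformed row or the header row
        key, value = cells[0], cells[1]
        if key:
            props[key] = value
            last_key = key
        elif last_key is not None:
            props[last_key] += " " + value
    return props


def check_fit(props: Dict[str, str], vcpus: int, ram_mb: int, disk_gb: int,
              cpu_ratio: float = 16.0, ram_ratio: float = 1.5,
              disk_ratio: float = 1.0) -> Dict[str, bool]:
    """Simplified checks in the spirit of CoreFilter, RamFilter and DiskFilter."""
    total_ram = int(props["memory_mb"])
    usable_ram = total_ram * ram_ratio - (total_ram - int(props["free_ram_mb"]))

    total_disk = int(props["local_gb"])
    free_disk = int(props["free_disk_gb"])
    least = props.get("disk_available_least", "None")
    if least not in ("", "None"):
        # Assumption: the scheduler uses the smaller of the two as free disk.
        free_disk = min(free_disk, int(least))
    usable_disk = total_disk * disk_ratio - (total_disk - free_disk)

    usable_vcpus = int(props["vcpus"]) * cpu_ratio - int(props["vcpus_used"])
    return {
        "vcpus": usable_vcpus >= vcpus,
        "ram": usable_ram >= ram_mb,
        "disk": usable_disk >= disk_gb,
    }
```

With the output above and the default ratios, a flavor with 1 vCPU, 2048 MB of RAM and a 20 GB disk passes the CPU and RAM checks but fails the disk check: disk_available_least is only 2 GB even though free_disk_gb reports 17 GB.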
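Some other filters also commonly cause scheduling failures (ComputeFilter, for instance, rejects hosts whose nova-compute service is down or disabled):
nova service-list
Use it to check the state of all nova-compute services. On newer OpenStack releases you also need to look at the nova-placement-api log to find the cause. The annoying part is that when resources are insufficient, nova-placement-api simply returns zero candidates, so nova-scheduler only logs "Got no allocation candidates from the Placement API" and you no longer see errors naming a filter such as RamFilter or DiskFilter, for example: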
2017-05-28 02:25:56.078 23640 DEBUG nova.scheduler.manager [req-c90ffbde-1d48-48dd-8463-ee5516d17017 70828c56f2844a9090d286c29a1fb599 ae21d957967d4df0865411f0389ed7e8 - default default] Got no allocation candidates from the Placement API. This may be a temporary occurrence as compute nodes start up and begin reporting inventory to the Placement service. select_destinations /opt/stack/nova/nova/scheduler/manager.py:133
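On releases that schedule through Placement, you can ask Placement directly whether any resource provider can satisfy the request, using GET /allocation_candidates (available from Placement API microversion 1.10). The sketch below builds that request with Python's standard library; the endpoint URL, token and resource amounts are placeholders you would replace with your own values (for example from openstack endpoint list and openstack token issue), and the function names are hypothetical.

```python
import json
import urllib.request
from urllib.parse import urlencode


def allocation_candidates_request(
    placement_url: str, token: str, vcpus: int, ram_mb: int, disk_gb: int,
    microversion: str = "1.10",
) -> urllib.request.Request:
    """Build (without sending) GET /allocation_candidates for the given amounts."""
    resources = "VCPU:%d,MEMORY_MB:%d,DISK_GB:%d" % (vcpus, ram_mb, disk_gb)
    url = "%s/allocation_candidates?%s" % (
        placement_url.rstrip("/"), urlencode({"resources": resources}))
    return urllib.request.Request(url, headers={
        "X-Auth-Token": token,
        # allocation_candidates requires placement microversion 1.10 or later.
        "OpenStack-API-Version": "placement %s" % microversion,
        "Accept": "application/json",
    })


def count_candidates(req: urllib.request.Request) -> int:
    """Send the request and return how many allocation requests Placement offered."""
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return len(body.get("allocation_requests", []))
```

An empty allocation_requests list corresponds to the "Got no allocation candidates" message above; shrinking the requested amounts one resource at a time shows which one Placement cannot satisfy.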
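In short, read the logs carefully. If that is not enough, use the traceback to locate the relevant code and debug it with breakpoints. For how to debug OpenStack with Python pdb, see my earlier article How to Read OpenStack Source Code (如何阅读OpenStack源码).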