Creating an instance in OpenStack fails with: No valid host was found?

(Environment: CentOS 7 on a physical machine; installed with packstack, allinone)

2018-05-28 09:56:12.008 11023 ERROR nova.conductor.manager [req-a79e4003-9882-42e7-950b-266f84f9cdf8 d5c313f02ed3427f8e6630bb7653f5c8 cd72b3875ecc4e05923e2ae0b625012f - default default] Failed to schedule instances: NoValidHost_Remote: No valid host was found.
Traceback (most recent call last):

File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/server.py", line 226, in inner
return func(*args, **kwargs)

File "/usr/lib/python2.7/site-packages/nova/scheduler/manager.py", line 139, in select_destinations
raise exception.NoValidHost(reason="")

NoValidHost: No valid host was found.
2018-05-28 09:56:12.008 11023 ERROR nova.conductor.manager Traceback (most recent call last):
2018-05-28 09:56:12.008 11023 ERROR nova.conductor.manager File "/usr/lib/python2.7/site-packages/nova/conductor/manager.py", line 1116, in schedule_and_build_instances
2018-05-28 09:56:12.008 11023 ERROR nova.conductor.manager instance_uuids, return_alternates=True)
2018-05-28 09:56:12.008 11023 ERROR nova.conductor.manager File "/usr/lib/python2.7/site-packages/nova/conductor/manager.py", line 716, in _schedule_instances
2018-05-28 09:56:12.008 11023 ERROR nova.conductor.manager return_alternates=return_alternates)
2018-05-28 09:56:12.008 11023 ERROR nova.conductor.manager File "/usr/lib/python2.7/site-packages/nova/scheduler/utils.py", line 726, in wrapped
2018-05-28 09:56:12.008 11023 ERROR nova.conductor.manager return func(*args, **kwargs)
2018-05-28 09:56:12.008 11023 ERROR nova.conductor.manager File "/usr/lib/python2.7/site-packages/nova/scheduler/client/__init__.py", line 53, in select_destinations
2018-05-28 09:56:12.008 11023 ERROR nova.conductor.manager instance_uuids, return_objects, return_alternates)
2018-05-28 09:56:12.008 11023 ERROR nova.conductor.manager File "/usr/lib/python2.7/site-packages/nova/scheduler/client/__init__.py", line 37, in __run_method
2018-05-28 09:56:12.008 11023 ERROR nova.conductor.manager return getattr(self.instance, __name)(*args, **kwargs)
2018-05-28 09:56:12.008 11023 ERROR nova.conductor.manager File "/usr/lib/python2.7/site-packages/nova/scheduler/client/query.py", line 42, in select_destinations
2018-05-28 09:56:12.008 11023 ERROR nova.conductor.manager instance_uuids, return_objects, return_alternates)
2018-05-28 09:56:12.008 11023 ERROR nova.conductor.manager File "/usr/lib/python2.7/site-packages/nova/scheduler/rpcapi.py", line 158, in select_destinations
2018-05-28 09:56:12.008 11023 ERROR nova.conductor.manager return cctxt.call(ctxt, 'select_destinations', **msg_args)
2018-05-28 09:56:12.008 11023 ERROR nova.conductor.manager File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/client.py", line 174, in call
2018-05-28 09:56:12.008 11023 ERROR nova.conductor.manager retry=self.retry)
2018-05-28 09:56:12.008 11023 ERROR nova.conductor.manager File "/usr/lib/python2.7/site-packages/oslo_messaging/transport.py", line 131, in _send
2018-05-28 09:56:12.008 11023 ERROR nova.conductor.manager timeout=timeout, retry=retry)
2018-05-28 09:56:12.008 11023 ERROR nova.conductor.manager File "/usr/lib/python2.7/site-packages/oslo_messaging/_drivers/amqpdriver.py", line 559, in send
2018-05-28 09:56:12.008 11023 ERROR nova.conductor.manager retry=retry)
2018-05-28 09:56:12.008 11023 ERROR nova.conductor.manager File "/usr/lib/python2.7/site-packages/oslo_messaging/_drivers/amqpdriver.py", line 550, in _send
2018-05-28 09:56:12.008 11023 ERROR nova.conductor.manager raise result
2018-05-28 09:56:12.008 11023 ERROR nova.conductor.manager NoValidHost_Remote: No valid host was found.
2018-05-28 09:56:12.008 11023 ERROR nova.conductor.manager Traceback (most recent call last):
2018-05-28 09:56:12.008 11023 ERROR nova.conductor.manager
2018-05-28 09:56:12.008 11023 ERROR nova.conductor.manager File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/server.py", line 226, in inner
2018-05-28 09:56:12.008 11023 ERROR nova.conductor.manager return func(*args, **kwargs)
2018-05-28 09:56:12.008 11023 ERROR nova.conductor.manager
2018-05-28 09:56:12.008 11023 ERROR nova.conductor.manager File "/usr/lib/python2.7/site-packages/nova/scheduler/manager.py", line 139, in select_destinations
2018-05-28 09:56:12.008 11023 ERROR nova.conductor.manager raise exception.NoValidHost(reason="")
2018-05-28 09:56:12.008 11023 ERROR nova.conductor.manager
2018-05-28 09:56:12.008 11023 ERROR nova.conductor.manager NoValidHost: No valid host was found.
2018-05-28 09:56:12.008 11023 ERROR nova.conductor.manager
2018-05-28 09:56:12.008 11023 ERROR nova.conductor.manager
2018-05-28 09:56:12.089 11023 WARNING nova.scheduler.utils [req-a79e4003-9882-42e7-950b-266f84f9cdf8 d5c313f02ed3427f8e6630bb7653f5c8 cd72b3875ecc4e05923e2ae0b625012f - default default] Failed to compute_task_build_instances: No valid host was found.
Traceback (most recent call last):

File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/server.py", line 226, in inner
return func(*args, **kwargs)

File "/usr/lib/python2.7/site-packages/nova/scheduler/manager.py", line 139, in select_destinations
raise exception.NoValidHost(reason="")

NoValidHost: No valid host was found.
: NoValidHost_Remote: No valid host was found.
2018-05-28 09:56:12.092 11023 WARNING nova.scheduler.utils [req-a79e4003-9882-42e7-950b-266f84f9cdf8 d5c313f02ed3427f8e6630bb7653f5c8 cd72b3875ecc4e05923e2ae0b625012f - default default] [instance: 80df0e95-b96e-4cf1-ad69-cb628a679bde] Setting instance to ERROR state.: NoValidHost_Remote: No valid host was found.

1 Answer

付广平, R&D engineer, 民生银行

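Judging from the error message, the failure comes from Nova scheduling. I suggest enabling debug logging for nova-scheduler and checking the nova-scheduler log.

Nova's classic scheduling algorithm is the Filter scheduler. It first runs a series of filters to drop hosts that cannot satisfy the resource request. For example, if you request a VM with 32 GB of memory and three hosts A, B and C have 16 GB, 32 GB and 64 GB available, RamFilter drops host A and keeps B and C. Once all filters have run, weighers score the remaining hosts; for example, a host with more free memory gets a higher score. Finally the scheduler takes the few highest-scoring hosts (say, the top 5) and picks one of them at random as the candidate host for the VM. It picks randomly among the top few rather than always taking the single best host so that instances do not pile up on one host.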
A figure from the official documentation:

[Figure: Filtering Workflow]

For more on the OpenStack Filter scheduler algorithm, see the official documentation: Compute schedulers
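The filter, weigh and random-pick flow described in this answer can be sketched in a few lines of Python. This is only an illustration, not Nova's code: the host names and memory sizes come from the A/B/C example, while `ram_filter`, `ram_weigher`, `schedule` and `subset_size` are hypothetical names made up for the sketch.

```python
import random

# Hosts from the example in this answer: free RAM in GB.
hosts = {"A": 16, "B": 32, "C": 64}


def ram_filter(free_ram_gb, requested_gb):
    """Keep a host only if it has enough free RAM for the request."""
    return free_ram_gb >= requested_gb


def ram_weigher(free_ram_gb):
    """More free RAM gives a higher score."""
    return free_ram_gb


def schedule(hosts, requested_gb, subset_size=2, rng=random):
    # Step 1: filtering removes hosts that cannot satisfy the request.
    passed = {name: ram for name, ram in hosts.items()
              if ram_filter(ram, requested_gb)}
    if not passed:
        raise RuntimeError("No valid host was found.")
    # Step 2: weighing sorts the surviving hosts, best score first.
    ranked = sorted(passed, key=lambda name: ram_weigher(passed[name]),
                    reverse=True)
    # Step 3: pick at random among the top few to avoid piling onto one host.
    return rng.choice(ranked[:subset_size])


chosen = schedule(hosts, requested_gb=32)
print(chosen)  # either B or C; host A never passes the filter
```

Requesting more memory than any host has free (for example `schedule(hosts, requested_gb=128)`) leaves nothing after step 1, which is the situation behind the error in the question.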

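With that background the approach is clear: we need to determine which filter caused scheduling to fail. The log shows this.

For example, take the following log output: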

Filtering removed all hosts for the request with reservation ID 'r-wyf01la7' and instance ID '4035e2b9-1ccc-454e-b5a5-e0b785e78c92'. Filter results: ['RetryFilter: (start: 3, end: 3)', 'AvailabilityZoneFilter: (start: 3, end: 3)', 'RamFilter: (start: 3, end: 3)', 'DiskFilter: (start: 3, end: 0)']
There are 0 hosts available but 1 instances requested to build. select_destinations /usr/lib/python2.7/dist-packages/nova/scheduler/filter_scheduler.py:87

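Look at the start and end values in Filter results: start is the number of hosts that entered a filter and end is the number that remained after it. If start > end, the filter rejected hosts. In the example above DiskFilter has start 3 and end 0, so DiskFilter is what made scheduling fail: no host has enough disk space for the requested VM.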
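If the log is long, the same start/end check can be done with a short script. The sketch below assumes the log line has exactly the format quoted in this answer; the regular expression and the helper names (`parse_filter_results`, `filters_that_removed_hosts`) are my own and not part of Nova.

```python
import re

# The log line quoted in this answer, split only for readability.
log_line = (
    "Filtering removed all hosts for the request with reservation ID "
    "'r-wyf01la7' and instance ID '4035e2b9-1ccc-454e-b5a5-e0b785e78c92'. "
    "Filter results: ['RetryFilter: (start: 3, end: 3)', "
    "'AvailabilityZoneFilter: (start: 3, end: 3)', "
    "'RamFilter: (start: 3, end: 3)', 'DiskFilter: (start: 3, end: 0)']"
)

# Matches entries such as 'DiskFilter: (start: 3, end: 0)'.
FILTER_RE = re.compile(r"'(\w+): \(start: (\d+), end: (\d+)\)'")


def parse_filter_results(line):
    """Return (filter_name, start, end) tuples in the order the filters ran."""
    return [(name, int(start), int(end))
            for name, start, end in FILTER_RE.findall(line)]


def filters_that_removed_hosts(line):
    """Return the filters whose end count is lower than their start count."""
    return [(name, start, end)
            for name, start, end in parse_filter_results(line)
            if end < start]


culprits = filters_that_removed_hosts(log_line)
for name, start, end in culprits:
    print(f"{name} removed {start - end} host(s), {end} left")
```

For the line above this reports only DiskFilter, which agrees with reading the values by eye.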

To confirm that a resource shortage really is the cause, you can use the Nova commands to check resource usage on each host:

int32bit # nova hypervisor-list
+--------------------------------------+---------------------+-------+---------+
| ID                                   | Hypervisor hostname | State | Status  |
+--------------------------------------+---------------------+-------+---------+
| eadbbc25-9695-46d4-8553-3b0b9968a6a6 | devstack            | up    | enabled |
+--------------------------------------+---------------------+-------+---------+
int32bit # nova hypervisor-show eadbbc25-9695-46d4-8553-3b0b9968a6a6
+---------------------------+------------------------------------------+
| Property                  | Value                                    |
+---------------------------+------------------------------------------+
| cpu_info_arch             | x86_64                                   |
| cpu_info_features         | ["pge", "avx", "clflush", "sep",         |
|                           | "syscall", "vme", "tsc", "xsave",        |
|                           | "cmov", "fpu", "pat", "lm", "msr",       |
|                           | "3dnowprefetch", "nx", "fxsr", "sse4.1", |
|                           | "pae", "sse4.2", "pclmuldq", "mmx",      |
|                           | "osxsave", "cx8", "mce", "de", "rdtscp", |
|                           | "ht", "pse", "pni", "abm", "rdseed",     |
|                           | "popcnt", "mca", "apic", "sse",          |
|                           | "invtsc", "lahf_lm", "aes", "sse2",      |
|                           | "hypervisor", "ssse3", "cx16", "pse36",  |
|                           | "mtrr", "movbe", "rdrand"]               |
| cpu_info_model            | Westmere                                 |
| cpu_info_topology_cells   | 1                                        |
| cpu_info_topology_cores   | 2                                        |
| cpu_info_topology_sockets | 1                                        |
| cpu_info_topology_threads | 1                                        |
| cpu_info_vendor           | Intel                                    |
| current_workload          | 0                                        |
| disk_available_least      | 2                                        |
| free_disk_gb              | 17                                       |
| free_ram_mb               | 3439                                     |
| host_ip                   | 10.0.2.15                                |
| hypervisor_hostname       | devstack                                 |
| hypervisor_type           | QEMU                                     |
| hypervisor_version        | 2008000                                  |
| id                        | eadbbc25-9695-46d4-8553-3b0b9968a6a6     |
| local_gb                  | 17                                       |
| local_gb_used             | 0                                        |
| memory_mb                 | 3951                                     |
| memory_mb_used            | 512                                      |
| running_vms               | 0                                        |
| service_disabled_reason   | None                                     |
| service_host              | devstack                                 |
| service_id                | 75a4597d-5c5a-4f9b-b9cc-9d85dfdc56b9     |
| state                     | up                                       |
| status                    | enabled                                  |
| vcpus                     | 2                                        |
| vcpus_used                | 0                                        |
+---------------------------+------------------------------------------+
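As a rough way to read these numbers, the sketch below compares a requested flavor against the free resources reported by `nova hypervisor-show`. The hypervisor values are copied from the output above. The flavor is hypothetical, and the comparison is a deliberate simplification that ignores overcommit (allocation) ratios and any other adjustments the real filters make, so treat it as a first sanity check only.

```python
# Values copied from the `nova hypervisor-show` output in this answer.
hypervisor = {
    "vcpus": 2,
    "vcpus_used": 0,
    "free_ram_mb": 3439,
    "free_disk_gb": 17,
}

# Hypothetical flavor, not taken from the original question.
flavor = {"vcpus": 1, "ram_mb": 2048, "disk_gb": 20}


def shortages(hypervisor, flavor):
    """List the resources for which the request exceeds what is free.

    Simplification: overcommit (allocation) ratios are ignored.
    """
    free = {
        "vcpus": hypervisor["vcpus"] - hypervisor["vcpus_used"],
        "ram_mb": hypervisor["free_ram_mb"],
        "disk_gb": hypervisor["free_disk_gb"],
    }
    return [f"{key}: requested {flavor[key]}, free {free[key]}"
            for key in ("vcpus", "ram_mb", "disk_gb")
            if flavor[key] > free[key]]


problems = shortages(hypervisor, flavor)
print(problems or "request fits on this host")
```

With this made-up flavor only the disk request exceeds what the host reports as free, which is the kind of shortage that would show up as a DiskFilter rejection.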

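Some other filters that commonly cause scheduling failures:

  • ComputeFilter: no usable compute node was found, usually because the nova-compute service is down or disabled. Use nova service-list to check the state of all nova-compute services.
  • AvailabilityZoneFilter: the AvailabilityZone parameter may have been specified incorrectly. If it is not specified, the default AvailabilityZone is nova, so make sure the compute nodes belong to the nova zone.
  • RetryFilter: a classic one. Nova has a retry mechanism: when starting a VM fails on a compute node, that node sends a reschedule request back to nova-scheduler, with a default of 3 attempts. In other words, this error only appears after nova-compute has failed to create the VM, so the nova-compute log shows the real cause, such as a failed port binding or a failed image download.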

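If you run a newer OpenStack release, you also need to check the nova-placement-api log. The awkward part is that when resources are insufficient, nova-placement-api simply returns zero candidates; nova-scheduler then logs Got no allocation candidates from the Placement API, and messages naming RamFilter, DiskFilter and the like no longer appear, as shown here: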

2017-05-28 02:25:56.078 23640 DEBUG nova.scheduler.manager [req-c90ffbde-1d48-48dd-8463-ee5516d17017 70828c56f2844a9090d286c29a1fb599 ae21d957967d4df0865411f0389ed7e8 - default default] Got no allocation candidates from the Placement API. This may be a temporary occurrence as compute nodes start up and begin reporting inventory to the Placement service. select_destinations /opt/stack/nova/nova/scheduler/manager.py:133
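To tell the two cases apart quickly when scanning a nova-scheduler log, a small helper like the following may be enough. It only looks for the two message fragments quoted in this answer; the function name `classify` and the labels it returns are hypothetical, and other releases may word these messages differently.

```python
# Message fragments taken from the two log excerpts quoted in this answer.
PLACEMENT_MSG = "Got no allocation candidates from the Placement API"
FILTER_MSG = "Filtering removed all hosts"


def classify(line):
    """Say which stage rejected the request, judging by the log message."""
    if PLACEMENT_MSG in line:
        return "placement"  # Placement returned no candidates
    if FILTER_MSG in line:
        return "filter"     # a scheduler filter removed every host
    return "other"


# Shortened sample lines based on the excerpts in this answer.
samples = [
    "DEBUG nova.scheduler.manager Got no allocation candidates from the "
    "Placement API. This may be a temporary occurrence as compute nodes "
    "start up and begin reporting inventory to the Placement service.",
    "Filtering removed all hosts for the request with reservation ID "
    "'r-wyf01la7'. Filter results: ['DiskFilter: (start: 3, end: 0)']",
    "WARNING nova.scheduler.utils Setting instance to ERROR state.",
]

results = [classify(line) for line in samples]
print(results)
```

A "placement" result means the filters never ran, so the nova-placement-api log and the hosts' reported inventory are the next place to look.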

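In short, read the logs. If that is not enough, use the stack trace to locate the code and debug with breakpoints. For how to debug OpenStack with Python pdb, see my earlier article 如何阅读OpenStack源码 (How to read the OpenStack source code).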

 2018-06-03

Asked by

ncbh234, software development engineer, ss

Question status

  • Published: 2018-05-28
  • Latest answer: 2018-06-03