OpenStack Nova: unable to boot an instance from an image (create a new volume) because of a timeout on large images

Although the problem was essentially fixed in early releases, and the hard-coded waits for block devices to become ready, courtesy of some ham-fisted developer, were removed, it is still worth reminding the "young" installers of complex systems what this error looks like and which two parameters can help.

So, the Nova compute manager on a virtualization node may dump a block like the following into the log:

  2018-01-02 15:37:53.465 28582 ERROR nova.compute.manager [instance: d60edaa9-f3bc-4403-b4bf-db33e17811f7]     wait_func(context, volume_id)
  2018-01-02 15:37:53.465 28582 ERROR nova.compute.manager [instance: d60edaa9-f3bc-4403-b4bf-db33e17811f7]   File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 1430, in _await_block_device_map_created
  2018-01-02 15:37:53.465 28582 ERROR nova.compute.manager [instance: d60edaa9-f3bc-4403-b4bf-db33e17811f7]     volume_status=volume_status)
  2018-01-02 15:37:53.465 28582 ERROR nova.compute.manager [instance: d60edaa9-f3bc-4403-b4bf-db33e17811f7] VolumeNotCreated: Volume 02d73f68-5e27-4799-ab9…42e4eb50cd did not finish being created even after we waited 198 seconds or 61 attempts. And its status is creating.
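
Before touching Nova, it is worth confirming on the Cinder side that the volume really is still stuck in the creating state rather than having failed outright. A quick check with the standard OpenStack client might look like this (<volume-uuid> is a placeholder for the UUID reported in the error):

  # substitute the volume UUID from the VolumeNotCreated message
  openstack volume show <volume-uuid> -c status -c size

If the status simply stays at "creating" while the backend is still copying a large image, the timeout described below is the likely culprit.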

The adepts of the vendor's "wrapper" distributions have already chewed this over: https://bugzilla.redhat.com/show_bug.cgi?id=1019401

So it is high time to make use of this answer, at least for the 2016–2017 releases and onward:

 Lee Yarwood 2015-10-16 12:47:24 EDT

(In reply to Andres Toomsalu from comment #12)
> Just for feedback: this issue is still very much alive and causing problems
> in production deployments with volume (SAN) backends - where volume sizes
> are larger than in development environments. Affects heavily backup/snaphot
> restoration process - which easily run into timeout limits.

(In reply to jwang from comment #15)
> I hit this issue again on RHELOSP6.
>
> 1.
> Cinder backend is LVM
>
> 2.
> Glance image virtual size is 110G

Hello Dafna, Andres, jwang, Jack, can you confirm which version of nova you are using in your environments?

I believe the following change introduced configurables in Juno / RHEL OSP 6 and then Icehouse / RHEL OSP 5 (via 2014.1.4) that can be used here :

[juno] Make the block device mapping retries configurable
https://review.openstack.org/#/c/102891/

[stable/icehouse] Make the block device mapping retries configurable
https://review.openstack.org/#/c/129276/

~~~
Make the block device mapping retries configurable

When booting instances passing in block-device and increasing the
volume size, instances can go in to error state if the volume takes
longer to create than the hard code value (max_tries(180)/wait_between(1))
set in nova/compute/manager.py
def _await_block_device_map_created(self,
                                    context,
                                    vol_id,
                                    max_tries=180,
                                    wait_between=1):
To fix this, max_retries/wait_between should be made configurable.
Looking through the different releases, Grizzly was 30, Havana was
60 , IceHouse is 180.
This change adds two configuration options:
a)  `block_device_allocate_retries` which can be set in nova.conf
by the user to configure the number of block device mapping retries.
It defaults to 60 and replaces the max_tries argument in the above method.
b) `block_device_allocate_retries_interval` which allows the user
to specify the time interval between consecutive retries. It defaults to 3
and replaces wait_between argument in the above method.
~~~
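
In practice this boils down to two settings in nova.conf on the compute nodes. With the defaults (60 retries at a 3-second interval) Nova gives up after roughly three minutes, which matches the "61 attempts" in the log above. Below is a minimal sketch assuming the stock /etc/nova/nova.conf layout; the values themselves are illustrative and should be sized to cover however long your backend needs to clone or copy your largest image:

  # /etc/nova/nova.conf on the compute node
  [DEFAULT]
  # how many times Nova polls Cinder while waiting for the volume to become available
  block_device_allocate_retries = 300
  # seconds between polls; the total wait is roughly retries * interval (~15 minutes here)
  block_device_allocate_retries_interval = 3

After changing the file, restart the nova-compute service on the node so the new values take effect.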