Although the problem was essentially fixed in earlier releases, and the hard-coded timeouts for waiting until block devices become ready (courtesy of the ham-fisted OpenStack folks) were removed, I still want to remind the "young" installers of complex systems what this error looks like and which two parameters can help.
So, the following block from the Nova compute manager may land in the log on a virtualization node:
2018-01-02 15:37:53.465 28582 ERROR nova.compute.manager [instance: d60edaa9-f3bc-4403-b4bf-db33e17811f7]     wait_func(context, volume_id)
2018-01-02 15:37:53.465 28582 ERROR nova.compute.manager [instance: d60edaa9-f3bc-4403-b4bf-db33e17811f7]   File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 1430, in _await_block_device_map_created
2018-01-02 15:37:53.465 28582 ERROR nova.compute.manager [instance: d60edaa9-f3bc-4403-b4bf-db33e17811f7]     volume_status=volume_status)
2018-01-02 15:37:53.465 28582 ERROR nova.compute.manager [instance: d60edaa9-f3bc-4403-b4bf-db33e17811f7] VolumeNotCreated: Volume 02d73f68-5e27-4799-ab9 42e4eb50cd did not finish being created even after we waited 198 seconds or 61 attempts. And its status is creating.
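For context, the error comes from a polling loop in nova/compute/manager.py (_await_block_device_map_created, discussed in the bug report below): Nova keeps asking Cinder for the volume status and gives up once its retry budget is spent. Below is a rough, standalone sketch of that logic, assuming nothing beyond the method signature quoted further down; get_volume_status and the other names are purely illustrative, this is not Nova's actual code.

import time


class VolumeNotCreated(Exception):
    """Raised when the volume never leaves 'creating' within the time budget."""


def await_block_device_map_created(get_volume_status, volume_id,
                                   max_tries=60, wait_between=3):
    # max_tries / wait_between play the role of block_device_allocate_retries
    # and block_device_allocate_retries_interval from the fix quoted below.
    start = time.time()
    status = 'creating'
    attempt = 0
    for attempt in range(1, max_tries + 1):
        status = get_volume_status(volume_id)  # e.g. a call to the Cinder API
        if status == 'available':
            return attempt                     # volume is ready, carry on booting
        if status == 'error':
            break                              # creation failed, retrying is pointless
        time.sleep(wait_between)
    raise VolumeNotCreated(
        "Volume %s did not finish being created even after we waited %d "
        "seconds or %d attempts. And its status is %s."
        % (volume_id, int(time.time() - start), attempt, status))


# Toy usage: a volume stuck in 'creating' will raise after roughly
# max_tries * wait_between seconds, producing a message like the one above.
# await_block_device_map_created(lambda vid: 'creating', 'some-volume-id', 3, 1)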
The adepts of the OpenStack "wrappers" have already discussed this: https://bugzilla.redhat.com/show_bug.cgi?id=1019401
So it is high time to make use of this answer, at least for the 2016–2017 releases and later:
Lee Yarwood 2015-10-16 12:47:24 EDT

(In reply to Andres Toomsalu from comment #12)
> Just for feedback: this issue is still very much alive and causing problems
> in production deployments with volume (SAN) backends - where volume sizes
> are larger than in development environments. Affects heavily backup/snaphot
> restoration process - which easily run into timeout limits.

(In reply to jwang from comment #15)
> I hit this issue again on RHELOSP6.
>
> 1.
> Cinder backend is LVM
>
> 2.
> Glance image virtual size is 110G

Hello Dafna, Andres, jwang, Jack, can you confirm which version of nova you are using in your environments?

I believe the following change introduced configurables in Juno / RHEL OSP 6 and then Icehouse / RHEL OSP 5 (via 2014.1.4) that can be used here:

[juno] Make the block device mapping retries configurable
https://review.openstack.org/#/c/102891/

[stable/icehouse] Make the block device mapping retries configurable
https://review.openstack.org/#/c/129276/

~~~
Make the block device mapping retries configurable

When booting instances passing in block-device and increasing the volume size, instances can go in to error state if the volume takes longer to create than the hard code value (max_tries(180)/wait_between(1)) set in nova/compute/manager.py

def _await_block_device_map_created(self, context, vol_id, max_tries=180, wait_between=1):

To fix this, max_retries/wait_between should be made configurable. Looking through the different releases, Grizzly was 30, Havana was 60, IceHouse is 180.

This change adds two configuration options:

a) `block_device_allocate_retries` which can be set in nova.conf by the user to configure the number of block device mapping retries. It defaults to 60 and replaces the max_tries argument in the above method.

b) `block_device_allocate_retries_interval` which allows the user to specify the time interval between consecutive retries. It defaults to 3 and replaces wait_between argument in the above method.
~~~
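Note that the "198 seconds or 61 attempts" in the traceback above is consistent with the post-Juno defaults of 60 retries at a 3-second interval (the initial check plus 60 retries), so that deployment was already running the configurable code; it simply needed larger values. A minimal nova.conf sketch for the compute node, using the option names and defaults from the commit message above (the retry count of 300 is an arbitrary illustrative value for a slow SAN backend, not a recommendation):

[DEFAULT]
# How many times to poll for the volume to finish being created
# (default: 60; replaces the old hard-coded max_tries).
block_device_allocate_retries = 300

# Seconds to wait between consecutive polls
# (default: 3; replaces the old wait_between).
block_device_allocate_retries_interval = 3

With these example values Nova would wait roughly 300 × 3 s, about 15 minutes, before declaring VolumeNotCreated; restart the nova-compute service on the node for the new values to take effect.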