---
title: Tips
---
This document lists several common problems. If you are looking for CPI-specific errors, see:
- [AWS CPI errors](aws-cpi.html#errors)
- [OpenStack CPI errors](openstack-cpi.html#errors)
- [vSphere CPI errors](vsphere-cpi.html#errors)
---
## <a id="unreachable-agent"></a> Timed out pinging to ... after 600 seconds
<pre class="terminal extra-wide">
$ bosh deploy
...
Failed creating bound missing vms > cloud_controller_worker/0: Timed out pinging to 013ce5c9-e7fc-4f1d-ac24 after 600 seconds (00:16:03)
Failed creating bound missing vms > uaa/0: Timed out pinging to b029652d-14c3-4d68-98c7 after 600 seconds (00:16:12)
Failed creating bound missing vms > uaa/0: Timed out pinging to 1f56ddd1-7f2d-4afc-ae43 after 600 seconds (00:16:23)
Failed creating bound missing vms > loggregator_trafficcontroller/0: Timed out pinging to 28790bac-99a2-4703-89ad after 600 seconds (00:16:25)
Failed creating bound missing vms > health_manager/0: Timed out pinging to 720b805b-928c-4bb7-b6dd after 600 seconds (00:16:52)
Failed creating bound missing vms (00:16:53)
Error 450002: Timed out pinging to 013ce5c9-e7fc-4f1d-ac24 after 600 seconds
Task 45 error
For a more detailed error report, run: bosh task 45 --debug
</pre>
This problem can occur due to:
- blocked network connectivity between the Agent on a new VM and NATS (typically running on the Director VM)
- a bootstrapping problem on the VM and/or a misconfigured Agent
- blocked or slow VM boot
It's recommended to restart the deploy, then SSH into one of the affected VMs and look at [the Agent logs](job-logs.html#agent-logs) while the Director waits for the VMs to become accessible. A Director feature to keep unreachable VMs around for easier debugging is planned.
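For example, a minimal sketch of tailing the Agent logs on an unresponsive VM, assuming you can reach it directly over SSH (the IP and credentials depend on your IaaS and manifest; the Agent typically logs to `/var/vcap/bosh/log/current`):
<pre class="terminal">
$ ssh vcap@VM-IP
$ sudo tail -f /var/vcap/bosh/log/current
</pre>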
---
## <a id="failed-job"></a> ...is not running after update
<pre class="terminal extra-wide">
$ bosh deploy
...
Started updating job access_z1 > access_z1/0 (canary)
Done updating job route_emitter_z1 > route_emitter_z1/0 (canary) (00:00:13)
Done updating job cc_bridge_z1 > cc_bridge_z1/0 (canary) (00:00:20)
Done updating job cell_z1 > cell_z1/0 (canary) (00:00:40)
Failed updating job access_z1 > access_z1/0 (canary): `access_z1/0' is not running after update (00:02:13)
Error 400007: `access_z1/0' is not running after update
Task 47 error
For a more detailed error report, run: bosh task 47 --debug
</pre>
This problem occurs when one of the release jobs on a VM did not successfully start in the given amount of time. You can use the [`bosh instances --ps`](sysadmin-commands.html#health) command to find out which process on the VM is failing. You can also [access logs](job-logs.html#vm-logs) to view additional information.
This problem may also arise when the deployment manifest specifies a [canary/update watch time](deployment-manifest.html#update) that is too short for the process to start successfully.
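For example, a sketch of an `update` block with longer watch times (values are in milliseconds and purely illustrative; tune them to your jobs' startup behavior):

```yaml
update:
  canaries: 1
  max_in_flight: 1
  # The Director watches job health within these windows
  # before declaring the update a success or a failure.
  canary_watch_time: 30000-240000
  update_watch_time: 30000-240000
```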
---
## <a id="blobstore-out-of-space"></a> Running command: bosh-blobstore-dav -c ... 500 Internal Server Error
<pre class="terminal extra-wide">
$ bosh deploy
...
Failed compiling packages > dea_next/3e95ef8425be45468e044c05cc9aa65494281ab5: Action Failed get_task: Task bd35f7c1-2144-4045-763e-40beeafc9fa3 result: Compiling package dea_next: Uploading compiled package: Creating blob in inner blobstore: Making put command: Shelling out to bosh-blobstore-dav cli: Running command: 'bosh-blobstore-dav -c /var/vcap/bosh/etc/blobstore-dav.json put /var/vcap/data/tmp/bosh-platform-disk-TarballCompressor-CompressFilesInDir949066221 cd91a1c5-a034-4c69-4608-6b18cc3fcb2b', stdout: 'Error running app - Putting dav blob cd91a1c5-a034-4c69-4608-6b18cc3fcb2b: Wrong response code: 500; body: <html>
<head><title>500 Internal Server Error</title></head>
<body bgcolor="white">
<center><h1>500 Internal Server Error</h1></center>
<hr><center>nginx</center>
</body>
</html>
', stderr: '': exit status 1 (00:03:16)
</pre>
This problem can occur if the Director is configured to use the built-in blobstore and does not have enough space on its persistent disk. You can use `bosh-init` to redeploy the Director with a larger persistent disk. Alternatively, you can remove unused releases by running the `bosh cleanup` command.
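To confirm, check free space on the Director's persistent disk before cleaning up. This is a sketch that assumes the default persistent disk mount point of `/var/vcap/store`:
<pre class="terminal">
# On the Director VM: check persistent disk usage
$ ssh vcap@DIRECTOR-IP
$ df -h /var/vcap/store

# From your workstation: remove unused releases
$ bosh cleanup
</pre>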
---
## <a id="director-db"></a> Debugging Director database
Rarely, it's necessary to dive into the Director database. The easiest way to do so is to SSH into the Director VM and use `director_ctl console`. For example:
<pre class="terminal">
$ ssh vcap@DIRECTOR-IP
$ /var/vcap/jobs/director/bin/director_ctl console
=> Loading /var/vcap/jobs/director/config/director.yml
=> ruby-debug not found, debugger disabled
=> Welcome to BOSH Director console
=> You can use 'app' to access REST API
=> You can also use 'cloud', 'blobstore', 'nats' helpers to query these services
irb(main):001:0> Bosh::Director::Models::RenderedTemplatesArchive.count
=> 3
</pre>
<p class="note">Note: It's not recommended to modify the Director database via this or any other manual method. If you need the BOSH CLI to support a certain operation, please let us know via a GitHub issue.</p>
---
## <a id="canceled-task"></a> Task X cancelled
<pre class="terminal">
$ bosh deploy
...
Started preparing package compilation > Finding packages to compile. Done (00:00:01)
Started preparing dns > Binding DNS. Done (00:00:05)
Error 10001: Task 106 cancelled
Task 106 cancelled
</pre>
This problem typically occurs if the Director's system time is out of sync (for example, when NTP is not configured or unreachable), or if the Director machine is underpowered.
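A quick way to check for clock drift, assuming SSH access to the Director VM (a sketch; compare against any clock you trust):
<pre class="terminal">
# On the Director VM
$ ssh vcap@DIRECTOR-IP
$ date -u

# On your workstation, at roughly the same moment
$ date -u
</pre>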