Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

(beta4) Restore issue #58

Open
BarthV opened this issue Jan 16, 2015 · 20 comments
Open

(beta4) Restore issue #58

BarthV opened this issue Jan 16, 2015 · 20 comments

Comments

@BarthV
Copy link

BarthV commented Jan 16, 2015

Steps :

  • Install fresh Chef 12.0.1-1
  • Install beta4 (only beta2 is provided in the bundle)
  • Create acme org, admin org-admin and jdoe user (with chef-server-ctl cli)
  • bootstrap some nodes with few cookbooks (2 servers running a chef-server wrapper in my case)
  • Do a backup and save it :-)
  • Trash and install chef-server again with gem beta4
  • Get back and restore previous backup

Restore :

# /opt/opscode/embedded/bin/knife ec restore ./backup
WARNING: No knife configuration file found
WARNING: chef_server_root not found in knife configuration; using chef_server_url
Restoring users ...
WARNING: Skipping pivotal user.  To overwrite pivotal, pass --overwrite-pivotal.
Restoring the org admin data
Restoring the rest of the org
Updated /clients/acme-validator.json
Created /clients/chef-01.json
Created /clients/chef-02.json
Created /clients/acme-validator.json
ERROR: internal server error
Response: internal service error

Result :

  • org is restored
  • users are restored
  • groups/acl are restored
  • clients and validator key are restored
  • the rest is missing (cookbooks, nodes, ...)
@stevendanna
Copy link
Contributor

Thanks for the report, Do you mind running the upload again with the -VV flag. This will produce debugging output that will help determine the problem.

@BarthV
Copy link
Author

BarthV commented Jan 19, 2015

@stevendanna
Copy link
Contributor

@BarthV Thanks for this. Unfortunately, it looks like you only used a single -V. The full debug output we need requires two: -VV. This output should list precisely what API request is failing which will help us dig into this further.

@BarthV
Copy link
Author

BarthV commented Jan 19, 2015

ooops sry.
doing double V right now :)

@BarthV
Copy link
Author

BarthV commented Jan 19, 2015

It's quite long, hope it will help you to see what's going wrong :)

https://gist.github.com/BarthV/066abe770ace36683935

@stevendanna
Copy link
Contributor

@BarthV Hrm, if you attempt the restore with --concurrency 1 set on the command line, does it work? (It will take a longggg time).

@mmzyk
Copy link

mmzyk commented Jan 19, 2015

So, I've got a couple of comments - @BarthV, @stevendanna is asking you to run with concurrency 1 to see if you've hit a race condition on the Chef Server when uploading cookbooks. We've found a few of these when doing backup restores, and while we've fixed one there still seems to be one nagging one still outstanding that we haven't found a reproducible case for. By setting the concurrency to 1 you force cookbook uploads to be serial and avoid the race condition, but as a result it takes a long time. I just wanted to make the reasoning for this clear.

@stevendanna My question for you is, based on the output given above, are we even in the cookbook upload code? The last statement is about clients. I realize this code isn't the best about it's output however, so we could be in the cookbook upload code. Or am I wrong on another front and the concurrency setting applies to more than just cookbook uploads?

@stevendanna
Copy link
Contributor

@mmzyk The last thing I see in the debug log is a sandbox upload. I don't actually see the error but sandbox uploads happen as part of cookbook uploads which is what got me thinking about the concurrency bug.

We definitely need to level up the logging in this tool. I'm not sure if the default Chef logger is actually thread-safe, so hopefully setting concurrency to 1 will also make the output a bit more reasonable.

@BarthV
Copy link
Author

BarthV commented Jan 20, 2015

no changes with --concurrency 1 :

DEBUG: ---- HTTP Request Header Data: ----
DEBUG: Content-Type: application/json
DEBUG: Accept: application/json
DEBUG: Accept-Encoding: gzip;q=1.0,deflate;q=0.6,identity;q=0.3
DEBUG: X-OPS-SIGN: algorithm=sha1;version=1.0;
DEBUG: X-OPS-USERID: admin
DEBUG: X-OPS-TIMESTAMP: 2015-01-20T13:09:19Z
DEBUG: X-OPS-CONTENT-HASH: oMRtV6loUDnbKJuGcW6nqBbF8ww=
DEBUG: X-OPS-AUTHORIZATION-1: ux3vix5FUmKOUI65QVckAu6jGYbxCVkz/X3q8Ulwb169InIXIjF2lXDW8H0k
DEBUG: X-OPS-AUTHORIZATION-2: XFUcnEpEStkkHyxR5bfGbljxfDNCmJaENmxBPI66CjOUIj/NtIWKUrNqO1Mp
DEBUG: X-OPS-AUTHORIZATION-3: /XpQt/aGFxm33xALd2VjdFkwaCJOEN9FjJFE7Ks6R2Z+qr4b0UyQOVCZU2qa
DEBUG: X-OPS-AUTHORIZATION-4: biDZc6sbvuoqNConpNFYm5XHcGRbdf+BYb0Y+GfUVgWsym9hVdJ9KGW+tshE
DEBUG: X-OPS-AUTHORIZATION-5: 1CT66BOKUdAuEPSoOmYbK9GOXt5nIp0ueWxHQ+yR0f+EKwZhLjZVYHzirwNJ
DEBUG: X-OPS-AUTHORIZATION-6: uYG3xUOuw6h9pYXHShajytoQAX4S1mNFLXGPXT87eA==
DEBUG: HOST: localhost:443
DEBUG: X-REMOTE-REQUEST-ID: f4e37e67-9b2d-4daf-b24e-3d78a20f24c8
DEBUG: Content-Length: 21
DEBUG: x-ops-request-source: web
DEBUG: ---- End HTTP Request Header Data ----
DEBUG: ---- HTTP Status and Header Data: ----
DEBUG: HTTP 1.1 500 Internal Server Error
DEBUG: server: ngx_openresty/1.4.3.6
DEBUG: date: Tue, 20 Jan 2015 13:09:19 GMT
DEBUG: content-type: application/json
DEBUG: content-length: 36
DEBUG: connection: close
DEBUG: x-ops-api-info: flavor=cs;version=12.0.0;oc_erchef=0.29.4
DEBUG: ---- End HTTP Status/Header Data ----
DEBUG: Chef::HTTP calling Chef::HTTP::ValidateContentLength#handle_response
DEBUG: Content-Length validated correctly.
DEBUG: Chef::HTTP calling Chef::HTTP::RemoteRequestID#handle_response
DEBUG: Chef::HTTP calling Chef::HTTP::Authenticator#handle_response
DEBUG: Chef::HTTP calling Chef::HTTP::Decompressor#handle_response
DEBUG: Chef::HTTP calling Chef::HTTP::CookieManager#handle_response
DEBUG: Chef::HTTP calling Chef::HTTP::JSONToModelOutput#handle_response
DEBUG: Chef::HTTP calling Chef::HTTP::JSONInput#handle_response
INFO: HTTP Request Returned 500 Internal Server Error: internal service error
/opt/opscode/embedded/lib/ruby/1.9.1/net/http.rb:2633:in `error!': 500 "Internal Server Error" (Net::HTTPFatalError)
    from /opt/opscode/embedded/lib/ruby/gems/1.9.1/gems/chef-11.12.2/lib/chef/http.rb:143:in `request'
    from /opt/opscode/embedded/lib/ruby/gems/1.9.1/gems/chef-11.12.2/lib/chef/http.rb:116:in `put'
    from /opt/opscode/embedded/lib/ruby/gems/1.9.1/gems/chef-11.12.2/lib/chef/cookbook_uploader.rb:105:in `upload_cookbooks'
    from /opt/opscode/embedded/lib/ruby/gems/1.9.1/gems/chef-11.12.2/lib/chef/chef_fs/file_system/cookbooks_dir.rb:152:in `upload_cookbook!'
    from /opt/opscode/embedded/lib/ruby/gems/1.9.1/gems/chef-11.12.2/lib/chef/chef_fs/file_system/cookbooks_dir.rb:112:in `block (2 levels) in upload_versioned_cookbook'
    from /opt/opscode/embedded/lib/ruby/gems/1.9.1/gems/chef-11.12.2/lib/chef/chef_fs/file_system/cookbooks_dir.rb:143:in `with_actual_cookbooks_dir'
    from /opt/opscode/embedded/lib/ruby/gems/1.9.1/gems/chef-11.12.2/lib/chef/chef_fs/file_system/cookbooks_dir.rb:111:in `block in upload_versioned_cookbook'
    from /opt/opscode/embedded/lib/ruby/1.9.1/tmpdir.rb:83:in `mktmpdir'
    from /opt/opscode/embedded/lib/ruby/gems/1.9.1/gems/chef-11.12.2/lib/chef/chef_fs/file_system/cookbooks_dir.rb:95:in `upload_versioned_cookbook'
    from /opt/opscode/embedded/lib/ruby/gems/1.9.1/gems/chef-11.12.2/lib/chef/chef_fs/file_system/cookbooks_dir.rb:74:in `upload_cookbook_from'
    from /opt/opscode/embedded/lib/ruby/gems/1.9.1/gems/chef-11.12.2/lib/chef/chef_fs/file_system/cookbooks_dir.rb:70:in `create_child_from'
    from /opt/opscode/embedded/lib/ruby/gems/1.9.1/gems/chef-11.12.2/lib/chef/chef_fs/file_system.rb:303:in `copy_entries'
    from /opt/opscode/embedded/lib/ruby/gems/1.9.1/gems/chef-11.12.2/lib/chef/chef_fs/file_system.rb:357:in `block in copy_entries'
    from /opt/opscode/embedded/lib/ruby/gems/1.9.1/gems/chef-11.12.2/lib/chef/chef_fs/parallelizer.rb:95:in `call'
    from /opt/opscode/embedded/lib/ruby/gems/1.9.1/gems/chef-11.12.2/lib/chef/chef_fs/parallelizer.rb:95:in `process_input'
    from /opt/opscode/embedded/lib/ruby/gems/1.9.1/gems/chef-11.12.2/lib/chef/chef_fs/parallelizer.rb:70:in `each'
    from /opt/opscode/embedded/lib/ruby/gems/1.9.1/gems/chef-11.12.2/lib/chef/chef_fs/file_system.rb:422:in `to_a'
    from /opt/opscode/embedded/lib/ruby/gems/1.9.1/gems/chef-11.12.2/lib/chef/chef_fs/file_system.rb:422:in `parallel_do'
    from /opt/opscode/embedded/lib/ruby/gems/1.9.1/gems/chef-11.12.2/lib/chef/chef_fs/file_system.rb:356:in `copy_entries'
    from /opt/opscode/embedded/lib/ruby/gems/1.9.1/gems/chef-11.12.2/lib/chef/chef_fs/file_system.rb:145:in `block in copy_to'
    from /opt/opscode/embedded/lib/ruby/gems/1.9.1/gems/chef-11.12.2/lib/chef/chef_fs/parallelizer.rb:95:in `call'
    from /opt/opscode/embedded/lib/ruby/gems/1.9.1/gems/chef-11.12.2/lib/chef/chef_fs/parallelizer.rb:95:in `process_input'
    from /opt/opscode/embedded/lib/ruby/gems/1.9.1/gems/chef-11.12.2/lib/chef/chef_fs/parallelizer.rb:70:in `each'
    from /opt/opscode/embedded/lib/ruby/gems/1.9.1/gems/chef-11.12.2/lib/chef/chef_fs/file_system.rb:422:in `to_a'
    from /opt/opscode/embedded/lib/ruby/gems/1.9.1/gems/chef-11.12.2/lib/chef/chef_fs/file_system.rb:422:in `parallel_do'
    from /opt/opscode/embedded/lib/ruby/gems/1.9.1/gems/chef-11.12.2/lib/chef/chef_fs/file_system.rb:142:in `copy_to'
    from /opt/opscode/embedded/lib/ruby/gems/1.9.1/gems/knife-ec-backup-2.0.0.beta.4/lib/chef/knife/ec_restore.rb:219:in `block in upload_org_data'
    from /opt/opscode/embedded/lib/ruby/gems/1.9.1/gems/knife-ec-backup-2.0.0.beta.4/lib/chef/knife/ec_restore.rb:218:in `each'
    from /opt/opscode/embedded/lib/ruby/gems/1.9.1/gems/knife-ec-backup-2.0.0.beta.4/lib/chef/knife/ec_restore.rb:218:in `upload_org_data'
    from /opt/opscode/embedded/lib/ruby/gems/1.9.1/gems/knife-ec-backup-2.0.0.beta.4/lib/chef/knife/ec_restore.rb:50:in `block in run'
    from /opt/opscode/embedded/lib/ruby/gems/1.9.1/gems/knife-ec-backup-2.0.0.beta.4/lib/chef/knife/ec_restore.rb:124:in `block in for_each_organization'
    from /opt/opscode/embedded/lib/ruby/gems/1.9.1/gems/knife-ec-backup-2.0.0.beta.4/lib/chef/knife/ec_restore.rb:121:in `foreach'
    from /opt/opscode/embedded/lib/ruby/gems/1.9.1/gems/knife-ec-backup-2.0.0.beta.4/lib/chef/knife/ec_restore.rb:121:in `for_each_organization'
    from /opt/opscode/embedded/lib/ruby/gems/1.9.1/gems/knife-ec-backup-2.0.0.beta.4/lib/chef/knife/ec_restore.rb:46:in `run'
    from /opt/opscode/embedded/lib/ruby/gems/1.9.1/gems/chef-11.12.2/lib/chef/knife.rb:492:in `run_with_pretty_exceptions'
    from /opt/opscode/embedded/lib/ruby/gems/1.9.1/gems/chef-11.12.2/lib/chef/knife.rb:174:in `run'
    from /opt/opscode/embedded/lib/ruby/gems/1.9.1/gems/chef-11.12.2/lib/chef/application/knife.rb:135:in `run'
    from /opt/opscode/embedded/lib/ruby/gems/1.9.1/gems/chef-11.12.2/bin/knife:25:in `<top (required)>'
    from /opt/opscode/embedded/bin/knife:23:in `load'
    from /opt/opscode/embedded/bin/knife:23:in `<main>'

@stevendanna
Copy link
Contributor

@BarthV Sorry to be a pain, but do you have any more of that log message? A bit further up it should have the URL that you were trying to hit with that request.

Also, if you have access to the server logs, it might be useful to run the chef-server-ctl gather-logs command. This will produce a tarball that you can send to me via email ([email protected])

@BarthV
Copy link
Author

BarthV commented Jan 20, 2015

god ... it just works after a cleanse and an ec restore :/

Destroying my VM and trying it againt "from scratch"

@BarthV
Copy link
Author

BarthV commented Jan 20, 2015

ok after a complete rebuild it fails again :
https://gist.github.com/BarthV/56754e0cd896ef00a928

I'll try to cleanse and reproduce the "good & complete" restore operation that I reach to make earlier.

@BarthV
Copy link
Author

BarthV commented Jan 20, 2015

after a cleanse + reconfigure, ec restore worked well (it was the same backup) :
https://gist.github.com/BarthV/afd4980ef2ff39a2a118

@stevendanna
Copy link
Contributor

@BarthV my hunch is that this is some sort of transient server-side failure occuring when opscode-erchef (the main API service) is talking to bookshelf (an internal service that provides an S3-compatible API for cookbook storage) rather than a bug in knife-ec-backup itself. That said, I'm happy to help look into it further if you would like. I think the next step from here would be to get the full server log dump from directly after the failure.

Since the restore was working for you as of your last update, I also understand if you'd like to drop this for now. In the worst case we have the information you provided here as a reference if we see this pop up again.

@BarthV
Copy link
Author

BarthV commented Jan 22, 2015

I'll try to do more tests and get chef server log if needed when i'll got some time :).

@BarthV
Copy link
Author

BarthV commented Feb 2, 2015

erchef log:

2015-02-02_10:50:02.42232 [error] Checking presence of file (checksum: <<"2236655cd281e2a508a871c9f88db493">>) for org <<"a343f9075b5e74104f240c9d3c73c675">> from bucket "bookshelf" (key: "organization-a343f9075b5e74104f240c9d3c73c675/checksum-2236655cd281e2a508a871c9f88db493") raised exception error:{aws_error,{http_error,403,{[{"server","ngx_openresty/1.4.3.6"},{"date","Mon, 02 Feb 2015 10:50:00 GMT"},{"content-length","658"},{"connection","close"},{"x-amz-request-id","g2gCZAATYm9va3NoZWxmQDEyNy4wLjAuMWgDYgAABY5iAA1W2GIABzUI"},{"x-amz-id-2","g2gCZAATYm9va3NoZWxmQDEyNy4wLjAuMWgDYgAABY5iAA1W2GIABzX+"}],[]}}}
2015-02-02_10:50:02.42234 
2015-02-02_10:50:02.42795 [error] Checking presence of file (checksum: <<"2f704dba870ccb6e4f5ecf55b9a9b5c2">>) for org <<"a343f9075b5e74104f240c9d3c73c675">> from bucket "bookshelf" (key: "organization-a343f9075b5e74104f240c9d3c73c675/checksum-2f704dba870ccb6e4f5ecf55b9a9b5c2") raised exception error:{aws_error,{http_error,403,{[{"server","ngx_openresty/1.4.3.6"},{"date","Mon, 02 Feb 2015 10:50:00 GMT"},{"content-length","658"},{"connection","close"},{"x-amz-request-id","g2gCZAATYm9va3NoZWxmQDEyNy4wLjAuMWgDYgAABY5iAA1W2GIAB0e5"},{"x-amz-id-2","g2gCZAATYm9va3NoZWxmQDEyNy4wLjAuMWgDYgAABY5iAA1W2GIAB0jY"}],[]}}}
2015-02-02_10:50:02.42796 
2015-02-02_10:50:02.43563 [error] Checking presence of file (checksum: <<"a981fb7bca131d68d30fc72460c18d2e">>) for org <<"a343f9075b5e74104f240c9d3c73c675">> from bucket "bookshelf" (key: "organization-a343f9075b5e74104f240c9d3c73c675/checksum-a981fb7bca131d68d30fc72460c18d2e") raised exception error:{aws_error,{http_error,403,{[{"server","ngx_openresty/1.4.3.6"},{"date","Mon, 02 Feb 2015 10:50:00 GMT"},{"content-length","658"},{"connection","close"},{"x-amz-request-id","g2gCZAATYm9va3NoZWxmQDEyNy4wLjAuMWgDYgAABY5iAA1W2GIAB2Tg"},{"x-amz-id-2","g2gCZAATYm9va3NoZWxmQDEyNy4wLjAuMWgDYgAABY5iAA1W2GIAB2Xw"}],[]}}}
2015-02-02_10:50:02.43564 
2015-02-02_10:50:02.43816 [error] Checking presence of file (checksum: <<"5d1856ae1b6b681826c4f6279b0ddab0">>) for org <<"a343f9075b5e74104f240c9d3c73c675">> from bucket "bookshelf" (key: "organization-a343f9075b5e74104f240c9d3c73c675/checksum-5d1856ae1b6b681826c4f6279b0ddab0") raised exception error:{aws_error,{http_error,403,{[{"server","ngx_openresty/1.4.3.6"},{"date","Mon, 02 Feb 2015 10:50:00 GMT"},{"content-length","658"},{"connection","close"},{"x-amz-request-id","g2gCZAATYm9va3NoZWxmQDEyNy4wLjAuMWgDYgAABY5iAA1W2GIAB2x0"},{"x-amz-id-2","g2gCZAATYm9va3NoZWxmQDEyNy4wLjAuMWgDYgAABY5iAA1W2GIAB21T"}],[]}}}
2015-02-02_10:50:02.43817 
2015-02-02_10:50:02.43916 [error] Checking presence of file (checksum: <<"84232213493b05e6e9067eaf82f8f205">>) for org <<"a343f9075b5e74104f240c9d3c73c675">> from bucket "bookshelf" (key: "organization-a343f9075b5e74104f240c9d3c73c675/checksum-84232213493b05e6e9067eaf82f8f205") raised exception error:{aws_error,{http_error,403,{[{"server","ngx_openresty/1.4.3.6"},{"date","Mon, 02 Feb 2015 10:50:00 GMT"},{"content-length","658"},{"connection","close"},{"x-amz-request-id","g2gCZAATYm9va3NoZWxmQDEyNy4wLjAuMWgDYgAABY5iAA1W2GIAB2re"},{"x-amz-id-2","g2gCZAATYm9va3NoZWxmQDEyNy4wLjAuMWgDYgAABY5iAA1W2GIAB2yG"}],[]}}}
2015-02-02_10:50:02.43917 
2015-02-02_10:50:02.44847 [error] {<<"method=PUT; path=/organizations/acme/sandboxes/0c9d3c73c6755d3dd732eed1d6ccfaf0; status=500; ">>,{error,{throw,{checksum_check_error,5},[{chef_wm_named_sandbox,validate_checksums_uploaded,2,[{file,"src/chef_wm_named_sandbox.erl"},{line,144}]},{chef_wm_named_sandbox,from_json,2,[{file,"src/chef_wm_named_sandbox.erl"},{line,99}]},{webmachine_resource,resource_call,3,[{file,"src/webmachine_resource.erl"},{line,186}]},{webmachine_resource,do,3,[{file,"src/webmachine_resource.erl"},{line,142}]},{webmachine_decision_core,resource_call,1,[{file,"src/webmachine_decision_core.erl"},{line,48}]},{webmachine_decision_core,accept_helper,1,[{file,"src/webmachine_decision_core.erl"},{line,612}]},{webmachine_decision_core,decision,1,[{file,"src/webmachine_decision_core.erl"},{line,517}]},{webmachine_decision_core,handle_request,2,[{file,"src/webmachine_decision_core.erl"},{line,33}]}]}}}

@sixela
Copy link

sixela commented Nov 13, 2017

Hello,

I just encountered the same symptoms with chef-server 12.14.0 and knife ec backup 2.2.3
If you're interested I could gather some more logs

EDIT: even after a cleanse + reconfigure
EDIT2: wrong chef-server version mentionned

@sixela
Copy link

sixela commented Nov 13, 2017

As it turned out, I was uploading /<orgname>/containers/<objects>.json
where each "object" was missing the "containerpath" field.

Not sure it's 100% related to the inital issue then

@devopstales
Copy link

@sixela I have the same problem with chef-server 12.15.0. How you solved this problem?

@sixela
Copy link

sixela commented Feb 1, 2021

Sorry devopstales but after 4y and not working with Chef in a while I don't recall

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants